* [Qemu-devel] [PATCH 00/20] target/arm: sve system mode patches
@ 2018-08-09  4:21 Richard Henderson
  2018-08-09  4:21 ` [Qemu-devel] [PATCH 01/20] target/arm: Set ISAR bits for -cpu max Richard Henderson
                   ` (22 more replies)
  0 siblings, 23 replies; 51+ messages in thread
From: Richard Henderson @ 2018-08-09  4:21 UTC (permalink / raw)
  To: qemu-devel; +Cc: laurent.desnogues, peter.maydell, alex.bennee

This is my current set of patches for running SVE in system mode.

The first half deals with the system registers that affect SVE.
I recall that Peter has said he'd like the first patch to be
done a different way, but we haven't had a chance to talk about
what form it should take.  I've left it as-is since it does what
I need for now.

The second half re-implements the SVE memory operations.
The FF and NF loads had been stubbed out.  Getting those to work
requires some infrastructure that can also be reused to speed up
normal loads -- one guest-to-host TLB lookup can be reused for the
rest of the page.
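
Roughly, the idea for the fast path is the following (a minimal
sketch only, not the actual patch; the helper name and exact
signatures are illustrative assumptions):

    /* Translate the guest address once, then reuse the host pointer
     * for every element that falls within the same guest page.
     */
    static void sve_ld1bb_page_sketch(CPUARMState *env, uint8_t *vd,
                                      uint64_t *vg, target_ulong addr,
                                      intptr_t len, int mmu_idx)
    {
        /* Bytes remaining before the next page boundary.  */
        intptr_t split = MIN(-(intptr_t)(addr | TARGET_PAGE_MASK), len);
        void *host = tlb_vaddr_to_host(env, addr, MMU_DATA_LOAD, mmu_idx);
        intptr_t i;

        if (host) {
            for (i = 0; i < split; i++) {
                if ((vg[i >> 6] >> (i & 63)) & 1) {
                    vd[H1(i)] = ldub_p(host + i);
                }
            }
        }
        /* Elements in [split, len), and everything if the lookup
         * fails, fall back to the slow per-element path.  */
    }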


r~


Based-on: <20180809034033.10579-1-richard.henderson@linaro.org>
Richard Henderson (20):
  target/arm: Set ISAR bits for -cpu max
  target/arm: Set ID_AA64PFR0 bits for SVE for -cpu max
  target/arm: Define ID_AA64ZFR0_EL1
  target/arm: Adjust sve_exception_el
  target/arm: Fix arm_cpu_data_is_big_endian for aa64 user-only
  target/arm: Fix arm_current_el for user-only
  target/arm: Fix is_a64 for user-only
  target/arm: Pass in current_el to fp and sve_exception_el
  target/arm: Handle SVE vector length changes in system mode
  target/arm: Adjust aarch64_cpu_dump_state for system mode SVE
  target/arm: Clear unused predicate bits for LD1RQ
  target/arm: Rewrite helper_sve_ld1*_r using pages
  target/arm: Rewrite helper_sve_ld[234]*_r
  target/arm: Rewrite helper_sve_st[1234]*_r
  target/arm: Split contiguous loads for endianness
  target/arm: Split contiguous stores for endianness
  target/arm: Rewrite vector gather loads
  target/arm: Rewrite vector gather stores
  target/arm: Rewrite vector gather first-fault loads
  target/arm: Pass TCGMemOpIdx to sve memory helpers

 target/arm/cpu.h           |   47 +-
 target/arm/helper-sve.h    |  385 +++++--
 target/arm/internals.h     |    5 +
 target/arm/cpu.c           |   24 +-
 target/arm/cpu64.c         |   93 +-
 target/arm/helper.c        |  237 +++--
 target/arm/op_helper.c     |    1 +
 target/arm/sve_helper.c    | 2062 +++++++++++++++++++++++++-----------
 target/arm/translate-a64.c |    8 +-
 target/arm/translate-sve.c |  670 ++++++++----
 10 files changed, 2453 insertions(+), 1079 deletions(-)

-- 
2.17.1

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [Qemu-devel] [PATCH 01/20] target/arm: Set ISAR bits for -cpu max
  2018-08-09  4:21 [Qemu-devel] [PATCH 00/20] target/arm: sve system mode patches Richard Henderson
@ 2018-08-09  4:21 ` Richard Henderson
  2018-08-09  4:21 ` [Qemu-devel] [PATCH 02/20] target/arm: Set ID_AA64PFR0 bits for SVE " Richard Henderson
                   ` (21 subsequent siblings)
  22 siblings, 0 replies; 51+ messages in thread
From: Richard Henderson @ 2018-08-09  4:21 UTC (permalink / raw)
  To: qemu-devel; +Cc: laurent.desnogues, peter.maydell, alex.bennee

For the supported extensions, fill in the appropriate bits in
ID_ISAR5, ID_ISAR6, ID_AA64ISAR0, ID_AA64ISAR1.
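
For reference, the pattern used throughout is a 4-bit ID register
field written with deposit32/deposit64.  A hedged, self-contained
sketch (the field offset matches the patch; the function itself is
purely illustrative):

    /* ID_ISAR5.AES occupies bits [7:4]; the value 2 advertises both
     * AES and PMULL.  The other extensions follow the same pattern
     * at their respective field offsets.
     */
    static uint32_t isar5_with_aes_pmull(uint32_t id_isar5)
    {
        return deposit32(id_isar5, 4, 4, 2);
    }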

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/cpu.c   | 24 +++++++++++++++++-------
 target/arm/cpu64.c | 36 ++++++++++++++++++++++++++++--------
 2 files changed, 45 insertions(+), 15 deletions(-)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index b25898ed4c..71daa39e86 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -1802,19 +1802,29 @@ static void arm_max_initfn(Object *obj)
         kvm_arm_set_cpu_features_from_host(cpu);
     } else {
         cortex_a15_initfn(obj);
+
+        set_feature(&cpu->env, ARM_FEATURE_V8_AES);
+        cpu->id_isar5 = deposit32(cpu->id_isar5, 4, 4, 2);
+        set_feature(&cpu->env, ARM_FEATURE_V8_SHA1);
+        cpu->id_isar5 = deposit32(cpu->id_isar5, 8, 4, 1);
+        set_feature(&cpu->env, ARM_FEATURE_V8_SHA256);
+        cpu->id_isar5 = deposit32(cpu->id_isar5, 12, 4, 1);
+        set_feature(&cpu->env, ARM_FEATURE_CRC);
+        cpu->id_isar5 = deposit32(cpu->id_isar5, 16, 4, 1);
+        set_feature(&cpu->env, ARM_FEATURE_V8_RDM);
+        cpu->id_isar5 = deposit32(cpu->id_isar5, 24, 4, 1);
+        set_feature(&cpu->env, ARM_FEATURE_V8_FCMA);
+        cpu->id_isar5 = deposit32(cpu->id_isar5, 28, 4, 1);
+
+        set_feature(&cpu->env, ARM_FEATURE_V8_DOTPROD);
+        cpu->id_isar6 = deposit32(cpu->id_isar6, 4, 4, 1);
+
 #ifdef CONFIG_USER_ONLY
         /* We don't set these in system emulation mode for the moment,
          * since we don't correctly set the ID registers to advertise them,
          */
         set_feature(&cpu->env, ARM_FEATURE_V8);
-        set_feature(&cpu->env, ARM_FEATURE_V8_AES);
-        set_feature(&cpu->env, ARM_FEATURE_V8_SHA1);
-        set_feature(&cpu->env, ARM_FEATURE_V8_SHA256);
         set_feature(&cpu->env, ARM_FEATURE_V8_PMULL);
-        set_feature(&cpu->env, ARM_FEATURE_CRC);
-        set_feature(&cpu->env, ARM_FEATURE_V8_RDM);
-        set_feature(&cpu->env, ARM_FEATURE_V8_DOTPROD);
-        set_feature(&cpu->env, ARM_FEATURE_V8_FCMA);
 #endif
     }
 }
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index 800bff780e..4d629bb99b 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -254,6 +254,34 @@ static void aarch64_max_initfn(Object *obj)
         kvm_arm_set_cpu_features_from_host(cpu);
     } else {
         aarch64_a57_initfn(obj);
+
+        set_feature(&cpu->env, ARM_FEATURE_V8_SHA512);
+        cpu->id_aa64isar0 = deposit64(cpu->id_aa64isar0, 12, 4, 2);
+
+        set_feature(&cpu->env, ARM_FEATURE_V8_ATOMICS);
+        cpu->id_aa64isar0 = deposit64(cpu->id_aa64isar0, 20, 4, 2);
+
+        set_feature(&cpu->env, ARM_FEATURE_V8_RDM);
+        cpu->id_aa64isar0 = deposit64(cpu->id_aa64isar0, 28, 4, 1);
+        cpu->id_isar5 = deposit32(cpu->id_isar5, 24, 4, 1);
+
+        set_feature(&cpu->env, ARM_FEATURE_V8_SHA3);
+        cpu->id_aa64isar0 = deposit64(cpu->id_aa64isar0, 32, 4, 1);
+
+        set_feature(&cpu->env, ARM_FEATURE_V8_SM3);
+        cpu->id_aa64isar0 = deposit64(cpu->id_aa64isar0, 36, 4, 1);
+
+        set_feature(&cpu->env, ARM_FEATURE_V8_SM4);
+        cpu->id_aa64isar0 = deposit64(cpu->id_aa64isar0, 40, 4, 1);
+
+        set_feature(&cpu->env, ARM_FEATURE_V8_DOTPROD);
+        cpu->id_aa64isar0 = deposit64(cpu->id_aa64isar0, 44, 4, 1);
+        cpu->id_isar6 = deposit32(cpu->id_isar6, 4, 4, 1);
+
+        set_feature(&cpu->env, ARM_FEATURE_V8_FCMA);
+        cpu->id_aa64isar1 = deposit64(cpu->id_aa64isar1, 16, 4, 1);
+        cpu->id_isar5 = deposit32(cpu->id_isar5, 28, 4, 1);
+
 #ifdef CONFIG_USER_ONLY
         /* We don't set these in system emulation mode for the moment,
          * since we don't correctly set the ID registers to advertise them,
@@ -261,15 +289,7 @@ static void aarch64_max_initfn(Object *obj)
          * whereas the architecture requires them to be present in both if
          * present in either.
          */
-        set_feature(&cpu->env, ARM_FEATURE_V8_SHA512);
-        set_feature(&cpu->env, ARM_FEATURE_V8_SHA3);
-        set_feature(&cpu->env, ARM_FEATURE_V8_SM3);
-        set_feature(&cpu->env, ARM_FEATURE_V8_SM4);
-        set_feature(&cpu->env, ARM_FEATURE_V8_ATOMICS);
-        set_feature(&cpu->env, ARM_FEATURE_V8_RDM);
-        set_feature(&cpu->env, ARM_FEATURE_V8_DOTPROD);
         set_feature(&cpu->env, ARM_FEATURE_V8_FP16);
-        set_feature(&cpu->env, ARM_FEATURE_V8_FCMA);
         set_feature(&cpu->env, ARM_FEATURE_SVE);
         /* For usermode -cpu max we can use a larger and more efficient DCZ
          * blocksize since we don't have to follow what the hardware does.
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [Qemu-devel] [PATCH 02/20] target/arm: Set ID_AA64PFR0 bits for SVE for -cpu max
  2018-08-09  4:21 [Qemu-devel] [PATCH 00/20] target/arm: sve system mode patches Richard Henderson
  2018-08-09  4:21 ` [Qemu-devel] [PATCH 01/20] target/arm: Set ISAR bits for -cpu max Richard Henderson
@ 2018-08-09  4:21 ` Richard Henderson
  2018-08-09  4:21 ` [Qemu-devel] [PATCH 03/20] target/arm: Define ID_AA64ZFR0_EL1 Richard Henderson
                   ` (20 subsequent siblings)
  22 siblings, 0 replies; 51+ messages in thread
From: Richard Henderson @ 2018-08-09  4:21 UTC (permalink / raw)
  To: qemu-devel; +Cc: laurent.desnogues, peter.maydell, alex.bennee

This is a hair out of spec in that we have, and advertise, support
for fp16 in aarch64 mode, but do not have nor advertise the same
in aarch32 mode.  Rationale as commented.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/cpu64.c | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index 4d629bb99b..ae650b608e 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -282,15 +282,24 @@ static void aarch64_max_initfn(Object *obj)
         cpu->id_aa64isar1 = deposit64(cpu->id_aa64isar1, 16, 4, 1);
         cpu->id_isar5 = deposit32(cpu->id_isar5, 28, 4, 1);
 
-#ifdef CONFIG_USER_ONLY
-        /* We don't set these in system emulation mode for the moment,
-         * since we don't correctly set the ID registers to advertise them,
-         * and in some cases they're only available in AArch64 and not AArch32,
-         * whereas the architecture requires them to be present in both if
-         * present in either.
+        /* TODO: This is not yet implemented for AArch32, whereas the
+         * architecture requires a feature to be present in both if
+         * it is present in either.  However, it is required by SVE,
+         * so we don't want to leave it out of AArch64 state.
+         *
+         * Practically, the Linux kernel does not query the MVFR1 bit
+         * nor expose this as a HWCAP bit to AArch32 userland.  Thus
+         * userland, if it wanted to use fp16, would have to probe for
+         * support by executing an insn and checking for SIGILL.
+         * At which point it will get the correct answer: unsupported.
          */
         set_feature(&cpu->env, ARM_FEATURE_V8_FP16);
+        cpu->id_aa64pfr0 = deposit64(cpu->id_aa64pfr0, 20, 4, 1);
+
         set_feature(&cpu->env, ARM_FEATURE_SVE);
+        cpu->id_aa64pfr0 = deposit64(cpu->id_aa64pfr0, 32, 4, 1);
+
+#ifdef CONFIG_USER_ONLY
         /* For usermode -cpu max we can use a larger and more efficient DCZ
          * blocksize since we don't have to follow what the hardware does.
          */
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [Qemu-devel] [PATCH 03/20] target/arm: Define ID_AA64ZFR0_EL1
  2018-08-09  4:21 [Qemu-devel] [PATCH 00/20] target/arm: sve system mode patches Richard Henderson
  2018-08-09  4:21 ` [Qemu-devel] [PATCH 01/20] target/arm: Set ISAR bits for -cpu max Richard Henderson
  2018-08-09  4:21 ` [Qemu-devel] [PATCH 02/20] target/arm: Set ID_AA64PFR0 bits for SVE " Richard Henderson
@ 2018-08-09  4:21 ` Richard Henderson
  2018-08-17 15:50   ` Peter Maydell
  2018-08-09  4:21 ` [Qemu-devel] [PATCH 04/20] target/arm: Adjust sve_exception_el Richard Henderson
                   ` (19 subsequent siblings)
  22 siblings, 1 reply; 51+ messages in thread
From: Richard Henderson @ 2018-08-09  4:21 UTC (permalink / raw)
  To: qemu-devel; +Cc: laurent.desnogues, peter.maydell, alex.bennee

Given that the only field defined for this new register can
currently only be 0, we don't actually need to change anything
except the name.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index c24c66d43e..61a79e4c44 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -4956,9 +4956,10 @@ void register_cp_regs_for_features(ARMCPU *cpu)
               .opc0 = 3, .opc1 = 0, .crn = 0, .crm = 4, .opc2 = 3,
               .access = PL1_R, .type = ARM_CP_CONST,
               .resetvalue = 0 },
-            { .name = "ID_AA64PFR4_EL1_RESERVED", .state = ARM_CP_STATE_AA64,
+            { .name = "ID_AA64ZFR0_EL1", .state = ARM_CP_STATE_AA64,
               .opc0 = 3, .opc1 = 0, .crn = 0, .crm = 4, .opc2 = 4,
               .access = PL1_R, .type = ARM_CP_CONST,
+              /* At present, only SVEver == 0 is defined anyway.  */
               .resetvalue = 0 },
             { .name = "ID_AA64PFR5_EL1_RESERVED", .state = ARM_CP_STATE_AA64,
               .opc0 = 3, .opc1 = 0, .crn = 0, .crm = 4, .opc2 = 5,
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [Qemu-devel] [PATCH 04/20] target/arm: Adjust sve_exception_el
  2018-08-09  4:21 [Qemu-devel] [PATCH 00/20] target/arm: sve system mode patches Richard Henderson
                   ` (2 preceding siblings ...)
  2018-08-09  4:21 ` [Qemu-devel] [PATCH 03/20] target/arm: Define ID_AA64ZFR0_EL1 Richard Henderson
@ 2018-08-09  4:21 ` Richard Henderson
  2018-08-17 15:57   ` Peter Maydell
  2018-08-09  4:21 ` [Qemu-devel] [PATCH 05/20] target/arm: Fix arm_cpu_data_is_big_endian for aa64 user-only Richard Henderson
                   ` (18 subsequent siblings)
  22 siblings, 1 reply; 51+ messages in thread
From: Richard Henderson @ 2018-08-09  4:21 UTC (permalink / raw)
  To: qemu-devel; +Cc: laurent.desnogues, peter.maydell, alex.bennee

Check for EL3 before testing CPTR_EL3.EZ.  Return 0 when the exception
should be routed via AdvSIMDFPAccessTrap.  Mirror the structure of
CheckSVEEnabled more closely.

Fixes: 5be5e8eda78
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper.c | 96 ++++++++++++++++++++++-----------------------
 1 file changed, 46 insertions(+), 50 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index 61a79e4c44..26e9098c5f 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -4338,67 +4338,63 @@ static const ARMCPRegInfo debug_lpae_cp_reginfo[] = {
     REGINFO_SENTINEL
 };
 
-/* Return the exception level to which SVE-disabled exceptions should
- * be taken, or 0 if SVE is enabled.
+/* Return the exception level to which exceptions should be taken
+ * via SVEAccessTrap.  If an exception should be routed through
+ * AArch64.AdvSIMDFPAccessTrap, return 0; fp_exception_el should
+ * take care of raising that exception.
+ * C.f. the ARM pseudocode function CheckSVEEnabled.
  */
 static int sve_exception_el(CPUARMState *env)
 {
 #ifndef CONFIG_USER_ONLY
     unsigned current_el = arm_current_el(env);
 
-    /* The CPACR.ZEN controls traps to EL1:
-     * 0, 2 : trap EL0 and EL1 accesses
-     * 1    : trap only EL0 accesses
-     * 3    : trap no accesses
+    if (current_el <= 1) {
+        bool disabled = false;
+
+        /* The CPACR.ZEN controls traps to EL1:
+         * 0, 2 : trap EL0 and EL1 accesses
+         * 1    : trap only EL0 accesses
+         * 3    : trap no accesses
+         */
+        if (!extract32(env->cp15.cpacr_el1, 16, 1)) {
+            disabled = true;
+        } else if (!extract32(env->cp15.cpacr_el1, 17, 1)) {
+            disabled = current_el == 0;
+        }
+        if (disabled) {
+            /* route_to_el2 */
+            return (arm_feature(env, ARM_FEATURE_EL2)
+                    && !arm_is_secure(env)
+                    && (env->cp15.hcr_el2 & HCR_TGE) ? 2 : 1);
+        }
+
+        /* Check CPACR.FPEN.  */
+        if (!extract32(env->cp15.cpacr_el1, 20, 1)) {
+            disabled = true;
+        } else if (!extract32(env->cp15.cpacr_el1, 21, 1)) {
+            disabled = current_el == 0;
+        }
+        if (disabled) {
+            return 0;
+        }
+    }
+
+    /* CPTR_EL2.  Since TZ and TFP are positive,
+     * they will be zero when EL2 is not present.
      */
-    switch (extract32(env->cp15.cpacr_el1, 16, 2)) {
-    default:
-        if (current_el <= 1) {
-            /* Trap to PL1, which might be EL1 or EL3 */
-            if (arm_is_secure(env) && !arm_el_is_aa64(env, 3)) {
-                return 3;
-            }
-            return 1;
+    if (current_el <= 2 && !arm_is_secure_below_el3(env)) {
+        if (env->cp15.cptr_el[2] & CPTR_TZ) {
+            return 2;
         }
-        break;
-    case 1:
-        if (current_el == 0) {
-            return 1;
+        if (env->cp15.cptr_el[2] & CPTR_TFP) {
+            return 0;
         }
-        break;
-    case 3:
-        break;
     }
 
-    /* Similarly for CPACR.FPEN, after having checked ZEN.  */
-    switch (extract32(env->cp15.cpacr_el1, 20, 2)) {
-    default:
-        if (current_el <= 1) {
-            if (arm_is_secure(env) && !arm_el_is_aa64(env, 3)) {
-                return 3;
-            }
-            return 1;
-        }
-        break;
-    case 1:
-        if (current_el == 0) {
-            return 1;
-        }
-        break;
-    case 3:
-        break;
-    }
-
-    /* CPTR_EL2.  Check both TZ and TFP.  */
-    if (current_el <= 2
-        && (env->cp15.cptr_el[2] & (CPTR_TFP | CPTR_TZ))
-        && !arm_is_secure_below_el3(env)) {
-        return 2;
-    }
-
-    /* CPTR_EL3.  Check both EZ and TFP.  */
-    if (!(env->cp15.cptr_el[3] & CPTR_EZ)
-        || (env->cp15.cptr_el[3] & CPTR_TFP)) {
+    /* CPTR_EL3.  Since EZ is negative we must check for EL3.  */
+    if (arm_feature(env, ARM_FEATURE_EL3)
+        && !(env->cp15.cptr_el[3] & CPTR_EZ)) {
         return 3;
     }
 #endif
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [Qemu-devel] [PATCH 05/20] target/arm: Fix arm_cpu_data_is_big_endian for aa64 user-only
  2018-08-09  4:21 [Qemu-devel] [PATCH 00/20] target/arm: sve system mode patches Richard Henderson
                   ` (3 preceding siblings ...)
  2018-08-09  4:21 ` [Qemu-devel] [PATCH 04/20] target/arm: Adjust sve_exception_el Richard Henderson
@ 2018-08-09  4:21 ` Richard Henderson
  2018-08-17 16:02   ` Peter Maydell
  2018-08-09  4:21 ` [Qemu-devel] [PATCH 06/20] target/arm: Fix arm_current_el for user-only Richard Henderson
                   ` (17 subsequent siblings)
  22 siblings, 1 reply; 51+ messages in thread
From: Richard Henderson @ 2018-08-09  4:21 UTC (permalink / raw)
  To: qemu-devel; +Cc: laurent.desnogues, peter.maydell, alex.bennee

Unlike aa32, endianness cannot be adjusted by userland in aa64.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/cpu.h | 27 +++++++++++++++++----------
 1 file changed, 17 insertions(+), 10 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 9526ed27cb..2d6d7d03aa 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -2709,8 +2709,6 @@ static inline bool arm_sctlr_b(CPUARMState *env)
 /* Return true if the processor is in big-endian mode. */
 static inline bool arm_cpu_data_is_big_endian(CPUARMState *env)
 {
-    int cur_el;
-
     /* In 32bit endianness is determined by looking at CPSR's E bit */
     if (!is_a64(env)) {
         return
@@ -2729,15 +2727,24 @@ static inline bool arm_cpu_data_is_big_endian(CPUARMState *env)
             arm_sctlr_b(env) ||
 #endif
                 ((env->uncached_cpsr & CPSR_E) ? 1 : 0);
+    } else {
+#ifdef CONFIG_USER_ONLY
+        /* AArch64 does not have a SETEND instruction; endianness
+         * for usermode is fixed at compile-time.
+         */
+# ifdef TARGET_WORDS_BIGENDIAN
+        return true;
+# else
+        return false;
+# endif
+#else
+        int cur_el = arm_current_el(env);
+        if (cur_el == 0) {
+            return (env->cp15.sctlr_el[1] & SCTLR_E0E) != 0;
+        }
+        return (env->cp15.sctlr_el[cur_el] & SCTLR_EE) != 0;
+#endif
     }
-
-    cur_el = arm_current_el(env);
-
-    if (cur_el == 0) {
-        return (env->cp15.sctlr_el[1] & SCTLR_E0E) != 0;
-    }
-
-    return (env->cp15.sctlr_el[cur_el] & SCTLR_EE) != 0;
 }
 
 #include "exec/cpu-all.h"
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [Qemu-devel] [PATCH 06/20] target/arm: Fix arm_current_el for user-only
  2018-08-09  4:21 [Qemu-devel] [PATCH 00/20] target/arm: sve system mode patches Richard Henderson
                   ` (4 preceding siblings ...)
  2018-08-09  4:21 ` [Qemu-devel] [PATCH 05/20] target/arm: Fix arm_cpu_data_is_big_endian for aa64 user-only Richard Henderson
@ 2018-08-09  4:21 ` Richard Henderson
  2018-08-17 16:03   ` Peter Maydell
  2018-08-09  4:21 ` [Qemu-devel] [PATCH 07/20] target/arm: Fix is_a64 " Richard Henderson
                   ` (16 subsequent siblings)
  22 siblings, 1 reply; 51+ messages in thread
From: Richard Henderson @ 2018-08-09  4:21 UTC (permalink / raw)
  To: qemu-devel; +Cc: laurent.desnogues, peter.maydell, alex.bennee

Saves about 12k code size in qemu-aarch64.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/cpu.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 2d6d7d03aa..aedaf2631e 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -1958,6 +1958,9 @@ static inline bool arm_v7m_is_handler_mode(CPUARMState *env)
  */
 static inline int arm_current_el(CPUARMState *env)
 {
+#ifdef CONFIG_USER_ONLY
+    return 0;
+#else
     if (arm_feature(env, ARM_FEATURE_M)) {
         return arm_v7m_is_handler_mode(env) ||
             !(env->v7m.control[env->v7m.secure] & 1);
@@ -1984,6 +1987,7 @@ static inline int arm_current_el(CPUARMState *env)
 
         return 1;
     }
+#endif
 }
 
 typedef struct ARMCPRegInfo ARMCPRegInfo;
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [Qemu-devel] [PATCH 07/20] target/arm: Fix is_a64 for user-only
  2018-08-09  4:21 [Qemu-devel] [PATCH 00/20] target/arm: sve system mode patches Richard Henderson
                   ` (5 preceding siblings ...)
  2018-08-09  4:21 ` [Qemu-devel] [PATCH 06/20] target/arm: Fix arm_current_el for user-only Richard Henderson
@ 2018-08-09  4:21 ` Richard Henderson
  2018-08-17 16:03   ` Peter Maydell
  2018-08-09  4:21 ` [Qemu-devel] [PATCH 08/20] target/arm: Pass in current_el to fp and sve_exception_el Richard Henderson
                   ` (15 subsequent siblings)
  22 siblings, 1 reply; 51+ messages in thread
From: Richard Henderson @ 2018-08-09  4:21 UTC (permalink / raw)
  To: qemu-devel; +Cc: laurent.desnogues, peter.maydell, alex.bennee

Saves about 8k code size in qemu-aarch64.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/cpu.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index aedaf2631e..ed51a2f5aa 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -918,7 +918,15 @@ void aarch64_sync_64_to_32(CPUARMState *env);
 
 static inline bool is_a64(CPUARMState *env)
 {
+#ifdef CONFIG_USER_ONLY
+# ifdef TARGET_AARCH64
+    return true;
+# else
+    return false;
+# endif
+#else
     return env->aarch64;
+#endif
 }
 
 /* you can call this signal handler from your SIGBUS and SIGSEGV
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [Qemu-devel] [PATCH 08/20] target/arm: Pass in current_el to fp and sve_exception_el
  2018-08-09  4:21 [Qemu-devel] [PATCH 00/20] target/arm: sve system mode patches Richard Henderson
                   ` (6 preceding siblings ...)
  2018-08-09  4:21 ` [Qemu-devel] [PATCH 07/20] target/arm: Fix is_a64 " Richard Henderson
@ 2018-08-09  4:21 ` Richard Henderson
  2018-08-09 18:01   ` Alex Bennée
  2018-08-09  4:21 ` [Qemu-devel] [PATCH 09/20] target/arm: Handle SVE vector length changes in system mode Richard Henderson
                   ` (14 subsequent siblings)
  22 siblings, 1 reply; 51+ messages in thread
From: Richard Henderson @ 2018-08-09  4:21 UTC (permalink / raw)
  To: qemu-devel; +Cc: laurent.desnogues, peter.maydell, alex.bennee

We are going to want to determine whether SVE is enabled
for an EL other than the current one.
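
The immediate consumer arrives later in this series; schematically
(a sketch mirroring a later patch, with sve_zcr_len_for_el introduced
there, not an existing API):

    /* Query SVE state for an arbitrary EL, e.g. when changing
     * exception levels: an EL where SVE traps has effective length 0.
     */
    static int sve_len_for_el_sketch(CPUARMState *env, int el)
    {
        return sve_exception_el(env, el) ? 0 : sve_zcr_len_for_el(env, el);
    }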

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper.c | 21 +++++++++------------
 1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index 26e9098c5f..290b1a849e 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -4344,12 +4344,10 @@ static const ARMCPRegInfo debug_lpae_cp_reginfo[] = {
  * take care of raising that exception.
  * C.f. the ARM pseudocode function CheckSVEEnabled.
  */
-static int sve_exception_el(CPUARMState *env)
+static int sve_exception_el(CPUARMState *env, int el)
 {
 #ifndef CONFIG_USER_ONLY
-    unsigned current_el = arm_current_el(env);
-
-    if (current_el <= 1) {
+    if (el <= 1) {
         bool disabled = false;
 
         /* The CPACR.ZEN controls traps to EL1:
@@ -4360,7 +4358,7 @@ static int sve_exception_el(CPUARMState *env)
         if (!extract32(env->cp15.cpacr_el1, 16, 1)) {
             disabled = true;
         } else if (!extract32(env->cp15.cpacr_el1, 17, 1)) {
-            disabled = current_el == 0;
+            disabled = el == 0;
         }
         if (disabled) {
             /* route_to_el2 */
@@ -4373,7 +4371,7 @@ static int sve_exception_el(CPUARMState *env)
         if (!extract32(env->cp15.cpacr_el1, 20, 1)) {
             disabled = true;
         } else if (!extract32(env->cp15.cpacr_el1, 21, 1)) {
-            disabled = current_el == 0;
+            disabled = el == 0;
         }
         if (disabled) {
             return 0;
@@ -4383,7 +4381,7 @@ static int sve_exception_el(CPUARMState *env)
     /* CPTR_EL2.  Since TZ and TFP are positive,
      * they will be zero when EL2 is not present.
      */
-    if (current_el <= 2 && !arm_is_secure_below_el3(env)) {
+    if (el <= 2 && !arm_is_secure_below_el3(env)) {
         if (env->cp15.cptr_el[2] & CPTR_TZ) {
             return 2;
         }
@@ -12318,11 +12316,10 @@ uint32_t HELPER(crc32c)(uint32_t acc, uint32_t val, uint32_t bytes)
 /* Return the exception level to which FP-disabled exceptions should
  * be taken, or 0 if FP is enabled.
  */
-static inline int fp_exception_el(CPUARMState *env)
+static int fp_exception_el(CPUARMState *env, int cur_el)
 {
 #ifndef CONFIG_USER_ONLY
     int fpen;
-    int cur_el = arm_current_el(env);
 
     /* CPACR and the CPTR registers don't exist before v6, so FP is
      * always accessible
@@ -12385,11 +12382,12 @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
                           target_ulong *cs_base, uint32_t *pflags)
 {
     ARMMMUIdx mmu_idx = core_to_arm_mmu_idx(env, cpu_mmu_index(env, false));
-    int fp_el = fp_exception_el(env);
+    int current_el = arm_current_el(env);
+    int fp_el = fp_exception_el(env, current_el);
     uint32_t flags;
 
     if (is_a64(env)) {
-        int sve_el = sve_exception_el(env);
+        int sve_el = sve_exception_el(env, current_el);
         uint32_t zcr_len;
 
         *pc = env->pc;
@@ -12404,7 +12402,6 @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
         if (sve_el != 0 && fp_el == 0) {
             zcr_len = 0;
         } else {
-            int current_el = arm_current_el(env);
             ARMCPU *cpu = arm_env_get_cpu(env);
 
             zcr_len = cpu->sve_max_vq - 1;
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [Qemu-devel] [PATCH 09/20] target/arm: Handle SVE vector length changes in system mode
  2018-08-09  4:21 [Qemu-devel] [PATCH 00/20] target/arm: sve system mode patches Richard Henderson
                   ` (7 preceding siblings ...)
  2018-08-09  4:21 ` [Qemu-devel] [PATCH 08/20] target/arm: Pass in current_el to fp and sve_exception_el Richard Henderson
@ 2018-08-09  4:21 ` Richard Henderson
  2018-08-17 16:22   ` Peter Maydell
  2018-08-09  4:21 ` [Qemu-devel] [PATCH 10/20] target/arm: Adjust aarch64_cpu_dump_state for system mode SVE Richard Henderson
                   ` (13 subsequent siblings)
  22 siblings, 1 reply; 51+ messages in thread
From: Richard Henderson @ 2018-08-09  4:21 UTC (permalink / raw)
  To: qemu-devel; +Cc: laurent.desnogues, peter.maydell, alex.bennee

SVE vector length can change when changing EL, or when writing
to one of the ZCR_ELn registers.

For correctness, our implementation requires that inaccessible
predicate bits are never set, which means noticing length changes
and zeroing the appropriate register bits.
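
As a standalone illustration of the masking arithmetic involved (the
helper below is hypothetical; the real work is done by
aarch64_sve_narrow_vq in the patch):

    /* Mask of predicate bits that remain accessible in 64-bit word
     * WORD of a predicate register when the vector length is VQ
     * quadwords: each quadword contributes 16 predicate bits, and
     * each 64-bit predicate word covers 4 quadwords.
     */
    static uint64_t preg_keep_mask_sketch(unsigned vq, unsigned word)
    {
        unsigned below = word * 4;
        unsigned here = vq > below ? vq - below : 0;

        return here >= 4 ? -1ULL : ~(-1ULL << (16 * here));
    }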

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/cpu.h       |   4 ++
 target/arm/cpu64.c     |  42 --------------
 target/arm/helper.c    | 127 ++++++++++++++++++++++++++++++++++++-----
 target/arm/op_helper.c |   1 +
 4 files changed, 119 insertions(+), 55 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index ed51a2f5aa..18b3c92c2e 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -910,6 +910,10 @@ int arm_cpu_write_elf32_note(WriteCoreDumpFunction f, CPUState *cs,
 int aarch64_cpu_gdb_read_register(CPUState *cpu, uint8_t *buf, int reg);
 int aarch64_cpu_gdb_write_register(CPUState *cpu, uint8_t *buf, int reg);
 void aarch64_sve_narrow_vq(CPUARMState *env, unsigned vq);
+void aarch64_sve_change_el(CPUARMState *env, int old_el, int new_el);
+#else
+static inline void aarch64_sve_narrow_vq(CPUARMState *env, unsigned vq) { }
+static inline void aarch64_sve_change_el(CPUARMState *env, int o, int n) { }
 #endif
 
 target_ulong do_arm_semihosting(CPUARMState *env);
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index ae650b608e..16272f1358 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -439,45 +439,3 @@ static void aarch64_cpu_register_types(void)
 }
 
 type_init(aarch64_cpu_register_types)
-
-/* The manual says that when SVE is enabled and VQ is widened the
- * implementation is allowed to zero the previously inaccessible
- * portion of the registers.  The corollary to that is that when
- * SVE is enabled and VQ is narrowed we are also allowed to zero
- * the now inaccessible portion of the registers.
- *
- * The intent of this is that no predicate bit beyond VQ is ever set.
- * Which means that some operations on predicate registers themselves
- * may operate on full uint64_t or even unrolled across the maximum
- * uint64_t[4].  Performing 4 bits of host arithmetic unconditionally
- * may well be cheaper than conditionals to restrict the operation
- * to the relevant portion of a uint16_t[16].
- *
- * TODO: Need to call this for changes to the real system registers
- * and EL state changes.
- */
-void aarch64_sve_narrow_vq(CPUARMState *env, unsigned vq)
-{
-    int i, j;
-    uint64_t pmask;
-
-    assert(vq >= 1 && vq <= ARM_MAX_VQ);
-    assert(vq <= arm_env_get_cpu(env)->sve_max_vq);
-
-    /* Zap the high bits of the zregs.  */
-    for (i = 0; i < 32; i++) {
-        memset(&env->vfp.zregs[i].d[2 * vq], 0, 16 * (ARM_MAX_VQ - vq));
-    }
-
-    /* Zap the high bits of the pregs and ffr.  */
-    pmask = 0;
-    if (vq & 3) {
-        pmask = ~(-1ULL << (16 * (vq & 3)));
-    }
-    for (j = vq / 4; j < ARM_MAX_VQ / 4; j++) {
-        for (i = 0; i < 17; ++i) {
-            env->vfp.pregs[i].p[j] &= pmask;
-        }
-        pmask = 0;
-    }
-}
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 290b1a849e..fb79b27cf6 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -4399,11 +4399,44 @@ static int sve_exception_el(CPUARMState *env, int el)
     return 0;
 }
 
+/*
+ * Given that SVE is enabled, return the vector length for EL.
+ */
+static uint32_t sve_zcr_len_for_el(CPUARMState *env, int el)
+{
+    ARMCPU *cpu = arm_env_get_cpu(env);
+    uint32_t zcr_len = cpu->sve_max_vq - 1;
+
+    if (el <= 1) {
+        zcr_len = MIN(zcr_len, 0xf & (uint32_t)env->vfp.zcr_el[1]);
+    }
+    if (el < 2 && arm_feature(env, ARM_FEATURE_EL2)) {
+        zcr_len = MIN(zcr_len, 0xf & (uint32_t)env->vfp.zcr_el[2]);
+    }
+    if (el < 3 && arm_feature(env, ARM_FEATURE_EL3)) {
+        zcr_len = MIN(zcr_len, 0xf & (uint32_t)env->vfp.zcr_el[3]);
+    }
+    return zcr_len;
+}
+
 static void zcr_write(CPUARMState *env, const ARMCPRegInfo *ri,
                       uint64_t value)
 {
+    int cur_el = arm_current_el(env);
+    int old_len = sve_zcr_len_for_el(env, cur_el);
+    int new_len;
+
     /* Bits other than [3:0] are RAZ/WI.  */
     raw_write(env, ri, value & 0xf);
+
+    /*
+     * Because we arrived here, we know both FP and SVE are enabled;
+     * otherwise we would have trapped access to the ZCR_ELn register.
+     */
+    new_len = sve_zcr_len_for_el(env, cur_el);
+    if (new_len < old_len) {
+        aarch64_sve_narrow_vq(env, new_len + 1);
+    }
 }
 
 static const ARMCPRegInfo zcr_el1_reginfo = {
@@ -8100,8 +8133,11 @@ static void arm_cpu_do_interrupt_aarch64(CPUState *cs)
     unsigned int new_el = env->exception.target_el;
     target_ulong addr = env->cp15.vbar_el[new_el];
     unsigned int new_mode = aarch64_pstate_mode(new_el, true);
+    unsigned int cur_el = arm_current_el(env);
 
-    if (arm_current_el(env) < new_el) {
+    aarch64_sve_change_el(env, cur_el, new_el);
+
+    if (cur_el < new_el) {
         /* Entry vector offset depends on whether the implemented EL
          * immediately lower than the target level is using AArch32 or AArch64
          */
@@ -12402,18 +12438,7 @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
         if (sve_el != 0 && fp_el == 0) {
             zcr_len = 0;
         } else {
-            ARMCPU *cpu = arm_env_get_cpu(env);
-
-            zcr_len = cpu->sve_max_vq - 1;
-            if (current_el <= 1) {
-                zcr_len = MIN(zcr_len, 0xf & (uint32_t)env->vfp.zcr_el[1]);
-            }
-            if (current_el < 2 && arm_feature(env, ARM_FEATURE_EL2)) {
-                zcr_len = MIN(zcr_len, 0xf & (uint32_t)env->vfp.zcr_el[2]);
-            }
-            if (current_el < 3 && arm_feature(env, ARM_FEATURE_EL3)) {
-                zcr_len = MIN(zcr_len, 0xf & (uint32_t)env->vfp.zcr_el[3]);
-            }
+            zcr_len = sve_zcr_len_for_el(env, current_el);
         }
         flags |= zcr_len << ARM_TBFLAG_ZCR_LEN_SHIFT;
     } else {
@@ -12467,3 +12492,79 @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
     *pflags = flags;
     *cs_base = 0;
 }
+
+#ifdef TARGET_AARCH64
+/*
+ * The manual says that when SVE is enabled and VQ is widened the
+ * implementation is allowed to zero the previously inaccessible
+ * portion of the registers.  The corollary to that is that when
+ * SVE is enabled and VQ is narrowed we are also allowed to zero
+ * the now inaccessible portion of the registers.
+ *
+ * The intent of this is that no predicate bit beyond VQ is ever set.
+ * Which means that some operations on predicate registers themselves
+ * may operate on full uint64_t or even unrolled across the maximum
+ * uint64_t[4].  Performing 4 bits of host arithmetic unconditionally
+ * may well be cheaper than conditionals to restrict the operation
+ * to the relevant portion of a uint16_t[16].
+ */
+void aarch64_sve_narrow_vq(CPUARMState *env, unsigned vq)
+{
+    int i, j;
+    uint64_t pmask;
+
+    assert(vq >= 1 && vq <= ARM_MAX_VQ);
+    assert(vq <= arm_env_get_cpu(env)->sve_max_vq);
+
+    /* Zap the high bits of the zregs.  */
+    for (i = 0; i < 32; i++) {
+        memset(&env->vfp.zregs[i].d[2 * vq], 0, 16 * (ARM_MAX_VQ - vq));
+    }
+
+    /* Zap the high bits of the pregs and ffr.  */
+    pmask = 0;
+    if (vq & 3) {
+        pmask = ~(-1ULL << (16 * (vq & 3)));
+    }
+    for (j = vq / 4; j < ARM_MAX_VQ / 4; j++) {
+        for (i = 0; i < 17; ++i) {
+            env->vfp.pregs[i].p[j] &= pmask;
+        }
+        pmask = 0;
+    }
+}
+
+/*
+ * Notice a change in SVE vector size when changing EL.
+ */
+void aarch64_sve_change_el(CPUARMState *env, int old_el, int new_el)
+{
+    int old_len, new_len;
+
+    /* Nothing to do if no SVE.  */
+    if (!arm_feature(env, ARM_FEATURE_SVE)) {
+        return;
+    }
+
+    /* Nothing to do if FP is disabled in either EL.  */
+    if (fp_exception_el(env, old_el) || fp_exception_el(env, new_el)) {
+        return;
+    }
+
+    /*
+     * When FP is enabled, but SVE is disabled, the effective len is 0.
+     * ??? How should sve_exception_el interact with AArch32 state?
+     * That isn't included in the CheckSVEEnabled pseudocode, so is the
+     * host kernel required to explicitly disable SVE for an EL using aa32?
+     */
+    old_len = (sve_exception_el(env, old_el)
+               ? 0 : sve_zcr_len_for_el(env, old_el));
+    new_len = (sve_exception_el(env, new_el)
+               ? 0 : sve_zcr_len_for_el(env, new_el));
+
+    /* When changing vector length, clear inaccessible state.  */
+    if (new_len < old_len) {
+        aarch64_sve_narrow_vq(env, new_len + 1);
+    }
+}
+#endif
diff --git a/target/arm/op_helper.c b/target/arm/op_helper.c
index f728f25e4b..b9f920b3c4 100644
--- a/target/arm/op_helper.c
+++ b/target/arm/op_helper.c
@@ -1068,6 +1068,7 @@ void HELPER(exception_return)(CPUARMState *env)
                       "AArch64 EL%d PC 0x%" PRIx64 "\n",
                       cur_el, new_el, env->pc);
     }
+    aarch64_sve_change_el(env, cur_el, new_el);
 
     qemu_mutex_lock_iothread();
     arm_call_el_change_hook(arm_env_get_cpu(env));
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [Qemu-devel] [PATCH 10/20] target/arm: Adjust aarch64_cpu_dump_state for system mode SVE
  2018-08-09  4:21 [Qemu-devel] [PATCH 00/20] target/arm: sve system mode patches Richard Henderson
                   ` (8 preceding siblings ...)
  2018-08-09  4:21 ` [Qemu-devel] [PATCH 09/20] target/arm: Handle SVE vector length changes in system mode Richard Henderson
@ 2018-08-09  4:21 ` Richard Henderson
  2018-08-17 16:35   ` Peter Maydell
  2018-08-09  4:21 ` [Qemu-devel] [PATCH 11/20] target/arm: Clear unused predicate bits for LD1RQ Richard Henderson
                   ` (12 subsequent siblings)
  22 siblings, 1 reply; 51+ messages in thread
From: Richard Henderson @ 2018-08-09  4:21 UTC (permalink / raw)
  To: qemu-devel; +Cc: laurent.desnogues, peter.maydell, alex.bennee

Use the existing helpers to determine (1) whether the FPU is
enabled, (2) whether SVE state is enabled, and (3) the current SVE
vector length.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/cpu.h           | 4 ++++
 target/arm/helper.c        | 6 +++---
 target/arm/translate-a64.c | 8 ++++++--
 3 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 18b3c92c2e..33d06f2340 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -920,6 +920,10 @@ target_ulong do_arm_semihosting(CPUARMState *env);
 void aarch64_sync_32_to_64(CPUARMState *env);
 void aarch64_sync_64_to_32(CPUARMState *env);
 
+int fp_exception_el(CPUARMState *env, int cur_el);
+int sve_exception_el(CPUARMState *env, int cur_el);
+uint32_t sve_zcr_len_for_el(CPUARMState *env, int el);
+
 static inline bool is_a64(CPUARMState *env)
 {
 #ifdef CONFIG_USER_ONLY
diff --git a/target/arm/helper.c b/target/arm/helper.c
index fb79b27cf6..64ff71b722 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -4344,7 +4344,7 @@ static const ARMCPRegInfo debug_lpae_cp_reginfo[] = {
  * take care of raising that exception.
  * C.f. the ARM pseudocode function CheckSVEEnabled.
  */
-static int sve_exception_el(CPUARMState *env, int el)
+int sve_exception_el(CPUARMState *env, int el)
 {
 #ifndef CONFIG_USER_ONLY
     if (el <= 1) {
@@ -4402,7 +4402,7 @@ static int sve_exception_el(CPUARMState *env, int el)
 /*
  * Given that SVE is enabled, return the vector length for EL.
  */
-static uint32_t sve_zcr_len_for_el(CPUARMState *env, int el)
+uint32_t sve_zcr_len_for_el(CPUARMState *env, int el)
 {
     ARMCPU *cpu = arm_env_get_cpu(env);
     uint32_t zcr_len = cpu->sve_max_vq - 1;
@@ -12352,7 +12352,7 @@ uint32_t HELPER(crc32c)(uint32_t acc, uint32_t val, uint32_t bytes)
 /* Return the exception level to which FP-disabled exceptions should
  * be taken, or 0 if FP is enabled.
  */
-static int fp_exception_el(CPUARMState *env, int cur_el)
+int fp_exception_el(CPUARMState *env, int cur_el)
 {
 #ifndef CONFIG_USER_ONLY
     int fpen;
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index b29dc49c4f..4a0ca8c906 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -166,11 +166,15 @@ void aarch64_cpu_dump_state(CPUState *cs, FILE *f,
         cpu_fprintf(f, "\n");
         return;
     }
+    if (fp_exception_el(env, el) != 0) {
+        cpu_fprintf(f, "    FPU disabled\n");
+        return;
+    }
     cpu_fprintf(f, "     FPCR=%08x FPSR=%08x\n",
                 vfp_get_fpcr(env), vfp_get_fpsr(env));
 
-    if (arm_feature(env, ARM_FEATURE_SVE)) {
-        int j, zcr_len = env->vfp.zcr_el[1] & 0xf; /* fix for system mode */
+    if (arm_feature(env, ARM_FEATURE_SVE) && sve_exception_el(env, el) == 0) {
+        int j, zcr_len = sve_zcr_len_for_el(env, el);
 
         for (i = 0; i <= FFR_PRED_NUM; i++) {
             bool eol;
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [Qemu-devel] [PATCH 11/20] target/arm: Clear unused predicate bits for LD1RQ
  2018-08-09  4:21 [Qemu-devel] [PATCH 00/20] target/arm: sve system mode patches Richard Henderson
                   ` (9 preceding siblings ...)
  2018-08-09  4:21 ` [Qemu-devel] [PATCH 10/20] target/arm: Adjust aarch64_cpu_dump_state for system mode SVE Richard Henderson
@ 2018-08-09  4:21 ` Richard Henderson
  2018-08-23 15:21   ` Peter Maydell
  2018-08-09  4:21 ` [Qemu-devel] [PATCH 12/20] target/arm: Rewrite helper_sve_ld1*_r using pages Richard Henderson
                   ` (11 subsequent siblings)
  22 siblings, 1 reply; 51+ messages in thread
From: Richard Henderson @ 2018-08-09  4:21 UTC (permalink / raw)
  To: qemu-devel; +Cc: laurent.desnogues, peter.maydell, alex.bennee

The 16-byte load only uses 16 predicate bits.  But while reusing
the other load infrastructure, we may find other bits set, which
triggers an assert.  To avoid this while retaining the assert,
zero-extend the predicate that we pass to the LD1 helper.
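
Schematically (hypothetical helper, not the TCG code in the patch),
the fix amounts to:

    /* Only 16 predicate bits (one per byte) govern a 16-byte quadword;
     * everything above must read as zero for the temporary VQ == 1
     * view handed to the LD1 helper.
     */
    static uint64_t ld1rq_pred_sketch(uint64_t preg_word0)
    {
        return preg_word0 & 0xffff;
    }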

Reported-by: Laurent Desnogues <laurent.desnogues@gmail.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-sve.c | 25 +++++++++++++++++++++++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index d27bc8c946..bef6b8242d 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -4765,12 +4765,33 @@ static void do_ldrq(DisasContext *s, int zt, int pg, TCGv_i64 addr, int msz)
     unsigned vsz = vec_full_reg_size(s);
     TCGv_ptr t_pg;
     TCGv_i32 desc;
+    int poff;
 
     /* Load the first quadword using the normal predicated load helpers.  */
     desc = tcg_const_i32(simd_desc(16, 16, zt));
-    t_pg = tcg_temp_new_ptr();
 
-    tcg_gen_addi_ptr(t_pg, cpu_env, pred_full_reg_offset(s, pg));
+    poff = pred_full_reg_offset(s, pg);
+    if (vsz > 16) {
+        /*
+         * Zero-extend the first 16 bits of the predicate into a temporary.
+         * This avoids triggering an assert making sure we don't have bits
+         * set within a predicate beyond VQ, but we have lowered VQ to 1
+         * for this load operation.
+         */
+        TCGv_i64 tmp = tcg_temp_new_i64();
+#ifdef HOST_WORDS_BIGENDIAN
+        poff += 6;
+#endif
+        tcg_gen_ld16u_i64(tmp, cpu_env, poff);
+
+        poff = offsetof(CPUARMState, vfp.preg_tmp);
+        tcg_gen_st_i64(tmp, cpu_env, poff);
+        tcg_temp_free_i64(tmp);
+    }
+
+    t_pg = tcg_temp_new_ptr();
+    tcg_gen_addi_ptr(t_pg, cpu_env, poff);
+
     fns[msz](cpu_env, t_pg, addr, desc);
 
     tcg_temp_free_ptr(t_pg);
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [Qemu-devel] [PATCH 12/20] target/arm: Rewrite helper_sve_ld1*_r using pages
  2018-08-09  4:21 [Qemu-devel] [PATCH 00/20] target/arm: sve system mode patches Richard Henderson
                   ` (10 preceding siblings ...)
  2018-08-09  4:21 ` [Qemu-devel] [PATCH 11/20] target/arm: Clear unused predicate bits for LD1RQ Richard Henderson
@ 2018-08-09  4:21 ` Richard Henderson
  2018-08-10  9:13   ` Alex Bennée
  2018-08-23 16:01   ` Peter Maydell
  2018-08-09  4:21 ` [Qemu-devel] [PATCH 13/20] target/arm: Rewrite helper_sve_ld[234]*_r Richard Henderson
                   ` (10 subsequent siblings)
  22 siblings, 2 replies; 51+ messages in thread
From: Richard Henderson @ 2018-08-09  4:21 UTC (permalink / raw)
  To: qemu-devel; +Cc: laurent.desnogues, peter.maydell, alex.bennee

Uses tlb_vaddr_to_host for correct operation with softmmu.
Optimize for accesses within a single page or pair of pages.

Perf report comparison for cortex-strings test-strlen
with aarch64-linux-user:

before:
   1.59%  qemu-aarch64  qemu-aarch64  [.] do_sve_ld1bb_r
   0.86%  qemu-aarch64  qemu-aarch64  [.] do_sve_ldff1bb_r
after:
   0.09%  qemu-aarch64  qemu-aarch64  [.] helper_sve_ldff1bb_r
   0.01%  qemu-aarch64  qemu-aarch64  [.] sve_ld1bb_host

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/sve_helper.c | 839 ++++++++++++++++++++++++++++++++--------
 1 file changed, 675 insertions(+), 164 deletions(-)

diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index e03f954a26..4ca9412e20 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -1688,6 +1688,45 @@ static void swap_memmove(void *vd, void *vs, size_t n)
     }
 }
 
+/* Similarly for memset of 0.  */
+static void swap_memzero(void *vd, size_t n)
+{
+    uintptr_t d = (uintptr_t)vd;
+    uintptr_t o = (d | n) & 7;
+    size_t i;
+
+    if (likely(n == 0)) {
+        return;
+    }
+#ifndef HOST_WORDS_BIGENDIAN
+    o = 0;
+#endif
+    switch (o) {
+    case 0:
+        memset(vd, 0, n);
+        break;
+
+    case 4:
+        for (i = 0; i < n; i += 4) {
+            *(uint32_t *)H1_4(d + i) = 0;
+        }
+        break;
+
+    case 2:
+    case 6:
+        for (i = 0; i < n; i += 2) {
+            *(uint16_t *)H1_2(d + i) = 0;
+        }
+        break;
+
+    default:
+        for (i = 0; i < n; i++) {
+            *(uint8_t *)H1(d + i) = 0;
+        }
+        break;
+    }
+}
+
 void HELPER(sve_ext)(void *vd, void *vn, void *vm, uint32_t desc)
 {
     intptr_t opr_sz = simd_oprsz(desc);
@@ -3927,32 +3966,438 @@ void HELPER(sve_fcmla_zpzzz_d)(CPUARMState *env, void *vg, uint32_t desc)
 /*
  * Load contiguous data, protected by a governing predicate.
  */
-#define DO_LD1(NAME, FN, TYPEE, TYPEM, H)                  \
-static void do_##NAME(CPUARMState *env, void *vd, void *vg, \
-                      target_ulong addr, intptr_t oprsz,   \
-                      uintptr_t ra)                        \
-{                                                          \
-    intptr_t i = 0;                                        \
-    do {                                                   \
-        uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));    \
-        do {                                               \
-            TYPEM m = 0;                                   \
-            if (pg & 1) {                                  \
-                m = FN(env, addr, ra);                     \
-            }                                              \
-            *(TYPEE *)(vd + H(i)) = m;                     \
-            i += sizeof(TYPEE), pg >>= sizeof(TYPEE);      \
-            addr += sizeof(TYPEM);                         \
-        } while (i & 15);                                  \
-    } while (i < oprsz);                                   \
-}                                                          \
-void HELPER(NAME)(CPUARMState *env, void *vg,              \
-                  target_ulong addr, uint32_t desc)        \
-{                                                          \
-    do_##NAME(env, &env->vfp.zregs[simd_data(desc)], vg,   \
-              addr, simd_oprsz(desc), GETPC());            \
+
+/* Load elements into VD, controlled by VG, from HOST+MEM_OFS.
+ * Memory is valid through MEM_MAX.  The register element indices
+ * are inferred from MEM_OFS, as modified by the types for which
+ * the helper is built.  Return the MEM_OFS of the first element
+ * not loaded (which is MEM_MAX if they are all loaded).
+ *
+ * For softmmu, we have fully validated the guest page.  For user-only,
+ * we cannot fully validate without taking the mmap lock, but since we
+ * know the access is within one host page, if any access is valid they
+ * all must be valid.  However, it may be that no access is valid and
+ * they have all been predicated false.
+ */
+typedef intptr_t sve_ld1_host_fn(void *vd, void *vg, void *host,
+                                 intptr_t mem_ofs, intptr_t mem_max);
+
+/* Load one element into VD+REG_OFF from (ENV,VADDR,RA).
+ * The controlling predicate is known to be true.
+ */
+typedef void sve_ld1_tlb_fn(CPUARMState *env, void *vd, intptr_t reg_off,
+                            target_ulong vaddr, int mmu_idx, uintptr_t ra);
+
+/*
+ * Generate the above primitives.
+ */
+
+#define DO_LD_HOST(NAME, H, TYPEE, TYPEM, HOST) \
+static intptr_t sve_##NAME##_host(void *vd, void *vg, void *host,           \
+                                  intptr_t mem_off, const intptr_t mem_max) \
+{                                                                           \
+    intptr_t reg_off = mem_off * (sizeof(TYPEE) / sizeof(TYPEM));           \
+    uint64_t *pg = vg;                                                      \
+    while (mem_off + sizeof(TYPEM) <= mem_max) {                            \
+        TYPEM val = 0;                                                      \
+        if (likely((pg[reg_off >> 6] >> (reg_off & 63)) & 1)) {             \
+            val = HOST(host + mem_off);                                     \
+        }                                                                   \
+        *(TYPEE *)(vd + H(reg_off)) = val;                                  \
+        mem_off += sizeof(TYPEM), reg_off += sizeof(TYPEE);                 \
+    }                                                                       \
+    return mem_off;                                                         \
 }
 
+#ifdef CONFIG_SOFTMMU
+#define DO_LD_TLB(NAME, H, TYPEE, TYPEM, HOST, MOEND, TLB) \
+static void sve_##NAME##_tlb(CPUARMState *env, void *vd, intptr_t reg_off,  \
+                             target_ulong addr, int mmu_idx, uintptr_t ra)  \
+{                                                                           \
+    TCGMemOpIdx oi = make_memop_idx(ctz32(sizeof(TYPEM)) | MOEND, mmu_idx); \
+    TYPEM val = TLB(env, addr, oi, ra);                                     \
+    *(TYPEE *)(vd + H(reg_off)) = val;                                      \
+}
+#else
+#define DO_LD_TLB(NAME, H, TYPEE, TYPEM, HOST, MOEND, TLB)                  \
+static void sve_##NAME##_tlb(CPUARMState *env, void *vd, intptr_t reg_off,  \
+                             target_ulong addr, int mmu_idx, uintptr_t ra)  \
+{                                                                           \
+    TYPEM val = HOST(g2h(addr));                                            \
+    *(TYPEE *)(vd + H(reg_off)) = val;                                      \
+}
+#endif
+
+DO_LD_TLB(ld1bb, H1, uint8_t, uint8_t, ldub_p, 0, helper_ret_ldub_mmu)
+
+#define DO_LD_PRIM_1(NAME, H, TE, TM)                   \
+    DO_LD_HOST(NAME, H, TE, TM, ldub_p)                 \
+    DO_LD_TLB(NAME, H, TE, TM, ldub_p, 0, helper_ret_ldub_mmu)
+
+DO_LD_PRIM_1(ld1bhu, H1_2, uint16_t, uint8_t)
+DO_LD_PRIM_1(ld1bhs, H1_2, uint16_t,  int8_t)
+DO_LD_PRIM_1(ld1bsu, H1_4, uint32_t, uint8_t)
+DO_LD_PRIM_1(ld1bss, H1_4, uint32_t,  int8_t)
+DO_LD_PRIM_1(ld1bdu,     , uint64_t, uint8_t)
+DO_LD_PRIM_1(ld1bds,     , uint64_t,  int8_t)
+
+#define DO_LD_PRIM_2(NAME, end, MOEND, H, TE, TM, PH, PT)  \
+    DO_LD_HOST(NAME##_##end, H, TE, TM, PH##_##end##_p)    \
+    DO_LD_TLB(NAME##_##end, H, TE, TM, PH##_##end##_p,     \
+              MOEND, helper_##end##_##PT##_mmu)
+
+DO_LD_PRIM_2(ld1hh,  le, MO_LE, H1_2, uint16_t, uint16_t, lduw, lduw)
+DO_LD_PRIM_2(ld1hsu, le, MO_LE, H1_4, uint32_t, uint16_t, lduw, lduw)
+DO_LD_PRIM_2(ld1hss, le, MO_LE, H1_4, uint32_t,  int16_t, lduw, lduw)
+DO_LD_PRIM_2(ld1hdu, le, MO_LE,     , uint64_t, uint16_t, lduw, lduw)
+DO_LD_PRIM_2(ld1hds, le, MO_LE,     , uint64_t,  int16_t, lduw, lduw)
+
+DO_LD_PRIM_2(ld1ss,  le, MO_LE, H1_4, uint32_t, uint32_t, ldl, ldul)
+DO_LD_PRIM_2(ld1sdu, le, MO_LE,     , uint64_t, uint32_t, ldl, ldul)
+DO_LD_PRIM_2(ld1sds, le, MO_LE,     , uint64_t,  int32_t, ldl, ldul)
+
+DO_LD_PRIM_2(ld1dd,  le, MO_LE,     , uint64_t, uint64_t, ldq, ldq)
+
+DO_LD_PRIM_2(ld1hh,  be, MO_BE, H1_2, uint16_t, uint16_t, lduw, lduw)
+DO_LD_PRIM_2(ld1hsu, be, MO_BE, H1_4, uint32_t, uint16_t, lduw, lduw)
+DO_LD_PRIM_2(ld1hss, be, MO_BE, H1_4, uint32_t,  int16_t, lduw, lduw)
+DO_LD_PRIM_2(ld1hdu, be, MO_BE,     , uint64_t, uint16_t, lduw, lduw)
+DO_LD_PRIM_2(ld1hds, be, MO_BE,     , uint64_t,  int16_t, lduw, lduw)
+
+DO_LD_PRIM_2(ld1ss,  be, MO_BE, H1_4, uint32_t, uint32_t, ldl, ldul)
+DO_LD_PRIM_2(ld1sdu, be, MO_BE,     , uint64_t, uint32_t, ldl, ldul)
+DO_LD_PRIM_2(ld1sds, be, MO_BE,     , uint64_t,  int32_t, ldl, ldul)
+
+DO_LD_PRIM_2(ld1dd,  be, MO_BE,     , uint64_t, uint64_t, ldq, ldq)
+
+#undef DO_LD_TLB
+#undef DO_LD_HOST
+#undef DO_LD_PRIM_1
+#undef DO_LD_PRIM_2
+
+/*
+ * Special case contiguous loads of bytes to accelerate strings.
+ *
+ * The assumption is that the governing predicate will be mostly true.
+ * When it is not all true, it has been set by whilelo and so has a
+ * block of true elements followed by a block of false elements.
+ * Thus anything we can do to handle as many bytes as possible in one
+ * step will pay dividends.
+ *
+ * Because of how vector registers are represented in CPUARMState,
+ * each block of 8 can be read with a little-endian load to be stored
+ * into the vector register in host-endian order.
+ *
+ * TODO: For LE host and LE guest (by far the most common combination),
+ * the only difference for other non-extending loads is the controlling
+ * predicate.  Even for other combinations, it might be fastest to use
+ * this primitive to block load all of the data and then reorder the
+ * bytes afterward.
+ */
+
+/* For user-only, conditionally load and mask from HOST, returning 0
+ * if the predicate is false.  This is required because, as described
+ * above, we have not fully validated the page, and faults are not
+ * permitted when the predicate is false.
+ * For softmmu, we never arrive here with invalid host memory; just mask.
+ */
+static inline uint64_t ldq_le_pred_b(uint8_t pg, void *host)
+{
+#ifdef CONFIG_USER_ONLY
+    if (pg == 0) {
+        return 0;
+    }
+#endif
+    return ldq_le_p(host) & expand_pred_b(pg);
+}
+
+static inline uint8_t ldub_pred(uint8_t pg, void *host)
+{
+#ifdef CONFIG_USER_ONLY
+    return pg & 1 ? ldub_p(host) : 0;
+#else
+    return ldub_p(host) & -(pg & 1);
+#endif
+}
+
+static intptr_t sve_ld1bb_host(void *vd, void *vg, void *host,
+                               intptr_t off, const intptr_t max)
+{
+    uint64_t *d = vd;
+    uint8_t *g = vg;
+
+    /* Assuming OFF and MAX may be misaligned, but also the most common
+     * case is an entire vector register: OFF == 0, MAX % 16 == 0.
+     */
+    if (likely(off + 8 <= max)) {
+        const intptr_t max_div_8 = max >> 3;
+        intptr_t off_div_8 = off >> 3;
+        uint64_t data;
+
+        if (unlikely(off & 63)) {
+            /* Align for a loop-of-8.  We know from the range check
+             * above that we have enough remaining to load 8 bytes.
+             */
+            if (unlikely(off & 7)) {
+                int off_7 = off & 7;
+                uint8_t pg = g[H1(off_div_8)] >> off_7;
+
+                off_7 *= 8;
+                data = ldq_le_pred_b(pg, host + off);
+                data = deposit64(d[off_div_8], off_7, 64 - off_7, data);
+                d[off_div_8] = data;
+
+                off_div_8 += 1;
+            }
+
+            /* If there are not sufficient bytes to align for 64
+             * and also execute that loop at least once, skip to tail.
+             */
+            if (ROUND_UP(off_div_8, 8) + 8 > max_div_8) {
+                goto skip_64;
+            }
+
+            /* Align for the loop-of-64.  */
+            if (unlikely(off_div_8 & 7)) {
+                do {
+                    uint8_t pg = g[off_div_8];
+                    data = ldq_le_pred_b(pg, host + off_div_8 * 8);
+                    d[off_div_8] = data;
+                } while (++off_div_8 & 7);
+            }
+        }
+
+        /* While we have blocks of 64 remaining, we can perform tests
+         * against large blocks of predicates at once.
+         */
+        for (; off_div_8 + 8 <= max_div_8; off_div_8 += 8) {
+            uint64_t pg = *(uint64_t *)(g + off_div_8);
+            if (likely(pg == -1ULL)) {
+#ifndef HOST_WORDS_BIGENDIAN
+                memcpy(d + off_div_8, host + off_div_8 * 8, 64);
+#else
+                intptr_t j;
+                for (j = 0; j < 8; j++) {
+                    data = ldq_le_p(host + (off_div_8 + j) * 8);
+                    d[off_div_8 + j] = data;
+                }
+#endif
+            } else if (pg == 0) {
+                memset(d + off_div_8, 0, 64);
+            } else {
+                intptr_t j;
+                for (j = 0; j < 8; j++) {
+                    data = ldq_le_pred_b(pg >> (j * 8),
+                                         host + (off_div_8 + j) * 8);
+                    d[off_div_8 + j] = data;
+                }
+            }
+        }
+
+ skip_64:
+        /* Final tail or a copy smaller than 64 bytes.  */
+        for (; off_div_8 < max_div_8; off_div_8++) {
+            uint8_t pg = g[H1(off_div_8)];
+            data = ldq_le_pred_b(pg, host + off_div_8 * 8);
+            d[off_div_8] = data;
+        }
+
+        /* Restore using OFF.  */
+        off = off_div_8 * 8;
+    }
+
+    /* Final tail or a really small copy.  */
+    if (unlikely(off < max)) {
+        do {
+            uint8_t pg = g[H1(off >> 3)] >> (off & 7);
+            ((uint8_t *)vd)[H1(off)] = ldub_pred(pg, host + off);
+        } while (++off < max);
+    }
+
+    return max;
+}
+
+/* Skip through a sequence of inactive elements in the guarding predicate VG,
+ * beginning at REG_OFF bounded by REG_MAX.  Return the offset of the active
+ * element >= REG_OFF, or REG_MAX if there were no active elements at all.
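+ *
+ * For example, with ESZ == 2 (word elements) the active bits sit at
+ * predicate positions 0, 4, 8, ...; pred_esz_masks[2] selects exactly
+ * those positions before the ctz64 scan below.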
+ */
+static intptr_t find_next_active(uint64_t *vg, intptr_t reg_off,
+                                 intptr_t reg_max, int esz)
+{
+    uint64_t pg_mask = pred_esz_masks[esz];
+    uint64_t pg = (vg[reg_off >> 6] & pg_mask) >> (reg_off & 63);
+
+    /* In normal usage, the first element is active.  */
+    if (likely(pg & 1)) {
+        return reg_off;
+    }
+
+    if (pg == 0) {
+        reg_off &= -64;
+        do {
+            reg_off += 64;
+            if (unlikely(reg_off >= reg_max)) {
+                /* The entire predicate was false.  */
+                return reg_max;
+            }
+            pg = vg[reg_off >> 6] & pg_mask;
+        } while (pg == 0);
+    }
+    reg_off += ctz64(pg);
+
+    /* We should never see an out of range predicate bit set.  */
+    tcg_debug_assert(reg_off < reg_max);
+    return reg_off;
+}
+
+/* Return the maximum offset <= MEM_MAX which is still within the page
+ * referenced by BASE+MEM_OFF.
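+ *
+ * For example, assuming 4K target pages, if BASE+MEM_OFF is 8 bytes
+ * below a page boundary and MEM_MAX - MEM_OFF is 64, only 8 of those
+ * bytes are on the current page, so the result is MEM_OFF + 8.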
+ */
+static intptr_t max_for_page(target_ulong base, intptr_t mem_off,
+                             intptr_t mem_max)
+{
+    target_ulong addr = base + mem_off;
+    intptr_t split = -(intptr_t)(addr | TARGET_PAGE_MASK);
+    return MIN(split, mem_max - mem_off) + mem_off;
+}
+
+static inline void set_helper_retaddr(uintptr_t ra)
+{
+#ifdef CONFIG_USER_ONLY
+    helper_retaddr = ra;
+#endif
+}
+
+static inline bool test_host_page(void *host)
+{
+#ifdef CONFIG_USER_ONLY
+    return true;
+#else
+    return likely(host != NULL);
+#endif
+}
+
+/*
+ * Common helper for all contiguous one-register predicated loads.
+ */
+static void sve_ld1_r(CPUARMState *env, void *vg, const target_ulong addr,
+                      uint32_t desc, const uintptr_t retaddr,
+                      const int esz, const int msz,
+                      sve_ld1_host_fn *host_fn,
+                      sve_ld1_tlb_fn *tlb_fn)
+{
+    void *vd = &env->vfp.zregs[simd_data(desc)];
+    const int diffsz = esz - msz;
+    const intptr_t reg_max = simd_oprsz(desc);
+    const intptr_t mem_max = reg_max >> diffsz;
+    const int mmu_idx = cpu_mmu_index(env, false);
+    ARMVectorReg scratch;
+    void *host, *result;
+    intptr_t split;
+
+    set_helper_retaddr(retaddr);
+
+    host = tlb_vaddr_to_host(env, addr, MMU_DATA_LOAD, mmu_idx);
+    if (test_host_page(host)) {
+        split = max_for_page(addr, 0, mem_max);
+        if (likely(split == mem_max)) {
+            /* The load is entirely within a valid page.  For softmmu,
+             * no faults.  For user-only, if the first byte does not
+             * fault then none of them will fault, so Vd will never be
+             * partially modified.
+             */
+            host_fn(vd, vg, host, 0, mem_max);
+            set_helper_retaddr(0);
+            return;
+        }
+    }
+
+    /* Perform the predicated read into a temporary, thus ensuring that
+     * if the load of the last element faults, Vd is not modified.
+     */
+    result = &scratch;
+#ifdef CONFIG_USER_ONLY
+    host_fn(vd, vg, host, 0, mem_max);
+#else
+    memset(result, 0, reg_max);
+    for (intptr_t reg_off = find_next_active(vg, 0, reg_max, esz);
+         reg_off < reg_max;
+         reg_off = find_next_active(vg, reg_off, reg_max, esz)) {
+        intptr_t mem_off = reg_off >> diffsz;
+
+        split = max_for_page(addr, mem_off, mem_max);
+        if (msz == 0 || split - mem_off >= (1 << msz)) {
+            /* At least one whole element on this page.  */
+            host = tlb_vaddr_to_host(env, addr + mem_off,
+                                     MMU_DATA_LOAD, mmu_idx);
+            if (host) {
+                mem_off = host_fn(result, vg, host - mem_off, mem_off, split);
+                reg_off = mem_off << diffsz;
+                continue;
+            }
+        }
+
+        /* Perform one normal read.  This may fault, longjmping out to the
+         * main loop in order to raise an exception.  It may succeed, and
+         * as a side-effect load the TLB entry for the next round.  Finally,
+         * in the extremely unlikely case we're performing this operation
+         * on I/O memory, it may succeed but not bring in the TLB entry.
+         * But even then we have still made forward progress.
+         */
+        tlb_fn(env, result, reg_off, addr + mem_off, mmu_idx, retaddr);
+        reg_off += 1 << esz;
+    }
+#endif
+
+    set_helper_retaddr(0);
+    memcpy(vd, result, reg_max);
+}
+
+#define DO_LD1_1(NAME, ESZ) \
+void HELPER(sve_##NAME##_r)(CPUARMState *env, void *vg,        \
+                            target_ulong addr, uint32_t desc)  \
+{                                                              \
+    sve_ld1_r(env, vg, addr, desc, GETPC(), ESZ, 0,            \
+              sve_##NAME##_host, sve_##NAME##_tlb);            \
+}
+
+/* TODO: Propagate the endian check back to the translator.  */
+#define DO_LD1_2(NAME, ESZ, MSZ) \
+void HELPER(sve_##NAME##_r)(CPUARMState *env, void *vg,        \
+                            target_ulong addr, uint32_t desc)  \
+{                                                              \
+    if (arm_cpu_data_is_big_endian(env)) {                     \
+        sve_ld1_r(env, vg, addr, desc, GETPC(), ESZ, MSZ,      \
+                  sve_##NAME##_be_host, sve_##NAME##_be_tlb);  \
+    } else {                                                   \
+        sve_ld1_r(env, vg, addr, desc, GETPC(), ESZ, MSZ,      \
+                  sve_##NAME##_le_host, sve_##NAME##_le_tlb);  \
+    }                                                          \
+}
+
+DO_LD1_1(ld1bb,  0)
+DO_LD1_1(ld1bhu, 1)
+DO_LD1_1(ld1bhs, 1)
+DO_LD1_1(ld1bsu, 2)
+DO_LD1_1(ld1bss, 2)
+DO_LD1_1(ld1bdu, 3)
+DO_LD1_1(ld1bds, 3)
+
+DO_LD1_2(ld1hh,  1, 1)
+DO_LD1_2(ld1hsu, 2, 1)
+DO_LD1_2(ld1hss, 2, 1)
+DO_LD1_2(ld1hdu, 3, 1)
+DO_LD1_2(ld1hds, 3, 1)
+
+DO_LD1_2(ld1ss,  2, 2)
+DO_LD1_2(ld1sdu, 3, 2)
+DO_LD1_2(ld1sds, 3, 2)
+
+DO_LD1_2(ld1dd,  3, 3)
+
+#undef DO_LD1_1
+#undef DO_LD1_2
+
 #define DO_LD2(NAME, FN, TYPEE, TYPEM, H)                  \
 void HELPER(NAME)(CPUARMState *env, void *vg,              \
                   target_ulong addr, uint32_t desc)        \
@@ -4037,52 +4482,40 @@ void HELPER(NAME)(CPUARMState *env, void *vg,              \
     }                                                      \
 }
 
-DO_LD1(sve_ld1bhu_r, cpu_ldub_data_ra, uint16_t, uint8_t, H1_2)
-DO_LD1(sve_ld1bhs_r, cpu_ldsb_data_ra, uint16_t, int8_t, H1_2)
-DO_LD1(sve_ld1bsu_r, cpu_ldub_data_ra, uint32_t, uint8_t, H1_4)
-DO_LD1(sve_ld1bss_r, cpu_ldsb_data_ra, uint32_t, int8_t, H1_4)
-DO_LD1(sve_ld1bdu_r, cpu_ldub_data_ra, uint64_t, uint8_t, )
-DO_LD1(sve_ld1bds_r, cpu_ldsb_data_ra, uint64_t, int8_t, )
-
-DO_LD1(sve_ld1hsu_r, cpu_lduw_data_ra, uint32_t, uint16_t, H1_4)
-DO_LD1(sve_ld1hss_r, cpu_ldsw_data_ra, uint32_t, int16_t, H1_4)
-DO_LD1(sve_ld1hdu_r, cpu_lduw_data_ra, uint64_t, uint16_t, )
-DO_LD1(sve_ld1hds_r, cpu_ldsw_data_ra, uint64_t, int16_t, )
-
-DO_LD1(sve_ld1sdu_r, cpu_ldl_data_ra, uint64_t, uint32_t, )
-DO_LD1(sve_ld1sds_r, cpu_ldl_data_ra, uint64_t, int32_t, )
-
-DO_LD1(sve_ld1bb_r, cpu_ldub_data_ra, uint8_t, uint8_t, H1)
 DO_LD2(sve_ld2bb_r, cpu_ldub_data_ra, uint8_t, uint8_t, H1)
 DO_LD3(sve_ld3bb_r, cpu_ldub_data_ra, uint8_t, uint8_t, H1)
 DO_LD4(sve_ld4bb_r, cpu_ldub_data_ra, uint8_t, uint8_t, H1)
 
-DO_LD1(sve_ld1hh_r, cpu_lduw_data_ra, uint16_t, uint16_t, H1_2)
 DO_LD2(sve_ld2hh_r, cpu_lduw_data_ra, uint16_t, uint16_t, H1_2)
 DO_LD3(sve_ld3hh_r, cpu_lduw_data_ra, uint16_t, uint16_t, H1_2)
 DO_LD4(sve_ld4hh_r, cpu_lduw_data_ra, uint16_t, uint16_t, H1_2)
 
-DO_LD1(sve_ld1ss_r, cpu_ldl_data_ra, uint32_t, uint32_t, H1_4)
 DO_LD2(sve_ld2ss_r, cpu_ldl_data_ra, uint32_t, uint32_t, H1_4)
 DO_LD3(sve_ld3ss_r, cpu_ldl_data_ra, uint32_t, uint32_t, H1_4)
 DO_LD4(sve_ld4ss_r, cpu_ldl_data_ra, uint32_t, uint32_t, H1_4)
 
-DO_LD1(sve_ld1dd_r, cpu_ldq_data_ra, uint64_t, uint64_t, )
 DO_LD2(sve_ld2dd_r, cpu_ldq_data_ra, uint64_t, uint64_t, )
 DO_LD3(sve_ld3dd_r, cpu_ldq_data_ra, uint64_t, uint64_t, )
 DO_LD4(sve_ld4dd_r, cpu_ldq_data_ra, uint64_t, uint64_t, )
 
-#undef DO_LD1
 #undef DO_LD2
 #undef DO_LD3
 #undef DO_LD4
 
 /*
  * Load contiguous data, first-fault and no-fault.
+ *
+ * For user-only, one could argue that we should hold the mmap_lock during
+ * the operation so that there is no race between page_check_range and the
+ * load operation.  However, unmapping pages out from under the operating
+ * thread is extraordinarily unlikely.  This theoretical race condition
+ * also affects linux-user/ in its get_user/put_user macros.
+ *
+ * TODO: Construct some helpers, written in assembly, that interact with
+ * handle_cpu_signal to produce memory ops which can properly report errors
+ * without racing.
  */
 
-#ifdef CONFIG_USER_ONLY
-
 /* Fault on byte I.  All bits in FFR from I are cleared.  The vector
  * result from I is CONSTRAINED UNPREDICTABLE; we choose the MERGE
  * option, which leaves subsequent data unchanged.
@@ -4092,147 +4525,225 @@ static void record_fault(CPUARMState *env, uintptr_t i, uintptr_t oprsz)
     uint64_t *ffr = env->vfp.pregs[FFR_PRED_NUM].p;
 
     if (i & 63) {
-        ffr[i / 64] &= MAKE_64BIT_MASK(0, i & 63);
+        ffr[i >> 6] &= MAKE_64BIT_MASK(0, i & 63);
         i = ROUND_UP(i, 64);
     }
     for (; i < oprsz; i += 64) {
-        ffr[i / 64] = 0;
+        ffr[i >> 6] = 0;
     }
 }
 
-/* Hold the mmap lock during the operation so that there is no race
- * between page_check_range and the load operation.  We expect the
- * usual case to have no faults at all, so we check the whole range
- * first and if successful defer to the normal load operation.
- *
- * TODO: Change mmap_lock to a rwlock so that multiple readers
- * can run simultaneously.  This will probably help other uses
- * within QEMU as well.
+/*
+ * Common helper for all contiguous first-fault loads.
  */
-#define DO_LDFF1(PART, FN, TYPEE, TYPEM, H)                             \
-static void do_sve_ldff1##PART(CPUARMState *env, void *vd, void *vg,    \
-                               target_ulong addr, intptr_t oprsz,       \
-                               bool first, uintptr_t ra)                \
-{                                                                       \
-    intptr_t i = 0;                                                     \
-    do {                                                                \
-        uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));                 \
-        do {                                                            \
-            TYPEM m = 0;                                                \
-            if (pg & 1) {                                               \
-                if (!first &&                                           \
-                    unlikely(page_check_range(addr, sizeof(TYPEM),      \
-                                              PAGE_READ))) {            \
-                    record_fault(env, i, oprsz);                        \
-                    return;                                             \
-                }                                                       \
-                m = FN(env, addr, ra);                                  \
-                first = false;                                          \
-            }                                                           \
-            *(TYPEE *)(vd + H(i)) = m;                                  \
-            i += sizeof(TYPEE), pg >>= sizeof(TYPEE);                   \
-            addr += sizeof(TYPEM);                                      \
-        } while (i & 15);                                               \
-    } while (i < oprsz);                                                \
-}                                                                       \
-void HELPER(sve_ldff1##PART)(CPUARMState *env, void *vg,                \
-                             target_ulong addr, uint32_t desc)          \
-{                                                                       \
-    intptr_t oprsz = simd_oprsz(desc);                                  \
-    unsigned rd = simd_data(desc);                                      \
-    void *vd = &env->vfp.zregs[rd];                                     \
-    mmap_lock();                                                        \
-    if (likely(page_check_range(addr, oprsz, PAGE_READ) == 0)) {        \
-        do_sve_ld1##PART(env, vd, vg, addr, oprsz, GETPC());            \
-    } else {                                                            \
-        do_sve_ldff1##PART(env, vd, vg, addr, oprsz, true, GETPC());    \
-    }                                                                   \
-    mmap_unlock();                                                      \
-}
+static void sve_ldff1_r(CPUARMState *env, void *vg, const target_ulong addr,
+                        uint32_t desc, const uintptr_t retaddr,
+                        const int esz, const int msz,
+                        sve_ld1_host_fn *host_fn,
+                        sve_ld1_tlb_fn *tlb_fn)
+{
+    void *vd = &env->vfp.zregs[simd_data(desc)];
+    const int diffsz = esz - msz;
+    const intptr_t reg_max = simd_oprsz(desc);
+    const intptr_t mem_max = reg_max >> diffsz;
+    const int mmu_idx = cpu_mmu_index(env, false);
+    intptr_t split, reg_off, mem_off;
+    void *host;
 
-/* No-fault loads are like first-fault loads without the
- * first faulting special case.
- */
-#define DO_LDNF1(PART)                                                  \
-void HELPER(sve_ldnf1##PART)(CPUARMState *env, void *vg,                \
-                             target_ulong addr, uint32_t desc)          \
-{                                                                       \
-    intptr_t oprsz = simd_oprsz(desc);                                  \
-    unsigned rd = simd_data(desc);                                      \
-    void *vd = &env->vfp.zregs[rd];                                     \
-    mmap_lock();                                                        \
-    if (likely(page_check_range(addr, oprsz, PAGE_READ) == 0)) {        \
-        do_sve_ld1##PART(env, vd, vg, addr, oprsz, GETPC());            \
-    } else {                                                            \
-        do_sve_ldff1##PART(env, vd, vg, addr, oprsz, false, GETPC());   \
-    }                                                                   \
-    mmap_unlock();                                                      \
-}
+    set_helper_retaddr(retaddr);
 
+    split = max_for_page(addr, 0, mem_max);
+    if (likely(split == mem_max)) {
+        /* The entire operation is within one page.  */
+        host = tlb_vaddr_to_host(env, addr, MMU_DATA_LOAD, mmu_idx);
+        if (test_host_page(host)) {
+            mem_off = host_fn(vd, vg, host, 0, mem_max);
+            tcg_debug_assert(mem_off == mem_max);
+            set_helper_retaddr(0);
+            return;
+        }
+    }
+
+    /* Skip to the first true predicate.  */
+    reg_off = find_next_active(vg, 0, reg_max, esz);
+    if (unlikely(reg_off == reg_max)) {
+        /* The entire predicate was false; no load occurs.  */
+        set_helper_retaddr(0);
+        memset(vd, 0, reg_max);
+        return;
+    }
+    mem_off = reg_off >> diffsz;
+
+#ifdef CONFIG_USER_ONLY
+    /* The page(s) containing this first element at ADDR+MEM_OFF must
+     * be valid.  Considering that this first element may be misaligned
+     * and cross a page boundary itself, take the rest of the page from
+     * the last byte of the element.
+     */
+    split = max_for_page(addr, mem_off + (1 << msz) - 1, mem_max);
+    mem_off = host_fn(vd, vg, g2h(addr), mem_off, split);
+
+    /* After any fault, zero any leading predicated false elts.  */
+    swap_memzero(vd, reg_off);
+    reg_off = mem_off << diffsz;
 #else
+    /* Perform one normal read, which may or may not fault, but which
+     * is likely to bring the page into the TLB.
+     */
+    tlb_fn(env, vd, reg_off, addr + mem_off, mmu_idx, retaddr);
 
-/* TODO: System mode is not yet supported.
- * This would probably use tlb_vaddr_to_host.
- */
-#define DO_LDFF1(PART, FN, TYPEE, TYPEM, H)                     \
-void HELPER(sve_ldff1##PART)(CPUARMState *env, void *vg,        \
-                  target_ulong addr, uint32_t desc)             \
-{                                                               \
-    g_assert_not_reached();                                     \
-}
-
-#define DO_LDNF1(PART)                                          \
-void HELPER(sve_ldnf1##PART)(CPUARMState *env, void *vg,        \
-                  target_ulong addr, uint32_t desc)             \
-{                                                               \
-    g_assert_not_reached();                                     \
-}
+    /* After any fault, zero any leading predicated false elts.  */
+    swap_memzero(vd, reg_off);
+    mem_off += 1 << msz;
+    reg_off += 1 << esz;
 
+    /* Try again to read the balance of the page.  */
+    split = max_for_page(addr, mem_off - 1, mem_max);
+    if (split >= (1 << msz)) {
+        host = tlb_vaddr_to_host(env, addr + mem_off, MMU_DATA_LOAD, mmu_idx);
+        if (host) {
+            mem_off = host_fn(vd, vg, host - mem_off, mem_off, split);
+            reg_off = mem_off << diffsz;
+        }
+    }
 #endif
 
-DO_LDFF1(bb_r,  cpu_ldub_data_ra, uint8_t, uint8_t, H1)
-DO_LDFF1(bhu_r, cpu_ldub_data_ra, uint16_t, uint8_t, H1_2)
-DO_LDFF1(bhs_r, cpu_ldsb_data_ra, uint16_t, int8_t, H1_2)
-DO_LDFF1(bsu_r, cpu_ldub_data_ra, uint32_t, uint8_t, H1_4)
-DO_LDFF1(bss_r, cpu_ldsb_data_ra, uint32_t, int8_t, H1_4)
-DO_LDFF1(bdu_r, cpu_ldub_data_ra, uint64_t, uint8_t, )
-DO_LDFF1(bds_r, cpu_ldsb_data_ra, uint64_t, int8_t, )
+    set_helper_retaddr(0);
+    record_fault(env, reg_off, reg_max);
+}
 
-DO_LDFF1(hh_r,  cpu_lduw_data_ra, uint16_t, uint16_t, H1_2)
-DO_LDFF1(hsu_r, cpu_lduw_data_ra, uint32_t, uint16_t, H1_4)
-DO_LDFF1(hss_r, cpu_ldsw_data_ra, uint32_t, int8_t, H1_4)
-DO_LDFF1(hdu_r, cpu_lduw_data_ra, uint64_t, uint16_t, )
-DO_LDFF1(hds_r, cpu_ldsw_data_ra, uint64_t, int16_t, )
+/*
+ * Common helper for all contiguous no-fault loads.
+ */
+static void sve_ldnf1_r(CPUARMState *env, void *vg, const target_ulong addr,
+                        uint32_t desc, const int esz, const int msz,
+                        sve_ld1_host_fn *host_fn)
+{
+    void *vd = &env->vfp.zregs[simd_data(desc)];
+    const int diffsz = esz - msz;
+    const intptr_t reg_max = simd_oprsz(desc);
+    const intptr_t mem_max = reg_max >> diffsz;
+    intptr_t split, reg_off, mem_off;
+    void *host;
 
-DO_LDFF1(ss_r,  cpu_ldl_data_ra, uint32_t, uint32_t, H1_4)
-DO_LDFF1(sdu_r, cpu_ldl_data_ra, uint64_t, uint32_t, )
-DO_LDFF1(sds_r, cpu_ldl_data_ra, uint64_t, int32_t, )
+#ifdef CONFIG_USER_ONLY
+    /* Do not set helper_retaddr as there should be no fault.  */
+    host = g2h(addr);
+    if (likely(page_check_range(addr, mem_max, PAGE_READ) == 0)) {
+        /* The entire operation is valid.  */
+        host_fn(vd, vg, host, 0, mem_max);
+        return;
+    }
+#else
+    const int mmu_idx = extract32(desc, SIMD_DATA_SHIFT, 4);
+    /* Unless we can load the entire vector from the same page,
+     * we need to search for the first active element.
+     */
+    split = max_for_page(addr, 0, mem_max);
+    if (likely(split == mem_max)) {
+        host = tlb_vaddr_to_host(env, addr, MMU_DATA_LOAD, mmu_idx);
+        if (host) {
+            host_fn(vd, vg, host, 0, mem_max);
+            return;
+        }
+    }
+#endif
 
-DO_LDFF1(dd_r,  cpu_ldq_data_ra, uint64_t, uint64_t, )
+    /* There will be no fault, so we may modify in advance.  */
+    memset(vd, 0, reg_max);
 
-#undef DO_LDFF1
+    /* Skip to the first true predicate.  */
+    reg_off = find_next_active(vg, 0, reg_max, esz);
+    if (unlikely(reg_off == reg_max)) {
+        /* The entire predicate was false; no load occurs.  */
+        return;
+    }
+    mem_off = reg_off >> diffsz;
 
-DO_LDNF1(bb_r)
-DO_LDNF1(bhu_r)
-DO_LDNF1(bhs_r)
-DO_LDNF1(bsu_r)
-DO_LDNF1(bss_r)
-DO_LDNF1(bdu_r)
-DO_LDNF1(bds_r)
+#ifdef CONFIG_USER_ONLY
+    if (page_check_range(addr + mem_off, 1 << msz, PAGE_READ) == 0) {
+        /* At least one load is valid; take the rest of the page.  */
+        split = max_for_page(addr, mem_off + (1 << msz) - 1, mem_max);
+        mem_off = host_fn(vd, vg, host, mem_off, split);
+        reg_off = mem_off << diffsz;
+    }
+#else
+    /* If the address is not in the TLB, we have no way to bring the
+     * entry into the TLB without also risking a fault.  Note that
+     * the corollary is that we never load from an address not in RAM.
+     * ??? This last may be out of spec.
+     */
+    host = tlb_vaddr_to_host(env, addr + mem_off, MMU_DATA_LOAD, mmu_idx);
+    split = max_for_page(addr, mem_off, mem_max);
+    if (host && split >= (1 << msz)) {
+        mem_off = host_fn(vd, vg, host - mem_off, mem_off, split);
+        reg_off = mem_off << diffsz;
+    }
+#endif
 
-DO_LDNF1(hh_r)
-DO_LDNF1(hsu_r)
-DO_LDNF1(hss_r)
-DO_LDNF1(hdu_r)
-DO_LDNF1(hds_r)
+    record_fault(env, reg_off, reg_max);
+}
 
-DO_LDNF1(ss_r)
-DO_LDNF1(sdu_r)
-DO_LDNF1(sds_r)
+#define DO_LDFF1_LDNF1_1(PART, ESZ) \
+void HELPER(sve_ldff1##PART##_r)(CPUARMState *env, void *vg,            \
+                                 target_ulong addr, uint32_t desc)      \
+{                                                                       \
+    sve_ldff1_r(env, vg, addr, desc, GETPC(), ESZ, 0,                   \
+                sve_ld1##PART##_host, sve_ld1##PART##_tlb);             \
+}                                                                       \
+void HELPER(sve_ldnf1##PART##_r)(CPUARMState *env, void *vg,            \
+                                 target_ulong addr, uint32_t desc)      \
+{                                                                       \
+    sve_ldnf1_r(env, vg, addr, desc, ESZ, 0, sve_ld1##PART##_host);     \
+}
 
-DO_LDNF1(dd_r)
+/* TODO: Propagate the endian check back to the translator.  */
+#define DO_LDFF1_LDNF1_2(PART, ESZ, MSZ) \
+void HELPER(sve_ldff1##PART##_r)(CPUARMState *env, void *vg,            \
+                                 target_ulong addr, uint32_t desc)      \
+{                                                                       \
+    if (arm_cpu_data_is_big_endian(env)) {                              \
+        sve_ldff1_r(env, vg, addr, desc, GETPC(), ESZ, MSZ,             \
+                    sve_ld1##PART##_be_host, sve_ld1##PART##_be_tlb);   \
+    } else {                                                            \
+        sve_ldff1_r(env, vg, addr, desc, GETPC(), ESZ, MSZ,             \
+                    sve_ld1##PART##_le_host, sve_ld1##PART##_le_tlb);   \
+    }                                                                   \
+}                                                                       \
+void HELPER(sve_ldnf1##PART##_r)(CPUARMState *env, void *vg,            \
+                                 target_ulong addr, uint32_t desc)      \
+{                                                                       \
+    if (arm_cpu_data_is_big_endian(env)) {                              \
+        sve_ldnf1_r(env, vg, addr, desc, ESZ, MSZ,                      \
+                    sve_ld1##PART##_be_host);                           \
+    } else {                                                            \
+        sve_ldnf1_r(env, vg, addr, desc, ESZ, MSZ,                      \
+                    sve_ld1##PART##_le_host);                           \
+    }                                                                   \
+}
 
-#undef DO_LDNF1
+DO_LDFF1_LDNF1_1(bb,  0)
+DO_LDFF1_LDNF1_1(bhu, 1)
+DO_LDFF1_LDNF1_1(bhs, 1)
+DO_LDFF1_LDNF1_1(bsu, 2)
+DO_LDFF1_LDNF1_1(bss, 2)
+DO_LDFF1_LDNF1_1(bdu, 3)
+DO_LDFF1_LDNF1_1(bds, 3)
+
+DO_LDFF1_LDNF1_2(hh,  1, 1)
+DO_LDFF1_LDNF1_2(hsu, 2, 1)
+DO_LDFF1_LDNF1_2(hss, 2, 1)
+DO_LDFF1_LDNF1_2(hdu, 3, 1)
+DO_LDFF1_LDNF1_2(hds, 3, 1)
+
+DO_LDFF1_LDNF1_2(ss,  2, 2)
+DO_LDFF1_LDNF1_2(sdu, 3, 2)
+DO_LDFF1_LDNF1_2(sds, 3, 2)
+
+DO_LDFF1_LDNF1_2(dd,  3, 3)
+
+#undef DO_LDFF1_LDNF1_1
+#undef DO_LDFF1_LDNF1_2
 
 /*
  * Store contiguous data, protected by a governing predicate.
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [Qemu-devel] [PATCH 13/20] target/arm: Rewrite helper_sve_ld[234]*_r
  2018-08-09  4:21 [Qemu-devel] [PATCH 00/20] target/arm: sve system mode patches Richard Henderson
                   ` (11 preceding siblings ...)
  2018-08-09  4:21 ` [Qemu-devel] [PATCH 12/20] target/arm: Rewrite helper_sve_ld1*_r using pages Richard Henderson
@ 2018-08-09  4:21 ` Richard Henderson
  2018-08-23 16:04   ` Peter Maydell
  2018-08-09  4:22 ` [Qemu-devel] [PATCH 14/20] target/arm: Rewrite helper_sve_st[1234]*_r Richard Henderson
                   ` (9 subsequent siblings)
  22 siblings, 1 reply; 51+ messages in thread
From: Richard Henderson @ 2018-08-09  4:21 UTC (permalink / raw)
  To: qemu-devel; +Cc: laurent.desnogues, peter.maydell, alex.bennee

Use the same *_tlb primitives as we use for ld1.  This is not
a significant change, but it does hoist the setting of helper_retaddr
(for linux-user) and the computation of the current mmu_idx (for
softmmu) outside the loop.

This also fixes the endianness problem for softmmu, and moves
the main loop out of a macro and into an inlined function.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/sve_helper.c | 210 ++++++++++++++++++++++------------------
 1 file changed, 117 insertions(+), 93 deletions(-)

diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 4ca9412e20..5cc7de5077 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -4398,109 +4398,133 @@ DO_LD1_2(ld1dd,  3, 3)
 #undef DO_LD1_1
 #undef DO_LD1_2
 
-#define DO_LD2(NAME, FN, TYPEE, TYPEM, H)                  \
-void HELPER(NAME)(CPUARMState *env, void *vg,              \
-                  target_ulong addr, uint32_t desc)        \
-{                                                          \
-    intptr_t i, oprsz = simd_oprsz(desc);                  \
-    intptr_t ra = GETPC();                                 \
-    unsigned rd = simd_data(desc);                         \
-    void *d1 = &env->vfp.zregs[rd];                        \
-    void *d2 = &env->vfp.zregs[(rd + 1) & 31];             \
-    for (i = 0; i < oprsz; ) {                             \
-        uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));    \
-        do {                                               \
-            TYPEM m1 = 0, m2 = 0;                          \
-            if (pg & 1) {                                  \
-                m1 = FN(env, addr, ra);                    \
-                m2 = FN(env, addr + sizeof(TYPEM), ra);    \
-            }                                              \
-            *(TYPEE *)(d1 + H(i)) = m1;                    \
-            *(TYPEE *)(d2 + H(i)) = m2;                    \
-            i += sizeof(TYPEE), pg >>= sizeof(TYPEE);      \
-            addr += 2 * sizeof(TYPEM);                     \
-        } while (i & 15);                                  \
-    }                                                      \
+/*
+ * Common helpers for all contiguous 2,3,4-register predicated loads.
+ */
+static void sve_ld2_r(CPUARMState *env, void *vg, target_ulong addr,
+                      uint32_t desc, int size, uintptr_t ra,
+                      sve_ld1_tlb_fn *tlb_fn)
+{
+    const int mmu_idx = cpu_mmu_index(env, false);
+    intptr_t i, oprsz = simd_oprsz(desc);
+    unsigned rd = simd_data(desc);
+    ARMVectorReg scratch[2] = { };
+
+    set_helper_retaddr(ra);
+    for (i = 0; i < oprsz; ) {
+        uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));
+        do {
+            if (pg & 1) {
+                tlb_fn(env, &scratch[0], i, addr, mmu_idx, ra);
+                tlb_fn(env, &scratch[1], i, addr + size, mmu_idx, ra);
+            }
+            i += size, pg >>= size;
+            addr += 2 * size;
+        } while (i & 15);
+    }
+    set_helper_retaddr(0);
+
+    /* Wait until all exceptions have been raised to write back.  */
+    memcpy(&env->vfp.zregs[rd], &scratch[0], oprsz);
+    memcpy(&env->vfp.zregs[(rd + 1) & 31], &scratch[1], oprsz);
 }
 
-#define DO_LD3(NAME, FN, TYPEE, TYPEM, H)                  \
-void HELPER(NAME)(CPUARMState *env, void *vg,              \
-                  target_ulong addr, uint32_t desc)        \
-{                                                          \
-    intptr_t i, oprsz = simd_oprsz(desc);                  \
-    intptr_t ra = GETPC();                                 \
-    unsigned rd = simd_data(desc);                         \
-    void *d1 = &env->vfp.zregs[rd];                        \
-    void *d2 = &env->vfp.zregs[(rd + 1) & 31];             \
-    void *d3 = &env->vfp.zregs[(rd + 2) & 31];             \
-    for (i = 0; i < oprsz; ) {                             \
-        uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));    \
-        do {                                               \
-            TYPEM m1 = 0, m2 = 0, m3 = 0;                  \
-            if (pg & 1) {                                  \
-                m1 = FN(env, addr, ra);                    \
-                m2 = FN(env, addr + sizeof(TYPEM), ra);    \
-                m3 = FN(env, addr + 2 * sizeof(TYPEM), ra); \
-            }                                              \
-            *(TYPEE *)(d1 + H(i)) = m1;                    \
-            *(TYPEE *)(d2 + H(i)) = m2;                    \
-            *(TYPEE *)(d3 + H(i)) = m3;                    \
-            i += sizeof(TYPEE), pg >>= sizeof(TYPEE);      \
-            addr += 3 * sizeof(TYPEM);                     \
-        } while (i & 15);                                  \
-    }                                                      \
+static void sve_ld3_r(CPUARMState *env, void *vg, target_ulong addr,
+                      uint32_t desc, int size, uintptr_t ra,
+                      sve_ld1_tlb_fn *tlb_fn)
+{
+    const int mmu_idx = cpu_mmu_index(env, false);
+    intptr_t i, oprsz = simd_oprsz(desc);
+    unsigned rd = simd_data(desc);
+    ARMVectorReg scratch[3] = { };
+
+    set_helper_retaddr(ra);
+    for (i = 0; i < oprsz; ) {
+        uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));
+        do {
+            if (pg & 1) {
+                tlb_fn(env, &scratch[0], i, addr, mmu_idx, ra);
+                tlb_fn(env, &scratch[1], i, addr + size, mmu_idx, ra);
+                tlb_fn(env, &scratch[2], i, addr + 2 * size, mmu_idx, ra);
+            }
+            i += size, pg >>= size;
+            addr += 3 * size;
+        } while (i & 15);
+    }
+    set_helper_retaddr(0);
+
+    /* Wait until all exceptions have been raised to write back.  */
+    memcpy(&env->vfp.zregs[rd], &scratch[0], oprsz);
+    memcpy(&env->vfp.zregs[(rd + 1) & 31], &scratch[1], oprsz);
+    memcpy(&env->vfp.zregs[(rd + 2) & 31], &scratch[2], oprsz);
 }
 
-#define DO_LD4(NAME, FN, TYPEE, TYPEM, H)                  \
-void HELPER(NAME)(CPUARMState *env, void *vg,              \
-                  target_ulong addr, uint32_t desc)        \
-{                                                          \
-    intptr_t i, oprsz = simd_oprsz(desc);                  \
-    intptr_t ra = GETPC();                                 \
-    unsigned rd = simd_data(desc);                         \
-    void *d1 = &env->vfp.zregs[rd];                        \
-    void *d2 = &env->vfp.zregs[(rd + 1) & 31];             \
-    void *d3 = &env->vfp.zregs[(rd + 2) & 31];             \
-    void *d4 = &env->vfp.zregs[(rd + 3) & 31];             \
-    for (i = 0; i < oprsz; ) {                             \
-        uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));    \
-        do {                                               \
-            TYPEM m1 = 0, m2 = 0, m3 = 0, m4 = 0;          \
-            if (pg & 1) {                                  \
-                m1 = FN(env, addr, ra);                    \
-                m2 = FN(env, addr + sizeof(TYPEM), ra);    \
-                m3 = FN(env, addr + 2 * sizeof(TYPEM), ra); \
-                m4 = FN(env, addr + 3 * sizeof(TYPEM), ra); \
-            }                                              \
-            *(TYPEE *)(d1 + H(i)) = m1;                    \
-            *(TYPEE *)(d2 + H(i)) = m2;                    \
-            *(TYPEE *)(d3 + H(i)) = m3;                    \
-            *(TYPEE *)(d4 + H(i)) = m4;                    \
-            i += sizeof(TYPEE), pg >>= sizeof(TYPEE);      \
-            addr += 4 * sizeof(TYPEM);                     \
-        } while (i & 15);                                  \
-    }                                                      \
+static void sve_ld4_r(CPUARMState *env, void *vg, target_ulong addr,
+                      uint32_t desc, int size, uintptr_t ra,
+                      sve_ld1_tlb_fn *tlb_fn)
+{
+    const int mmu_idx = cpu_mmu_index(env, false);
+    intptr_t i, oprsz = simd_oprsz(desc);
+    unsigned rd = simd_data(desc);
+    ARMVectorReg scratch[4] = { };
+
+    set_helper_retaddr(ra);
+    for (i = 0; i < oprsz; ) {
+        uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));
+        do {
+            if (pg & 1) {
+                tlb_fn(env, &scratch[0], i, addr, mmu_idx, ra);
+                tlb_fn(env, &scratch[1], i, addr + size, mmu_idx, ra);
+                tlb_fn(env, &scratch[2], i, addr + 2 * size, mmu_idx, ra);
+                tlb_fn(env, &scratch[3], i, addr + 3 * size, mmu_idx, ra);
+            }
+            i += size, pg >>= size;
+            addr += 4 * size;
+        } while (i & 15);
+    }
+    set_helper_retaddr(0);
+
+    /* Wait until all exceptions have been raised to write back.  */
+    memcpy(&env->vfp.zregs[rd], &scratch[0], oprsz);
+    memcpy(&env->vfp.zregs[(rd + 1) & 31], &scratch[1], oprsz);
+    memcpy(&env->vfp.zregs[(rd + 2) & 31], &scratch[2], oprsz);
+    memcpy(&env->vfp.zregs[(rd + 3) & 31], &scratch[3], oprsz);
 }
 
-DO_LD2(sve_ld2bb_r, cpu_ldub_data_ra, uint8_t, uint8_t, H1)
-DO_LD3(sve_ld3bb_r, cpu_ldub_data_ra, uint8_t, uint8_t, H1)
-DO_LD4(sve_ld4bb_r, cpu_ldub_data_ra, uint8_t, uint8_t, H1)
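+/* The flatten attribute below asks the compiler to inline sve_ld##N##_r
+ * (and everything it calls) into each generated helper, so the constant
+ * size and tlb_fn arguments are specialized at compile time.
+ */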
+#define DO_LDN_1(N) \
+void __attribute__((flatten)) HELPER(sve_ld##N##bb_r)               \
+    (CPUARMState *env, void *vg, target_ulong addr, uint32_t desc)  \
+{                                                                   \
+    sve_ld##N##_r(env, vg, addr, desc, 1, GETPC(), sve_ld1bb_tlb);  \
+}
 
-DO_LD2(sve_ld2hh_r, cpu_lduw_data_ra, uint16_t, uint16_t, H1_2)
-DO_LD3(sve_ld3hh_r, cpu_lduw_data_ra, uint16_t, uint16_t, H1_2)
-DO_LD4(sve_ld4hh_r, cpu_lduw_data_ra, uint16_t, uint16_t, H1_2)
+#define DO_LDN_2(N, SUFF, SIZE)                                       \
+void __attribute__((flatten)) HELPER(sve_ld##N##SUFF##_r)             \
+    (CPUARMState *env, void *vg, target_ulong addr, uint32_t desc)    \
+{                                                                     \
+    sve_ld##N##_r(env, vg, addr, desc, SIZE, GETPC(),                 \
+                  arm_cpu_data_is_big_endian(env)                     \
+                  ? sve_ld1##SUFF##_be_tlb : sve_ld1##SUFF##_le_tlb); \
+}
 
-DO_LD2(sve_ld2ss_r, cpu_ldl_data_ra, uint32_t, uint32_t, H1_4)
-DO_LD3(sve_ld3ss_r, cpu_ldl_data_ra, uint32_t, uint32_t, H1_4)
-DO_LD4(sve_ld4ss_r, cpu_ldl_data_ra, uint32_t, uint32_t, H1_4)
+DO_LDN_1(2)
+DO_LDN_1(3)
+DO_LDN_1(4)
 
-DO_LD2(sve_ld2dd_r, cpu_ldq_data_ra, uint64_t, uint64_t, )
-DO_LD3(sve_ld3dd_r, cpu_ldq_data_ra, uint64_t, uint64_t, )
-DO_LD4(sve_ld4dd_r, cpu_ldq_data_ra, uint64_t, uint64_t, )
+DO_LDN_2(2, hh, 2)
+DO_LDN_2(3, hh, 2)
+DO_LDN_2(4, hh, 2)
 
-#undef DO_LD2
-#undef DO_LD3
-#undef DO_LD4
+DO_LDN_2(2, ss, 4)
+DO_LDN_2(3, ss, 4)
+DO_LDN_2(4, ss, 4)
+
+DO_LDN_2(2, dd, 8)
+DO_LDN_2(3, dd, 8)
+DO_LDN_2(4, dd, 8)
+
+#undef DO_LDN_1
+#undef DO_LDN_2
 
 /*
  * Load contiguous data, first-fault and no-fault.
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [Qemu-devel] [PATCH 14/20] target/arm: Rewrite helper_sve_st[1234]*_r
  2018-08-09  4:21 [Qemu-devel] [PATCH 00/20] target/arm: sve system mode patches Richard Henderson
                   ` (12 preceding siblings ...)
  2018-08-09  4:21 ` [Qemu-devel] [PATCH 13/20] target/arm: Rewrite helper_sve_ld[234]*_r Richard Henderson
@ 2018-08-09  4:22 ` Richard Henderson
  2018-08-23 16:06   ` Peter Maydell
  2018-08-09  4:22 ` [Qemu-devel] [PATCH 15/20] target/arm: Split contiguous loads for endianness Richard Henderson
                   ` (8 subsequent siblings)
  22 siblings, 1 reply; 51+ messages in thread
From: Richard Henderson @ 2018-08-09  4:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: laurent.desnogues, peter.maydell, alex.bennee

This fixes the endianness problem for softmmu, and moves
the main loop out of a macro and into an inlined function.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/sve_helper.c | 351 ++++++++++++++++++++--------------------
 1 file changed, 172 insertions(+), 179 deletions(-)

diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 5cc7de5077..4eae6569cc 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -3987,6 +3987,7 @@ typedef intptr_t sve_ld1_host_fn(void *vd, void *vg, void *host,
  */
 typedef void sve_ld1_tlb_fn(CPUARMState *env, void *vd, intptr_t reg_off,
                             target_ulong vaddr, int mmu_idx, uintptr_t ra);
+typedef sve_ld1_tlb_fn sve_st1_tlb_fn;
 
 /*
  * Generate the above primitives.
@@ -4772,214 +4773,206 @@ DO_LDFF1_LDNF1_2(dd,  3, 3)
 /*
  * Store contiguous data, protected by a governing predicate.
  */
-#define DO_ST1(NAME, FN, TYPEE, TYPEM, H)                  \
-void HELPER(NAME)(CPUARMState *env, void *vg,              \
-                  target_ulong addr, uint32_t desc)        \
-{                                                          \
-    intptr_t i, oprsz = simd_oprsz(desc);                  \
-    intptr_t ra = GETPC();                                 \
-    unsigned rd = simd_data(desc);                         \
-    void *vd = &env->vfp.zregs[rd];                        \
-    for (i = 0; i < oprsz; ) {                             \
-        uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));    \
-        do {                                               \
-            if (pg & 1) {                                  \
-                TYPEM m = *(TYPEE *)(vd + H(i));           \
-                FN(env, addr, m, ra);                      \
-            }                                              \
-            i += sizeof(TYPEE), pg >>= sizeof(TYPEE);      \
-            addr += sizeof(TYPEM);                         \
-        } while (i & 15);                                  \
-    }                                                      \
+
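+/* Store one element to memory.  For softmmu, go through the slow-path
+ * store helpers, with a TCGMemOpIdx packing the access size, endianness
+ * and mmu index; for user-only, store directly via g2h().
+ */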
+#ifdef CONFIG_SOFTMMU
+#define DO_ST_TLB(NAME, H, TYPEM, HOST, MOEND, TLB) \
+static void sve_##NAME##_tlb(CPUARMState *env, void *vd, intptr_t reg_off,  \
+                             target_ulong addr, int mmu_idx, uintptr_t ra)  \
+{                                                                           \
+    TCGMemOpIdx oi = make_memop_idx(ctz32(sizeof(TYPEM)) | MOEND, mmu_idx); \
+    TLB(env, addr, *(TYPEM *)(vd + H(reg_off)), oi, ra);                    \
 }
-
-#define DO_ST1_D(NAME, FN, TYPEM)                          \
-void HELPER(NAME)(CPUARMState *env, void *vg,              \
-                  target_ulong addr, uint32_t desc)        \
-{                                                          \
-    intptr_t i, oprsz = simd_oprsz(desc) / 8;              \
-    intptr_t ra = GETPC();                                 \
-    unsigned rd = simd_data(desc);                         \
-    uint64_t *d = &env->vfp.zregs[rd].d[0];                \
-    uint8_t *pg = vg;                                      \
-    for (i = 0; i < oprsz; i += 1) {                       \
-        if (pg[H1(i)] & 1) {                               \
-            FN(env, addr, d[i], ra);                       \
-        }                                                  \
-        addr += sizeof(TYPEM);                             \
-    }                                                      \
+#else
+#define DO_ST_TLB(NAME, H, TYPEM, HOST, MOEND, TLB) \
+static void sve_##NAME##_tlb(CPUARMState *env, void *vd, intptr_t reg_off,  \
+                             target_ulong addr, int mmu_idx, uintptr_t ra)  \
+{                                                                           \
+    HOST(g2h(addr), *(TYPEM *)(vd + H(reg_off)));                           \
 }
+#endif
 
-#define DO_ST2(NAME, FN, TYPEE, TYPEM, H)                  \
-void HELPER(NAME)(CPUARMState *env, void *vg,              \
-                  target_ulong addr, uint32_t desc)        \
-{                                                          \
-    intptr_t i, oprsz = simd_oprsz(desc);                  \
-    intptr_t ra = GETPC();                                 \
-    unsigned rd = simd_data(desc);                         \
-    void *d1 = &env->vfp.zregs[rd];                        \
-    void *d2 = &env->vfp.zregs[(rd + 1) & 31];             \
-    for (i = 0; i < oprsz; ) {                             \
-        uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));    \
-        do {                                               \
-            if (pg & 1) {                                  \
-                TYPEM m1 = *(TYPEE *)(d1 + H(i));          \
-                TYPEM m2 = *(TYPEE *)(d2 + H(i));          \
-                FN(env, addr, m1, ra);                     \
-                FN(env, addr + sizeof(TYPEM), m2, ra);     \
-            }                                              \
-            i += sizeof(TYPEE), pg >>= sizeof(TYPEE);      \
-            addr += 2 * sizeof(TYPEM);                     \
-        } while (i & 15);                                  \
-    }                                                      \
-}
+DO_ST_TLB(st1bb,   H1,  uint8_t, stb_p, 0, helper_ret_stb_mmu)
+DO_ST_TLB(st1bh, H1_2, uint16_t, stb_p, 0, helper_ret_stb_mmu)
+DO_ST_TLB(st1bs, H1_4, uint32_t, stb_p, 0, helper_ret_stb_mmu)
+DO_ST_TLB(st1bd,     , uint64_t, stb_p, 0, helper_ret_stb_mmu)
 
-#define DO_ST3(NAME, FN, TYPEE, TYPEM, H)                  \
-void HELPER(NAME)(CPUARMState *env, void *vg,              \
-                  target_ulong addr, uint32_t desc)        \
-{                                                          \
-    intptr_t i, oprsz = simd_oprsz(desc);                  \
-    intptr_t ra = GETPC();                                 \
-    unsigned rd = simd_data(desc);                         \
-    void *d1 = &env->vfp.zregs[rd];                        \
-    void *d2 = &env->vfp.zregs[(rd + 1) & 31];             \
-    void *d3 = &env->vfp.zregs[(rd + 2) & 31];             \
-    for (i = 0; i < oprsz; ) {                             \
-        uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));    \
-        do {                                               \
-            if (pg & 1) {                                  \
-                TYPEM m1 = *(TYPEE *)(d1 + H(i));          \
-                TYPEM m2 = *(TYPEE *)(d2 + H(i));          \
-                TYPEM m3 = *(TYPEE *)(d3 + H(i));          \
-                FN(env, addr, m1, ra);                     \
-                FN(env, addr + sizeof(TYPEM), m2, ra);     \
-                FN(env, addr + 2 * sizeof(TYPEM), m3, ra); \
-            }                                              \
-            i += sizeof(TYPEE), pg >>= sizeof(TYPEE);      \
-            addr += 3 * sizeof(TYPEM);                     \
-        } while (i & 15);                                  \
-    }                                                      \
-}
+DO_ST_TLB(st1hh_le, H1_2, uint16_t, stw_le_p, MO_LE, helper_le_stw_mmu)
+DO_ST_TLB(st1hs_le, H1_4, uint32_t, stw_le_p, MO_LE, helper_le_stw_mmu)
+DO_ST_TLB(st1hd_le,     , uint64_t, stw_le_p, MO_LE, helper_le_stw_mmu)
 
-#define DO_ST4(NAME, FN, TYPEE, TYPEM, H)                  \
-void HELPER(NAME)(CPUARMState *env, void *vg,              \
-                  target_ulong addr, uint32_t desc)        \
-{                                                          \
-    intptr_t i, oprsz = simd_oprsz(desc);                  \
-    intptr_t ra = GETPC();                                 \
-    unsigned rd = simd_data(desc);                         \
-    void *d1 = &env->vfp.zregs[rd];                        \
-    void *d2 = &env->vfp.zregs[(rd + 1) & 31];             \
-    void *d3 = &env->vfp.zregs[(rd + 2) & 31];             \
-    void *d4 = &env->vfp.zregs[(rd + 3) & 31];             \
-    for (i = 0; i < oprsz; ) {                             \
-        uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));    \
-        do {                                               \
-            if (pg & 1) {                                  \
-                TYPEM m1 = *(TYPEE *)(d1 + H(i));          \
-                TYPEM m2 = *(TYPEE *)(d2 + H(i));          \
-                TYPEM m3 = *(TYPEE *)(d3 + H(i));          \
-                TYPEM m4 = *(TYPEE *)(d4 + H(i));          \
-                FN(env, addr, m1, ra);                     \
-                FN(env, addr + sizeof(TYPEM), m2, ra);     \
-                FN(env, addr + 2 * sizeof(TYPEM), m3, ra); \
-                FN(env, addr + 3 * sizeof(TYPEM), m4, ra); \
-            }                                              \
-            i += sizeof(TYPEE), pg >>= sizeof(TYPEE);      \
-            addr += 4 * sizeof(TYPEM);                     \
-        } while (i & 15);                                  \
-    }                                                      \
-}
+DO_ST_TLB(st1ss_le, H1_4, uint32_t, stl_le_p, MO_LE, helper_le_stl_mmu)
+DO_ST_TLB(st1sd_le,     , uint64_t, stl_le_p, MO_LE, helper_le_stl_mmu)
 
-DO_ST1(sve_st1bh_r, cpu_stb_data_ra, uint16_t, uint8_t, H1_2)
-DO_ST1(sve_st1bs_r, cpu_stb_data_ra, uint32_t, uint8_t, H1_4)
-DO_ST1_D(sve_st1bd_r, cpu_stb_data_ra, uint8_t)
+DO_ST_TLB(st1dd_le,     , uint64_t, stq_le_p, MO_LE, helper_le_stq_mmu)
 
-DO_ST1(sve_st1hs_r, cpu_stw_data_ra, uint32_t, uint16_t, H1_4)
-DO_ST1_D(sve_st1hd_r, cpu_stw_data_ra, uint16_t)
+DO_ST_TLB(st1hh_be, H1_2, uint16_t, stw_be_p, MO_BE, helper_be_stw_mmu)
+DO_ST_TLB(st1hs_be, H1_4, uint32_t, stw_be_p, MO_BE, helper_be_stw_mmu)
+DO_ST_TLB(st1hd_be,     , uint64_t, stw_be_p, MO_BE, helper_be_stw_mmu)
 
-DO_ST1_D(sve_st1sd_r, cpu_stl_data_ra, uint32_t)
+DO_ST_TLB(st1ss_be, H1_4, uint32_t, stl_be_p, MO_BE, helper_be_stl_mmu)
+DO_ST_TLB(st1sd_be,     , uint64_t, stl_be_p, MO_BE, helper_be_stl_mmu)
 
-DO_ST1(sve_st1bb_r, cpu_stb_data_ra, uint8_t, uint8_t, H1)
-DO_ST2(sve_st2bb_r, cpu_stb_data_ra, uint8_t, uint8_t, H1)
-DO_ST3(sve_st3bb_r, cpu_stb_data_ra, uint8_t, uint8_t, H1)
-DO_ST4(sve_st4bb_r, cpu_stb_data_ra, uint8_t, uint8_t, H1)
+DO_ST_TLB(st1dd_be,     , uint64_t, stq_be_p, MO_BE, helper_be_stq_mmu)
 
-DO_ST1(sve_st1hh_r, cpu_stw_data_ra, uint16_t, uint16_t, H1_2)
-DO_ST2(sve_st2hh_r, cpu_stw_data_ra, uint16_t, uint16_t, H1_2)
-DO_ST3(sve_st3hh_r, cpu_stw_data_ra, uint16_t, uint16_t, H1_2)
-DO_ST4(sve_st4hh_r, cpu_stw_data_ra, uint16_t, uint16_t, H1_2)
+#undef DO_ST_TLB
 
-DO_ST1(sve_st1ss_r, cpu_stl_data_ra, uint32_t, uint32_t, H1_4)
-DO_ST2(sve_st2ss_r, cpu_stl_data_ra, uint32_t, uint32_t, H1_4)
-DO_ST3(sve_st3ss_r, cpu_stl_data_ra, uint32_t, uint32_t, H1_4)
-DO_ST4(sve_st4ss_r, cpu_stl_data_ra, uint32_t, uint32_t, H1_4)
-
-DO_ST1_D(sve_st1dd_r, cpu_stq_data_ra, uint64_t)
-
-void HELPER(sve_st2dd_r)(CPUARMState *env, void *vg,
-                         target_ulong addr, uint32_t desc)
+/*
+ * Common helpers for all contiguous 1,2,3,4-register predicated stores.
+ */
+static void sve_st1_r(CPUARMState *env, void *vg, target_ulong addr,
+                      uint32_t desc, const uintptr_t ra,
+                      const int esize, const int msize,
+                      sve_st1_tlb_fn *tlb_fn)
 {
-    intptr_t i, oprsz = simd_oprsz(desc) / 8;
-    intptr_t ra = GETPC();
+    const int mmu_idx = cpu_mmu_index(env, false);
+    intptr_t i, oprsz = simd_oprsz(desc);
     unsigned rd = simd_data(desc);
-    uint64_t *d1 = &env->vfp.zregs[rd].d[0];
-    uint64_t *d2 = &env->vfp.zregs[(rd + 1) & 31].d[0];
-    uint8_t *pg = vg;
+    void *vd = &env->vfp.zregs[rd];
 
-    for (i = 0; i < oprsz; i += 1) {
-        if (pg[H1(i)] & 1) {
-            cpu_stq_data_ra(env, addr, d1[i], ra);
-            cpu_stq_data_ra(env, addr + 8, d2[i], ra);
-        }
-        addr += 2 * 8;
+    set_helper_retaddr(ra);
+    for (i = 0; i < oprsz; ) {
+        uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));
+        do {
+            if (pg & 1) {
+                tlb_fn(env, vd, i, addr, mmu_idx, ra);
+            }
+            i += esize, pg >>= esize;
+            addr += msize;
+        } while (i & 15);
     }
+    set_helper_retaddr(0);
 }
 
-void HELPER(sve_st3dd_r)(CPUARMState *env, void *vg,
-                         target_ulong addr, uint32_t desc)
+static void sve_st2_r(CPUARMState *env, void *vg, target_ulong addr,
+                      uint32_t desc, const uintptr_t ra,
+                      const int esize, const int msize,
+                      sve_st1_tlb_fn *tlb_fn)
 {
-    intptr_t i, oprsz = simd_oprsz(desc) / 8;
-    intptr_t ra = GETPC();
+    const int mmu_idx = cpu_mmu_index(env, false);
+    intptr_t i, oprsz = simd_oprsz(desc);
     unsigned rd = simd_data(desc);
-    uint64_t *d1 = &env->vfp.zregs[rd].d[0];
-    uint64_t *d2 = &env->vfp.zregs[(rd + 1) & 31].d[0];
-    uint64_t *d3 = &env->vfp.zregs[(rd + 2) & 31].d[0];
-    uint8_t *pg = vg;
+    void *d1 = &env->vfp.zregs[rd];
+    void *d2 = &env->vfp.zregs[(rd + 1) & 31];
 
-    for (i = 0; i < oprsz; i += 1) {
-        if (pg[H1(i)] & 1) {
-            cpu_stq_data_ra(env, addr, d1[i], ra);
-            cpu_stq_data_ra(env, addr + 8, d2[i], ra);
-            cpu_stq_data_ra(env, addr + 16, d3[i], ra);
-        }
-        addr += 3 * 8;
+    set_helper_retaddr(ra);
+    for (i = 0; i < oprsz; ) {
+        uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));
+        do {
+            if (pg & 1) {
+                tlb_fn(env, d1, i, addr, mmu_idx, ra);
+                tlb_fn(env, d2, i, addr + msize, mmu_idx, ra);
+            }
+            i += esize, pg >>= esize;
+            addr += 2 * msize;
+        } while (i & 15);
     }
+    set_helper_retaddr(0);
 }
 
-void HELPER(sve_st4dd_r)(CPUARMState *env, void *vg,
-                         target_ulong addr, uint32_t desc)
+static void sve_st3_r(CPUARMState *env, void *vg, target_ulong addr,
+                      uint32_t desc, const uintptr_t ra,
+                      const int esize, const int msize,
+                      sve_st1_tlb_fn *tlb_fn)
 {
-    intptr_t i, oprsz = simd_oprsz(desc) / 8;
-    intptr_t ra = GETPC();
+    const int mmu_idx = cpu_mmu_index(env, false);
+    intptr_t i, oprsz = simd_oprsz(desc);
     unsigned rd = simd_data(desc);
-    uint64_t *d1 = &env->vfp.zregs[rd].d[0];
-    uint64_t *d2 = &env->vfp.zregs[(rd + 1) & 31].d[0];
-    uint64_t *d3 = &env->vfp.zregs[(rd + 2) & 31].d[0];
-    uint64_t *d4 = &env->vfp.zregs[(rd + 3) & 31].d[0];
-    uint8_t *pg = vg;
+    void *d1 = &env->vfp.zregs[rd];
+    void *d2 = &env->vfp.zregs[(rd + 1) & 31];
+    void *d3 = &env->vfp.zregs[(rd + 2) & 31];
 
-    for (i = 0; i < oprsz; i += 1) {
-        if (pg[H1(i)] & 1) {
-            cpu_stq_data_ra(env, addr, d1[i], ra);
-            cpu_stq_data_ra(env, addr + 8, d2[i], ra);
-            cpu_stq_data_ra(env, addr + 16, d3[i], ra);
-            cpu_stq_data_ra(env, addr + 24, d4[i], ra);
-        }
-        addr += 4 * 8;
+    set_helper_retaddr(ra);
+    for (i = 0; i < oprsz; ) {
+        uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));
+        do {
+            if (pg & 1) {
+                tlb_fn(env, d1, i, addr, mmu_idx, ra);
+                tlb_fn(env, d2, i, addr + msize, mmu_idx, ra);
+                tlb_fn(env, d3, i, addr + 2 * msize, mmu_idx, ra);
+            }
+            i += esize, pg >>= esize;
+            addr += 3 * msize;
+        } while (i & 15);
     }
+    set_helper_retaddr(0);
 }
 
+static void sve_st4_r(CPUARMState *env, void *vg, target_ulong addr,
+                      uint32_t desc, const uintptr_t ra,
+                      const int esize, const int msize,
+                      sve_st1_tlb_fn *tlb_fn)
+{
+    const int mmu_idx = cpu_mmu_index(env, false);
+    intptr_t i, oprsz = simd_oprsz(desc);
+    unsigned rd = simd_data(desc);
+    void *d1 = &env->vfp.zregs[rd];
+    void *d2 = &env->vfp.zregs[(rd + 1) & 31];
+    void *d3 = &env->vfp.zregs[(rd + 2) & 31];
+    void *d4 = &env->vfp.zregs[(rd + 3) & 31];
+
+    set_helper_retaddr(ra);
+    for (i = 0; i < oprsz; ) {
+        uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));
+        do {
+            if (pg & 1) {
+                tlb_fn(env, d1, i, addr, mmu_idx, ra);
+                tlb_fn(env, d2, i, addr + msize, mmu_idx, ra);
+                tlb_fn(env, d3, i, addr + 2 * msize, mmu_idx, ra);
+                tlb_fn(env, d4, i, addr + 3 * msize, mmu_idx, ra);
+            }
+            i += esize, pg >>= esize;
+            addr += 4 * msize;
+        } while (i & 15);
+    }
+    set_helper_retaddr(0);
+}
+
+#define DO_STN_1(N, NAME, ESIZE) \
+void __attribute__((flatten)) HELPER(sve_st##N##NAME##_r)           \
+    (CPUARMState *env, void *vg, target_ulong addr, uint32_t desc)  \
+{                                                                   \
+    sve_st##N##_r(env, vg, addr, desc, GETPC(), ESIZE, 1,           \
+                  sve_st1##NAME##_tlb);                             \
+}
+
+#define DO_STN_2(N, NAME, ESIZE, MSIZE) \
+void __attribute__((flatten)) HELPER(sve_st##N##NAME##_r)             \
+    (CPUARMState *env, void *vg, target_ulong addr, uint32_t desc)    \
+{                                                                     \
+    sve_st##N##_r(env, vg, addr, desc, GETPC(), ESIZE, MSIZE,         \
+                  arm_cpu_data_is_big_endian(env)                     \
+                  ? sve_st1##NAME##_be_tlb : sve_st1##NAME##_le_tlb); \
+}
+
+DO_STN_1(1, bb, 1)
+DO_STN_1(1, bh, 2)
+DO_STN_1(1, bs, 4)
+DO_STN_1(1, bd, 8)
+DO_STN_1(2, bb, 1)
+DO_STN_1(3, bb, 1)
+DO_STN_1(4, bb, 1)
+
+DO_STN_2(1, hh, 2, 2)
+DO_STN_2(1, hs, 4, 2)
+DO_STN_2(1, hd, 8, 2)
+DO_STN_2(2, hh, 2, 2)
+DO_STN_2(3, hh, 2, 2)
+DO_STN_2(4, hh, 2, 2)
+
+DO_STN_2(1, ss, 4, 4)
+DO_STN_2(1, sd, 8, 4)
+DO_STN_2(2, ss, 4, 4)
+DO_STN_2(3, ss, 4, 4)
+DO_STN_2(4, ss, 4, 4)
+
+DO_STN_2(1, dd, 8, 8)
+DO_STN_2(2, dd, 8, 8)
+DO_STN_2(3, dd, 8, 8)
+DO_STN_2(4, dd, 8, 8)
+
+#undef DO_STN_1
+#undef DO_STN_2
+
 /* Loads with a vector index.  */
 
 #define DO_LD1_ZPZ_S(NAME, TYPEI, TYPEM, FN)                            \
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [Qemu-devel] [PATCH 15/20] target/arm: Split contiguous loads for endianness
  2018-08-09  4:21 [Qemu-devel] [PATCH 00/20] target/arm: sve system mode patches Richard Henderson
                   ` (13 preceding siblings ...)
  2018-08-09  4:22 ` [Qemu-devel] [PATCH 14/20] target/arm: Rewrite helper_sve_st[1234]*_r Richard Henderson
@ 2018-08-09  4:22 ` Richard Henderson
  2018-08-11  5:40   ` Philippe Mathieu-Daudé
  2018-08-09  4:22 ` [Qemu-devel] [PATCH 16/20] target/arm: Split contiguous stores " Richard Henderson
                   ` (7 subsequent siblings)
  22 siblings, 1 reply; 51+ messages in thread
From: Richard Henderson @ 2018-08-09  4:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: laurent.desnogues, peter.maydell, alex.bennee

We can choose the endianness at translation time, rather than
re-computing it at execution time.
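
As a rough illustration of the idea (a minimal sketch with hypothetical
names, not the QEMU code in the diff below): the translator picks the
helper from a table indexed by the endianness it already knows while
translating, so the generated code calls an _le or _be entry point
directly instead of testing the CPU state on every execution.

typedef void load_helper_fn(unsigned long addr);

static void ld1hh_le_r(unsigned long addr)
{
    (void)addr;  /* the real helper would load elements little-endian */
}

static void ld1hh_be_r(unsigned long addr)
{
    (void)addr;  /* the real helper would load elements big-endian */
}

/* Indexed once per translated instruction, not per executed element. */
static load_helper_fn * const ld1hh_fns[2] = { ld1hh_le_r, ld1hh_be_r };

static load_helper_fn *select_ld1hh(int big_endian_data)
{
    return ld1hh_fns[big_endian_data ? 1 : 0];
}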

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-sve.h    | 117 +++++++++++++++-------
 target/arm/sve_helper.c    |  70 ++++++-------
 target/arm/translate-sve.c | 196 +++++++++++++++++++++++++------------
 3 files changed, 252 insertions(+), 131 deletions(-)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index 023952a9a4..526caec8da 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -1128,20 +1128,35 @@ DEF_HELPER_FLAGS_4(sve_ld2bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
 DEF_HELPER_FLAGS_4(sve_ld3bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
 DEF_HELPER_FLAGS_4(sve_ld4bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
 
-DEF_HELPER_FLAGS_4(sve_ld1hh_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_ld2hh_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_ld3hh_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_ld4hh_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld1hh_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld2hh_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld3hh_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld4hh_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
 
-DEF_HELPER_FLAGS_4(sve_ld1ss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_ld2ss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_ld3ss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_ld4ss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld1hh_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld2hh_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld3hh_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld4hh_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
 
-DEF_HELPER_FLAGS_4(sve_ld1dd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_ld2dd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_ld3dd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_ld4dd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld1ss_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld2ss_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld3ss_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld4ss_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+
+DEF_HELPER_FLAGS_4(sve_ld1ss_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld2ss_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld3ss_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld4ss_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+
+DEF_HELPER_FLAGS_4(sve_ld1dd_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld2dd_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld3dd_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld4dd_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+
+DEF_HELPER_FLAGS_4(sve_ld1dd_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld2dd_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld3dd_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld4dd_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
 
 DEF_HELPER_FLAGS_4(sve_ld1bhu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
 DEF_HELPER_FLAGS_4(sve_ld1bsu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
@@ -1150,13 +1165,21 @@ DEF_HELPER_FLAGS_4(sve_ld1bhs_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
 DEF_HELPER_FLAGS_4(sve_ld1bss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
 DEF_HELPER_FLAGS_4(sve_ld1bds_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
 
-DEF_HELPER_FLAGS_4(sve_ld1hsu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_ld1hdu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_ld1hss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_ld1hds_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld1hsu_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld1hdu_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld1hss_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld1hds_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
 
-DEF_HELPER_FLAGS_4(sve_ld1sdu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_ld1sds_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld1hsu_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld1hdu_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld1hss_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld1hds_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+
+DEF_HELPER_FLAGS_4(sve_ld1sdu_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld1sds_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+
+DEF_HELPER_FLAGS_4(sve_ld1sdu_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld1sds_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
 
 DEF_HELPER_FLAGS_4(sve_ldff1bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
 DEF_HELPER_FLAGS_4(sve_ldff1bhu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
@@ -1166,17 +1189,28 @@ DEF_HELPER_FLAGS_4(sve_ldff1bhs_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
 DEF_HELPER_FLAGS_4(sve_ldff1bss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
 DEF_HELPER_FLAGS_4(sve_ldff1bds_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
 
-DEF_HELPER_FLAGS_4(sve_ldff1hh_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_ldff1hsu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_ldff1hdu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_ldff1hss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_ldff1hds_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ldff1hh_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ldff1hsu_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ldff1hdu_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ldff1hss_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ldff1hds_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
 
-DEF_HELPER_FLAGS_4(sve_ldff1ss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_ldff1sdu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_ldff1sds_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ldff1hh_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ldff1hsu_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ldff1hdu_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ldff1hss_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ldff1hds_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
 
-DEF_HELPER_FLAGS_4(sve_ldff1dd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ldff1ss_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ldff1sdu_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ldff1sds_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+
+DEF_HELPER_FLAGS_4(sve_ldff1ss_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ldff1sdu_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ldff1sds_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+
+DEF_HELPER_FLAGS_4(sve_ldff1dd_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ldff1dd_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
 
 DEF_HELPER_FLAGS_4(sve_ldnf1bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
 DEF_HELPER_FLAGS_4(sve_ldnf1bhu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
@@ -1186,17 +1220,28 @@ DEF_HELPER_FLAGS_4(sve_ldnf1bhs_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
 DEF_HELPER_FLAGS_4(sve_ldnf1bss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
 DEF_HELPER_FLAGS_4(sve_ldnf1bds_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
 
-DEF_HELPER_FLAGS_4(sve_ldnf1hh_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_ldnf1hsu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_ldnf1hdu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_ldnf1hss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_ldnf1hds_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ldnf1hh_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ldnf1hsu_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ldnf1hdu_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ldnf1hss_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ldnf1hds_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
 
-DEF_HELPER_FLAGS_4(sve_ldnf1ss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_ldnf1sdu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_ldnf1sds_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ldnf1hh_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ldnf1hsu_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ldnf1hdu_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ldnf1hss_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ldnf1hds_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
 
-DEF_HELPER_FLAGS_4(sve_ldnf1dd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ldnf1ss_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ldnf1sdu_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ldnf1sds_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+
+DEF_HELPER_FLAGS_4(sve_ldnf1ss_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ldnf1sdu_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ldnf1sds_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+
+DEF_HELPER_FLAGS_4(sve_ldnf1dd_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ldnf1dd_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
 
 DEF_HELPER_FLAGS_4(sve_st1bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
 DEF_HELPER_FLAGS_4(sve_st2bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 4eae6569cc..56e2f523c5 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -4362,18 +4362,18 @@ void HELPER(sve_##NAME##_r)(CPUARMState *env, void *vg,        \
               sve_##NAME##_host, sve_##NAME##_tlb);            \
 }
 
-/* TODO: Propagate the endian check back to the translator.  */
 #define DO_LD1_2(NAME, ESZ, MSZ) \
-void HELPER(sve_##NAME##_r)(CPUARMState *env, void *vg,        \
-                            target_ulong addr, uint32_t desc)  \
-{                                                              \
-    if (arm_cpu_data_is_big_endian(env)) {                     \
-        sve_ld1_r(env, vg, addr, desc, GETPC(), ESZ, MSZ,      \
-                  sve_##NAME##_be_host, sve_##NAME##_be_tlb);  \
-    } else {                                                   \
-        sve_ld1_r(env, vg, addr, desc, GETPC(), ESZ, MSZ,      \
-                  sve_##NAME##_le_host, sve_##NAME##_le_tlb);  \
-    }                                                          \
+void HELPER(sve_##NAME##_le_r)(CPUARMState *env, void *vg,        \
+                               target_ulong addr, uint32_t desc)  \
+{                                                                 \
+    sve_ld1_r(env, vg, addr, desc, GETPC(), ESZ, MSZ,             \
+              sve_##NAME##_le_host, sve_##NAME##_le_tlb);         \
+}                                                                 \
+void HELPER(sve_##NAME##_be_r)(CPUARMState *env, void *vg,        \
+                               target_ulong addr, uint32_t desc)  \
+{                                                                 \
+    sve_ld1_r(env, vg, addr, desc, GETPC(), ESZ, MSZ,             \
+              sve_##NAME##_be_host, sve_##NAME##_be_tlb);         \
 }
 
 DO_LD1_1(ld1bb,  0)
@@ -4500,12 +4500,17 @@ void __attribute__((flatten)) HELPER(sve_ld##N##bb_r)               \
 }
 
 #define DO_LDN_2(N, SUFF, SIZE)                                       \
-void __attribute__((flatten)) HELPER(sve_ld##N##SUFF##_r)             \
+void __attribute__((flatten)) HELPER(sve_ld##N##SUFF##_le_r)          \
     (CPUARMState *env, void *vg, target_ulong addr, uint32_t desc)    \
 {                                                                     \
     sve_ld##N##_r(env, vg, addr, desc, SIZE, GETPC(),                 \
-                  arm_cpu_data_is_big_endian(env)                     \
-                  ? sve_ld1##SUFF##_be_tlb : sve_ld1##SUFF##_le_tlb); \
+                  sve_ld1##SUFF##_le_tlb);                            \
+}                                                                     \
+void __attribute__((flatten)) HELPER(sve_ld##N##SUFF##_be_r)          \
+    (CPUARMState *env, void *vg, target_ulong addr, uint32_t desc)    \
+{                                                                     \
+    sve_ld##N##_r(env, vg, addr, desc, SIZE, GETPC(),                 \
+                  sve_ld1##SUFF##_be_tlb);                            \
 }
 
 DO_LDN_1(2)
@@ -4722,29 +4727,28 @@ void HELPER(sve_ldnf1##PART##_r)(CPUARMState *env, void *vg,            \
     sve_ldnf1_r(env, vg, addr, desc, ESZ, 0, sve_ld1##PART##_host);     \
 }
 
-/* TODO: Propagate the endian check back to the translator.  */
 #define DO_LDFF1_LDNF1_2(PART, ESZ, MSZ) \
-void HELPER(sve_ldff1##PART##_r)(CPUARMState *env, void *vg,            \
-                                 target_ulong addr, uint32_t desc)      \
+void HELPER(sve_ldff1##PART##_le_r)(CPUARMState *env, void *vg,         \
+                                    target_ulong addr, uint32_t desc)   \
 {                                                                       \
-    if (arm_cpu_data_is_big_endian(env)) {                              \
-        sve_ldff1_r(env, vg, addr, desc, GETPC(), ESZ, MSZ,             \
-                    sve_ld1##PART##_be_host, sve_ld1##PART##_be_tlb);   \
-    } else {                                                            \
-        sve_ldff1_r(env, vg, addr, desc, GETPC(), ESZ, MSZ,             \
-                    sve_ld1##PART##_le_host, sve_ld1##PART##_le_tlb);   \
-    }                                                                   \
+    sve_ldff1_r(env, vg, addr, desc, GETPC(), ESZ, MSZ,                 \
+                sve_ld1##PART##_le_host, sve_ld1##PART##_le_tlb);       \
 }                                                                       \
-void HELPER(sve_ldnf1##PART##_r)(CPUARMState *env, void *vg,            \
-                                 target_ulong addr, uint32_t desc)      \
+void HELPER(sve_ldnf1##PART##_le_r)(CPUARMState *env, void *vg,         \
+                                    target_ulong addr, uint32_t desc)   \
 {                                                                       \
-    if (arm_cpu_data_is_big_endian(env)) {                              \
-        sve_ldnf1_r(env, vg, addr, desc, ESZ, MSZ,                      \
-                    sve_ld1##PART##_be_host);                           \
-    } else {                                                            \
-        sve_ldnf1_r(env, vg, addr, desc, ESZ, MSZ,                      \
-                    sve_ld1##PART##_le_host);                           \
-    }                                                                   \
+    sve_ldnf1_r(env, vg, addr, desc, ESZ, MSZ, sve_ld1##PART##_le_host); \
+}                                                                       \
+void HELPER(sve_ldff1##PART##_be_r)(CPUARMState *env, void *vg,         \
+                                    target_ulong addr, uint32_t desc)   \
+{                                                                       \
+    sve_ldff1_r(env, vg, addr, desc, GETPC(), ESZ, MSZ,                 \
+                sve_ld1##PART##_be_host, sve_ld1##PART##_be_tlb);       \
+}                                                                       \
+void HELPER(sve_ldnf1##PART##_be_r)(CPUARMState *env, void *vg,         \
+                                    target_ulong addr, uint32_t desc)   \
+{                                                                       \
+    sve_ldnf1_r(env, vg, addr, desc, ESZ, MSZ, sve_ld1##PART##_be_host); \
 }
 
 DO_LDFF1_LDNF1_1(bb,  0)
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index bef6b8242d..de12c01e7d 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -4624,32 +4624,58 @@ static void do_mem_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr,
 static void do_ld_zpa(DisasContext *s, int zt, int pg,
                       TCGv_i64 addr, int dtype, int nreg)
 {
-    static gen_helper_gvec_mem * const fns[16][4] = {
-        { gen_helper_sve_ld1bb_r, gen_helper_sve_ld2bb_r,
-          gen_helper_sve_ld3bb_r, gen_helper_sve_ld4bb_r },
-        { gen_helper_sve_ld1bhu_r, NULL, NULL, NULL },
-        { gen_helper_sve_ld1bsu_r, NULL, NULL, NULL },
-        { gen_helper_sve_ld1bdu_r, NULL, NULL, NULL },
+    static gen_helper_gvec_mem * const fns[2][16][4] = {
+        /* Little-endian */
+        { { gen_helper_sve_ld1bb_r, gen_helper_sve_ld2bb_r,
+            gen_helper_sve_ld3bb_r, gen_helper_sve_ld4bb_r },
+          { gen_helper_sve_ld1bhu_r, NULL, NULL, NULL },
+          { gen_helper_sve_ld1bsu_r, NULL, NULL, NULL },
+          { gen_helper_sve_ld1bdu_r, NULL, NULL, NULL },
 
-        { gen_helper_sve_ld1sds_r, NULL, NULL, NULL },
-        { gen_helper_sve_ld1hh_r, gen_helper_sve_ld2hh_r,
-          gen_helper_sve_ld3hh_r, gen_helper_sve_ld4hh_r },
-        { gen_helper_sve_ld1hsu_r, NULL, NULL, NULL },
-        { gen_helper_sve_ld1hdu_r, NULL, NULL, NULL },
+          { gen_helper_sve_ld1sds_le_r, NULL, NULL, NULL },
+          { gen_helper_sve_ld1hh_le_r, gen_helper_sve_ld2hh_le_r,
+            gen_helper_sve_ld3hh_le_r, gen_helper_sve_ld4hh_le_r },
+          { gen_helper_sve_ld1hsu_le_r, NULL, NULL, NULL },
+          { gen_helper_sve_ld1hdu_le_r, NULL, NULL, NULL },
 
-        { gen_helper_sve_ld1hds_r, NULL, NULL, NULL },
-        { gen_helper_sve_ld1hss_r, NULL, NULL, NULL },
-        { gen_helper_sve_ld1ss_r, gen_helper_sve_ld2ss_r,
-          gen_helper_sve_ld3ss_r, gen_helper_sve_ld4ss_r },
-        { gen_helper_sve_ld1sdu_r, NULL, NULL, NULL },
+          { gen_helper_sve_ld1hds_le_r, NULL, NULL, NULL },
+          { gen_helper_sve_ld1hss_le_r, NULL, NULL, NULL },
+          { gen_helper_sve_ld1ss_le_r, gen_helper_sve_ld2ss_le_r,
+            gen_helper_sve_ld3ss_le_r, gen_helper_sve_ld4ss_le_r },
+          { gen_helper_sve_ld1sdu_le_r, NULL, NULL, NULL },
 
-        { gen_helper_sve_ld1bds_r, NULL, NULL, NULL },
-        { gen_helper_sve_ld1bss_r, NULL, NULL, NULL },
-        { gen_helper_sve_ld1bhs_r, NULL, NULL, NULL },
-        { gen_helper_sve_ld1dd_r, gen_helper_sve_ld2dd_r,
-          gen_helper_sve_ld3dd_r, gen_helper_sve_ld4dd_r },
+          { gen_helper_sve_ld1bds_r, NULL, NULL, NULL },
+          { gen_helper_sve_ld1bss_r, NULL, NULL, NULL },
+          { gen_helper_sve_ld1bhs_r, NULL, NULL, NULL },
+          { gen_helper_sve_ld1dd_le_r, gen_helper_sve_ld2dd_le_r,
+            gen_helper_sve_ld3dd_le_r, gen_helper_sve_ld4dd_le_r } },
+
+        /* Big-endian */
+        { { gen_helper_sve_ld1bb_r, gen_helper_sve_ld2bb_r,
+            gen_helper_sve_ld3bb_r, gen_helper_sve_ld4bb_r },
+          { gen_helper_sve_ld1bhu_r, NULL, NULL, NULL },
+          { gen_helper_sve_ld1bsu_r, NULL, NULL, NULL },
+          { gen_helper_sve_ld1bdu_r, NULL, NULL, NULL },
+
+          { gen_helper_sve_ld1sds_be_r, NULL, NULL, NULL },
+          { gen_helper_sve_ld1hh_be_r, gen_helper_sve_ld2hh_be_r,
+            gen_helper_sve_ld3hh_be_r, gen_helper_sve_ld4hh_be_r },
+          { gen_helper_sve_ld1hsu_be_r, NULL, NULL, NULL },
+          { gen_helper_sve_ld1hdu_be_r, NULL, NULL, NULL },
+
+          { gen_helper_sve_ld1hds_be_r, NULL, NULL, NULL },
+          { gen_helper_sve_ld1hss_be_r, NULL, NULL, NULL },
+          { gen_helper_sve_ld1ss_be_r, gen_helper_sve_ld2ss_be_r,
+            gen_helper_sve_ld3ss_be_r, gen_helper_sve_ld4ss_be_r },
+          { gen_helper_sve_ld1sdu_be_r, NULL, NULL, NULL },
+
+          { gen_helper_sve_ld1bds_r, NULL, NULL, NULL },
+          { gen_helper_sve_ld1bss_r, NULL, NULL, NULL },
+          { gen_helper_sve_ld1bhs_r, NULL, NULL, NULL },
+          { gen_helper_sve_ld1dd_be_r, gen_helper_sve_ld2dd_be_r,
+            gen_helper_sve_ld3dd_be_r, gen_helper_sve_ld4dd_be_r } }
     };
-    gen_helper_gvec_mem *fn = fns[dtype][nreg];
+    gen_helper_gvec_mem *fn = fns[s->be_data == MO_BE][dtype][nreg];
 
     /* While there are holes in the table, they are not
      * accessible via the instruction encoding.
@@ -4689,59 +4715,103 @@ static bool trans_LD_zpri(DisasContext *s, arg_rpri_load *a, uint32_t insn)
 
 static bool trans_LDFF1_zprr(DisasContext *s, arg_rprr_load *a, uint32_t insn)
 {
-    static gen_helper_gvec_mem * const fns[16] = {
-        gen_helper_sve_ldff1bb_r,
-        gen_helper_sve_ldff1bhu_r,
-        gen_helper_sve_ldff1bsu_r,
-        gen_helper_sve_ldff1bdu_r,
+    static gen_helper_gvec_mem * const fns[2][16] = {
+        /* Little-endian */
+        { gen_helper_sve_ldff1bb_r,
+          gen_helper_sve_ldff1bhu_r,
+          gen_helper_sve_ldff1bsu_r,
+          gen_helper_sve_ldff1bdu_r,
 
-        gen_helper_sve_ldff1sds_r,
-        gen_helper_sve_ldff1hh_r,
-        gen_helper_sve_ldff1hsu_r,
-        gen_helper_sve_ldff1hdu_r,
+          gen_helper_sve_ldff1sds_le_r,
+          gen_helper_sve_ldff1hh_le_r,
+          gen_helper_sve_ldff1hsu_le_r,
+          gen_helper_sve_ldff1hdu_le_r,
 
-        gen_helper_sve_ldff1hds_r,
-        gen_helper_sve_ldff1hss_r,
-        gen_helper_sve_ldff1ss_r,
-        gen_helper_sve_ldff1sdu_r,
+          gen_helper_sve_ldff1hds_le_r,
+          gen_helper_sve_ldff1hss_le_r,
+          gen_helper_sve_ldff1ss_le_r,
+          gen_helper_sve_ldff1sdu_le_r,
 
-        gen_helper_sve_ldff1bds_r,
-        gen_helper_sve_ldff1bss_r,
-        gen_helper_sve_ldff1bhs_r,
-        gen_helper_sve_ldff1dd_r,
+          gen_helper_sve_ldff1bds_r,
+          gen_helper_sve_ldff1bss_r,
+          gen_helper_sve_ldff1bhs_r,
+          gen_helper_sve_ldff1dd_le_r },
+
+        /* Big-endian */
+        { gen_helper_sve_ldff1bb_r,
+          gen_helper_sve_ldff1bhu_r,
+          gen_helper_sve_ldff1bsu_r,
+          gen_helper_sve_ldff1bdu_r,
+
+          gen_helper_sve_ldff1sds_be_r,
+          gen_helper_sve_ldff1hh_be_r,
+          gen_helper_sve_ldff1hsu_be_r,
+          gen_helper_sve_ldff1hdu_be_r,
+
+          gen_helper_sve_ldff1hds_be_r,
+          gen_helper_sve_ldff1hss_be_r,
+          gen_helper_sve_ldff1ss_be_r,
+          gen_helper_sve_ldff1sdu_be_r,
+
+          gen_helper_sve_ldff1bds_r,
+          gen_helper_sve_ldff1bss_r,
+          gen_helper_sve_ldff1bhs_r,
+          gen_helper_sve_ldff1dd_be_r },
     };
 
     if (sve_access_check(s)) {
         TCGv_i64 addr = new_tmp_a64(s);
         tcg_gen_shli_i64(addr, cpu_reg(s, a->rm), dtype_msz(a->dtype));
         tcg_gen_add_i64(addr, addr, cpu_reg_sp(s, a->rn));
-        do_mem_zpa(s, a->rd, a->pg, addr, fns[a->dtype]);
+        do_mem_zpa(s, a->rd, a->pg, addr, fns[s->be_data == MO_BE][a->dtype]);
     }
     return true;
 }
 
 static bool trans_LDNF1_zpri(DisasContext *s, arg_rpri_load *a, uint32_t insn)
 {
-    static gen_helper_gvec_mem * const fns[16] = {
-        gen_helper_sve_ldnf1bb_r,
-        gen_helper_sve_ldnf1bhu_r,
-        gen_helper_sve_ldnf1bsu_r,
-        gen_helper_sve_ldnf1bdu_r,
+    static gen_helper_gvec_mem * const fns[2][16] = {
+        /* Little-endian */
+        { gen_helper_sve_ldnf1bb_r,
+          gen_helper_sve_ldnf1bhu_r,
+          gen_helper_sve_ldnf1bsu_r,
+          gen_helper_sve_ldnf1bdu_r,
 
-        gen_helper_sve_ldnf1sds_r,
-        gen_helper_sve_ldnf1hh_r,
-        gen_helper_sve_ldnf1hsu_r,
-        gen_helper_sve_ldnf1hdu_r,
+          gen_helper_sve_ldnf1sds_le_r,
+          gen_helper_sve_ldnf1hh_le_r,
+          gen_helper_sve_ldnf1hsu_le_r,
+          gen_helper_sve_ldnf1hdu_le_r,
 
-        gen_helper_sve_ldnf1hds_r,
-        gen_helper_sve_ldnf1hss_r,
-        gen_helper_sve_ldnf1ss_r,
-        gen_helper_sve_ldnf1sdu_r,
+          gen_helper_sve_ldnf1hds_le_r,
+          gen_helper_sve_ldnf1hss_le_r,
+          gen_helper_sve_ldnf1ss_le_r,
+          gen_helper_sve_ldnf1sdu_le_r,
 
-        gen_helper_sve_ldnf1bds_r,
-        gen_helper_sve_ldnf1bss_r,
-        gen_helper_sve_ldnf1bhs_r,
-        gen_helper_sve_ldnf1dd_r,
+          gen_helper_sve_ldnf1bds_r,
+          gen_helper_sve_ldnf1bss_r,
+          gen_helper_sve_ldnf1bhs_r,
+          gen_helper_sve_ldnf1dd_le_r },
+
+        /* Big-endian */
+        { gen_helper_sve_ldnf1bb_r,
+          gen_helper_sve_ldnf1bhu_r,
+          gen_helper_sve_ldnf1bsu_r,
+          gen_helper_sve_ldnf1bdu_r,
+
+          gen_helper_sve_ldnf1sds_be_r,
+          gen_helper_sve_ldnf1hh_be_r,
+          gen_helper_sve_ldnf1hsu_be_r,
+          gen_helper_sve_ldnf1hdu_be_r,
+
+          gen_helper_sve_ldnf1hds_be_r,
+          gen_helper_sve_ldnf1hss_be_r,
+          gen_helper_sve_ldnf1ss_be_r,
+          gen_helper_sve_ldnf1sdu_be_r,
+
+          gen_helper_sve_ldnf1bds_r,
+          gen_helper_sve_ldnf1bss_r,
+          gen_helper_sve_ldnf1bhs_r,
+          gen_helper_sve_ldnf1dd_be_r },
     };
 
     if (sve_access_check(s)) {
@@ -4751,16 +4821,18 @@ static bool trans_LDNF1_zpri(DisasContext *s, arg_rpri_load *a, uint32_t insn)
         TCGv_i64 addr = new_tmp_a64(s);
 
         tcg_gen_addi_i64(addr, cpu_reg_sp(s, a->rn), off);
-        do_mem_zpa(s, a->rd, a->pg, addr, fns[a->dtype]);
+        do_mem_zpa(s, a->rd, a->pg, addr, fns[s->be_data == MO_BE][a->dtype]);
     }
     return true;
 }
 
 static void do_ldrq(DisasContext *s, int zt, int pg, TCGv_i64 addr, int msz)
 {
-    static gen_helper_gvec_mem * const fns[4] = {
-        gen_helper_sve_ld1bb_r, gen_helper_sve_ld1hh_r,
-        gen_helper_sve_ld1ss_r, gen_helper_sve_ld1dd_r,
+    static gen_helper_gvec_mem * const fns[2][4] = {
+        { gen_helper_sve_ld1bb_r,    gen_helper_sve_ld1hh_le_r,
+          gen_helper_sve_ld1ss_le_r, gen_helper_sve_ld1dd_le_r },
+        { gen_helper_sve_ld1bb_r,    gen_helper_sve_ld1hh_be_r,
+          gen_helper_sve_ld1ss_be_r, gen_helper_sve_ld1dd_be_r },
     };
     unsigned vsz = vec_full_reg_size(s);
     TCGv_ptr t_pg;
@@ -4792,7 +4864,7 @@ static void do_ldrq(DisasContext *s, int zt, int pg, TCGv_i64 addr, int msz)
     t_pg = tcg_temp_new_ptr();
     tcg_gen_addi_ptr(t_pg, cpu_env, poff);
 
-    fns[msz](cpu_env, t_pg, addr, desc);
+    fns[s->be_data == MO_BE][msz](cpu_env, t_pg, addr, desc);
 
     tcg_temp_free_ptr(t_pg);
     tcg_temp_free_i32(desc);
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [Qemu-devel] [PATCH 16/20] target/arm: Split contiguous stores for endianness
  2018-08-09  4:21 [Qemu-devel] [PATCH 00/20] target/arm: sve system mode patches Richard Henderson
                   ` (14 preceding siblings ...)
  2018-08-09  4:22 ` [Qemu-devel] [PATCH 15/20] target/arm: Split contiguous loads for endianness Richard Henderson
@ 2018-08-09  4:22 ` Richard Henderson
  2018-08-11  5:41   ` Philippe Mathieu-Daudé
  2018-08-09  4:22 ` [Qemu-devel] [PATCH 17/20] target/arm: Rewrite vector gather loads Richard Henderson
                   ` (6 subsequent siblings)
  22 siblings, 1 reply; 51+ messages in thread
From: Richard Henderson @ 2018-08-09  4:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: laurent.desnogues, peter.maydell, alex.bennee

We can choose the endianness at translation time, rather than
re-computing it at execution time.
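
As a hedged sketch of the macro side of this change (simplified,
hypothetical names, not the actual helpers): each store macro now emits
a little-endian and a big-endian entry point bound to the matching
_tlb routine, instead of a single helper that branches on the data
endianness at run time.

#include <stdint.h>

static void st1hh_le_tlb(uint64_t addr)
{
    (void)addr;  /* the real routine stores one element little-endian */
}

static void st1hh_be_tlb(uint64_t addr)
{
    (void)addr;  /* the real routine stores one element big-endian */
}

/* One invocation expands to both endian-specific helpers. */
#define DO_ST_PAIR(NAME)                                            \
    void helper_##NAME##_le_r(uint64_t a) { NAME##_le_tlb(a); }     \
    void helper_##NAME##_be_r(uint64_t a) { NAME##_be_tlb(a); }

DO_ST_PAIR(st1hh)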

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-sve.h    | 48 +++++++++++++++++--------
 target/arm/sve_helper.c    | 11 ++++--
 target/arm/translate-sve.c | 72 +++++++++++++++++++++++++++++---------
 3 files changed, 96 insertions(+), 35 deletions(-)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index 526caec8da..1ad043101a 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -1248,29 +1248,47 @@ DEF_HELPER_FLAGS_4(sve_st2bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
 DEF_HELPER_FLAGS_4(sve_st3bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
 DEF_HELPER_FLAGS_4(sve_st4bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
 
-DEF_HELPER_FLAGS_4(sve_st1hh_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_st2hh_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_st3hh_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_st4hh_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_st1hh_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_st2hh_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_st3hh_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_st4hh_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
 
-DEF_HELPER_FLAGS_4(sve_st1ss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_st2ss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_st3ss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_st4ss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_st1hh_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_st2hh_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_st3hh_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_st4hh_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
 
-DEF_HELPER_FLAGS_4(sve_st1dd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_st2dd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_st3dd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_st4dd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_st1ss_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_st2ss_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_st3ss_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_st4ss_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+
+DEF_HELPER_FLAGS_4(sve_st1ss_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_st2ss_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_st3ss_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_st4ss_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+
+DEF_HELPER_FLAGS_4(sve_st1dd_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_st2dd_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_st3dd_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_st4dd_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+
+DEF_HELPER_FLAGS_4(sve_st1dd_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_st2dd_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_st3dd_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_st4dd_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
 
 DEF_HELPER_FLAGS_4(sve_st1bh_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
 DEF_HELPER_FLAGS_4(sve_st1bs_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
 DEF_HELPER_FLAGS_4(sve_st1bd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
 
-DEF_HELPER_FLAGS_4(sve_st1hs_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_st1hd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_st1hs_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_st1hd_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_st1hs_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_st1hd_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
 
-DEF_HELPER_FLAGS_4(sve_st1sd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_st1sd_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_st1sd_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
 
 DEF_HELPER_FLAGS_6(sve_ldbsu_zsu, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 56e2f523c5..92c0e961a9 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -4940,12 +4940,17 @@ void __attribute__((flatten)) HELPER(sve_st##N##NAME##_r)           \
 }
 
 #define DO_STN_2(N, NAME, ESIZE, MSIZE) \
-void __attribute__((flatten)) HELPER(sve_st##N##NAME##_r)             \
+void __attribute__((flatten)) HELPER(sve_st##N##NAME##_le_r)          \
     (CPUARMState *env, void *vg, target_ulong addr, uint32_t desc)    \
 {                                                                     \
     sve_st##N##_r(env, vg, addr, desc, GETPC(), ESIZE, MSIZE,         \
-                  arm_cpu_data_is_big_endian(env)                     \
-                  ? sve_st1##NAME##_be_tlb : sve_st1##NAME##_le_tlb); \
+                  sve_st1##NAME##_le_tlb);                            \
+}                                                                     \
+void __attribute__((flatten)) HELPER(sve_st##N##NAME##_be_r)          \
+    (CPUARMState *env, void *vg, target_ulong addr, uint32_t desc)    \
+{                                                                     \
+    sve_st##N##_r(env, vg, addr, desc, GETPC(), ESIZE, MSIZE,         \
+                  sve_st1##NAME##_be_tlb);                            \
 }
 
 DO_STN_1(1, bb, 1)
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index de12c01e7d..acb85731f8 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -4953,32 +4953,70 @@ static bool trans_LD1R_zpri(DisasContext *s, arg_rpri_load *a, uint32_t insn)
 static void do_st_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr,
                       int msz, int esz, int nreg)
 {
-    static gen_helper_gvec_mem * const fn_single[4][4] = {
-        { gen_helper_sve_st1bb_r, gen_helper_sve_st1bh_r,
-          gen_helper_sve_st1bs_r, gen_helper_sve_st1bd_r },
-        { NULL,                   gen_helper_sve_st1hh_r,
-          gen_helper_sve_st1hs_r, gen_helper_sve_st1hd_r },
-        { NULL, NULL,
-          gen_helper_sve_st1ss_r, gen_helper_sve_st1sd_r },
-        { NULL, NULL, NULL, gen_helper_sve_st1dd_r },
+    static gen_helper_gvec_mem * const fn_single[2][4][4] = {
+        { { gen_helper_sve_st1bb_r,
+            gen_helper_sve_st1bh_r,
+            gen_helper_sve_st1bs_r,
+            gen_helper_sve_st1bd_r },
+          { NULL,
+            gen_helper_sve_st1hh_le_r,
+            gen_helper_sve_st1hs_le_r,
+            gen_helper_sve_st1hd_le_r },
+          { NULL, NULL,
+            gen_helper_sve_st1ss_le_r,
+            gen_helper_sve_st1sd_le_r },
+          { NULL, NULL, NULL,
+            gen_helper_sve_st1dd_le_r } },
+        { { gen_helper_sve_st1bb_r,
+            gen_helper_sve_st1bh_r,
+            gen_helper_sve_st1bs_r,
+            gen_helper_sve_st1bd_r },
+          { NULL,
+            gen_helper_sve_st1hh_be_r,
+            gen_helper_sve_st1hs_be_r,
+            gen_helper_sve_st1hd_be_r },
+          { NULL, NULL,
+            gen_helper_sve_st1ss_be_r,
+            gen_helper_sve_st1sd_be_r },
+          { NULL, NULL, NULL,
+            gen_helper_sve_st1dd_be_r } },
     };
-    static gen_helper_gvec_mem * const fn_multiple[3][4] = {
-        { gen_helper_sve_st2bb_r, gen_helper_sve_st2hh_r,
-          gen_helper_sve_st2ss_r, gen_helper_sve_st2dd_r },
-        { gen_helper_sve_st3bb_r, gen_helper_sve_st3hh_r,
-          gen_helper_sve_st3ss_r, gen_helper_sve_st3dd_r },
-        { gen_helper_sve_st4bb_r, gen_helper_sve_st4hh_r,
-          gen_helper_sve_st4ss_r, gen_helper_sve_st4dd_r },
+    static gen_helper_gvec_mem * const fn_multiple[2][3][4] = {
+        { { gen_helper_sve_st2bb_r,
+            gen_helper_sve_st2hh_le_r,
+            gen_helper_sve_st2ss_le_r,
+            gen_helper_sve_st2dd_le_r },
+          { gen_helper_sve_st3bb_r,
+            gen_helper_sve_st3hh_le_r,
+            gen_helper_sve_st3ss_le_r,
+            gen_helper_sve_st3dd_le_r },
+          { gen_helper_sve_st4bb_r,
+            gen_helper_sve_st4hh_le_r,
+            gen_helper_sve_st4ss_le_r,
+            gen_helper_sve_st4dd_le_r } },
+        { { gen_helper_sve_st2bb_r,
+            gen_helper_sve_st2hh_be_r,
+            gen_helper_sve_st2ss_be_r,
+            gen_helper_sve_st2dd_be_r },
+          { gen_helper_sve_st3bb_r,
+            gen_helper_sve_st3hh_be_r,
+            gen_helper_sve_st3ss_be_r,
+            gen_helper_sve_st3dd_be_r },
+          { gen_helper_sve_st4bb_r,
+            gen_helper_sve_st4hh_be_r,
+            gen_helper_sve_st4ss_be_r,
+            gen_helper_sve_st4dd_be_r } },
     };
     gen_helper_gvec_mem *fn;
+    int be = s->be_data == MO_BE;
 
     if (nreg == 0) {
         /* ST1 */
-        fn = fn_single[msz][esz];
+        fn = fn_single[be][msz][esz];
     } else {
         /* ST2, ST3, ST4 -- msz == esz, enforced by encoding */
         assert(msz == esz);
-        fn = fn_multiple[nreg - 1][msz];
+        fn = fn_multiple[be][nreg - 1][msz];
     }
     assert(fn != NULL);
     do_mem_zpa(s, zt, pg, addr, fn);
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [Qemu-devel] [PATCH 17/20] target/arm: Rewrite vector gather loads
  2018-08-09  4:21 [Qemu-devel] [PATCH 00/20] target/arm: sve system mode patches Richard Henderson
                   ` (15 preceding siblings ...)
  2018-08-09  4:22 ` [Qemu-devel] [PATCH 16/20] target/arm: Split contiguous stores " Richard Henderson
@ 2018-08-09  4:22 ` Richard Henderson
  2018-08-23 16:08   ` Peter Maydell
  2018-08-09  4:22 ` [Qemu-devel] [PATCH 18/20] target/arm: Rewrite vector gather stores Richard Henderson
                   ` (5 subsequent siblings)
  22 siblings, 1 reply; 51+ messages in thread
From: Richard Henderson @ 2018-08-09  4:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: laurent.desnogues, peter.maydell, alex.bennee

This fixes the endianness problem for softmmu, and also moves the
main loop out of a macro and into an inlined function.
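
As a minimal sketch of that structure (illustrative types and names, not
the actual QEMU signatures): the per-element work is split into two
callbacks, one that reads an index element from the offset vector and
one that loads a single element at the computed address, and one shared
loop takes both, replacing a separate macro body for every combination
of index type and memory type.

#include <stdint.h>
#include <stddef.h>

typedef uint64_t off_fn(const void *reg, size_t i);     /* read index element */
typedef void ld_fn(void *vd, size_t i, uint64_t addr);  /* load one element   */

static void gather_loop(void *vd, const void *vm, const uint8_t *pg,
                        size_t nelem, uint64_t base, unsigned scale,
                        off_fn *off, ld_fn *ld)
{
    for (size_t i = 0; i < nelem; i++) {
        if (pg[i] & 1) {                            /* predicate element active? */
            uint64_t addr = base + (off(vm, i) << scale);
            ld(vd, i, addr);                        /* endian-specific element load */
        }
    }
}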

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-sve.h    |  84 +++++++++----
 target/arm/sve_helper.c    | 218 +++++++++++++++++++++++----------
 target/arm/translate-sve.c | 244 +++++++++++++++++++++++++------------
 3 files changed, 380 insertions(+), 166 deletions(-)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index 1ad043101a..49d1c09e30 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -1292,69 +1292,111 @@ DEF_HELPER_FLAGS_4(sve_st1sd_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
 
 DEF_HELPER_FLAGS_6(sve_ldbsu_zsu, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldhsu_zsu, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldhsu_le_zsu, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldssu_zsu, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldhsu_be_zsu, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldss_le_zsu, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldss_be_zsu, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
 DEF_HELPER_FLAGS_6(sve_ldbss_zsu, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldhss_zsu, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldhss_le_zsu, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldhss_be_zsu, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
 
 DEF_HELPER_FLAGS_6(sve_ldbsu_zss, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldhsu_zss, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldhsu_le_zss, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldssu_zss, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldhsu_be_zss, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldss_le_zss, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldss_be_zss, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
 DEF_HELPER_FLAGS_6(sve_ldbss_zss, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldhss_zss, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldhss_le_zss, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldhss_be_zss, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
 
 DEF_HELPER_FLAGS_6(sve_ldbdu_zsu, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldhdu_zsu, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldhdu_le_zsu, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldsdu_zsu, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldhdu_be_zsu, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldddu_zsu, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldsdu_le_zsu, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldsdu_be_zsu, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_lddd_le_zsu, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_lddd_be_zsu, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
 DEF_HELPER_FLAGS_6(sve_ldbds_zsu, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldhds_zsu, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldhds_le_zsu, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldsds_zsu, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldhds_be_zsu, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldsds_le_zsu, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldsds_be_zsu, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
 
 DEF_HELPER_FLAGS_6(sve_ldbdu_zss, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldhdu_zss, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldhdu_le_zss, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldsdu_zss, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldhdu_be_zss, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldddu_zss, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldsdu_le_zss, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldsdu_be_zss, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_lddd_le_zss, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_lddd_be_zss, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
 DEF_HELPER_FLAGS_6(sve_ldbds_zss, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldhds_zss, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldhds_le_zss, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldsds_zss, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldhds_be_zss, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldsds_le_zss, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldsds_be_zss, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
 
 DEF_HELPER_FLAGS_6(sve_ldbdu_zd, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldhdu_zd, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldhdu_le_zd, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldsdu_zd, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldhdu_be_zd, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldddu_zd, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldsdu_le_zd, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldsdu_be_zd, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_lddd_le_zd, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_lddd_be_zd, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
 DEF_HELPER_FLAGS_6(sve_ldbds_zd, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldhds_zd, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldhds_le_zd, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldsds_zd, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldhds_be_zd, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldsds_le_zd, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldsds_be_zd, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
 
 DEF_HELPER_FLAGS_6(sve_ldffbsu_zsu, TCG_CALL_NO_WG,
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 92c0e961a9..76d3f021e4 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -4984,80 +4984,166 @@ DO_STN_2(4, dd, 8, 8)
 
 /* Loads with a vector index.  */
 
-#define DO_LD1_ZPZ_S(NAME, TYPEI, TYPEM, FN)                            \
-void HELPER(NAME)(CPUARMState *env, void *vd, void *vg, void *vm,       \
-                  target_ulong base, uint32_t desc)                     \
-{                                                                       \
-    intptr_t i, oprsz = simd_oprsz(desc);                               \
-    unsigned scale = simd_data(desc);                                   \
-    uintptr_t ra = GETPC();                                             \
-    for (i = 0; i < oprsz; ) {                                          \
-        uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));                 \
-        do {                                                            \
-            TYPEM m = 0;                                                \
-            if (pg & 1) {                                               \
-                target_ulong off = *(TYPEI *)(vm + H1_4(i));            \
-                m = FN(env, base + (off << scale), ra);                 \
-            }                                                           \
-            *(uint32_t *)(vd + H1_4(i)) = m;                            \
-            i += 4, pg >>= 4;                                           \
-        } while (i & 15);                                               \
-    }                                                                   \
+typedef target_ulong zreg_off_fn(void *reg, intptr_t reg_ofs);
+
+static target_ulong off_zsu_s(void *reg, intptr_t reg_ofs)
+{
+    return *(uint32_t *)(reg + H1_4(reg_ofs));
 }
 
-#define DO_LD1_ZPZ_D(NAME, TYPEI, TYPEM, FN)                            \
-void HELPER(NAME)(CPUARMState *env, void *vd, void *vg, void *vm,       \
-                  target_ulong base, uint32_t desc)                     \
-{                                                                       \
-    intptr_t i, oprsz = simd_oprsz(desc) / 8;                           \
-    unsigned scale = simd_data(desc);                                   \
-    uintptr_t ra = GETPC();                                             \
-    uint64_t *d = vd, *m = vm; uint8_t *pg = vg;                        \
-    for (i = 0; i < oprsz; i++) {                                       \
-        TYPEM mm = 0;                                                   \
-        if (pg[H1(i)] & 1) {                                            \
-            target_ulong off = (TYPEI)m[i];                             \
-            mm = FN(env, base + (off << scale), ra);                    \
-        }                                                               \
-        d[i] = mm;                                                      \
-    }                                                                   \
+static target_ulong off_zss_s(void *reg, intptr_t reg_ofs)
+{
+    return *(int32_t *)(reg + H1_4(reg_ofs));
 }
 
-DO_LD1_ZPZ_S(sve_ldbsu_zsu, uint32_t, uint8_t,  cpu_ldub_data_ra)
-DO_LD1_ZPZ_S(sve_ldhsu_zsu, uint32_t, uint16_t, cpu_lduw_data_ra)
-DO_LD1_ZPZ_S(sve_ldssu_zsu, uint32_t, uint32_t, cpu_ldl_data_ra)
-DO_LD1_ZPZ_S(sve_ldbss_zsu, uint32_t, int8_t,   cpu_ldub_data_ra)
-DO_LD1_ZPZ_S(sve_ldhss_zsu, uint32_t, int16_t,  cpu_lduw_data_ra)
+static target_ulong off_zsu_d(void *reg, intptr_t reg_ofs)
+{
+    return (uint32_t)*(uint64_t *)(reg + reg_ofs);
+}
 
-DO_LD1_ZPZ_S(sve_ldbsu_zss, int32_t, uint8_t,  cpu_ldub_data_ra)
-DO_LD1_ZPZ_S(sve_ldhsu_zss, int32_t, uint16_t, cpu_lduw_data_ra)
-DO_LD1_ZPZ_S(sve_ldssu_zss, int32_t, uint32_t, cpu_ldl_data_ra)
-DO_LD1_ZPZ_S(sve_ldbss_zss, int32_t, int8_t,   cpu_ldub_data_ra)
-DO_LD1_ZPZ_S(sve_ldhss_zss, int32_t, int16_t,  cpu_lduw_data_ra)
+static target_ulong off_zss_d(void *reg, intptr_t reg_ofs)
+{
+    return (int32_t)*(uint64_t *)(reg + reg_ofs);
+}
 
-DO_LD1_ZPZ_D(sve_ldbdu_zsu, uint32_t, uint8_t,  cpu_ldub_data_ra)
-DO_LD1_ZPZ_D(sve_ldhdu_zsu, uint32_t, uint16_t, cpu_lduw_data_ra)
-DO_LD1_ZPZ_D(sve_ldsdu_zsu, uint32_t, uint32_t, cpu_ldl_data_ra)
-DO_LD1_ZPZ_D(sve_ldddu_zsu, uint32_t, uint64_t, cpu_ldq_data_ra)
-DO_LD1_ZPZ_D(sve_ldbds_zsu, uint32_t, int8_t,   cpu_ldub_data_ra)
-DO_LD1_ZPZ_D(sve_ldhds_zsu, uint32_t, int16_t,  cpu_lduw_data_ra)
-DO_LD1_ZPZ_D(sve_ldsds_zsu, uint32_t, int32_t,  cpu_ldl_data_ra)
+static target_ulong off_zd_d(void *reg, intptr_t reg_ofs)
+{
+    return *(uint64_t *)(reg + reg_ofs);
+}
 
-DO_LD1_ZPZ_D(sve_ldbdu_zss, int32_t, uint8_t,  cpu_ldub_data_ra)
-DO_LD1_ZPZ_D(sve_ldhdu_zss, int32_t, uint16_t, cpu_lduw_data_ra)
-DO_LD1_ZPZ_D(sve_ldsdu_zss, int32_t, uint32_t, cpu_ldl_data_ra)
-DO_LD1_ZPZ_D(sve_ldddu_zss, int32_t, uint64_t, cpu_ldq_data_ra)
-DO_LD1_ZPZ_D(sve_ldbds_zss, int32_t, int8_t,   cpu_ldub_data_ra)
-DO_LD1_ZPZ_D(sve_ldhds_zss, int32_t, int16_t,  cpu_lduw_data_ra)
-DO_LD1_ZPZ_D(sve_ldsds_zss, int32_t, int32_t,  cpu_ldl_data_ra)
+static void sve_ld1_zs(CPUARMState *env, void *vd, void *vg, void *vm,
+                       target_ulong base, uint32_t desc, uintptr_t ra,
+                       zreg_off_fn *off_fn, sve_ld1_tlb_fn *tlb_fn)
+{
+    const int mmu_idx = cpu_mmu_index(env, false);
+    intptr_t i, oprsz = simd_oprsz(desc);
+    unsigned scale = simd_data(desc);
+    ARMVectorReg scratch = { };
 
-DO_LD1_ZPZ_D(sve_ldbdu_zd, uint64_t, uint8_t,  cpu_ldub_data_ra)
-DO_LD1_ZPZ_D(sve_ldhdu_zd, uint64_t, uint16_t, cpu_lduw_data_ra)
-DO_LD1_ZPZ_D(sve_ldsdu_zd, uint64_t, uint32_t, cpu_ldl_data_ra)
-DO_LD1_ZPZ_D(sve_ldddu_zd, uint64_t, uint64_t, cpu_ldq_data_ra)
-DO_LD1_ZPZ_D(sve_ldbds_zd, uint64_t, int8_t,   cpu_ldub_data_ra)
-DO_LD1_ZPZ_D(sve_ldhds_zd, uint64_t, int16_t,  cpu_lduw_data_ra)
-DO_LD1_ZPZ_D(sve_ldsds_zd, uint64_t, int32_t,  cpu_ldl_data_ra)
+    set_helper_retaddr(ra);
+    for (i = 0; i < oprsz; ) {
+        uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));
+        do {
+            if (pg & 1) {
+                target_ulong off = off_fn(vm, i);
+                tlb_fn(env, &scratch, i, base + (off << scale), mmu_idx, ra);
+            }
+            i += 4, pg >>= 4;
+        } while (i & 15);
+    }
+    set_helper_retaddr(0);
+
+    /* Wait until all exceptions have been raised to write back.  */
+    memcpy(vd, &scratch, oprsz);
+}
+
+static void sve_ld1_zd(CPUARMState *env, void *vd, void *vg, void *vm,
+                       target_ulong base, uint32_t desc, uintptr_t ra,
+                       zreg_off_fn *off_fn, sve_ld1_tlb_fn *tlb_fn)
+{
+    const int mmu_idx = cpu_mmu_index(env, false);
+    intptr_t i, oprsz = simd_oprsz(desc) / 8;
+    unsigned scale = simd_data(desc);
+    ARMVectorReg scratch = { };
+
+    set_helper_retaddr(ra);
+    for (i = 0; i < oprsz; i++) {
+        uint8_t pg = *(uint8_t *)(vg + H1(i));
+        if (pg & 1) {
+            target_ulong off = off_fn(vm, i * 8);
+            tlb_fn(env, &scratch, i * 8, base + (off << scale), mmu_idx, ra);
+        }
+    }
+    set_helper_retaddr(0);
+
+    /* Wait until all exceptions have been raised to write back.  */
+    memcpy(vd, &scratch, oprsz * 8);
+}
+
+#define DO_LD1_ZPZ_S(MEM, OFS) \
+void __attribute__((flatten)) HELPER(sve_ld##MEM##_##OFS)    \
+    (CPUARMState *env, void *vd, void *vg, void *vm,         \
+     target_ulong base, uint32_t desc)                       \
+{                                                            \
+    sve_ld1_zs(env, vd, vg, vm, base, desc, GETPC(),         \
+              off_##OFS##_s, sve_ld1##MEM##_tlb);            \
+}
+
+#define DO_LD1_ZPZ_D(MEM, OFS) \
+void __attribute__((flatten)) HELPER(sve_ld##MEM##_##OFS)    \
+    (CPUARMState *env, void *vd, void *vg, void *vm,         \
+     target_ulong base, uint32_t desc)                       \
+{                                                            \
+    sve_ld1_zd(env, vd, vg, vm, base, desc, GETPC(),         \
+               off_##OFS##_d, sve_ld1##MEM##_tlb);           \
+}
+
+DO_LD1_ZPZ_S(bsu, zsu)
+DO_LD1_ZPZ_S(bsu, zss)
+DO_LD1_ZPZ_D(bdu, zsu)
+DO_LD1_ZPZ_D(bdu, zss)
+DO_LD1_ZPZ_D(bdu, zd)
+
+DO_LD1_ZPZ_S(bss, zsu)
+DO_LD1_ZPZ_S(bss, zss)
+DO_LD1_ZPZ_D(bds, zsu)
+DO_LD1_ZPZ_D(bds, zss)
+DO_LD1_ZPZ_D(bds, zd)
+
+DO_LD1_ZPZ_S(hsu_le, zsu)
+DO_LD1_ZPZ_S(hsu_le, zss)
+DO_LD1_ZPZ_D(hdu_le, zsu)
+DO_LD1_ZPZ_D(hdu_le, zss)
+DO_LD1_ZPZ_D(hdu_le, zd)
+
+DO_LD1_ZPZ_S(hsu_be, zsu)
+DO_LD1_ZPZ_S(hsu_be, zss)
+DO_LD1_ZPZ_D(hdu_be, zsu)
+DO_LD1_ZPZ_D(hdu_be, zss)
+DO_LD1_ZPZ_D(hdu_be, zd)
+
+DO_LD1_ZPZ_S(hss_le, zsu)
+DO_LD1_ZPZ_S(hss_le, zss)
+DO_LD1_ZPZ_D(hds_le, zsu)
+DO_LD1_ZPZ_D(hds_le, zss)
+DO_LD1_ZPZ_D(hds_le, zd)
+
+DO_LD1_ZPZ_S(hss_be, zsu)
+DO_LD1_ZPZ_S(hss_be, zss)
+DO_LD1_ZPZ_D(hds_be, zsu)
+DO_LD1_ZPZ_D(hds_be, zss)
+DO_LD1_ZPZ_D(hds_be, zd)
+
+DO_LD1_ZPZ_S(ss_le, zsu)
+DO_LD1_ZPZ_S(ss_le, zss)
+DO_LD1_ZPZ_D(sdu_le, zsu)
+DO_LD1_ZPZ_D(sdu_le, zss)
+DO_LD1_ZPZ_D(sdu_le, zd)
+
+DO_LD1_ZPZ_S(ss_be, zsu)
+DO_LD1_ZPZ_S(ss_be, zss)
+DO_LD1_ZPZ_D(sdu_be, zsu)
+DO_LD1_ZPZ_D(sdu_be, zss)
+DO_LD1_ZPZ_D(sdu_be, zd)
+
+DO_LD1_ZPZ_D(sds_le, zsu)
+DO_LD1_ZPZ_D(sds_le, zss)
+DO_LD1_ZPZ_D(sds_le, zd)
+
+DO_LD1_ZPZ_D(sds_be, zsu)
+DO_LD1_ZPZ_D(sds_be, zss)
+DO_LD1_ZPZ_D(sds_be, zd)
+
+DO_LD1_ZPZ_D(dd_le, zsu)
+DO_LD1_ZPZ_D(dd_le, zss)
+DO_LD1_ZPZ_D(dd_le, zd)
+
+DO_LD1_ZPZ_D(dd_be, zsu)
+DO_LD1_ZPZ_D(dd_be, zss)
+DO_LD1_ZPZ_D(dd_be, zd)
+
+#undef DO_LD1_ZPZ_S
+#undef DO_LD1_ZPZ_D
 
 /* First fault loads with a vector index.  */
 
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index acb85731f8..d4d7e9d3ae 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -5077,91 +5077,176 @@ static void do_mem_zpz(DisasContext *s, int zt, int pg, int zm, int scale,
     tcg_temp_free_i32(desc);
 }
 
-/* Indexed by [ff][xs][u][msz].  */
-static gen_helper_gvec_mem_scatter * const gather_load_fn32[2][2][2][3] = {
-    { { { gen_helper_sve_ldbss_zsu,
-          gen_helper_sve_ldhss_zsu,
-          NULL, },
-        { gen_helper_sve_ldbsu_zsu,
-          gen_helper_sve_ldhsu_zsu,
-          gen_helper_sve_ldssu_zsu, } },
-      { { gen_helper_sve_ldbss_zss,
-          gen_helper_sve_ldhss_zss,
-          NULL, },
-        { gen_helper_sve_ldbsu_zss,
-          gen_helper_sve_ldhsu_zss,
-          gen_helper_sve_ldssu_zss, } } },
+/* Indexed by [be][ff][xs][u][msz].  */
+static gen_helper_gvec_mem_scatter * const gather_load_fn32[2][2][2][2][3] = {
+    /* Little-endian */
+    { { { { gen_helper_sve_ldbss_zsu,
+            gen_helper_sve_ldhss_le_zsu,
+            NULL, },
+          { gen_helper_sve_ldbsu_zsu,
+            gen_helper_sve_ldhsu_le_zsu,
+            gen_helper_sve_ldss_le_zsu, } },
+        { { gen_helper_sve_ldbss_zss,
+            gen_helper_sve_ldhss_le_zss,
+            NULL, },
+          { gen_helper_sve_ldbsu_zss,
+            gen_helper_sve_ldhsu_le_zss,
+            gen_helper_sve_ldss_le_zss, } } },
 
-    { { { gen_helper_sve_ldffbss_zsu,
-          gen_helper_sve_ldffhss_zsu,
-          NULL, },
-        { gen_helper_sve_ldffbsu_zsu,
-          gen_helper_sve_ldffhsu_zsu,
-          gen_helper_sve_ldffssu_zsu, } },
-      { { gen_helper_sve_ldffbss_zss,
-          gen_helper_sve_ldffhss_zss,
-          NULL, },
-        { gen_helper_sve_ldffbsu_zss,
-          gen_helper_sve_ldffhsu_zss,
-          gen_helper_sve_ldffssu_zss, } } }
+      /* First-fault */
+      { { { gen_helper_sve_ldffbss_zsu,
+            gen_helper_sve_ldffhss_zsu,
+            NULL, },
+          { gen_helper_sve_ldffbsu_zsu,
+            gen_helper_sve_ldffhsu_zsu,
+            gen_helper_sve_ldffssu_zsu, } },
+        { { gen_helper_sve_ldffbss_zss,
+            gen_helper_sve_ldffhss_zss,
+            NULL, },
+          { gen_helper_sve_ldffbsu_zss,
+            gen_helper_sve_ldffhsu_zss,
+            gen_helper_sve_ldffssu_zss, } } } },
+
+    /* Big-endian */
+    { { { { gen_helper_sve_ldbss_zsu,
+            gen_helper_sve_ldhss_be_zsu,
+            NULL, },
+          { gen_helper_sve_ldbsu_zsu,
+            gen_helper_sve_ldhsu_be_zsu,
+            gen_helper_sve_ldss_be_zsu, } },
+        { { gen_helper_sve_ldbss_zss,
+            gen_helper_sve_ldhss_be_zss,
+            NULL, },
+          { gen_helper_sve_ldbsu_zss,
+            gen_helper_sve_ldhsu_be_zss,
+            gen_helper_sve_ldss_be_zss, } } },
+
+      /* First-fault */
+      { { { gen_helper_sve_ldffbss_zsu,
+            gen_helper_sve_ldffhss_zsu,
+            NULL, },
+          { gen_helper_sve_ldffbsu_zsu,
+            gen_helper_sve_ldffhsu_zsu,
+            gen_helper_sve_ldffssu_zsu, } },
+        { { gen_helper_sve_ldffbss_zss,
+            gen_helper_sve_ldffhss_zss,
+            NULL, },
+          { gen_helper_sve_ldffbsu_zss,
+            gen_helper_sve_ldffhsu_zss,
+            gen_helper_sve_ldffssu_zss, } } } },
 };
 
 /* Note that we overload xs=2 to indicate 64-bit offset.  */
-static gen_helper_gvec_mem_scatter * const gather_load_fn64[2][3][2][4] = {
-    { { { gen_helper_sve_ldbds_zsu,
-          gen_helper_sve_ldhds_zsu,
-          gen_helper_sve_ldsds_zsu,
-          NULL, },
-        { gen_helper_sve_ldbdu_zsu,
-          gen_helper_sve_ldhdu_zsu,
-          gen_helper_sve_ldsdu_zsu,
-          gen_helper_sve_ldddu_zsu, } },
-      { { gen_helper_sve_ldbds_zss,
-          gen_helper_sve_ldhds_zss,
-          gen_helper_sve_ldsds_zss,
-          NULL, },
-        { gen_helper_sve_ldbdu_zss,
-          gen_helper_sve_ldhdu_zss,
-          gen_helper_sve_ldsdu_zss,
-          gen_helper_sve_ldddu_zss, } },
-      { { gen_helper_sve_ldbds_zd,
-          gen_helper_sve_ldhds_zd,
-          gen_helper_sve_ldsds_zd,
-          NULL, },
-        { gen_helper_sve_ldbdu_zd,
-          gen_helper_sve_ldhdu_zd,
-          gen_helper_sve_ldsdu_zd,
-          gen_helper_sve_ldddu_zd, } } },
+static gen_helper_gvec_mem_scatter * const gather_load_fn64[2][2][3][2][4] = {
+    /* Little-endian */
+    { { { { gen_helper_sve_ldbds_zsu,
+            gen_helper_sve_ldhds_le_zsu,
+            gen_helper_sve_ldsds_le_zsu,
+            NULL, },
+          { gen_helper_sve_ldbdu_zsu,
+            gen_helper_sve_ldhdu_le_zsu,
+            gen_helper_sve_ldsdu_le_zsu,
+            gen_helper_sve_lddd_le_zsu, } },
+        { { gen_helper_sve_ldbds_zss,
+            gen_helper_sve_ldhds_le_zss,
+            gen_helper_sve_ldsds_le_zss,
+            NULL, },
+          { gen_helper_sve_ldbdu_zss,
+            gen_helper_sve_ldhdu_le_zss,
+            gen_helper_sve_ldsdu_le_zss,
+            gen_helper_sve_lddd_le_zss, } },
+        { { gen_helper_sve_ldbds_zd,
+            gen_helper_sve_ldhds_le_zd,
+            gen_helper_sve_ldsds_le_zd,
+            NULL, },
+          { gen_helper_sve_ldbdu_zd,
+            gen_helper_sve_ldhdu_le_zd,
+            gen_helper_sve_ldsdu_le_zd,
+            gen_helper_sve_lddd_le_zd, } } },
 
-    { { { gen_helper_sve_ldffbds_zsu,
-          gen_helper_sve_ldffhds_zsu,
-          gen_helper_sve_ldffsds_zsu,
-          NULL, },
-        { gen_helper_sve_ldffbdu_zsu,
-          gen_helper_sve_ldffhdu_zsu,
-          gen_helper_sve_ldffsdu_zsu,
-          gen_helper_sve_ldffddu_zsu, } },
-      { { gen_helper_sve_ldffbds_zss,
-          gen_helper_sve_ldffhds_zss,
-          gen_helper_sve_ldffsds_zss,
-          NULL, },
-        { gen_helper_sve_ldffbdu_zss,
-          gen_helper_sve_ldffhdu_zss,
-          gen_helper_sve_ldffsdu_zss,
-          gen_helper_sve_ldffddu_zss, } },
-      { { gen_helper_sve_ldffbds_zd,
-          gen_helper_sve_ldffhds_zd,
-          gen_helper_sve_ldffsds_zd,
-          NULL, },
-        { gen_helper_sve_ldffbdu_zd,
-          gen_helper_sve_ldffhdu_zd,
-          gen_helper_sve_ldffsdu_zd,
-          gen_helper_sve_ldffddu_zd, } } }
+      /* First-fault */
+      { { { gen_helper_sve_ldffbds_zsu,
+            gen_helper_sve_ldffhds_zsu,
+            gen_helper_sve_ldffsds_zsu,
+            NULL, },
+          { gen_helper_sve_ldffbdu_zsu,
+            gen_helper_sve_ldffhdu_zsu,
+            gen_helper_sve_ldffsdu_zsu,
+            gen_helper_sve_ldffddu_zsu, } },
+        { { gen_helper_sve_ldffbds_zss,
+            gen_helper_sve_ldffhds_zss,
+            gen_helper_sve_ldffsds_zss,
+            NULL, },
+          { gen_helper_sve_ldffbdu_zss,
+            gen_helper_sve_ldffhdu_zss,
+            gen_helper_sve_ldffsdu_zss,
+            gen_helper_sve_ldffddu_zss, } },
+        { { gen_helper_sve_ldffbds_zd,
+            gen_helper_sve_ldffhds_zd,
+            gen_helper_sve_ldffsds_zd,
+            NULL, },
+          { gen_helper_sve_ldffbdu_zd,
+            gen_helper_sve_ldffhdu_zd,
+            gen_helper_sve_ldffsdu_zd,
+            gen_helper_sve_ldffddu_zd, } } } },
+
+    /* Big-endian */
+    { { { { gen_helper_sve_ldbds_zsu,
+            gen_helper_sve_ldhds_be_zsu,
+            gen_helper_sve_ldsds_be_zsu,
+            NULL, },
+          { gen_helper_sve_ldbdu_zsu,
+            gen_helper_sve_ldhdu_be_zsu,
+            gen_helper_sve_ldsdu_be_zsu,
+            gen_helper_sve_lddd_be_zsu, } },
+        { { gen_helper_sve_ldbds_zss,
+            gen_helper_sve_ldhds_be_zss,
+            gen_helper_sve_ldsds_be_zss,
+            NULL, },
+          { gen_helper_sve_ldbdu_zss,
+            gen_helper_sve_ldhdu_be_zss,
+            gen_helper_sve_ldsdu_be_zss,
+            gen_helper_sve_lddd_be_zss, } },
+        { { gen_helper_sve_ldbds_zd,
+            gen_helper_sve_ldhds_be_zd,
+            gen_helper_sve_ldsds_be_zd,
+            NULL, },
+          { gen_helper_sve_ldbdu_zd,
+            gen_helper_sve_ldhdu_be_zd,
+            gen_helper_sve_ldsdu_be_zd,
+            gen_helper_sve_lddd_be_zd, } } },
+
+      /* First-fault */
+      { { { gen_helper_sve_ldffbds_zsu,
+            gen_helper_sve_ldffhds_zsu,
+            gen_helper_sve_ldffsds_zsu,
+            NULL, },
+          { gen_helper_sve_ldffbdu_zsu,
+            gen_helper_sve_ldffhdu_zsu,
+            gen_helper_sve_ldffsdu_zsu,
+            gen_helper_sve_ldffddu_zsu, } },
+        { { gen_helper_sve_ldffbds_zss,
+            gen_helper_sve_ldffhds_zss,
+            gen_helper_sve_ldffsds_zss,
+            NULL, },
+          { gen_helper_sve_ldffbdu_zss,
+            gen_helper_sve_ldffhdu_zss,
+            gen_helper_sve_ldffsdu_zss,
+            gen_helper_sve_ldffddu_zss, } },
+        { { gen_helper_sve_ldffbds_zd,
+            gen_helper_sve_ldffhds_zd,
+            gen_helper_sve_ldffsds_zd,
+            NULL, },
+          { gen_helper_sve_ldffbdu_zd,
+            gen_helper_sve_ldffhdu_zd,
+            gen_helper_sve_ldffsdu_zd,
+            gen_helper_sve_ldffddu_zd, } } } },
 };
 
 static bool trans_LD1_zprz(DisasContext *s, arg_LD1_zprz *a, uint32_t insn)
 {
     gen_helper_gvec_mem_scatter *fn = NULL;
+    int be = s->be_data == MO_BE;
 
     if (!sve_access_check(s)) {
         return true;
@@ -5169,10 +5254,10 @@ static bool trans_LD1_zprz(DisasContext *s, arg_LD1_zprz *a, uint32_t insn)
 
     switch (a->esz) {
     case MO_32:
-        fn = gather_load_fn32[a->ff][a->xs][a->u][a->msz];
+        fn = gather_load_fn32[be][a->ff][a->xs][a->u][a->msz];
         break;
     case MO_64:
-        fn = gather_load_fn64[a->ff][a->xs][a->u][a->msz];
+        fn = gather_load_fn64[be][a->ff][a->xs][a->u][a->msz];
         break;
     }
     assert(fn != NULL);
@@ -5185,6 +5270,7 @@ static bool trans_LD1_zprz(DisasContext *s, arg_LD1_zprz *a, uint32_t insn)
 static bool trans_LD1_zpiz(DisasContext *s, arg_LD1_zpiz *a, uint32_t insn)
 {
     gen_helper_gvec_mem_scatter *fn = NULL;
+    int be = s->be_data == MO_BE;
     TCGv_i64 imm;
 
     if (a->esz < a->msz || (a->esz == a->msz && !a->u)) {
@@ -5196,10 +5282,10 @@ static bool trans_LD1_zpiz(DisasContext *s, arg_LD1_zpiz *a, uint32_t insn)
 
     switch (a->esz) {
     case MO_32:
-        fn = gather_load_fn32[a->ff][0][a->u][a->msz];
+        fn = gather_load_fn32[be][a->ff][0][a->u][a->msz];
         break;
     case MO_64:
-        fn = gather_load_fn64[a->ff][2][a->u][a->msz];
+        fn = gather_load_fn64[be][a->ff][2][a->u][a->msz];
         break;
     }
     assert(fn != NULL);
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [Qemu-devel] [PATCH 18/20] target/arm: Rewrite vector gather stores
  2018-08-09  4:21 [Qemu-devel] [PATCH 00/20] target/arm: sve system mode patches Richard Henderson
                   ` (16 preceding siblings ...)
  2018-08-09  4:22 ` [Qemu-devel] [PATCH 17/20] target/arm: Rewrite vector gather loads Richard Henderson
@ 2018-08-09  4:22 ` Richard Henderson
  2018-08-23 16:09   ` Peter Maydell
  2018-08-09  4:22 ` [Qemu-devel] [PATCH 19/20] target/arm: Rewrite vector gather first-fault loads Richard Henderson
                   ` (4 subsequent siblings)
  22 siblings, 1 reply; 51+ messages in thread
From: Richard Henderson @ 2018-08-09  4:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: laurent.desnogues, peter.maydell, alex.bennee

This fixes the endianness problem for softmmu, and moves the
main loop out of a macro and into an inlined function.
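
For reviewers, the predicate walk inside the new sve_st1_zs() boils down
to the standalone sketch below.  This is my own illustration, not QEMU
code: it assumes the SVE layout of one predicate bit per vector byte, and
it composes the 16 predicate bits with an explicit little-endian read
where the real code uses H1_2 to handle host byte order.

    #include <stdint.h>
    #include <stdio.h>

    static void walk_pred_32(const uint8_t *pred, unsigned oprsz_bytes)
    {
        unsigned i = 0;

        while (i < oprsz_bytes) {
            /* 16 predicate bits cover 16 vector bytes, i.e. four 32-bit
             * elements; only bit 0 of each nibble controls an element.  */
            uint16_t pg = pred[i >> 3] | (pred[(i >> 3) + 1] << 8);
            do {
                if (pg & 1) {
                    printf("element at byte offset %u is active\n", i);
                }
                i += 4, pg >>= 4;        /* advance one 32-bit element */
            } while (i & 15);
        }
    }

    int main(void)
    {
        /* A 32-byte vector has 32 predicate bits (one per byte); set
         * bits 0 and 20 to activate the elements at byte offsets 0 and 20. */
        uint8_t pred[4] = { 0x01, 0x00, 0x10, 0x00 };

        walk_pred_32(pred, 32);
        return 0;
    }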

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-sve.h    |  52 ++++++++++----
 target/arm/sve_helper.c    | 139 ++++++++++++++++++++++++-------------
 target/arm/translate-sve.c |  74 +++++++++++++-------
 3 files changed, 177 insertions(+), 88 deletions(-)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index 49d1c09e30..6b9b93af45 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -1468,41 +1468,67 @@ DEF_HELPER_FLAGS_6(sve_ldffsds_zd, TCG_CALL_NO_WG,
 
 DEF_HELPER_FLAGS_6(sve_stbs_zsu, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_sths_zsu, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_sths_le_zsu, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_stss_zsu, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_sths_be_zsu, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_stss_le_zsu, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_stss_be_zsu, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
 
 DEF_HELPER_FLAGS_6(sve_stbs_zss, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_sths_zss, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_sths_le_zss, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_stss_zss, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_sths_be_zss, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_stss_le_zss, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_stss_be_zss, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
 
 DEF_HELPER_FLAGS_6(sve_stbd_zsu, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_sthd_zsu, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_sthd_le_zsu, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_stsd_zsu, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_sthd_be_zsu, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_stdd_zsu, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_stsd_le_zsu, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_stsd_be_zsu, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_stdd_le_zsu, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_stdd_be_zsu, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
 
 DEF_HELPER_FLAGS_6(sve_stbd_zss, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_sthd_zss, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_sthd_le_zss, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_stsd_zss, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_sthd_be_zss, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_stdd_zss, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_stsd_le_zss, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_stsd_be_zss, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_stdd_le_zss, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_stdd_be_zss, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
 
 DEF_HELPER_FLAGS_6(sve_stbd_zd, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_sthd_zd, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_sthd_le_zd, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_stsd_zd, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_sthd_be_zd, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_stdd_zd, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_stsd_le_zd, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_stsd_be_zd, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_stdd_le_zd, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_stdd_be_zd, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 76d3f021e4..0a4756bff9 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -5235,61 +5235,100 @@ DO_LDFF1_ZPZ_D(sve_ldffsds_zd, uint64_t, int32_t,  cpu_ldl_data_ra)
 
 /* Stores with a vector index.  */
 
-#define DO_ST1_ZPZ_S(NAME, TYPEI, FN)                                   \
-void HELPER(NAME)(CPUARMState *env, void *vd, void *vg, void *vm,       \
-                  target_ulong base, uint32_t desc)                     \
-{                                                                       \
-    intptr_t i, oprsz = simd_oprsz(desc);                               \
-    unsigned scale = simd_data(desc);                                   \
-    uintptr_t ra = GETPC();                                             \
-    for (i = 0; i < oprsz; ) {                                          \
-        uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));                 \
-        do {                                                            \
-            if (likely(pg & 1)) {                                       \
-                target_ulong off = *(TYPEI *)(vm + H1_4(i));            \
-                uint32_t d = *(uint32_t *)(vd + H1_4(i));               \
-                FN(env, base + (off << scale), d, ra);                  \
-            }                                                           \
-            i += sizeof(uint32_t), pg >>= sizeof(uint32_t);             \
-        } while (i & 15);                                               \
-    }                                                                   \
+static void sve_st1_zs(CPUARMState *env, void *vd, void *vg, void *vm,
+                       target_ulong base, uint32_t desc, uintptr_t ra,
+                       zreg_off_fn *off_fn, sve_ld1_tlb_fn *tlb_fn)
+{
+    const int mmu_idx = cpu_mmu_index(env, false);
+    intptr_t i, oprsz = simd_oprsz(desc);
+    unsigned scale = simd_data(desc);
+
+    set_helper_retaddr(ra);
+    for (i = 0; i < oprsz; ) {
+        uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));
+        do {
+            if (pg & 1) {
+                target_ulong off = off_fn(vm, i);
+                tlb_fn(env, vd, i, base + (off << scale), mmu_idx, ra);
+            }
+            i += 4, pg >>= 4;
+        } while (i & 15);
+    }
+    set_helper_retaddr(0);
 }
 
-#define DO_ST1_ZPZ_D(NAME, TYPEI, FN)                                   \
-void HELPER(NAME)(CPUARMState *env, void *vd, void *vg, void *vm,       \
-                  target_ulong base, uint32_t desc)                     \
-{                                                                       \
-    intptr_t i, oprsz = simd_oprsz(desc) / 8;                           \
-    unsigned scale = simd_data(desc);                                   \
-    uintptr_t ra = GETPC();                                             \
-    uint64_t *d = vd, *m = vm; uint8_t *pg = vg;                        \
-    for (i = 0; i < oprsz; i++) {                                       \
-        if (likely(pg[H1(i)] & 1)) {                                    \
-            target_ulong off = (target_ulong)(TYPEI)m[i] << scale;      \
-            FN(env, base + off, d[i], ra);                              \
-        }                                                               \
-    }                                                                   \
+static void sve_st1_zd(CPUARMState *env, void *vd, void *vg, void *vm,
+                       target_ulong base, uint32_t desc, uintptr_t ra,
+                       zreg_off_fn *off_fn, sve_ld1_tlb_fn *tlb_fn)
+{
+    const int mmu_idx = cpu_mmu_index(env, false);
+    intptr_t i, oprsz = simd_oprsz(desc) / 8;
+    unsigned scale = simd_data(desc);
+
+    set_helper_retaddr(ra);
+    for (i = 0; i < oprsz; i++) {
+        uint8_t pg = *(uint8_t *)(vg + H1(i));
+        if (pg & 1) {
+            target_ulong off = off_fn(vm, i * 8);
+            tlb_fn(env, vd, i * 8, base + (off << scale), mmu_idx, ra);
+        }
+    }
+    set_helper_retaddr(0);
 }
 
-DO_ST1_ZPZ_S(sve_stbs_zsu, uint32_t, cpu_stb_data_ra)
-DO_ST1_ZPZ_S(sve_sths_zsu, uint32_t, cpu_stw_data_ra)
-DO_ST1_ZPZ_S(sve_stss_zsu, uint32_t, cpu_stl_data_ra)
+#define DO_ST1_ZPZ_S(MEM, OFS) \
+void __attribute__((flatten)) HELPER(sve_st##MEM##_##OFS)    \
+    (CPUARMState *env, void *vd, void *vg, void *vm,         \
+     target_ulong base, uint32_t desc)                       \
+{                                                            \
+    sve_st1_zs(env, vd, vg, vm, base, desc, GETPC(),         \
+              off_##OFS##_s, sve_st1##MEM##_tlb);            \
+}
 
-DO_ST1_ZPZ_S(sve_stbs_zss, int32_t, cpu_stb_data_ra)
-DO_ST1_ZPZ_S(sve_sths_zss, int32_t, cpu_stw_data_ra)
-DO_ST1_ZPZ_S(sve_stss_zss, int32_t, cpu_stl_data_ra)
+#define DO_ST1_ZPZ_D(MEM, OFS) \
+void __attribute__((flatten)) HELPER(sve_st##MEM##_##OFS)    \
+    (CPUARMState *env, void *vd, void *vg, void *vm,         \
+     target_ulong base, uint32_t desc)                       \
+{                                                            \
+    sve_st1_zd(env, vd, vg, vm, base, desc, GETPC(),         \
+               off_##OFS##_d, sve_st1##MEM##_tlb);           \
+}
 
-DO_ST1_ZPZ_D(sve_stbd_zsu, uint32_t, cpu_stb_data_ra)
-DO_ST1_ZPZ_D(sve_sthd_zsu, uint32_t, cpu_stw_data_ra)
-DO_ST1_ZPZ_D(sve_stsd_zsu, uint32_t, cpu_stl_data_ra)
-DO_ST1_ZPZ_D(sve_stdd_zsu, uint32_t, cpu_stq_data_ra)
+DO_ST1_ZPZ_S(bs, zsu)
+DO_ST1_ZPZ_S(hs_le, zsu)
+DO_ST1_ZPZ_S(hs_be, zsu)
+DO_ST1_ZPZ_S(ss_le, zsu)
+DO_ST1_ZPZ_S(ss_be, zsu)
 
-DO_ST1_ZPZ_D(sve_stbd_zss, int32_t, cpu_stb_data_ra)
-DO_ST1_ZPZ_D(sve_sthd_zss, int32_t, cpu_stw_data_ra)
-DO_ST1_ZPZ_D(sve_stsd_zss, int32_t, cpu_stl_data_ra)
-DO_ST1_ZPZ_D(sve_stdd_zss, int32_t, cpu_stq_data_ra)
+DO_ST1_ZPZ_S(bs, zss)
+DO_ST1_ZPZ_S(hs_le, zss)
+DO_ST1_ZPZ_S(hs_be, zss)
+DO_ST1_ZPZ_S(ss_le, zss)
+DO_ST1_ZPZ_S(ss_be, zss)
 
-DO_ST1_ZPZ_D(sve_stbd_zd, uint64_t, cpu_stb_data_ra)
-DO_ST1_ZPZ_D(sve_sthd_zd, uint64_t, cpu_stw_data_ra)
-DO_ST1_ZPZ_D(sve_stsd_zd, uint64_t, cpu_stl_data_ra)
-DO_ST1_ZPZ_D(sve_stdd_zd, uint64_t, cpu_stq_data_ra)
+DO_ST1_ZPZ_D(bd, zsu)
+DO_ST1_ZPZ_D(hd_le, zsu)
+DO_ST1_ZPZ_D(hd_be, zsu)
+DO_ST1_ZPZ_D(sd_le, zsu)
+DO_ST1_ZPZ_D(sd_be, zsu)
+DO_ST1_ZPZ_D(dd_le, zsu)
+DO_ST1_ZPZ_D(dd_be, zsu)
+
+DO_ST1_ZPZ_D(bd, zss)
+DO_ST1_ZPZ_D(hd_le, zss)
+DO_ST1_ZPZ_D(hd_be, zss)
+DO_ST1_ZPZ_D(sd_le, zss)
+DO_ST1_ZPZ_D(sd_be, zss)
+DO_ST1_ZPZ_D(dd_le, zss)
+DO_ST1_ZPZ_D(dd_be, zss)
+
+DO_ST1_ZPZ_D(bd, zd)
+DO_ST1_ZPZ_D(hd_le, zd)
+DO_ST1_ZPZ_D(hd_be, zd)
+DO_ST1_ZPZ_D(sd_le, zd)
+DO_ST1_ZPZ_D(sd_be, zd)
+DO_ST1_ZPZ_D(dd_le, zd)
+DO_ST1_ZPZ_D(dd_be, zd)
+
+#undef DO_ST1_ZPZ_S
+#undef DO_ST1_ZPZ_D
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index d4d7e9d3ae..fdd9b9b3a0 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -5299,35 +5299,58 @@ static bool trans_LD1_zpiz(DisasContext *s, arg_LD1_zpiz *a, uint32_t insn)
     return true;
 }
 
-/* Indexed by [xs][msz].  */
-static gen_helper_gvec_mem_scatter * const scatter_store_fn32[2][3] = {
-    { gen_helper_sve_stbs_zsu,
-      gen_helper_sve_sths_zsu,
-      gen_helper_sve_stss_zsu, },
-    { gen_helper_sve_stbs_zss,
-      gen_helper_sve_sths_zss,
-      gen_helper_sve_stss_zss, },
+/* Indexed by [be][xs][msz].  */
+static gen_helper_gvec_mem_scatter * const scatter_store_fn32[2][2][3] = {
+    /* Little-endian */
+    { { gen_helper_sve_stbs_zsu,
+        gen_helper_sve_sths_le_zsu,
+        gen_helper_sve_stss_le_zsu, },
+      { gen_helper_sve_stbs_zss,
+        gen_helper_sve_sths_le_zss,
+        gen_helper_sve_stss_le_zss, } },
+    /* Big-endian */
+    { { gen_helper_sve_stbs_zsu,
+        gen_helper_sve_sths_be_zsu,
+        gen_helper_sve_stss_be_zsu, },
+      { gen_helper_sve_stbs_zss,
+        gen_helper_sve_sths_be_zss,
+        gen_helper_sve_stss_be_zss, } },
 };
 
 /* Note that we overload xs=2 to indicate 64-bit offset.  */
-static gen_helper_gvec_mem_scatter * const scatter_store_fn64[3][4] = {
-    { gen_helper_sve_stbd_zsu,
-      gen_helper_sve_sthd_zsu,
-      gen_helper_sve_stsd_zsu,
-      gen_helper_sve_stdd_zsu, },
-    { gen_helper_sve_stbd_zss,
-      gen_helper_sve_sthd_zss,
-      gen_helper_sve_stsd_zss,
-      gen_helper_sve_stdd_zss, },
-    { gen_helper_sve_stbd_zd,
-      gen_helper_sve_sthd_zd,
-      gen_helper_sve_stsd_zd,
-      gen_helper_sve_stdd_zd, },
+static gen_helper_gvec_mem_scatter * const scatter_store_fn64[2][3][4] = {
+    /* Little-endian */
+    { { gen_helper_sve_stbd_zsu,
+        gen_helper_sve_sthd_le_zsu,
+        gen_helper_sve_stsd_le_zsu,
+        gen_helper_sve_stdd_le_zsu, },
+      { gen_helper_sve_stbd_zss,
+        gen_helper_sve_sthd_le_zss,
+        gen_helper_sve_stsd_le_zss,
+        gen_helper_sve_stdd_le_zss, },
+      { gen_helper_sve_stbd_zd,
+        gen_helper_sve_sthd_le_zd,
+        gen_helper_sve_stsd_le_zd,
+        gen_helper_sve_stdd_le_zd, } },
+    /* Big-endian */
+    { { gen_helper_sve_stbd_zsu,
+        gen_helper_sve_sthd_be_zsu,
+        gen_helper_sve_stsd_be_zsu,
+        gen_helper_sve_stdd_be_zsu, },
+      { gen_helper_sve_stbd_zss,
+        gen_helper_sve_sthd_be_zss,
+        gen_helper_sve_stsd_be_zss,
+        gen_helper_sve_stdd_be_zss, },
+      { gen_helper_sve_stbd_zd,
+        gen_helper_sve_sthd_be_zd,
+        gen_helper_sve_stsd_be_zd,
+        gen_helper_sve_stdd_be_zd, } },
 };
 
 static bool trans_ST1_zprz(DisasContext *s, arg_ST1_zprz *a, uint32_t insn)
 {
     gen_helper_gvec_mem_scatter *fn;
+    int be = s->be_data == MO_BE;
 
     if (a->esz < a->msz || (a->msz == 0 && a->scale)) {
         return false;
@@ -5337,10 +5360,10 @@ static bool trans_ST1_zprz(DisasContext *s, arg_ST1_zprz *a, uint32_t insn)
     }
     switch (a->esz) {
     case MO_32:
-        fn = scatter_store_fn32[a->xs][a->msz];
+        fn = scatter_store_fn32[be][a->xs][a->msz];
         break;
     case MO_64:
-        fn = scatter_store_fn64[a->xs][a->msz];
+        fn = scatter_store_fn64[be][a->xs][a->msz];
         break;
     default:
         g_assert_not_reached();
@@ -5353,6 +5376,7 @@ static bool trans_ST1_zprz(DisasContext *s, arg_ST1_zprz *a, uint32_t insn)
 static bool trans_ST1_zpiz(DisasContext *s, arg_ST1_zpiz *a, uint32_t insn)
 {
     gen_helper_gvec_mem_scatter *fn = NULL;
+    int be = s->be_data == MO_BE;
     TCGv_i64 imm;
 
     if (a->esz < a->msz) {
@@ -5364,10 +5388,10 @@ static bool trans_ST1_zpiz(DisasContext *s, arg_ST1_zpiz *a, uint32_t insn)
 
     switch (a->esz) {
     case MO_32:
-        fn = scatter_store_fn32[0][a->msz];
+        fn = scatter_store_fn32[be][0][a->msz];
         break;
     case MO_64:
-        fn = scatter_store_fn64[2][a->msz];
+        fn = scatter_store_fn64[be][2][a->msz];
         break;
     }
     assert(fn != NULL);
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [Qemu-devel] [PATCH 19/20] target/arm: Rewrite vector gather first-fault loads
  2018-08-09  4:21 [Qemu-devel] [PATCH 00/20] target/arm: sve system mode patches Richard Henderson
                   ` (17 preceding siblings ...)
  2018-08-09  4:22 ` [Qemu-devel] [PATCH 18/20] target/arm: Rewrite vector gather stores Richard Henderson
@ 2018-08-09  4:22 ` Richard Henderson
  2018-08-23 16:10   ` Peter Maydell
  2018-08-09  4:22 ` [Qemu-devel] [PATCH 20/20] target/arm: Pass TCGMemOpIdx to sve memory helpers Richard Henderson
                   ` (3 subsequent siblings)
  22 siblings, 1 reply; 51+ messages in thread
From: Richard Henderson @ 2018-08-09  4:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: laurent.desnogues, peter.maydell, alex.bennee

This implements the feature for softmmu, and moves the
main loop out of a macro and into a function.
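
The first-fault contract the new sve_ldff1_zs/_zd helpers follow can be
summarised with the standalone sketch below.  Again this is my own
illustration, not QEMU code: load_nofault() is a made-up stand-in for the
tlb_vaddr_to_host / page_check_range probes, and the FFR bookkeeping is
reduced to a printf; only the first active element is allowed to fault.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Made-up probe: returns false instead of faulting on a "bad" address. */
    static bool load_nofault(uint64_t addr, unsigned *out)
    {
        if (addr >= 0x1000) {           /* pretend everything past 4K is unmapped */
            return false;
        }
        *out = (unsigned)addr;          /* stand-in for the real memory read */
        return true;
    }

    static void ldff_gather(unsigned *dst, const uint64_t *addrs,
                            const bool *active, int nelem)
    {
        int i, first = -1;

        /* Find the first active element; leading elements are zeroed. */
        for (i = 0; i < nelem; i++) {
            dst[i] = 0;
            if (active[i]) {
                first = i;
                break;
            }
        }
        if (first < 0) {
            return;
        }
        /* First active element: a normal load, allowed to fault for real. */
        if (!load_nofault(addrs[first], &dst[first])) {
            printf("element %d would raise a real exception\n", first);
            return;
        }
        /* Remaining elements: non-faulting probes only. */
        for (i = first + 1; i < nelem; i++) {
            if (!active[i]) {
                dst[i] = 0;
                continue;
            }
            if (!load_nofault(addrs[i], &dst[i])) {
                printf("first-fault stop at element %d\n", i);  /* ~ record_fault() */
                return;
            }
        }
    }

    int main(void)
    {
        uint64_t addrs[4] = { 0x10, 0x20, 0x2000, 0x30 };
        bool active[4] = { true, true, true, true };
        unsigned dst[4] = { 0 };

        ldff_gather(dst, addrs, active, 4);
        printf("dst = { %u, %u, %u, %u }\n", dst[0], dst[1], dst[2], dst[3]);
        return 0;
    }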

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/helper-sve.h    |  84 ++++++++---
 target/arm/sve_helper.c    | 290 +++++++++++++++++++++++++++----------
 target/arm/translate-sve.c |  84 +++++------
 3 files changed, 321 insertions(+), 137 deletions(-)

diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index 6b9b93af45..9e79182ab4 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -1401,69 +1401,111 @@ DEF_HELPER_FLAGS_6(sve_ldsds_be_zd, TCG_CALL_NO_WG,
 
 DEF_HELPER_FLAGS_6(sve_ldffbsu_zsu, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldffhsu_zsu, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldffhsu_le_zsu, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldffssu_zsu, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldffhsu_be_zsu, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldffss_le_zsu, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldffss_be_zsu, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
 DEF_HELPER_FLAGS_6(sve_ldffbss_zsu, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldffhss_zsu, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldffhss_le_zsu, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldffhss_be_zsu, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
 
 DEF_HELPER_FLAGS_6(sve_ldffbsu_zss, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldffhsu_zss, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldffhsu_le_zss, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldffssu_zss, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldffhsu_be_zss, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldffss_le_zss, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldffss_be_zss, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
 DEF_HELPER_FLAGS_6(sve_ldffbss_zss, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldffhss_zss, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldffhss_le_zss, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldffhss_be_zss, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
 
 DEF_HELPER_FLAGS_6(sve_ldffbdu_zsu, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldffhdu_zsu, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldffhdu_le_zsu, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldffsdu_zsu, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldffhdu_be_zsu, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldffddu_zsu, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldffsdu_le_zsu, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldffsdu_be_zsu, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldffdd_le_zsu, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldffdd_be_zsu, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
 DEF_HELPER_FLAGS_6(sve_ldffbds_zsu, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldffhds_zsu, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldffhds_le_zsu, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldffsds_zsu, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldffhds_be_zsu, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldffsds_le_zsu, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldffsds_be_zsu, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
 
 DEF_HELPER_FLAGS_6(sve_ldffbdu_zss, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldffhdu_zss, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldffhdu_le_zss, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldffsdu_zss, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldffhdu_be_zss, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldffddu_zss, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldffsdu_le_zss, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldffsdu_be_zss, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldffdd_le_zss, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldffdd_be_zss, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
 DEF_HELPER_FLAGS_6(sve_ldffbds_zss, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldffhds_zss, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldffhds_le_zss, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldffsds_zss, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldffhds_be_zss, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldffsds_le_zss, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldffsds_be_zss, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
 
 DEF_HELPER_FLAGS_6(sve_ldffbdu_zd, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldffhdu_zd, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldffhdu_le_zd, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldffsdu_zd, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldffhdu_be_zd, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldffddu_zd, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldffsdu_le_zd, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldffsdu_be_zd, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldffdd_le_zd, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldffdd_be_zd, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
 DEF_HELPER_FLAGS_6(sve_ldffbds_zd, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldffhds_zd, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldffhds_le_zd, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldffsds_zd, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldffhds_be_zd, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldffsds_le_zd, TCG_CALL_NO_WG,
+                   void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldffsds_be_zd, TCG_CALL_NO_WG,
                    void, env, ptr, ptr, ptr, tl, i32)
 
 DEF_HELPER_FLAGS_6(sve_stbs_zsu, TCG_CALL_NO_WG,
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 0a4756bff9..6728862326 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -5147,91 +5147,233 @@ DO_LD1_ZPZ_D(dd_be, zd)
 
 /* First fault loads with a vector index.  */
 
-#ifdef CONFIG_USER_ONLY
+/* Load one element into VD+REG_OFF from (ENV,VADDR) without faulting.
+ * The controlling predicate is known to be true.  Return true if the
+ * load was successful.
+ */
+typedef bool sve_ld1_nf_fn(CPUARMState *env, void *vd, intptr_t reg_off,
+                           target_ulong vaddr, int mmu_idx);
 
-#define DO_LDFF1_ZPZ(NAME, TYPEE, TYPEI, TYPEM, FN, H)                  \
-void HELPER(NAME)(CPUARMState *env, void *vd, void *vg, void *vm,       \
-                  target_ulong base, uint32_t desc)                     \
-{                                                                       \
-    intptr_t i, oprsz = simd_oprsz(desc);                               \
-    unsigned scale = simd_data(desc);                                   \
-    uintptr_t ra = GETPC();                                             \
-    bool first = true;                                                  \
-    mmap_lock();                                                        \
-    for (i = 0; i < oprsz; ) {                                          \
-        uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));                 \
-        do {                                                            \
-            TYPEM m = 0;                                                \
-            if (pg & 1) {                                               \
-                target_ulong off = *(TYPEI *)(vm + H(i));               \
-                target_ulong addr = base + (off << scale);              \
-                if (!first &&                                           \
-                    page_check_range(addr, sizeof(TYPEM), PAGE_READ)) { \
-                    record_fault(env, i, oprsz);                        \
-                    goto exit;                                          \
-                }                                                       \
-                m = FN(env, addr, ra);                                  \
-                first = false;                                          \
-            }                                                           \
-            *(TYPEE *)(vd + H(i)) = m;                                  \
-            i += sizeof(TYPEE), pg >>= sizeof(TYPEE);                   \
-        } while (i & 15);                                               \
-    }                                                                   \
- exit:                                                                  \
-    mmap_unlock();                                                      \
+#ifdef CONFIG_SOFTMMU
+#define DO_LD_NF(NAME, H, TYPEE, TYPEM, HOST) \
+static bool sve_ld##NAME##_nf(CPUARMState *env, void *vd, intptr_t reg_off, \
+                            target_ulong addr, int mmu_idx)                 \
+{                                                                           \
+    target_ulong next_page = -(addr | TARGET_PAGE_MASK);                    \
+    if (likely(next_page - addr >= sizeof(TYPEM))) {                        \
+        void *host = tlb_vaddr_to_host(env, addr, MMU_DATA_LOAD, mmu_idx);  \
+        if (likely(host)) {                                                 \
+            TYPEM val = HOST(host);                                         \
+            *(TYPEE *)(vd + H(reg_off)) = val;                              \
+            return true;                                                    \
+        }                                                                   \
+    }                                                                       \
+    return false;                                                           \
 }
-
 #else
-
-#define DO_LDFF1_ZPZ(NAME, TYPEE, TYPEI, TYPEM, FN, H)                  \
-void HELPER(NAME)(CPUARMState *env, void *vd, void *vg, void *vm,       \
-                  target_ulong base, uint32_t desc)                     \
-{                                                                       \
-    g_assert_not_reached();                                             \
+#define DO_LD_NF(NAME, H, TYPEE, TYPEM, HOST) \
+static bool sve_ld##NAME##_nf(CPUARMState *env, void *vd, intptr_t reg_off, \
+                            target_ulong addr, int mmu_idx)                 \
+{                                                                           \
+    if (likely(page_check_range(addr, sizeof(TYPEM), PAGE_READ))) {         \
+        TYPEM val = HOST(g2h(addr));                                        \
+        *(TYPEE *)(vd + H(reg_off)) = val;                                  \
+        return true;                                                        \
+    }                                                                       \
+    return false;                                                           \
 }
-
 #endif
 
-#define DO_LDFF1_ZPZ_S(NAME, TYPEI, TYPEM, FN) \
-    DO_LDFF1_ZPZ(NAME, uint32_t, TYPEI, TYPEM, FN, H1_4)
-#define DO_LDFF1_ZPZ_D(NAME, TYPEI, TYPEM, FN) \
-    DO_LDFF1_ZPZ(NAME, uint64_t, TYPEI, TYPEM, FN, )
+DO_LD_NF(bsu, H1_4, uint32_t, uint8_t, ldub_p)
+DO_LD_NF(bss, H1_4, uint32_t,  int8_t, ldsb_p)
+DO_LD_NF(bdu,     , uint64_t, uint8_t, ldub_p)
+DO_LD_NF(bds,     , uint64_t,  int8_t, ldsb_p)
 
-DO_LDFF1_ZPZ_S(sve_ldffbsu_zsu, uint32_t, uint8_t,  cpu_ldub_data_ra)
-DO_LDFF1_ZPZ_S(sve_ldffhsu_zsu, uint32_t, uint16_t, cpu_lduw_data_ra)
-DO_LDFF1_ZPZ_S(sve_ldffssu_zsu, uint32_t, uint32_t, cpu_ldl_data_ra)
-DO_LDFF1_ZPZ_S(sve_ldffbss_zsu, uint32_t, int8_t,   cpu_ldub_data_ra)
-DO_LDFF1_ZPZ_S(sve_ldffhss_zsu, uint32_t, int16_t,  cpu_lduw_data_ra)
+DO_LD_NF(hsu_le, H1_4, uint32_t, uint16_t, lduw_le_p)
+DO_LD_NF(hss_le, H1_4, uint32_t,  int16_t, ldsw_le_p)
+DO_LD_NF(hsu_be, H1_4, uint32_t, uint16_t, lduw_be_p)
+DO_LD_NF(hss_be, H1_4, uint32_t,  int16_t, ldsw_be_p)
+DO_LD_NF(hdu_le,     , uint64_t, uint16_t, lduw_le_p)
+DO_LD_NF(hds_le,     , uint64_t,  int16_t, ldsw_le_p)
+DO_LD_NF(hdu_be,     , uint64_t, uint16_t, lduw_be_p)
+DO_LD_NF(hds_be,     , uint64_t,  int16_t, ldsw_be_p)
 
-DO_LDFF1_ZPZ_S(sve_ldffbsu_zss, int32_t, uint8_t,  cpu_ldub_data_ra)
-DO_LDFF1_ZPZ_S(sve_ldffhsu_zss, int32_t, uint16_t, cpu_lduw_data_ra)
-DO_LDFF1_ZPZ_S(sve_ldffssu_zss, int32_t, uint32_t, cpu_ldl_data_ra)
-DO_LDFF1_ZPZ_S(sve_ldffbss_zss, int32_t, int8_t,   cpu_ldub_data_ra)
-DO_LDFF1_ZPZ_S(sve_ldffhss_zss, int32_t, int16_t,  cpu_lduw_data_ra)
+DO_LD_NF(ss_le,  H1_4, uint32_t, uint32_t, ldl_le_p)
+DO_LD_NF(ss_be,  H1_4, uint32_t, uint32_t, ldl_be_p)
+DO_LD_NF(sdu_le,     , uint64_t, uint32_t, ldl_le_p)
+DO_LD_NF(sds_le,     , uint64_t,  int32_t, ldl_le_p)
+DO_LD_NF(sdu_be,     , uint64_t, uint32_t, ldl_be_p)
+DO_LD_NF(sds_be,     , uint64_t,  int32_t, ldl_be_p)
 
-DO_LDFF1_ZPZ_D(sve_ldffbdu_zsu, uint32_t, uint8_t,  cpu_ldub_data_ra)
-DO_LDFF1_ZPZ_D(sve_ldffhdu_zsu, uint32_t, uint16_t, cpu_lduw_data_ra)
-DO_LDFF1_ZPZ_D(sve_ldffsdu_zsu, uint32_t, uint32_t, cpu_ldl_data_ra)
-DO_LDFF1_ZPZ_D(sve_ldffddu_zsu, uint32_t, uint64_t, cpu_ldq_data_ra)
-DO_LDFF1_ZPZ_D(sve_ldffbds_zsu, uint32_t, int8_t,   cpu_ldub_data_ra)
-DO_LDFF1_ZPZ_D(sve_ldffhds_zsu, uint32_t, int16_t,  cpu_lduw_data_ra)
-DO_LDFF1_ZPZ_D(sve_ldffsds_zsu, uint32_t, int32_t,  cpu_ldl_data_ra)
+DO_LD_NF(dd_le,      , uint64_t, uint64_t, ldq_le_p)
+DO_LD_NF(dd_be,      , uint64_t, uint64_t, ldq_be_p)
 
-DO_LDFF1_ZPZ_D(sve_ldffbdu_zss, int32_t, uint8_t,  cpu_ldub_data_ra)
-DO_LDFF1_ZPZ_D(sve_ldffhdu_zss, int32_t, uint16_t, cpu_lduw_data_ra)
-DO_LDFF1_ZPZ_D(sve_ldffsdu_zss, int32_t, uint32_t, cpu_ldl_data_ra)
-DO_LDFF1_ZPZ_D(sve_ldffddu_zss, int32_t, uint64_t, cpu_ldq_data_ra)
-DO_LDFF1_ZPZ_D(sve_ldffbds_zss, int32_t, int8_t,   cpu_ldub_data_ra)
-DO_LDFF1_ZPZ_D(sve_ldffhds_zss, int32_t, int16_t,  cpu_lduw_data_ra)
-DO_LDFF1_ZPZ_D(sve_ldffsds_zss, int32_t, int32_t,  cpu_ldl_data_ra)
+/*
+ * Common helper for all gather first-faulting loads.
+ */
+static inline void sve_ldff1_zs(CPUARMState *env, void *vd, void *vg, void *vm,
+                                target_ulong base, uint32_t desc, uintptr_t ra,
+                                zreg_off_fn *off_fn, sve_ld1_tlb_fn *tlb_fn,
+                                sve_ld1_nf_fn *nonfault_fn)
+{
+    const int mmu_idx = cpu_mmu_index(env, false);
+    intptr_t reg_off, reg_max = simd_oprsz(desc);
+    unsigned scale = simd_data(desc);
+    target_ulong addr;
 
-DO_LDFF1_ZPZ_D(sve_ldffbdu_zd, uint64_t, uint8_t,  cpu_ldub_data_ra)
-DO_LDFF1_ZPZ_D(sve_ldffhdu_zd, uint64_t, uint16_t, cpu_lduw_data_ra)
-DO_LDFF1_ZPZ_D(sve_ldffsdu_zd, uint64_t, uint32_t, cpu_ldl_data_ra)
-DO_LDFF1_ZPZ_D(sve_ldffddu_zd, uint64_t, uint64_t, cpu_ldq_data_ra)
-DO_LDFF1_ZPZ_D(sve_ldffbds_zd, uint64_t, int8_t,   cpu_ldub_data_ra)
-DO_LDFF1_ZPZ_D(sve_ldffhds_zd, uint64_t, int16_t,  cpu_lduw_data_ra)
-DO_LDFF1_ZPZ_D(sve_ldffsds_zd, uint64_t, int32_t,  cpu_ldl_data_ra)
+    /* Skip to the first true predicate.  */
+    reg_off = find_next_active(vg, 0, reg_max, MO_32);
+    if (likely(reg_off < reg_max)) {
+        /* Perform one normal read, which will fault or not.  */
+        set_helper_retaddr(ra);
+        addr = off_fn(vm, reg_off);
+        addr = base + (addr << scale);
+        tlb_fn(env, vd, reg_off, addr, mmu_idx, ra);
+
+        /* The rest of the reads will be non-faulting.  */
+        set_helper_retaddr(0);
+    }
+
+    /* After any fault, zero the leading predicated false elements.  */
+    swap_memzero(vd, reg_off);
+
+    while (likely((reg_off += 4) < reg_max)) {
+        uint64_t pg = *(uint64_t *)(vg + (reg_off >> 6) * 8);
+        if (likely((pg >> (reg_off & 63)) & 1)) {
+            addr = off_fn(vm, reg_off);
+            addr = base + (addr << scale);
+            if (!nonfault_fn(env, vd, reg_off, addr, mmu_idx)) {
+                record_fault(env, reg_off, reg_max);
+                break;
+            }
+        } else {
+            *(uint32_t *)(vd + H1_4(reg_off)) = 0;
+        }
+    }
+}
+
+static inline void sve_ldff1_zd(CPUARMState *env, void *vd, void *vg, void *vm,
+                                target_ulong base, uint32_t desc, uintptr_t ra,
+                                zreg_off_fn *off_fn, sve_ld1_tlb_fn *tlb_fn,
+                                sve_ld1_nf_fn *nonfault_fn)
+{
+    const int mmu_idx = cpu_mmu_index(env, false);
+    intptr_t reg_off, reg_max = simd_oprsz(desc);
+    unsigned scale = simd_data(desc);
+    target_ulong addr;
+
+    /* Skip to the first true predicate.  */
+    reg_off = find_next_active(vg, 0, reg_max, MO_64);
+    if (likely(reg_off < reg_max)) {
+        /* Perform one normal read, which will fault or not.  */
+        set_helper_retaddr(ra);
+        addr = off_fn(vm, reg_off);
+        addr = base + (addr << scale);
+        tlb_fn(env, vd, reg_off, addr, mmu_idx, ra);
+
+        /* The rest of the reads will be non-faulting.  */
+        set_helper_retaddr(0);
+    }
+
+    /* After any fault, zero the leading predicated false elements.  */
+    swap_memzero(vd, reg_off);
+
+    while (likely((reg_off += 8) < reg_max)) {
+        uint8_t pg = *(uint8_t *)(vg + H1(reg_off >> 3));
+        if (likely(pg & 1)) {
+            addr = off_fn(vm, reg_off);
+            addr = base + (addr << scale);
+            if (!nonfault_fn(env, vd, reg_off, addr, mmu_idx)) {
+                record_fault(env, reg_off, reg_max);
+                break;
+            }
+        } else {
+            *(uint64_t *)(vd + reg_off) = 0;
+        }
+    }
+}
+
+#define DO_LDFF1_ZPZ_S(MEM, OFS) \
+void HELPER(sve_ldff##MEM##_##OFS)                                      \
+    (CPUARMState *env, void *vd, void *vg, void *vm,                    \
+     target_ulong base, uint32_t desc)                                  \
+{                                                                       \
+    sve_ldff1_zs(env, vd, vg, vm, base, desc, GETPC(),                  \
+                 off_##OFS##_s, sve_ld1##MEM##_tlb, sve_ld##MEM##_nf);  \
+}
+
+#define DO_LDFF1_ZPZ_D(MEM, OFS) \
+void HELPER(sve_ldff##MEM##_##OFS)                                      \
+    (CPUARMState *env, void *vd, void *vg, void *vm,                    \
+     target_ulong base, uint32_t desc)                                  \
+{                                                                       \
+    sve_ldff1_zd(env, vd, vg, vm, base, desc, GETPC(),                  \
+                 off_##OFS##_d, sve_ld1##MEM##_tlb, sve_ld##MEM##_nf);  \
+}
+
+DO_LDFF1_ZPZ_S(bsu, zsu)
+DO_LDFF1_ZPZ_S(bsu, zss)
+DO_LDFF1_ZPZ_D(bdu, zsu)
+DO_LDFF1_ZPZ_D(bdu, zss)
+DO_LDFF1_ZPZ_D(bdu, zd)
+
+DO_LDFF1_ZPZ_S(bss, zsu)
+DO_LDFF1_ZPZ_S(bss, zss)
+DO_LDFF1_ZPZ_D(bds, zsu)
+DO_LDFF1_ZPZ_D(bds, zss)
+DO_LDFF1_ZPZ_D(bds, zd)
+
+DO_LDFF1_ZPZ_S(hsu_le, zsu)
+DO_LDFF1_ZPZ_S(hsu_le, zss)
+DO_LDFF1_ZPZ_D(hdu_le, zsu)
+DO_LDFF1_ZPZ_D(hdu_le, zss)
+DO_LDFF1_ZPZ_D(hdu_le, zd)
+
+DO_LDFF1_ZPZ_S(hsu_be, zsu)
+DO_LDFF1_ZPZ_S(hsu_be, zss)
+DO_LDFF1_ZPZ_D(hdu_be, zsu)
+DO_LDFF1_ZPZ_D(hdu_be, zss)
+DO_LDFF1_ZPZ_D(hdu_be, zd)
+
+DO_LDFF1_ZPZ_S(hss_le, zsu)
+DO_LDFF1_ZPZ_S(hss_le, zss)
+DO_LDFF1_ZPZ_D(hds_le, zsu)
+DO_LDFF1_ZPZ_D(hds_le, zss)
+DO_LDFF1_ZPZ_D(hds_le, zd)
+
+DO_LDFF1_ZPZ_S(hss_be, zsu)
+DO_LDFF1_ZPZ_S(hss_be, zss)
+DO_LDFF1_ZPZ_D(hds_be, zsu)
+DO_LDFF1_ZPZ_D(hds_be, zss)
+DO_LDFF1_ZPZ_D(hds_be, zd)
+
+DO_LDFF1_ZPZ_S(ss_le,  zsu)
+DO_LDFF1_ZPZ_S(ss_le,  zss)
+DO_LDFF1_ZPZ_D(sdu_le, zsu)
+DO_LDFF1_ZPZ_D(sdu_le, zss)
+DO_LDFF1_ZPZ_D(sdu_le, zd)
+
+DO_LDFF1_ZPZ_S(ss_be,  zsu)
+DO_LDFF1_ZPZ_S(ss_be,  zss)
+DO_LDFF1_ZPZ_D(sdu_be, zsu)
+DO_LDFF1_ZPZ_D(sdu_be, zss)
+DO_LDFF1_ZPZ_D(sdu_be, zd)
+
+DO_LDFF1_ZPZ_D(sds_le, zsu)
+DO_LDFF1_ZPZ_D(sds_le, zss)
+DO_LDFF1_ZPZ_D(sds_le, zd)
+
+DO_LDFF1_ZPZ_D(sds_be, zsu)
+DO_LDFF1_ZPZ_D(sds_be, zss)
+DO_LDFF1_ZPZ_D(sds_be, zd)
+
+DO_LDFF1_ZPZ_D(dd_le, zsu)
+DO_LDFF1_ZPZ_D(dd_le, zss)
+DO_LDFF1_ZPZ_D(dd_le, zd)
+
+DO_LDFF1_ZPZ_D(dd_be, zsu)
+DO_LDFF1_ZPZ_D(dd_be, zss)
+DO_LDFF1_ZPZ_D(dd_be, zd)
 
 /* Stores with a vector index.  */
 
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index fdd9b9b3a0..20492e9b8b 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -5095,17 +5095,17 @@ static gen_helper_gvec_mem_scatter * const gather_load_fn32[2][2][2][2][3] = {
 
       /* First-fault */
       { { { gen_helper_sve_ldffbss_zsu,
-            gen_helper_sve_ldffhss_zsu,
+            gen_helper_sve_ldffhss_le_zsu,
             NULL, },
           { gen_helper_sve_ldffbsu_zsu,
-            gen_helper_sve_ldffhsu_zsu,
-            gen_helper_sve_ldffssu_zsu, } },
+            gen_helper_sve_ldffhsu_le_zsu,
+            gen_helper_sve_ldffss_le_zsu, } },
         { { gen_helper_sve_ldffbss_zss,
-            gen_helper_sve_ldffhss_zss,
+            gen_helper_sve_ldffhss_le_zss,
             NULL, },
           { gen_helper_sve_ldffbsu_zss,
-            gen_helper_sve_ldffhsu_zss,
-            gen_helper_sve_ldffssu_zss, } } } },
+            gen_helper_sve_ldffhsu_le_zss,
+            gen_helper_sve_ldffss_le_zss, } } } },
 
     /* Big-endian */
     { { { { gen_helper_sve_ldbss_zsu,
@@ -5123,17 +5123,17 @@ static gen_helper_gvec_mem_scatter * const gather_load_fn32[2][2][2][2][3] = {
 
       /* First-fault */
       { { { gen_helper_sve_ldffbss_zsu,
-            gen_helper_sve_ldffhss_zsu,
+            gen_helper_sve_ldffhss_be_zsu,
             NULL, },
           { gen_helper_sve_ldffbsu_zsu,
-            gen_helper_sve_ldffhsu_zsu,
-            gen_helper_sve_ldffssu_zsu, } },
+            gen_helper_sve_ldffhsu_be_zsu,
+            gen_helper_sve_ldffss_be_zsu, } },
         { { gen_helper_sve_ldffbss_zss,
-            gen_helper_sve_ldffhss_zss,
+            gen_helper_sve_ldffhss_be_zss,
             NULL, },
           { gen_helper_sve_ldffbsu_zss,
-            gen_helper_sve_ldffhsu_zss,
-            gen_helper_sve_ldffssu_zss, } } } },
+            gen_helper_sve_ldffhsu_be_zss,
+            gen_helper_sve_ldffss_be_zss, } } } },
 };
 
 /* Note that we overload xs=2 to indicate 64-bit offset.  */
@@ -5166,29 +5166,29 @@ static gen_helper_gvec_mem_scatter * const gather_load_fn64[2][2][3][2][4] = {
 
       /* First-fault */
       { { { gen_helper_sve_ldffbds_zsu,
-            gen_helper_sve_ldffhds_zsu,
-            gen_helper_sve_ldffsds_zsu,
+            gen_helper_sve_ldffhds_le_zsu,
+            gen_helper_sve_ldffsds_le_zsu,
             NULL, },
           { gen_helper_sve_ldffbdu_zsu,
-            gen_helper_sve_ldffhdu_zsu,
-            gen_helper_sve_ldffsdu_zsu,
-            gen_helper_sve_ldffddu_zsu, } },
+            gen_helper_sve_ldffhdu_le_zsu,
+            gen_helper_sve_ldffsdu_le_zsu,
+            gen_helper_sve_ldffdd_le_zsu, } },
         { { gen_helper_sve_ldffbds_zss,
-            gen_helper_sve_ldffhds_zss,
-            gen_helper_sve_ldffsds_zss,
+            gen_helper_sve_ldffhds_le_zss,
+            gen_helper_sve_ldffsds_le_zss,
             NULL, },
           { gen_helper_sve_ldffbdu_zss,
-            gen_helper_sve_ldffhdu_zss,
-            gen_helper_sve_ldffsdu_zss,
-            gen_helper_sve_ldffddu_zss, } },
+            gen_helper_sve_ldffhdu_le_zss,
+            gen_helper_sve_ldffsdu_le_zss,
+            gen_helper_sve_ldffdd_le_zss, } },
         { { gen_helper_sve_ldffbds_zd,
-            gen_helper_sve_ldffhds_zd,
-            gen_helper_sve_ldffsds_zd,
+            gen_helper_sve_ldffhds_le_zd,
+            gen_helper_sve_ldffsds_le_zd,
             NULL, },
           { gen_helper_sve_ldffbdu_zd,
-            gen_helper_sve_ldffhdu_zd,
-            gen_helper_sve_ldffsdu_zd,
-            gen_helper_sve_ldffddu_zd, } } } },
+            gen_helper_sve_ldffhdu_le_zd,
+            gen_helper_sve_ldffsdu_le_zd,
+            gen_helper_sve_ldffdd_le_zd, } } } },
 
     /* Big-endian */
     { { { { gen_helper_sve_ldbds_zsu,
@@ -5218,29 +5218,29 @@ static gen_helper_gvec_mem_scatter * const gather_load_fn64[2][2][3][2][4] = {
 
       /* First-fault */
       { { { gen_helper_sve_ldffbds_zsu,
-            gen_helper_sve_ldffhds_zsu,
-            gen_helper_sve_ldffsds_zsu,
+            gen_helper_sve_ldffhds_be_zsu,
+            gen_helper_sve_ldffsds_be_zsu,
             NULL, },
           { gen_helper_sve_ldffbdu_zsu,
-            gen_helper_sve_ldffhdu_zsu,
-            gen_helper_sve_ldffsdu_zsu,
-            gen_helper_sve_ldffddu_zsu, } },
+            gen_helper_sve_ldffhdu_be_zsu,
+            gen_helper_sve_ldffsdu_be_zsu,
+            gen_helper_sve_ldffdd_be_zsu, } },
         { { gen_helper_sve_ldffbds_zss,
-            gen_helper_sve_ldffhds_zss,
-            gen_helper_sve_ldffsds_zss,
+            gen_helper_sve_ldffhds_be_zss,
+            gen_helper_sve_ldffsds_be_zss,
             NULL, },
           { gen_helper_sve_ldffbdu_zss,
-            gen_helper_sve_ldffhdu_zss,
-            gen_helper_sve_ldffsdu_zss,
-            gen_helper_sve_ldffddu_zss, } },
+            gen_helper_sve_ldffhdu_be_zss,
+            gen_helper_sve_ldffsdu_be_zss,
+            gen_helper_sve_ldffdd_be_zss, } },
         { { gen_helper_sve_ldffbds_zd,
-            gen_helper_sve_ldffhds_zd,
-            gen_helper_sve_ldffsds_zd,
+            gen_helper_sve_ldffhds_be_zd,
+            gen_helper_sve_ldffsds_be_zd,
             NULL, },
           { gen_helper_sve_ldffbdu_zd,
-            gen_helper_sve_ldffhdu_zd,
-            gen_helper_sve_ldffsdu_zd,
-            gen_helper_sve_ldffddu_zd, } } } },
+            gen_helper_sve_ldffhdu_be_zd,
+            gen_helper_sve_ldffsdu_be_zd,
+            gen_helper_sve_ldffdd_be_zd, } } } },
 };
 
 static bool trans_LD1_zprz(DisasContext *s, arg_LD1_zprz *a, uint32_t insn)
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [Qemu-devel] [PATCH 20/20] target/arm: Pass TCGMemOpIdx to sve memory helpers
  2018-08-09  4:21 [Qemu-devel] [PATCH 00/20] target/arm: sve system mode patches Richard Henderson
                   ` (18 preceding siblings ...)
  2018-08-09  4:22 ` [Qemu-devel] [PATCH 19/20] target/arm: Rewrite vector gather first-fault loads Richard Henderson
@ 2018-08-09  4:22 ` Richard Henderson
  2018-08-23 16:23   ` Peter Maydell
  2018-08-09  5:48 ` [Qemu-devel] [PATCH 00/20] target/arm: sve system mode patches Laurent Desnogues
                   ` (2 subsequent siblings)
  22 siblings, 1 reply; 51+ messages in thread
From: Richard Henderson @ 2018-08-09  4:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: laurent.desnogues, peter.maydell, alex.bennee

There is quite a lot of code required to compute cpu_mmu_index,
or even put together the full TCGMemOpIdx.  This can easily be
done at translation time.
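
The idea reduces to packing a precomputed value into the descriptor once
at translation time and unpacking it with shift and mask in the helper.
A standalone sketch follows (not QEMU code: it ignores the operation-size
fields the real simd descriptor carries, and extract32() is re-implemented
locally; the field widths mirror the MEMOPIDX_SHIFT scheme in the patch).

    #include <stdint.h>
    #include <stdio.h>

    #define MEMOPIDX_BITS  8    /* mem-op index, per the MEMOPIDX_SHIFT comment */
    #define REG_BITS       5    /* destination register number */

    static uint32_t extract32(uint32_t value, int start, int length)
    {
        return (value >> start) & (~0u >> (32 - length));
    }

    /* "Translation time": done once when the instruction is translated. */
    static uint32_t make_desc(uint32_t memop_idx, uint32_t rd)
    {
        return (rd << MEMOPIDX_BITS) | memop_idx;
    }

    /* "Run time": executed by the helper on every call. */
    static void helper(uint32_t desc)
    {
        unsigned memop_idx = extract32(desc, 0, MEMOPIDX_BITS);
        unsigned rd = extract32(desc, MEMOPIDX_BITS, REG_BITS);

        printf("memop_idx=%u rd=%u\n", memop_idx, rd);
    }

    int main(void)
    {
        helper(make_desc(0x23, 17));   /* prints memop_idx=35 rd=17 */
        return 0;
    }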

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/internals.h     |   5 ++
 target/arm/sve_helper.c    | 138 +++++++++++++++++++------------------
 target/arm/translate-sve.c |  67 +++++++++++-------
 3 files changed, 121 insertions(+), 89 deletions(-)

diff --git a/target/arm/internals.h b/target/arm/internals.h
index dc9357766c..24c0444c8d 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -796,4 +796,9 @@ static inline uint32_t arm_debug_exception_fsr(CPUARMState *env)
     }
 }
 
+/* Note make_memop_idx reserves 4 bits for mmu_idx, and MO_BSWAP is bit 3.
+ * Thus a TCGMemOpIdx, without any MO_ALIGN bits, fits in 8 bits.
+ */
+#define MEMOPIDX_SHIFT  8
+
 #endif
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 6728862326..5bae600d17 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -19,6 +19,7 @@
 
 #include "qemu/osdep.h"
 #include "cpu.h"
+#include "internals.h"
 #include "exec/exec-all.h"
 #include "exec/cpu_ldst.h"
 #include "exec/helper-proto.h"
@@ -3986,7 +3987,7 @@ typedef intptr_t sve_ld1_host_fn(void *vd, void *vg, void *host,
  * The controlling predicate is known to be true.
  */
 typedef void sve_ld1_tlb_fn(CPUARMState *env, void *vd, intptr_t reg_off,
-                            target_ulong vaddr, int mmu_idx, uintptr_t ra);
+                            target_ulong vaddr, TCGMemOpIdx oi, uintptr_t ra);
 typedef sve_ld1_tlb_fn sve_st1_tlb_fn;
 
 /*
@@ -4013,16 +4014,15 @@ static intptr_t sve_##NAME##_host(void *vd, void *vg, void *host,           \
 #ifdef CONFIG_SOFTMMU
 #define DO_LD_TLB(NAME, H, TYPEE, TYPEM, HOST, MOEND, TLB) \
 static void sve_##NAME##_tlb(CPUARMState *env, void *vd, intptr_t reg_off,  \
-                             target_ulong addr, int mmu_idx, uintptr_t ra)  \
+                             target_ulong addr, TCGMemOpIdx oi, uintptr_t ra)  \
 {                                                                           \
-    TCGMemOpIdx oi = make_memop_idx(ctz32(sizeof(TYPEM)) | MOEND, mmu_idx); \
     TYPEM val = TLB(env, addr, oi, ra);                                     \
     *(TYPEE *)(vd + H(reg_off)) = val;                                      \
 }
 #else
 #define DO_LD_TLB(NAME, H, TYPEE, TYPEM, HOST, MOEND, TLB)                  \
 static void sve_##NAME##_tlb(CPUARMState *env, void *vd, intptr_t reg_off,  \
-                             target_ulong addr, int mmu_idx, uintptr_t ra)  \
+                             target_ulong addr, TCGMemOpIdx oi, uintptr_t ra)  \
 {                                                                           \
     TYPEM val = HOST(g2h(addr));                                            \
     *(TYPEE *)(vd + H(reg_off)) = val;                                      \
@@ -4287,11 +4287,13 @@ static void sve_ld1_r(CPUARMState *env, void *vg, const target_ulong addr,
                       sve_ld1_host_fn *host_fn,
                       sve_ld1_tlb_fn *tlb_fn)
 {
-    void *vd = &env->vfp.zregs[simd_data(desc)];
+    const TCGMemOpIdx oi = extract32(desc, SIMD_DATA_SHIFT, MEMOPIDX_SHIFT);
+    const int mmu_idx = get_mmuidx(oi);
+    const unsigned rd = extract32(desc, SIMD_DATA_SHIFT + MEMOPIDX_SHIFT, 5);
+    void *vd = &env->vfp.zregs[rd];
     const int diffsz = esz - msz;
     const intptr_t reg_max = simd_oprsz(desc);
     const intptr_t mem_max = reg_max >> diffsz;
-    const int mmu_idx = cpu_mmu_index(env, false);
     ARMVectorReg scratch;
     void *host, *result;
     intptr_t split;
@@ -4345,7 +4347,7 @@ static void sve_ld1_r(CPUARMState *env, void *vg, const target_ulong addr,
          * on I/O memory, it may succeed but not bring in the TLB entry.
          * But even then we have still made forward progress.
          */
-        tlb_fn(env, result, reg_off, addr + mem_off, mmu_idx, retaddr);
+        tlb_fn(env, result, reg_off, addr + mem_off, oi, retaddr);
         reg_off += 1 << esz;
     }
 #endif
@@ -4406,9 +4408,9 @@ static void sve_ld2_r(CPUARMState *env, void *vg, target_ulong addr,
                       uint32_t desc, int size, uintptr_t ra,
                       sve_ld1_tlb_fn *tlb_fn)
 {
-    const int mmu_idx = cpu_mmu_index(env, false);
+    const TCGMemOpIdx oi = extract32(desc, SIMD_DATA_SHIFT, MEMOPIDX_SHIFT);
+    const unsigned rd = extract32(desc, SIMD_DATA_SHIFT + MEMOPIDX_SHIFT, 5);
     intptr_t i, oprsz = simd_oprsz(desc);
-    unsigned rd = simd_data(desc);
     ARMVectorReg scratch[2] = { };
 
     set_helper_retaddr(ra);
@@ -4416,8 +4418,8 @@ static void sve_ld2_r(CPUARMState *env, void *vg, target_ulong addr,
         uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));
         do {
             if (pg & 1) {
-                tlb_fn(env, &scratch[0], i, addr, mmu_idx, ra);
-                tlb_fn(env, &scratch[1], i, addr + size, mmu_idx, ra);
+                tlb_fn(env, &scratch[0], i, addr, oi, ra);
+                tlb_fn(env, &scratch[1], i, addr + size, oi, ra);
             }
             i += size, pg >>= size;
             addr += 2 * size;
@@ -4434,9 +4436,9 @@ static void sve_ld3_r(CPUARMState *env, void *vg, target_ulong addr,
                       uint32_t desc, int size, uintptr_t ra,
                       sve_ld1_tlb_fn *tlb_fn)
 {
-    const int mmu_idx = cpu_mmu_index(env, false);
+    const TCGMemOpIdx oi = extract32(desc, SIMD_DATA_SHIFT, MEMOPIDX_SHIFT);
+    const unsigned rd = extract32(desc, SIMD_DATA_SHIFT + MEMOPIDX_SHIFT, 5);
     intptr_t i, oprsz = simd_oprsz(desc);
-    unsigned rd = simd_data(desc);
     ARMVectorReg scratch[3] = { };
 
     set_helper_retaddr(ra);
@@ -4444,9 +4446,9 @@ static void sve_ld3_r(CPUARMState *env, void *vg, target_ulong addr,
         uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));
         do {
             if (pg & 1) {
-                tlb_fn(env, &scratch[0], i, addr, mmu_idx, ra);
-                tlb_fn(env, &scratch[1], i, addr + size, mmu_idx, ra);
-                tlb_fn(env, &scratch[2], i, addr + 2 * size, mmu_idx, ra);
+                tlb_fn(env, &scratch[0], i, addr, oi, ra);
+                tlb_fn(env, &scratch[1], i, addr + size, oi, ra);
+                tlb_fn(env, &scratch[2], i, addr + 2 * size, oi, ra);
             }
             i += size, pg >>= size;
             addr += 3 * size;
@@ -4464,9 +4466,9 @@ static void sve_ld4_r(CPUARMState *env, void *vg, target_ulong addr,
                       uint32_t desc, int size, uintptr_t ra,
                       sve_ld1_tlb_fn *tlb_fn)
 {
-    const int mmu_idx = cpu_mmu_index(env, false);
+    const TCGMemOpIdx oi = extract32(desc, SIMD_DATA_SHIFT, MEMOPIDX_SHIFT);
+    const unsigned rd = extract32(desc, SIMD_DATA_SHIFT + MEMOPIDX_SHIFT, 5);
     intptr_t i, oprsz = simd_oprsz(desc);
-    unsigned rd = simd_data(desc);
     ARMVectorReg scratch[4] = { };
 
     set_helper_retaddr(ra);
@@ -4474,10 +4476,10 @@ static void sve_ld4_r(CPUARMState *env, void *vg, target_ulong addr,
         uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));
         do {
             if (pg & 1) {
-                tlb_fn(env, &scratch[0], i, addr, mmu_idx, ra);
-                tlb_fn(env, &scratch[1], i, addr + size, mmu_idx, ra);
-                tlb_fn(env, &scratch[2], i, addr + 2 * size, mmu_idx, ra);
-                tlb_fn(env, &scratch[3], i, addr + 3 * size, mmu_idx, ra);
+                tlb_fn(env, &scratch[0], i, addr, oi, ra);
+                tlb_fn(env, &scratch[1], i, addr + size, oi, ra);
+                tlb_fn(env, &scratch[2], i, addr + 2 * size, oi, ra);
+                tlb_fn(env, &scratch[3], i, addr + 3 * size, oi, ra);
             }
             i += size, pg >>= size;
             addr += 4 * size;
@@ -4572,11 +4574,13 @@ static void sve_ldff1_r(CPUARMState *env, void *vg, const target_ulong addr,
                         sve_ld1_host_fn *host_fn,
                         sve_ld1_tlb_fn *tlb_fn)
 {
-    void *vd = &env->vfp.zregs[simd_data(desc)];
+    const TCGMemOpIdx oi = extract32(desc, SIMD_DATA_SHIFT, MEMOPIDX_SHIFT);
+    const int mmu_idx = get_mmuidx(oi);
+    const unsigned rd = extract32(desc, SIMD_DATA_SHIFT + MEMOPIDX_SHIFT, 5);
+    void *vd = &env->vfp.zregs[rd];
     const int diffsz = esz - msz;
     const intptr_t reg_max = simd_oprsz(desc);
     const intptr_t mem_max = reg_max >> diffsz;
-    const int mmu_idx = cpu_mmu_index(env, false);
     intptr_t split, reg_off, mem_off;
     void *host;
 
@@ -4620,7 +4624,7 @@ static void sve_ldff1_r(CPUARMState *env, void *vg, const target_ulong addr,
     /* Perform one normal read, which will fault or not.
      * But it is likely to bring the page into the tlb.
      */
-    tlb_fn(env, vd, reg_off, addr + mem_off, mmu_idx, retaddr);
+    tlb_fn(env, vd, reg_off, addr + mem_off, oi, retaddr);
 
     /* After any fault, zero any leading predicated false elts.  */
     swap_memzero(vd, reg_off);
@@ -4649,7 +4653,8 @@ static void sve_ldnf1_r(CPUARMState *env, void *vg, const target_ulong addr,
                         uint32_t desc, const int esz, const int msz,
                         sve_ld1_host_fn *host_fn)
 {
-    void *vd = &env->vfp.zregs[simd_data(desc)];
+    const unsigned rd = extract32(desc, SIMD_DATA_SHIFT + MEMOPIDX_SHIFT, 5);
+    void *vd = &env->vfp.zregs[rd];
     const int diffsz = esz - msz;
     const intptr_t reg_max = simd_oprsz(desc);
     const intptr_t mem_max = reg_max >> diffsz;
@@ -4781,15 +4786,14 @@ DO_LDFF1_LDNF1_2(dd,  3, 3)
 #ifdef CONFIG_SOFTMMU
 #define DO_ST_TLB(NAME, H, TYPEM, HOST, MOEND, TLB) \
 static void sve_##NAME##_tlb(CPUARMState *env, void *vd, intptr_t reg_off,  \
-                             target_ulong addr, int mmu_idx, uintptr_t ra)  \
+                             target_ulong addr, TCGMemOpIdx oi, uintptr_t ra) \
 {                                                                           \
-    TCGMemOpIdx oi = make_memop_idx(ctz32(sizeof(TYPEM)) | MOEND, mmu_idx); \
     TLB(env, addr, *(TYPEM *)(vd + H(reg_off)), oi, ra);                    \
 }
 #else
 #define DO_ST_TLB(NAME, H, TYPEM, HOST, MOEND, TLB) \
 static void sve_##NAME##_tlb(CPUARMState *env, void *vd, intptr_t reg_off,  \
-                             target_ulong addr, int mmu_idx, uintptr_t ra)  \
+                             target_ulong addr, TCGMemOpIdx oi, uintptr_t ra) \
 {                                                                           \
     HOST(g2h(addr), *(TYPEM *)(vd + H(reg_off)));                           \
 }
@@ -4828,9 +4832,9 @@ static void sve_st1_r(CPUARMState *env, void *vg, target_ulong addr,
                       const int esize, const int msize,
                       sve_st1_tlb_fn *tlb_fn)
 {
-    const int mmu_idx = cpu_mmu_index(env, false);
+    const TCGMemOpIdx oi = extract32(desc, SIMD_DATA_SHIFT, MEMOPIDX_SHIFT);
+    const unsigned rd = extract32(desc, SIMD_DATA_SHIFT + MEMOPIDX_SHIFT, 5);
     intptr_t i, oprsz = simd_oprsz(desc);
-    unsigned rd = simd_data(desc);
     void *vd = &env->vfp.zregs[rd];
 
     set_helper_retaddr(ra);
@@ -4838,7 +4842,7 @@ static void sve_st1_r(CPUARMState *env, void *vg, target_ulong addr,
         uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));
         do {
             if (pg & 1) {
-                tlb_fn(env, vd, i, addr, mmu_idx, ra);
+                tlb_fn(env, vd, i, addr, oi, ra);
             }
             i += esize, pg >>= esize;
             addr += msize;
@@ -4852,9 +4856,9 @@ static void sve_st2_r(CPUARMState *env, void *vg, target_ulong addr,
                       const int esize, const int msize,
                       sve_st1_tlb_fn *tlb_fn)
 {
-    const int mmu_idx = cpu_mmu_index(env, false);
+    const TCGMemOpIdx oi = extract32(desc, SIMD_DATA_SHIFT, MEMOPIDX_SHIFT);
+    const unsigned rd = extract32(desc, SIMD_DATA_SHIFT + MEMOPIDX_SHIFT, 5);
     intptr_t i, oprsz = simd_oprsz(desc);
-    unsigned rd = simd_data(desc);
     void *d1 = &env->vfp.zregs[rd];
     void *d2 = &env->vfp.zregs[(rd + 1) & 31];
 
@@ -4863,8 +4867,8 @@ static void sve_st2_r(CPUARMState *env, void *vg, target_ulong addr,
         uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));
         do {
             if (pg & 1) {
-                tlb_fn(env, d1, i, addr, mmu_idx, ra);
-                tlb_fn(env, d2, i, addr + msize, mmu_idx, ra);
+                tlb_fn(env, d1, i, addr, oi, ra);
+                tlb_fn(env, d2, i, addr + msize, oi, ra);
             }
             i += esize, pg >>= esize;
             addr += 2 * msize;
@@ -4878,9 +4882,9 @@ static void sve_st3_r(CPUARMState *env, void *vg, target_ulong addr,
                       const int esize, const int msize,
                       sve_st1_tlb_fn *tlb_fn)
 {
-    const int mmu_idx = cpu_mmu_index(env, false);
+    const TCGMemOpIdx oi = extract32(desc, SIMD_DATA_SHIFT, MEMOPIDX_SHIFT);
+    const unsigned rd = extract32(desc, SIMD_DATA_SHIFT + MEMOPIDX_SHIFT, 5);
     intptr_t i, oprsz = simd_oprsz(desc);
-    unsigned rd = simd_data(desc);
     void *d1 = &env->vfp.zregs[rd];
     void *d2 = &env->vfp.zregs[(rd + 1) & 31];
     void *d3 = &env->vfp.zregs[(rd + 2) & 31];
@@ -4890,9 +4894,9 @@ static void sve_st3_r(CPUARMState *env, void *vg, target_ulong addr,
         uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));
         do {
             if (pg & 1) {
-                tlb_fn(env, d1, i, addr, mmu_idx, ra);
-                tlb_fn(env, d2, i, addr + msize, mmu_idx, ra);
-                tlb_fn(env, d3, i, addr + 2 * msize, mmu_idx, ra);
+                tlb_fn(env, d1, i, addr, oi, ra);
+                tlb_fn(env, d2, i, addr + msize, oi, ra);
+                tlb_fn(env, d3, i, addr + 2 * msize, oi, ra);
             }
             i += esize, pg >>= esize;
             addr += 3 * msize;
@@ -4906,9 +4910,9 @@ static void sve_st4_r(CPUARMState *env, void *vg, target_ulong addr,
                       const int esize, const int msize,
                       sve_st1_tlb_fn *tlb_fn)
 {
-    const int mmu_idx = cpu_mmu_index(env, false);
+    const TCGMemOpIdx oi = extract32(desc, SIMD_DATA_SHIFT, MEMOPIDX_SHIFT);
+    const unsigned rd = extract32(desc, SIMD_DATA_SHIFT + MEMOPIDX_SHIFT, 5);
     intptr_t i, oprsz = simd_oprsz(desc);
-    unsigned rd = simd_data(desc);
     void *d1 = &env->vfp.zregs[rd];
     void *d2 = &env->vfp.zregs[(rd + 1) & 31];
     void *d3 = &env->vfp.zregs[(rd + 2) & 31];
@@ -4919,10 +4923,10 @@ static void sve_st4_r(CPUARMState *env, void *vg, target_ulong addr,
         uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));
         do {
             if (pg & 1) {
-                tlb_fn(env, d1, i, addr, mmu_idx, ra);
-                tlb_fn(env, d2, i, addr + msize, mmu_idx, ra);
-                tlb_fn(env, d3, i, addr + 2 * msize, mmu_idx, ra);
-                tlb_fn(env, d4, i, addr + 3 * msize, mmu_idx, ra);
+                tlb_fn(env, d1, i, addr, oi, ra);
+                tlb_fn(env, d2, i, addr + msize, oi, ra);
+                tlb_fn(env, d3, i, addr + 2 * msize, oi, ra);
+                tlb_fn(env, d4, i, addr + 3 * msize, oi, ra);
             }
             i += esize, pg >>= esize;
             addr += 4 * msize;
@@ -5015,9 +5019,9 @@ static void sve_ld1_zs(CPUARMState *env, void *vd, void *vg, void *vm,
                        target_ulong base, uint32_t desc, uintptr_t ra,
                        zreg_off_fn *off_fn, sve_ld1_tlb_fn *tlb_fn)
 {
-    const int mmu_idx = cpu_mmu_index(env, false);
+    const TCGMemOpIdx oi = extract32(desc, SIMD_DATA_SHIFT, MEMOPIDX_SHIFT);
+    const int scale = extract32(desc, SIMD_DATA_SHIFT + MEMOPIDX_SHIFT, 2);
     intptr_t i, oprsz = simd_oprsz(desc);
-    unsigned scale = simd_data(desc);
     ARMVectorReg scratch = { };
 
     set_helper_retaddr(ra);
@@ -5026,7 +5030,7 @@ static void sve_ld1_zs(CPUARMState *env, void *vd, void *vg, void *vm,
         do {
             if (pg & 1) {
                 target_ulong off = off_fn(vm, i);
-                tlb_fn(env, &scratch, i, base + (off << scale), mmu_idx, ra);
+                tlb_fn(env, &scratch, i, base + (off << scale), oi, ra);
             }
             i += 4, pg >>= 4;
         } while (i & 15);
@@ -5041,9 +5045,9 @@ static void sve_ld1_zd(CPUARMState *env, void *vd, void *vg, void *vm,
                        target_ulong base, uint32_t desc, uintptr_t ra,
                        zreg_off_fn *off_fn, sve_ld1_tlb_fn *tlb_fn)
 {
-    const int mmu_idx = cpu_mmu_index(env, false);
+    const TCGMemOpIdx oi = extract32(desc, SIMD_DATA_SHIFT, MEMOPIDX_SHIFT);
+    const int scale = extract32(desc, SIMD_DATA_SHIFT + MEMOPIDX_SHIFT, 2);
     intptr_t i, oprsz = simd_oprsz(desc) / 8;
-    unsigned scale = simd_data(desc);
     ARMVectorReg scratch = { };
 
     set_helper_retaddr(ra);
@@ -5051,7 +5055,7 @@ static void sve_ld1_zd(CPUARMState *env, void *vd, void *vg, void *vm,
         uint8_t pg = *(uint8_t *)(vg + H1(i));
         if (pg & 1) {
             target_ulong off = off_fn(vm, i * 8);
-            tlb_fn(env, &scratch, i * 8, base + (off << scale), mmu_idx, ra);
+            tlb_fn(env, &scratch, i * 8, base + (off << scale), oi, ra);
         }
     }
     set_helper_retaddr(0);
@@ -5157,7 +5161,7 @@ typedef bool sve_ld1_nf_fn(CPUARMState *env, void *vd, intptr_t reg_off,
 #ifdef CONFIG_SOFTMMU
 #define DO_LD_NF(NAME, H, TYPEE, TYPEM, HOST) \
 static bool sve_ld##NAME##_nf(CPUARMState *env, void *vd, intptr_t reg_off, \
-                            target_ulong addr, int mmu_idx)                 \
+                              target_ulong addr, int mmu_idx)               \
 {                                                                           \
     target_ulong next_page = -(addr | TARGET_PAGE_MASK);                    \
     if (likely(next_page - addr >= sizeof(TYPEM))) {                        \
@@ -5216,9 +5220,10 @@ static inline void sve_ldff1_zs(CPUARMState *env, void *vd, void *vg, void *vm,
                                 zreg_off_fn *off_fn, sve_ld1_tlb_fn *tlb_fn,
                                 sve_ld1_nf_fn *nonfault_fn)
 {
-    const int mmu_idx = cpu_mmu_index(env, false);
+    const TCGMemOpIdx oi = extract32(desc, SIMD_DATA_SHIFT, MEMOPIDX_SHIFT);
+    const int mmu_idx = get_mmuidx(oi);
+    const int scale = extract32(desc, SIMD_DATA_SHIFT + MEMOPIDX_SHIFT, 2);
     intptr_t reg_off, reg_max = simd_oprsz(desc);
-    unsigned scale = simd_data(desc);
     target_ulong addr;
 
     /* Skip to the first true predicate.  */
@@ -5228,7 +5233,7 @@ static inline void sve_ldff1_zs(CPUARMState *env, void *vd, void *vg, void *vm,
         set_helper_retaddr(ra);
         addr = off_fn(vm, reg_off);
         addr = base + (addr << scale);
-        tlb_fn(env, vd, reg_off, addr, mmu_idx, ra);
+        tlb_fn(env, vd, reg_off, addr, oi, ra);
 
         /* The rest of the reads will be non-faulting.  */
         set_helper_retaddr(0);
@@ -5257,9 +5262,10 @@ static inline void sve_ldff1_zd(CPUARMState *env, void *vd, void *vg, void *vm,
                                 zreg_off_fn *off_fn, sve_ld1_tlb_fn *tlb_fn,
                                 sve_ld1_nf_fn *nonfault_fn)
 {
-    const int mmu_idx = cpu_mmu_index(env, false);
+    const TCGMemOpIdx oi = extract32(desc, SIMD_DATA_SHIFT, MEMOPIDX_SHIFT);
+    const int mmu_idx = get_mmuidx(oi);
+    const int scale = extract32(desc, SIMD_DATA_SHIFT + MEMOPIDX_SHIFT, 2);
     intptr_t reg_off, reg_max = simd_oprsz(desc);
-    unsigned scale = simd_data(desc);
     target_ulong addr;
 
     /* Skip to the first true predicate.  */
@@ -5269,7 +5275,7 @@ static inline void sve_ldff1_zd(CPUARMState *env, void *vd, void *vg, void *vm,
         set_helper_retaddr(ra);
         addr = off_fn(vm, reg_off);
         addr = base + (addr << scale);
-        tlb_fn(env, vd, reg_off, addr, mmu_idx, ra);
+        tlb_fn(env, vd, reg_off, addr, oi, ra);
 
         /* The rest of the reads will be non-faulting.  */
         set_helper_retaddr(0);
@@ -5381,9 +5387,9 @@ static void sve_st1_zs(CPUARMState *env, void *vd, void *vg, void *vm,
                        target_ulong base, uint32_t desc, uintptr_t ra,
                        zreg_off_fn *off_fn, sve_ld1_tlb_fn *tlb_fn)
 {
-    const int mmu_idx = cpu_mmu_index(env, false);
+    const TCGMemOpIdx oi = extract32(desc, SIMD_DATA_SHIFT, MEMOPIDX_SHIFT);
+    const int scale = extract32(desc, SIMD_DATA_SHIFT + MEMOPIDX_SHIFT, 2);
     intptr_t i, oprsz = simd_oprsz(desc);
-    unsigned scale = simd_data(desc);
 
     set_helper_retaddr(ra);
     for (i = 0; i < oprsz; ) {
@@ -5391,7 +5397,7 @@ static void sve_st1_zs(CPUARMState *env, void *vd, void *vg, void *vm,
         do {
             if (pg & 1) {
                 target_ulong off = off_fn(vm, i);
-                tlb_fn(env, vd, i, base + (off << scale), mmu_idx, ra);
+                tlb_fn(env, vd, i, base + (off << scale), oi, ra);
             }
             i += 4, pg >>= 4;
         } while (i & 15);
@@ -5403,16 +5409,16 @@ static void sve_st1_zd(CPUARMState *env, void *vd, void *vg, void *vm,
                        target_ulong base, uint32_t desc, uintptr_t ra,
                        zreg_off_fn *off_fn, sve_ld1_tlb_fn *tlb_fn)
 {
-    const int mmu_idx = cpu_mmu_index(env, false);
+    const TCGMemOpIdx oi = extract32(desc, SIMD_DATA_SHIFT, MEMOPIDX_SHIFT);
+    const int scale = extract32(desc, SIMD_DATA_SHIFT + MEMOPIDX_SHIFT, 2);
     intptr_t i, oprsz = simd_oprsz(desc) / 8;
-    unsigned scale = simd_data(desc);
 
     set_helper_retaddr(ra);
     for (i = 0; i < oprsz; i++) {
         uint8_t pg = *(uint8_t *)(vg + H1(i));
         if (pg & 1) {
             target_ulong off = off_fn(vm, i * 8);
-            tlb_fn(env, vd, i * 8, base + (off << scale), mmu_idx, ra);
+            tlb_fn(env, vd, i * 8, base + (off << scale), oi, ra);
         }
     }
     set_helper_retaddr(0);
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 20492e9b8b..05ba0518c8 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -4600,25 +4600,34 @@ static const uint8_t dtype_esz[16] = {
     3, 2, 1, 3
 };
 
+static TCGMemOpIdx sve_memopidx(DisasContext *s, int dtype)
+{
+    return make_memop_idx(s->be_data | dtype_mop[dtype], get_mem_index(s));
+}
+
 static void do_mem_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr,
-                       gen_helper_gvec_mem *fn)
+                       int dtype, gen_helper_gvec_mem *fn)
 {
     unsigned vsz = vec_full_reg_size(s);
     TCGv_ptr t_pg;
-    TCGv_i32 desc;
+    TCGv_i32 t_desc;
+    int desc;
 
     /* For e.g. LD4, there are not enough arguments to pass all 4
      * registers as pointers, so encode the regno into the data field.
      * For consistency, do this even for LD1.
      */
-    desc = tcg_const_i32(simd_desc(vsz, vsz, zt));
+    desc = sve_memopidx(s, dtype);
+    desc |= zt << MEMOPIDX_SHIFT;
+    desc = simd_desc(vsz, vsz, desc);
+    t_desc = tcg_const_i32(desc);
     t_pg = tcg_temp_new_ptr();
 
     tcg_gen_addi_ptr(t_pg, cpu_env, pred_full_reg_offset(s, pg));
-    fn(cpu_env, t_pg, addr, desc);
+    fn(cpu_env, t_pg, addr, t_desc);
 
     tcg_temp_free_ptr(t_pg);
-    tcg_temp_free_i32(desc);
+    tcg_temp_free_i32(t_desc);
 }
 
 static void do_ld_zpa(DisasContext *s, int zt, int pg,
@@ -4681,7 +4690,7 @@ static void do_ld_zpa(DisasContext *s, int zt, int pg,
      * accessible via the instruction encoding.
      */
     assert(fn != NULL);
-    do_mem_zpa(s, zt, pg, addr, fn);
+    do_mem_zpa(s, zt, pg, addr, dtype, fn);
 }
 
 static bool trans_LD_zprr(DisasContext *s, arg_rprr_load *a, uint32_t insn)
@@ -4763,7 +4772,8 @@ static bool trans_LDFF1_zprr(DisasContext *s, arg_rprr_load *a, uint32_t insn)
         TCGv_i64 addr = new_tmp_a64(s);
         tcg_gen_shli_i64(addr, cpu_reg(s, a->rm), dtype_msz(a->dtype));
         tcg_gen_add_i64(addr, addr, cpu_reg_sp(s, a->rn));
-        do_mem_zpa(s, a->rd, a->pg, addr, fns[s->be_data == MO_BE][a->dtype]);
+        do_mem_zpa(s, a->rd, a->pg, addr, a->dtype,
+                   fns[s->be_data == MO_BE][a->dtype]);
     }
     return true;
 }
@@ -4821,7 +4831,8 @@ static bool trans_LDNF1_zpri(DisasContext *s, arg_rpri_load *a, uint32_t insn)
         TCGv_i64 addr = new_tmp_a64(s);
 
         tcg_gen_addi_i64(addr, cpu_reg_sp(s, a->rn), off);
-        do_mem_zpa(s, a->rd, a->pg, addr, fns[s->be_data == MO_BE][a->dtype]);
+        do_mem_zpa(s, a->rd, a->pg, addr, a->dtype,
+                   fns[s->be_data == MO_BE][a->dtype]);
     }
     return true;
 }
@@ -4836,11 +4847,14 @@ static void do_ldrq(DisasContext *s, int zt, int pg, TCGv_i64 addr, int msz)
     };
     unsigned vsz = vec_full_reg_size(s);
     TCGv_ptr t_pg;
-    TCGv_i32 desc;
-    int poff;
+    TCGv_i32 t_desc;
+    int desc, poff;
 
     /* Load the first quadword using the normal predicated load helpers.  */
-    desc = tcg_const_i32(simd_desc(16, 16, zt));
+    desc = sve_memopidx(s, msz_dtype(msz));
+    desc |= zt << MEMOPIDX_SHIFT;
+    desc = simd_desc(16, 16, desc);
+    t_desc = tcg_const_i32(desc);
 
     poff = pred_full_reg_offset(s, pg);
     if (vsz > 16) {
@@ -4864,10 +4878,10 @@ static void do_ldrq(DisasContext *s, int zt, int pg, TCGv_i64 addr, int msz)
     t_pg = tcg_temp_new_ptr();
     tcg_gen_addi_ptr(t_pg, cpu_env, poff);
 
-    fns[s->be_data == MO_BE][msz](cpu_env, t_pg, addr, desc);
+    fns[s->be_data == MO_BE][msz](cpu_env, t_pg, addr, t_desc);
 
     tcg_temp_free_ptr(t_pg);
-    tcg_temp_free_i32(desc);
+    tcg_temp_free_i32(t_desc);
 
     /* Replicate that first quadword.  */
     if (vsz > 16) {
@@ -5019,7 +5033,7 @@ static void do_st_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr,
         fn = fn_multiple[be][nreg - 1][msz];
     }
     assert(fn != NULL);
-    do_mem_zpa(s, zt, pg, addr, fn);
+    do_mem_zpa(s, zt, pg, addr, msz_dtype(msz), fn);
 }
 
 static bool trans_ST_zprr(DisasContext *s, arg_rprr_store *a, uint32_t insn)
@@ -5057,24 +5071,31 @@ static bool trans_ST_zpri(DisasContext *s, arg_rpri_store *a, uint32_t insn)
  *** SVE gather loads / scatter stores
  */
 
-static void do_mem_zpz(DisasContext *s, int zt, int pg, int zm, int scale,
-                       TCGv_i64 scalar, gen_helper_gvec_mem_scatter *fn)
+static void do_mem_zpz(DisasContext *s, int zt, int pg, int zm,
+                       int scale, TCGv_i64 scalar, int msz,
+                       gen_helper_gvec_mem_scatter *fn)
 {
     unsigned vsz = vec_full_reg_size(s);
-    TCGv_i32 desc = tcg_const_i32(simd_desc(vsz, vsz, scale));
     TCGv_ptr t_zm = tcg_temp_new_ptr();
     TCGv_ptr t_pg = tcg_temp_new_ptr();
     TCGv_ptr t_zt = tcg_temp_new_ptr();
+    TCGv_i32 t_desc;
+    int desc;
+
+    desc = sve_memopidx(s, msz_dtype(msz));
+    desc |= scale << MEMOPIDX_SHIFT;
+    desc = simd_desc(vsz, vsz, desc);
+    t_desc = tcg_const_i32(desc);
 
     tcg_gen_addi_ptr(t_pg, cpu_env, pred_full_reg_offset(s, pg));
     tcg_gen_addi_ptr(t_zm, cpu_env, vec_full_reg_offset(s, zm));
     tcg_gen_addi_ptr(t_zt, cpu_env, vec_full_reg_offset(s, zt));
-    fn(cpu_env, t_zt, t_pg, t_zm, scalar, desc);
+    fn(cpu_env, t_zt, t_pg, t_zm, scalar, t_desc);
 
     tcg_temp_free_ptr(t_zt);
     tcg_temp_free_ptr(t_zm);
     tcg_temp_free_ptr(t_pg);
-    tcg_temp_free_i32(desc);
+    tcg_temp_free_i32(t_desc);
 }
 
 /* Indexed by [be][ff][xs][u][msz].  */
@@ -5263,7 +5284,7 @@ static bool trans_LD1_zprz(DisasContext *s, arg_LD1_zprz *a, uint32_t insn)
     assert(fn != NULL);
 
     do_mem_zpz(s, a->rd, a->pg, a->rm, a->scale * a->msz,
-               cpu_reg_sp(s, a->rn), fn);
+               cpu_reg_sp(s, a->rn), a->msz, fn);
     return true;
 }
 
@@ -5294,7 +5315,7 @@ static bool trans_LD1_zpiz(DisasContext *s, arg_LD1_zpiz *a, uint32_t insn)
      * by loading the immediate into the scalar parameter.
      */
     imm = tcg_const_i64(a->imm << a->msz);
-    do_mem_zpz(s, a->rd, a->pg, a->rn, 0, imm, fn);
+    do_mem_zpz(s, a->rd, a->pg, a->rn, 0, imm, a->msz, fn);
     tcg_temp_free_i64(imm);
     return true;
 }
@@ -5369,7 +5390,7 @@ static bool trans_ST1_zprz(DisasContext *s, arg_ST1_zprz *a, uint32_t insn)
         g_assert_not_reached();
     }
     do_mem_zpz(s, a->rd, a->pg, a->rm, a->scale * a->msz,
-               cpu_reg_sp(s, a->rn), fn);
+               cpu_reg_sp(s, a->rn), a->msz, fn);
     return true;
 }
 
@@ -5400,7 +5421,7 @@ static bool trans_ST1_zpiz(DisasContext *s, arg_ST1_zpiz *a, uint32_t insn)
      * by loading the immediate into the scalar parameter.
      */
     imm = tcg_const_i64(a->imm << a->msz);
-    do_mem_zpz(s, a->rd, a->pg, a->rn, 0, imm, fn);
+    do_mem_zpz(s, a->rd, a->pg, a->rn, 0, imm, a->msz, fn);
     tcg_temp_free_i64(imm);
     return true;
 }
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [Qemu-devel] [PATCH 00/20] target/arm: sve system mode patches
  2018-08-09  4:21 [Qemu-devel] [PATCH 00/20] target/arm: sve system mode patches Richard Henderson
                   ` (19 preceding siblings ...)
  2018-08-09  4:22 ` [Qemu-devel] [PATCH 20/20] target/arm: Pass TCGMemOpIdx to sve memory helpers Richard Henderson
@ 2018-08-09  5:48 ` Laurent Desnogues
  2018-08-18  9:15 ` no-reply
  2018-08-18 10:01 ` no-reply
  22 siblings, 0 replies; 51+ messages in thread
From: Laurent Desnogues @ 2018-08-09  5:48 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel@nongnu.org, Peter Maydell, Alex Bennée

Hello,

On Thu, Aug 9, 2018 at 6:21 AM, Richard Henderson
<richard.henderson@linaro.org> wrote:
> This is my current set of patches for running SVE in system mode.
>
> The first half deal with the system registers that affect SVE.
> I recall that Peter has said he'd like the first patch to be
> done a different way, but we haven't had a chance to talk about
> what form it should take.  I've left it as-is since it does what
> I need for now.
>
> The second half re-implement the SVE memory operations.
> The FF and NF loads had been stubbed out.  Getting those to work
> requires some infrastructure that can be reused to speed up normal
> loads -- one guest-to-host tlb lookup can be reused for the rest
> of the page.

I did not review every patch individually, but I tested the whole
series and found no issues.

Tested-by: Laurent Desnogues <laurent.desnogues@gmail.com>

Thanks,

Laurent

>
> r~
>
>
> Based-on: <20180809034033.10579-1-richard.henderson@linaro.org>
> Richard Henderson (20):
>   target/arm: Set ISAR bits for -cpu max
>   target/arm: Set ID_AA64PFR0 bits for SVE for -cpu max
>   target/arm: Define ID_AA64ZFR0_EL1
>   target/arm: Adjust sve_exception_el
>   target/arm: Fix arm_cpu_data_is_big_endian for aa64 user-only
>   target/arm: Fix arm_current_el for user-only
>   target/arm: Fix is_a64 for user-only
>   target/arm: Pass in current_el to fp and sve_exception_el
>   target/arm: Handle SVE vector length changes in system mode
>   target/arm: Adjust aarch64_cpu_dump_state for system mode SVE
>   target/arm: Clear unused predicate bits for LD1RQ
>   target/arm: Rewrite helper_sve_ld1*_r using pages
>   target/arm: Rewrite helper_sve_ld[234]*_r
>   target/arm: Rewrite helper_sve_st[1234]*_r
>   target/arm: Split contiguous loads for endianness
>   target/arm: Split contiguous stores for endianness
>   target/arm: Rewrite vector gather loads
>   target/arm: Rewrite vector gather stores
>   target/arm: Rewrite vector gather first-fault loads
>   target/arm: Pass TCGMemOpIdx to sve memory helpers
>
>  target/arm/cpu.h           |   47 +-
>  target/arm/helper-sve.h    |  385 +++++--
>  target/arm/internals.h     |    5 +
>  target/arm/cpu.c           |   24 +-
>  target/arm/cpu64.c         |   93 +-
>  target/arm/helper.c        |  237 +++--
>  target/arm/op_helper.c     |    1 +
>  target/arm/sve_helper.c    | 2062 +++++++++++++++++++++++++-----------
>  target/arm/translate-a64.c |    8 +-
>  target/arm/translate-sve.c |  670 ++++++++----
>  10 files changed, 2453 insertions(+), 1079 deletions(-)
>
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Qemu-devel] [PATCH 08/20] target/arm: Pass in current_el to fp and sve_exception_el
  2018-08-09  4:21 ` [Qemu-devel] [PATCH 08/20] target/arm: Pass in current_el to fp and sve_exception_el Richard Henderson
@ 2018-08-09 18:01   ` Alex Bennée
  2018-08-09 18:50     ` Richard Henderson
  0 siblings, 1 reply; 51+ messages in thread
From: Alex Bennée @ 2018-08-09 18:01 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, laurent.desnogues, peter.maydell


Richard Henderson <richard.henderson@linaro.org> writes:

> We are going to want to determine whether sve is enabled
> for an EL other than the current one.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Are these patches meant to apply to origin/master or on top of the
user-mode fixes? This didn't apply for me:


> @@ -12385,11 +12382,12 @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
>                            target_ulong *cs_base, uint32_t *pflags)
>  {
>      ARMMMUIdx mmu_idx = core_to_arm_mmu_idx(env, cpu_mmu_index(env, false));
> -    int fp_el = fp_exception_el(env);
> +    int current_el = arm_current_el(env);
> +    int fp_el = fp_exception_el(env, current_el);
>      uint32_t flags;
>
>      if (is_a64(env)) {
> -        int sve_el = sve_exception_el(env);
> +        int sve_el = sve_exception_el(env, current_el);
>          uint32_t zcr_len;
>
>          *pc = env->pc;
> @@ -12404,7 +12402,6 @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
>          if (sve_el != 0 && fp_el == 0) {
>              zcr_len = 0;
>          } else {
> -            int current_el = arm_current_el(env);
>              ARMCPU *cpu = arm_env_get_cpu(env);
>
>              zcr_len = cpu->sve_max_vq - 1;

++<<<<<<< HEAD
 +            int current_el = arm_current_el(env);
++=======
+             ARMCPU *cpu = arm_env_get_cpu(env);

--
Alex Bennée

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Qemu-devel] [PATCH 08/20] target/arm: Pass in current_el to fp and sve_exception_el
  2018-08-09 18:01   ` Alex Bennée
@ 2018-08-09 18:50     ` Richard Henderson
  0 siblings, 0 replies; 51+ messages in thread
From: Richard Henderson @ 2018-08-09 18:50 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel, laurent.desnogues, peter.maydell

On 08/09/2018 11:01 AM, Alex Bennée wrote:
> 
> Richard Henderson <richard.henderson@linaro.org> writes:
> 
>> We are going to want to determine whether sve is enabled
>> for an EL other than the current one.
>>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> 
> Are these patches meant to apply to origin/master or on top of the
> user-mode fixes? This didn't apply for me:

On top of the user-mode fixes.

Based-on: <20180809034033.10579-1-richard.henderson@linaro.org>

And of course you can see it all on my sve-3.1 branch.


r~

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Qemu-devel] [PATCH 12/20] target/arm: Rewrite helper_sve_ld1*_r using pages
  2018-08-09  4:21 ` [Qemu-devel] [PATCH 12/20] target/arm: Rewrite helper_sve_ld1*_r using pages Richard Henderson
@ 2018-08-10  9:13   ` Alex Bennée
  2018-08-10 19:15     ` Richard Henderson
  2018-08-23 16:01   ` Peter Maydell
  1 sibling, 1 reply; 51+ messages in thread
From: Alex Bennée @ 2018-08-10  9:13 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, laurent.desnogues, peter.maydell


Richard Henderson <richard.henderson@linaro.org> writes:

> Uses tlb_vaddr_to_host for correct operation with softmmu.
> Optimize for accesses within a single page or pair of pages.
>
> Perf report comparison for cortex-strings test-strlen
> with aarch64-linux-user:
>
<snip>
> +/*
> + * Common helper for all contiguous one-register predicated loads.
> + */
> +static void sve_ld1_r(CPUARMState *env, void *vg, const target_ulong addr,
> +                      uint32_t desc, const uintptr_t retaddr,
> +                      const int esz, const int msz,
> +                      sve_ld1_host_fn *host_fn,
> +                      sve_ld1_tlb_fn *tlb_fn)
> +{
> +    void *vd = &env->vfp.zregs[simd_data(desc)];
> +    const int diffsz = esz - msz;
> +    const intptr_t reg_max = simd_oprsz(desc);
> +    const intptr_t mem_max = reg_max >> diffsz;
> +    const int mmu_idx = cpu_mmu_index(env, false);
> +    ARMVectorReg scratch;
> +    void *host, *result;
> +    intptr_t split;
> +
> +    set_helper_retaddr(retaddr);
> +
> +    host = tlb_vaddr_to_host(env, addr, MMU_DATA_LOAD, mmu_idx);
> +    if (test_host_page(host)) {
> +        split = max_for_page(addr, 0, mem_max);
> +        if (likely(split == mem_max)) {
> +            /* The load is entirely within a valid page.  For softmmu,
> +             * no faults.  For user-only, if the first byte does not
> +             * fault then none of them will fault, so Vd will never be
> +             * partially modified.
> +             */
> +            host_fn(vd, vg, host, 0, mem_max);
> +            set_helper_retaddr(0);
> +            return;
> +        }
> +    }
> +
> +    /* Perform the predicated read into a temporary, thus ensuring
> +     * if the load of the last element faults, Vd is not modified.
> +     */
> +    result = &scratch;
> +#ifdef CONFIG_USER_ONLY
> +    host_fn(vd, vg, host, 0, mem_max);
> +#else
> +    memset(result, 0, reg_max);
> +    for (intptr_t reg_off = find_next_active(vg, 0, reg_max, esz);

Hmm, this blew up CI complaining about C99-isms, but QEMU is supposed
to be C99 compliant.

  https://travis-ci.org/stsquad/qemu/builds/414248994
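
(For reference: the C99-ism is the variable declaration inside the
for-initializer.  A gnu89 build wants the declaration hoisted out of
the loop header, roughly

    intptr_t reg_off;

    for (reg_off = find_next_active(vg, 0, reg_max, esz); ...

with the rest of the loop left as it is.)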

--
Alex Bennée

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Qemu-devel] [PATCH 12/20] target/arm: Rewrite helper_sve_ld1*_r using pages
  2018-08-10  9:13   ` Alex Bennée
@ 2018-08-10 19:15     ` Richard Henderson
  0 siblings, 0 replies; 51+ messages in thread
From: Richard Henderson @ 2018-08-10 19:15 UTC (permalink / raw)
  To: Alex Bennée; +Cc: qemu-devel, laurent.desnogues, peter.maydell

On 08/10/2018 02:13 AM, Alex Bennée wrote:
>> +    for (intptr_t reg_off = find_next_active(vg, 0, reg_max, esz);
> 
> Hmm this blew up CI complaining about c99-isms, but QEMU is supposed to
> be c99 compliant.
> 
>   https://travis-ci.org/stsquad/qemu/builds/414248994

Bah.  That's what I get for doing two things at once on
different projects with different standards.

On the other hand, would anyone seriously object to me
adding -std=gnu99 to the compile flags?  C99's 20th
birthday is coming up next year...


r~

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Qemu-devel] [PATCH 15/20] target/arm: Split contiguous loads for endianness
  2018-08-09  4:22 ` [Qemu-devel] [PATCH 15/20] target/arm: Split contiguous loads for endianness Richard Henderson
@ 2018-08-11  5:40   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 51+ messages in thread
From: Philippe Mathieu-Daudé @ 2018-08-11  5:40 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: laurent.desnogues, peter.maydell, alex.bennee

On 08/09/2018 01:22 AM, Richard Henderson wrote:
> We can choose the endianness at translation time, rather than
> re-computing it at execution time.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>

> ---
>  target/arm/helper-sve.h    | 117 +++++++++++++++-------
>  target/arm/sve_helper.c    |  70 ++++++-------
>  target/arm/translate-sve.c | 196 +++++++++++++++++++++++++------------
>  3 files changed, 252 insertions(+), 131 deletions(-)
> 
> diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
> index 023952a9a4..526caec8da 100644
> --- a/target/arm/helper-sve.h
> +++ b/target/arm/helper-sve.h
> @@ -1128,20 +1128,35 @@ DEF_HELPER_FLAGS_4(sve_ld2bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
>  DEF_HELPER_FLAGS_4(sve_ld3bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
>  DEF_HELPER_FLAGS_4(sve_ld4bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
>  
> -DEF_HELPER_FLAGS_4(sve_ld1hh_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> -DEF_HELPER_FLAGS_4(sve_ld2hh_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> -DEF_HELPER_FLAGS_4(sve_ld3hh_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> -DEF_HELPER_FLAGS_4(sve_ld4hh_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ld1hh_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ld2hh_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ld3hh_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ld4hh_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
>  
> -DEF_HELPER_FLAGS_4(sve_ld1ss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> -DEF_HELPER_FLAGS_4(sve_ld2ss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> -DEF_HELPER_FLAGS_4(sve_ld3ss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> -DEF_HELPER_FLAGS_4(sve_ld4ss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ld1hh_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ld2hh_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ld3hh_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ld4hh_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
>  
> -DEF_HELPER_FLAGS_4(sve_ld1dd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> -DEF_HELPER_FLAGS_4(sve_ld2dd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> -DEF_HELPER_FLAGS_4(sve_ld3dd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> -DEF_HELPER_FLAGS_4(sve_ld4dd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ld1ss_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ld2ss_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ld3ss_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ld4ss_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +
> +DEF_HELPER_FLAGS_4(sve_ld1ss_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ld2ss_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ld3ss_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ld4ss_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +
> +DEF_HELPER_FLAGS_4(sve_ld1dd_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ld2dd_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ld3dd_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ld4dd_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +
> +DEF_HELPER_FLAGS_4(sve_ld1dd_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ld2dd_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ld3dd_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ld4dd_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
>  
>  DEF_HELPER_FLAGS_4(sve_ld1bhu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
>  DEF_HELPER_FLAGS_4(sve_ld1bsu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> @@ -1150,13 +1165,21 @@ DEF_HELPER_FLAGS_4(sve_ld1bhs_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
>  DEF_HELPER_FLAGS_4(sve_ld1bss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
>  DEF_HELPER_FLAGS_4(sve_ld1bds_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
>  
> -DEF_HELPER_FLAGS_4(sve_ld1hsu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> -DEF_HELPER_FLAGS_4(sve_ld1hdu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> -DEF_HELPER_FLAGS_4(sve_ld1hss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> -DEF_HELPER_FLAGS_4(sve_ld1hds_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ld1hsu_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ld1hdu_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ld1hss_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ld1hds_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
>  
> -DEF_HELPER_FLAGS_4(sve_ld1sdu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> -DEF_HELPER_FLAGS_4(sve_ld1sds_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ld1hsu_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ld1hdu_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ld1hss_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ld1hds_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +
> +DEF_HELPER_FLAGS_4(sve_ld1sdu_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ld1sds_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +
> +DEF_HELPER_FLAGS_4(sve_ld1sdu_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ld1sds_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
>  
>  DEF_HELPER_FLAGS_4(sve_ldff1bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
>  DEF_HELPER_FLAGS_4(sve_ldff1bhu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> @@ -1166,17 +1189,28 @@ DEF_HELPER_FLAGS_4(sve_ldff1bhs_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
>  DEF_HELPER_FLAGS_4(sve_ldff1bss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
>  DEF_HELPER_FLAGS_4(sve_ldff1bds_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
>  
> -DEF_HELPER_FLAGS_4(sve_ldff1hh_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> -DEF_HELPER_FLAGS_4(sve_ldff1hsu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> -DEF_HELPER_FLAGS_4(sve_ldff1hdu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> -DEF_HELPER_FLAGS_4(sve_ldff1hss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> -DEF_HELPER_FLAGS_4(sve_ldff1hds_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ldff1hh_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ldff1hsu_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ldff1hdu_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ldff1hss_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ldff1hds_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
>  
> -DEF_HELPER_FLAGS_4(sve_ldff1ss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> -DEF_HELPER_FLAGS_4(sve_ldff1sdu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> -DEF_HELPER_FLAGS_4(sve_ldff1sds_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ldff1hh_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ldff1hsu_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ldff1hdu_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ldff1hss_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ldff1hds_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
>  
> -DEF_HELPER_FLAGS_4(sve_ldff1dd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ldff1ss_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ldff1sdu_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ldff1sds_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +
> +DEF_HELPER_FLAGS_4(sve_ldff1ss_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ldff1sdu_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ldff1sds_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +
> +DEF_HELPER_FLAGS_4(sve_ldff1dd_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ldff1dd_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
>  
>  DEF_HELPER_FLAGS_4(sve_ldnf1bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
>  DEF_HELPER_FLAGS_4(sve_ldnf1bhu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> @@ -1186,17 +1220,28 @@ DEF_HELPER_FLAGS_4(sve_ldnf1bhs_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
>  DEF_HELPER_FLAGS_4(sve_ldnf1bss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
>  DEF_HELPER_FLAGS_4(sve_ldnf1bds_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
>  
> -DEF_HELPER_FLAGS_4(sve_ldnf1hh_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> -DEF_HELPER_FLAGS_4(sve_ldnf1hsu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> -DEF_HELPER_FLAGS_4(sve_ldnf1hdu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> -DEF_HELPER_FLAGS_4(sve_ldnf1hss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> -DEF_HELPER_FLAGS_4(sve_ldnf1hds_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ldnf1hh_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ldnf1hsu_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ldnf1hdu_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ldnf1hss_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ldnf1hds_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
>  
> -DEF_HELPER_FLAGS_4(sve_ldnf1ss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> -DEF_HELPER_FLAGS_4(sve_ldnf1sdu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> -DEF_HELPER_FLAGS_4(sve_ldnf1sds_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ldnf1hh_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ldnf1hsu_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ldnf1hdu_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ldnf1hss_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ldnf1hds_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
>  
> -DEF_HELPER_FLAGS_4(sve_ldnf1dd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ldnf1ss_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ldnf1sdu_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ldnf1sds_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +
> +DEF_HELPER_FLAGS_4(sve_ldnf1ss_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ldnf1sdu_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ldnf1sds_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +
> +DEF_HELPER_FLAGS_4(sve_ldnf1dd_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ldnf1dd_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
>  
>  DEF_HELPER_FLAGS_4(sve_st1bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
>  DEF_HELPER_FLAGS_4(sve_st2bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
> index 4eae6569cc..56e2f523c5 100644
> --- a/target/arm/sve_helper.c
> +++ b/target/arm/sve_helper.c
> @@ -4362,18 +4362,18 @@ void HELPER(sve_##NAME##_r)(CPUARMState *env, void *vg,        \
>                sve_##NAME##_host, sve_##NAME##_tlb);            \
>  }
>  
> -/* TODO: Propagate the endian check back to the translator.  */
>  #define DO_LD1_2(NAME, ESZ, MSZ) \
> -void HELPER(sve_##NAME##_r)(CPUARMState *env, void *vg,        \
> -                            target_ulong addr, uint32_t desc)  \
> -{                                                              \
> -    if (arm_cpu_data_is_big_endian(env)) {                     \
> -        sve_ld1_r(env, vg, addr, desc, GETPC(), ESZ, MSZ,      \
> -                  sve_##NAME##_be_host, sve_##NAME##_be_tlb);  \
> -    } else {                                                   \
> -        sve_ld1_r(env, vg, addr, desc, GETPC(), ESZ, MSZ,      \
> -                  sve_##NAME##_le_host, sve_##NAME##_le_tlb);  \
> -    }                                                          \
> +void HELPER(sve_##NAME##_le_r)(CPUARMState *env, void *vg,        \
> +                               target_ulong addr, uint32_t desc)  \
> +{                                                                 \
> +    sve_ld1_r(env, vg, addr, desc, GETPC(), ESZ, MSZ,             \
> +              sve_##NAME##_le_host, sve_##NAME##_le_tlb);         \
> +}                                                                 \
> +void HELPER(sve_##NAME##_be_r)(CPUARMState *env, void *vg,        \
> +                               target_ulong addr, uint32_t desc)  \
> +{                                                                 \
> +    sve_ld1_r(env, vg, addr, desc, GETPC(), ESZ, MSZ,             \
> +              sve_##NAME##_be_host, sve_##NAME##_be_tlb);         \
>  }
>  
>  DO_LD1_1(ld1bb,  0)
> @@ -4500,12 +4500,17 @@ void __attribute__((flatten)) HELPER(sve_ld##N##bb_r)               \
>  }
>  
>  #define DO_LDN_2(N, SUFF, SIZE)                                       \
> -void __attribute__((flatten)) HELPER(sve_ld##N##SUFF##_r)             \
> +void __attribute__((flatten)) HELPER(sve_ld##N##SUFF##_le_r)          \
>      (CPUARMState *env, void *vg, target_ulong addr, uint32_t desc)    \
>  {                                                                     \
>      sve_ld##N##_r(env, vg, addr, desc, SIZE, GETPC(),                 \
> -                  arm_cpu_data_is_big_endian(env)                     \
> -                  ? sve_ld1##SUFF##_be_tlb : sve_ld1##SUFF##_le_tlb); \
> +                  sve_ld1##SUFF##_le_tlb);                            \
> +}                                                                     \
> +void __attribute__((flatten)) HELPER(sve_ld##N##SUFF##_be_r)          \
> +    (CPUARMState *env, void *vg, target_ulong addr, uint32_t desc)    \
> +{                                                                     \
> +    sve_ld##N##_r(env, vg, addr, desc, SIZE, GETPC(),                 \
> +                  sve_ld1##SUFF##_be_tlb);                            \
>  }
>  
>  DO_LDN_1(2)
> @@ -4722,29 +4727,28 @@ void HELPER(sve_ldnf1##PART##_r)(CPUARMState *env, void *vg,            \
>      sve_ldnf1_r(env, vg, addr, desc, ESZ, 0, sve_ld1##PART##_host);     \
>  }
>  
> -/* TODO: Propagate the endian check back to the translator.  */
>  #define DO_LDFF1_LDNF1_2(PART, ESZ, MSZ) \
> -void HELPER(sve_ldff1##PART##_r)(CPUARMState *env, void *vg,            \
> -                                 target_ulong addr, uint32_t desc)      \
> +void HELPER(sve_ldff1##PART##_le_r)(CPUARMState *env, void *vg,         \
> +                                    target_ulong addr, uint32_t desc)   \
>  {                                                                       \
> -    if (arm_cpu_data_is_big_endian(env)) {                              \
> -        sve_ldff1_r(env, vg, addr, desc, GETPC(), ESZ, MSZ,             \
> -                    sve_ld1##PART##_be_host, sve_ld1##PART##_be_tlb);   \
> -    } else {                                                            \
> -        sve_ldff1_r(env, vg, addr, desc, GETPC(), ESZ, MSZ,             \
> -                    sve_ld1##PART##_le_host, sve_ld1##PART##_le_tlb);   \
> -    }                                                                   \
> +    sve_ldff1_r(env, vg, addr, desc, GETPC(), ESZ, MSZ,                 \
> +                sve_ld1##PART##_le_host, sve_ld1##PART##_le_tlb);       \
>  }                                                                       \
> -void HELPER(sve_ldnf1##PART##_r)(CPUARMState *env, void *vg,            \
> -                                 target_ulong addr, uint32_t desc)      \
> +void HELPER(sve_ldnf1##PART##_le_r)(CPUARMState *env, void *vg,         \
> +                                    target_ulong addr, uint32_t desc)   \
>  {                                                                       \
> -    if (arm_cpu_data_is_big_endian(env)) {                              \
> -        sve_ldnf1_r(env, vg, addr, desc, ESZ, MSZ,                      \
> -                    sve_ld1##PART##_be_host);                           \
> -    } else {                                                            \
> -        sve_ldnf1_r(env, vg, addr, desc, ESZ, MSZ,                      \
> -                    sve_ld1##PART##_le_host);                           \
> -    }                                                                   \
> +    sve_ldnf1_r(env, vg, addr, desc, ESZ, MSZ, sve_ld1##PART##_le_host); \
> +}                                                                       \
> +void HELPER(sve_ldff1##PART##_be_r)(CPUARMState *env, void *vg,         \
> +                                    target_ulong addr, uint32_t desc)   \
> +{                                                                       \
> +    sve_ldff1_r(env, vg, addr, desc, GETPC(), ESZ, MSZ,                 \
> +                sve_ld1##PART##_be_host, sve_ld1##PART##_be_tlb);       \
> +}                                                                       \
> +void HELPER(sve_ldnf1##PART##_be_r)(CPUARMState *env, void *vg,         \
> +                                    target_ulong addr, uint32_t desc)   \
> +{                                                                       \
> +    sve_ldnf1_r(env, vg, addr, desc, ESZ, MSZ, sve_ld1##PART##_be_host); \
>  }
>  
>  DO_LDFF1_LDNF1_1(bb,  0)
> diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
> index bef6b8242d..de12c01e7d 100644
> --- a/target/arm/translate-sve.c
> +++ b/target/arm/translate-sve.c
> @@ -4624,32 +4624,58 @@ static void do_mem_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr,
>  static void do_ld_zpa(DisasContext *s, int zt, int pg,
>                        TCGv_i64 addr, int dtype, int nreg)
>  {
> -    static gen_helper_gvec_mem * const fns[16][4] = {
> -        { gen_helper_sve_ld1bb_r, gen_helper_sve_ld2bb_r,
> -          gen_helper_sve_ld3bb_r, gen_helper_sve_ld4bb_r },
> -        { gen_helper_sve_ld1bhu_r, NULL, NULL, NULL },
> -        { gen_helper_sve_ld1bsu_r, NULL, NULL, NULL },
> -        { gen_helper_sve_ld1bdu_r, NULL, NULL, NULL },
> +    static gen_helper_gvec_mem * const fns[2][16][4] = {
> +        /* Little-endian */
> +        { { gen_helper_sve_ld1bb_r, gen_helper_sve_ld2bb_r,
> +            gen_helper_sve_ld3bb_r, gen_helper_sve_ld4bb_r },
> +          { gen_helper_sve_ld1bhu_r, NULL, NULL, NULL },
> +          { gen_helper_sve_ld1bsu_r, NULL, NULL, NULL },
> +          { gen_helper_sve_ld1bdu_r, NULL, NULL, NULL },
>  
> -        { gen_helper_sve_ld1sds_r, NULL, NULL, NULL },
> -        { gen_helper_sve_ld1hh_r, gen_helper_sve_ld2hh_r,
> -          gen_helper_sve_ld3hh_r, gen_helper_sve_ld4hh_r },
> -        { gen_helper_sve_ld1hsu_r, NULL, NULL, NULL },
> -        { gen_helper_sve_ld1hdu_r, NULL, NULL, NULL },
> +          { gen_helper_sve_ld1sds_le_r, NULL, NULL, NULL },
> +          { gen_helper_sve_ld1hh_le_r, gen_helper_sve_ld2hh_le_r,
> +            gen_helper_sve_ld3hh_le_r, gen_helper_sve_ld4hh_le_r },
> +          { gen_helper_sve_ld1hsu_le_r, NULL, NULL, NULL },
> +          { gen_helper_sve_ld1hdu_le_r, NULL, NULL, NULL },
>  
> -        { gen_helper_sve_ld1hds_r, NULL, NULL, NULL },
> -        { gen_helper_sve_ld1hss_r, NULL, NULL, NULL },
> -        { gen_helper_sve_ld1ss_r, gen_helper_sve_ld2ss_r,
> -          gen_helper_sve_ld3ss_r, gen_helper_sve_ld4ss_r },
> -        { gen_helper_sve_ld1sdu_r, NULL, NULL, NULL },
> +          { gen_helper_sve_ld1hds_le_r, NULL, NULL, NULL },
> +          { gen_helper_sve_ld1hss_le_r, NULL, NULL, NULL },
> +          { gen_helper_sve_ld1ss_le_r, gen_helper_sve_ld2ss_le_r,
> +            gen_helper_sve_ld3ss_le_r, gen_helper_sve_ld4ss_le_r },
> +          { gen_helper_sve_ld1sdu_le_r, NULL, NULL, NULL },
>  
> -        { gen_helper_sve_ld1bds_r, NULL, NULL, NULL },
> -        { gen_helper_sve_ld1bss_r, NULL, NULL, NULL },
> -        { gen_helper_sve_ld1bhs_r, NULL, NULL, NULL },
> -        { gen_helper_sve_ld1dd_r, gen_helper_sve_ld2dd_r,
> -          gen_helper_sve_ld3dd_r, gen_helper_sve_ld4dd_r },
> +          { gen_helper_sve_ld1bds_r, NULL, NULL, NULL },
> +          { gen_helper_sve_ld1bss_r, NULL, NULL, NULL },
> +          { gen_helper_sve_ld1bhs_r, NULL, NULL, NULL },
> +          { gen_helper_sve_ld1dd_le_r, gen_helper_sve_ld2dd_le_r,
> +            gen_helper_sve_ld3dd_le_r, gen_helper_sve_ld4dd_le_r } },
> +
> +        /* Big-endian */
> +        { { gen_helper_sve_ld1bb_r, gen_helper_sve_ld2bb_r,
> +            gen_helper_sve_ld3bb_r, gen_helper_sve_ld4bb_r },
> +          { gen_helper_sve_ld1bhu_r, NULL, NULL, NULL },
> +          { gen_helper_sve_ld1bsu_r, NULL, NULL, NULL },
> +          { gen_helper_sve_ld1bdu_r, NULL, NULL, NULL },
> +
> +          { gen_helper_sve_ld1sds_be_r, NULL, NULL, NULL },
> +          { gen_helper_sve_ld1hh_be_r, gen_helper_sve_ld2hh_be_r,
> +            gen_helper_sve_ld3hh_be_r, gen_helper_sve_ld4hh_be_r },
> +          { gen_helper_sve_ld1hsu_be_r, NULL, NULL, NULL },
> +          { gen_helper_sve_ld1hdu_be_r, NULL, NULL, NULL },
> +
> +          { gen_helper_sve_ld1hds_be_r, NULL, NULL, NULL },
> +          { gen_helper_sve_ld1hss_be_r, NULL, NULL, NULL },
> +          { gen_helper_sve_ld1ss_be_r, gen_helper_sve_ld2ss_be_r,
> +            gen_helper_sve_ld3ss_be_r, gen_helper_sve_ld4ss_be_r },
> +          { gen_helper_sve_ld1sdu_be_r, NULL, NULL, NULL },
> +
> +          { gen_helper_sve_ld1bds_r, NULL, NULL, NULL },
> +          { gen_helper_sve_ld1bss_r, NULL, NULL, NULL },
> +          { gen_helper_sve_ld1bhs_r, NULL, NULL, NULL },
> +          { gen_helper_sve_ld1dd_be_r, gen_helper_sve_ld2dd_be_r,
> +            gen_helper_sve_ld3dd_be_r, gen_helper_sve_ld4dd_be_r } }
>      };
> -    gen_helper_gvec_mem *fn = fns[dtype][nreg];
> +    gen_helper_gvec_mem *fn = fns[s->be_data == MO_BE][dtype][nreg];
>  
>      /* While there are holes in the table, they are not
>       * accessible via the instruction encoding.
> @@ -4689,59 +4715,103 @@ static bool trans_LD_zpri(DisasContext *s, arg_rpri_load *a, uint32_t insn)
>  
>  static bool trans_LDFF1_zprr(DisasContext *s, arg_rprr_load *a, uint32_t insn)
>  {
> -    static gen_helper_gvec_mem * const fns[16] = {
> -        gen_helper_sve_ldff1bb_r,
> -        gen_helper_sve_ldff1bhu_r,
> -        gen_helper_sve_ldff1bsu_r,
> -        gen_helper_sve_ldff1bdu_r,
> +    static gen_helper_gvec_mem * const fns[2][16] = {
> +        /* Little-endian */
> +        { gen_helper_sve_ldff1bb_r,
> +          gen_helper_sve_ldff1bhu_r,
> +          gen_helper_sve_ldff1bsu_r,
> +          gen_helper_sve_ldff1bdu_r,
>  
> -        gen_helper_sve_ldff1sds_r,
> -        gen_helper_sve_ldff1hh_r,
> -        gen_helper_sve_ldff1hsu_r,
> -        gen_helper_sve_ldff1hdu_r,
> +          gen_helper_sve_ldff1sds_le_r,
> +          gen_helper_sve_ldff1hh_le_r,
> +          gen_helper_sve_ldff1hsu_le_r,
> +          gen_helper_sve_ldff1hdu_le_r,
>  
> -        gen_helper_sve_ldff1hds_r,
> -        gen_helper_sve_ldff1hss_r,
> -        gen_helper_sve_ldff1ss_r,
> -        gen_helper_sve_ldff1sdu_r,
> +          gen_helper_sve_ldff1hds_le_r,
> +          gen_helper_sve_ldff1hss_le_r,
> +          gen_helper_sve_ldff1ss_le_r,
> +          gen_helper_sve_ldff1sdu_le_r,
>  
> -        gen_helper_sve_ldff1bds_r,
> -        gen_helper_sve_ldff1bss_r,
> -        gen_helper_sve_ldff1bhs_r,
> -        gen_helper_sve_ldff1dd_r,
> +          gen_helper_sve_ldff1bds_r,
> +          gen_helper_sve_ldff1bss_r,
> +          gen_helper_sve_ldff1bhs_r,
> +          gen_helper_sve_ldff1dd_le_r },
> +
> +        /* Big-endian */
> +        { gen_helper_sve_ldff1bb_r,
> +          gen_helper_sve_ldff1bhu_r,
> +          gen_helper_sve_ldff1bsu_r,
> +          gen_helper_sve_ldff1bdu_r,
> +
> +          gen_helper_sve_ldff1sds_be_r,
> +          gen_helper_sve_ldff1hh_be_r,
> +          gen_helper_sve_ldff1hsu_be_r,
> +          gen_helper_sve_ldff1hdu_be_r,
> +
> +          gen_helper_sve_ldff1hds_be_r,
> +          gen_helper_sve_ldff1hss_be_r,
> +          gen_helper_sve_ldff1ss_be_r,
> +          gen_helper_sve_ldff1sdu_be_r,
> +
> +          gen_helper_sve_ldff1bds_r,
> +          gen_helper_sve_ldff1bss_r,
> +          gen_helper_sve_ldff1bhs_r,
> +          gen_helper_sve_ldff1dd_be_r },
>      };
>  
>      if (sve_access_check(s)) {
>          TCGv_i64 addr = new_tmp_a64(s);
>          tcg_gen_shli_i64(addr, cpu_reg(s, a->rm), dtype_msz(a->dtype));
>          tcg_gen_add_i64(addr, addr, cpu_reg_sp(s, a->rn));
> -        do_mem_zpa(s, a->rd, a->pg, addr, fns[a->dtype]);
> +        do_mem_zpa(s, a->rd, a->pg, addr, fns[s->be_data == MO_BE][a->dtype]);
>      }
>      return true;
>  }
>  
>  static bool trans_LDNF1_zpri(DisasContext *s, arg_rpri_load *a, uint32_t insn)
>  {
> -    static gen_helper_gvec_mem * const fns[16] = {
> -        gen_helper_sve_ldnf1bb_r,
> -        gen_helper_sve_ldnf1bhu_r,
> -        gen_helper_sve_ldnf1bsu_r,
> -        gen_helper_sve_ldnf1bdu_r,
> +    static gen_helper_gvec_mem * const fns[2][16] = {
> +        /* Little-endian */
> +        { gen_helper_sve_ldnf1bb_r,
> +          gen_helper_sve_ldnf1bhu_r,
> +          gen_helper_sve_ldnf1bsu_r,
> +          gen_helper_sve_ldnf1bdu_r,
>  
> -        gen_helper_sve_ldnf1sds_r,
> -        gen_helper_sve_ldnf1hh_r,
> -        gen_helper_sve_ldnf1hsu_r,
> -        gen_helper_sve_ldnf1hdu_r,
> +          gen_helper_sve_ldnf1sds_le_r,
> +          gen_helper_sve_ldnf1hh_le_r,
> +          gen_helper_sve_ldnf1hsu_le_r,
> +          gen_helper_sve_ldnf1hdu_le_r,
>  
> -        gen_helper_sve_ldnf1hds_r,
> -        gen_helper_sve_ldnf1hss_r,
> -        gen_helper_sve_ldnf1ss_r,
> -        gen_helper_sve_ldnf1sdu_r,
> +          gen_helper_sve_ldnf1hds_le_r,
> +          gen_helper_sve_ldnf1hss_le_r,
> +          gen_helper_sve_ldnf1ss_le_r,
> +          gen_helper_sve_ldnf1sdu_le_r,
>  
> -        gen_helper_sve_ldnf1bds_r,
> -        gen_helper_sve_ldnf1bss_r,
> -        gen_helper_sve_ldnf1bhs_r,
> -        gen_helper_sve_ldnf1dd_r,
> +          gen_helper_sve_ldnf1bds_r,
> +          gen_helper_sve_ldnf1bss_r,
> +          gen_helper_sve_ldnf1bhs_r,
> +          gen_helper_sve_ldnf1dd_le_r },
> +
> +        /* Big-endian */
> +        { gen_helper_sve_ldnf1bb_r,
> +          gen_helper_sve_ldnf1bhu_r,
> +          gen_helper_sve_ldnf1bsu_r,
> +          gen_helper_sve_ldnf1bdu_r,
> +
> +          gen_helper_sve_ldnf1sds_be_r,
> +          gen_helper_sve_ldnf1hh_be_r,
> +          gen_helper_sve_ldnf1hsu_be_r,
> +          gen_helper_sve_ldnf1hdu_be_r,
> +
> +          gen_helper_sve_ldnf1hds_be_r,
> +          gen_helper_sve_ldnf1hss_be_r,
> +          gen_helper_sve_ldnf1ss_be_r,
> +          gen_helper_sve_ldnf1sdu_be_r,
> +
> +          gen_helper_sve_ldnf1bds_r,
> +          gen_helper_sve_ldnf1bss_r,
> +          gen_helper_sve_ldnf1bhs_r,
> +          gen_helper_sve_ldnf1dd_be_r },
>      };
>  
>      if (sve_access_check(s)) {
> @@ -4751,16 +4821,18 @@ static bool trans_LDNF1_zpri(DisasContext *s, arg_rpri_load *a, uint32_t insn)
>          TCGv_i64 addr = new_tmp_a64(s);
>  
>          tcg_gen_addi_i64(addr, cpu_reg_sp(s, a->rn), off);
> -        do_mem_zpa(s, a->rd, a->pg, addr, fns[a->dtype]);
> +        do_mem_zpa(s, a->rd, a->pg, addr, fns[s->be_data == MO_BE][a->dtype]);
>      }
>      return true;
>  }
>  
>  static void do_ldrq(DisasContext *s, int zt, int pg, TCGv_i64 addr, int msz)
>  {
> -    static gen_helper_gvec_mem * const fns[4] = {
> -        gen_helper_sve_ld1bb_r, gen_helper_sve_ld1hh_r,
> -        gen_helper_sve_ld1ss_r, gen_helper_sve_ld1dd_r,
> +    static gen_helper_gvec_mem * const fns[2][4] = {
> +        { gen_helper_sve_ld1bb_r,    gen_helper_sve_ld1hh_le_r,
> +          gen_helper_sve_ld1ss_le_r, gen_helper_sve_ld1dd_le_r },
> +        { gen_helper_sve_ld1bb_r,    gen_helper_sve_ld1hh_be_r,
> +          gen_helper_sve_ld1ss_be_r, gen_helper_sve_ld1dd_be_r },
>      };
>      unsigned vsz = vec_full_reg_size(s);
>      TCGv_ptr t_pg;
> @@ -4792,7 +4864,7 @@ static void do_ldrq(DisasContext *s, int zt, int pg, TCGv_i64 addr, int msz)
>      t_pg = tcg_temp_new_ptr();
>      tcg_gen_addi_ptr(t_pg, cpu_env, poff);
>  
> -    fns[msz](cpu_env, t_pg, addr, desc);
> +    fns[s->be_data == MO_BE][msz](cpu_env, t_pg, addr, desc);
>  
>      tcg_temp_free_ptr(t_pg);
>      tcg_temp_free_i32(desc);
> 
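
The point of the split visible in the diff above is that the endian
choice moves from run time to translate time.  A minimal, self-contained
toy (not QEMU code; names are placeholders) of that dispatch pattern:

    #include <stdint.h>
    #include <stdio.h>

    /* The endian index is computed once, when the instruction is
     * translated, so the generated code calls an endian-specific helper
     * directly instead of testing the CPU data endianness on every
     * execution. */
    typedef void load_helper_fn(uint64_t addr);

    static void ld1hh_le(uint64_t addr)
    {
        printf("LE halfword load @ %#llx\n", (unsigned long long)addr);
    }

    static void ld1hh_be(uint64_t addr)
    {
        printf("BE halfword load @ %#llx\n", (unsigned long long)addr);
    }

    /* Index 0 = little-endian, 1 = big-endian, mirroring fns[2][...]. */
    static load_helper_fn * const ld1hh[2] = { ld1hh_le, ld1hh_be };

    int main(void)
    {
        int be_data = 0;                     /* s->be_data == MO_BE */
        load_helper_fn *fn = ld1hh[be_data]; /* picked at "translate time" */
        fn(0x1000);   /* executed later without re-checking endianness */
        return 0;
    }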


* Re: [Qemu-devel] [PATCH 16/20] target/arm: Split contiguous stores for endianness
  2018-08-09  4:22 ` [Qemu-devel] [PATCH 16/20] target/arm: Split contiguous stores " Richard Henderson
@ 2018-08-11  5:41   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 51+ messages in thread
From: Philippe Mathieu-Daudé @ 2018-08-11  5:41 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: laurent.desnogues, peter.maydell, alex.bennee

On 08/09/2018 01:22 AM, Richard Henderson wrote:
> We can choose the endianness at translation time, rather than
> re-computing it at execution time.
> 
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>

> ---
>  target/arm/helper-sve.h    | 48 +++++++++++++++++--------
>  target/arm/sve_helper.c    | 11 ++++--
>  target/arm/translate-sve.c | 72 +++++++++++++++++++++++++++++---------
>  3 files changed, 96 insertions(+), 35 deletions(-)
> 
> diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
> index 526caec8da..1ad043101a 100644
> --- a/target/arm/helper-sve.h
> +++ b/target/arm/helper-sve.h
> @@ -1248,29 +1248,47 @@ DEF_HELPER_FLAGS_4(sve_st2bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
>  DEF_HELPER_FLAGS_4(sve_st3bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
>  DEF_HELPER_FLAGS_4(sve_st4bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
>  
> -DEF_HELPER_FLAGS_4(sve_st1hh_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> -DEF_HELPER_FLAGS_4(sve_st2hh_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> -DEF_HELPER_FLAGS_4(sve_st3hh_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> -DEF_HELPER_FLAGS_4(sve_st4hh_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_st1hh_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_st2hh_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_st3hh_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_st4hh_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
>  
> -DEF_HELPER_FLAGS_4(sve_st1ss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> -DEF_HELPER_FLAGS_4(sve_st2ss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> -DEF_HELPER_FLAGS_4(sve_st3ss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> -DEF_HELPER_FLAGS_4(sve_st4ss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_st1hh_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_st2hh_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_st3hh_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_st4hh_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
>  
> -DEF_HELPER_FLAGS_4(sve_st1dd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> -DEF_HELPER_FLAGS_4(sve_st2dd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> -DEF_HELPER_FLAGS_4(sve_st3dd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> -DEF_HELPER_FLAGS_4(sve_st4dd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_st1ss_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_st2ss_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_st3ss_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_st4ss_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +
> +DEF_HELPER_FLAGS_4(sve_st1ss_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_st2ss_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_st3ss_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_st4ss_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +
> +DEF_HELPER_FLAGS_4(sve_st1dd_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_st2dd_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_st3dd_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_st4dd_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +
> +DEF_HELPER_FLAGS_4(sve_st1dd_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_st2dd_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_st3dd_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_st4dd_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
>  
>  DEF_HELPER_FLAGS_4(sve_st1bh_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
>  DEF_HELPER_FLAGS_4(sve_st1bs_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
>  DEF_HELPER_FLAGS_4(sve_st1bd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
>  
> -DEF_HELPER_FLAGS_4(sve_st1hs_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> -DEF_HELPER_FLAGS_4(sve_st1hd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_st1hs_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_st1hd_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_st1hs_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_st1hd_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
>  
> -DEF_HELPER_FLAGS_4(sve_st1sd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_st1sd_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_st1sd_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
>  
>  DEF_HELPER_FLAGS_6(sve_ldbsu_zsu, TCG_CALL_NO_WG,
>                     void, env, ptr, ptr, ptr, tl, i32)
> diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
> index 56e2f523c5..92c0e961a9 100644
> --- a/target/arm/sve_helper.c
> +++ b/target/arm/sve_helper.c
> @@ -4940,12 +4940,17 @@ void __attribute__((flatten)) HELPER(sve_st##N##NAME##_r)           \
>  }
>  
>  #define DO_STN_2(N, NAME, ESIZE, MSIZE) \
> -void __attribute__((flatten)) HELPER(sve_st##N##NAME##_r)             \
> +void __attribute__((flatten)) HELPER(sve_st##N##NAME##_le_r)          \
>      (CPUARMState *env, void *vg, target_ulong addr, uint32_t desc)    \
>  {                                                                     \
>      sve_st##N##_r(env, vg, addr, desc, GETPC(), ESIZE, MSIZE,         \
> -                  arm_cpu_data_is_big_endian(env)                     \
> -                  ? sve_st1##NAME##_be_tlb : sve_st1##NAME##_le_tlb); \
> +                  sve_st1##NAME##_le_tlb);                            \
> +}                                                                     \
> +void __attribute__((flatten)) HELPER(sve_st##N##NAME##_be_r)          \
> +    (CPUARMState *env, void *vg, target_ulong addr, uint32_t desc)    \
> +{                                                                     \
> +    sve_st##N##_r(env, vg, addr, desc, GETPC(), ESIZE, MSIZE,         \
> +                  sve_st1##NAME##_be_tlb);                            \
>  }
>  
>  DO_STN_1(1, bb, 1)
> diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
> index de12c01e7d..acb85731f8 100644
> --- a/target/arm/translate-sve.c
> +++ b/target/arm/translate-sve.c
> @@ -4953,32 +4953,70 @@ static bool trans_LD1R_zpri(DisasContext *s, arg_rpri_load *a, uint32_t insn)
>  static void do_st_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr,
>                        int msz, int esz, int nreg)
>  {
> -    static gen_helper_gvec_mem * const fn_single[4][4] = {
> -        { gen_helper_sve_st1bb_r, gen_helper_sve_st1bh_r,
> -          gen_helper_sve_st1bs_r, gen_helper_sve_st1bd_r },
> -        { NULL,                   gen_helper_sve_st1hh_r,
> -          gen_helper_sve_st1hs_r, gen_helper_sve_st1hd_r },
> -        { NULL, NULL,
> -          gen_helper_sve_st1ss_r, gen_helper_sve_st1sd_r },
> -        { NULL, NULL, NULL, gen_helper_sve_st1dd_r },
> +    static gen_helper_gvec_mem * const fn_single[2][4][4] = {
> +        { { gen_helper_sve_st1bb_r,
> +            gen_helper_sve_st1bh_r,
> +            gen_helper_sve_st1bs_r,
> +            gen_helper_sve_st1bd_r },
> +          { NULL,
> +            gen_helper_sve_st1hh_le_r,
> +            gen_helper_sve_st1hs_le_r,
> +            gen_helper_sve_st1hd_le_r },
> +          { NULL, NULL,
> +            gen_helper_sve_st1ss_le_r,
> +            gen_helper_sve_st1sd_le_r },
> +          { NULL, NULL, NULL,
> +            gen_helper_sve_st1dd_le_r } },
> +        { { gen_helper_sve_st1bb_r,
> +            gen_helper_sve_st1bh_r,
> +            gen_helper_sve_st1bs_r,
> +            gen_helper_sve_st1bd_r },
> +          { NULL,
> +            gen_helper_sve_st1hh_be_r,
> +            gen_helper_sve_st1hs_be_r,
> +            gen_helper_sve_st1hd_be_r },
> +          { NULL, NULL,
> +            gen_helper_sve_st1ss_be_r,
> +            gen_helper_sve_st1sd_be_r },
> +          { NULL, NULL, NULL,
> +            gen_helper_sve_st1dd_be_r } },
>      };
> -    static gen_helper_gvec_mem * const fn_multiple[3][4] = {
> -        { gen_helper_sve_st2bb_r, gen_helper_sve_st2hh_r,
> -          gen_helper_sve_st2ss_r, gen_helper_sve_st2dd_r },
> -        { gen_helper_sve_st3bb_r, gen_helper_sve_st3hh_r,
> -          gen_helper_sve_st3ss_r, gen_helper_sve_st3dd_r },
> -        { gen_helper_sve_st4bb_r, gen_helper_sve_st4hh_r,
> -          gen_helper_sve_st4ss_r, gen_helper_sve_st4dd_r },
> +    static gen_helper_gvec_mem * const fn_multiple[2][3][4] = {
> +        { { gen_helper_sve_st2bb_r,
> +            gen_helper_sve_st2hh_le_r,
> +            gen_helper_sve_st2ss_le_r,
> +            gen_helper_sve_st2dd_le_r },
> +          { gen_helper_sve_st3bb_r,
> +            gen_helper_sve_st3hh_le_r,
> +            gen_helper_sve_st3ss_le_r,
> +            gen_helper_sve_st3dd_le_r },
> +          { gen_helper_sve_st4bb_r,
> +            gen_helper_sve_st4hh_le_r,
> +            gen_helper_sve_st4ss_le_r,
> +            gen_helper_sve_st4dd_le_r } },
> +        { { gen_helper_sve_st2bb_r,
> +            gen_helper_sve_st2hh_be_r,
> +            gen_helper_sve_st2ss_be_r,
> +            gen_helper_sve_st2dd_be_r },
> +          { gen_helper_sve_st3bb_r,
> +            gen_helper_sve_st3hh_be_r,
> +            gen_helper_sve_st3ss_be_r,
> +            gen_helper_sve_st3dd_be_r },
> +          { gen_helper_sve_st4bb_r,
> +            gen_helper_sve_st4hh_be_r,
> +            gen_helper_sve_st4ss_be_r,
> +            gen_helper_sve_st4dd_be_r } },
>      };
>      gen_helper_gvec_mem *fn;
> +    int be = s->be_data == MO_BE;
>  
>      if (nreg == 0) {
>          /* ST1 */
> -        fn = fn_single[msz][esz];
> +        fn = fn_single[be][msz][esz];
>      } else {
>          /* ST2, ST3, ST4 -- msz == esz, enforced by encoding */
>          assert(msz == esz);
> -        fn = fn_multiple[nreg - 1][msz];
> +        fn = fn_multiple[be][nreg - 1][msz];
>      }
>      assert(fn != NULL);
>      do_mem_zpa(s, zt, pg, addr, fn);
> 
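
For readers less used to the token-pasting: one instantiation of the
macro above, e.g. DO_STN_2(2, hh, 2, 2) (the ESIZE/MSIZE values are an
assumption here; the invocations are not in this hunk), expands to
roughly this pair of entry points:

    void __attribute__((flatten)) HELPER(sve_st2hh_le_r)
        (CPUARMState *env, void *vg, target_ulong addr, uint32_t desc)
    {
        sve_st2_r(env, vg, addr, desc, GETPC(), 2, 2, sve_st1hh_le_tlb);
    }
    void __attribute__((flatten)) HELPER(sve_st2hh_be_r)
        (CPUARMState *env, void *vg, target_ulong addr, uint32_t desc)
    {
        sve_st2_r(env, vg, addr, desc, GETPC(), 2, 2, sve_st1hh_be_tlb);
    }

so the translator can pick the _le_r or _be_r variant once, from
s->be_data, with no per-execution endianness check left in the helper.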


* Re: [Qemu-devel] [PATCH 03/20] target/arm: Define ID_AA64ZFR0_EL1
  2018-08-09  4:21 ` [Qemu-devel] [PATCH 03/20] target/arm: Define ID_AA64ZFR0_EL1 Richard Henderson
@ 2018-08-17 15:50   ` Peter Maydell
  0 siblings, 0 replies; 51+ messages in thread
From: Peter Maydell @ 2018-08-17 15:50 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, Laurent Desnogues, Alex Bennée

On 9 August 2018 at 05:21, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Given that the only field defined for this new register may only
> be 0, we don't actually need to change anything except the name.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
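
(For context: the register reads as zero, so it can be exposed as a
constant ID register.  A sketch of such an ARMCPRegInfo entry; the
encoding values below are assumed from the ARM ARM, not taken from the
patch, which is not quoted here:)

    { .name = "ID_AA64ZFR0_EL1", .state = ARM_CP_STATE_AA64,
      .opc0 = 3, .opc1 = 0, .crn = 0, .crm = 4, .opc2 = 4,
      .access = PL1_R, .type = ARM_CP_CONST, .resetvalue = 0 },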

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM


* Re: [Qemu-devel] [PATCH 04/20] target/arm: Adjust sve_exception_el
  2018-08-09  4:21 ` [Qemu-devel] [PATCH 04/20] target/arm: Adjust sve_exception_el Richard Henderson
@ 2018-08-17 15:57   ` Peter Maydell
  0 siblings, 0 replies; 51+ messages in thread
From: Peter Maydell @ 2018-08-17 15:57 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, Laurent Desnogues, Alex Bennée

On 9 August 2018 at 05:21, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Check for EL3 before testing CPTR_EL3.EZ.  Return 0 when the exception
> should be routed via AdvSIMDFPAccessTrap.  Mirror the structure of
> CheckSVEEnabled more closely.
>
> Fixes: 5be5e8eda78
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/helper.c | 96 ++++++++++++++++++++++-----------------------
>  1 file changed, 46 insertions(+), 50 deletions(-)
>
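
Concretely, the reordering described in the commit message amounts to a
guard along these lines (a sketch, not the full function; CPTR_EZ stands
for the existing CPTR_EL3.EZ bit definition):

    if (arm_feature(env, ARM_FEATURE_EL3)
        && !(env->cp15.cptr_el[3] & CPTR_EZ)) {
        /* Trap to EL3 only when EL3 is actually implemented.  */
        return 3;
    }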

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM


* Re: [Qemu-devel] [PATCH 05/20] target/arm: Fix arm_cpu_data_is_big_endian for aa64 user-only
  2018-08-09  4:21 ` [Qemu-devel] [PATCH 05/20] target/arm: Fix arm_cpu_data_is_big_endian for aa64 user-only Richard Henderson
@ 2018-08-17 16:02   ` Peter Maydell
  2018-08-17 16:47     ` Richard Henderson
  0 siblings, 1 reply; 51+ messages in thread
From: Peter Maydell @ 2018-08-17 16:02 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, Laurent Desnogues, Alex Bennée

On 9 August 2018 at 05:21, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Unlike aa32, endianness cannot be adjusted by userland in aa64.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/cpu.h | 27 +++++++++++++++++----------
>  1 file changed, 17 insertions(+), 10 deletions(-)
>
> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> index 9526ed27cb..2d6d7d03aa 100644
> --- a/target/arm/cpu.h
> +++ b/target/arm/cpu.h
> @@ -2709,8 +2709,6 @@ static inline bool arm_sctlr_b(CPUARMState *env)
>  /* Return true if the processor is in big-endian mode. */
>  static inline bool arm_cpu_data_is_big_endian(CPUARMState *env)
>  {
> -    int cur_el;
> -
>      /* In 32bit endianness is determined by looking at CPSR's E bit */
>      if (!is_a64(env)) {
>          return
> @@ -2729,15 +2727,24 @@ static inline bool arm_cpu_data_is_big_endian(CPUARMState *env)
>              arm_sctlr_b(env) ||
>  #endif
>                  ((env->uncached_cpsr & CPSR_E) ? 1 : 0);
> +    } else {
> +#ifdef CONFIG_USER_ONLY
> +        /* AArch64 does not have a SETEND instruction; endianness
> +         * for usermode is fixed at compile-time.
> +         */
> +# ifdef TARGET_WORDS_BIGENDIAN
> +        return true;
> +# else
> +        return false;
> +# endif
> +#else
> +        int cur_el = arm_current_el(env);
> +        if (cur_el == 0) {
> +            return (env->cp15.sctlr_el[1] & SCTLR_E0E) != 0;
> +        }
> +        return (env->cp15.sctlr_el[cur_el] & SCTLR_EE) != 0;
> +#endif
>      }
> -
> -    cur_el = arm_current_el(env);
> -
> -    if (cur_el == 0) {
> -        return (env->cp15.sctlr_el[1] & SCTLR_E0E) != 0;
> -    }
> -
> -    return (env->cp15.sctlr_el[cur_el] & SCTLR_EE) != 0;
>  }
>

When does this make a difference? For user-mode, we've already
dealt with the "aa32" case, so the code here is aa64-only.
In linux-user/aarch64/cpu_loop.c we set sctlr_el[1]'s E0E bit
if TARGET_WORDS_BIGENDIAN is defined, and cur_el is definitely
zero, so we should already be returning true from this function
if TARGET_WORDS_BIGENDIAN and false otherwise.
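
That setup looks roughly like this (paraphrased from the description
above, not quoted from the tree):

    /* linux-user/aarch64/cpu_loop.c: select big-endian data accesses
     * for EL0 when the target is built big-endian. */
    #ifdef TARGET_WORDS_BIGENDIAN
        env->cp15.sctlr_el[1] |= SCTLR_E0E;
    #endif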

thanks
-- PMM


* Re: [Qemu-devel] [PATCH 06/20] target/arm: Fix arm_current_el for user-only
  2018-08-09  4:21 ` [Qemu-devel] [PATCH 06/20] target/arm: Fix arm_current_el for user-only Richard Henderson
@ 2018-08-17 16:03   ` Peter Maydell
  2018-08-17 16:51     ` Richard Henderson
  0 siblings, 1 reply; 51+ messages in thread
From: Peter Maydell @ 2018-08-17 16:03 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, Laurent Desnogues, Alex Bennée

On 9 August 2018 at 05:21, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Saves about 12k code size in qemu-aarch64.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/cpu.h | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> index 2d6d7d03aa..aedaf2631e 100644
> --- a/target/arm/cpu.h
> +++ b/target/arm/cpu.h
> @@ -1958,6 +1958,9 @@ static inline bool arm_v7m_is_handler_mode(CPUARMState *env)
>   */
>  static inline int arm_current_el(CPUARMState *env)
>  {
> +#ifdef CONFIG_USER_ONLY
> +    return 0;
> +#else
>      if (arm_feature(env, ARM_FEATURE_M)) {
>          return arm_v7m_is_handler_mode(env) ||
>              !(env->v7m.control[env->v7m.secure] & 1);
> @@ -1984,6 +1987,7 @@ static inline int arm_current_el(CPUARMState *env)
>
>          return 1;
>      }
> +#endif

Again, the #ifdeffery here should be unnecessary ? env->pstate,
env->uncached_cpsr, etc should be set so that we return the
right thing.

thanks
-- PMM


* Re: [Qemu-devel] [PATCH 07/20] target/arm: Fix is_a64 for user-only
  2018-08-09  4:21 ` [Qemu-devel] [PATCH 07/20] target/arm: Fix is_a64 " Richard Henderson
@ 2018-08-17 16:03   ` Peter Maydell
  2018-08-17 16:10     ` Laurent Desnogues
  0 siblings, 1 reply; 51+ messages in thread
From: Peter Maydell @ 2018-08-17 16:03 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, Laurent Desnogues, Alex Bennée

On 9 August 2018 at 05:21, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Saves about 8k code size in qemu-aarch64.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/cpu.h | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> index aedaf2631e..ed51a2f5aa 100644
> --- a/target/arm/cpu.h
> +++ b/target/arm/cpu.h
> @@ -918,7 +918,15 @@ void aarch64_sync_64_to_32(CPUARMState *env);
>
>  static inline bool is_a64(CPUARMState *env)
>  {
> +#ifdef CONFIG_USER_ONLY
> +# ifdef TARGET_AARCH64
> +    return true;
> +# else
> +    return false;
> +# endif
> +#else
>      return env->aarch64;
> +#endif
>  }

And again. I don't want to pepper the code with ifdefs if
we can do the right thing without them.

thanks
-- PMM


* Re: [Qemu-devel] [PATCH 07/20] target/arm: Fix is_a64 for user-only
  2018-08-17 16:03   ` Peter Maydell
@ 2018-08-17 16:10     ` Laurent Desnogues
  2018-08-17 16:23       ` Peter Maydell
  0 siblings, 1 reply; 51+ messages in thread
From: Laurent Desnogues @ 2018-08-17 16:10 UTC (permalink / raw)
  To: Peter Maydell; +Cc: Richard Henderson, qemu-devel@nongnu.org, Alex Bennée

Hello,

On Fri, Aug 17, 2018 at 6:04 PM Peter Maydell <peter.maydell@linaro.org> wrote:
>
> On 9 August 2018 at 05:21, Richard Henderson
> <richard.henderson@linaro.org> wrote:
> > Saves about 8k code size in qemu-aarch64.
> >
> > Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> > ---
> >  target/arm/cpu.h | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> >
> > diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> > index aedaf2631e..ed51a2f5aa 100644
> > --- a/target/arm/cpu.h
> > +++ b/target/arm/cpu.h
> > @@ -918,7 +918,15 @@ void aarch64_sync_64_to_32(CPUARMState *env);
> >
> >  static inline bool is_a64(CPUARMState *env)
> >  {
> > +#ifdef CONFIG_USER_ONLY
> > +# ifdef TARGET_AARCH64
> > +    return true;
> > +# else
> > +    return false;
> > +# endif
> > +#else
> >      return env->aarch64;
> > +#endif
> >  }
>
> And again. I don't want to pepper the code with ifdefs if
> we can do the right thing without them.

FWIW I find it more readable with the ifdefs (here and in the
previous patches), and I guess that helps the compiler too.

Thanks,

Laurent


* Re: [Qemu-devel] [PATCH 09/20] target/arm: Handle SVE vector length changes in system mode
  2018-08-09  4:21 ` [Qemu-devel] [PATCH 09/20] target/arm: Handle SVE vector length changes in system mode Richard Henderson
@ 2018-08-17 16:22   ` Peter Maydell
  2018-08-25 19:41     ` Richard Henderson
  0 siblings, 1 reply; 51+ messages in thread
From: Peter Maydell @ 2018-08-17 16:22 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, Laurent Desnogues, Alex Bennée

On 9 August 2018 at 05:21, Richard Henderson
<richard.henderson@linaro.org> wrote:
> SVE vector length can change when changing EL, or when writing
> to one of the ZCR_ELn registers.
>
> For correctness, our implementation requires that predicate bits
> that are inaccessible are never set.  Which means noticing length
> changes and zeroing the appropriate register bits.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---


> +/*
> + * Notice a change in SVE vector size when changing EL.
> + */
> +void aarch64_sve_change_el(CPUARMState *env, int old_el, int new_el)
> +{
> +    int old_len, new_len;
> +
> +    /* Nothing to do if no SVE.  */
> +    if (!arm_feature(env, ARM_FEATURE_SVE)) {
> +        return;
> +    }
> +
> +    /* Nothing to do if FP is disabled in either EL.  */
> +    if (fp_exception_el(env, old_el) || fp_exception_el(env, new_el)) {
> +        return;
> +    }
> +
> +    /*
> +     * When FP is enabled, but SVE is disabled, the effective len is 0.
> +     * ??? How should sve_exception_el interact with AArch32 state?
> +     * That isn't included in the CheckSVEEnabled pseudocode, so is the
> +     * host kernel required to explicitly disable SVE for an EL using aa32?
> +     */

I'm not clear what you're asking here. If the EL is AArch32
then why does it make a difference if SVE is enabled or disabled?
You can't get at it...

> +    old_len = (sve_exception_el(env, old_el)
> +               ? 0 : sve_zcr_len_for_el(env, old_el));
> +    new_len = (sve_exception_el(env, new_el)
> +               ? 0 : sve_zcr_len_for_el(env, new_el));
> +
> +    /* When changing vector length, clear inaccessible state.  */
> +    if (new_len < old_len) {
> +        aarch64_sve_narrow_vq(env, new_len + 1);
> +    }
> +}
> +#endif
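
To make the "+ 1" arithmetic above concrete, a small self-contained
illustration (not QEMU code; it assumes the usual SVE convention that
ZCR_ELx.LEN encodes quadwords minus one, which is what
sve_zcr_len_for_el() returns):

    #include <assert.h>

    static unsigned vq_from_len(unsigned len) { return len + 1; }
    static unsigned vl_bytes(unsigned vq)     { return vq * 16; }

    int main(void)
    {
        unsigned old_len = 7, new_len = 3;
        assert(vl_bytes(vq_from_len(old_len)) == 128); /* 1024-bit VL */
        assert(vl_bytes(vq_from_len(new_len)) == 64);  /*  512-bit VL */
        /* new_len < old_len, so aarch64_sve_narrow_vq(env, new_len + 1)
         * zeroes vector/predicate state beyond 4 quadwords. */
        return 0;
    }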

thanks
-- PMM


* Re: [Qemu-devel] [PATCH 07/20] target/arm: Fix is_a64 for user-only
  2018-08-17 16:10     ` Laurent Desnogues
@ 2018-08-17 16:23       ` Peter Maydell
  0 siblings, 0 replies; 51+ messages in thread
From: Peter Maydell @ 2018-08-17 16:23 UTC (permalink / raw)
  To: Laurent Desnogues
  Cc: Richard Henderson, qemu-devel@nongnu.org, Alex Bennée

On 17 August 2018 at 17:10, Laurent Desnogues
<laurent.desnogues@gmail.com> wrote:
> Hello,
>
> On Fri, Aug 17, 2018 at 6:04 PM Peter Maydell <peter.maydell@linaro.org> wrote:
>> And again. I don't want to pepper the code with ifdefs if
>> we can do the right thing without them.
>
> FWIW I find it more readable with the ifdef's (here and in the
> previous patches) and I guess that helps the compiler too.

Hmm. I prefer to think of user-mode as a funny variant on
system emulation where we make the minimal changes required,
and mostly work just by having the CPU be in the state it
would be in for system-emulation EL0, with no way to get out
of it.

thanks
-- PMM


* Re: [Qemu-devel] [PATCH 10/20] target/arm: Adjust aarch64_cpu_dump_state for system mode SVE
  2018-08-09  4:21 ` [Qemu-devel] [PATCH 10/20] target/arm: Adjust aarch64_cpu_dump_state for system mode SVE Richard Henderson
@ 2018-08-17 16:35   ` Peter Maydell
  0 siblings, 0 replies; 51+ messages in thread
From: Peter Maydell @ 2018-08-17 16:35 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, Laurent Desnogues, Alex Bennée

On 9 August 2018 at 05:21, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Use the existing helpers to determine if (1) the fpu is enabled,
> (2) sve state is enabled, and (3) the current sve vector length.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/cpu.h           | 4 ++++
>  target/arm/helper.c        | 6 +++---
>  target/arm/translate-a64.c | 8 ++++++--
>  3 files changed, 13 insertions(+), 5 deletions(-)

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

I can't think of a reason why we'd want to look at the FPU
state with the FPU disabled, so I guess this is ok...

thanks
-- PMM


* Re: [Qemu-devel] [PATCH 05/20] target/arm: Fix arm_cpu_data_is_big_endian for aa64 user-only
  2018-08-17 16:02   ` Peter Maydell
@ 2018-08-17 16:47     ` Richard Henderson
  0 siblings, 0 replies; 51+ messages in thread
From: Richard Henderson @ 2018-08-17 16:47 UTC (permalink / raw)
  To: Peter Maydell; +Cc: QEMU Developers, Laurent Desnogues, Alex Bennée

On 08/17/2018 09:02 AM, Peter Maydell wrote:
> On 9 August 2018 at 05:21, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>> Unlike aa32, endianness cannot be adjusted by userland in aa64.
>>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>>  target/arm/cpu.h | 27 +++++++++++++++++----------
>>  1 file changed, 17 insertions(+), 10 deletions(-)
>>
>> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
>> index 9526ed27cb..2d6d7d03aa 100644
>> --- a/target/arm/cpu.h
>> +++ b/target/arm/cpu.h
>> @@ -2709,8 +2709,6 @@ static inline bool arm_sctlr_b(CPUARMState *env)
>>  /* Return true if the processor is in big-endian mode. */
>>  static inline bool arm_cpu_data_is_big_endian(CPUARMState *env)
>>  {
>> -    int cur_el;
>> -
>>      /* In 32bit endianness is determined by looking at CPSR's E bit */
>>      if (!is_a64(env)) {
>>          return
>> @@ -2729,15 +2727,24 @@ static inline bool arm_cpu_data_is_big_endian(CPUARMState *env)
>>              arm_sctlr_b(env) ||
>>  #endif
>>                  ((env->uncached_cpsr & CPSR_E) ? 1 : 0);
>> +    } else {
>> +#ifdef CONFIG_USER_ONLY
>> +        /* AArch64 does not have a SETEND instruction; endianness
>> +         * for usermode is fixed at compile-time.
>> +         */
>> +# ifdef TARGET_WORDS_BIGENDIAN
>> +        return true;
>> +# else
>> +        return false;
>> +# endif
>> +#else
>> +        int cur_el = arm_current_el(env);
>> +        if (cur_el == 0) {
>> +            return (env->cp15.sctlr_el[1] & SCTLR_E0E) != 0;
>> +        }
>> +        return (env->cp15.sctlr_el[cur_el] & SCTLR_EE) != 0;
>> +#endif
>>      }
>> -
>> -    cur_el = arm_current_el(env);
>> -
>> -    if (cur_el == 0) {
>> -        return (env->cp15.sctlr_el[1] & SCTLR_E0E) != 0;
>> -    }
>> -
>> -    return (env->cp15.sctlr_el[cur_el] & SCTLR_EE) != 0;
>>  }
>>
> 
> When does this make a difference? For user-mode, we've already
> dealt with the "aa32" case, so the code here is aa64-only.
> In linux-user/aarch64/cpu_loop.c we set sctlr_el[1]'s E0E bit
> if TARGET_WORDS_BIGENDIAN is defined, and cur_el is definitely
> zero, so we should already be returning true from this function
> if TARGET_WORDS_BIGENDIAN and false otherwise.

I should have re-ordered this to come after the simplifications
in the following patches, to see if it still matters.  But I was
after a code-size reduction.


r~


* Re: [Qemu-devel] [PATCH 06/20] target/arm: Fix arm_current_el for user-only
  2018-08-17 16:03   ` Peter Maydell
@ 2018-08-17 16:51     ` Richard Henderson
  0 siblings, 0 replies; 51+ messages in thread
From: Richard Henderson @ 2018-08-17 16:51 UTC (permalink / raw)
  To: Peter Maydell; +Cc: QEMU Developers, Laurent Desnogues, Alex Bennée

On 08/17/2018 09:03 AM, Peter Maydell wrote:
> On 9 August 2018 at 05:21, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>> Saves about 12k code size in qemu-aarch64.
>>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>>  target/arm/cpu.h | 4 ++++
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
>> index 2d6d7d03aa..aedaf2631e 100644
>> --- a/target/arm/cpu.h
>> +++ b/target/arm/cpu.h
>> @@ -1958,6 +1958,9 @@ static inline bool arm_v7m_is_handler_mode(CPUARMState *env)
>>   */
>>  static inline int arm_current_el(CPUARMState *env)
>>  {
>> +#ifdef CONFIG_USER_ONLY
>> +    return 0;
>> +#else
>>      if (arm_feature(env, ARM_FEATURE_M)) {
>>          return arm_v7m_is_handler_mode(env) ||
>>              !(env->v7m.control[env->v7m.secure] & 1);
>> @@ -1984,6 +1987,7 @@ static inline int arm_current_el(CPUARMState *env)
>>
>>          return 1;
>>      }
>> +#endif
> 
> Again, the #ifdeffery here should be unnecessary ? env->pstate,
> env->uncached_cpsr, etc should be set so that we return the
> right thing.

We get the right result, but you should have a look at how large the expansion
of this function is.
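
A self-contained toy (not QEMU code) of the effect being discussed:
with the early return the inline function is a compile-time constant,
so every inlined call site folds its whole decision tree away, which is
where the code-size saving comes from.

    #include <stdio.h>

    #define CONFIG_USER_ONLY 1   /* assume a user-only build of this toy */

    static inline int toy_current_el(int is_m_profile, int handler_mode,
                                     int pstate_el)
    {
    #ifdef CONFIG_USER_ONLY
        return 0;                /* constant: the branches below vanish */
    #else
        if (is_m_profile) {
            return handler_mode ? 1 : 0;
        }
        return pstate_el;
    #endif
    }

    int main(void)
    {
        printf("EL = %d\n", toy_current_el(0, 0, 1));
        return 0;
    }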


r~


* Re: [Qemu-devel] [PATCH 00/20] target/arm: sve system mode patches
  2018-08-09  4:21 [Qemu-devel] [PATCH 00/20] target/arm: sve system mode patches Richard Henderson
                   ` (20 preceding siblings ...)
  2018-08-09  5:48 ` [Qemu-devel] [PATCH 00/20] target/arm: sve system mode patches Laurent Desnogues
@ 2018-08-18  9:15 ` no-reply
  2018-08-18 10:01 ` no-reply
  22 siblings, 0 replies; 51+ messages in thread
From: no-reply @ 2018-08-18  9:15 UTC (permalink / raw)
  To: richard.henderson
  Cc: famz, qemu-devel, laurent.desnogues, peter.maydell, alex.bennee

Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20180809042206.15726-1-richard.henderson@linaro.org
Subject: [Qemu-devel] [PATCH 00/20] target/arm: sve system mode patches

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
    echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
    if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
        failed=1
        echo
    fi
    n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
35e9c8f21d target/arm: Pass TCGMemOpIdx to sve memory helpers
6b726db20c target/arm: Rewrite vector gather first-fault loads
42f47a681a target/arm: Rewrite vector gather stores
ebdf36bdd6 target/arm: Rewrite vector gather loads
bea4bda9bd target/arm: Split contiguous stores for endianness
f74a87d369 target/arm: Split contiguous loads for endianness
522240fd71 target/arm: Rewrite helper_sve_st[1234]*_r
291c9a4079 target/arm: Rewrite helper_sve_ld[234]*_r
761fe6b96c target/arm: Rewrite helper_sve_ld1*_r using pages
439f82f39c target/arm: Clear unused predicate bits for LD1RQ
9f664be291 target/arm: Adjust aarch64_cpu_dump_state for system mode SVE
72b8c608a0 target/arm: Handle SVE vector length changes in system mode
4d25343973 target/arm: Pass in current_el to fp and sve_exception_el
f63e45c476 target/arm: Fix is_a64 for user-only
77c7e3327f target/arm: Fix arm_current_el for user-only
065eea0432 target/arm: Fix arm_cpu_data_is_big_endian for aa64 user-only
4fb82ef6d0 target/arm: Adjust sve_exception_el
bc8fa3f868 target/arm: Define ID_AA64ZFR0_EL1
3233806b21 target/arm: Set ID_AA64PFR0 bits for SVE for -cpu max
2989413056 target/arm: Set ISAR bits for -cpu max

=== OUTPUT BEGIN ===
Checking PATCH 1/20: target/arm: Set ISAR bits for -cpu max...
Checking PATCH 2/20: target/arm: Set ID_AA64PFR0 bits for SVE for -cpu max...
Checking PATCH 3/20: target/arm: Define ID_AA64ZFR0_EL1...
Checking PATCH 4/20: target/arm: Adjust sve_exception_el...
ERROR: return is not a function, parentheses are not required
#57: FILE: target/arm/helper.c:4367:
+            return (arm_feature(env, ARM_FEATURE_EL2)

total: 1 errors, 0 warnings, 113 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 5/20: target/arm: Fix arm_cpu_data_is_big_endian for aa64 user-only...
Checking PATCH 6/20: target/arm: Fix arm_current_el for user-only...
Checking PATCH 7/20: target/arm: Fix is_a64 for user-only...
Checking PATCH 8/20: target/arm: Pass in current_el to fp and sve_exception_el...
Checking PATCH 9/20: target/arm: Handle SVE vector length changes in system mode...
Checking PATCH 10/20: target/arm: Adjust aarch64_cpu_dump_state for system mode SVE...
Checking PATCH 11/20: target/arm: Clear unused predicate bits for LD1RQ...
Checking PATCH 12/20: target/arm: Rewrite helper_sve_ld1*_r using pages...
Checking PATCH 13/20: target/arm: Rewrite helper_sve_ld[234]*_r...
Checking PATCH 14/20: target/arm: Rewrite helper_sve_st[1234]*_r...
ERROR: spaces required around that '*' (ctx:WxV)
#215: FILE: target/arm/sve_helper.c:4825:
+                      sve_st1_tlb_fn *tlb_fn)
                                      ^

total: 1 errors, 0 warnings, 392 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 15/20: target/arm: Split contiguous loads for endianness...
Checking PATCH 16/20: target/arm: Split contiguous stores for endianness...
Checking PATCH 17/20: target/arm: Rewrite vector gather loads...
Checking PATCH 18/20: target/arm: Rewrite vector gather stores...
Checking PATCH 19/20: target/arm: Rewrite vector gather first-fault loads...
ERROR: spaces required around that '*' (ctx:WxV)
#292: FILE: target/arm/sve_helper.c:5216:
+                                zreg_off_fn *off_fn, sve_ld1_tlb_fn *tlb_fn,
                                             ^

ERROR: spaces required around that '*' (ctx:WxV)
#292: FILE: target/arm/sve_helper.c:5216:
+                                zreg_off_fn *off_fn, sve_ld1_tlb_fn *tlb_fn,
                                                                     ^

ERROR: spaces required around that '*' (ctx:WxV)
#293: FILE: target/arm/sve_helper.c:5217:
+                                sve_ld1_nf_fn *nonfault_fn)
                                               ^

total: 3 errors, 0 warnings, 573 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 20/20: target/arm: Pass TCGMemOpIdx to sve memory helpers...
=== OUTPUT END ===

Test command exited with code: 1


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-devel@redhat.com


* Re: [Qemu-devel] [PATCH 00/20] target/arm: sve system mode patches
  2018-08-09  4:21 [Qemu-devel] [PATCH 00/20] target/arm: sve system mode patches Richard Henderson
                   ` (21 preceding siblings ...)
  2018-08-18  9:15 ` no-reply
@ 2018-08-18 10:01 ` no-reply
  22 siblings, 0 replies; 51+ messages in thread
From: no-reply @ 2018-08-18 10:01 UTC (permalink / raw)
  To: richard.henderson
  Cc: famz, qemu-devel, laurent.desnogues, peter.maydell, alex.bennee

Hi,

This series failed the docker-quick@centos7 build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

Type: series
Message-id: 20180809042206.15726-1-richard.henderson@linaro.org
Subject: [Qemu-devel] [PATCH 00/20] target/arm: sve system mode patches

=== TEST SCRIPT BEGIN ===
#!/bin/bash
time make docker-test-quick@centos7 SHOW_ENV=1 J=8
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
35e9c8f21d target/arm: Pass TCGMemOpIdx to sve memory helpers
6b726db20c target/arm: Rewrite vector gather first-fault loads
42f47a681a target/arm: Rewrite vector gather stores
ebdf36bdd6 target/arm: Rewrite vector gather loads
bea4bda9bd target/arm: Split contiguous stores for endianness
f74a87d369 target/arm: Split contiguous loads for endianness
522240fd71 target/arm: Rewrite helper_sve_st[1234]*_r
291c9a4079 target/arm: Rewrite helper_sve_ld[234]*_r
761fe6b96c target/arm: Rewrite helper_sve_ld1*_r using pages
439f82f39c target/arm: Clear unused predicate bits for LD1RQ
9f664be291 target/arm: Adjust aarch64_cpu_dump_state for system mode SVE
72b8c608a0 target/arm: Handle SVE vector length changes in system mode
4d25343973 target/arm: Pass in current_el to fp and sve_exception_el
f63e45c476 target/arm: Fix is_a64 for user-only
77c7e3327f target/arm: Fix arm_current_el for user-only
065eea0432 target/arm: Fix arm_cpu_data_is_big_endian for aa64 user-only
4fb82ef6d0 target/arm: Adjust sve_exception_el
bc8fa3f868 target/arm: Define ID_AA64ZFR0_EL1
3233806b21 target/arm: Set ID_AA64PFR0 bits for SVE for -cpu max
2989413056 target/arm: Set ISAR bits for -cpu max

=== OUTPUT BEGIN ===
  BUILD   centos7
make[1]: Entering directory '/var/tmp/patchew-tester-tmp-6uc3dv26/src'
  GEN     /var/tmp/patchew-tester-tmp-6uc3dv26/src/docker-src.2018-08-18-05.57.51.13090/qemu.tar
Cloning into '/var/tmp/patchew-tester-tmp-6uc3dv26/src/docker-src.2018-08-18-05.57.51.13090/qemu.tar.vroot'...
done.
Your branch is up-to-date with 'origin/test'.
Submodule 'dtc' (git://git.qemu-project.org/dtc.git) registered for path 'dtc'
Cloning into '/var/tmp/patchew-tester-tmp-6uc3dv26/src/docker-src.2018-08-18-05.57.51.13090/qemu.tar.vroot/dtc'...
Submodule path 'dtc': checked out 'e54388015af1fb4bf04d0bca99caba1074d9cc42'
Submodule 'ui/keycodemapdb' (git://git.qemu.org/keycodemapdb.git) registered for path 'ui/keycodemapdb'
Cloning into '/var/tmp/patchew-tester-tmp-6uc3dv26/src/docker-src.2018-08-18-05.57.51.13090/qemu.tar.vroot/ui/keycodemapdb'...
Submodule path 'ui/keycodemapdb': checked out '6b3d716e2b6472eb7189d3220552280ef3d832ce'
  COPY    RUNNER
    RUN test-quick in qemu:centos7 
Packages installed:
SDL-devel-1.2.15-14.el7.x86_64
bison-3.0.4-1.el7.x86_64
bzip2-devel-1.0.6-13.el7.x86_64
ccache-3.3.4-1.el7.x86_64
csnappy-devel-0-6.20150729gitd7bc683.el7.x86_64
flex-2.5.37-3.el7.x86_64
gcc-4.8.5-16.el7_4.2.x86_64
gettext-0.19.8.1-2.el7.x86_64
git-1.8.3.1-12.el7_4.x86_64
glib2-devel-2.50.3-3.el7.x86_64
libepoxy-devel-1.3.1-1.el7.x86_64
libfdt-devel-1.4.6-1.el7.x86_64
lzo-devel-2.06-8.el7.x86_64
make-3.82-23.el7.x86_64
mesa-libEGL-devel-17.0.1-6.20170307.el7.x86_64
mesa-libgbm-devel-17.0.1-6.20170307.el7.x86_64
package g++ is not installed
package librdmacm-devel is not installed
pixman-devel-0.34.0-1.el7.x86_64
spice-glib-devel-0.33-6.el7_4.1.x86_64
spice-server-devel-0.12.8-2.el7.1.x86_64
tar-1.26-32.el7.x86_64
vte-devel-0.28.2-10.el7.x86_64
xen-devel-4.6.6-10.el7.x86_64
zlib-devel-1.2.7-17.el7.x86_64

Environment variables:
PACKAGES=bison     bzip2-devel     ccache     csnappy-devel     flex     g++     gcc     gettext     git     glib2-devel     libepoxy-devel     libfdt-devel     librdmacm-devel     lzo-devel     make     mesa-libEGL-devel     mesa-libgbm-devel     pixman-devel     SDL-devel     spice-glib-devel     spice-server-devel     tar     vte-devel     xen-devel     zlib-devel
HOSTNAME=78baf9183a69
MAKEFLAGS= -j8
J=8
CCACHE_DIR=/var/tmp/ccache
EXTRA_CONFIGURE_OPTS=
V=
SHOW_ENV=1
PATH=/usr/lib/ccache:/usr/lib64/ccache:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PWD=/
TARGET_LIST=
SHLVL=1
HOME=/home/patchew
TEST_DIR=/tmp/qemu-test
FEATURES= dtc
DEBUG=
_=/usr/bin/env

Configure options:
--enable-werror --target-list=x86_64-softmmu,aarch64-softmmu --prefix=/tmp/qemu-test/install
No C++ compiler available; disabling C++ specific optional code
Install prefix    /tmp/qemu-test/install
BIOS directory    /tmp/qemu-test/install/share/qemu
firmware path     /tmp/qemu-test/install/share/qemu-firmware
binary directory  /tmp/qemu-test/install/bin
library directory /tmp/qemu-test/install/lib
module directory  /tmp/qemu-test/install/lib/qemu
libexec directory /tmp/qemu-test/install/libexec
include directory /tmp/qemu-test/install/include
config directory  /tmp/qemu-test/install/etc
local state directory   /tmp/qemu-test/install/var
Manual directory  /tmp/qemu-test/install/share/man
ELF interp prefix /usr/gnemul/qemu-%M
Source path       /tmp/qemu-test/src
GIT binary        git
GIT submodules    
C compiler        cc
Host C compiler   cc
C++ compiler      
Objective-C compiler cc
ARFLAGS           rv
CFLAGS            -O2 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -g 
QEMU_CFLAGS       -I/usr/include/pixman-1    -Werror -pthread -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include   -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -fwrapv  -Wendif-labels -Wno-missing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-strong -Wno-missing-braces -I/usr/include/libpng15     -I/usr/include/spice-server -I/usr/include/cacard -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -I/usr/include/pixman-1 -I/usr/include/nss3 -I/usr/include/nspr4 -I/usr/include/spice-1  
LDFLAGS           -Wl,--warn-common -Wl,-z,relro -Wl,-z,now -pie -m64 -g 
QEMU_LDFLAGS       
make              make
install           install
python            python -B
smbd              /usr/sbin/smbd
module support    no
host CPU          x86_64
host big endian   no
target list       x86_64-softmmu aarch64-softmmu
gprof enabled     no
sparse enabled    no
strip binaries    yes
profiler          no
static build      no
SDL support       yes (1.2.15)
GTK support       yes (2.24.31)
GTK GL support    no
VTE support       yes (0.28.2)
TLS priority      NORMAL
GNUTLS support    no
GNUTLS rnd        no
libgcrypt         no
libgcrypt kdf     no
nettle            no 
nettle kdf        no
libtasn1          no
curses support    yes
virgl support     no 
curl support      no
mingw32 support   no
Audio drivers     oss
Block whitelist (rw) 
Block whitelist (ro) 
VirtFS support    no
Multipath support no
VNC support       yes
VNC SASL support  no
VNC JPEG support  no
VNC PNG support   yes
xen support       yes
xen ctrl version  40600
pv dom build      no
brlapi support    no
bluez  support    no
Documentation     no
PIE               yes
vde support       no
netmap support    no
Linux AIO support no
ATTR/XATTR support yes
Install blobs     yes
KVM support       yes
HAX support       no
HVF support       no
WHPX support      no
TCG support       yes
TCG debug enabled no
TCG interpreter   no
malloc trim support yes
RDMA support      yes
fdt support       system
membarrier        no
preadv support    yes
fdatasync         yes
madvise           yes
posix_madvise     yes
posix_memalign    yes
libcap-ng support no
vhost-net support yes
vhost-crypto support yes
vhost-scsi support yes
vhost-vsock support yes
vhost-user support yes
Trace backends    log
spice support     yes (0.12.12/0.12.8)
rbd support       no
xfsctl support    no
smartcard support yes
libusb            no
usb net redir     no
OpenGL support    yes
OpenGL dmabufs    yes
libiscsi support  no
libnfs support    no
build guest agent yes
QGA VSS support   no
QGA w32 disk info no
QGA MSI support   no
seccomp support   no
coroutine backend ucontext
coroutine pool    yes
debug stack usage no
mutex debugging   no
crypto afalg      no
GlusterFS support no
gcov              gcov
gcov enabled      no
TPM support       yes
libssh2 support   no
TPM passthrough   yes
TPM emulator      yes
QOM debugging     yes
Live block migration yes
lzo support       yes
snappy support    no
bzip2 support     yes
NUMA host support no
libxml2           no
tcmalloc support  no
jemalloc support  no
avx2 optimization yes
replication support yes
VxHS block device no
capstone          no
docker            no

WARNING: Use of GTK 2.0 is deprecated and will be removed in
WARNING: future releases. Please switch to using GTK 3.0

WARNING: Use of SDL 1.2 is deprecated and will be removed in
WARNING: future releases. Please switch to using SDL 2.0

NOTE: cross-compilers enabled:  'cc'
  GEN     x86_64-softmmu/config-devices.mak.tmp
  GEN     aarch64-softmmu/config-devices.mak.tmp
  GEN     config-host.h
  GEN     qapi-gen
  GEN     qemu-options.def
  GEN     trace/generated-tcg-tracers.h
  GEN     trace/generated-helpers-wrappers.h
  GEN     trace/generated-helpers.h
  GEN     trace/generated-helpers.c
  GEN     module_block.h
  GEN     x86_64-softmmu/config-devices.mak
  GEN     aarch64-softmmu/config-devices.mak
  GEN     ui/input-keymap-atset1-to-qcode.c
  GEN     ui/input-keymap-linux-to-qcode.c
  GEN     ui/input-keymap-qcode-to-atset1.c
  GEN     ui/input-keymap-qcode-to-atset2.c
  GEN     ui/input-keymap-qcode-to-atset3.c
  GEN     ui/input-keymap-qcode-to-linux.c
  GEN     ui/input-keymap-qcode-to-qnum.c
  GEN     ui/input-keymap-qcode-to-sun.c
  GEN     ui/input-keymap-qnum-to-qcode.c
  GEN     ui/input-keymap-usb-to-qcode.c
  GEN     ui/input-keymap-win32-to-qcode.c
  GEN     ui/input-keymap-x11-to-qcode.c
  GEN     ui/input-keymap-xorgevdev-to-qcode.c
  GEN     ui/input-keymap-xorgxquartz-to-qcode.c
  GEN     ui/input-keymap-xorgkbd-to-qcode.c
  GEN     ui/input-keymap-xorgxwin-to-qcode.c
  GEN     ui/input-keymap-osx-to-qcode.c
  GEN     tests/test-qapi-gen
  GEN     trace-root.h
  GEN     accel/kvm/trace.h
  GEN     accel/tcg/trace.h
  GEN     audio/trace.h
  GEN     block/trace.h
  GEN     chardev/trace.h
  GEN     crypto/trace.h
  GEN     hw/9pfs/trace.h
  GEN     hw/acpi/trace.h
  GEN     hw/alpha/trace.h
  GEN     hw/arm/trace.h
  GEN     hw/audio/trace.h
  GEN     hw/block/trace.h
  GEN     hw/block/dataplane/trace.h
  GEN     hw/char/trace.h
  GEN     hw/display/trace.h
  GEN     hw/dma/trace.h
  GEN     hw/hppa/trace.h
  GEN     hw/i2c/trace.h
  GEN     hw/i386/trace.h
  GEN     hw/i386/xen/trace.h
  GEN     hw/ide/trace.h
  GEN     hw/input/trace.h
  GEN     hw/intc/trace.h
  GEN     hw/isa/trace.h
  GEN     hw/mem/trace.h
  GEN     hw/misc/trace.h
  GEN     hw/misc/macio/trace.h
  GEN     hw/net/trace.h
  GEN     hw/nvram/trace.h
  GEN     hw/pci/trace.h
  GEN     hw/pci-host/trace.h
  GEN     hw/ppc/trace.h
  GEN     hw/rdma/trace.h
  GEN     hw/rdma/vmw/trace.h
  GEN     hw/s390x/trace.h
  GEN     hw/scsi/trace.h
  GEN     hw/sd/trace.h
  GEN     hw/sparc/trace.h
  GEN     hw/sparc64/trace.h
  GEN     hw/timer/trace.h
  GEN     hw/tpm/trace.h
  GEN     hw/usb/trace.h
  GEN     hw/vfio/trace.h
  GEN     hw/virtio/trace.h
  GEN     hw/xen/trace.h
  GEN     io/trace.h
  GEN     linux-user/trace.h
  GEN     migration/trace.h
  GEN     nbd/trace.h
  GEN     net/trace.h
  GEN     qapi/trace.h
  GEN     qom/trace.h
  GEN     scsi/trace.h
  GEN     target/arm/trace.h
  GEN     target/i386/trace.h
  GEN     target/mips/trace.h
  GEN     target/ppc/trace.h
  GEN     target/s390x/trace.h
  GEN     target/sparc/trace.h
  GEN     ui/trace.h
  GEN     util/trace.h
  GEN     trace-root.c
  GEN     accel/kvm/trace.c
  GEN     accel/tcg/trace.c
  GEN     audio/trace.c
  GEN     block/trace.c
  GEN     chardev/trace.c
  GEN     crypto/trace.c
  GEN     hw/9pfs/trace.c
  GEN     hw/acpi/trace.c
  GEN     hw/alpha/trace.c
  GEN     hw/arm/trace.c
  GEN     hw/audio/trace.c
  GEN     hw/block/trace.c
  GEN     hw/block/dataplane/trace.c
  GEN     hw/char/trace.c
  GEN     hw/display/trace.c
  GEN     hw/dma/trace.c
  GEN     hw/hppa/trace.c
  GEN     hw/i2c/trace.c
  GEN     hw/i386/trace.c
  GEN     hw/i386/xen/trace.c
  GEN     hw/ide/trace.c
  GEN     hw/input/trace.c
  GEN     hw/intc/trace.c
  GEN     hw/isa/trace.c
  GEN     hw/mem/trace.c
  GEN     hw/misc/trace.c
  GEN     hw/misc/macio/trace.c
  GEN     hw/net/trace.c
  GEN     hw/nvram/trace.c
  GEN     hw/pci/trace.c
  GEN     hw/pci-host/trace.c
  GEN     hw/ppc/trace.c
  GEN     hw/rdma/trace.c
  GEN     hw/rdma/vmw/trace.c
  GEN     hw/s390x/trace.c
  GEN     hw/scsi/trace.c
  GEN     hw/sd/trace.c
  GEN     hw/sparc/trace.c
  GEN     hw/sparc64/trace.c
  GEN     hw/timer/trace.c
  GEN     hw/tpm/trace.c
  GEN     hw/usb/trace.c
  GEN     hw/vfio/trace.c
  GEN     hw/virtio/trace.c
  GEN     hw/xen/trace.c
  GEN     io/trace.c
  GEN     linux-user/trace.c
  GEN     migration/trace.c
  GEN     nbd/trace.c
  GEN     net/trace.c
  GEN     qapi/trace.c
  GEN     qom/trace.c
  GEN     scsi/trace.c
  GEN     target/arm/trace.c
  GEN     target/i386/trace.c
  GEN     target/mips/trace.c
  GEN     target/ppc/trace.c
  GEN     target/s390x/trace.c
  GEN     target/sparc/trace.c
  GEN     ui/trace.c
  GEN     util/trace.c
  GEN     config-all-devices.mak
  CC      tests/qemu-iotests/socket_scm_helper.o
  GEN     qga/qapi-generated/qapi-gen
  CC      qapi/qapi-types.o
  CC      qapi/qapi-builtin-types.o
  CC      qapi/qapi-types-block-core.o
  CC      qapi/qapi-types-block.o
  CC      qapi/qapi-types-char.o
  CC      qapi/qapi-types-common.o
  CC      qapi/qapi-types-introspect.o
  CC      qapi/qapi-types-crypto.o
  CC      qapi/qapi-types-job.o
  CC      qapi/qapi-types-migration.o
  CC      qapi/qapi-types-misc.o
  CC      qapi/qapi-types-net.o
  CC      qapi/qapi-types-rocker.o
  CC      qapi/qapi-types-run-state.o
  CC      qapi/qapi-types-sockets.o
  CC      qapi/qapi-types-tpm.o
  CC      qapi/qapi-types-trace.o
  CC      qapi/qapi-types-transaction.o
  CC      qapi/qapi-types-ui.o
  CC      qapi/qapi-builtin-visit.o
  CC      qapi/qapi-visit.o
  CC      qapi/qapi-visit-block-core.o
  CC      qapi/qapi-visit-block.o
  CC      qapi/qapi-visit-char.o
  CC      qapi/qapi-visit-common.o
  CC      qapi/qapi-visit-crypto.o
  CC      qapi/qapi-visit-introspect.o
  CC      qapi/qapi-visit-job.o
  CC      qapi/qapi-visit-migration.o
  CC      qapi/qapi-visit-misc.o
  CC      qapi/qapi-visit-net.o
  CC      qapi/qapi-visit-rocker.o
  CC      qapi/qapi-visit-run-state.o
  CC      qapi/qapi-visit-sockets.o
  CC      qapi/qapi-visit-tpm.o
  CC      qapi/qapi-visit-trace.o
  CC      qapi/qapi-visit-transaction.o
  CC      qapi/qapi-visit-ui.o
  CC      qapi/qapi-events.o
  CC      qapi/qapi-events-block-core.o
  CC      qapi/qapi-events-block.o
  CC      qapi/qapi-events-char.o
  CC      qapi/qapi-events-common.o
  CC      qapi/qapi-events-crypto.o
  CC      qapi/qapi-events-introspect.o
  CC      qapi/qapi-events-job.o
  CC      qapi/qapi-events-migration.o
  CC      qapi/qapi-events-misc.o
  CC      qapi/qapi-events-net.o
  CC      qapi/qapi-events-rocker.o
  CC      qapi/qapi-events-run-state.o
  CC      qapi/qapi-events-sockets.o
  CC      qapi/qapi-events-tpm.o
  CC      qapi/qapi-events-trace.o
  CC      qapi/qapi-events-transaction.o
  CC      qapi/qapi-events-ui.o
  CC      qapi/qapi-introspect.o
  CC      qapi/qapi-visit-core.o
  CC      qapi/qapi-dealloc-visitor.o
  CC      qapi/qobject-input-visitor.o
  CC      qapi/qobject-output-visitor.o
  CC      qapi/qmp-registry.o
  CC      qapi/qmp-dispatch.o
  CC      qapi/string-input-visitor.o
  CC      qapi/string-output-visitor.o
  CC      qapi/opts-visitor.o
  CC      qapi/qapi-clone-visitor.o
  CC      qapi/qmp-event.o
  CC      qapi/qapi-util.o
  CC      qobject/qnull.o
  CC      qobject/qnum.o
  CC      qobject/qstring.o
  CC      qobject/qdict.o
  CC      qobject/qlist.o
  CC      qobject/qbool.o
  CC      qobject/qlit.o
  CC      qobject/qjson.o
  CC      qobject/qobject.o
  CC      qobject/json-lexer.o
  CC      qobject/json-streamer.o
  CC      qobject/json-parser.o
  CC      qobject/block-qdict.o
  CC      trace/control.o
  CC      trace/qmp.o
  CC      util/osdep.o
  CC      util/cutils.o
  CC      util/unicode.o
  CC      util/qemu-timer-common.o
  CC      util/bufferiszero.o
  CC      util/lockcnt.o
  CC      util/aiocb.o
  CC      util/async.o
  CC      util/aio-wait.o
  CC      util/thread-pool.o
  CC      util/qemu-timer.o
  CC      util/main-loop.o
  CC      util/iohandler.o
  CC      util/aio-posix.o
  CC      util/compatfd.o
  CC      util/event_notifier-posix.o
  CC      util/mmap-alloc.o
  CC      util/oslib-posix.o
  CC      util/qemu-openpty.o
  CC      util/qemu-thread-posix.o
  CC      util/memfd.o
  CC      util/envlist.o
  CC      util/path.o
  CC      util/module.o
  CC      util/host-utils.o
  CC      util/bitmap.o
  CC      util/bitops.o
  CC      util/hbitmap.o
  CC      util/fifo8.o
  CC      util/acl.o
  CC      util/cacheinfo.o
  CC      util/error.o
  CC      util/qemu-error.o
  CC      util/id.o
  CC      util/iov.o
  CC      util/qemu-config.o
  CC      util/qemu-sockets.o
  CC      util/uri.o
  CC      util/notify.o
  CC      util/qemu-option.o
  CC      util/qemu-progress.o
  CC      util/keyval.o
  CC      util/hexdump.o
  CC      util/crc32c.o
  CC      util/uuid.o
  CC      util/throttle.o
  CC      util/getauxval.o
  CC      util/readline.o
  CC      util/rcu.o
  CC      util/qemu-coroutine.o
  CC      util/qemu-coroutine-lock.o
  CC      util/qemu-coroutine-io.o
  CC      util/qemu-coroutine-sleep.o
  CC      util/coroutine-ucontext.o
  CC      util/buffer.o
  CC      util/timed-average.o
  CC      util/base64.o
  CC      util/log.o
  CC      util/pagesize.o
  CC      util/qdist.o
  CC      util/qht.o
  CC      util/range.o
  CC      util/stats64.o
  CC      util/systemd.o
  CC      util/iova-tree.o
  CC      util/vfio-helpers.o
  CC      trace-root.o
  CC      accel/kvm/trace.o
  CC      accel/tcg/trace.o
  CC      audio/trace.o
  CC      block/trace.o
  CC      chardev/trace.o
  CC      crypto/trace.o
  CC      hw/9pfs/trace.o
  CC      hw/acpi/trace.o
  CC      hw/alpha/trace.o
  CC      hw/arm/trace.o
  CC      hw/audio/trace.o
  CC      hw/block/trace.o
  CC      hw/block/dataplane/trace.o
  CC      hw/char/trace.o
  CC      hw/display/trace.o
  CC      hw/dma/trace.o
  CC      hw/hppa/trace.o
  CC      hw/i2c/trace.o
  CC      hw/i386/trace.o
  CC      hw/i386/xen/trace.o
  CC      hw/ide/trace.o
  CC      hw/input/trace.o
  CC      hw/intc/trace.o
  CC      hw/isa/trace.o
  CC      hw/mem/trace.o
  CC      hw/misc/trace.o
  CC      hw/misc/macio/trace.o
  CC      hw/net/trace.o
  CC      hw/nvram/trace.o
  CC      hw/pci/trace.o
  CC      hw/pci-host/trace.o
  CC      hw/ppc/trace.o
  CC      hw/rdma/trace.o
  CC      hw/rdma/vmw/trace.o
  CC      hw/s390x/trace.o
  CC      hw/scsi/trace.o
  CC      hw/sd/trace.o
  CC      hw/sparc/trace.o
  CC      hw/sparc64/trace.o
  CC      hw/timer/trace.o
  CC      hw/tpm/trace.o
  CC      hw/usb/trace.o
  CC      hw/vfio/trace.o
  CC      hw/virtio/trace.o
  CC      hw/xen/trace.o
  CC      io/trace.o
  CC      linux-user/trace.o
  CC      migration/trace.o
  CC      nbd/trace.o
  CC      net/trace.o
  CC      qapi/trace.o
  CC      qom/trace.o
  CC      scsi/trace.o
  CC      target/arm/trace.o
  CC      target/i386/trace.o
  CC      target/mips/trace.o
  CC      target/ppc/trace.o
  CC      target/s390x/trace.o
  CC      target/sparc/trace.o
  CC      ui/trace.o
  CC      util/trace.o
  CC      crypto/pbkdf-stub.o
  CC      stubs/arch-query-cpu-def.o
  CC      stubs/arch-query-cpu-model-expansion.o
  CC      stubs/arch-query-cpu-model-comparison.o
  CC      stubs/arch-query-cpu-model-baseline.o
  CC      stubs/bdrv-next-monitor-owned.o
  CC      stubs/blk-commit-all.o
  CC      stubs/blockdev-close-all-bdrv-states.o
  CC      stubs/clock-warp.o
  CC      stubs/cpu-get-clock.o
  CC      stubs/cpu-get-icount.o
  CC      stubs/dump.o
  CC      stubs/error-printf.o
  CC      stubs/fdset.o
  CC      stubs/gdbstub.o
  CC      stubs/get-vm-name.o
  CC      stubs/iothread.o
  CC      stubs/iothread-lock.o
  CC      stubs/is-daemonized.o
  CC      stubs/machine-init-done.o
  CC      stubs/migr-blocker.o
  CC      stubs/change-state-handler.o
  CC      stubs/monitor.o
  CC      stubs/notify-event.o
  CC      stubs/qtest.o
  CC      stubs/replay.o
  CC      stubs/runstate-check.o
  CC      stubs/set-fd-handler.o
  CC      stubs/slirp.o
  CC      stubs/sysbus.o
  CC      stubs/tpm.o
  CC      stubs/trace-control.o
  CC      stubs/uuid.o
  CC      stubs/vm-stop.o
  CC      stubs/vmstate.o
  CC      stubs/qmp_memory_device.o
  CC      stubs/target-monitor-defs.o
  CC      stubs/target-get-monitor-def.o
  CC      stubs/pc_madt_cpu_entry.o
  CC      stubs/vmgenid.o
  CC      stubs/xen-common.o
  CC      stubs/xen-hvm.o
  CC      stubs/pci-host-piix.o
  CC      stubs/ram-block.o
  CC      contrib/ivshmem-client/ivshmem-client.o
  CC      contrib/ivshmem-client/main.o
  CC      contrib/ivshmem-server/ivshmem-server.o
  CC      contrib/ivshmem-server/main.o
  CC      qemu-nbd.o
  CC      block.o
  CC      blockjob.o
  CC      job.o
  CC      qemu-io-cmds.o
  CC      replication.o
  CC      block/raw-format.o
  CC      block/qcow.o
  CC      block/vdi.o
  CC      block/vmdk.o
  CC      block/cloop.o
  CC      block/bochs.o
  CC      block/vpc.o
  CC      block/vvfat.o
  CC      block/dmg.o
  CC      block/qcow2.o
  CC      block/qcow2-refcount.o
  CC      block/qcow2-cluster.o
  CC      block/qcow2-snapshot.o
  CC      block/qcow2-cache.o
  CC      block/qcow2-bitmap.o
  CC      block/qed.o
  CC      block/qed-table.o
  CC      block/qed-l2-cache.o
  CC      block/qed-cluster.o
  CC      block/qed-check.o
  CC      block/vhdx.o
  CC      block/vhdx-endian.o
  CC      block/vhdx-log.o
  CC      block/quorum.o
  CC      block/parallels.o
  CC      block/blkdebug.o
  CC      block/blkverify.o
  CC      block/blkreplay.o
  CC      block/blklogwrites.o
  CC      block/block-backend.o
  CC      block/snapshot.o
  CC      block/qapi.o
  CC      block/file-posix.o
  CC      block/null.o
  CC      block/mirror.o
  CC      block/commit.o
  CC      block/io.o
  CC      block/create.o
  CC      block/throttle-groups.o
  CC      block/nvme.o
  CC      block/nbd.o
  CC      block/nbd-client.o
  CC      block/sheepdog.o
  CC      block/accounting.o
  CC      block/dirty-bitmap.o
  CC      block/write-threshold.o
  CC      block/backup.o
  CC      block/replication.o
  CC      block/throttle.o
  CC      block/copy-on-read.o
  CC      block/crypto.o
  CC      nbd/server.o
  CC      nbd/client.o
  CC      nbd/common.o
  CC      scsi/utils.o
  CC      scsi/pr-manager.o
  CC      scsi/pr-manager-helper.o
  CC      block/dmg-bz2.o
  CC      crypto/init.o
  CC      crypto/hash.o
  CC      crypto/hash-glib.o
  CC      crypto/hmac.o
  CC      crypto/hmac-glib.o
  CC      crypto/aes.o
  CC      crypto/desrfb.o
  CC      crypto/cipher.o
  CC      crypto/tlscreds.o
  CC      crypto/tlscredsanon.o
  CC      crypto/tlscredspsk.o
  CC      crypto/tlscredsx509.o
  CC      crypto/tlssession.o
  CC      crypto/secret.o
  CC      crypto/random-platform.o
  CC      crypto/pbkdf.o
  CC      crypto/ivgen.o
  CC      crypto/ivgen-essiv.o
  CC      crypto/ivgen-plain.o
  CC      crypto/ivgen-plain64.o
  CC      crypto/afsplit.o
  CC      crypto/xts.o
  CC      crypto/block.o
  CC      crypto/block-luks.o
  CC      crypto/block-qcow.o
  CC      io/channel.o
  CC      io/channel-buffer.o
  CC      io/channel-command.o
  CC      io/channel-file.o
  CC      io/channel-socket.o
  CC      io/channel-tls.o
  CC      io/channel-watch.o
  CC      io/channel-websock.o
  CC      io/channel-util.o
  CC      io/dns-resolver.o
  CC      io/net-listener.o
  CC      io/task.o
  CC      qom/object.o
  CC      qom/container.o
  CC      qom/qom-qobject.o
  CC      qom/object_interfaces.o
  GEN     qemu-img-cmds.h
  CC      qemu-io.o
  CC      scsi/qemu-pr-helper.o
  CC      qemu-bridge-helper.o
  CC      blockdev.o
  CC      blockdev-nbd.o
  CC      bootdevice.o
  CC      iothread.o
  CC      job-qmp.o
  CC      qdev-monitor.o
  CC      device-hotplug.o
  CC      os-posix.o
  CC      bt-host.o
  CC      bt-vhci.o
  CC      dma-helpers.o
  CC      vl.o
  CC      tpm.o
  CC      device_tree.o
  CC      qapi/qapi-commands.o
  CC      qapi/qapi-commands-block-core.o
  CC      qapi/qapi-commands-block.o
  CC      qapi/qapi-commands-char.o
  CC      qapi/qapi-commands-common.o
  CC      qapi/qapi-commands-crypto.o
  CC      qapi/qapi-commands-introspect.o
  CC      qapi/qapi-commands-job.o
  CC      qapi/qapi-commands-migration.o
  CC      qapi/qapi-commands-misc.o
  CC      qapi/qapi-commands-net.o
  CC      qapi/qapi-commands-rocker.o
  CC      qapi/qapi-commands-run-state.o
  CC      qapi/qapi-commands-sockets.o
  CC      qapi/qapi-commands-tpm.o
  CC      qapi/qapi-commands-trace.o
  CC      qapi/qapi-commands-transaction.o
  CC      qapi/qapi-commands-ui.o
  CC      qmp.o
  CC      hmp.o
  CC      cpus-common.o
  CC      audio/audio.o
  CC      audio/noaudio.o
  CC      audio/wavaudio.o
  CC      audio/mixeng.o
  CC      audio/spiceaudio.o
  CC      audio/wavcapture.o
  CC      backends/rng.o
  CC      backends/rng-egd.o
  CC      backends/rng-random.o
  CC      backends/tpm.o
  CC      backends/hostmem.o
  CC      backends/hostmem-ram.o
  CC      backends/hostmem-file.o
  CC      backends/cryptodev.o
  CC      backends/cryptodev-builtin.o
  CC      backends/cryptodev-vhost.o
  CC      backends/cryptodev-vhost-user.o
  CC      backends/hostmem-memfd.o
  CC      block/stream.o
  CC      chardev/msmouse.o
  CC      chardev/wctablet.o
  CC      chardev/testdev.o
  CC      chardev/spice.o
  CC      disas/arm.o
  CC      disas/i386.o
  CC      fsdev/qemu-fsdev-dummy.o
  CC      fsdev/qemu-fsdev-opts.o
  CC      fsdev/qemu-fsdev-throttle.o
  CC      hw/acpi/core.o
  CC      hw/acpi/piix4.o
  CC      hw/acpi/pcihp.o
  CC      hw/acpi/ich9.o
  CC      hw/acpi/tco.o
  CC      hw/acpi/cpu_hotplug.o
  CC      hw/acpi/memory_hotplug.o
  CC      hw/acpi/cpu.o
  CC      hw/acpi/nvdimm.o
  CC      hw/acpi/vmgenid.o
  CC      hw/acpi/acpi_interface.o
  CC      hw/acpi/bios-linker-loader.o
  CC      hw/acpi/aml-build.o
  CC      hw/acpi/ipmi.o
  CC      hw/acpi/acpi-stub.o
  CC      hw/acpi/ipmi-stub.o
  CC      hw/audio/sb16.o
  CC      hw/audio/es1370.o
  CC      hw/audio/ac97.o
  CC      hw/audio/fmopl.o
  CC      hw/audio/adlib.o
  CC      hw/audio/gus.o
  CC      hw/audio/gusemu_hal.o
  CC      hw/audio/gusemu_mixer.o
  CC      hw/audio/cs4231a.o
  CC      hw/audio/intel-hda.o
  CC      hw/audio/hda-codec.o
  CC      hw/audio/pcspk.o
  CC      hw/audio/wm8750.o
  CC      hw/audio/pl041.o
  CC      hw/audio/lm4549.o
  CC      hw/audio/marvell_88w8618.o
  CC      hw/audio/soundhw.o
  CC      hw/block/block.o
  CC      hw/block/cdrom.o
  CC      hw/block/hd-geometry.o
  CC      hw/block/fdc.o
  CC      hw/block/m25p80.o
  CC      hw/block/nand.o
  CC      hw/block/pflash_cfi01.o
  CC      hw/block/pflash_cfi02.o
  CC      hw/block/xen_disk.o
  CC      hw/block/ecc.o
  CC      hw/block/onenand.o
  CC      hw/block/nvme.o
  CC      hw/bt/core.o
  CC      hw/bt/l2cap.o
  CC      hw/bt/sdp.o
  CC      hw/bt/hci.o
  CC      hw/bt/hid.o
  CC      hw/bt/hci-csr.o
  CC      hw/char/ipoctal232.o
  CC      hw/char/parallel.o
  CC      hw/char/parallel-isa.o
  CC      hw/char/pl011.o
  CC      hw/char/serial.o
  CC      hw/char/serial-isa.o
  CC      hw/char/serial-pci.o
  CC      hw/char/virtio-console.o
  CC      hw/char/xen_console.o
  CC      hw/char/cadence_uart.o
  CC      hw/char/cmsdk-apb-uart.o
  CC      hw/char/debugcon.o
  CC      hw/char/imx_serial.o
  CC      hw/core/qdev.o
  CC      hw/core/qdev-properties.o
  CC      hw/core/bus.o
  CC      hw/core/reset.o
  CC      hw/core/qdev-fw.o
  CC      hw/core/fw-path-provider.o
  CC      hw/core/irq.o
  CC      hw/core/hotplug.o
  CC      hw/core/nmi.o
  CC      hw/core/stream.o
  CC      hw/core/ptimer.o
  CC      hw/core/sysbus.o
  CC      hw/core/machine.o
  CC      hw/core/loader.o
  CC      hw/core/qdev-properties-system.o
  CC      hw/core/register.o
  CC      hw/core/or-irq.o
  CC      hw/core/split-irq.o
  CC      hw/core/platform-bus.o
  CC      hw/cpu/core.o
  CC      hw/display/ramfb.o
  CC      hw/display/ramfb-standalone.o
  CC      hw/display/ads7846.o
  CC      hw/display/cirrus_vga.o
  CC      hw/display/pl110.o
  CC      hw/display/sii9022.o
  CC      hw/display/ssd0303.o
  CC      hw/display/ssd0323.o
  CC      hw/display/xenfb.o
  CC      hw/display/vga-pci.o
  CC      hw/display/bochs-display.o
  CC      hw/display/vga-isa.o
  CC      hw/display/vmware_vga.o
  CC      hw/display/blizzard.o
  CC      hw/display/exynos4210_fimd.o
  CC      hw/display/framebuffer.o
  CC      hw/display/tc6393xb.o
  CC      hw/display/qxl.o
  CC      hw/display/qxl-logger.o
  CC      hw/display/qxl-render.o
  CC      hw/dma/pl080.o
  CC      hw/dma/pl330.o
  CC      hw/dma/i8257.o
  CC      hw/dma/xilinx_axidma.o
  CC      hw/dma/xlnx-zynq-devcfg.o
  CC      hw/dma/xlnx-zdma.o
  CC      hw/gpio/max7310.o
  CC      hw/gpio/pl061.o
  CC      hw/gpio/zaurus.o
  CC      hw/gpio/gpio_key.o
  CC      hw/i2c/core.o
  CC      hw/i2c/smbus.o
  CC      hw/i2c/smbus_eeprom.o
  CC      hw/i2c/i2c-ddc.o
  CC      hw/i2c/versatile_i2c.o
  CC      hw/i2c/smbus_ich9.o
  CC      hw/i2c/pm_smbus.o
  CC      hw/i2c/bitbang_i2c.o
  CC      hw/i2c/exynos4210_i2c.o
  CC      hw/i2c/imx_i2c.o
  CC      hw/i2c/aspeed_i2c.o
  CC      hw/ide/core.o
  CC      hw/ide/atapi.o
  CC      hw/ide/qdev.o
  CC      hw/ide/pci.o
  CC      hw/ide/isa.o
  CC      hw/ide/piix.o
  CC      hw/ide/microdrive.o
  CC      hw/ide/ahci.o
  CC      hw/ide/ich.o
  CC      hw/ide/ahci-allwinner.o
  CC      hw/input/hid.o
  CC      hw/input/lm832x.o
  CC      hw/input/pckbd.o
  CC      hw/input/pl050.o
  CC      hw/input/ps2.o
  CC      hw/input/stellaris_input.o
  CC      hw/input/tsc2005.o
  CC      hw/input/virtio-input.o
  CC      hw/input/virtio-input-hid.o
  CC      hw/input/virtio-input-host.o
  CC      hw/intc/i8259_common.o
  CC      hw/intc/i8259.o
  CC      hw/intc/pl190.o
  CC      hw/intc/xlnx-pmu-iomod-intc.o
  CC      hw/intc/xlnx-zynqmp-ipi.o
  CC      hw/intc/imx_avic.o
  CC      hw/intc/imx_gpcv2.o
  CC      hw/intc/realview_gic.o
  CC      hw/intc/ioapic_common.o
  CC      hw/intc/arm_gic_common.o
  CC      hw/intc/arm_gic.o
  CC      hw/intc/arm_gicv2m.o
  CC      hw/intc/arm_gicv3_common.o
  CC      hw/intc/arm_gicv3.o
  CC      hw/intc/arm_gicv3_dist.o
  CC      hw/intc/arm_gicv3_redist.o
  CC      hw/intc/arm_gicv3_its_common.o
  CC      hw/intc/intc.o
  CC      hw/ipack/ipack.o
  CC      hw/ipack/tpci200.o
  CC      hw/ipmi/ipmi.o
  CC      hw/ipmi/ipmi_bmc_sim.o
  CC      hw/ipmi/ipmi_bmc_extern.o
  CC      hw/ipmi/isa_ipmi_kcs.o
  CC      hw/ipmi/isa_ipmi_bt.o
  CC      hw/isa/isa-bus.o
  CC      hw/isa/isa-superio.o
  CC      hw/isa/smc37c669-superio.o
  CC      hw/isa/apm.o
  CC      hw/mem/pc-dimm.o
  CC      hw/mem/memory-device.o
  CC      hw/mem/nvdimm.o
  CC      hw/misc/applesmc.o
  CC      hw/misc/max111x.o
  CC      hw/misc/tmp105.o
  CC      hw/misc/tmp421.o
  CC      hw/misc/debugexit.o
  CC      hw/misc/sga.o
  CC      hw/misc/pc-testdev.o
  CC      hw/misc/pci-testdev.o
  CC      hw/misc/edu.o
  CC      hw/misc/pca9552.o
  CC      hw/misc/unimp.o
  CC      hw/misc/vmcoreinfo.o
  CC      hw/misc/arm_l2x0.o
  CC      hw/misc/arm_integrator_debug.o
  CC      hw/misc/a9scu.o
  CC      hw/misc/arm11scu.o
  CC      hw/net/xen_nic.o
  CC      hw/net/ne2000.o
  CC      hw/net/eepro100.o
  CC      hw/net/pcnet-pci.o
  CC      hw/net/pcnet.o
  CC      hw/net/e1000.o
  CC      hw/net/e1000x_common.o
  CC      hw/net/net_tx_pkt.o
  CC      hw/net/net_rx_pkt.o
  CC      hw/net/e1000e.o
  CC      hw/net/e1000e_core.o
  CC      hw/net/rtl8139.o
  CC      hw/net/vmxnet3.o
  CC      hw/net/smc91c111.o
  CC      hw/net/lan9118.o
  CC      hw/net/ne2000-isa.o
  CC      hw/net/xgmac.o
  CC      hw/net/xilinx_axienet.o
  CC      hw/net/allwinner_emac.o
  CC      hw/net/imx_fec.o
  CC      hw/net/cadence_gem.o
  CC      hw/net/stellaris_enet.o
  CC      hw/net/ftgmac100.o
  CC      hw/net/rocker/rocker.o
  CC      hw/net/rocker/rocker_fp.o
  CC      hw/net/rocker/rocker_desc.o
  CC      hw/net/rocker/rocker_world.o
  CC      hw/net/rocker/rocker_of_dpa.o
  CC      hw/net/can/can_sja1000.o
  CC      hw/net/can/can_kvaser_pci.o
  CC      hw/net/can/can_pcm3680_pci.o
  CC      hw/net/can/can_mioe3680_pci.o
  CC      hw/nvram/eeprom93xx.o
  CC      hw/nvram/eeprom_at24c.o
  CC      hw/nvram/fw_cfg.o
  CC      hw/nvram/chrp_nvram.o
  CC      hw/pci-bridge/pci_bridge_dev.o
  CC      hw/pci-bridge/pcie_root_port.o
  CC      hw/pci-bridge/gen_pcie_root_port.o
  CC      hw/pci-bridge/pcie_pci_bridge.o
  CC      hw/pci-bridge/pci_expander_bridge.o
  CC      hw/pci-bridge/xio3130_upstream.o
  CC      hw/pci-bridge/xio3130_downstream.o
  CC      hw/pci-bridge/ioh3420.o
  CC      hw/pci-bridge/i82801b11.o
  CC      hw/pci-host/pam.o
  CC      hw/pci-host/versatile.o
  CC      hw/pci-host/piix.o
  CC      hw/pci-host/q35.o
  CC      hw/pci-host/gpex.o
  CC      hw/pci-host/designware.o
  CC      hw/pci/pci.o
  CC      hw/pci/pci_bridge.o
  CC      hw/pci/msix.o
  CC      hw/pci/msi.o
  CC      hw/pci/shpc.o
  CC      hw/pci/slotid_cap.o
  CC      hw/pci/pci_host.o
  CC      hw/pci/pcie_host.o
  CC      hw/pci/pcie.o
  CC      hw/pci/pcie_aer.o
  CC      hw/pci/pcie_port.o
  CC      hw/pci/pci-stub.o
  CC      hw/pcmcia/pcmcia.o
  CC      hw/scsi/scsi-disk.o
  CC      hw/scsi/scsi-generic.o
  CC      hw/scsi/scsi-bus.o
  CC      hw/scsi/lsi53c895a.o
  CC      hw/scsi/mptsas.o
  CC      hw/scsi/mptconfig.o
  CC      hw/scsi/mptendian.o
  CC      hw/scsi/megasas.o
  CC      hw/scsi/vmw_pvscsi.o
  CC      hw/scsi/esp.o
  CC      hw/scsi/esp-pci.o
  CC      hw/sd/pl181.o
  CC      hw/sd/ssi-sd.o
  CC      hw/sd/sd.o
  CC      hw/sd/core.o
  CC      hw/sd/sdmmc-internal.o
  CC      hw/sd/sdhci.o
  CC      hw/smbios/smbios.o
  CC      hw/smbios/smbios_type_38.o
  CC      hw/smbios/smbios-stub.o
  CC      hw/smbios/smbios_type_38-stub.o
  CC      hw/ssi/pl022.o
  CC      hw/ssi/ssi.o
  CC      hw/ssi/xilinx_spips.o
  CC      hw/ssi/aspeed_smc.o
  CC      hw/ssi/stm32f2xx_spi.o
  CC      hw/ssi/mss-spi.o
  CC      hw/timer/arm_timer.o
  CC      hw/timer/arm_mptimer.o
  CC      hw/timer/armv7m_systick.o
  CC      hw/timer/a9gtimer.o
  CC      hw/timer/cadence_ttc.o
  CC      hw/timer/ds1338.o
  CC      hw/timer/hpet.o
  CC      hw/timer/i8254_common.o
  CC      hw/timer/i8254.o
  CC      hw/timer/pl031.o
  CC      hw/timer/twl92230.o
  CC      hw/timer/imx_epit.o
  CC      hw/timer/imx_gpt.o
  CC      hw/timer/xlnx-zynqmp-rtc.o
  CC      hw/timer/stm32f2xx_timer.o
  CC      hw/timer/aspeed_timer.o
  CC      hw/timer/cmsdk-apb-timer.o
  CC      hw/timer/mss-timer.o
  CC      hw/tpm/tpm_util.o
  CC      hw/tpm/tpm_tis.o
  CC      hw/tpm/tpm_crb.o
  CC      hw/tpm/tpm_passthrough.o
  CC      hw/tpm/tpm_emulator.o
  CC      hw/usb/core.o
  CC      hw/usb/combined-packet.o
  CC      hw/usb/bus.o
  CC      hw/usb/libhw.o
  CC      hw/usb/desc.o
  CC      hw/usb/desc-msos.o
  CC      hw/usb/hcd-uhci.o
  CC      hw/usb/hcd-ohci.o
  CC      hw/usb/hcd-ehci.o
  CC      hw/usb/hcd-ehci-pci.o
  CC      hw/usb/hcd-ehci-sysbus.o
  CC      hw/usb/hcd-xhci.o
  CC      hw/usb/hcd-xhci-nec.o
  CC      hw/usb/hcd-musb.o
  CC      hw/usb/dev-hub.o
  CC      hw/usb/dev-hid.o
  CC      hw/usb/dev-wacom.o
  CC      hw/usb/dev-storage.o
  CC      hw/usb/dev-uas.o
  CC      hw/usb/dev-audio.o
  CC      hw/usb/dev-serial.o
  CC      hw/usb/dev-network.o
  CC      hw/usb/dev-bluetooth.o
  CC      hw/usb/dev-smartcard-reader.o
  CC      hw/usb/ccid-card-passthru.o
  CC      hw/usb/ccid-card-emulated.o
  CC      hw/usb/dev-mtp.o
  CC      hw/usb/host-stub.o
  CC      hw/virtio/virtio-bus.o
  CC      hw/virtio/virtio-rng.o
  CC      hw/virtio/virtio-pci.o
  CC      hw/virtio/virtio-mmio.o
  CC      hw/virtio/vhost-stub.o
  CC      hw/watchdog/watchdog.o
  CC      hw/watchdog/wdt_i6300esb.o
  CC      hw/watchdog/wdt_ib700.o
  CC      hw/watchdog/wdt_aspeed.o
  CC      hw/xen/xen_backend.o
  CC      hw/xen/xen_devconfig.o
  CC      hw/xen/xen_pvdev.o
  CC      hw/xen/xen-common.o
  CC      migration/migration.o
  CC      migration/socket.o
  CC      migration/fd.o
  CC      migration/exec.o
  CC      migration/tls.o
  CC      migration/channel.o
  CC      migration/savevm.o
  CC      migration/colo-comm.o
  CC      migration/colo.o
  CC      migration/colo-failover.o
  CC      migration/vmstate.o
  CC      migration/vmstate-types.o
  CC      migration/page_cache.o
  CC      migration/qemu-file.o
  CC      migration/global_state.o
  CC      migration/qemu-file-channel.o
  CC      migration/xbzrle.o
  CC      migration/postcopy-ram.o
  CC      migration/qjson.o
  CC      migration/block-dirty-bitmap.o
  CC      migration/rdma.o
  CC      migration/block.o
  CC      net/net.o
  CC      net/queue.o
  CC      net/checksum.o
  CC      net/util.o
  CC      net/hub.o
  CC      net/socket.o
  CC      net/dump.o
  CC      net/eth.o
  CC      net/l2tpv3.o
  CC      net/vhost-user.o
  CC      net/slirp.o
  CC      net/filter.o
  CC      net/filter-buffer.o
  CC      net/filter-mirror.o
  CC      net/colo-compare.o
  CC      net/colo.o
  CC      net/filter-rewriter.o
  CC      net/filter-replay.o
  CC      net/tap.o
  CC      net/tap-linux.o
  CC      net/can/can_core.o
  CC      net/can/can_host.o
  CC      net/can/can_socketcan.o
  CC      qom/cpu.o
  CC      replay/replay.o
  CC      replay/replay-internal.o
  CC      replay/replay-events.o
  CC      replay/replay-time.o
  CC      replay/replay-input.o
  CC      replay/replay-char.o
  CC      replay/replay-snapshot.o
  CC      replay/replay-net.o
  CC      replay/replay-audio.o
  CC      slirp/cksum.o
  CC      slirp/if.o
  CC      slirp/ip_icmp.o
  CC      slirp/ip6_icmp.o
  CC      slirp/ip6_input.o
  CC      slirp/ip6_output.o
  CC      slirp/ip_input.o
  CC      slirp/ip_output.o
  CC      slirp/dnssearch.o
  CC      slirp/dhcpv6.o
  CC      slirp/slirp.o
  CC      slirp/mbuf.o
  CC      slirp/misc.o
  CC      slirp/sbuf.o
  CC      slirp/socket.o
  CC      slirp/tcp_input.o
  CC      slirp/tcp_output.o
  CC      slirp/tcp_subr.o
  CC      slirp/tcp_timer.o
  CC      slirp/udp.o
  CC      slirp/udp6.o
  CC      slirp/bootp.o
  CC      slirp/tftp.o
  CC      slirp/arp_table.o
  CC      slirp/ndp_table.o
  CC      slirp/ncsi.o
  CC      ui/keymaps.o
  CC      ui/console.o
  CC      ui/cursor.o
  CC      ui/qemu-pixman.o
  CC      ui/input.o
  CC      ui/input-keymap.o
  CC      ui/input-legacy.o
  CC      ui/input-linux.o
  CC      ui/spice-core.o
  CC      ui/spice-input.o
  CC      ui/spice-display.o
  CC      ui/vnc.o
  CC      ui/vnc-enc-zlib.o
  CC      ui/vnc-enc-hextile.o
  CC      ui/vnc-enc-tight.o
  CC      ui/vnc-palette.o
  CC      ui/vnc-enc-zrle.o
  CC      ui/vnc-auth-vencrypt.o
  CC      ui/vnc-ws.o
  CC      ui/vnc-jobs.o
  VERT    ui/shader/texture-blit-vert.h
  VERT    ui/shader/texture-blit-flip-vert.h
  FRAG    ui/shader/texture-blit-frag.h
  CC      ui/console-gl.o
  CC      ui/egl-helpers.o
  CC      ui/egl-context.o
  CC      ui/egl-headless.o
  CC      audio/ossaudio.o
  CC      ui/sdl.o
  CC      ui/sdl_zoom.o
  CC      ui/x_keymap.o
  CC      ui/gtk.o
  CC      ui/gtk-egl.o
  CC      ui/curses.o
  CC      chardev/char.o
  CC      chardev/char-fd.o
  CC      chardev/char-fe.o
  CC      chardev/char-file.o
  CC      chardev/char-io.o
  CC      chardev/char-mux.o
  CC      chardev/char-null.o
  CC      chardev/char-parallel.o
  CC      chardev/char-pipe.o
  CC      chardev/char-pty.o
  CC      chardev/char-ringbuf.o
  CC      chardev/char-serial.o
  CC      chardev/char-socket.o
  CC      chardev/char-stdio.o
  CC      chardev/char-udp.o
  LINK    tests/qemu-iotests/socket_scm_helper
  CC      qga/commands.o
  CC      qga/guest-agent-command-state.o
  AS      optionrom/multiboot.o
  AS      optionrom/linuxboot.o
  CC      optionrom/linuxboot_dma.o
  AS      optionrom/kvmvapic.o
  BUILD   optionrom/multiboot.img
  BUILD   optionrom/linuxboot.img
  BUILD   optionrom/kvmvapic.img
  CC      qga/main.o
  BUILD   optionrom/multiboot.raw
  BUILD   optionrom/linuxboot.raw
  BUILD   optionrom/linuxboot_dma.img
  BUILD   optionrom/kvmvapic.raw
  SIGN    optionrom/multiboot.bin
  SIGN    optionrom/linuxboot.bin
  BUILD   optionrom/linuxboot_dma.raw
  SIGN    optionrom/kvmvapic.bin
  CC      qga/commands-posix.o
  SIGN    optionrom/linuxboot_dma.bin
  CC      qga/channel-posix.o
  CC      qga/qapi-generated/qga-qapi-types.o
  CC      qga/qapi-generated/qga-qapi-visit.o
  CC      qga/qapi-generated/qga-qapi-commands.o
  AR      libqemuutil.a
  CC      qemu-img.o
  LINK    qemu-io
  LINK    scsi/qemu-pr-helper
  LINK    qemu-bridge-helper
  CC      ui/shader.o
  LINK    ivshmem-client
  LINK    ivshmem-server
  LINK    qemu-nbd
  LINK    qemu-ga
  GEN     x86_64-softmmu/hmp-commands.h
  GEN     x86_64-softmmu/hmp-commands-info.h
  GEN     x86_64-softmmu/config-target.h
  LINK    qemu-img
  GEN     aarch64-softmmu/hmp-commands-info.h
  GEN     aarch64-softmmu/config-target.h
  GEN     aarch64-softmmu/hmp-commands.h
  CC      x86_64-softmmu/tcg/tcg.o
  CC      x86_64-softmmu/tcg/tcg-op.o
  CC      x86_64-softmmu/exec.o
  CC      x86_64-softmmu/tcg/tcg-op-vec.o
  CC      x86_64-softmmu/tcg/tcg-op-gvec.o
  CC      x86_64-softmmu/tcg/tcg-common.o
  CC      aarch64-softmmu/exec.o
  CC      x86_64-softmmu/tcg/optimize.o
  CC      x86_64-softmmu/fpu/softfloat.o
  CC      x86_64-softmmu/disas.o
  GEN     x86_64-softmmu/gdbstub-xml.c
  CC      x86_64-softmmu/arch_init.o
  CC      x86_64-softmmu/cpus.o
  CC      x86_64-softmmu/monitor.o
  CC      x86_64-softmmu/gdbstub.o
  CC      x86_64-softmmu/balloon.o
  CC      x86_64-softmmu/ioport.o
  CC      aarch64-softmmu/tcg/tcg.o
  CC      x86_64-softmmu/numa.o
  CC      x86_64-softmmu/qtest.o
  CC      x86_64-softmmu/memory.o
  CC      x86_64-softmmu/memory_mapping.o
  CC      x86_64-softmmu/dump.o
  CC      x86_64-softmmu/win_dump.o
  CC      x86_64-softmmu/migration/ram.o
  CC      x86_64-softmmu/accel/accel.o
  CC      x86_64-softmmu/accel/kvm/kvm-all.o
  CC      x86_64-softmmu/accel/stubs/hax-stub.o
  CC      x86_64-softmmu/accel/stubs/hvf-stub.o
  CC      x86_64-softmmu/accel/stubs/whpx-stub.o
  CC      x86_64-softmmu/accel/tcg/tcg-all.o
  CC      x86_64-softmmu/accel/tcg/cputlb.o
  CC      x86_64-softmmu/accel/tcg/tcg-runtime.o
  CC      x86_64-softmmu/accel/tcg/tcg-runtime-gvec.o
  CC      aarch64-softmmu/tcg/tcg-op.o
  CC      x86_64-softmmu/accel/tcg/cpu-exec.o
  CC      x86_64-softmmu/accel/tcg/cpu-exec-common.o
  CC      x86_64-softmmu/accel/tcg/translate-all.o
  CC      x86_64-softmmu/accel/tcg/translator.o
  CC      x86_64-softmmu/hw/block/virtio-blk.o
  CC      x86_64-softmmu/hw/block/vhost-user-blk.o
  CC      aarch64-softmmu/tcg/tcg-op-vec.o
  CC      aarch64-softmmu/tcg/tcg-op-gvec.o
  CC      x86_64-softmmu/hw/block/dataplane/virtio-blk.o
  CC      aarch64-softmmu/tcg/tcg-common.o
  CC      aarch64-softmmu/tcg/optimize.o
  CC      aarch64-softmmu/fpu/softfloat.o
  CC      x86_64-softmmu/hw/char/virtio-serial-bus.o
  CC      x86_64-softmmu/hw/core/generic-loader.o
  CC      x86_64-softmmu/hw/core/null-machine.o
  CC      aarch64-softmmu/disas.o
  CC      x86_64-softmmu/hw/display/vga.o
  CC      x86_64-softmmu/hw/display/virtio-gpu.o
  GEN     aarch64-softmmu/gdbstub-xml.c
  CC      x86_64-softmmu/hw/display/virtio-gpu-3d.o
  CC      aarch64-softmmu/arch_init.o
  CC      x86_64-softmmu/hw/display/virtio-gpu-pci.o
  CC      x86_64-softmmu/hw/display/virtio-vga.o
  CC      aarch64-softmmu/cpus.o
  CC      aarch64-softmmu/monitor.o
  CC      x86_64-softmmu/hw/intc/apic.o
  CC      aarch64-softmmu/gdbstub.o
  CC      x86_64-softmmu/hw/intc/apic_common.o
  CC      x86_64-softmmu/hw/intc/ioapic.o
  CC      aarch64-softmmu/balloon.o
  CC      aarch64-softmmu/ioport.o
  CC      x86_64-softmmu/hw/isa/lpc_ich9.o
  CC      aarch64-softmmu/numa.o
  CC      aarch64-softmmu/qtest.o
  CC      aarch64-softmmu/memory.o
  CC      aarch64-softmmu/memory_mapping.o
  CC      aarch64-softmmu/dump.o
  CC      aarch64-softmmu/migration/ram.o
  CC      x86_64-softmmu/hw/misc/ivshmem.o
  CC      aarch64-softmmu/accel/accel.o
  CC      aarch64-softmmu/accel/stubs/hax-stub.o
  CC      aarch64-softmmu/accel/stubs/hvf-stub.o
  CC      aarch64-softmmu/accel/stubs/whpx-stub.o
  CC      aarch64-softmmu/accel/stubs/kvm-stub.o
  CC      x86_64-softmmu/hw/misc/pvpanic.o
  CC      aarch64-softmmu/accel/tcg/tcg-all.o
  CC      aarch64-softmmu/accel/tcg/cputlb.o
  CC      aarch64-softmmu/accel/tcg/tcg-runtime.o
  CC      aarch64-softmmu/accel/tcg/tcg-runtime-gvec.o
  CC      x86_64-softmmu/hw/misc/hyperv_testdev.o
  CC      aarch64-softmmu/accel/tcg/cpu-exec.o
  CC      aarch64-softmmu/accel/tcg/cpu-exec-common.o
  CC      aarch64-softmmu/accel/tcg/translate-all.o
  CC      x86_64-softmmu/hw/misc/mmio_interface.o
  CC      x86_64-softmmu/hw/net/virtio-net.o
  CC      aarch64-softmmu/accel/tcg/translator.o
  CC      x86_64-softmmu/hw/net/vhost_net.o
  CC      x86_64-softmmu/hw/rdma/rdma_utils.o
  CC      x86_64-softmmu/hw/rdma/rdma_backend.o
  CC      x86_64-softmmu/hw/rdma/rdma_rm.o
  CC      x86_64-softmmu/hw/rdma/vmw/pvrdma_dev_ring.o
  CC      aarch64-softmmu/hw/adc/stm32f2xx_adc.o
  CC      aarch64-softmmu/hw/block/virtio-blk.o
  CC      x86_64-softmmu/hw/rdma/vmw/pvrdma_cmd.o
  CC      x86_64-softmmu/hw/rdma/vmw/pvrdma_qp_ops.o
  CC      x86_64-softmmu/hw/rdma/vmw/pvrdma_main.o
  CC      aarch64-softmmu/hw/block/vhost-user-blk.o
  CC      x86_64-softmmu/hw/scsi/virtio-scsi.o
  CC      aarch64-softmmu/hw/block/dataplane/virtio-blk.o
  CC      x86_64-softmmu/hw/scsi/virtio-scsi-dataplane.o
  CC      x86_64-softmmu/hw/scsi/vhost-scsi-common.o
  CC      aarch64-softmmu/hw/char/exynos4210_uart.o
  CC      x86_64-softmmu/hw/scsi/vhost-scsi.o
  CC      aarch64-softmmu/hw/char/omap_uart.o
  CC      aarch64-softmmu/hw/char/digic-uart.o
  CC      x86_64-softmmu/hw/scsi/vhost-user-scsi.o
  CC      x86_64-softmmu/hw/timer/mc146818rtc.o
  CC      aarch64-softmmu/hw/char/stm32f2xx_usart.o
  CC      x86_64-softmmu/hw/vfio/common.o
  CC      x86_64-softmmu/hw/vfio/pci.o
  CC      aarch64-softmmu/hw/char/bcm2835_aux.o
  CC      aarch64-softmmu/hw/char/virtio-serial-bus.o
  CC      x86_64-softmmu/hw/vfio/pci-quirks.o
  CC      aarch64-softmmu/hw/core/generic-loader.o
  CC      aarch64-softmmu/hw/core/null-machine.o
  CC      x86_64-softmmu/hw/vfio/display.o
  CC      aarch64-softmmu/hw/cpu/arm11mpcore.o
  CC      aarch64-softmmu/hw/cpu/realview_mpcore.o
  CC      aarch64-softmmu/hw/cpu/a9mpcore.o
  CC      x86_64-softmmu/hw/vfio/platform.o
  CC      aarch64-softmmu/hw/cpu/a15mpcore.o
  CC      x86_64-softmmu/hw/vfio/spapr.o
  CC      aarch64-softmmu/hw/display/omap_dss.o
  CC      aarch64-softmmu/hw/display/omap_lcdc.o
  CC      aarch64-softmmu/hw/display/pxa2xx_lcd.o
  CC      x86_64-softmmu/hw/virtio/virtio.o
  CC      x86_64-softmmu/hw/virtio/virtio-balloon.o
  CC      x86_64-softmmu/hw/virtio/virtio-crypto.o
  CC      aarch64-softmmu/hw/display/bcm2835_fb.o
  CC      x86_64-softmmu/hw/virtio/virtio-crypto-pci.o
  CC      aarch64-softmmu/hw/display/vga.o
  CC      aarch64-softmmu/hw/display/virtio-gpu.o
  CC      x86_64-softmmu/hw/virtio/vhost.o
  CC      x86_64-softmmu/hw/virtio/vhost-backend.o
  CC      x86_64-softmmu/hw/virtio/vhost-user.o
  CC      x86_64-softmmu/hw/virtio/vhost-vsock.o
  CC      x86_64-softmmu/hw/xen/xen-host-pci-device.o
  CC      aarch64-softmmu/hw/display/virtio-gpu-3d.o
  CC      x86_64-softmmu/hw/xen/xen_pt.o
  CC      aarch64-softmmu/hw/display/virtio-gpu-pci.o
  CC      aarch64-softmmu/hw/display/dpcd.o
  CC      aarch64-softmmu/hw/display/xlnx_dp.o
  CC      x86_64-softmmu/hw/xen/xen_pt_config_init.o
  CC      x86_64-softmmu/hw/xen/xen_pt_graphics.o
  CC      x86_64-softmmu/hw/xen/xen_pt_msi.o
  CC      x86_64-softmmu/hw/xen/xen_pt_load_rom.o
  CC      aarch64-softmmu/hw/dma/xlnx_dpdma.o
  CC      x86_64-softmmu/hw/i386/multiboot.o
  CC      x86_64-softmmu/hw/i386/pc.o
  CC      aarch64-softmmu/hw/dma/omap_dma.o
  CC      aarch64-softmmu/hw/dma/soc_dma.o
  CC      aarch64-softmmu/hw/dma/pxa2xx_dma.o
  CC      aarch64-softmmu/hw/dma/bcm2835_dma.o
  CC      x86_64-softmmu/hw/i386/pc_piix.o
  CC      x86_64-softmmu/hw/i386/pc_q35.o
  CC      x86_64-softmmu/hw/i386/pc_sysfw.o
  CC      x86_64-softmmu/hw/i386/x86-iommu.o
  CC      aarch64-softmmu/hw/gpio/omap_gpio.o
  CC      aarch64-softmmu/hw/gpio/imx_gpio.o
  CC      x86_64-softmmu/hw/i386/intel_iommu.o
  CC      x86_64-softmmu/hw/i386/amd_iommu.o
  CC      x86_64-softmmu/hw/i386/vmport.o
  CC      x86_64-softmmu/hw/i386/vmmouse.o
  CC      x86_64-softmmu/hw/i386/kvmvapic.o
  CC      x86_64-softmmu/hw/i386/acpi-build.o
  CC      aarch64-softmmu/hw/gpio/bcm2835_gpio.o
  CC      x86_64-softmmu/hw/i386/../xenpv/xen_machine_pv.o
  CC      x86_64-softmmu/hw/i386/kvm/clock.o
  CC      x86_64-softmmu/hw/i386/kvm/apic.o
  CC      x86_64-softmmu/hw/i386/kvm/i8259.o
  CC      aarch64-softmmu/hw/i2c/omap_i2c.o
  CC      aarch64-softmmu/hw/input/pxa2xx_keypad.o
  CC      x86_64-softmmu/hw/i386/kvm/ioapic.o
  CC      aarch64-softmmu/hw/input/tsc210x.o
  CC      aarch64-softmmu/hw/intc/armv7m_nvic.o
  CC      x86_64-softmmu/hw/i386/kvm/i8254.o
  CC      aarch64-softmmu/hw/intc/exynos4210_gic.o
  CC      aarch64-softmmu/hw/intc/exynos4210_combiner.o
  CC      x86_64-softmmu/hw/i386/xen/xen_platform.o
  CC      aarch64-softmmu/hw/intc/omap_intc.o
  CC      x86_64-softmmu/hw/i386/xen/xen_apic.o
  CC      x86_64-softmmu/hw/i386/xen/xen_pvdevice.o
  CC      aarch64-softmmu/hw/intc/bcm2835_ic.o
  CC      x86_64-softmmu/hw/i386/xen/xen-hvm.o
  CC      aarch64-softmmu/hw/intc/bcm2836_control.o
  CC      x86_64-softmmu/hw/i386/xen/xen-mapcache.o
  CC      x86_64-softmmu/target/i386/helper.o
  CC      x86_64-softmmu/target/i386/cpu.o
  CC      aarch64-softmmu/hw/intc/allwinner-a10-pic.o
  CC      aarch64-softmmu/hw/intc/aspeed_vic.o
  CC      aarch64-softmmu/hw/intc/arm_gicv3_cpuif.o
  CC      aarch64-softmmu/hw/misc/ivshmem.o
  CC      aarch64-softmmu/hw/misc/arm_sysctl.o
  CC      aarch64-softmmu/hw/misc/cbus.o
  CC      x86_64-softmmu/target/i386/gdbstub.o
  CC      x86_64-softmmu/target/i386/xsave_helper.o
  CC      x86_64-softmmu/target/i386/translate.o
  CC      aarch64-softmmu/hw/misc/exynos4210_pmu.o
  CC      aarch64-softmmu/hw/misc/exynos4210_clk.o
  CC      x86_64-softmmu/target/i386/bpt_helper.o
  CC      aarch64-softmmu/hw/misc/exynos4210_rng.o
  CC      x86_64-softmmu/target/i386/cc_helper.o
  CC      aarch64-softmmu/hw/misc/imx_ccm.o
  CC      aarch64-softmmu/hw/misc/imx31_ccm.o
  CC      aarch64-softmmu/hw/misc/imx25_ccm.o
  CC      x86_64-softmmu/target/i386/excp_helper.o
  CC      x86_64-softmmu/target/i386/fpu_helper.o
  CC      aarch64-softmmu/hw/misc/imx6_ccm.o
  CC      aarch64-softmmu/hw/misc/imx6_src.o
  CC      aarch64-softmmu/hw/misc/imx7_ccm.o
  CC      x86_64-softmmu/target/i386/int_helper.o
  CC      aarch64-softmmu/hw/misc/imx2_wdt.o
  CC      x86_64-softmmu/target/i386/mem_helper.o
  CC      aarch64-softmmu/hw/misc/imx7_snvs.o
  CC      aarch64-softmmu/hw/misc/imx7_gpr.o
  CC      aarch64-softmmu/hw/misc/mst_fpga.o
  CC      aarch64-softmmu/hw/misc/omap_clk.o
  CC      x86_64-softmmu/target/i386/misc_helper.o
  CC      aarch64-softmmu/hw/misc/omap_gpmc.o
  CC      aarch64-softmmu/hw/misc/omap_l4.o
  CC      x86_64-softmmu/target/i386/mpx_helper.o
  CC      x86_64-softmmu/target/i386/seg_helper.o
  CC      aarch64-softmmu/hw/misc/omap_sdrc.o
  CC      aarch64-softmmu/hw/misc/omap_tap.o
  CC      aarch64-softmmu/hw/misc/bcm2835_mbox.o
  CC      aarch64-softmmu/hw/misc/bcm2835_property.o
  CC      x86_64-softmmu/target/i386/smm_helper.o
  CC      aarch64-softmmu/hw/misc/bcm2835_rng.o
  CC      x86_64-softmmu/target/i386/svm_helper.o
  CC      aarch64-softmmu/hw/misc/zynq_slcr.o
  CC      aarch64-softmmu/hw/misc/zynq-xadc.o
  CC      aarch64-softmmu/hw/misc/stm32f2xx_syscfg.o
  CC      aarch64-softmmu/hw/misc/mps2-fpgaio.o
  CC      x86_64-softmmu/target/i386/machine.o
  CC      aarch64-softmmu/hw/misc/mps2-scc.o
  CC      aarch64-softmmu/hw/misc/tz-mpc.o
  CC      x86_64-softmmu/target/i386/arch_memory_mapping.o
  CC      aarch64-softmmu/hw/misc/tz-ppc.o
  CC      x86_64-softmmu/target/i386/arch_dump.o
  CC      aarch64-softmmu/hw/misc/iotkit-secctl.o
  CC      aarch64-softmmu/hw/misc/auxbus.o
  CC      x86_64-softmmu/target/i386/monitor.o
  CC      aarch64-softmmu/hw/misc/aspeed_scu.o
  CC      x86_64-softmmu/target/i386/hyperv.o
  CC      x86_64-softmmu/target/i386/kvm.o
  CC      aarch64-softmmu/hw/misc/aspeed_sdmc.o
  CC      aarch64-softmmu/hw/misc/mmio_interface.o
  CC      aarch64-softmmu/hw/misc/msf2-sysreg.o
  CC      aarch64-softmmu/hw/net/virtio-net.o
  CC      x86_64-softmmu/target/i386/sev.o
  CC      aarch64-softmmu/hw/net/vhost_net.o
  CC      aarch64-softmmu/hw/pcmcia/pxa2xx.o
  GEN     trace/generated-helpers.c
  CC      aarch64-softmmu/hw/rdma/rdma_utils.o
  CC      aarch64-softmmu/hw/rdma/rdma_backend.o
  CC      aarch64-softmmu/hw/rdma/rdma_rm.o
  CC      x86_64-softmmu/trace/control-target.o
  CC      aarch64-softmmu/hw/rdma/vmw/pvrdma_dev_ring.o
  CC      aarch64-softmmu/hw/rdma/vmw/pvrdma_cmd.o
  CC      x86_64-softmmu/gdbstub-xml.o
  CC      aarch64-softmmu/hw/rdma/vmw/pvrdma_qp_ops.o
  CC      aarch64-softmmu/hw/rdma/vmw/pvrdma_main.o
  CC      aarch64-softmmu/hw/scsi/virtio-scsi.o
  CC      x86_64-softmmu/trace/generated-helpers.o
  CC      aarch64-softmmu/hw/scsi/virtio-scsi-dataplane.o
  CC      aarch64-softmmu/hw/scsi/vhost-scsi-common.o
  CC      aarch64-softmmu/hw/scsi/vhost-scsi.o
  CC      aarch64-softmmu/hw/scsi/vhost-user-scsi.o
  CC      aarch64-softmmu/hw/sd/omap_mmc.o
  CC      aarch64-softmmu/hw/sd/pxa2xx_mmci.o
  CC      aarch64-softmmu/hw/sd/bcm2835_sdhost.o
  CC      aarch64-softmmu/hw/ssi/omap_spi.o
  CC      aarch64-softmmu/hw/ssi/imx_spi.o
  CC      aarch64-softmmu/hw/timer/exynos4210_mct.o
  CC      aarch64-softmmu/hw/timer/exynos4210_pwm.o
  CC      aarch64-softmmu/hw/timer/exynos4210_rtc.o
  CC      aarch64-softmmu/hw/timer/omap_gptimer.o
  CC      aarch64-softmmu/hw/timer/omap_synctimer.o
  CC      aarch64-softmmu/hw/timer/pxa2xx_timer.o
  CC      aarch64-softmmu/hw/timer/digic-timer.o
  CC      aarch64-softmmu/hw/timer/allwinner-a10-pit.o
  CC      aarch64-softmmu/hw/usb/tusb6010.o
  CC      aarch64-softmmu/hw/usb/chipidea.o
  CC      aarch64-softmmu/hw/vfio/common.o
  CC      aarch64-softmmu/hw/vfio/pci.o
  CC      aarch64-softmmu/hw/vfio/pci-quirks.o
  CC      aarch64-softmmu/hw/vfio/display.o
  CC      aarch64-softmmu/hw/vfio/platform.o
  CC      aarch64-softmmu/hw/vfio/calxeda-xgmac.o
  CC      aarch64-softmmu/hw/vfio/amd-xgbe.o
  CC      aarch64-softmmu/hw/vfio/spapr.o
  CC      aarch64-softmmu/hw/virtio/virtio.o
  CC      aarch64-softmmu/hw/virtio/virtio-balloon.o
  CC      aarch64-softmmu/hw/virtio/virtio-crypto.o
  CC      aarch64-softmmu/hw/virtio/virtio-crypto-pci.o
  CC      aarch64-softmmu/hw/virtio/vhost.o
  CC      aarch64-softmmu/hw/virtio/vhost-backend.o
  CC      aarch64-softmmu/hw/virtio/vhost-user.o
  LINK    x86_64-softmmu/qemu-system-x86_64
  CC      aarch64-softmmu/hw/virtio/vhost-vsock.o
  CC      aarch64-softmmu/hw/arm/boot.o
  CC      aarch64-softmmu/hw/arm/virt.o
  CC      aarch64-softmmu/hw/arm/sysbus-fdt.o
  CC      aarch64-softmmu/hw/arm/virt-acpi-build.o
  CC      aarch64-softmmu/hw/arm/digic_boards.o
  CC      aarch64-softmmu/hw/arm/exynos4_boards.o
  CC      aarch64-softmmu/hw/arm/highbank.o
  CC      aarch64-softmmu/hw/arm/integratorcp.o
  CC      aarch64-softmmu/hw/arm/mainstone.o
  CC      aarch64-softmmu/hw/arm/musicpal.o
  CC      aarch64-softmmu/hw/arm/netduino2.o
  CC      aarch64-softmmu/hw/arm/nseries.o
  CC      aarch64-softmmu/hw/arm/omap_sx1.o
  CC      aarch64-softmmu/hw/arm/palm.o
  CC      aarch64-softmmu/hw/arm/gumstix.o
  CC      aarch64-softmmu/hw/arm/spitz.o
  CC      aarch64-softmmu/hw/arm/tosa.o
  CC      aarch64-softmmu/hw/arm/z2.o
  CC      aarch64-softmmu/hw/arm/realview.o
  CC      aarch64-softmmu/hw/arm/stellaris.o
  CC      aarch64-softmmu/hw/arm/collie.o
  CC      aarch64-softmmu/hw/arm/vexpress.o
  CC      aarch64-softmmu/hw/arm/versatilepb.o
  CC      aarch64-softmmu/hw/arm/xilinx_zynq.o
  CC      aarch64-softmmu/hw/arm/armv7m.o
  CC      aarch64-softmmu/hw/arm/exynos4210.o
  CC      aarch64-softmmu/hw/arm/pxa2xx.o
  CC      aarch64-softmmu/hw/arm/pxa2xx_gpio.o
  CC      aarch64-softmmu/hw/arm/pxa2xx_pic.o
  CC      aarch64-softmmu/hw/arm/digic.o
  CC      aarch64-softmmu/hw/arm/omap1.o
  CC      aarch64-softmmu/hw/arm/omap2.o
  CC      aarch64-softmmu/hw/arm/strongarm.o
  CC      aarch64-softmmu/hw/arm/allwinner-a10.o
  CC      aarch64-softmmu/hw/arm/cubieboard.o
  CC      aarch64-softmmu/hw/arm/bcm2835_peripherals.o
  CC      aarch64-softmmu/hw/arm/bcm2836.o
  CC      aarch64-softmmu/hw/arm/raspi.o
  CC      aarch64-softmmu/hw/arm/stm32f205_soc.o
  CC      aarch64-softmmu/hw/arm/xlnx-zynqmp.o
  CC      aarch64-softmmu/hw/arm/xlnx-zcu102.o
  CC      aarch64-softmmu/hw/arm/fsl-imx25.o
  CC      aarch64-softmmu/hw/arm/imx25_pdk.o
  CC      aarch64-softmmu/hw/arm/fsl-imx31.o
  CC      aarch64-softmmu/hw/arm/kzm.o
  CC      aarch64-softmmu/hw/arm/fsl-imx6.o
  CC      aarch64-softmmu/hw/arm/sabrelite.o
  CC      aarch64-softmmu/hw/arm/aspeed_soc.o
  CC      aarch64-softmmu/hw/arm/aspeed.o
  CC      aarch64-softmmu/hw/arm/mps2.o
  CC      aarch64-softmmu/hw/arm/mps2-tz.o
  CC      aarch64-softmmu/hw/arm/msf2-soc.o
  CC      aarch64-softmmu/hw/arm/msf2-som.o
  CC      aarch64-softmmu/hw/arm/iotkit.o
  CC      aarch64-softmmu/hw/arm/fsl-imx7.o
  CC      aarch64-softmmu/hw/arm/mcimx7d-sabre.o
  CC      aarch64-softmmu/hw/arm/smmu-common.o
  CC      aarch64-softmmu/hw/arm/smmuv3.o
  CC      aarch64-softmmu/target/arm/arm-semi.o
  CC      aarch64-softmmu/target/arm/machine.o
  CC      aarch64-softmmu/target/arm/psci.o
  CC      aarch64-softmmu/target/arm/arch_dump.o
  CC      aarch64-softmmu/target/arm/monitor.o
  CC      aarch64-softmmu/target/arm/kvm-stub.o
  CC      aarch64-softmmu/target/arm/translate.o
  CC      aarch64-softmmu/target/arm/op_helper.o
  CC      aarch64-softmmu/target/arm/helper.o
  CC      aarch64-softmmu/target/arm/cpu.o
  CC      aarch64-softmmu/target/arm/neon_helper.o
  CC      aarch64-softmmu/target/arm/iwmmxt_helper.o
  CC      aarch64-softmmu/target/arm/vec_helper.o
  CC      aarch64-softmmu/target/arm/gdbstub.o
  CC      aarch64-softmmu/target/arm/cpu64.o
  CC      aarch64-softmmu/target/arm/translate-a64.o
  CC      aarch64-softmmu/target/arm/helper-a64.o
  CC      aarch64-softmmu/target/arm/gdbstub64.o
  CC      aarch64-softmmu/target/arm/crypto_helper.o
  CC      aarch64-softmmu/target/arm/arm-powerctl.o
  GEN     aarch64-softmmu/target/arm/decode-sve.inc.c
  CC      aarch64-softmmu/target/arm/sve_helper.o
  GEN     trace/generated-helpers.c
  CC      aarch64-softmmu/trace/control-target.o
  CC      aarch64-softmmu/gdbstub-xml.o
  CC      aarch64-softmmu/target/arm/translate-sve.o
  CC      aarch64-softmmu/trace/generated-helpers.o
/tmp/qemu-test/src/target/arm/sve_helper.c: In function 'sve_ld1_r':
/tmp/qemu-test/src/target/arm/sve_helper.c:4326:5: error: 'for' loop initial declarations are only allowed in C99 mode
     for (intptr_t reg_off = find_next_active(vg, 0, reg_max, esz);
     ^
/tmp/qemu-test/src/target/arm/sve_helper.c:4326:5: note: use option -std=c99 or -std=gnu99 to compile your code
make[1]: *** [target/arm/sve_helper.o] Error 1
make[1]: *** Waiting for unfinished jobs....
make: *** [subdir-aarch64-softmmu] Error 2
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 565, in <module>
    sys.exit(main())
  File "./tests/docker/docker.py", line 562, in main
    return args.cmdobj.run(args, argv)
  File "./tests/docker/docker.py", line 308, in run
    return Docker().run(argv, args.keep, quiet=args.quiet)
  File "./tests/docker/docker.py", line 276, in run
    quiet=quiet)
  File "./tests/docker/docker.py", line 183, in _do_check
    return subprocess.check_call(self._command + cmd, **kwargs)
  File "/usr/lib64/python2.7/subprocess.py", line 186, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', '--label', 'com.qemu.instance.uuid=31b00c3aa2cd11e8b3b752540069c830', '-u', '1000', '--security-opt', 'seccomp=unconfined', '--rm', '--net=none', '-e', 'TARGET_LIST=', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=8', '-e', 'DEBUG=', '-e', 'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', '/home/patchew/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', '/var/tmp/patchew-tester-tmp-6uc3dv26/src/docker-src.2018-08-18-05.57.51.13090:/var/tmp/qemu:z,ro', 'qemu:centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit status 2
make[1]: *** [tests/docker/Makefile.include:213: docker-run] Error 1
make[1]: Leaving directory '/var/tmp/patchew-tester-tmp-6uc3dv26/src'
make: *** [tests/docker/Makefile.include:247: docker-run-test-quick@centos7] Error 2

real	3m54.181s
user	0m5.184s
sys	0m3.798s
=== OUTPUT END ===

Test command exited with code: 2
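The failure itself is the usual pre-C99 issue: on this CentOS 7 host the default gcc evidently compiles target/arm/sve_helper.c in a gnu89-style mode, which rejects a declaration in a for-loop's init clause. A minimal sketch of the pattern, not taken from sve_helper.c and with illustrative names only: the first form trips the diagnostic shown above under -std=gnu89, while hoisting the declaration keeps the code buildable with either standard.

    #include <stddef.h>
    #include <stdint.h>

    static intptr_t sum_c99_style(const int *a, size_t n)
    {
        intptr_t sum = 0;
        /* Declaration inside the for-init clause: valid in C99/gnu99,
         * rejected by older default modes with the error quoted above. */
        for (intptr_t i = 0; i < (intptr_t)n; ++i) {
            sum += a[i];
        }
        return sum;
    }

    static intptr_t sum_c89_style(const int *a, size_t n)
    {
        intptr_t sum = 0;
        intptr_t i;  /* hoisted declaration: accepted by C89/gnu89 as well */
        for (i = 0; i < (intptr_t)n; ++i) {
            sum += a[i];
        }
        return sum;
    }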


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-devel@redhat.com

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Qemu-devel] [PATCH 11/20] target/arm: Clear unused predicate bits for LD1RQ
  2018-08-09  4:21 ` [Qemu-devel] [PATCH 11/20] target/arm: Clear unused predicate bits for LD1RQ Richard Henderson
@ 2018-08-23 15:21   ` Peter Maydell
  2018-08-23 15:37     ` Richard Henderson
  0 siblings, 1 reply; 51+ messages in thread
From: Peter Maydell @ 2018-08-23 15:21 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, Laurent Desnogues, Alex Bennée

On 9 August 2018 at 05:21, Richard Henderson
<richard.henderson@linaro.org> wrote:
> The 16-byte load only uses 16 predicate bits.  But while
> reusing the other load infrastructure, we find other bits
> that are set and trigger an assert.  To avoid this and
> retain the assert, zero-extend the predicate that we pass
> to the LD1 helper.
>
> Reported-by: Laurent Desnogues <laurent.desnogues@gmail.com>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/translate-sve.c | 25 +++++++++++++++++++++++--
>  1 file changed, 23 insertions(+), 2 deletions(-)
>
> diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
> index d27bc8c946..bef6b8242d 100644
> --- a/target/arm/translate-sve.c
> +++ b/target/arm/translate-sve.c
> @@ -4765,12 +4765,33 @@ static void do_ldrq(DisasContext *s, int zt, int pg, TCGv_i64 addr, int msz)
>      unsigned vsz = vec_full_reg_size(s);
>      TCGv_ptr t_pg;
>      TCGv_i32 desc;
> +    int poff;
>
>      /* Load the first quadword using the normal predicated load helpers.  */
>      desc = tcg_const_i32(simd_desc(16, 16, zt));
> -    t_pg = tcg_temp_new_ptr();
>
> -    tcg_gen_addi_ptr(t_pg, cpu_env, pred_full_reg_offset(s, pg));
> +    poff = pred_full_reg_offset(s, pg);
> +    if (vsz > 16) {
> +        /*
> +         * Zero-extend the first 16 bits of the predicate into a temporary.
> +         * This avoids triggering an assert making sure we don't have bits
> +         * set within a predicate beyond VQ, but we have lowered VQ to 1
> +         * for this load operation.
> +         */
> +        TCGv_i64 tmp = tcg_temp_new_i64();
> +#ifdef HOST_WORDS_BIGENDIAN
> +        poff += 6;
> +#endif
> +        tcg_gen_ld16u_i64(tmp, cpu_env, poff);
> +
> +        poff = offsetof(CPUARMState, vfp.preg_tmp);
> +        tcg_gen_st_i64(tmp, cpu_env, poff);
> +        tcg_temp_free_i64(tmp);
> +    }
> +
> +    t_pg = tcg_temp_new_ptr();
> +    tcg_gen_addi_ptr(t_pg, cpu_env, poff);

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

The bigendian #ifdef in the middle of the code is a little
ugly, though -- I don't suppose it's possible to avoid it
(or abstract it away) somehow?

thanks
-- PMM

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Qemu-devel] [PATCH 11/20] target/arm: Clear unused predicate bits for LD1RQ
  2018-08-23 15:21   ` Peter Maydell
@ 2018-08-23 15:37     ` Richard Henderson
  0 siblings, 0 replies; 51+ messages in thread
From: Richard Henderson @ 2018-08-23 15:37 UTC (permalink / raw)
  To: Peter Maydell; +Cc: QEMU Developers, Laurent Desnogues, Alex Bennée

On 08/23/2018 08:21 AM, Peter Maydell wrote:
> On 9 August 2018 at 05:21, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>> The 16-byte load only uses 16 predicate bits.  But while
>> reusing the other load infrastructure, we find other bits
>> that are set and trigger an assert.  To avoid this and
>> retain the assert, zero-extend the predicate that we pass
>> to the LD1 helper.
>>
>> Reported-by: Laurent Desnogues <laurent.desnogues@gmail.com>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>>  target/arm/translate-sve.c | 25 +++++++++++++++++++++++--
>>  1 file changed, 23 insertions(+), 2 deletions(-)
>>
>> diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
>> index d27bc8c946..bef6b8242d 100644
>> --- a/target/arm/translate-sve.c
>> +++ b/target/arm/translate-sve.c
>> @@ -4765,12 +4765,33 @@ static void do_ldrq(DisasContext *s, int zt, int pg, TCGv_i64 addr, int msz)
>>      unsigned vsz = vec_full_reg_size(s);
>>      TCGv_ptr t_pg;
>>      TCGv_i32 desc;
>> +    int poff;
>>
>>      /* Load the first quadword using the normal predicated load helpers.  */
>>      desc = tcg_const_i32(simd_desc(16, 16, zt));
>> -    t_pg = tcg_temp_new_ptr();
>>
>> -    tcg_gen_addi_ptr(t_pg, cpu_env, pred_full_reg_offset(s, pg));
>> +    poff = pred_full_reg_offset(s, pg);
>> +    if (vsz > 16) {
>> +        /*
>> +         * Zero-extend the first 16 bits of the predicate into a temporary.
>> +         * This avoids triggering an assert making sure we don't have bits
>> +         * set within a predicate beyond VQ, but we have lowered VQ to 1
>> +         * for this load operation.
>> +         */
>> +        TCGv_i64 tmp = tcg_temp_new_i64();
>> +#ifdef HOST_WORDS_BIGENDIAN
>> +        poff += 6;
>> +#endif
>> +        tcg_gen_ld16u_i64(tmp, cpu_env, poff);
>> +
>> +        poff = offsetof(CPUARMState, vfp.preg_tmp);
>> +        tcg_gen_st_i64(tmp, cpu_env, poff);
>> +        tcg_temp_free_i64(tmp);
>> +    }
>> +
>> +    t_pg = tcg_temp_new_ptr();
>> +    tcg_gen_addi_ptr(t_pg, cpu_env, poff);
> 
> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
> 
> The bigendian #ifdef in the middle of the code is a little
> ugly, though -- I don't suppose it's possible to avoid it
> (or abstract it away) somehow?

Adding a helper function for this one use didn't seem worthwhile.  But I
certainly can duplicate the form of vec_reg_offset if you prefer.
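For illustration, a helper in the spirit of vec_reg_offset might look roughly like this; the name and placement are invented here, and it only lifts the existing #ifdef out of do_ldrq():

    /* Illustrative sketch only: env offset of the low 16 bits of
     * predicate register PG, independent of host endianness.
     */
    static int pred_lsw16_offset(DisasContext *s, int pg)
    {
        int poff = pred_full_reg_offset(s, pg);
    #ifdef HOST_WORDS_BIGENDIAN
        /* Predicates are stored as host-endian uint64_t words, so on a
         * big-endian host the least significant 16 bits sit 6 bytes in.
         */
        poff += 6;
    #endif
        return poff;
    }

The vsz > 16 path in do_ldrq() would then call pred_lsw16_offset(s, pg) instead of adjusting poff inline.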


r~

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Qemu-devel] [PATCH 12/20] target/arm: Rewrite helper_sve_ld1*_r using pages
  2018-08-09  4:21 ` [Qemu-devel] [PATCH 12/20] target/arm: Rewrite helper_sve_ld1*_r using pages Richard Henderson
  2018-08-10  9:13   ` Alex Bennée
@ 2018-08-23 16:01   ` Peter Maydell
  1 sibling, 0 replies; 51+ messages in thread
From: Peter Maydell @ 2018-08-23 16:01 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, Laurent Desnogues, Alex Bennée

On 9 August 2018 at 05:21, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Uses tlb_vaddr_to_host for correct operation with softmmu.
> Optimize for accesses within a single page or pair of pages.
>
> Perf report comparison for cortex-strings test-strlen
> with aarch64-linux-user:
>
> before:
>    1.59%  qemu-aarch64  qemu-aarch64  [.] do_sve_ld1bb_r
>    0.86%  qemu-aarch64  qemu-aarch64  [.] do_sve_ldff1bb_r
> after:
>    0.09%  qemu-aarch64  qemu-aarch64  [.] helper_sve_ldff1bb_r
>    0.01%  qemu-aarch64  qemu-aarch64  [.] sve_ld1bb_host
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/sve_helper.c | 839 ++++++++++++++++++++++++++++++++--------
>  1 file changed, 675 insertions(+), 164 deletions(-)

Oof, this is a large patch...

> diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
> index e03f954a26..4ca9412e20 100644
> --- a/target/arm/sve_helper.c
> +++ b/target/arm/sve_helper.c
> @@ -1688,6 +1688,45 @@ static void swap_memmove(void *vd, void *vs, size_t n)
>      }
>  }
>
> +/* Similarly for memset of 0.  */
> +static void swap_memzero(void *vd, size_t n)
> +{
> +    uintptr_t d = (uintptr_t)vd;
> +    uintptr_t o = (d | n) & 7;
> +    size_t i;
> +
> +    if (likely(n == 0)) {

Why is "caller asked us to do nothing" the likely case?

> +        return;
> +    }
> +#ifndef HOST_WORDS_BIGENDIAN
> +    o = 0;
> +#endif
> +    switch (o) {
> +    case 0:
> +        memset(vd, 0, n);
> +        break;
> +
> +    case 4:
> +        for (i = 0; i < n; i += 4) {
> +            *(uint32_t *)H1_4(d + i) = 0;
> +        }
> +        break;
> +
> +    case 2:
> +    case 6:
> +        for (i = 0; i < n; i += 2) {
> +            *(uint16_t *)H1_2(d + i) = 0;
> +        }
> +        break;
> +
> +    default:
> +        for (i = 0; i < n; i++) {
> +            *(uint8_t *)H1(d + i) = 0;
> +        }
> +        break;
> +    }
> +}
> +
>  void HELPER(sve_ext)(void *vd, void *vn, void *vm, uint32_t desc)
>  {
>      intptr_t opr_sz = simd_oprsz(desc);
> @@ -3927,32 +3966,438 @@ void HELPER(sve_fcmla_zpzzz_d)(CPUARMState *env, void *vg, uint32_t desc)
>  /*
>   * Load contiguous data, protected by a governing predicate.
>   */
> -#define DO_LD1(NAME, FN, TYPEE, TYPEM, H)                  \
> -static void do_##NAME(CPUARMState *env, void *vd, void *vg, \
> -                      target_ulong addr, intptr_t oprsz,   \
> -                      uintptr_t ra)                        \
> -{                                                          \
> -    intptr_t i = 0;                                        \
> -    do {                                                   \
> -        uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));    \
> -        do {                                               \
> -            TYPEM m = 0;                                   \
> -            if (pg & 1) {                                  \
> -                m = FN(env, addr, ra);                     \
> -            }                                              \
> -            *(TYPEE *)(vd + H(i)) = m;                     \
> -            i += sizeof(TYPEE), pg >>= sizeof(TYPEE);      \
> -            addr += sizeof(TYPEM);                         \
> -        } while (i & 15);                                  \
> -    } while (i < oprsz);                                   \
> -}                                                          \
> -void HELPER(NAME)(CPUARMState *env, void *vg,              \
> -                  target_ulong addr, uint32_t desc)        \
> -{                                                          \
> -    do_##NAME(env, &env->vfp.zregs[simd_data(desc)], vg,   \
> -              addr, simd_oprsz(desc), GETPC());            \
> +
> +/* Load elements into VD, controlled by VG, from HOST+MEM_OFS.
> + * Memory is valid through MEM_MAX.  The register element indices
> + * are inferred from MEM_OFS, as modified by the types for which
> + * the helper is built.  Return the MEM_OFS of the first element
> + * not loaded (which is MEM_MAX if they are all loaded).

"@vd", "@mem_ofs" is the usual style.

> + *
> + * For softmmu, we have fully validated the guest page.  For user-only,
> + * we cannot fully validate without taking the mmap lock, but since we
> + * know the access is within one host page, if any access is valid they
> + * all must be valid.  However, it may be that no access is valid and
> + * they have all been predicated false.
> + */
> +typedef intptr_t sve_ld1_host_fn(void *vd, void *vg, void *host,
> +                                 intptr_t mem_ofs, intptr_t mem_max);
> +
> +/* Load one element into VD+REG_OFF from (ENV,VADDR,RA).
> + * The controlling predicate is known to be true.
> + */
> +typedef void sve_ld1_tlb_fn(CPUARMState *env, void *vd, intptr_t reg_off,
> +                            target_ulong vaddr, int mmu_idx, uintptr_t ra);
> +
> +/*
> + * Generate the above primitives.
> + */
> +
> +#define DO_LD_HOST(NAME, H, TYPEE, TYPEM, HOST) \
> +static intptr_t sve_##NAME##_host(void *vd, void *vg, void *host,           \
> +                                  intptr_t mem_off, const intptr_t mem_max) \
> +{                                                                           \
> +    intptr_t reg_off = mem_off * (sizeof(TYPEE) / sizeof(TYPEM));           \
> +    uint64_t *pg = vg;                                                      \
> +    while (mem_off + sizeof(TYPEM) <= mem_max) {                            \
> +        TYPEM val = 0;                                                      \
> +        if (likely((pg[reg_off >> 6] >> (reg_off & 63)) & 1)) {             \
> +            val = HOST(host + mem_off);                                     \
> +        }                                                                   \
> +        *(TYPEE *)(vd + H(reg_off)) = val;                                  \
> +        mem_off += sizeof(TYPEM), reg_off += sizeof(TYPEE);                 \
> +    }                                                                       \
> +    return mem_off;                                                         \
>  }
>
> +#ifdef CONFIG_SOFTMMU
> +#define DO_LD_TLB(NAME, H, TYPEE, TYPEM, HOST, MOEND, TLB) \
> +static void sve_##NAME##_tlb(CPUARMState *env, void *vd, intptr_t reg_off,  \
> +                             target_ulong addr, int mmu_idx, uintptr_t ra)  \
> +{                                                                           \
> +    TCGMemOpIdx oi = make_memop_idx(ctz32(sizeof(TYPEM)) | MOEND, mmu_idx); \
> +    TYPEM val = TLB(env, addr, oi, ra);                                     \
> +    *(TYPEE *)(vd + H(reg_off)) = val;                                      \
> +}
> +#else
> +#define DO_LD_TLB(NAME, H, TYPEE, TYPEM, HOST, MOEND, TLB)                  \
> +static void sve_##NAME##_tlb(CPUARMState *env, void *vd, intptr_t reg_off,  \
> +                             target_ulong addr, int mmu_idx, uintptr_t ra)  \
> +{                                                                           \
> +    TYPEM val = HOST(g2h(addr));                                            \
> +    *(TYPEE *)(vd + H(reg_off)) = val;                                      \
> +}
> +#endif
> +
> +DO_LD_TLB(ld1bb, H1, uint8_t, uint8_t, ldub_p, 0, helper_ret_ldub_mmu)
> +
> +#define DO_LD_PRIM_1(NAME, H, TE, TM)                   \
> +    DO_LD_HOST(NAME, H, TE, TM, ldub_p)                 \
> +    DO_LD_TLB(NAME, H, TE, TM, ldub_p, 0, helper_ret_ldub_mmu)
> +
> +DO_LD_PRIM_1(ld1bhu, H1_2, uint16_t, uint8_t)
> +DO_LD_PRIM_1(ld1bhs, H1_2, uint16_t,  int8_t)
> +DO_LD_PRIM_1(ld1bsu, H1_4, uint32_t, uint8_t)
> +DO_LD_PRIM_1(ld1bss, H1_4, uint32_t,  int8_t)
> +DO_LD_PRIM_1(ld1bdu,     , uint64_t, uint8_t)
> +DO_LD_PRIM_1(ld1bds,     , uint64_t,  int8_t)
> +
> +#define DO_LD_PRIM_2(NAME, end, MOEND, H, TE, TM, PH, PT)  \
> +    DO_LD_HOST(NAME##_##end, H, TE, TM, PH##_##end##_p)    \
> +    DO_LD_TLB(NAME##_##end, H, TE, TM, PH##_##end##_p,     \
> +              MOEND, helper_##end##_##PT##_mmu)
> +
> +DO_LD_PRIM_2(ld1hh,  le, MO_LE, H1_2, uint16_t, uint16_t, lduw, lduw)
> +DO_LD_PRIM_2(ld1hsu, le, MO_LE, H1_4, uint32_t, uint16_t, lduw, lduw)
> +DO_LD_PRIM_2(ld1hss, le, MO_LE, H1_4, uint32_t,  int16_t, lduw, lduw)
> +DO_LD_PRIM_2(ld1hdu, le, MO_LE,     , uint64_t, uint16_t, lduw, lduw)
> +DO_LD_PRIM_2(ld1hds, le, MO_LE,     , uint64_t,  int16_t, lduw, lduw)
> +
> +DO_LD_PRIM_2(ld1ss,  le, MO_LE, H1_4, uint32_t, uint32_t, ldl, ldul)
> +DO_LD_PRIM_2(ld1sdu, le, MO_LE,     , uint64_t, uint32_t, ldl, ldul)
> +DO_LD_PRIM_2(ld1sds, le, MO_LE,     , uint64_t,  int32_t, ldl, ldul)

It's a shame that we have two different conventions here
that mean one part of this is using 'ldl' and the other 'ldul'...

> +
> +DO_LD_PRIM_2(ld1dd,  le, MO_LE,     , uint64_t, uint64_t, ldq, ldq)
> +
> +DO_LD_PRIM_2(ld1hh,  be, MO_BE, H1_2, uint16_t, uint16_t, lduw, lduw)
> +DO_LD_PRIM_2(ld1hsu, be, MO_BE, H1_4, uint32_t, uint16_t, lduw, lduw)
> +DO_LD_PRIM_2(ld1hss, be, MO_BE, H1_4, uint32_t,  int16_t, lduw, lduw)
> +DO_LD_PRIM_2(ld1hdu, be, MO_BE,     , uint64_t, uint16_t, lduw, lduw)
> +DO_LD_PRIM_2(ld1hds, be, MO_BE,     , uint64_t,  int16_t, lduw, lduw)
> +
> +DO_LD_PRIM_2(ld1ss,  be, MO_BE, H1_4, uint32_t, uint32_t, ldl, ldul)
> +DO_LD_PRIM_2(ld1sdu, be, MO_BE,     , uint64_t, uint32_t, ldl, ldul)
> +DO_LD_PRIM_2(ld1sds, be, MO_BE,     , uint64_t,  int32_t, ldl, ldul)
> +
> +DO_LD_PRIM_2(ld1dd,  be, MO_BE,     , uint64_t, uint64_t, ldq, ldq)
> +
> +#undef DO_LD_TLB
> +#undef DO_LD_HOST
> +#undef DO_LD_PRIM_1
> +#undef DO_LD_PRIM_2
> +
> +/*
> + * Special case contiguous loads of bytes to accellerate strings.

"accelerate"

> + *
> + * The assumption is that the governing predicate will be mostly true.
> + * When it is not all true, it has been set by whilelo and so has a
> + * block of true elements followed by a block of false elements.
> + * Thus anything we can do to handle as many bytes as possible in one
> + * step will pay dividends.
> + *
> + * Because of how vector registers are represented in CPUARMState,
> + * each block of 8 can be read with a little-endian load to be stored
> + * into the vector register in host-endian order.
> + *
> + * TODO: For LE host and LE guest (by far the most common combination),
> + * the only difference for other non-extending loads is the controlling
> + * predicate.  Even for other combinations, it might be fastest to use
> + * this primitive to block load all of the data and then reorder the
> + * bytes afterward.
> + */
> +
> +/* For user-only, conditionally load and mask from HOST, returning 0
> + * if the predicate is false.  This is required because, as described
> + * above, we have not fully validated the page, and faults are not
> + * permitted when the predicate is false.
> + * For softmmu, we never arrive here with invalid host memory; just mask.
> + */
> +static inline uint64_t ldq_le_pred_b(uint8_t pg, void *host)
> +{
> +#ifdef CONFIG_USER_ONLY
> +    if (pg == 0) {
> +        return 0;
> +    }
> +#endif
> +    return ldq_le_p(host) & expand_pred_b(pg);
> +}
> +
> +static inline uint8_t ldub_pred(uint8_t pg, void *host)
> +{
> +#ifdef CONFIG_USER_ONLY
> +    return pg & 1 ? ldub_p(host) : 0;
> +#else
> +    return ldub_p(host) & -(pg & 1);
> +#endif
> +}
> +
> +static intptr_t sve_ld1bb_host(void *vd, void *vg, void *host,
> +                               intptr_t off, const intptr_t max)
> +{
> +    uint64_t *d = vd;
> +    uint8_t *g = vg;
> +
> +    /* Assuming OFF and MAX may be misaligned, but also the most common
> +     * case is an entire vector register: OFF == 0, MAX % 16 == 0.
> +     */
> +    if (likely(off + 8 <= max)) {
> +        const intptr_t max_div_8 = max >> 3;
> +        intptr_t off_div_8 = off >> 3;
> +        uint64_t data;
> +
> +        if (unlikely(off & 63)) {
> +            /* Align for a loop-of-8.  We know from the range check
> +             * above that we have enough remaining to load 8 bytes.
> +             */
> +            if (unlikely(off & 7)) {
> +                int off_7 = off & 7;
> +                uint8_t pg = g[H1(off_div_8)] >> off_7;
> +
> +                off_7 *= 8;
> +                data = ldq_le_pred_b(pg, host + off);
> +                data = deposit64(d[off_div_8], off_7, 64 - off_7, data);
> +                d[off_div_8] = data;
> +
> +                off_div_8 += 1;
> +            }
> +
> +            /* If there are not sufficient bytes to align for 64
> +             * and also execute that loop at least once, skip to tail.
> +             */
> +            if (ROUND_UP(off_div_8, 8) + 8 > max_div_8) {
> +                goto skip_64;
> +            }
> +
> +            /* Align for the loop-of-64.  */
> +            if (unlikely(off_div_8 & 7)) {
> +                do {
> +                    uint8_t pg = g[off_div_8];
> +                    data = ldq_le_pred_b(pg, host + off_div_8 * 8);
> +                    d[off_div_8] = data;
> +                } while (++off_div_8 & 7);
> +            }
> +        }
> +
> +        /* While we have blocks of 64 remaining, we can perform tests
> +         * against large blocks of predicates at once.
> +         */
> +        for (; off_div_8 + 8 <= max_div_8; off_div_8 += 8) {
> +            uint64_t pg = *(uint64_t *)(g + off_div_8);
> +            if (likely(pg == -1ULL)) {
> +#ifndef HOST_WORDS_BIGENDIAN
> +                memcpy(d + off_div_8, host + off_div_8 * 8, 64);
> +#else
> +                intptr_t j;
> +                for (j = 0; j < 8; j++) {
> +                    data = ldq_le_p(host + (off_div_8 + j) * 8);
> +                    d[off_div_8 + j] = data;
> +                }
> +#endif
> +            } else if (pg == 0) {
> +                memset(d + off_div_8, 0, 64);
> +            } else {
> +                intptr_t j;
> +                for (j = 0; j < 8; j++) {
> +                    data = ldq_le_pred_b(pg >> (j * 8),
> +                                         host + (off_div_8 + j) * 8);
> +                    d[off_div_8 + j] = data;
> +                }
> +            }
> +        }
> +
> + skip_64:
> +        /* Final tail or a copy smaller than 64 bytes.  */
> +        for (; off_div_8 < max_div_8; off_div_8++) {
> +            uint8_t pg = g[H1(off_div_8)];
> +            data = ldq_le_pred_b(pg, host + off_div_8 * 8);
> +            d[off_div_8] = data;
> +        }
> +
> +        /* Restore using OFF.  */
> +        off = off_div_8 * 8;
> +    }
> +
> +    /* Final tail or a really small copy.  */
> +    if (unlikely(off < max)) {
> +        do {
> +            uint8_t pg = g[H1(off >> 3)] >> (off & 7);
> +            ((uint8_t *)vd)[H1(off)] = ldub_pred(pg, host + off);
> +        } while (++off < max);
> +    }
> +
> +    return max;
> +}

This is an awful lot of cleverness for optimisation purposes.
Can't we start with "simple but works" and add the optimisation
later?
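
(For concreteness, a "simple but works" version would be little more than
the final tail loop above applied to the whole range -- sketch only, name
made up, reusing the ldub_pred helper from this patch:

static intptr_t sve_ld1bb_host_simple(void *vd, void *vg, void *host,
                                      intptr_t off, const intptr_t max)
{
    uint8_t *d = vd, *g = vg;

    while (off < max) {
        uint8_t pg = g[H1(off >> 3)] >> (off & 7);
        d[H1(off)] = ldub_pred(pg, host + off);
        off++;
    }
    return max;
}

with the loop-of-64 cleverness layered on top in a later patch.)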

> +
> +/* Skip through a sequence of inactive elements in the guarding predicate VG,
> + * beginning at REG_OFF bounded by REG_MAX.  Return the offset of the active
> + * element >= REG_OFF, or REG_MAX if there were no active elements at all.
> + */
> +static intptr_t find_next_active(uint64_t *vg, intptr_t reg_off,
> +                                 intptr_t reg_max, int esz)
> +{
> +    uint64_t pg_mask = pred_esz_masks[esz];
> +    uint64_t pg = (vg[reg_off >> 6] & pg_mask) >> (reg_off & 63);
> +
> +    /* In normal usage, the first element is active.  */
> +    if (likely(pg & 1)) {
> +        return reg_off;
> +    }
> +
> +    if (pg == 0) {
> +        reg_off &= -64;
> +        do {
> +            reg_off += 64;
> +            if (unlikely(reg_off >= reg_max)) {
> +                /* The entire predicate was false.  */
> +                return reg_max;
> +            }
> +            pg = vg[reg_off >> 6] & pg_mask;
> +        } while (pg == 0);
> +    }
> +    reg_off += ctz64(pg);
> +
> +    /* We should never see an out of range predicate bit set.  */
> +    tcg_debug_assert(reg_off < reg_max);
> +    return reg_off;
> +}
> +
> +/* Return the maximum offset <= MEM_MAX which is still within the page
> + * referenced by BASE+MEM_OFF.
> + */
> +static intptr_t max_for_page(target_ulong base, intptr_t mem_off,
> +                             intptr_t mem_max)
> +{
> +    target_ulong addr = base + mem_off;
> +    intptr_t split = -(intptr_t)(addr | TARGET_PAGE_MASK);
> +    return MIN(split, mem_max - mem_off) + mem_off;
> +}
> +
> +static inline void set_helper_retaddr(uintptr_t ra)
> +{
> +#ifdef CONFIG_USER_ONLY
> +    helper_retaddr = ra;
> +#endif
> +}
> +
> +static inline bool test_host_page(void *host)
> +{
> +#ifdef CONFIG_USER_ONLY
> +    return true;
> +#else
> +    return likely(host != NULL);
> +#endif
> +}
> +
> +/*
> + * Common helper for all contiguous one-register predicated loads.
> + */
> +static void sve_ld1_r(CPUARMState *env, void *vg, const target_ulong addr,
> +                      uint32_t desc, const uintptr_t retaddr,
> +                      const int esz, const int msz,
> +                      sve_ld1_host_fn *host_fn,
> +                      sve_ld1_tlb_fn *tlb_fn)
> +{
> +    void *vd = &env->vfp.zregs[simd_data(desc)];
> +    const int diffsz = esz - msz;
> +    const intptr_t reg_max = simd_oprsz(desc);
> +    const intptr_t mem_max = reg_max >> diffsz;
> +    const int mmu_idx = cpu_mmu_index(env, false);
> +    ARMVectorReg scratch;
> +    void *host, *result;
> +    intptr_t split;
> +
> +    set_helper_retaddr(retaddr);
> +
> +    host = tlb_vaddr_to_host(env, addr, MMU_DATA_LOAD, mmu_idx);
> +    if (test_host_page(host)) {
> +        split = max_for_page(addr, 0, mem_max);
> +        if (likely(split == mem_max)) {
> +            /* The load is entirely within a valid page.  For softmmu,
> +             * no faults.  For user-only, if the first byte does not
> +             * fault then none of them will fault, so Vd will never be
> +             * partially modified.
> +             */
> +            host_fn(vd, vg, host, 0, mem_max);
> +            set_helper_retaddr(0);
> +            return;
> +        }
> +    }
> +
> +    /* Perform the predicated read into a temporary, thus ensuring
> +     * if the load of the last element faults, Vd is not modified.
> +     */
> +    result = &scratch;
> +#ifdef CONFIG_USER_ONLY
> +    host_fn(vd, vg, host, 0, mem_max);
> +#else
> +    memset(result, 0, reg_max);
> +    for (intptr_t reg_off = find_next_active(vg, 0, reg_max, esz);
> +         reg_off < reg_max;
> +         reg_off = find_next_active(vg, reg_off, reg_max, esz)) {
> +        intptr_t mem_off = reg_off >> diffsz;
> +
> +        split = max_for_page(addr, mem_off, mem_max);
> +        if (msz == 0 || split - mem_off >= (1 << msz)) {
> +            /* At least one whole element on this page.  */
> +            host = tlb_vaddr_to_host(env, addr + mem_off,
> +                                     MMU_DATA_LOAD, mmu_idx);
> +            if (host) {
> +                mem_off = host_fn(result, vg, host - mem_off, mem_off, split);
> +                reg_off = mem_off << diffsz;
> +                continue;
> +            }
> +        }
> +
> +        /* Perform one normal read.  This may fault, longjmping out to the
> +         * main loop in order to raise an exception.  It may succeed, and
> +         * as a side-effect load the TLB entry for the next round.  Finally,
> +         * in the extremely unlikely case we're performing this operation
> +         * on I/O memory, it may succeed but not bring in the TLB entry.
> +         * But even then we have still made forward progress.
> +         */
> +        tlb_fn(env, result, reg_off, addr + mem_off, mmu_idx, retaddr);
> +        reg_off += 1 << esz;
> +    }
> +#endif
> +
> +    set_helper_retaddr(0);
> +    memcpy(vd, result, reg_max);
> +}
> +
> +#define DO_LD1_1(NAME, ESZ) \
> +void HELPER(sve_##NAME##_r)(CPUARMState *env, void *vg,        \
> +                            target_ulong addr, uint32_t desc)  \
> +{                                                              \
> +    sve_ld1_r(env, vg, addr, desc, GETPC(), ESZ, 0,            \
> +              sve_##NAME##_host, sve_##NAME##_tlb);            \
> +}
> +
> +/* TODO: Propagate the endian check back to the translator.  */
> +#define DO_LD1_2(NAME, ESZ, MSZ) \
> +void HELPER(sve_##NAME##_r)(CPUARMState *env, void *vg,        \
> +                            target_ulong addr, uint32_t desc)  \
> +{                                                              \
> +    if (arm_cpu_data_is_big_endian(env)) {                     \
> +        sve_ld1_r(env, vg, addr, desc, GETPC(), ESZ, MSZ,      \
> +                  sve_##NAME##_be_host, sve_##NAME##_be_tlb);  \
> +    } else {                                                   \
> +        sve_ld1_r(env, vg, addr, desc, GETPC(), ESZ, MSZ,      \
> +                  sve_##NAME##_le_host, sve_##NAME##_le_tlb);  \
> +    }                                                          \
> +}
> +
> +DO_LD1_1(ld1bb,  0)
> +DO_LD1_1(ld1bhu, 1)
> +DO_LD1_1(ld1bhs, 1)
> +DO_LD1_1(ld1bsu, 2)
> +DO_LD1_1(ld1bss, 2)
> +DO_LD1_1(ld1bdu, 3)
> +DO_LD1_1(ld1bds, 3)
> +
> +DO_LD1_2(ld1hh,  1, 1)
> +DO_LD1_2(ld1hsu, 2, 1)
> +DO_LD1_2(ld1hss, 2, 1)
> +DO_LD1_2(ld1hdu, 3, 1)
> +DO_LD1_2(ld1hds, 3, 1)
> +
> +DO_LD1_2(ld1ss,  2, 2)
> +DO_LD1_2(ld1sdu, 3, 2)
> +DO_LD1_2(ld1sds, 3, 2)
> +
> +DO_LD1_2(ld1dd,  3, 3)
> +
> +#undef DO_LD1_1
> +#undef DO_LD1_2
> +
>  #define DO_LD2(NAME, FN, TYPEE, TYPEM, H)                  \
>  void HELPER(NAME)(CPUARMState *env, void *vg,              \
>                    target_ulong addr, uint32_t desc)        \
> @@ -4037,52 +4482,40 @@ void HELPER(NAME)(CPUARMState *env, void *vg,              \
>      }                                                      \
>  }
>
> -DO_LD1(sve_ld1bhu_r, cpu_ldub_data_ra, uint16_t, uint8_t, H1_2)
> -DO_LD1(sve_ld1bhs_r, cpu_ldsb_data_ra, uint16_t, int8_t, H1_2)
> -DO_LD1(sve_ld1bsu_r, cpu_ldub_data_ra, uint32_t, uint8_t, H1_4)
> -DO_LD1(sve_ld1bss_r, cpu_ldsb_data_ra, uint32_t, int8_t, H1_4)
> -DO_LD1(sve_ld1bdu_r, cpu_ldub_data_ra, uint64_t, uint8_t, )
> -DO_LD1(sve_ld1bds_r, cpu_ldsb_data_ra, uint64_t, int8_t, )
> -
> -DO_LD1(sve_ld1hsu_r, cpu_lduw_data_ra, uint32_t, uint16_t, H1_4)
> -DO_LD1(sve_ld1hss_r, cpu_ldsw_data_ra, uint32_t, int16_t, H1_4)
> -DO_LD1(sve_ld1hdu_r, cpu_lduw_data_ra, uint64_t, uint16_t, )
> -DO_LD1(sve_ld1hds_r, cpu_ldsw_data_ra, uint64_t, int16_t, )
> -
> -DO_LD1(sve_ld1sdu_r, cpu_ldl_data_ra, uint64_t, uint32_t, )
> -DO_LD1(sve_ld1sds_r, cpu_ldl_data_ra, uint64_t, int32_t, )
> -
> -DO_LD1(sve_ld1bb_r, cpu_ldub_data_ra, uint8_t, uint8_t, H1)
>  DO_LD2(sve_ld2bb_r, cpu_ldub_data_ra, uint8_t, uint8_t, H1)
>  DO_LD3(sve_ld3bb_r, cpu_ldub_data_ra, uint8_t, uint8_t, H1)
>  DO_LD4(sve_ld4bb_r, cpu_ldub_data_ra, uint8_t, uint8_t, H1)
>
> -DO_LD1(sve_ld1hh_r, cpu_lduw_data_ra, uint16_t, uint16_t, H1_2)
>  DO_LD2(sve_ld2hh_r, cpu_lduw_data_ra, uint16_t, uint16_t, H1_2)
>  DO_LD3(sve_ld3hh_r, cpu_lduw_data_ra, uint16_t, uint16_t, H1_2)
>  DO_LD4(sve_ld4hh_r, cpu_lduw_data_ra, uint16_t, uint16_t, H1_2)
>
> -DO_LD1(sve_ld1ss_r, cpu_ldl_data_ra, uint32_t, uint32_t, H1_4)
>  DO_LD2(sve_ld2ss_r, cpu_ldl_data_ra, uint32_t, uint32_t, H1_4)
>  DO_LD3(sve_ld3ss_r, cpu_ldl_data_ra, uint32_t, uint32_t, H1_4)
>  DO_LD4(sve_ld4ss_r, cpu_ldl_data_ra, uint32_t, uint32_t, H1_4)
>
> -DO_LD1(sve_ld1dd_r, cpu_ldq_data_ra, uint64_t, uint64_t, )
>  DO_LD2(sve_ld2dd_r, cpu_ldq_data_ra, uint64_t, uint64_t, )
>  DO_LD3(sve_ld3dd_r, cpu_ldq_data_ra, uint64_t, uint64_t, )
>  DO_LD4(sve_ld4dd_r, cpu_ldq_data_ra, uint64_t, uint64_t, )
>
> -#undef DO_LD1
>  #undef DO_LD2
>  #undef DO_LD3
>  #undef DO_LD4
>
>  /*
>   * Load contiguous data, first-fault and no-fault.
> + *
> + * For user-only, one could argue that we should hold the mmap_lock during
> + * the operation so that there is no race between page_check_range and the
> + * load operation.  However, unmapping pages out from under operating thread

missing "an" ?

> + * is extrodinarily unlikely.  This theoretical race condition also affects

"extraordinarily"

> + * linux-user/ in its get_user/put_user macros.
> + *
> + * TODO: Construct some helpers, written in assembly, that interact with
> + * handle_cpu_signal to produce memory ops which can properly report errors
> + * without racing.
>   */
>
> -#ifdef CONFIG_USER_ONLY
> -
>  /* Fault on byte I.  All bits in FFR from I are cleared.  The vector
>   * result from I is CONSTRAINED UNPREDICTABLE; we choose the MERGE
>   * option, which leaves subsequent data unchanged.
> @@ -4092,147 +4525,225 @@ static void record_fault(CPUARMState *env, uintptr_t i, uintptr_t oprsz)
>      uint64_t *ffr = env->vfp.pregs[FFR_PRED_NUM].p;
>
>      if (i & 63) {
> -        ffr[i / 64] &= MAKE_64BIT_MASK(0, i & 63);
> +        ffr[i >> 6] &= MAKE_64BIT_MASK(0, i & 63);
>          i = ROUND_UP(i, 64);
>      }
>      for (; i < oprsz; i += 64) {
> -        ffr[i / 64] = 0;
> +        ffr[i >> 6] = 0;
>      }
>  }

Should be in a different patch?

>
> -/* Hold the mmap lock during the operation so that there is no race
> - * between page_check_range and the load operation.  We expect the
> - * usual case to have no faults at all, so we check the whole range
> - * first and if successful defer to the normal load operation.
> - *
> - * TODO: Change mmap_lock to a rwlock so that multiple readers
> - * can run simultaneously.  This will probably help other uses
> - * within QEMU as well.
> +/*
> + * Common helper for all contiguous first-fault loads.
>   */
> -#define DO_LDFF1(PART, FN, TYPEE, TYPEM, H)                             \
> -static void do_sve_ldff1##PART(CPUARMState *env, void *vd, void *vg,    \
> -                               target_ulong addr, intptr_t oprsz,       \
> -                               bool first, uintptr_t ra)                \
> -{                                                                       \
> -    intptr_t i = 0;                                                     \
> -    do {                                                                \
> -        uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));                 \
> -        do {                                                            \
> -            TYPEM m = 0;                                                \
> -            if (pg & 1) {                                               \
> -                if (!first &&                                           \
> -                    unlikely(page_check_range(addr, sizeof(TYPEM),      \
> -                                              PAGE_READ))) {            \
> -                    record_fault(env, i, oprsz);                        \
> -                    return;                                             \
> -                }                                                       \
> -                m = FN(env, addr, ra);                                  \
> -                first = false;                                          \
> -            }                                                           \
> -            *(TYPEE *)(vd + H(i)) = m;                                  \
> -            i += sizeof(TYPEE), pg >>= sizeof(TYPEE);                   \
> -            addr += sizeof(TYPEM);                                      \
> -        } while (i & 15);                                               \
> -    } while (i < oprsz);                                                \
> -}                                                                       \
> -void HELPER(sve_ldff1##PART)(CPUARMState *env, void *vg,                \
> -                             target_ulong addr, uint32_t desc)          \
> -{                                                                       \
> -    intptr_t oprsz = simd_oprsz(desc);                                  \
> -    unsigned rd = simd_data(desc);                                      \
> -    void *vd = &env->vfp.zregs[rd];                                     \
> -    mmap_lock();                                                        \
> -    if (likely(page_check_range(addr, oprsz, PAGE_READ) == 0)) {        \
> -        do_sve_ld1##PART(env, vd, vg, addr, oprsz, GETPC());            \
> -    } else {                                                            \
> -        do_sve_ldff1##PART(env, vd, vg, addr, oprsz, true, GETPC());    \
> -    }                                                                   \
> -    mmap_unlock();                                                      \
> -}
> +static void sve_ldff1_r(CPUARMState *env, void *vg, const target_ulong addr,
> +                        uint32_t desc, const uintptr_t retaddr,
> +                        const int esz, const int msz,
> +                        sve_ld1_host_fn *host_fn,
> +                        sve_ld1_tlb_fn *tlb_fn)
> +{
> +    void *vd = &env->vfp.zregs[simd_data(desc)];
> +    const int diffsz = esz - msz;
> +    const intptr_t reg_max = simd_oprsz(desc);
> +    const intptr_t mem_max = reg_max >> diffsz;
> +    const int mmu_idx = cpu_mmu_index(env, false);
> +    intptr_t split, reg_off, mem_off;
> +    void *host;
>
> -/* No-fault loads are like first-fault loads without the
> - * first faulting special case.
> - */
> -#define DO_LDNF1(PART)                                                  \
> -void HELPER(sve_ldnf1##PART)(CPUARMState *env, void *vg,                \
> -                             target_ulong addr, uint32_t desc)          \
> -{                                                                       \
> -    intptr_t oprsz = simd_oprsz(desc);                                  \
> -    unsigned rd = simd_data(desc);                                      \
> -    void *vd = &env->vfp.zregs[rd];                                     \
> -    mmap_lock();                                                        \
> -    if (likely(page_check_range(addr, oprsz, PAGE_READ) == 0)) {        \
> -        do_sve_ld1##PART(env, vd, vg, addr, oprsz, GETPC());            \
> -    } else {                                                            \
> -        do_sve_ldff1##PART(env, vd, vg, addr, oprsz, false, GETPC());   \
> -    }                                                                   \
> -    mmap_unlock();                                                      \
> -}
> +    set_helper_retaddr(retaddr);
>
> +    split = max_for_page(addr, 0, mem_max);
> +    if (likely(split == mem_max)) {
> +        /* The entire operation is within one page.  */
> +        host = tlb_vaddr_to_host(env, addr, MMU_DATA_LOAD, mmu_idx);
> +        if (test_host_page(host)) {
> +            mem_off = host_fn(vd, vg, host, 0, mem_max);
> +            tcg_debug_assert(mem_off == mem_max);
> +            set_helper_retaddr(0);
> +            return;
> +        }
> +    }
> +
> +    /* Skip to the first true predicate.  */
> +    reg_off = find_next_active(vg, 0, reg_max, esz);
> +    if (unlikely(reg_off == reg_max)) {
> +        /* The entire predicate was false; no load occurs.  */
> +        set_helper_retaddr(0);
> +        memset(vd, 0, reg_max);
> +        return;
> +    }
> +    mem_off = reg_off >> diffsz;
> +
> +#ifdef CONFIG_USER_ONLY
> +    /* The page(s) containing this first element at ADDR+MEM_OFF must
> +     * be valid.  Considering that this first element may be misaligned
> +     * and cross a page boundary itself, take the rest of the page from
> +     * the last byte of the element.
> +     */
> +    split = max_for_page(addr, mem_off + (1 << msz) - 1, mem_max);
> +    mem_off = host_fn(vd, vg, g2h(addr), mem_off, split);
> +
> +    /* After any fault, zero any leading predicated false elts.  */
> +    swap_memzero(vd, reg_off);
> +    reg_off = mem_off << diffsz;
>  #else
> +    /* Perform one normal read, which will fault or not.
> +     * But it is likely to bring the page into the tlb.
> +     */
> +    tlb_fn(env, vd, reg_off, addr + mem_off, mmu_idx, retaddr);
>
> -/* TODO: System mode is not yet supported.
> - * This would probably use tlb_vaddr_to_host.
> - */
> -#define DO_LDFF1(PART, FN, TYPEE, TYPEM, H)                     \
> -void HELPER(sve_ldff1##PART)(CPUARMState *env, void *vg,        \
> -                  target_ulong addr, uint32_t desc)             \
> -{                                                               \
> -    g_assert_not_reached();                                     \
> -}
> -
> -#define DO_LDNF1(PART)                                          \
> -void HELPER(sve_ldnf1##PART)(CPUARMState *env, void *vg,        \
> -                  target_ulong addr, uint32_t desc)             \
> -{                                                               \
> -    g_assert_not_reached();                                     \
> -}
> +    /* After any fault, zero any leading predicated false elts.  */
> +    swap_memzero(vd, reg_off);
> +    mem_off += 1 << msz;
> +    reg_off += 1 << esz;
>
> +    /* Try again to read the balance of the page.  */
> +    split = max_for_page(addr, mem_off - 1, mem_max);
> +    if (split >= (1 << msz)) {
> +        host = tlb_vaddr_to_host(env, addr + mem_off, MMU_DATA_LOAD, mmu_idx);
> +        if (host) {
> +            mem_off = host_fn(vd, vg, host - mem_off, mem_off, split);
> +            reg_off = mem_off << diffsz;
> +        }
> +    }
>  #endif
>
> -DO_LDFF1(bb_r,  cpu_ldub_data_ra, uint8_t, uint8_t, H1)
> -DO_LDFF1(bhu_r, cpu_ldub_data_ra, uint16_t, uint8_t, H1_2)
> -DO_LDFF1(bhs_r, cpu_ldsb_data_ra, uint16_t, int8_t, H1_2)
> -DO_LDFF1(bsu_r, cpu_ldub_data_ra, uint32_t, uint8_t, H1_4)
> -DO_LDFF1(bss_r, cpu_ldsb_data_ra, uint32_t, int8_t, H1_4)
> -DO_LDFF1(bdu_r, cpu_ldub_data_ra, uint64_t, uint8_t, )
> -DO_LDFF1(bds_r, cpu_ldsb_data_ra, uint64_t, int8_t, )
> +    set_helper_retaddr(0);
> +    record_fault(env, reg_off, reg_max);
> +}
>
> -DO_LDFF1(hh_r,  cpu_lduw_data_ra, uint16_t, uint16_t, H1_2)
> -DO_LDFF1(hsu_r, cpu_lduw_data_ra, uint32_t, uint16_t, H1_4)
> -DO_LDFF1(hss_r, cpu_ldsw_data_ra, uint32_t, int8_t, H1_4)
> -DO_LDFF1(hdu_r, cpu_lduw_data_ra, uint64_t, uint16_t, )
> -DO_LDFF1(hds_r, cpu_ldsw_data_ra, uint64_t, int16_t, )
> +/*
> + * Common helper for all contiguous no-fault loads.
> + */
> +static void sve_ldnf1_r(CPUARMState *env, void *vg, const target_ulong addr,
> +                        uint32_t desc, const int esz, const int msz,
> +                        sve_ld1_host_fn *host_fn)
> +{
> +    void *vd = &env->vfp.zregs[simd_data(desc)];
> +    const int diffsz = esz - msz;
> +    const intptr_t reg_max = simd_oprsz(desc);
> +    const intptr_t mem_max = reg_max >> diffsz;
> +    intptr_t split, reg_off, mem_off;
> +    void *host;
>
> -DO_LDFF1(ss_r,  cpu_ldl_data_ra, uint32_t, uint32_t, H1_4)
> -DO_LDFF1(sdu_r, cpu_ldl_data_ra, uint64_t, uint32_t, )
> -DO_LDFF1(sds_r, cpu_ldl_data_ra, uint64_t, int32_t, )
> +#ifdef CONFIG_USER_ONLY
> +    /* Do not set helper_retaddr as there should be no fault.  */
> +    host = g2h(addr);
> +    if (likely(page_check_range(addr, mem_max, PAGE_READ) == 0)) {
> +        /* The entire operation is valid.  */
> +        host_fn(vd, vg, host, 0, mem_max);
> +        return;
> +    }
> +#else
> +    const int mmu_idx = extract32(desc, SIMD_DATA_SHIFT, 4);
> +    /* Unless we can load the entire vector from the same page,
> +     * we need to search for the first active element.
> +     */
> +    split = max_for_page(addr, 0, mem_max);
> +    if (likely(split == mem_max)) {
> +        host = tlb_vaddr_to_host(env, addr, MMU_DATA_LOAD, mmu_idx);
> +        if (host) {
> +            host_fn(vd, vg, host, 0, mem_max);
> +            return;
> +        }
> +    }
> +#endif
>
> -DO_LDFF1(dd_r,  cpu_ldq_data_ra, uint64_t, uint64_t, )
> +    /* There will be no fault, so we may modify in advance.  */
> +    memset(vd, 0, reg_max);
>
> -#undef DO_LDFF1
> +    /* Skip to the first true predicate.  */
> +    reg_off = find_next_active(vg, 0, reg_max, esz);
> +    if (unlikely(reg_off == reg_max)) {
> +        /* The entire predicate was false; no load occurs.  */
> +        return;
> +    }
> +    mem_off = reg_off >> diffsz;
>
> -DO_LDNF1(bb_r)
> -DO_LDNF1(bhu_r)
> -DO_LDNF1(bhs_r)
> -DO_LDNF1(bsu_r)
> -DO_LDNF1(bss_r)
> -DO_LDNF1(bdu_r)
> -DO_LDNF1(bds_r)
> +#ifdef CONFIG_USER_ONLY
> +    if (page_check_range(addr + mem_off, 1 << msz, PAGE_READ) == 0) {
> +        /* At least one load is valid; take the rest of the page.  */
> +        split = max_for_page(addr, mem_off + (1 << msz) - 1, mem_max);
> +        mem_off = host_fn(vd, vg, host, mem_off, split);
> +        reg_off = mem_off << diffsz;
> +    }
> +#else
> +    /* If the address is not in the TLB, we have no way to bring the
> +     * entry into the TLB without also risking a fault.  Note that
> +     * the corollary is that we never load from an address not in RAM.
> +     * ??? This last may be out of spec.

Yes, it is out of spec, in a slightly weird corner case.
Per the MemNF/MemSingleNF pseudocode, an NF load from
Device memory mustn't actually go externally -- it returns
UNKNOWN data instead. But if you map non-RAM with Normal
memory attributes and do an NF load then it should really
access the bus. (Nobody's going to actually do this in the
real world, obviously.)

To do this fully correctly you would need to do a get_phys_addr()
then an address_space_ld*(), checking for either VA->PA failure or
getting back a non-MEMTX_OK result on the physical access.
You'd also need to make get_phys_addr() tell you the Arm memory
attributes so you could avoid doing the access on Device memory.

There are probably also annoying special cases with interactions
with watchpoints set up via the architectural debug.

We can probably ignore all this for now apart from making
a TODO comment...
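
For the record, the shape of the fully-correct access would be roughly the
following (sketch only: the VA->PA translation via get_phys_addr() is elided,
since that function is currently static to helper.c, and the check on the
Arm memory attributes is omitted too):

    MemTxResult txres;
    MemTxAttrs attrs = MEMTXATTRS_UNSPECIFIED;
    CPUState *cs = CPU(arm_env_get_cpu(env));
    AddressSpace *as = cpu_get_address_space(cs,
                                             cpu_asidx_from_attrs(cs, attrs));
    hwaddr physaddr;   /* would come from the elided get_phys_addr() step */
    uint64_t val;

    val = address_space_ldq(as, physaddr, attrs, &txres);
    if (txres != MEMTX_OK) {
        /* The physical access failed: treat this element as the fault.  */
        record_fault(env, reg_off, reg_max);
        return;
    }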

> +     */
> +    host = tlb_vaddr_to_host(env, addr + mem_off, MMU_DATA_LOAD, mmu_idx);
> +    split = max_for_page(addr, mem_off, mem_max);
> +    if (host && split >= (1 << msz)) {
> +        mem_off = host_fn(vd, vg, host - mem_off, mem_off, split);
> +        reg_off = mem_off << diffsz;
> +    }
> +#endif
>
> -DO_LDNF1(hh_r)
> -DO_LDNF1(hsu_r)
> -DO_LDNF1(hss_r)
> -DO_LDNF1(hdu_r)
> -DO_LDNF1(hds_r)
> +    record_fault(env, reg_off, reg_max);
> +}
>
> -DO_LDNF1(ss_r)
> -DO_LDNF1(sdu_r)
> -DO_LDNF1(sds_r)
> +#define DO_LDFF1_LDNF1_1(PART, ESZ) \
> +void HELPER(sve_ldff1##PART##_r)(CPUARMState *env, void *vg,            \
> +                                 target_ulong addr, uint32_t desc)      \
> +{                                                                       \
> +    sve_ldff1_r(env, vg, addr, desc, GETPC(), ESZ, 0,                   \
> +                sve_ld1##PART##_host, sve_ld1##PART##_tlb);             \
> +}                                                                       \
> +void HELPER(sve_ldnf1##PART##_r)(CPUARMState *env, void *vg,            \
> +                                 target_ulong addr, uint32_t desc)      \
> +{                                                                       \
> +    sve_ldnf1_r(env, vg, addr, desc, ESZ, 0, sve_ld1##PART##_host);     \
> +}
>
> -DO_LDNF1(dd_r)
> +/* TODO: Propagate the endian check back to the translator.  */
> +#define DO_LDFF1_LDNF1_2(PART, ESZ, MSZ) \
> +void HELPER(sve_ldff1##PART##_r)(CPUARMState *env, void *vg,            \
> +                                 target_ulong addr, uint32_t desc)      \
> +{                                                                       \
> +    if (arm_cpu_data_is_big_endian(env)) {                              \
> +        sve_ldff1_r(env, vg, addr, desc, GETPC(), ESZ, MSZ,             \
> +                    sve_ld1##PART##_be_host, sve_ld1##PART##_be_tlb);   \
> +    } else {                                                            \
> +        sve_ldff1_r(env, vg, addr, desc, GETPC(), ESZ, MSZ,             \
> +                    sve_ld1##PART##_le_host, sve_ld1##PART##_le_tlb);   \
> +    }                                                                   \
> +}                                                                       \
> +void HELPER(sve_ldnf1##PART##_r)(CPUARMState *env, void *vg,            \
> +                                 target_ulong addr, uint32_t desc)      \
> +{                                                                       \
> +    if (arm_cpu_data_is_big_endian(env)) {                              \
> +        sve_ldnf1_r(env, vg, addr, desc, ESZ, MSZ,                      \
> +                    sve_ld1##PART##_be_host);                           \
> +    } else {                                                            \
> +        sve_ldnf1_r(env, vg, addr, desc, ESZ, MSZ,                      \
> +                    sve_ld1##PART##_le_host);                           \
> +    }                                                                   \
> +}
>
> -#undef DO_LDNF1
> +DO_LDFF1_LDNF1_1(bb,  0)
> +DO_LDFF1_LDNF1_1(bhu, 1)
> +DO_LDFF1_LDNF1_1(bhs, 1)
> +DO_LDFF1_LDNF1_1(bsu, 2)
> +DO_LDFF1_LDNF1_1(bss, 2)
> +DO_LDFF1_LDNF1_1(bdu, 3)
> +DO_LDFF1_LDNF1_1(bds, 3)
> +
> +DO_LDFF1_LDNF1_2(hh,  1, 1)
> +DO_LDFF1_LDNF1_2(hsu, 2, 1)
> +DO_LDFF1_LDNF1_2(hss, 2, 1)
> +DO_LDFF1_LDNF1_2(hdu, 3, 1)
> +DO_LDFF1_LDNF1_2(hds, 3, 1)
> +
> +DO_LDFF1_LDNF1_2(ss,  2, 2)
> +DO_LDFF1_LDNF1_2(sdu, 3, 2)
> +DO_LDFF1_LDNF1_2(sds, 3, 2)
> +
> +DO_LDFF1_LDNF1_2(dd,  3, 3)
> +
> +#undef DO_LDFF1_LDNF1_1
> +#undef DO_LDFF1_LDNF1_2
>
>  /*
>   * Store contiguous data, protected by a governing predicate.

Generally the code looks ok, but there was so much of it that
I kind of stopped checking the details...

thanks
-- PMM

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Qemu-devel] [PATCH 13/20] target/arm: Rewrite helper_sve_ld[234]*_r
  2018-08-09  4:21 ` [Qemu-devel] [PATCH 13/20] target/arm: Rewrite helper_sve_ld[234]*_r Richard Henderson
@ 2018-08-23 16:04   ` Peter Maydell
  0 siblings, 0 replies; 51+ messages in thread
From: Peter Maydell @ 2018-08-23 16:04 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, Laurent Desnogues, Alex Bennée

On 9 August 2018 at 05:21, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Use the same *_tlb primitives as we use for ld1.  This is not
> a significant change, but does (for linux-user) hoist the set
> of helper_retaddr, and (for softmmu) hoist the computation of
> the current mmu_idx outside the loop.
>
> This does fix the endianness problem for softmmu, and does
> move the main loop out of a macro and into an inlined function.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Qemu-devel] [PATCH 14/20] target/arm: Rewrite helper_sve_st[1234]*_r
  2018-08-09  4:22 ` [Qemu-devel] [PATCH 14/20] target/arm: Rewrite helper_sve_st[1234]*_r Richard Henderson
@ 2018-08-23 16:06   ` Peter Maydell
  0 siblings, 0 replies; 51+ messages in thread
From: Peter Maydell @ 2018-08-23 16:06 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, Laurent Desnogues, Alex Bennée

On 9 August 2018 at 05:22, Richard Henderson
<richard.henderson@linaro.org> wrote:
> This fixes the endianness problem for softmmu, and does
> move the main loop out of a macro and into an inlined function.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Qemu-devel] [PATCH 17/20] target/arm: Rewrite vector gather loads
  2018-08-09  4:22 ` [Qemu-devel] [PATCH 17/20] target/arm: Rewrite vector gather loads Richard Henderson
@ 2018-08-23 16:08   ` Peter Maydell
  0 siblings, 0 replies; 51+ messages in thread
From: Peter Maydell @ 2018-08-23 16:08 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, Laurent Desnogues, Alex Bennée

On 9 August 2018 at 05:22, Richard Henderson
<richard.henderson@linaro.org> wrote:
> This fixes the endianness problem for softmmu, and does
> move the main loop out of a macro and into an inlined function.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/helper-sve.h    |  84 +++++++++----
>  target/arm/sve_helper.c    | 218 +++++++++++++++++++++++----------
>  target/arm/translate-sve.c | 244 +++++++++++++++++++++++++------------
>  3 files changed, 380 insertions(+), 166 deletions(-)
>

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Qemu-devel] [PATCH 18/20] target/arm: Rewrite vector gather stores
  2018-08-09  4:22 ` [Qemu-devel] [PATCH 18/20] target/arm: Rewrite vector gather stores Richard Henderson
@ 2018-08-23 16:09   ` Peter Maydell
  0 siblings, 0 replies; 51+ messages in thread
From: Peter Maydell @ 2018-08-23 16:09 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, Laurent Desnogues, Alex Bennée

On 9 August 2018 at 05:22, Richard Henderson
<richard.henderson@linaro.org> wrote:
> This fixes the endianness problem for softmmu, and does
> move the main loop out of a macro and into an inlined function.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/helper-sve.h    |  52 ++++++++++----
>  target/arm/sve_helper.c    | 139 ++++++++++++++++++++++++-------------
>  target/arm/translate-sve.c |  74 +++++++++++++-------
>  3 files changed, 177 insertions(+), 88 deletions(-)
>

> diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
> index 76d3f021e4..0a4756bff9 100644
> --- a/target/arm/sve_helper.c
> +++ b/target/arm/sve_helper.c
> @@ -5235,61 +5235,100 @@ DO_LDFF1_ZPZ_D(sve_ldffsds_zd, uint64_t, int32_t,  cpu_ldl_data_ra)
>
>  /* Stores with a vector index.  */
>
> -#define DO_ST1_ZPZ_S(NAME, TYPEI, FN)                                   \
> -void HELPER(NAME)(CPUARMState *env, void *vd, void *vg, void *vm,       \
> -                  target_ulong base, uint32_t desc)                     \
> -{                                                                       \
> -    intptr_t i, oprsz = simd_oprsz(desc);                               \
> -    unsigned scale = simd_data(desc);                                   \
> -    uintptr_t ra = GETPC();                                             \
> -    for (i = 0; i < oprsz; ) {                                          \
> -        uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));                 \
> -        do {                                                            \
> -            if (likely(pg & 1)) {                                       \
> -                target_ulong off = *(TYPEI *)(vm + H1_4(i));            \
> -                uint32_t d = *(uint32_t *)(vd + H1_4(i));               \
> -                FN(env, base + (off << scale), d, ra);                  \
> -            }                                                           \
> -            i += sizeof(uint32_t), pg >>= sizeof(uint32_t);             \
> -        } while (i & 15);                                               \
> -    }                                                                   \
> +static void sve_st1_zs(CPUARMState *env, void *vd, void *vg, void *vm,
> +                       target_ulong base, uint32_t desc, uintptr_t ra,
> +                       zreg_off_fn *off_fn, sve_ld1_tlb_fn *tlb_fn)
> +{
> +    const int mmu_idx = cpu_mmu_index(env, false);
> +    intptr_t i, oprsz = simd_oprsz(desc);
> +    unsigned scale = simd_data(desc);
> +
> +    set_helper_retaddr(ra);
> +    for (i = 0; i < oprsz; ) {
> +        uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));
> +        do {
> +            if (pg & 1) {

Is dropping the "likely()" off this conditional intentional?


> +                target_ulong off = off_fn(vm, i);
> +                tlb_fn(env, vd, i, base + (off << scale), mmu_idx, ra);
> +            }
> +            i += 4, pg >>= 4;
> +        } while (i & 15);
> +    }
> +    set_helper_retaddr(0);
>  }
>
Either way,
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Qemu-devel] [PATCH 19/20] target/arm: Rewrite vector gather first-fault loads
  2018-08-09  4:22 ` [Qemu-devel] [PATCH 19/20] target/arm: Rewrite vector gather first-fault loads Richard Henderson
@ 2018-08-23 16:10   ` Peter Maydell
  0 siblings, 0 replies; 51+ messages in thread
From: Peter Maydell @ 2018-08-23 16:10 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, Laurent Desnogues, Alex Bennée

On 9 August 2018 at 05:22, Richard Henderson
<richard.henderson@linaro.org> wrote:
> This implements the feature for softmmu, and moves the
> main loop out of a macro and into a function.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/helper-sve.h    |  84 ++++++++---
>  target/arm/sve_helper.c    | 290 +++++++++++++++++++++++++++----------
>  target/arm/translate-sve.c |  84 +++++------
>  3 files changed, 321 insertions(+), 137 deletions(-)

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Qemu-devel] [PATCH 20/20] target/arm: Pass TCGMemOpIdx to sve memory helpers
  2018-08-09  4:22 ` [Qemu-devel] [PATCH 20/20] target/arm: Pass TCGMemOpIdx to sve memory helpers Richard Henderson
@ 2018-08-23 16:23   ` Peter Maydell
  0 siblings, 0 replies; 51+ messages in thread
From: Peter Maydell @ 2018-08-23 16:23 UTC (permalink / raw)
  To: Richard Henderson; +Cc: QEMU Developers, Laurent Desnogues, Alex Bennée

On 9 August 2018 at 05:22, Richard Henderson
<richard.henderson@linaro.org> wrote:
> There is quite a lot of code required to compute cpu_mem_index,
> or even put together the full TCGMemOpIdx.  This can easily be
> done at translation time.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/internals.h     |   5 ++
>  target/arm/sve_helper.c    | 138 +++++++++++++++++++------------------
>  target/arm/translate-sve.c |  67 +++++++++++-------
>  3 files changed, 121 insertions(+), 89 deletions(-)
>

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [Qemu-devel] [PATCH 09/20] target/arm: Handle SVE vector length changes in system mode
  2018-08-17 16:22   ` Peter Maydell
@ 2018-08-25 19:41     ` Richard Henderson
  0 siblings, 0 replies; 51+ messages in thread
From: Richard Henderson @ 2018-08-25 19:41 UTC (permalink / raw)
  To: Peter Maydell; +Cc: QEMU Developers, Laurent Desnogues, Alex Bennée

On 08/17/2018 09:22 AM, Peter Maydell wrote:
>> +    /*
>> +     * When FP is enabled, but SVE is disabled, the effective len is 0.
>> +     * ??? How should sve_exception_el interact with AArch32 state?
>> +     * That isn't included in the CheckSVEEnabled pseudocode, so is the
>> +     * host kernel required to explicitly disable SVE for an EL using aa32?
>> +     */
> I'm not clear what you're asking here. If the EL is AArch32
> then why does it make a difference if SVE is enabled or disabled?
> You can't get at it...
> 
>> +    old_len = (sve_exception_el(env, old_el)
>> +               ? 0 : sve_zcr_len_for_el(env, old_el));
>> +    new_len = (sve_exception_el(env, new_el)
>> +               ? 0 : sve_zcr_len_for_el(env, new_el));

Yes the registers are inaccessible.  But...

It may be that we must produce old_len/new_len == 0 if old_el/new_el is in
32-bit mode, so that the high parts of the SVE registers are zeroed.

However, it may be UNDEFINED what happens if the OS switches to an EL in 32-bit
mode while CPACR.ZEN == 3.  And if so, then there may be no point in adding an
additional test here.
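
Concretely, the additional test I have in mind would look something like this
(sketch only, untested; note that arm_el_is_aa64() is not valid for EL0, so
the EL0 case would need to be handled separately, e.g. via is_a64()):

    old_len = (sve_exception_el(env, old_el) || !arm_el_is_aa64(env, old_el)
               ? 0 : sve_zcr_len_for_el(env, old_el));
    new_len = (sve_exception_el(env, new_el) || !arm_el_is_aa64(env, new_el)
               ? 0 : sve_zcr_len_for_el(env, new_el));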

So far I have re-worded the comment as:

     * ??? Do we need a conditional for old_el/new_el in aa32 state?
     * That isn't included in the CheckSVEEnabled pseudocode, so is the
     * host kernel required to explicitly disable SVE for an EL using aa32?

Thoughts on the underlying issue?


r~

^ permalink raw reply	[flat|nested] 51+ messages in thread

end of thread, other threads:[~2018-08-25 19:41 UTC | newest]

Thread overview: 51+ messages
2018-08-09  4:21 [Qemu-devel] [PATCH 00/20] target/arm: sve system mode patches Richard Henderson
2018-08-09  4:21 ` [Qemu-devel] [PATCH 01/20] target/arm: Set ISAR bits for -cpu max Richard Henderson
2018-08-09  4:21 ` [Qemu-devel] [PATCH 02/20] target/arm: Set ID_AA64PFR0 bits for SVE " Richard Henderson
2018-08-09  4:21 ` [Qemu-devel] [PATCH 03/20] target/arm: Define ID_AA64ZFR0_EL1 Richard Henderson
2018-08-17 15:50   ` Peter Maydell
2018-08-09  4:21 ` [Qemu-devel] [PATCH 04/20] target/arm: Adjust sve_exception_el Richard Henderson
2018-08-17 15:57   ` Peter Maydell
2018-08-09  4:21 ` [Qemu-devel] [PATCH 05/20] target/arm: Fix arm_cpu_data_is_big_endian for aa64 user-only Richard Henderson
2018-08-17 16:02   ` Peter Maydell
2018-08-17 16:47     ` Richard Henderson
2018-08-09  4:21 ` [Qemu-devel] [PATCH 06/20] target/arm: Fix arm_current_el for user-only Richard Henderson
2018-08-17 16:03   ` Peter Maydell
2018-08-17 16:51     ` Richard Henderson
2018-08-09  4:21 ` [Qemu-devel] [PATCH 07/20] target/arm: Fix is_a64 " Richard Henderson
2018-08-17 16:03   ` Peter Maydell
2018-08-17 16:10     ` Laurent Desnogues
2018-08-17 16:23       ` Peter Maydell
2018-08-09  4:21 ` [Qemu-devel] [PATCH 08/20] target/arm: Pass in current_el to fp and sve_exception_el Richard Henderson
2018-08-09 18:01   ` Alex Bennée
2018-08-09 18:50     ` Richard Henderson
2018-08-09  4:21 ` [Qemu-devel] [PATCH 09/20] target/arm: Handle SVE vector length changes in system mode Richard Henderson
2018-08-17 16:22   ` Peter Maydell
2018-08-25 19:41     ` Richard Henderson
2018-08-09  4:21 ` [Qemu-devel] [PATCH 10/20] target/arm: Adjust aarch64_cpu_dump_state for system mode SVE Richard Henderson
2018-08-17 16:35   ` Peter Maydell
2018-08-09  4:21 ` [Qemu-devel] [PATCH 11/20] target/arm: Clear unused predicate bits for LD1RQ Richard Henderson
2018-08-23 15:21   ` Peter Maydell
2018-08-23 15:37     ` Richard Henderson
2018-08-09  4:21 ` [Qemu-devel] [PATCH 12/20] target/arm: Rewrite helper_sve_ld1*_r using pages Richard Henderson
2018-08-10  9:13   ` Alex Bennée
2018-08-10 19:15     ` Richard Henderson
2018-08-23 16:01   ` Peter Maydell
2018-08-09  4:21 ` [Qemu-devel] [PATCH 13/20] target/arm: Rewrite helper_sve_ld[234]*_r Richard Henderson
2018-08-23 16:04   ` Peter Maydell
2018-08-09  4:22 ` [Qemu-devel] [PATCH 14/20] target/arm: Rewrite helper_sve_st[1234]*_r Richard Henderson
2018-08-23 16:06   ` Peter Maydell
2018-08-09  4:22 ` [Qemu-devel] [PATCH 15/20] target/arm: Split contiguous loads for endianness Richard Henderson
2018-08-11  5:40   ` Philippe Mathieu-Daudé
2018-08-09  4:22 ` [Qemu-devel] [PATCH 16/20] target/arm: Split contiguous stores " Richard Henderson
2018-08-11  5:41   ` Philippe Mathieu-Daudé
2018-08-09  4:22 ` [Qemu-devel] [PATCH 17/20] target/arm: Rewrite vector gather loads Richard Henderson
2018-08-23 16:08   ` Peter Maydell
2018-08-09  4:22 ` [Qemu-devel] [PATCH 18/20] target/arm: Rewrite vector gather stores Richard Henderson
2018-08-23 16:09   ` Peter Maydell
2018-08-09  4:22 ` [Qemu-devel] [PATCH 19/20] target/arm: Rewrite vector gather first-fault loads Richard Henderson
2018-08-23 16:10   ` Peter Maydell
2018-08-09  4:22 ` [Qemu-devel] [PATCH 20/20] target/arm: Pass TCGMemOpIdx to sve memory helpers Richard Henderson
2018-08-23 16:23   ` Peter Maydell
2018-08-09  5:48 ` [Qemu-devel] [PATCH 00/20] target/arm: sve system mode patches Laurent Desnogues
2018-08-18  9:15 ` no-reply
2018-08-18 10:01 ` no-reply
