* [Qemu-devel] [PATCH 00/20] target/arm: sve system mode patches
@ 2018-08-09 4:21 Richard Henderson
2018-08-09 4:21 ` [Qemu-devel] [PATCH 01/20] target/arm: Set ISAR bits for -cpu max Richard Henderson
` (22 more replies)
0 siblings, 23 replies; 51+ messages in thread
From: Richard Henderson @ 2018-08-09 4:21 UTC (permalink / raw)
To: qemu-devel; +Cc: laurent.desnogues, peter.maydell, alex.bennee
This is my current set of patches for running SVE in system mode.
The first half deals with the system registers that affect SVE.
I recall that Peter has said he'd like the first patch to be
done a different way, but we haven't had a chance to talk about
what form it should take. I've left it as-is since it does what
I need for now.
The second half re-implements the SVE memory operations.
The FF and NF loads had been stubbed out. Getting those to work
requires some infrastructure that can be reused to speed up normal
loads -- one guest-to-host tlb lookup can be reused for the rest
of the page.
r~
Based-on: <20180809034033.10579-1-richard.henderson@linaro.org>
Richard Henderson (20):
target/arm: Set ISAR bits for -cpu max
target/arm: Set ID_AA64PFR0 bits for SVE for -cpu max
target/arm: Define ID_AA64ZFR0_EL1
target/arm: Adjust sve_exception_el
target/arm: Fix arm_cpu_data_is_big_endian for aa64 user-only
target/arm: Fix arm_current_el for user-only
target/arm: Fix is_a64 for user-only
target/arm: Pass in current_el to fp and sve_exception_el
target/arm: Handle SVE vector length changes in system mode
target/arm: Adjust aarch64_cpu_dump_state for system mode SVE
target/arm: Clear unused predicate bits for LD1RQ
target/arm: Rewrite helper_sve_ld1*_r using pages
target/arm: Rewrite helper_sve_ld[234]*_r
target/arm: Rewrite helper_sve_st[1234]*_r
target/arm: Split contiguous loads for endianness
target/arm: Split contiguous stores for endianness
target/arm: Rewrite vector gather loads
target/arm: Rewrite vector gather stores
target/arm: Rewrite vector gather first-fault loads
target/arm: Pass TCGMemOpIdx to sve memory helpers
target/arm/cpu.h | 47 +-
target/arm/helper-sve.h | 385 +++++--
target/arm/internals.h | 5 +
target/arm/cpu.c | 24 +-
target/arm/cpu64.c | 93 +-
target/arm/helper.c | 237 +++--
target/arm/op_helper.c | 1 +
target/arm/sve_helper.c | 2062 +++++++++++++++++++++++++-----------
target/arm/translate-a64.c | 8 +-
target/arm/translate-sve.c | 670 ++++++++----
10 files changed, 2453 insertions(+), 1079 deletions(-)
--
2.17.1
* [Qemu-devel] [PATCH 01/20] target/arm: Set ISAR bits for -cpu max
2018-08-09 4:21 [Qemu-devel] [PATCH 00/20] target/arm: sve system mode patches Richard Henderson
@ 2018-08-09 4:21 ` Richard Henderson
2018-08-09 4:21 ` [Qemu-devel] [PATCH 02/20] target/arm: Set ID_AA64PFR0 bits for SVE " Richard Henderson
` (21 subsequent siblings)
22 siblings, 0 replies; 51+ messages in thread
From: Richard Henderson @ 2018-08-09 4:21 UTC (permalink / raw)
To: qemu-devel; +Cc: laurent.desnogues, peter.maydell, alex.bennee
For the supported extensions, fill in the appropriate bits in
ID_ISAR5, ID_ISAR6, ID_AA64ISAR0, ID_AA64ISAR1.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/cpu.c | 24 +++++++++++++++++-------
target/arm/cpu64.c | 36 ++++++++++++++++++++++++++++--------
2 files changed, 45 insertions(+), 15 deletions(-)
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index b25898ed4c..71daa39e86 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -1802,19 +1802,29 @@ static void arm_max_initfn(Object *obj)
kvm_arm_set_cpu_features_from_host(cpu);
} else {
cortex_a15_initfn(obj);
+
+ set_feature(&cpu->env, ARM_FEATURE_V8_AES);
+ cpu->id_isar5 = deposit32(cpu->id_isar5, 4, 4, 2);
+ set_feature(&cpu->env, ARM_FEATURE_V8_SHA1);
+ cpu->id_isar5 = deposit32(cpu->id_isar5, 8, 4, 1);
+ set_feature(&cpu->env, ARM_FEATURE_V8_SHA256);
+ cpu->id_isar5 = deposit32(cpu->id_isar5, 12, 4, 1);
+ set_feature(&cpu->env, ARM_FEATURE_CRC);
+ cpu->id_isar5 = deposit32(cpu->id_isar5, 16, 4, 1);
+ set_feature(&cpu->env, ARM_FEATURE_V8_RDM);
+ cpu->id_isar5 = deposit32(cpu->id_isar5, 24, 4, 1);
+ set_feature(&cpu->env, ARM_FEATURE_V8_FCMA);
+ cpu->id_isar5 = deposit32(cpu->id_isar5, 28, 4, 1);
+
+ set_feature(&cpu->env, ARM_FEATURE_V8_DOTPROD);
+ cpu->id_isar6 = deposit32(cpu->id_isar6, 4, 4, 1);
+
#ifdef CONFIG_USER_ONLY
/* We don't set these in system emulation mode for the moment,
* since we don't correctly set the ID registers to advertise them,
*/
set_feature(&cpu->env, ARM_FEATURE_V8);
- set_feature(&cpu->env, ARM_FEATURE_V8_AES);
- set_feature(&cpu->env, ARM_FEATURE_V8_SHA1);
- set_feature(&cpu->env, ARM_FEATURE_V8_SHA256);
set_feature(&cpu->env, ARM_FEATURE_V8_PMULL);
- set_feature(&cpu->env, ARM_FEATURE_CRC);
- set_feature(&cpu->env, ARM_FEATURE_V8_RDM);
- set_feature(&cpu->env, ARM_FEATURE_V8_DOTPROD);
- set_feature(&cpu->env, ARM_FEATURE_V8_FCMA);
#endif
}
}
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index 800bff780e..4d629bb99b 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -254,6 +254,34 @@ static void aarch64_max_initfn(Object *obj)
kvm_arm_set_cpu_features_from_host(cpu);
} else {
aarch64_a57_initfn(obj);
+
+ set_feature(&cpu->env, ARM_FEATURE_V8_SHA512);
+ cpu->id_aa64isar0 = deposit64(cpu->id_aa64isar0, 12, 4, 2);
+
+ set_feature(&cpu->env, ARM_FEATURE_V8_ATOMICS);
+ cpu->id_aa64isar0 = deposit64(cpu->id_aa64isar0, 20, 4, 2);
+
+ set_feature(&cpu->env, ARM_FEATURE_V8_RDM);
+ cpu->id_aa64isar0 = deposit64(cpu->id_aa64isar0, 28, 4, 1);
+ cpu->id_isar5 = deposit32(cpu->id_isar5, 24, 4, 1);
+
+ set_feature(&cpu->env, ARM_FEATURE_V8_SHA3);
+ cpu->id_aa64isar0 = deposit64(cpu->id_aa64isar0, 32, 4, 1);
+
+ set_feature(&cpu->env, ARM_FEATURE_V8_SM3);
+ cpu->id_aa64isar0 = deposit64(cpu->id_aa64isar0, 36, 4, 1);
+
+ set_feature(&cpu->env, ARM_FEATURE_V8_SM4);
+ cpu->id_aa64isar0 = deposit64(cpu->id_aa64isar0, 40, 4, 1);
+
+ set_feature(&cpu->env, ARM_FEATURE_V8_DOTPROD);
+ cpu->id_aa64isar0 = deposit64(cpu->id_aa64isar0, 44, 4, 1);
+ cpu->id_isar6 = deposit32(cpu->id_isar6, 4, 4, 1);
+
+ set_feature(&cpu->env, ARM_FEATURE_V8_FCMA);
+ cpu->id_aa64isar1 = deposit64(cpu->id_aa64isar1, 16, 4, 1);
+ cpu->id_isar5 = deposit32(cpu->id_isar5, 28, 4, 1);
+
#ifdef CONFIG_USER_ONLY
/* We don't set these in system emulation mode for the moment,
* since we don't correctly set the ID registers to advertise them,
@@ -261,15 +289,7 @@ static void aarch64_max_initfn(Object *obj)
* whereas the architecture requires them to be present in both if
* present in either.
*/
- set_feature(&cpu->env, ARM_FEATURE_V8_SHA512);
- set_feature(&cpu->env, ARM_FEATURE_V8_SHA3);
- set_feature(&cpu->env, ARM_FEATURE_V8_SM3);
- set_feature(&cpu->env, ARM_FEATURE_V8_SM4);
- set_feature(&cpu->env, ARM_FEATURE_V8_ATOMICS);
- set_feature(&cpu->env, ARM_FEATURE_V8_RDM);
- set_feature(&cpu->env, ARM_FEATURE_V8_DOTPROD);
set_feature(&cpu->env, ARM_FEATURE_V8_FP16);
- set_feature(&cpu->env, ARM_FEATURE_V8_FCMA);
set_feature(&cpu->env, ARM_FEATURE_SVE);
/* For usermode -cpu max we can use a larger and more efficient DCZ
* blocksize since we don't have to follow what the hardware does.
--
2.17.1
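As a reading aid for the deposit32/deposit64 calls in the patch above, here is a minimal sketch of what such a field deposit does. It is a stand-in written for illustration, not QEMU's own code; the name sketch_deposit32 is hypothetical, and the argument order (value, start, length, fieldval) is assumed to match the calls shown in the diff.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a 32-bit field deposit: replace the
 * 'length'-bit field of 'value' that begins at bit 'start' with
 * 'fieldval', leaving every other bit unchanged.
 */
static inline uint32_t sketch_deposit32(uint32_t value, int start,
                                        int length, uint32_t fieldval)
{
    uint32_t mask;

    /* The field must lie entirely within the 32-bit word. */
    assert(start >= 0 && length > 0 && length <= 32 - start);

    /* 'length' one bits, shifted up to the field position. */
    mask = (~0U >> (32 - length)) << start;

    /* Clear the old field, then OR in the new value (truncated). */
    return (value & ~mask) | ((fieldval << start) & mask);
}
```

Under that assumption, a call such as deposit32(cpu->id_isar5, 4, 4, 2) writes the value 2 into bits [7:4] of the register image and leaves the remaining fields as the base CPU model set them.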
* [Qemu-devel] [PATCH 02/20] target/arm: Set ID_AA64PFR0 bits for SVE for -cpu max
2018-08-09 4:21 [Qemu-devel] [PATCH 00/20] target/arm: sve system mode patches Richard Henderson
2018-08-09 4:21 ` [Qemu-devel] [PATCH 01/20] target/arm: Set ISAR bits for -cpu max Richard Henderson
@ 2018-08-09 4:21 ` Richard Henderson
2018-08-09 4:21 ` [Qemu-devel] [PATCH 03/20] target/arm: Define ID_AA64ZFR0_EL1 Richard Henderson
` (20 subsequent siblings)
22 siblings, 0 replies; 51+ messages in thread
From: Richard Henderson @ 2018-08-09 4:21 UTC (permalink / raw)
To: qemu-devel; +Cc: laurent.desnogues, peter.maydell, alex.bennee
This is a hair out of spec in that we have, and advertise, support
for fp16 in aarch64 mode, but neither have nor advertise the same
in aarch32 mode. Rationale as commented.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/cpu64.c | 21 +++++++++++++++------
1 file changed, 15 insertions(+), 6 deletions(-)
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index 4d629bb99b..ae650b608e 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -282,15 +282,24 @@ static void aarch64_max_initfn(Object *obj)
cpu->id_aa64isar1 = deposit64(cpu->id_aa64isar1, 16, 4, 1);
cpu->id_isar5 = deposit32(cpu->id_isar5, 28, 4, 1);
-#ifdef CONFIG_USER_ONLY
- /* We don't set these in system emulation mode for the moment,
- * since we don't correctly set the ID registers to advertise them,
- * and in some cases they're only available in AArch64 and not AArch32,
- * whereas the architecture requires them to be present in both if
- * present in either.
+ /* TODO: This is not yet implemented for AArch32, whereas the
+ * architecture requires a feature to be present in both if
+ * it is present in either. However, it is required by SVE,
+ * so we don't want to leave it out of AArch64 state.
+ *
+ * Practically, the Linux kernel does not query the MVFR1 bit
+ * nor expose this as a HWCAP bit to AArch32 userland. Thus
+ * userland, if it wanted to use fp16, would have to probe for
+ * support by executing an insn and checking for SIGILL.
+ * At which point it will get the correct answer: unsupported.
*/
set_feature(&cpu->env, ARM_FEATURE_V8_FP16);
+ cpu->id_aa64pfr0 = deposit64(cpu->id_aa64pfr0, 20, 4, 1);
+
set_feature(&cpu->env, ARM_FEATURE_SVE);
+ cpu->id_aa64pfr0 = deposit64(cpu->id_aa64pfr0, 32, 4, 1);
+
+#ifdef CONFIG_USER_ONLY
/* For usermode -cpu max we can use a larger and more efficient DCZ
* blocksize since we don't have to follow what the hardware does.
*/
--
2.17.1
* [Qemu-devel] [PATCH 03/20] target/arm: Define ID_AA64ZFR0_EL1
2018-08-09 4:21 [Qemu-devel] [PATCH 00/20] target/arm: sve system mode patches Richard Henderson
2018-08-09 4:21 ` [Qemu-devel] [PATCH 01/20] target/arm: Set ISAR bits for -cpu max Richard Henderson
2018-08-09 4:21 ` [Qemu-devel] [PATCH 02/20] target/arm: Set ID_AA64PFR0 bits for SVE " Richard Henderson
@ 2018-08-09 4:21 ` Richard Henderson
2018-08-17 15:50 ` Peter Maydell
2018-08-09 4:21 ` [Qemu-devel] [PATCH 04/20] target/arm: Adjust sve_exception_el Richard Henderson
` (19 subsequent siblings)
22 siblings, 1 reply; 51+ messages in thread
From: Richard Henderson @ 2018-08-09 4:21 UTC (permalink / raw)
To: qemu-devel; +Cc: laurent.desnogues, peter.maydell, alex.bennee
Given that the only field defined for this new register may only
be 0, we don't actually need to change anything except the name.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/helper.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/target/arm/helper.c b/target/arm/helper.c
index c24c66d43e..61a79e4c44 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -4956,9 +4956,10 @@ void register_cp_regs_for_features(ARMCPU *cpu)
.opc0 = 3, .opc1 = 0, .crn = 0, .crm = 4, .opc2 = 3,
.access = PL1_R, .type = ARM_CP_CONST,
.resetvalue = 0 },
- { .name = "ID_AA64PFR4_EL1_RESERVED", .state = ARM_CP_STATE_AA64,
+ { .name = "ID_AA64ZFR0_EL1", .state = ARM_CP_STATE_AA64,
.opc0 = 3, .opc1 = 0, .crn = 0, .crm = 4, .opc2 = 4,
.access = PL1_R, .type = ARM_CP_CONST,
+ /* At present, only SVEver == 0 is defined anyway. */
.resetvalue = 0 },
{ .name = "ID_AA64PFR5_EL1_RESERVED", .state = ARM_CP_STATE_AA64,
.opc0 = 3, .opc1 = 0, .crn = 0, .crm = 4, .opc2 = 5,
--
2.17.1
* [Qemu-devel] [PATCH 04/20] target/arm: Adjust sve_exception_el
2018-08-09 4:21 [Qemu-devel] [PATCH 00/20] target/arm: sve system mode patches Richard Henderson
` (2 preceding siblings ...)
2018-08-09 4:21 ` [Qemu-devel] [PATCH 03/20] target/arm: Define ID_AA64ZFR0_EL1 Richard Henderson
@ 2018-08-09 4:21 ` Richard Henderson
2018-08-17 15:57 ` Peter Maydell
2018-08-09 4:21 ` [Qemu-devel] [PATCH 05/20] target/arm: Fix arm_cpu_data_is_big_endian for aa64 user-only Richard Henderson
` (18 subsequent siblings)
22 siblings, 1 reply; 51+ messages in thread
From: Richard Henderson @ 2018-08-09 4:21 UTC (permalink / raw)
To: qemu-devel; +Cc: laurent.desnogues, peter.maydell, alex.bennee
Check for EL3 before testing CPTR_EL3.EZ. Return 0 when the exception
should be routed via AdvSIMDFPAccessTrap. Mirror the structure of
CheckSVEEnabled more closely.
Fixes: 5be5e8eda78
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/helper.c | 96 ++++++++++++++++++++++-----------------------
1 file changed, 46 insertions(+), 50 deletions(-)
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 61a79e4c44..26e9098c5f 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -4338,67 +4338,63 @@ static const ARMCPRegInfo debug_lpae_cp_reginfo[] = {
REGINFO_SENTINEL
};
-/* Return the exception level to which SVE-disabled exceptions should
- * be taken, or 0 if SVE is enabled.
+/* Return the exception level to which exceptions should be taken
+ * via SVEAccessTrap. If an exception should be routed through
+ * AArch64.AdvSIMDFPAccessTrap, return 0; fp_exception_el should
+ * take care of raising that exception.
+ * C.f. the ARM pseudocode function CheckSVEEnabled.
*/
static int sve_exception_el(CPUARMState *env)
{
#ifndef CONFIG_USER_ONLY
unsigned current_el = arm_current_el(env);
- /* The CPACR.ZEN controls traps to EL1:
- * 0, 2 : trap EL0 and EL1 accesses
- * 1 : trap only EL0 accesses
- * 3 : trap no accesses
+ if (current_el <= 1) {
+ bool disabled = false;
+
+ /* The CPACR.ZEN controls traps to EL1:
+ * 0, 2 : trap EL0 and EL1 accesses
+ * 1 : trap only EL0 accesses
+ * 3 : trap no accesses
+ */
+ if (!extract32(env->cp15.cpacr_el1, 16, 1)) {
+ disabled = true;
+ } else if (!extract32(env->cp15.cpacr_el1, 17, 1)) {
+ disabled = current_el == 0;
+ }
+ if (disabled) {
+ /* route_to_el2 */
+ return (arm_feature(env, ARM_FEATURE_EL2)
+ && !arm_is_secure(env)
+ && (env->cp15.hcr_el2 & HCR_TGE) ? 2 : 1);
+ }
+
+ /* Check CPACR.FPEN. */
+ if (!extract32(env->cp15.cpacr_el1, 20, 1)) {
+ disabled = true;
+ } else if (!extract32(env->cp15.cpacr_el1, 21, 1)) {
+ disabled = current_el == 0;
+ }
+ if (disabled) {
+ return 0;
+ }
+ }
+
+ /* CPTR_EL2. Since TZ and TFP are positive,
+ * they will be zero when EL2 is not present.
*/
- switch (extract32(env->cp15.cpacr_el1, 16, 2)) {
- default:
- if (current_el <= 1) {
- /* Trap to PL1, which might be EL1 or EL3 */
- if (arm_is_secure(env) && !arm_el_is_aa64(env, 3)) {
- return 3;
- }
- return 1;
+ if (current_el <= 2 && !arm_is_secure_below_el3(env)) {
+ if (env->cp15.cptr_el[2] & CPTR_TZ) {
+ return 2;
}
- break;
- case 1:
- if (current_el == 0) {
- return 1;
+ if (env->cp15.cptr_el[2] & CPTR_TFP) {
+ return 0;
}
- break;
- case 3:
- break;
}
- /* Similarly for CPACR.FPEN, after having checked ZEN. */
- switch (extract32(env->cp15.cpacr_el1, 20, 2)) {
- default:
- if (current_el <= 1) {
- if (arm_is_secure(env) && !arm_el_is_aa64(env, 3)) {
- return 3;
- }
- return 1;
- }
- break;
- case 1:
- if (current_el == 0) {
- return 1;
- }
- break;
- case 3:
- break;
- }
-
- /* CPTR_EL2. Check both TZ and TFP. */
- if (current_el <= 2
- && (env->cp15.cptr_el[2] & (CPTR_TFP | CPTR_TZ))
- && !arm_is_secure_below_el3(env)) {
- return 2;
- }
-
- /* CPTR_EL3. Check both EZ and TFP. */
- if (!(env->cp15.cptr_el[3] & CPTR_EZ)
- || (env->cp15.cptr_el[3] & CPTR_TFP)) {
+ /* CPTR_EL3. Since EZ is negative we must check for EL3. */
+ if (arm_feature(env, ARM_FEATURE_EL3)
+ && !(env->cp15.cptr_el[3] & CPTR_EZ)) {
return 3;
}
#endif
--
2.17.1
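The patch above replaces a switch on the two-bit CPACR.ZEN (and FPEN) field with two single-bit tests. The sketch below is only an illustration of why the two forms agree with the table in the comment (0, 2: trap EL0 and EL1; 1: trap only EL0; 3: trap nothing). The function name is hypothetical and it takes the already-extracted two-bit field rather than the full register.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: given a two-bit enable field laid out like
 * CPACR.ZEN or CPACR.FPEN and an exception level of 0 or 1, report
 * whether the access is disabled at that level.
 *
 *   field 0, 2 : low bit clear -> disabled at EL0 and EL1
 *   field 1    : high bit clear -> disabled at EL0 only
 *   field 3    : enabled at both levels
 */
static bool sketch_el01_access_disabled(unsigned field, int el)
{
    if (!(field & 1)) {
        /* Low bit clear covers both encodings 0 and 2. */
        return true;
    } else if (!(field & 2)) {
        /* Encoding 1: only EL0 is trapped. */
        return el == 0;
    }
    /* Encoding 3: no trap. */
    return false;
}
```

Testing the low bit first is what lets encodings 0 and 2 share a single branch, which is the shape used in the rewritten sve_exception_el.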
* [Qemu-devel] [PATCH 05/20] target/arm: Fix arm_cpu_data_is_big_endian for aa64 user-only
2018-08-09 4:21 [Qemu-devel] [PATCH 00/20] target/arm: sve system mode patches Richard Henderson
` (3 preceding siblings ...)
2018-08-09 4:21 ` [Qemu-devel] [PATCH 04/20] target/arm: Adjust sve_exception_el Richard Henderson
@ 2018-08-09 4:21 ` Richard Henderson
2018-08-17 16:02 ` Peter Maydell
2018-08-09 4:21 ` [Qemu-devel] [PATCH 06/20] target/arm: Fix arm_current_el for user-only Richard Henderson
` (17 subsequent siblings)
22 siblings, 1 reply; 51+ messages in thread
From: Richard Henderson @ 2018-08-09 4:21 UTC (permalink / raw)
To: qemu-devel; +Cc: laurent.desnogues, peter.maydell, alex.bennee
Unlike aa32, endianness cannot be adjusted by userland in aa64.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/cpu.h | 27 +++++++++++++++++----------
1 file changed, 17 insertions(+), 10 deletions(-)
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 9526ed27cb..2d6d7d03aa 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -2709,8 +2709,6 @@ static inline bool arm_sctlr_b(CPUARMState *env)
/* Return true if the processor is in big-endian mode. */
static inline bool arm_cpu_data_is_big_endian(CPUARMState *env)
{
- int cur_el;
-
/* In 32bit endianness is determined by looking at CPSR's E bit */
if (!is_a64(env)) {
return
@@ -2729,15 +2727,24 @@ static inline bool arm_cpu_data_is_big_endian(CPUARMState *env)
arm_sctlr_b(env) ||
#endif
((env->uncached_cpsr & CPSR_E) ? 1 : 0);
+ } else {
+#ifdef CONFIG_USER_ONLY
+ /* AArch64 does not have a SETEND instruction; endianness
+ * for usermode is fixed at compile-time.
+ */
+# ifdef TARGET_WORDS_BIGENDIAN
+ return true;
+# else
+ return false;
+# endif
+#else
+ int cur_el = arm_current_el(env);
+ if (cur_el == 0) {
+ return (env->cp15.sctlr_el[1] & SCTLR_E0E) != 0;
+ }
+ return (env->cp15.sctlr_el[cur_el] & SCTLR_EE) != 0;
+#endif
}
-
- cur_el = arm_current_el(env);
-
- if (cur_el == 0) {
- return (env->cp15.sctlr_el[1] & SCTLR_E0E) != 0;
- }
-
- return (env->cp15.sctlr_el[cur_el] & SCTLR_EE) != 0;
}
#include "exec/cpu-all.h"
--
2.17.1
* [Qemu-devel] [PATCH 06/20] target/arm: Fix arm_current_el for user-only
2018-08-09 4:21 [Qemu-devel] [PATCH 00/20] target/arm: sve system mode patches Richard Henderson
` (4 preceding siblings ...)
2018-08-09 4:21 ` [Qemu-devel] [PATCH 05/20] target/arm: Fix arm_cpu_data_is_big_endian for aa64 user-only Richard Henderson
@ 2018-08-09 4:21 ` Richard Henderson
2018-08-17 16:03 ` Peter Maydell
2018-08-09 4:21 ` [Qemu-devel] [PATCH 07/20] target/arm: Fix is_a64 " Richard Henderson
` (16 subsequent siblings)
22 siblings, 1 reply; 51+ messages in thread
From: Richard Henderson @ 2018-08-09 4:21 UTC (permalink / raw)
To: qemu-devel; +Cc: laurent.desnogues, peter.maydell, alex.bennee
Saves about 12k code size in qemu-aarch64.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/cpu.h | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 2d6d7d03aa..aedaf2631e 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -1958,6 +1958,9 @@ static inline bool arm_v7m_is_handler_mode(CPUARMState *env)
*/
static inline int arm_current_el(CPUARMState *env)
{
+#ifdef CONFIG_USER_ONLY
+ return 0;
+#else
if (arm_feature(env, ARM_FEATURE_M)) {
return arm_v7m_is_handler_mode(env) ||
!(env->v7m.control[env->v7m.secure] & 1);
@@ -1984,6 +1987,7 @@ static inline int arm_current_el(CPUARMState *env)
return 1;
}
+#endif
}
typedef struct ARMCPRegInfo ARMCPRegInfo;
--
2.17.1
* [Qemu-devel] [PATCH 07/20] target/arm: Fix is_a64 for user-only
2018-08-09 4:21 [Qemu-devel] [PATCH 00/20] target/arm: sve system mode patches Richard Henderson
` (5 preceding siblings ...)
2018-08-09 4:21 ` [Qemu-devel] [PATCH 06/20] target/arm: Fix arm_current_el for user-only Richard Henderson
@ 2018-08-09 4:21 ` Richard Henderson
2018-08-17 16:03 ` Peter Maydell
2018-08-09 4:21 ` [Qemu-devel] [PATCH 08/20] target/arm: Pass in current_el to fp and sve_exception_el Richard Henderson
` (15 subsequent siblings)
22 siblings, 1 reply; 51+ messages in thread
From: Richard Henderson @ 2018-08-09 4:21 UTC (permalink / raw)
To: qemu-devel; +Cc: laurent.desnogues, peter.maydell, alex.bennee
Saves about 8k code size in qemu-aarch64.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/cpu.h | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index aedaf2631e..ed51a2f5aa 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -918,7 +918,15 @@ void aarch64_sync_64_to_32(CPUARMState *env);
static inline bool is_a64(CPUARMState *env)
{
+#ifdef CONFIG_USER_ONLY
+# ifdef TARGET_AARCH64
+ return true;
+# else
+ return false;
+# endif
+#else
return env->aarch64;
+#endif
}
/* you can call this signal handler from your SIGBUS and SIGSEGV
--
2.17.1
* [Qemu-devel] [PATCH 08/20] target/arm: Pass in current_el to fp and sve_exception_el
2018-08-09 4:21 [Qemu-devel] [PATCH 00/20] target/arm: sve system mode patches Richard Henderson
` (6 preceding siblings ...)
2018-08-09 4:21 ` [Qemu-devel] [PATCH 07/20] target/arm: Fix is_a64 " Richard Henderson
@ 2018-08-09 4:21 ` Richard Henderson
2018-08-09 18:01 ` Alex Bennée
2018-08-09 4:21 ` [Qemu-devel] [PATCH 09/20] target/arm: Handle SVE vector length changes in system mode Richard Henderson
` (14 subsequent siblings)
22 siblings, 1 reply; 51+ messages in thread
From: Richard Henderson @ 2018-08-09 4:21 UTC (permalink / raw)
To: qemu-devel; +Cc: laurent.desnogues, peter.maydell, alex.bennee
We are going to want to determine whether sve is enabled
for an EL other than the current one.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/helper.c | 21 +++++++++------------
1 file changed, 9 insertions(+), 12 deletions(-)
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 26e9098c5f..290b1a849e 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -4344,12 +4344,10 @@ static const ARMCPRegInfo debug_lpae_cp_reginfo[] = {
* take care of raising that exception.
* C.f. the ARM pseudocode function CheckSVEEnabled.
*/
-static int sve_exception_el(CPUARMState *env)
+static int sve_exception_el(CPUARMState *env, int el)
{
#ifndef CONFIG_USER_ONLY
- unsigned current_el = arm_current_el(env);
-
- if (current_el <= 1) {
+ if (el <= 1) {
bool disabled = false;
/* The CPACR.ZEN controls traps to EL1:
@@ -4360,7 +4358,7 @@ static int sve_exception_el(CPUARMState *env)
if (!extract32(env->cp15.cpacr_el1, 16, 1)) {
disabled = true;
} else if (!extract32(env->cp15.cpacr_el1, 17, 1)) {
- disabled = current_el == 0;
+ disabled = el == 0;
}
if (disabled) {
/* route_to_el2 */
@@ -4373,7 +4371,7 @@ static int sve_exception_el(CPUARMState *env)
if (!extract32(env->cp15.cpacr_el1, 20, 1)) {
disabled = true;
} else if (!extract32(env->cp15.cpacr_el1, 21, 1)) {
- disabled = current_el == 0;
+ disabled = el == 0;
}
if (disabled) {
return 0;
@@ -4383,7 +4381,7 @@ static int sve_exception_el(CPUARMState *env)
/* CPTR_EL2. Since TZ and TFP are positive,
* they will be zero when EL2 is not present.
*/
- if (current_el <= 2 && !arm_is_secure_below_el3(env)) {
+ if (el <= 2 && !arm_is_secure_below_el3(env)) {
if (env->cp15.cptr_el[2] & CPTR_TZ) {
return 2;
}
@@ -12318,11 +12316,10 @@ uint32_t HELPER(crc32c)(uint32_t acc, uint32_t val, uint32_t bytes)
/* Return the exception level to which FP-disabled exceptions should
* be taken, or 0 if FP is enabled.
*/
-static inline int fp_exception_el(CPUARMState *env)
+static int fp_exception_el(CPUARMState *env, int cur_el)
{
#ifndef CONFIG_USER_ONLY
int fpen;
- int cur_el = arm_current_el(env);
/* CPACR and the CPTR registers don't exist before v6, so FP is
* always accessible
@@ -12385,11 +12382,12 @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
target_ulong *cs_base, uint32_t *pflags)
{
ARMMMUIdx mmu_idx = core_to_arm_mmu_idx(env, cpu_mmu_index(env, false));
- int fp_el = fp_exception_el(env);
+ int current_el = arm_current_el(env);
+ int fp_el = fp_exception_el(env, current_el);
uint32_t flags;
if (is_a64(env)) {
- int sve_el = sve_exception_el(env);
+ int sve_el = sve_exception_el(env, current_el);
uint32_t zcr_len;
*pc = env->pc;
@@ -12404,7 +12402,6 @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
if (sve_el != 0 && fp_el == 0) {
zcr_len = 0;
} else {
- int current_el = arm_current_el(env);
ARMCPU *cpu = arm_env_get_cpu(env);
zcr_len = cpu->sve_max_vq - 1;
--
2.17.1
* [Qemu-devel] [PATCH 09/20] target/arm: Handle SVE vector length changes in system mode
2018-08-09 4:21 [Qemu-devel] [PATCH 00/20] target/arm: sve system mode patches Richard Henderson
` (7 preceding siblings ...)
2018-08-09 4:21 ` [Qemu-devel] [PATCH 08/20] target/arm: Pass in current_el to fp and sve_exception_el Richard Henderson
@ 2018-08-09 4:21 ` Richard Henderson
2018-08-17 16:22 ` Peter Maydell
2018-08-09 4:21 ` [Qemu-devel] [PATCH 10/20] target/arm: Adjust aarch64_cpu_dump_state for system mode SVE Richard Henderson
` (13 subsequent siblings)
22 siblings, 1 reply; 51+ messages in thread
From: Richard Henderson @ 2018-08-09 4:21 UTC (permalink / raw)
To: qemu-devel; +Cc: laurent.desnogues, peter.maydell, alex.bennee
SVE vector length can change when changing EL, or when writing
to one of the ZCR_ELn registers.
For correctness, our implementation requires that inaccessible
predicate bits are never set, which means noticing length
changes and zeroing the appropriate register bits.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/cpu.h | 4 ++
target/arm/cpu64.c | 42 --------------
target/arm/helper.c | 127 ++++++++++++++++++++++++++++++++++++-----
target/arm/op_helper.c | 1 +
4 files changed, 119 insertions(+), 55 deletions(-)
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index ed51a2f5aa..18b3c92c2e 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -910,6 +910,10 @@ int arm_cpu_write_elf32_note(WriteCoreDumpFunction f, CPUState *cs,
int aarch64_cpu_gdb_read_register(CPUState *cpu, uint8_t *buf, int reg);
int aarch64_cpu_gdb_write_register(CPUState *cpu, uint8_t *buf, int reg);
void aarch64_sve_narrow_vq(CPUARMState *env, unsigned vq);
+void aarch64_sve_change_el(CPUARMState *env, int old_el, int new_el);
+#else
+static inline void aarch64_sve_narrow_vq(CPUARMState *env, unsigned vq) { }
+static inline void aarch64_sve_change_el(CPUARMState *env, int o, int n) { }
#endif
target_ulong do_arm_semihosting(CPUARMState *env);
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index ae650b608e..16272f1358 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -439,45 +439,3 @@ static void aarch64_cpu_register_types(void)
}
type_init(aarch64_cpu_register_types)
-
-/* The manual says that when SVE is enabled and VQ is widened the
- * implementation is allowed to zero the previously inaccessible
- * portion of the registers. The corollary to that is that when
- * SVE is enabled and VQ is narrowed we are also allowed to zero
- * the now inaccessible portion of the registers.
- *
- * The intent of this is that no predicate bit beyond VQ is ever set.
- * Which means that some operations on predicate registers themselves
- * may operate on full uint64_t or even unrolled across the maximum
- * uint64_t[4]. Performing 4 bits of host arithmetic unconditionally
- * may well be cheaper than conditionals to restrict the operation
- * to the relevant portion of a uint16_t[16].
- *
- * TODO: Need to call this for changes to the real system registers
- * and EL state changes.
- */
-void aarch64_sve_narrow_vq(CPUARMState *env, unsigned vq)
-{
- int i, j;
- uint64_t pmask;
-
- assert(vq >= 1 && vq <= ARM_MAX_VQ);
- assert(vq <= arm_env_get_cpu(env)->sve_max_vq);
-
- /* Zap the high bits of the zregs. */
- for (i = 0; i < 32; i++) {
- memset(&env->vfp.zregs[i].d[2 * vq], 0, 16 * (ARM_MAX_VQ - vq));
- }
-
- /* Zap the high bits of the pregs and ffr. */
- pmask = 0;
- if (vq & 3) {
- pmask = ~(-1ULL << (16 * (vq & 3)));
- }
- for (j = vq / 4; j < ARM_MAX_VQ / 4; j++) {
- for (i = 0; i < 17; ++i) {
- env->vfp.pregs[i].p[j] &= pmask;
- }
- pmask = 0;
- }
-}
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 290b1a849e..fb79b27cf6 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -4399,11 +4399,44 @@ static int sve_exception_el(CPUARMState *env, int el)
return 0;
}
+/*
+ * Given that SVE is enabled, return the vector length for EL.
+ */
+static uint32_t sve_zcr_len_for_el(CPUARMState *env, int el)
+{
+ ARMCPU *cpu = arm_env_get_cpu(env);
+ uint32_t zcr_len = cpu->sve_max_vq - 1;
+
+ if (el <= 1) {
+ zcr_len = MIN(zcr_len, 0xf & (uint32_t)env->vfp.zcr_el[1]);
+ }
+ if (el < 2 && arm_feature(env, ARM_FEATURE_EL2)) {
+ zcr_len = MIN(zcr_len, 0xf & (uint32_t)env->vfp.zcr_el[2]);
+ }
+ if (el < 3 && arm_feature(env, ARM_FEATURE_EL3)) {
+ zcr_len = MIN(zcr_len, 0xf & (uint32_t)env->vfp.zcr_el[3]);
+ }
+ return zcr_len;
+}
+
static void zcr_write(CPUARMState *env, const ARMCPRegInfo *ri,
uint64_t value)
{
+ int cur_el = arm_current_el(env);
+ int old_len = sve_zcr_len_for_el(env, cur_el);
+ int new_len;
+
/* Bits other than [3:0] are RAZ/WI. */
raw_write(env, ri, value & 0xf);
+
+ /*
+ * Because we arrived here, we know both FP and SVE are enabled;
+ * otherwise we would have trapped access to the ZCR_ELn register.
+ */
+ new_len = sve_zcr_len_for_el(env, cur_el);
+ if (new_len < old_len) {
+ aarch64_sve_narrow_vq(env, new_len + 1);
+ }
}
static const ARMCPRegInfo zcr_el1_reginfo = {
@@ -8100,8 +8133,11 @@ static void arm_cpu_do_interrupt_aarch64(CPUState *cs)
unsigned int new_el = env->exception.target_el;
target_ulong addr = env->cp15.vbar_el[new_el];
unsigned int new_mode = aarch64_pstate_mode(new_el, true);
+ unsigned int cur_el = arm_current_el(env);
- if (arm_current_el(env) < new_el) {
+ aarch64_sve_change_el(env, cur_el, new_el);
+
+ if (cur_el < new_el) {
/* Entry vector offset depends on whether the implemented EL
* immediately lower than the target level is using AArch32 or AArch64
*/
@@ -12402,18 +12438,7 @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
if (sve_el != 0 && fp_el == 0) {
zcr_len = 0;
} else {
- ARMCPU *cpu = arm_env_get_cpu(env);
-
- zcr_len = cpu->sve_max_vq - 1;
- if (current_el <= 1) {
- zcr_len = MIN(zcr_len, 0xf & (uint32_t)env->vfp.zcr_el[1]);
- }
- if (current_el < 2 && arm_feature(env, ARM_FEATURE_EL2)) {
- zcr_len = MIN(zcr_len, 0xf & (uint32_t)env->vfp.zcr_el[2]);
- }
- if (current_el < 3 && arm_feature(env, ARM_FEATURE_EL3)) {
- zcr_len = MIN(zcr_len, 0xf & (uint32_t)env->vfp.zcr_el[3]);
- }
+ zcr_len = sve_zcr_len_for_el(env, current_el);
}
flags |= zcr_len << ARM_TBFLAG_ZCR_LEN_SHIFT;
} else {
@@ -12467,3 +12492,79 @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
*pflags = flags;
*cs_base = 0;
}
+
+#ifdef TARGET_AARCH64
+/*
+ * The manual says that when SVE is enabled and VQ is widened the
+ * implementation is allowed to zero the previously inaccessible
+ * portion of the registers. The corollary to that is that when
+ * SVE is enabled and VQ is narrowed we are also allowed to zero
+ * the now inaccessible portion of the registers.
+ *
+ * The intent of this is that no predicate bit beyond VQ is ever set.
+ * Which means that some operations on predicate registers themselves
+ * may operate on full uint64_t or even unrolled across the maximum
+ * uint64_t[4]. Performing 4 bits of host arithmetic unconditionally
+ * may well be cheaper than conditionals to restrict the operation
+ * to the relevant portion of a uint16_t[16].
+ */
+void aarch64_sve_narrow_vq(CPUARMState *env, unsigned vq)
+{
+ int i, j;
+ uint64_t pmask;
+
+ assert(vq >= 1 && vq <= ARM_MAX_VQ);
+ assert(vq <= arm_env_get_cpu(env)->sve_max_vq);
+
+ /* Zap the high bits of the zregs. */
+ for (i = 0; i < 32; i++) {
+ memset(&env->vfp.zregs[i].d[2 * vq], 0, 16 * (ARM_MAX_VQ - vq));
+ }
+
+ /* Zap the high bits of the pregs and ffr. */
+ pmask = 0;
+ if (vq & 3) {
+ pmask = ~(-1ULL << (16 * (vq & 3)));
+ }
+ for (j = vq / 4; j < ARM_MAX_VQ / 4; j++) {
+ for (i = 0; i < 17; ++i) {
+ env->vfp.pregs[i].p[j] &= pmask;
+ }
+ pmask = 0;
+ }
+}
+
+/*
+ * Notice a change in SVE vector size when changing EL.
+ */
+void aarch64_sve_change_el(CPUARMState *env, int old_el, int new_el)
+{
+ int old_len, new_len;
+
+ /* Nothing to do if no SVE. */
+ if (!arm_feature(env, ARM_FEATURE_SVE)) {
+ return;
+ }
+
+ /* Nothing to do if FP is disabled in either EL. */
+ if (fp_exception_el(env, old_el) || fp_exception_el(env, new_el)) {
+ return;
+ }
+
+ /*
+ * When FP is enabled, but SVE is disabled, the effective len is 0.
+ * ??? How should sve_exception_el interact with AArch32 state?
+ * That isn't included in the CheckSVEEnabled pseudocode, so is the
+ * host kernel required to explicitly disable SVE for an EL using aa32?
+ */
+ old_len = (sve_exception_el(env, old_el)
+ ? 0 : sve_zcr_len_for_el(env, old_el));
+ new_len = (sve_exception_el(env, new_el)
+ ? 0 : sve_zcr_len_for_el(env, new_el));
+
+ /* When changing vector length, clear inaccessible state. */
+ if (new_len < old_len) {
+ aarch64_sve_narrow_vq(env, new_len + 1);
+ }
+}
+#endif
diff --git a/target/arm/op_helper.c b/target/arm/op_helper.c
index f728f25e4b..b9f920b3c4 100644
--- a/target/arm/op_helper.c
+++ b/target/arm/op_helper.c
@@ -1068,6 +1068,7 @@ void HELPER(exception_return)(CPUARMState *env)
"AArch64 EL%d PC 0x%" PRIx64 "\n",
cur_el, new_el, env->pc);
}
+ aarch64_sve_change_el(env, cur_el, new_el);
qemu_mutex_lock_iothread();
arm_call_el_change_hook(arm_env_get_cpu(env));
--
2.17.1
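As an illustration of the predicate masking done by aarch64_sve_narrow_vq
above, here is a minimal standalone sketch. It assumes a predicate register
stored as four 64-bit words, with 16 predicate bits per quadword of vector
length. MAX_VQ, PRED_WORDS and narrow_pred are names invented for this
example and are not QEMU identifiers.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_VQ 16                 /* assumed maximum, mirroring ARM_MAX_VQ */
#define PRED_WORDS (MAX_VQ / 4)   /* 16 bits per quadword, 64 bits per word */

/* Clear every predicate bit at or above bit index 16 * vq. */
static void narrow_pred(uint64_t p[PRED_WORDS], unsigned vq)
{
    uint64_t pmask = 0;
    unsigned j;

    assert(vq >= 1 && vq <= MAX_VQ);

    /* Partially accessible word: keep only the low 16 * (vq % 4) bits. */
    if (vq & 3) {
        pmask = ~(~0ULL << (16 * (vq & 3)));
    }
    for (j = vq / 4; j < PRED_WORDS; j++) {
        p[j] &= pmask;
        pmask = 0;                /* every later word is wholly inaccessible */
    }
}
```

With vq == 5, word 0 is kept whole, word 1 keeps its low 16 bits, and
words 2 and 3 are cleared. With vq a multiple of 4 there is no partial
word, so the loop starts directly at the first wholly inaccessible word.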
* [Qemu-devel] [PATCH 10/20] target/arm: Adjust aarch64_cpu_dump_state for system mode SVE
2018-08-09 4:21 [Qemu-devel] [PATCH 00/20] target/arm: sve system mode patches Richard Henderson
` (8 preceding siblings ...)
2018-08-09 4:21 ` [Qemu-devel] [PATCH 09/20] target/arm: Handle SVE vector length changes in system mode Richard Henderson
@ 2018-08-09 4:21 ` Richard Henderson
2018-08-17 16:35 ` Peter Maydell
2018-08-09 4:21 ` [Qemu-devel] [PATCH 11/20] target/arm: Clear unused predicate bits for LD1RQ Richard Henderson
` (12 subsequent siblings)
22 siblings, 1 reply; 51+ messages in thread
From: Richard Henderson @ 2018-08-09 4:21 UTC (permalink / raw)
To: qemu-devel; +Cc: laurent.desnogues, peter.maydell, alex.bennee
Use the existing helpers to determine (1) whether the FPU is enabled,
(2) whether SVE state is enabled, and (3) the current SVE vector length.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/cpu.h | 4 ++++
target/arm/helper.c | 6 +++---
target/arm/translate-a64.c | 8 ++++++--
3 files changed, 13 insertions(+), 5 deletions(-)
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 18b3c92c2e..33d06f2340 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -920,6 +920,10 @@ target_ulong do_arm_semihosting(CPUARMState *env);
void aarch64_sync_32_to_64(CPUARMState *env);
void aarch64_sync_64_to_32(CPUARMState *env);
+int fp_exception_el(CPUARMState *env, int cur_el);
+int sve_exception_el(CPUARMState *env, int cur_el);
+uint32_t sve_zcr_len_for_el(CPUARMState *env, int el);
+
static inline bool is_a64(CPUARMState *env)
{
#ifdef CONFIG_USER_ONLY
diff --git a/target/arm/helper.c b/target/arm/helper.c
index fb79b27cf6..64ff71b722 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -4344,7 +4344,7 @@ static const ARMCPRegInfo debug_lpae_cp_reginfo[] = {
* take care of raising that exception.
* C.f. the ARM pseudocode function CheckSVEEnabled.
*/
-static int sve_exception_el(CPUARMState *env, int el)
+int sve_exception_el(CPUARMState *env, int el)
{
#ifndef CONFIG_USER_ONLY
if (el <= 1) {
@@ -4402,7 +4402,7 @@ static int sve_exception_el(CPUARMState *env, int el)
/*
* Given that SVE is enabled, return the vector length for EL.
*/
-static uint32_t sve_zcr_len_for_el(CPUARMState *env, int el)
+uint32_t sve_zcr_len_for_el(CPUARMState *env, int el)
{
ARMCPU *cpu = arm_env_get_cpu(env);
uint32_t zcr_len = cpu->sve_max_vq - 1;
@@ -12352,7 +12352,7 @@ uint32_t HELPER(crc32c)(uint32_t acc, uint32_t val, uint32_t bytes)
/* Return the exception level to which FP-disabled exceptions should
* be taken, or 0 if FP is enabled.
*/
-static int fp_exception_el(CPUARMState *env, int cur_el)
+int fp_exception_el(CPUARMState *env, int cur_el)
{
#ifndef CONFIG_USER_ONLY
int fpen;
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index b29dc49c4f..4a0ca8c906 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -166,11 +166,15 @@ void aarch64_cpu_dump_state(CPUState *cs, FILE *f,
cpu_fprintf(f, "\n");
return;
}
+ if (fp_exception_el(env, el) != 0) {
+ cpu_fprintf(f, " FPU disabled\n");
+ return;
+ }
cpu_fprintf(f, " FPCR=%08x FPSR=%08x\n",
vfp_get_fpcr(env), vfp_get_fpsr(env));
- if (arm_feature(env, ARM_FEATURE_SVE)) {
- int j, zcr_len = env->vfp.zcr_el[1] & 0xf; /* fix for system mode */
+ if (arm_feature(env, ARM_FEATURE_SVE) && sve_exception_el(env, el) == 0) {
+ int j, zcr_len = sve_zcr_len_for_el(env, el);
for (i = 0; i <= FFR_PRED_NUM; i++) {
bool eol;
--
2.17.1
* [Qemu-devel] [PATCH 11/20] target/arm: Clear unused predicate bits for LD1RQ
2018-08-09 4:21 [Qemu-devel] [PATCH 00/20] target/arm: sve system mode patches Richard Henderson
` (9 preceding siblings ...)
2018-08-09 4:21 ` [Qemu-devel] [PATCH 10/20] target/arm: Adjust aarch64_cpu_dump_state for system mode SVE Richard Henderson
@ 2018-08-09 4:21 ` Richard Henderson
2018-08-23 15:21 ` Peter Maydell
2018-08-09 4:21 ` [Qemu-devel] [PATCH 12/20] target/arm: Rewrite helper_sve_ld1*_r using pages Richard Henderson
` (11 subsequent siblings)
22 siblings, 1 reply; 51+ messages in thread
From: Richard Henderson @ 2018-08-09 4:21 UTC (permalink / raw)
To: qemu-devel; +Cc: laurent.desnogues, peter.maydell, alex.bennee
The 16-byte load only uses 16 predicate bits. But while
reusing the other load infrastructure, we find other bits
that are set and trigger an assert. To avoid this and
retain the assert, zero-extend the predicate that we pass
to the LD1 helper.
Reported-by: Laurent Desnogues <laurent.desnogues@gmail.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/translate-sve.c | 25 +++++++++++++++++++++++--
1 file changed, 23 insertions(+), 2 deletions(-)
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index d27bc8c946..bef6b8242d 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -4765,12 +4765,33 @@ static void do_ldrq(DisasContext *s, int zt, int pg, TCGv_i64 addr, int msz)
unsigned vsz = vec_full_reg_size(s);
TCGv_ptr t_pg;
TCGv_i32 desc;
+ int poff;
/* Load the first quadword using the normal predicated load helpers. */
desc = tcg_const_i32(simd_desc(16, 16, zt));
- t_pg = tcg_temp_new_ptr();
- tcg_gen_addi_ptr(t_pg, cpu_env, pred_full_reg_offset(s, pg));
+ poff = pred_full_reg_offset(s, pg);
+ if (vsz > 16) {
+ /*
+ * Zero-extend the first 16 bits of the predicate into a temporary.
+ * This avoids triggering an assert making sure we don't have bits
+ * set within a predicate beyond VQ, but we have lowered VQ to 1
+ * for this load operation.
+ */
+ TCGv_i64 tmp = tcg_temp_new_i64();
+#ifdef HOST_WORDS_BIGENDIAN
+ poff += 6;
+#endif
+ tcg_gen_ld16u_i64(tmp, cpu_env, poff);
+
+ poff = offsetof(CPUARMState, vfp.preg_tmp);
+ tcg_gen_st_i64(tmp, cpu_env, poff);
+ tcg_temp_free_i64(tmp);
+ }
+
+ t_pg = tcg_temp_new_ptr();
+ tcg_gen_addi_ptr(t_pg, cpu_env, poff);
+
fns[msz](cpu_env, t_pg, addr, desc);
tcg_temp_free_ptr(t_pg);
--
2.17.1
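The host-endian adjustment in the hunk above (poff += 6 on big-endian
hosts) can be checked outside of TCG with a small standalone sketch. It
assumes the predicate word is an ordinary host-endian uint64_t and that
the host is either little-endian or big-endian. low16_offset and
pred_low16 are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Byte offset of the least significant 16 bits inside a host uint64_t. */
static size_t low16_offset(void)
{
    const uint16_t probe = 1;
    unsigned char first;

    memcpy(&first, &probe, 1);
    return first ? 0 : 6;         /* little-endian : big-endian */
}

/* Load 16 bits at that offset and zero-extend them to 64 bits. */
static uint64_t pred_low16(const uint64_t *pred_word)
{
    uint16_t lo;

    memcpy(&lo, (const unsigned char *)pred_word + low16_offset(),
           sizeof(lo));
    return lo;
}
```

Whatever is set in the upper 48 bits of the source word, the result only
carries the 16 bits that govern the first quadword, which is the property
the temporary predicate in the patch relies on.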
* [Qemu-devel] [PATCH 12/20] target/arm: Rewrite helper_sve_ld1*_r using pages
2018-08-09 4:21 [Qemu-devel] [PATCH 00/20] target/arm: sve system mode patches Richard Henderson
` (10 preceding siblings ...)
2018-08-09 4:21 ` [Qemu-devel] [PATCH 11/20] target/arm: Clear unused predicate bits for LD1RQ Richard Henderson
@ 2018-08-09 4:21 ` Richard Henderson
2018-08-10 9:13 ` Alex Bennée
2018-08-23 16:01 ` Peter Maydell
2018-08-09 4:21 ` [Qemu-devel] [PATCH 13/20] target/arm: Rewrite helper_sve_ld[234]*_r Richard Henderson
` (10 subsequent siblings)
22 siblings, 2 replies; 51+ messages in thread
From: Richard Henderson @ 2018-08-09 4:21 UTC (permalink / raw)
To: qemu-devel; +Cc: laurent.desnogues, peter.maydell, alex.bennee
Uses tlb_vaddr_to_host for correct operation with softmmu.
Optimize for accesses within a single page or pair of pages.
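To make the single-page versus two-page distinction concrete, here is a
minimal sketch of the split computation, in the spirit of max_for_page in
the diff below. It assumes a fixed 4 KiB page; TOY_PAGE_SIZE and
toy_max_for_page are made-up names, and the real helper derives the page
size from TARGET_PAGE_MASK instead.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u       /* assumed page size for this sketch */

/* Largest offset <= mem_max that is still on the page holding
 * base + mem_off. */
static intptr_t toy_max_for_page(uint64_t base, intptr_t mem_off,
                                 intptr_t mem_max)
{
    uint64_t addr = base + (uint64_t)mem_off;
    /* Bytes from addr up to the end of its page, in 1..TOY_PAGE_SIZE. */
    intptr_t to_page_end =
        (intptr_t)(TOY_PAGE_SIZE - (addr & (TOY_PAGE_SIZE - 1)));
    intptr_t remaining = mem_max - mem_off;

    assert(mem_off >= 0 && mem_off <= mem_max);
    return mem_off + (to_page_end < remaining ? to_page_end : remaining);
}
```

If the result equals mem_max, the whole access sits on one page and a
single guest-to-host lookup covers it. Otherwise the result is the offset
at which the second page begins, and calling the function again from that
offset yields mem_max.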
Perf report comparison for cortex-strings test-strlen
with aarch64-linux-user:
before:
1.59% qemu-aarch64 qemu-aarch64 [.] do_sve_ld1bb_r
0.86% qemu-aarch64 qemu-aarch64 [.] do_sve_ldff1bb_r
after:
0.09% qemu-aarch64 qemu-aarch64 [.] helper_sve_ldff1bb_r
0.01% qemu-aarch64 qemu-aarch64 [.] sve_ld1bb_host
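Part of the rewritten slow path is skipping runs of inactive elements by
scanning the governing predicate 64 bits at a time, as find_next_active
does in the diff below. The following is a simplified standalone sketch of
that idea, assuming one predicate bit per vector byte. esz_masks,
ctz64_portable and toy_next_active are names invented here; the sketch
clamps the result to reg_max where the patch uses a debug assert.

```c
#include <assert.h>
#include <stdint.h>

/* An element of 2^esz bytes is governed by the lowest bit of its group. */
static const uint64_t esz_masks[4] = {
    0xffffffffffffffffULL, 0x5555555555555555ULL,
    0x1111111111111111ULL, 0x0101010101010101ULL,
};

/* Portable count-trailing-zeros; x must be nonzero. */
static int ctz64_portable(uint64_t x)
{
    int n = 0;

    assert(x != 0);
    while (!(x & 1)) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Offset of the first active element >= reg_off, or reg_max if none. */
static intptr_t toy_next_active(const uint64_t *vg, intptr_t reg_off,
                                intptr_t reg_max, int esz)
{
    uint64_t mask = esz_masks[esz];
    uint64_t pg;

    if (reg_off >= reg_max) {
        return reg_max;
    }
    pg = (vg[reg_off >> 6] & mask) >> (reg_off & 63);
    while (pg == 0) {
        /* Nothing left in this word; move to the start of the next. */
        reg_off = (reg_off & ~(intptr_t)63) + 64;
        if (reg_off >= reg_max) {
            return reg_max;
        }
        pg = vg[reg_off >> 6] & mask;
    }
    reg_off += ctz64_portable(pg);
    return reg_off < reg_max ? reg_off : reg_max;
}
```

When the first tested bit is already set, the function returns reg_off
unchanged, which matches the common case of an all-true predicate.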
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/sve_helper.c | 839 ++++++++++++++++++++++++++++++++--------
1 file changed, 675 insertions(+), 164 deletions(-)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index e03f954a26..4ca9412e20 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -1688,6 +1688,45 @@ static void swap_memmove(void *vd, void *vs, size_t n)
}
}
+/* Similarly for memset of 0. */
+static void swap_memzero(void *vd, size_t n)
+{
+ uintptr_t d = (uintptr_t)vd;
+ uintptr_t o = (d | n) & 7;
+ size_t i;
+
+ if (likely(n == 0)) {
+ return;
+ }
+#ifndef HOST_WORDS_BIGENDIAN
+ o = 0;
+#endif
+ switch (o) {
+ case 0:
+ memset(vd, 0, n);
+ break;
+
+ case 4:
+ for (i = 0; i < n; i += 4) {
+ *(uint32_t *)H1_4(d + i) = 0;
+ }
+ break;
+
+ case 2:
+ case 6:
+ for (i = 0; i < n; i += 2) {
+ *(uint16_t *)H1_2(d + i) = 0;
+ }
+ break;
+
+ default:
+ for (i = 0; i < n; i++) {
+ *(uint8_t *)H1(d + i) = 0;
+ }
+ break;
+ }
+}
+
void HELPER(sve_ext)(void *vd, void *vn, void *vm, uint32_t desc)
{
intptr_t opr_sz = simd_oprsz(desc);
@@ -3927,32 +3966,438 @@ void HELPER(sve_fcmla_zpzzz_d)(CPUARMState *env, void *vg, uint32_t desc)
/*
* Load contiguous data, protected by a governing predicate.
*/
-#define DO_LD1(NAME, FN, TYPEE, TYPEM, H) \
-static void do_##NAME(CPUARMState *env, void *vd, void *vg, \
- target_ulong addr, intptr_t oprsz, \
- uintptr_t ra) \
-{ \
- intptr_t i = 0; \
- do { \
- uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3)); \
- do { \
- TYPEM m = 0; \
- if (pg & 1) { \
- m = FN(env, addr, ra); \
- } \
- *(TYPEE *)(vd + H(i)) = m; \
- i += sizeof(TYPEE), pg >>= sizeof(TYPEE); \
- addr += sizeof(TYPEM); \
- } while (i & 15); \
- } while (i < oprsz); \
-} \
-void HELPER(NAME)(CPUARMState *env, void *vg, \
- target_ulong addr, uint32_t desc) \
-{ \
- do_##NAME(env, &env->vfp.zregs[simd_data(desc)], vg, \
- addr, simd_oprsz(desc), GETPC()); \
+
+/* Load elements into VD, controlled by VG, from HOST+MEM_OFS.
+ * Memory is valid through MEM_MAX. The register element indices
+ * are inferred from MEM_OFS, as modified by the types for which
+ * the helper is built. Return the MEM_OFS of the first element
+ * not loaded (which is MEM_MAX if they are all loaded).
+ *
+ * For softmmu, we have fully validated the guest page. For user-only,
+ * we cannot fully validate without taking the mmap lock, but since we
+ * know the access is within one host page, if any access is valid they
+ * all must be valid. However, it may be that no access is valid and
+ * they have all been predicated false.
+ */
+typedef intptr_t sve_ld1_host_fn(void *vd, void *vg, void *host,
+ intptr_t mem_ofs, intptr_t mem_max);
+
+/* Load one element into VD+REG_OFF from (ENV,VADDR,RA).
+ * The controlling predicate is known to be true.
+ */
+typedef void sve_ld1_tlb_fn(CPUARMState *env, void *vd, intptr_t reg_off,
+ target_ulong vaddr, int mmu_idx, uintptr_t ra);
+
+/*
+ * Generate the above primitives.
+ */
+
+#define DO_LD_HOST(NAME, H, TYPEE, TYPEM, HOST) \
+static intptr_t sve_##NAME##_host(void *vd, void *vg, void *host, \
+ intptr_t mem_off, const intptr_t mem_max) \
+{ \
+ intptr_t reg_off = mem_off * (sizeof(TYPEE) / sizeof(TYPEM)); \
+ uint64_t *pg = vg; \
+ while (mem_off + sizeof(TYPEM) <= mem_max) { \
+ TYPEM val = 0; \
+ if (likely((pg[reg_off >> 6] >> (reg_off & 63)) & 1)) { \
+ val = HOST(host + mem_off); \
+ } \
+ *(TYPEE *)(vd + H(reg_off)) = val; \
+ mem_off += sizeof(TYPEM), reg_off += sizeof(TYPEE); \
+ } \
+ return mem_off; \
}
+#ifdef CONFIG_SOFTMMU
+#define DO_LD_TLB(NAME, H, TYPEE, TYPEM, HOST, MOEND, TLB) \
+static void sve_##NAME##_tlb(CPUARMState *env, void *vd, intptr_t reg_off, \
+ target_ulong addr, int mmu_idx, uintptr_t ra) \
+{ \
+ TCGMemOpIdx oi = make_memop_idx(ctz32(sizeof(TYPEM)) | MOEND, mmu_idx); \
+ TYPEM val = TLB(env, addr, oi, ra); \
+ *(TYPEE *)(vd + H(reg_off)) = val; \
+}
+#else
+#define DO_LD_TLB(NAME, H, TYPEE, TYPEM, HOST, MOEND, TLB) \
+static void sve_##NAME##_tlb(CPUARMState *env, void *vd, intptr_t reg_off, \
+ target_ulong addr, int mmu_idx, uintptr_t ra) \
+{ \
+ TYPEM val = HOST(g2h(addr)); \
+ *(TYPEE *)(vd + H(reg_off)) = val; \
+}
+#endif
+
+DO_LD_TLB(ld1bb, H1, uint8_t, uint8_t, ldub_p, 0, helper_ret_ldub_mmu)
+
+#define DO_LD_PRIM_1(NAME, H, TE, TM) \
+ DO_LD_HOST(NAME, H, TE, TM, ldub_p) \
+ DO_LD_TLB(NAME, H, TE, TM, ldub_p, 0, helper_ret_ldub_mmu)
+
+DO_LD_PRIM_1(ld1bhu, H1_2, uint16_t, uint8_t)
+DO_LD_PRIM_1(ld1bhs, H1_2, uint16_t, int8_t)
+DO_LD_PRIM_1(ld1bsu, H1_4, uint32_t, uint8_t)
+DO_LD_PRIM_1(ld1bss, H1_4, uint32_t, int8_t)
+DO_LD_PRIM_1(ld1bdu, , uint64_t, uint8_t)
+DO_LD_PRIM_1(ld1bds, , uint64_t, int8_t)
+
+#define DO_LD_PRIM_2(NAME, end, MOEND, H, TE, TM, PH, PT) \
+ DO_LD_HOST(NAME##_##end, H, TE, TM, PH##_##end##_p) \
+ DO_LD_TLB(NAME##_##end, H, TE, TM, PH##_##end##_p, \
+ MOEND, helper_##end##_##PT##_mmu)
+
+DO_LD_PRIM_2(ld1hh, le, MO_LE, H1_2, uint16_t, uint16_t, lduw, lduw)
+DO_LD_PRIM_2(ld1hsu, le, MO_LE, H1_4, uint32_t, uint16_t, lduw, lduw)
+DO_LD_PRIM_2(ld1hss, le, MO_LE, H1_4, uint32_t, int16_t, lduw, lduw)
+DO_LD_PRIM_2(ld1hdu, le, MO_LE, , uint64_t, uint16_t, lduw, lduw)
+DO_LD_PRIM_2(ld1hds, le, MO_LE, , uint64_t, int16_t, lduw, lduw)
+
+DO_LD_PRIM_2(ld1ss, le, MO_LE, H1_4, uint32_t, uint32_t, ldl, ldul)
+DO_LD_PRIM_2(ld1sdu, le, MO_LE, , uint64_t, uint32_t, ldl, ldul)
+DO_LD_PRIM_2(ld1sds, le, MO_LE, , uint64_t, int32_t, ldl, ldul)
+
+DO_LD_PRIM_2(ld1dd, le, MO_LE, , uint64_t, uint64_t, ldq, ldq)
+
+DO_LD_PRIM_2(ld1hh, be, MO_BE, H1_2, uint16_t, uint16_t, lduw, lduw)
+DO_LD_PRIM_2(ld1hsu, be, MO_BE, H1_4, uint32_t, uint16_t, lduw, lduw)
+DO_LD_PRIM_2(ld1hss, be, MO_BE, H1_4, uint32_t, int16_t, lduw, lduw)
+DO_LD_PRIM_2(ld1hdu, be, MO_BE, , uint64_t, uint16_t, lduw, lduw)
+DO_LD_PRIM_2(ld1hds, be, MO_BE, , uint64_t, int16_t, lduw, lduw)
+
+DO_LD_PRIM_2(ld1ss, be, MO_BE, H1_4, uint32_t, uint32_t, ldl, ldul)
+DO_LD_PRIM_2(ld1sdu, be, MO_BE, , uint64_t, uint32_t, ldl, ldul)
+DO_LD_PRIM_2(ld1sds, be, MO_BE, , uint64_t, int32_t, ldl, ldul)
+
+DO_LD_PRIM_2(ld1dd, be, MO_BE, , uint64_t, uint64_t, ldq, ldq)
+
+#undef DO_LD_TLB
+#undef DO_LD_HOST
+#undef DO_LD_PRIM_1
+#undef DO_LD_PRIM_2
+
+/*
+ * Special case contiguous loads of bytes to accelerate strings.
+ *
+ * The assumption is that the governing predicate will be mostly true.
+ * When it is not all true, it has been set by whilelo and so has a
+ * block of true elements followed by a block of false elements.
+ * Thus anything we can do to handle as many bytes as possible in one
+ * step will pay dividends.
+ *
+ * Because of how vector registers are represented in CPUARMState,
+ * each block of 8 can be read with a little-endian load to be stored
+ * into the vector register in host-endian order.
+ *
+ * TODO: For LE host and LE guest (by far the most common combination),
+ * the only difference for other non-extending loads is the controlling
+ * predicate. Even for other combinations, it might be fastest to use
+ * this primitive to block load all of the data and then reorder the
+ * bytes afterward.
+ */
+
+/* For user-only, conditionally load and mask from HOST, returning 0
+ * if the predicate is false. This is required because, as described
+ * above, we have not fully validated the page, and faults are not
+ * permitted when the predicate is false.
+ * For softmmu, we never arrive here with invalid host memory; just mask.
+ */
+static inline uint64_t ldq_le_pred_b(uint8_t pg, void *host)
+{
+#ifdef CONFIG_USER_ONLY
+ if (pg == 0) {
+ return 0;
+ }
+#endif
+ return ldq_le_p(host) & expand_pred_b(pg);
+}
+
+static inline uint8_t ldub_pred(uint8_t pg, void *host)
+{
+#ifdef CONFIG_USER_ONLY
+ return pg & 1 ? ldub_p(host) : 0;
+#else
+ return ldub_p(host) & -(pg & 1);
+#endif
+}
+
+static intptr_t sve_ld1bb_host(void *vd, void *vg, void *host,
+ intptr_t off, const intptr_t max)
+{
+ uint64_t *d = vd;
+ uint8_t *g = vg;
+
+ /* OFF and MAX may be misaligned, but the most common case is
+ * an entire vector register: OFF == 0, MAX % 16 == 0.
+ */
+ if (likely(off + 8 <= max)) {
+ const intptr_t max_div_8 = max >> 3;
+ intptr_t off_div_8 = off >> 3;
+ uint64_t data;
+
+ if (unlikely(off & 63)) {
+ /* Align for a loop-of-8. We know from the range check
+ * above that we have enough remaining to load 8 bytes.
+ */
+ if (unlikely(off & 7)) {
+ int off_7 = off & 7;
+ uint8_t pg = g[H1(off_div_8)] >> off_7;
+
+ off_7 *= 8;
+ data = ldq_le_pred_b(pg, host + off);
+ data = deposit64(d[off_div_8], off_7, 64 - off_7, data);
+ d[off_div_8] = data;
+
+ off_div_8 += 1;
+ }
+
+ /* If there are not sufficient bytes to align for 64
+ * and also execute that loop at least once, skip to tail.
+ */
+ if (ROUND_UP(off_div_8, 8) + 8 > max_div_8) {
+ goto skip_64;
+ }
+
+ /* Align for the loop-of-64. */
+ if (unlikely(off_div_8 & 7)) {
+ do {
+ uint8_t pg = g[off_div_8];
+ data = ldq_le_pred_b(pg, host + off_div_8 * 8);
+ d[off_div_8] = data;
+ } while (++off_div_8 & 7);
+ }
+ }
+
+ /* While we have blocks of 64 remaining, we can perform tests
+ * against large blocks of predicates at once.
+ */
+ for (; off_div_8 + 8 <= max_div_8; off_div_8 += 8) {
+ uint64_t pg = *(uint64_t *)(g + off_div_8);
+ if (likely(pg == -1ULL)) {
+#ifndef HOST_WORDS_BIGENDIAN
+ memcpy(d + off_div_8, host + off_div_8 * 8, 64);
+#else
+ intptr_t j;
+ for (j = 0; j < 8; j++) {
+ data = ldq_le_p(host + (off_div_8 + j) * 8);
+ d[off_div_8 + j] = data;
+ }
+#endif
+ } else if (pg == 0) {
+ memset(d + off_div_8, 0, 64);
+ } else {
+ intptr_t j;
+ for (j = 0; j < 8; j++) {
+ data = ldq_le_pred_b(pg >> (j * 8),
+ host + (off_div_8 + j) * 8);
+ d[off_div_8 + j] = data;
+ }
+ }
+ }
+
+ skip_64:
+ /* Final tail or a copy smaller than 64 bytes. */
+ for (; off_div_8 < max_div_8; off_div_8++) {
+ uint8_t pg = g[H1(off_div_8)];
+ data = ldq_le_pred_b(pg, host + off_div_8 * 8);
+ d[off_div_8] = data;
+ }
+
+ /* Restore using OFF. */
+ off = off_div_8 * 8;
+ }
+
+ /* Final tail or a really small copy. */
+ if (unlikely(off < max)) {
+ do {
+ uint8_t pg = g[H1(off >> 3)] >> (off & 7);
+ ((uint8_t *)vd)[H1(off)] = ldub_pred(pg, host + off);
+ } while (++off < max);
+ }
+
+ return max;
+}
+
+/* Skip through a sequence of inactive elements in the guarding predicate VG,
+ * beginning at REG_OFF bounded by REG_MAX. Return the offset of the first
+ * active element >= REG_OFF, or REG_MAX if there were no active elements.
+ */
+static intptr_t find_next_active(uint64_t *vg, intptr_t reg_off,
+ intptr_t reg_max, int esz)
+{
+ uint64_t pg_mask = pred_esz_masks[esz];
+ uint64_t pg = (vg[reg_off >> 6] & pg_mask) >> (reg_off & 63);
+
+ /* In normal usage, the first element is active. */
+ if (likely(pg & 1)) {
+ return reg_off;
+ }
+
+ if (pg == 0) {
+ reg_off &= -64;
+ do {
+ reg_off += 64;
+ if (unlikely(reg_off >= reg_max)) {
+ /* The entire predicate was false. */
+ return reg_max;
+ }
+ pg = vg[reg_off >> 6] & pg_mask;
+ } while (pg == 0);
+ }
+ reg_off += ctz64(pg);
+
+ /* We should never see an out of range predicate bit set. */
+ tcg_debug_assert(reg_off < reg_max);
+ return reg_off;
+}
+
+/* Return the maximum offset <= MEM_MAX which is still within the page
+ * referenced by BASE+MEM_OFF.
+ */
+static intptr_t max_for_page(target_ulong base, intptr_t mem_off,
+ intptr_t mem_max)
+{
+ target_ulong addr = base + mem_off;
+ intptr_t split = -(intptr_t)(addr | TARGET_PAGE_MASK);
+ return MIN(split, mem_max - mem_off) + mem_off;
+}
+
+static inline void set_helper_retaddr(uintptr_t ra)
+{
+#ifdef CONFIG_USER_ONLY
+ helper_retaddr = ra;
+#endif
+}
+
+static inline bool test_host_page(void *host)
+{
+#ifdef CONFIG_USER_ONLY
+ return true;
+#else
+ return likely(host != NULL);
+#endif
+}
+
+/*
+ * Common helper for all contiguous one-register predicated loads.
+ */
+static void sve_ld1_r(CPUARMState *env, void *vg, const target_ulong addr,
+ uint32_t desc, const uintptr_t retaddr,
+ const int esz, const int msz,
+ sve_ld1_host_fn *host_fn,
+ sve_ld1_tlb_fn *tlb_fn)
+{
+ void *vd = &env->vfp.zregs[simd_data(desc)];
+ const int diffsz = esz - msz;
+ const intptr_t reg_max = simd_oprsz(desc);
+ const intptr_t mem_max = reg_max >> diffsz;
+ const int mmu_idx = cpu_mmu_index(env, false);
+ ARMVectorReg scratch;
+ void *host, *result;
+ intptr_t split;
+
+ set_helper_retaddr(retaddr);
+
+ host = tlb_vaddr_to_host(env, addr, MMU_DATA_LOAD, mmu_idx);
+ if (test_host_page(host)) {
+ split = max_for_page(addr, 0, mem_max);
+ if (likely(split == mem_max)) {
+ /* The load is entirely within a valid page. For softmmu,
+ * no faults. For user-only, if the first byte does not
+ * fault then none of them will fault, so Vd will never be
+ * partially modified.
+ */
+ host_fn(vd, vg, host, 0, mem_max);
+ set_helper_retaddr(0);
+ return;
+ }
+ }
+
+ /* Perform the predicated read into a temporary, thus ensuring
+ * if the load of the last element faults, Vd is not modified.
+ */
+ result = &scratch;
+#ifdef CONFIG_USER_ONLY
+ host_fn(result, vg, host, 0, mem_max);
+#else
+ memset(result, 0, reg_max);
+ for (intptr_t reg_off = find_next_active(vg, 0, reg_max, esz);
+ reg_off < reg_max;
+ reg_off = find_next_active(vg, reg_off, reg_max, esz)) {
+ intptr_t mem_off = reg_off >> diffsz;
+
+ split = max_for_page(addr, mem_off, mem_max);
+ if (msz == 0 || split - mem_off >= (1 << msz)) {
+ /* At least one whole element on this page. */
+ host = tlb_vaddr_to_host(env, addr + mem_off,
+ MMU_DATA_LOAD, mmu_idx);
+ if (host) {
+ mem_off = host_fn(result, vg, host - mem_off, mem_off, split);
+ reg_off = mem_off << diffsz;
+ continue;
+ }
+ }
+
+ /* Perform one normal read. This may fault, longjmping out to the
+ * main loop in order to raise an exception. It may succeed, and
+ * as a side-effect load the TLB entry for the next round. Finally,
+ * in the extremely unlikely case we're performing this operation
+ * on I/O memory, it may succeed but not bring in the TLB entry.
+ * But even then we have still made forward progress.
+ */
+ tlb_fn(env, result, reg_off, addr + mem_off, mmu_idx, retaddr);
+ reg_off += 1 << esz;
+ }
+#endif
+
+ set_helper_retaddr(0);
+ memcpy(vd, result, reg_max);
+}
+
+#define DO_LD1_1(NAME, ESZ) \
+void HELPER(sve_##NAME##_r)(CPUARMState *env, void *vg, \
+ target_ulong addr, uint32_t desc) \
+{ \
+ sve_ld1_r(env, vg, addr, desc, GETPC(), ESZ, 0, \
+ sve_##NAME##_host, sve_##NAME##_tlb); \
+}
+
+/* TODO: Propagate the endian check back to the translator. */
+#define DO_LD1_2(NAME, ESZ, MSZ) \
+void HELPER(sve_##NAME##_r)(CPUARMState *env, void *vg, \
+ target_ulong addr, uint32_t desc) \
+{ \
+ if (arm_cpu_data_is_big_endian(env)) { \
+ sve_ld1_r(env, vg, addr, desc, GETPC(), ESZ, MSZ, \
+ sve_##NAME##_be_host, sve_##NAME##_be_tlb); \
+ } else { \
+ sve_ld1_r(env, vg, addr, desc, GETPC(), ESZ, MSZ, \
+ sve_##NAME##_le_host, sve_##NAME##_le_tlb); \
+ } \
+}
+
+DO_LD1_1(ld1bb, 0)
+DO_LD1_1(ld1bhu, 1)
+DO_LD1_1(ld1bhs, 1)
+DO_LD1_1(ld1bsu, 2)
+DO_LD1_1(ld1bss, 2)
+DO_LD1_1(ld1bdu, 3)
+DO_LD1_1(ld1bds, 3)
+
+DO_LD1_2(ld1hh, 1, 1)
+DO_LD1_2(ld1hsu, 2, 1)
+DO_LD1_2(ld1hss, 2, 1)
+DO_LD1_2(ld1hdu, 3, 1)
+DO_LD1_2(ld1hds, 3, 1)
+
+DO_LD1_2(ld1ss, 2, 2)
+DO_LD1_2(ld1sdu, 3, 2)
+DO_LD1_2(ld1sds, 3, 2)
+
+DO_LD1_2(ld1dd, 3, 3)
+
+#undef DO_LD1_1
+#undef DO_LD1_2
+
#define DO_LD2(NAME, FN, TYPEE, TYPEM, H) \
void HELPER(NAME)(CPUARMState *env, void *vg, \
target_ulong addr, uint32_t desc) \
@@ -4037,52 +4482,40 @@ void HELPER(NAME)(CPUARMState *env, void *vg, \
} \
}
-DO_LD1(sve_ld1bhu_r, cpu_ldub_data_ra, uint16_t, uint8_t, H1_2)
-DO_LD1(sve_ld1bhs_r, cpu_ldsb_data_ra, uint16_t, int8_t, H1_2)
-DO_LD1(sve_ld1bsu_r, cpu_ldub_data_ra, uint32_t, uint8_t, H1_4)
-DO_LD1(sve_ld1bss_r, cpu_ldsb_data_ra, uint32_t, int8_t, H1_4)
-DO_LD1(sve_ld1bdu_r, cpu_ldub_data_ra, uint64_t, uint8_t, )
-DO_LD1(sve_ld1bds_r, cpu_ldsb_data_ra, uint64_t, int8_t, )
-
-DO_LD1(sve_ld1hsu_r, cpu_lduw_data_ra, uint32_t, uint16_t, H1_4)
-DO_LD1(sve_ld1hss_r, cpu_ldsw_data_ra, uint32_t, int16_t, H1_4)
-DO_LD1(sve_ld1hdu_r, cpu_lduw_data_ra, uint64_t, uint16_t, )
-DO_LD1(sve_ld1hds_r, cpu_ldsw_data_ra, uint64_t, int16_t, )
-
-DO_LD1(sve_ld1sdu_r, cpu_ldl_data_ra, uint64_t, uint32_t, )
-DO_LD1(sve_ld1sds_r, cpu_ldl_data_ra, uint64_t, int32_t, )
-
-DO_LD1(sve_ld1bb_r, cpu_ldub_data_ra, uint8_t, uint8_t, H1)
DO_LD2(sve_ld2bb_r, cpu_ldub_data_ra, uint8_t, uint8_t, H1)
DO_LD3(sve_ld3bb_r, cpu_ldub_data_ra, uint8_t, uint8_t, H1)
DO_LD4(sve_ld4bb_r, cpu_ldub_data_ra, uint8_t, uint8_t, H1)
-DO_LD1(sve_ld1hh_r, cpu_lduw_data_ra, uint16_t, uint16_t, H1_2)
DO_LD2(sve_ld2hh_r, cpu_lduw_data_ra, uint16_t, uint16_t, H1_2)
DO_LD3(sve_ld3hh_r, cpu_lduw_data_ra, uint16_t, uint16_t, H1_2)
DO_LD4(sve_ld4hh_r, cpu_lduw_data_ra, uint16_t, uint16_t, H1_2)
-DO_LD1(sve_ld1ss_r, cpu_ldl_data_ra, uint32_t, uint32_t, H1_4)
DO_LD2(sve_ld2ss_r, cpu_ldl_data_ra, uint32_t, uint32_t, H1_4)
DO_LD3(sve_ld3ss_r, cpu_ldl_data_ra, uint32_t, uint32_t, H1_4)
DO_LD4(sve_ld4ss_r, cpu_ldl_data_ra, uint32_t, uint32_t, H1_4)
-DO_LD1(sve_ld1dd_r, cpu_ldq_data_ra, uint64_t, uint64_t, )
DO_LD2(sve_ld2dd_r, cpu_ldq_data_ra, uint64_t, uint64_t, )
DO_LD3(sve_ld3dd_r, cpu_ldq_data_ra, uint64_t, uint64_t, )
DO_LD4(sve_ld4dd_r, cpu_ldq_data_ra, uint64_t, uint64_t, )
-#undef DO_LD1
#undef DO_LD2
#undef DO_LD3
#undef DO_LD4
/*
* Load contiguous data, first-fault and no-fault.
+ *
+ * For user-only, one could argue that we should hold the mmap_lock during
+ * the operation so that there is no race between page_check_range and the
+ * load operation. However, unmapping pages out from under the operating
+ * thread is extraordinarily unlikely. This theoretical race condition also affects
+ * linux-user/ in its get_user/put_user macros.
+ *
+ * TODO: Construct some helpers, written in assembly, that interact with
+ * handle_cpu_signal to produce memory ops which can properly report errors
+ * without racing.
*/
-#ifdef CONFIG_USER_ONLY
-
/* Fault on byte I. All bits in FFR from I are cleared. The vector
* result from I is CONSTRAINED UNPREDICTABLE; we choose the MERGE
* option, which leaves subsequent data unchanged.
@@ -4092,147 +4525,225 @@ static void record_fault(CPUARMState *env, uintptr_t i, uintptr_t oprsz)
uint64_t *ffr = env->vfp.pregs[FFR_PRED_NUM].p;
if (i & 63) {
- ffr[i / 64] &= MAKE_64BIT_MASK(0, i & 63);
+ ffr[i >> 6] &= MAKE_64BIT_MASK(0, i & 63);
i = ROUND_UP(i, 64);
}
for (; i < oprsz; i += 64) {
- ffr[i / 64] = 0;
+ ffr[i >> 6] = 0;
}
}
-/* Hold the mmap lock during the operation so that there is no race
- * between page_check_range and the load operation. We expect the
- * usual case to have no faults at all, so we check the whole range
- * first and if successful defer to the normal load operation.
- *
- * TODO: Change mmap_lock to a rwlock so that multiple readers
- * can run simultaneously. This will probably help other uses
- * within QEMU as well.
+/*
+ * Common helper for all contiguous first-fault loads.
*/
-#define DO_LDFF1(PART, FN, TYPEE, TYPEM, H) \
-static void do_sve_ldff1##PART(CPUARMState *env, void *vd, void *vg, \
- target_ulong addr, intptr_t oprsz, \
- bool first, uintptr_t ra) \
-{ \
- intptr_t i = 0; \
- do { \
- uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3)); \
- do { \
- TYPEM m = 0; \
- if (pg & 1) { \
- if (!first && \
- unlikely(page_check_range(addr, sizeof(TYPEM), \
- PAGE_READ))) { \
- record_fault(env, i, oprsz); \
- return; \
- } \
- m = FN(env, addr, ra); \
- first = false; \
- } \
- *(TYPEE *)(vd + H(i)) = m; \
- i += sizeof(TYPEE), pg >>= sizeof(TYPEE); \
- addr += sizeof(TYPEM); \
- } while (i & 15); \
- } while (i < oprsz); \
-} \
-void HELPER(sve_ldff1##PART)(CPUARMState *env, void *vg, \
- target_ulong addr, uint32_t desc) \
-{ \
- intptr_t oprsz = simd_oprsz(desc); \
- unsigned rd = simd_data(desc); \
- void *vd = &env->vfp.zregs[rd]; \
- mmap_lock(); \
- if (likely(page_check_range(addr, oprsz, PAGE_READ) == 0)) { \
- do_sve_ld1##PART(env, vd, vg, addr, oprsz, GETPC()); \
- } else { \
- do_sve_ldff1##PART(env, vd, vg, addr, oprsz, true, GETPC()); \
- } \
- mmap_unlock(); \
-}
+static void sve_ldff1_r(CPUARMState *env, void *vg, const target_ulong addr,
+ uint32_t desc, const uintptr_t retaddr,
+ const int esz, const int msz,
+ sve_ld1_host_fn *host_fn,
+ sve_ld1_tlb_fn *tlb_fn)
+{
+ void *vd = &env->vfp.zregs[simd_data(desc)];
+ const int diffsz = esz - msz;
+ const intptr_t reg_max = simd_oprsz(desc);
+ const intptr_t mem_max = reg_max >> diffsz;
+ const int mmu_idx = cpu_mmu_index(env, false);
+ intptr_t split, reg_off, mem_off;
+ void *host;
-/* No-fault loads are like first-fault loads without the
- * first faulting special case.
- */
-#define DO_LDNF1(PART) \
-void HELPER(sve_ldnf1##PART)(CPUARMState *env, void *vg, \
- target_ulong addr, uint32_t desc) \
-{ \
- intptr_t oprsz = simd_oprsz(desc); \
- unsigned rd = simd_data(desc); \
- void *vd = &env->vfp.zregs[rd]; \
- mmap_lock(); \
- if (likely(page_check_range(addr, oprsz, PAGE_READ) == 0)) { \
- do_sve_ld1##PART(env, vd, vg, addr, oprsz, GETPC()); \
- } else { \
- do_sve_ldff1##PART(env, vd, vg, addr, oprsz, false, GETPC()); \
- } \
- mmap_unlock(); \
-}
+ set_helper_retaddr(retaddr);
+ split = max_for_page(addr, 0, mem_max);
+ if (likely(split == mem_max)) {
+ /* The entire operation is within one page. */
+ host = tlb_vaddr_to_host(env, addr, MMU_DATA_LOAD, mmu_idx);
+ if (test_host_page(host)) {
+ mem_off = host_fn(vd, vg, host, 0, mem_max);
+ tcg_debug_assert(mem_off == mem_max);
+ set_helper_retaddr(0);
+ return;
+ }
+ }
+
+ /* Skip to the first true predicate. */
+ reg_off = find_next_active(vg, 0, reg_max, esz);
+ if (unlikely(reg_off == reg_max)) {
+ /* The entire predicate was false; no load occurs. */
+ set_helper_retaddr(0);
+ memset(vd, 0, reg_max);
+ return;
+ }
+ mem_off = reg_off >> diffsz;
+
+#ifdef CONFIG_USER_ONLY
+ /* The page(s) containing this first element at ADDR+MEM_OFF must
+ * be valid. Considering that this first element may be misaligned
+ * and cross a page boundary itself, take the rest of the page from
+ * the last byte of the element.
+ */
+ split = max_for_page(addr, mem_off + (1 << msz) - 1, mem_max);
+ mem_off = host_fn(vd, vg, g2h(addr), mem_off, split);
+
+ /* After any fault, zero any leading predicated false elts. */
+ swap_memzero(vd, reg_off);
+ reg_off = mem_off << diffsz;
#else
+ /* Perform one normal read, which will fault or not.
+ * But it is likely to bring the page into the tlb.
+ */
+ tlb_fn(env, vd, reg_off, addr + mem_off, mmu_idx, retaddr);
-/* TODO: System mode is not yet supported.
- * This would probably use tlb_vaddr_to_host.
- */
-#define DO_LDFF1(PART, FN, TYPEE, TYPEM, H) \
-void HELPER(sve_ldff1##PART)(CPUARMState *env, void *vg, \
- target_ulong addr, uint32_t desc) \
-{ \
- g_assert_not_reached(); \
-}
-
-#define DO_LDNF1(PART) \
-void HELPER(sve_ldnf1##PART)(CPUARMState *env, void *vg, \
- target_ulong addr, uint32_t desc) \
-{ \
- g_assert_not_reached(); \
-}
+ /* After any fault, zero any leading predicated false elts. */
+ swap_memzero(vd, reg_off);
+ mem_off += 1 << msz;
+ reg_off += 1 << esz;
+ /* Try again to read the balance of the page. */
+ split = max_for_page(addr, mem_off - 1, mem_max);
+ if (split >= (1 << msz)) {
+ host = tlb_vaddr_to_host(env, addr + mem_off, MMU_DATA_LOAD, mmu_idx);
+ if (host) {
+ mem_off = host_fn(vd, vg, host - mem_off, mem_off, split);
+ reg_off = mem_off << diffsz;
+ }
+ }
#endif
-DO_LDFF1(bb_r, cpu_ldub_data_ra, uint8_t, uint8_t, H1)
-DO_LDFF1(bhu_r, cpu_ldub_data_ra, uint16_t, uint8_t, H1_2)
-DO_LDFF1(bhs_r, cpu_ldsb_data_ra, uint16_t, int8_t, H1_2)
-DO_LDFF1(bsu_r, cpu_ldub_data_ra, uint32_t, uint8_t, H1_4)
-DO_LDFF1(bss_r, cpu_ldsb_data_ra, uint32_t, int8_t, H1_4)
-DO_LDFF1(bdu_r, cpu_ldub_data_ra, uint64_t, uint8_t, )
-DO_LDFF1(bds_r, cpu_ldsb_data_ra, uint64_t, int8_t, )
+ set_helper_retaddr(0);
+ record_fault(env, reg_off, reg_max);
+}
-DO_LDFF1(hh_r, cpu_lduw_data_ra, uint16_t, uint16_t, H1_2)
-DO_LDFF1(hsu_r, cpu_lduw_data_ra, uint32_t, uint16_t, H1_4)
-DO_LDFF1(hss_r, cpu_ldsw_data_ra, uint32_t, int8_t, H1_4)
-DO_LDFF1(hdu_r, cpu_lduw_data_ra, uint64_t, uint16_t, )
-DO_LDFF1(hds_r, cpu_ldsw_data_ra, uint64_t, int16_t, )
+/*
+ * Common helper for all contiguous no-fault loads.
+ */
+static void sve_ldnf1_r(CPUARMState *env, void *vg, const target_ulong addr,
+ uint32_t desc, const int esz, const int msz,
+ sve_ld1_host_fn *host_fn)
+{
+ void *vd = &env->vfp.zregs[simd_data(desc)];
+ const int diffsz = esz - msz;
+ const intptr_t reg_max = simd_oprsz(desc);
+ const intptr_t mem_max = reg_max >> diffsz;
+ intptr_t split, reg_off, mem_off;
+ void *host;
-DO_LDFF1(ss_r, cpu_ldl_data_ra, uint32_t, uint32_t, H1_4)
-DO_LDFF1(sdu_r, cpu_ldl_data_ra, uint64_t, uint32_t, )
-DO_LDFF1(sds_r, cpu_ldl_data_ra, uint64_t, int32_t, )
+#ifdef CONFIG_USER_ONLY
+ /* Do not set helper_retaddr as there should be no fault. */
+ host = g2h(addr);
+ if (likely(page_check_range(addr, mem_max, PAGE_READ) == 0)) {
+ /* The entire operation is valid. */
+ host_fn(vd, vg, host, 0, mem_max);
+ return;
+ }
+#else
+ const int mmu_idx = extract32(desc, SIMD_DATA_SHIFT, 4);
+ /* Unless we can load the entire vector from the same page,
+ * we need to search for the first active element.
+ */
+ split = max_for_page(addr, 0, mem_max);
+ if (likely(split == mem_max)) {
+ host = tlb_vaddr_to_host(env, addr, MMU_DATA_LOAD, mmu_idx);
+ if (host) {
+ host_fn(vd, vg, host, 0, mem_max);
+ return;
+ }
+ }
+#endif
-DO_LDFF1(dd_r, cpu_ldq_data_ra, uint64_t, uint64_t, )
+ /* There will be no fault, so we may modify in advance. */
+ memset(vd, 0, reg_max);
-#undef DO_LDFF1
+ /* Skip to the first true predicate. */
+ reg_off = find_next_active(vg, 0, reg_max, esz);
+ if (unlikely(reg_off == reg_max)) {
+ /* The entire predicate was false; no load occurs. */
+ return;
+ }
+ mem_off = reg_off >> diffsz;
-DO_LDNF1(bb_r)
-DO_LDNF1(bhu_r)
-DO_LDNF1(bhs_r)
-DO_LDNF1(bsu_r)
-DO_LDNF1(bss_r)
-DO_LDNF1(bdu_r)
-DO_LDNF1(bds_r)
+#ifdef CONFIG_USER_ONLY
+ if (page_check_range(addr + mem_off, 1 << msz, PAGE_READ) == 0) {
+ /* At least one load is valid; take the rest of the page. */
+ split = max_for_page(addr, mem_off + (1 << msz) - 1, mem_max);
+ mem_off = host_fn(vd, vg, host, mem_off, split);
+ reg_off = mem_off << diffsz;
+ }
+#else
+ /* If the address is not in the TLB, we have no way to bring the
+ * entry into the TLB without also risking a fault. Note that
+ * the corollary is that we never load from an address not in RAM.
+ * ??? This last may be out of spec.
+ */
+ host = tlb_vaddr_to_host(env, addr + mem_off, MMU_DATA_LOAD, mmu_idx);
+ split = max_for_page(addr, mem_off, mem_max);
+ if (host && split >= (1 << msz)) {
+ mem_off = host_fn(vd, vg, host - mem_off, mem_off, split);
+ reg_off = mem_off << diffsz;
+ }
+#endif
-DO_LDNF1(hh_r)
-DO_LDNF1(hsu_r)
-DO_LDNF1(hss_r)
-DO_LDNF1(hdu_r)
-DO_LDNF1(hds_r)
+ record_fault(env, reg_off, reg_max);
+}
-DO_LDNF1(ss_r)
-DO_LDNF1(sdu_r)
-DO_LDNF1(sds_r)
+#define DO_LDFF1_LDNF1_1(PART, ESZ) \
+void HELPER(sve_ldff1##PART##_r)(CPUARMState *env, void *vg, \
+ target_ulong addr, uint32_t desc) \
+{ \
+ sve_ldff1_r(env, vg, addr, desc, GETPC(), ESZ, 0, \
+ sve_ld1##PART##_host, sve_ld1##PART##_tlb); \
+} \
+void HELPER(sve_ldnf1##PART##_r)(CPUARMState *env, void *vg, \
+ target_ulong addr, uint32_t desc) \
+{ \
+ sve_ldnf1_r(env, vg, addr, desc, ESZ, 0, sve_ld1##PART##_host); \
+}
-DO_LDNF1(dd_r)
+/* TODO: Propagate the endian check back to the translator. */
+#define DO_LDFF1_LDNF1_2(PART, ESZ, MSZ) \
+void HELPER(sve_ldff1##PART##_r)(CPUARMState *env, void *vg, \
+ target_ulong addr, uint32_t desc) \
+{ \
+ if (arm_cpu_data_is_big_endian(env)) { \
+ sve_ldff1_r(env, vg, addr, desc, GETPC(), ESZ, MSZ, \
+ sve_ld1##PART##_be_host, sve_ld1##PART##_be_tlb); \
+ } else { \
+ sve_ldff1_r(env, vg, addr, desc, GETPC(), ESZ, MSZ, \
+ sve_ld1##PART##_le_host, sve_ld1##PART##_le_tlb); \
+ } \
+} \
+void HELPER(sve_ldnf1##PART##_r)(CPUARMState *env, void *vg, \
+ target_ulong addr, uint32_t desc) \
+{ \
+ if (arm_cpu_data_is_big_endian(env)) { \
+ sve_ldnf1_r(env, vg, addr, desc, ESZ, MSZ, \
+ sve_ld1##PART##_be_host); \
+ } else { \
+ sve_ldnf1_r(env, vg, addr, desc, ESZ, MSZ, \
+ sve_ld1##PART##_le_host); \
+ } \
+}
-#undef DO_LDNF1
+DO_LDFF1_LDNF1_1(bb, 0)
+DO_LDFF1_LDNF1_1(bhu, 1)
+DO_LDFF1_LDNF1_1(bhs, 1)
+DO_LDFF1_LDNF1_1(bsu, 2)
+DO_LDFF1_LDNF1_1(bss, 2)
+DO_LDFF1_LDNF1_1(bdu, 3)
+DO_LDFF1_LDNF1_1(bds, 3)
+
+DO_LDFF1_LDNF1_2(hh, 1, 1)
+DO_LDFF1_LDNF1_2(hsu, 2, 1)
+DO_LDFF1_LDNF1_2(hss, 2, 1)
+DO_LDFF1_LDNF1_2(hdu, 3, 1)
+DO_LDFF1_LDNF1_2(hds, 3, 1)
+
+DO_LDFF1_LDNF1_2(ss, 2, 2)
+DO_LDFF1_LDNF1_2(sdu, 3, 2)
+DO_LDFF1_LDNF1_2(sds, 3, 2)
+
+DO_LDFF1_LDNF1_2(dd, 3, 3)
+
+#undef DO_LDFF1_LDNF1_1
+#undef DO_LDFF1_LDNF1_2
/*
* Store contiguous data, protected by a governing predicate.
--
2.17.1
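The first-fault and no-fault helpers in the patch above decide how much of a vector can be read from a single page before they look up the next one. The following is a minimal sketch of that split computation, inferred only from how max_for_page() is called above. The name sketch_max_for_page and the fixed 4 KiB page size are assumptions made for illustration, not the code from the series.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: a fixed 4 KiB page; QEMU uses the target's page size. */
#define SKETCH_PAGE_SIZE 4096u

/*
 * Hypothetical stand-in for max_for_page(): return the largest offset,
 * no greater than mem_max, such that [mem_off, result) stays on the
 * page that contains base + mem_off.
 */
static intptr_t sketch_max_for_page(uint64_t base, intptr_t mem_off,
                                    intptr_t mem_max)
{
    assert(mem_off >= 0 && mem_off <= mem_max);

    uint64_t addr = base + (uint64_t)mem_off;
    /* Bytes left on the page, counting the byte at addr itself. */
    intptr_t to_page_end =
        (intptr_t)(SKETCH_PAGE_SIZE - (addr & (SKETCH_PAGE_SIZE - 1)));
    intptr_t remaining = mem_max - mem_off;

    return mem_off + (to_page_end < remaining ? to_page_end : remaining);
}
```

With a 4-byte first element at 0x1ffe, probing from offset 0 yields a split of 2, which is less than one element. Probing from the last byte of that element (offset 3) yields the full 32 bytes of the following page, which appears to be why the user-only path passes mem_off + (1 << msz) - 1.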
* [Qemu-devel] [PATCH 13/20] target/arm: Rewrite helper_sve_ld[234]*_r
2018-08-09 4:21 [Qemu-devel] [PATCH 00/20] target/arm: sve system mode patches Richard Henderson
` (11 preceding siblings ...)
2018-08-09 4:21 ` [Qemu-devel] [PATCH 12/20] target/arm: Rewrite helper_sve_ld1*_r using pages Richard Henderson
@ 2018-08-09 4:21 ` Richard Henderson
2018-08-23 16:04 ` Peter Maydell
2018-08-09 4:22 ` [Qemu-devel] [PATCH 14/20] target/arm: Rewrite helper_sve_st[1234]*_r Richard Henderson
` (9 subsequent siblings)
22 siblings, 1 reply; 51+ messages in thread
From: Richard Henderson @ 2018-08-09 4:21 UTC (permalink / raw)
To: qemu-devel; +Cc: laurent.desnogues, peter.maydell, alex.bennee
Use the same *_tlb primitives as we use for ld1. This is not
a significant change, but it does hoist the setting of
helper_retaddr (for linux-user) and the computation of the
current mmu_idx (for softmmu) outside the loop.
This also fixes the endianness problem for softmmu, and moves
the main loop out of a macro and into an inlined function.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
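For readers following the new loop shape, here is a small standalone sketch of the pattern used by sve_ld2_r below: walk the governing predicate, load each active element pair into scratch registers, and write back only after every load has succeeded. The names ld2_sketch and load_elt, the fixed 16-byte vector, the plain byte-array predicate, and the bounds check that stands in for a guest fault are all assumptions for illustration. Host-endian register swizzling (the H macros) is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { VEC_BYTES = 16 };   /* assumption: one fixed vector length */

/*
 * Hypothetical stand-in for a *_tlb primitive.  Returns false where
 * the real primitive would raise a guest exception.
 */
static bool load_elt(uint8_t *vd, size_t reg_off, const uint8_t *mem,
                     size_t mem_len, size_t addr, size_t size)
{
    if (addr + size > mem_len) {
        return false;
    }
    memcpy(vd + reg_off, mem + addr, size);
    return true;
}

/*
 * Two-register interleaved load.  The predicate holds one bit per
 * vector byte; an element is active when the bit for its first byte
 * is set.  On a simulated fault, d1 and d2 are left untouched.
 */
static bool ld2_sketch(uint8_t *d1, uint8_t *d2, const uint8_t *pg,
                       const uint8_t *mem, size_t mem_len,
                       size_t addr, size_t size)
{
    uint8_t scratch[2][VEC_BYTES] = { { 0 } };

    assert(size > 0 && VEC_BYTES % size == 0);

    for (size_t i = 0; i < VEC_BYTES; i += size, addr += 2 * size) {
        if ((pg[i / 8] >> (i % 8)) & 1) {
            if (!load_elt(scratch[0], i, mem, mem_len, addr, size) ||
                !load_elt(scratch[1], i, mem, mem_len, addr + size, size)) {
                return false;
            }
        }
    }

    /* Wait until all loads have succeeded to write back. */
    memcpy(d1, scratch[0], VEC_BYTES);
    memcpy(d2, scratch[1], VEC_BYTES);
    return true;
}
```

Inactive elements come out as zero because the scratch registers start zeroed, and the memory address still advances past them, as in the loop below.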
target/arm/sve_helper.c | 210 ++++++++++++++++++++++------------------
1 file changed, 117 insertions(+), 93 deletions(-)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 4ca9412e20..5cc7de5077 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -4398,109 +4398,133 @@ DO_LD1_2(ld1dd, 3, 3)
#undef DO_LD1_1
#undef DO_LD1_2
-#define DO_LD2(NAME, FN, TYPEE, TYPEM, H) \
-void HELPER(NAME)(CPUARMState *env, void *vg, \
- target_ulong addr, uint32_t desc) \
-{ \
- intptr_t i, oprsz = simd_oprsz(desc); \
- intptr_t ra = GETPC(); \
- unsigned rd = simd_data(desc); \
- void *d1 = &env->vfp.zregs[rd]; \
- void *d2 = &env->vfp.zregs[(rd + 1) & 31]; \
- for (i = 0; i < oprsz; ) { \
- uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3)); \
- do { \
- TYPEM m1 = 0, m2 = 0; \
- if (pg & 1) { \
- m1 = FN(env, addr, ra); \
- m2 = FN(env, addr + sizeof(TYPEM), ra); \
- } \
- *(TYPEE *)(d1 + H(i)) = m1; \
- *(TYPEE *)(d2 + H(i)) = m2; \
- i += sizeof(TYPEE), pg >>= sizeof(TYPEE); \
- addr += 2 * sizeof(TYPEM); \
- } while (i & 15); \
- } \
+/*
+ * Common helpers for all contiguous 2,3,4-register predicated loads.
+ */
+static void sve_ld2_r(CPUARMState *env, void *vg, target_ulong addr,
+ uint32_t desc, int size, uintptr_t ra,
+ sve_ld1_tlb_fn *tlb_fn)
+{
+ const int mmu_idx = cpu_mmu_index(env, false);
+ intptr_t i, oprsz = simd_oprsz(desc);
+ unsigned rd = simd_data(desc);
+ ARMVectorReg scratch[2] = { };
+
+ set_helper_retaddr(ra);
+ for (i = 0; i < oprsz; ) {
+ uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));
+ do {
+ if (pg & 1) {
+ tlb_fn(env, &scratch[0], i, addr, mmu_idx, ra);
+ tlb_fn(env, &scratch[1], i, addr + size, mmu_idx, ra);
+ }
+ i += size, pg >>= size;
+ addr += 2 * size;
+ } while (i & 15);
+ }
+ set_helper_retaddr(0);
+
+ /* Wait until all exceptions have been raised to write back. */
+ memcpy(&env->vfp.zregs[rd], &scratch[0], oprsz);
+ memcpy(&env->vfp.zregs[(rd + 1) & 31], &scratch[1], oprsz);
}
-#define DO_LD3(NAME, FN, TYPEE, TYPEM, H) \
-void HELPER(NAME)(CPUARMState *env, void *vg, \
- target_ulong addr, uint32_t desc) \
-{ \
- intptr_t i, oprsz = simd_oprsz(desc); \
- intptr_t ra = GETPC(); \
- unsigned rd = simd_data(desc); \
- void *d1 = &env->vfp.zregs[rd]; \
- void *d2 = &env->vfp.zregs[(rd + 1) & 31]; \
- void *d3 = &env->vfp.zregs[(rd + 2) & 31]; \
- for (i = 0; i < oprsz; ) { \
- uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3)); \
- do { \
- TYPEM m1 = 0, m2 = 0, m3 = 0; \
- if (pg & 1) { \
- m1 = FN(env, addr, ra); \
- m2 = FN(env, addr + sizeof(TYPEM), ra); \
- m3 = FN(env, addr + 2 * sizeof(TYPEM), ra); \
- } \
- *(TYPEE *)(d1 + H(i)) = m1; \
- *(TYPEE *)(d2 + H(i)) = m2; \
- *(TYPEE *)(d3 + H(i)) = m3; \
- i += sizeof(TYPEE), pg >>= sizeof(TYPEE); \
- addr += 3 * sizeof(TYPEM); \
- } while (i & 15); \
- } \
+static void sve_ld3_r(CPUARMState *env, void *vg, target_ulong addr,
+ uint32_t desc, int size, uintptr_t ra,
+ sve_ld1_tlb_fn *tlb_fn)
+{
+ const int mmu_idx = cpu_mmu_index(env, false);
+ intptr_t i, oprsz = simd_oprsz(desc);
+ unsigned rd = simd_data(desc);
+ ARMVectorReg scratch[3] = { };
+
+ set_helper_retaddr(ra);
+ for (i = 0; i < oprsz; ) {
+ uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));
+ do {
+ if (pg & 1) {
+ tlb_fn(env, &scratch[0], i, addr, mmu_idx, ra);
+ tlb_fn(env, &scratch[1], i, addr + size, mmu_idx, ra);
+ tlb_fn(env, &scratch[2], i, addr + 2 * size, mmu_idx, ra);
+ }
+ i += size, pg >>= size;
+ addr += 3 * size;
+ } while (i & 15);
+ }
+ set_helper_retaddr(0);
+
+ /* Wait until all exceptions have been raised to write back. */
+ memcpy(&env->vfp.zregs[rd], &scratch[0], oprsz);
+ memcpy(&env->vfp.zregs[(rd + 1) & 31], &scratch[1], oprsz);
+ memcpy(&env->vfp.zregs[(rd + 2) & 31], &scratch[2], oprsz);
}
-#define DO_LD4(NAME, FN, TYPEE, TYPEM, H) \
-void HELPER(NAME)(CPUARMState *env, void *vg, \
- target_ulong addr, uint32_t desc) \
-{ \
- intptr_t i, oprsz = simd_oprsz(desc); \
- intptr_t ra = GETPC(); \
- unsigned rd = simd_data(desc); \
- void *d1 = &env->vfp.zregs[rd]; \
- void *d2 = &env->vfp.zregs[(rd + 1) & 31]; \
- void *d3 = &env->vfp.zregs[(rd + 2) & 31]; \
- void *d4 = &env->vfp.zregs[(rd + 3) & 31]; \
- for (i = 0; i < oprsz; ) { \
- uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3)); \
- do { \
- TYPEM m1 = 0, m2 = 0, m3 = 0, m4 = 0; \
- if (pg & 1) { \
- m1 = FN(env, addr, ra); \
- m2 = FN(env, addr + sizeof(TYPEM), ra); \
- m3 = FN(env, addr + 2 * sizeof(TYPEM), ra); \
- m4 = FN(env, addr + 3 * sizeof(TYPEM), ra); \
- } \
- *(TYPEE *)(d1 + H(i)) = m1; \
- *(TYPEE *)(d2 + H(i)) = m2; \
- *(TYPEE *)(d3 + H(i)) = m3; \
- *(TYPEE *)(d4 + H(i)) = m4; \
- i += sizeof(TYPEE), pg >>= sizeof(TYPEE); \
- addr += 4 * sizeof(TYPEM); \
- } while (i & 15); \
- } \
+static void sve_ld4_r(CPUARMState *env, void *vg, target_ulong addr,
+ uint32_t desc, int size, uintptr_t ra,
+ sve_ld1_tlb_fn *tlb_fn)
+{
+ const int mmu_idx = cpu_mmu_index(env, false);
+ intptr_t i, oprsz = simd_oprsz(desc);
+ unsigned rd = simd_data(desc);
+ ARMVectorReg scratch[4] = { };
+
+ set_helper_retaddr(ra);
+ for (i = 0; i < oprsz; ) {
+ uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));
+ do {
+ if (pg & 1) {
+ tlb_fn(env, &scratch[0], i, addr, mmu_idx, ra);
+ tlb_fn(env, &scratch[1], i, addr + size, mmu_idx, ra);
+ tlb_fn(env, &scratch[2], i, addr + 2 * size, mmu_idx, ra);
+ tlb_fn(env, &scratch[3], i, addr + 3 * size, mmu_idx, ra);
+ }
+ i += size, pg >>= size;
+ addr += 4 * size;
+ } while (i & 15);
+ }
+ set_helper_retaddr(0);
+
+ /* Wait until all exceptions have been raised to write back. */
+ memcpy(&env->vfp.zregs[rd], &scratch[0], oprsz);
+ memcpy(&env->vfp.zregs[(rd + 1) & 31], &scratch[1], oprsz);
+ memcpy(&env->vfp.zregs[(rd + 2) & 31], &scratch[2], oprsz);
+ memcpy(&env->vfp.zregs[(rd + 3) & 31], &scratch[3], oprsz);
}
-DO_LD2(sve_ld2bb_r, cpu_ldub_data_ra, uint8_t, uint8_t, H1)
-DO_LD3(sve_ld3bb_r, cpu_ldub_data_ra, uint8_t, uint8_t, H1)
-DO_LD4(sve_ld4bb_r, cpu_ldub_data_ra, uint8_t, uint8_t, H1)
+#define DO_LDN_1(N) \
+void __attribute__((flatten)) HELPER(sve_ld##N##bb_r) \
+ (CPUARMState *env, void *vg, target_ulong addr, uint32_t desc) \
+{ \
+ sve_ld##N##_r(env, vg, addr, desc, 1, GETPC(), sve_ld1bb_tlb); \
+}
-DO_LD2(sve_ld2hh_r, cpu_lduw_data_ra, uint16_t, uint16_t, H1_2)
-DO_LD3(sve_ld3hh_r, cpu_lduw_data_ra, uint16_t, uint16_t, H1_2)
-DO_LD4(sve_ld4hh_r, cpu_lduw_data_ra, uint16_t, uint16_t, H1_2)
+#define DO_LDN_2(N, SUFF, SIZE) \
+void __attribute__((flatten)) HELPER(sve_ld##N##SUFF##_r) \
+ (CPUARMState *env, void *vg, target_ulong addr, uint32_t desc) \
+{ \
+ sve_ld##N##_r(env, vg, addr, desc, SIZE, GETPC(), \
+ arm_cpu_data_is_big_endian(env) \
+ ? sve_ld1##SUFF##_be_tlb : sve_ld1##SUFF##_le_tlb); \
+}
-DO_LD2(sve_ld2ss_r, cpu_ldl_data_ra, uint32_t, uint32_t, H1_4)
-DO_LD3(sve_ld3ss_r, cpu_ldl_data_ra, uint32_t, uint32_t, H1_4)
-DO_LD4(sve_ld4ss_r, cpu_ldl_data_ra, uint32_t, uint32_t, H1_4)
+DO_LDN_1(2)
+DO_LDN_1(3)
+DO_LDN_1(4)
-DO_LD2(sve_ld2dd_r, cpu_ldq_data_ra, uint64_t, uint64_t, )
-DO_LD3(sve_ld3dd_r, cpu_ldq_data_ra, uint64_t, uint64_t, )
-DO_LD4(sve_ld4dd_r, cpu_ldq_data_ra, uint64_t, uint64_t, )
+DO_LDN_2(2, hh, 2)
+DO_LDN_2(3, hh, 2)
+DO_LDN_2(4, hh, 2)
-#undef DO_LD2
-#undef DO_LD3
-#undef DO_LD4
+DO_LDN_2(2, ss, 4)
+DO_LDN_2(3, ss, 4)
+DO_LDN_2(4, ss, 4)
+
+DO_LDN_2(2, dd, 8)
+DO_LDN_2(3, dd, 8)
+DO_LDN_2(4, dd, 8)
+
+#undef DO_LDN_1
+#undef DO_LDN_2
/*
* Load contiguous data, first-fault and no-fault.
--
2.17.1
* [Qemu-devel] [PATCH 14/20] target/arm: Rewrite helper_sve_st[1234]*_r
2018-08-09 4:21 [Qemu-devel] [PATCH 00/20] target/arm: sve system mode patches Richard Henderson
` (12 preceding siblings ...)
2018-08-09 4:21 ` [Qemu-devel] [PATCH 13/20] target/arm: Rewrite helper_sve_ld[234]*_r Richard Henderson
@ 2018-08-09 4:22 ` Richard Henderson
2018-08-23 16:06 ` Peter Maydell
2018-08-09 4:22 ` [Qemu-devel] [PATCH 15/20] target/arm: Split contiguous loads for endianness Richard Henderson
` (8 subsequent siblings)
22 siblings, 1 reply; 51+ messages in thread
From: Richard Henderson @ 2018-08-09 4:22 UTC (permalink / raw)
To: qemu-devel; +Cc: laurent.desnogues, peter.maydell, alex.bennee
This fixes the endianness problem for softmmu, and moves
the main loop out of a macro and into an inlined function.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
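The store helpers below take separate esize and msize arguments because a store may truncate: st1bh, for example, walks 2-byte register elements but writes 1 byte each to memory. A minimal standalone sketch of that stepping follows. The name st1_sketch, the fixed 16-byte vector, the byte-array predicate, and the assumption that both the register and memory are little-endian are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { VEC_BYTES = 16 };   /* assumption: one fixed vector length */

/*
 * Predicated contiguous store.  The register index advances by esize
 * and the memory address by msize, whether or not the element is
 * active.  The predicate holds one bit per vector byte.
 */
static void st1_sketch(uint8_t *mem, size_t addr, const uint8_t *vd,
                       const uint8_t *pg, size_t esize, size_t msize)
{
    assert(msize > 0 && msize <= esize && VEC_BYTES % esize == 0);

    for (size_t i = 0; i < VEC_BYTES; i += esize, addr += msize) {
        if ((pg[i / 8] >> (i % 8)) & 1) {
            /* Low msize bytes of a little-endian element. */
            memcpy(mem + addr, vd + i, msize);
        }
    }
}
```

Called with esize 2 and msize 1, this writes the low byte of each active halfword to consecutive bytes and leaves the bytes for inactive elements unmodified.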
target/arm/sve_helper.c | 351 ++++++++++++++++++++--------------------
1 file changed, 172 insertions(+), 179 deletions(-)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 5cc7de5077..4eae6569cc 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -3987,6 +3987,7 @@ typedef intptr_t sve_ld1_host_fn(void *vd, void *vg, void *host,
*/
typedef void sve_ld1_tlb_fn(CPUARMState *env, void *vd, intptr_t reg_off,
target_ulong vaddr, int mmu_idx, uintptr_t ra);
+typedef sve_ld1_tlb_fn sve_st1_tlb_fn;
/*
* Generate the above primitives.
@@ -4772,214 +4773,206 @@ DO_LDFF1_LDNF1_2(dd, 3, 3)
/*
* Store contiguous data, protected by a governing predicate.
*/
-#define DO_ST1(NAME, FN, TYPEE, TYPEM, H) \
-void HELPER(NAME)(CPUARMState *env, void *vg, \
- target_ulong addr, uint32_t desc) \
-{ \
- intptr_t i, oprsz = simd_oprsz(desc); \
- intptr_t ra = GETPC(); \
- unsigned rd = simd_data(desc); \
- void *vd = &env->vfp.zregs[rd]; \
- for (i = 0; i < oprsz; ) { \
- uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3)); \
- do { \
- if (pg & 1) { \
- TYPEM m = *(TYPEE *)(vd + H(i)); \
- FN(env, addr, m, ra); \
- } \
- i += sizeof(TYPEE), pg >>= sizeof(TYPEE); \
- addr += sizeof(TYPEM); \
- } while (i & 15); \
- } \
+
+#ifdef CONFIG_SOFTMMU
+#define DO_ST_TLB(NAME, H, TYPEM, HOST, MOEND, TLB) \
+static void sve_##NAME##_tlb(CPUARMState *env, void *vd, intptr_t reg_off, \
+ target_ulong addr, int mmu_idx, uintptr_t ra) \
+{ \
+ TCGMemOpIdx oi = make_memop_idx(ctz32(sizeof(TYPEM)) | MOEND, mmu_idx); \
+ TLB(env, addr, *(TYPEM *)(vd + H(reg_off)), oi, ra); \
}
-
-#define DO_ST1_D(NAME, FN, TYPEM) \
-void HELPER(NAME)(CPUARMState *env, void *vg, \
- target_ulong addr, uint32_t desc) \
-{ \
- intptr_t i, oprsz = simd_oprsz(desc) / 8; \
- intptr_t ra = GETPC(); \
- unsigned rd = simd_data(desc); \
- uint64_t *d = &env->vfp.zregs[rd].d[0]; \
- uint8_t *pg = vg; \
- for (i = 0; i < oprsz; i += 1) { \
- if (pg[H1(i)] & 1) { \
- FN(env, addr, d[i], ra); \
- } \
- addr += sizeof(TYPEM); \
- } \
+#else
+#define DO_ST_TLB(NAME, H, TYPEM, HOST, MOEND, TLB) \
+static void sve_##NAME##_tlb(CPUARMState *env, void *vd, intptr_t reg_off, \
+ target_ulong addr, int mmu_idx, uintptr_t ra) \
+{ \
+ HOST(g2h(addr), *(TYPEM *)(vd + H(reg_off))); \
}
+#endif
-#define DO_ST2(NAME, FN, TYPEE, TYPEM, H) \
-void HELPER(NAME)(CPUARMState *env, void *vg, \
- target_ulong addr, uint32_t desc) \
-{ \
- intptr_t i, oprsz = simd_oprsz(desc); \
- intptr_t ra = GETPC(); \
- unsigned rd = simd_data(desc); \
- void *d1 = &env->vfp.zregs[rd]; \
- void *d2 = &env->vfp.zregs[(rd + 1) & 31]; \
- for (i = 0; i < oprsz; ) { \
- uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3)); \
- do { \
- if (pg & 1) { \
- TYPEM m1 = *(TYPEE *)(d1 + H(i)); \
- TYPEM m2 = *(TYPEE *)(d2 + H(i)); \
- FN(env, addr, m1, ra); \
- FN(env, addr + sizeof(TYPEM), m2, ra); \
- } \
- i += sizeof(TYPEE), pg >>= sizeof(TYPEE); \
- addr += 2 * sizeof(TYPEM); \
- } while (i & 15); \
- } \
-}
+DO_ST_TLB(st1bb, H1, uint8_t, stb_p, 0, helper_ret_stb_mmu)
+DO_ST_TLB(st1bh, H1_2, uint16_t, stb_p, 0, helper_ret_stb_mmu)
+DO_ST_TLB(st1bs, H1_4, uint32_t, stb_p, 0, helper_ret_stb_mmu)
+DO_ST_TLB(st1bd, , uint64_t, stb_p, 0, helper_ret_stb_mmu)
-#define DO_ST3(NAME, FN, TYPEE, TYPEM, H) \
-void HELPER(NAME)(CPUARMState *env, void *vg, \
- target_ulong addr, uint32_t desc) \
-{ \
- intptr_t i, oprsz = simd_oprsz(desc); \
- intptr_t ra = GETPC(); \
- unsigned rd = simd_data(desc); \
- void *d1 = &env->vfp.zregs[rd]; \
- void *d2 = &env->vfp.zregs[(rd + 1) & 31]; \
- void *d3 = &env->vfp.zregs[(rd + 2) & 31]; \
- for (i = 0; i < oprsz; ) { \
- uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3)); \
- do { \
- if (pg & 1) { \
- TYPEM m1 = *(TYPEE *)(d1 + H(i)); \
- TYPEM m2 = *(TYPEE *)(d2 + H(i)); \
- TYPEM m3 = *(TYPEE *)(d3 + H(i)); \
- FN(env, addr, m1, ra); \
- FN(env, addr + sizeof(TYPEM), m2, ra); \
- FN(env, addr + 2 * sizeof(TYPEM), m3, ra); \
- } \
- i += sizeof(TYPEE), pg >>= sizeof(TYPEE); \
- addr += 3 * sizeof(TYPEM); \
- } while (i & 15); \
- } \
-}
+DO_ST_TLB(st1hh_le, H1_2, uint16_t, stw_le_p, MO_LE, helper_le_stw_mmu)
+DO_ST_TLB(st1hs_le, H1_4, uint32_t, stw_le_p, MO_LE, helper_le_stw_mmu)
+DO_ST_TLB(st1hd_le, , uint64_t, stw_le_p, MO_LE, helper_le_stw_mmu)
-#define DO_ST4(NAME, FN, TYPEE, TYPEM, H) \
-void HELPER(NAME)(CPUARMState *env, void *vg, \
- target_ulong addr, uint32_t desc) \
-{ \
- intptr_t i, oprsz = simd_oprsz(desc); \
- intptr_t ra = GETPC(); \
- unsigned rd = simd_data(desc); \
- void *d1 = &env->vfp.zregs[rd]; \
- void *d2 = &env->vfp.zregs[(rd + 1) & 31]; \
- void *d3 = &env->vfp.zregs[(rd + 2) & 31]; \
- void *d4 = &env->vfp.zregs[(rd + 3) & 31]; \
- for (i = 0; i < oprsz; ) { \
- uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3)); \
- do { \
- if (pg & 1) { \
- TYPEM m1 = *(TYPEE *)(d1 + H(i)); \
- TYPEM m2 = *(TYPEE *)(d2 + H(i)); \
- TYPEM m3 = *(TYPEE *)(d3 + H(i)); \
- TYPEM m4 = *(TYPEE *)(d4 + H(i)); \
- FN(env, addr, m1, ra); \
- FN(env, addr + sizeof(TYPEM), m2, ra); \
- FN(env, addr + 2 * sizeof(TYPEM), m3, ra); \
- FN(env, addr + 3 * sizeof(TYPEM), m4, ra); \
- } \
- i += sizeof(TYPEE), pg >>= sizeof(TYPEE); \
- addr += 4 * sizeof(TYPEM); \
- } while (i & 15); \
- } \
-}
+DO_ST_TLB(st1ss_le, H1_4, uint32_t, stl_le_p, MO_LE, helper_le_stl_mmu)
+DO_ST_TLB(st1sd_le, , uint64_t, stl_le_p, MO_LE, helper_le_stl_mmu)
-DO_ST1(sve_st1bh_r, cpu_stb_data_ra, uint16_t, uint8_t, H1_2)
-DO_ST1(sve_st1bs_r, cpu_stb_data_ra, uint32_t, uint8_t, H1_4)
-DO_ST1_D(sve_st1bd_r, cpu_stb_data_ra, uint8_t)
+DO_ST_TLB(st1dd_le, , uint64_t, stq_le_p, MO_LE, helper_le_stq_mmu)
-DO_ST1(sve_st1hs_r, cpu_stw_data_ra, uint32_t, uint16_t, H1_4)
-DO_ST1_D(sve_st1hd_r, cpu_stw_data_ra, uint16_t)
+DO_ST_TLB(st1hh_be, H1_2, uint16_t, stw_be_p, MO_BE, helper_be_stw_mmu)
+DO_ST_TLB(st1hs_be, H1_4, uint32_t, stw_be_p, MO_BE, helper_be_stw_mmu)
+DO_ST_TLB(st1hd_be, , uint64_t, stw_be_p, MO_BE, helper_be_stw_mmu)
-DO_ST1_D(sve_st1sd_r, cpu_stl_data_ra, uint32_t)
+DO_ST_TLB(st1ss_be, H1_4, uint32_t, stl_be_p, MO_BE, helper_be_stl_mmu)
+DO_ST_TLB(st1sd_be, , uint64_t, stl_be_p, MO_BE, helper_be_stl_mmu)
-DO_ST1(sve_st1bb_r, cpu_stb_data_ra, uint8_t, uint8_t, H1)
-DO_ST2(sve_st2bb_r, cpu_stb_data_ra, uint8_t, uint8_t, H1)
-DO_ST3(sve_st3bb_r, cpu_stb_data_ra, uint8_t, uint8_t, H1)
-DO_ST4(sve_st4bb_r, cpu_stb_data_ra, uint8_t, uint8_t, H1)
+DO_ST_TLB(st1dd_be, , uint64_t, stq_be_p, MO_BE, helper_be_stq_mmu)
-DO_ST1(sve_st1hh_r, cpu_stw_data_ra, uint16_t, uint16_t, H1_2)
-DO_ST2(sve_st2hh_r, cpu_stw_data_ra, uint16_t, uint16_t, H1_2)
-DO_ST3(sve_st3hh_r, cpu_stw_data_ra, uint16_t, uint16_t, H1_2)
-DO_ST4(sve_st4hh_r, cpu_stw_data_ra, uint16_t, uint16_t, H1_2)
+#undef DO_ST_TLB
-DO_ST1(sve_st1ss_r, cpu_stl_data_ra, uint32_t, uint32_t, H1_4)
-DO_ST2(sve_st2ss_r, cpu_stl_data_ra, uint32_t, uint32_t, H1_4)
-DO_ST3(sve_st3ss_r, cpu_stl_data_ra, uint32_t, uint32_t, H1_4)
-DO_ST4(sve_st4ss_r, cpu_stl_data_ra, uint32_t, uint32_t, H1_4)
-
-DO_ST1_D(sve_st1dd_r, cpu_stq_data_ra, uint64_t)
-
-void HELPER(sve_st2dd_r)(CPUARMState *env, void *vg,
- target_ulong addr, uint32_t desc)
+/*
+ * Common helpers for all contiguous 1,2,3,4-register predicated stores.
+ */
+static void sve_st1_r(CPUARMState *env, void *vg, target_ulong addr,
+ uint32_t desc, const uintptr_t ra,
+ const int esize, const int msize,
+ sve_st1_tlb_fn *tlb_fn)
{
- intptr_t i, oprsz = simd_oprsz(desc) / 8;
- intptr_t ra = GETPC();
+ const int mmu_idx = cpu_mmu_index(env, false);
+ intptr_t i, oprsz = simd_oprsz(desc);
unsigned rd = simd_data(desc);
- uint64_t *d1 = &env->vfp.zregs[rd].d[0];
- uint64_t *d2 = &env->vfp.zregs[(rd + 1) & 31].d[0];
- uint8_t *pg = vg;
+ void *vd = &env->vfp.zregs[rd];
- for (i = 0; i < oprsz; i += 1) {
- if (pg[H1(i)] & 1) {
- cpu_stq_data_ra(env, addr, d1[i], ra);
- cpu_stq_data_ra(env, addr + 8, d2[i], ra);
- }
- addr += 2 * 8;
+ set_helper_retaddr(ra);
+ for (i = 0; i < oprsz; ) {
+ uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));
+ do {
+ if (pg & 1) {
+ tlb_fn(env, vd, i, addr, mmu_idx, ra);
+ }
+ i += esize, pg >>= esize;
+ addr += msize;
+ } while (i & 15);
}
+ set_helper_retaddr(0);
}
-void HELPER(sve_st3dd_r)(CPUARMState *env, void *vg,
- target_ulong addr, uint32_t desc)
+static void sve_st2_r(CPUARMState *env, void *vg, target_ulong addr,
+ uint32_t desc, const uintptr_t ra,
+ const int esize, const int msize,
+ sve_st1_tlb_fn *tlb_fn)
{
- intptr_t i, oprsz = simd_oprsz(desc) / 8;
- intptr_t ra = GETPC();
+ const int mmu_idx = cpu_mmu_index(env, false);
+ intptr_t i, oprsz = simd_oprsz(desc);
unsigned rd = simd_data(desc);
- uint64_t *d1 = &env->vfp.zregs[rd].d[0];
- uint64_t *d2 = &env->vfp.zregs[(rd + 1) & 31].d[0];
- uint64_t *d3 = &env->vfp.zregs[(rd + 2) & 31].d[0];
- uint8_t *pg = vg;
+ void *d1 = &env->vfp.zregs[rd];
+ void *d2 = &env->vfp.zregs[(rd + 1) & 31];
- for (i = 0; i < oprsz; i += 1) {
- if (pg[H1(i)] & 1) {
- cpu_stq_data_ra(env, addr, d1[i], ra);
- cpu_stq_data_ra(env, addr + 8, d2[i], ra);
- cpu_stq_data_ra(env, addr + 16, d3[i], ra);
- }
- addr += 3 * 8;
+ set_helper_retaddr(ra);
+ for (i = 0; i < oprsz; ) {
+ uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));
+ do {
+ if (pg & 1) {
+ tlb_fn(env, d1, i, addr, mmu_idx, ra);
+ tlb_fn(env, d2, i, addr + msize, mmu_idx, ra);
+ }
+ i += esize, pg >>= esize;
+ addr += 2 * msize;
+ } while (i & 15);
}
+ set_helper_retaddr(0);
}
-void HELPER(sve_st4dd_r)(CPUARMState *env, void *vg,
- target_ulong addr, uint32_t desc)
+static void sve_st3_r(CPUARMState *env, void *vg, target_ulong addr,
+ uint32_t desc, const uintptr_t ra,
+ const int esize, const int msize,
+ sve_st1_tlb_fn *tlb_fn)
{
- intptr_t i, oprsz = simd_oprsz(desc) / 8;
- intptr_t ra = GETPC();
+ const int mmu_idx = cpu_mmu_index(env, false);
+ intptr_t i, oprsz = simd_oprsz(desc);
unsigned rd = simd_data(desc);
- uint64_t *d1 = &env->vfp.zregs[rd].d[0];
- uint64_t *d2 = &env->vfp.zregs[(rd + 1) & 31].d[0];
- uint64_t *d3 = &env->vfp.zregs[(rd + 2) & 31].d[0];
- uint64_t *d4 = &env->vfp.zregs[(rd + 3) & 31].d[0];
- uint8_t *pg = vg;
+ void *d1 = &env->vfp.zregs[rd];
+ void *d2 = &env->vfp.zregs[(rd + 1) & 31];
+ void *d3 = &env->vfp.zregs[(rd + 2) & 31];
- for (i = 0; i < oprsz; i += 1) {
- if (pg[H1(i)] & 1) {
- cpu_stq_data_ra(env, addr, d1[i], ra);
- cpu_stq_data_ra(env, addr + 8, d2[i], ra);
- cpu_stq_data_ra(env, addr + 16, d3[i], ra);
- cpu_stq_data_ra(env, addr + 24, d4[i], ra);
- }
- addr += 4 * 8;
+ set_helper_retaddr(ra);
+ for (i = 0; i < oprsz; ) {
+ uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));
+ do {
+ if (pg & 1) {
+ tlb_fn(env, d1, i, addr, mmu_idx, ra);
+ tlb_fn(env, d2, i, addr + msize, mmu_idx, ra);
+ tlb_fn(env, d3, i, addr + 2 * msize, mmu_idx, ra);
+ }
+ i += esize, pg >>= esize;
+ addr += 3 * msize;
+ } while (i & 15);
}
+ set_helper_retaddr(0);
}
+static void sve_st4_r(CPUARMState *env, void *vg, target_ulong addr,
+ uint32_t desc, const uintptr_t ra,
+ const int esize, const int msize,
+ sve_st1_tlb_fn *tlb_fn)
+{
+ const int mmu_idx = cpu_mmu_index(env, false);
+ intptr_t i, oprsz = simd_oprsz(desc);
+ unsigned rd = simd_data(desc);
+ void *d1 = &env->vfp.zregs[rd];
+ void *d2 = &env->vfp.zregs[(rd + 1) & 31];
+ void *d3 = &env->vfp.zregs[(rd + 2) & 31];
+ void *d4 = &env->vfp.zregs[(rd + 3) & 31];
+
+ set_helper_retaddr(ra);
+ for (i = 0; i < oprsz; ) {
+ uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));
+ do {
+ if (pg & 1) {
+ tlb_fn(env, d1, i, addr, mmu_idx, ra);
+ tlb_fn(env, d2, i, addr + msize, mmu_idx, ra);
+ tlb_fn(env, d3, i, addr + 2 * msize, mmu_idx, ra);
+ tlb_fn(env, d4, i, addr + 3 * msize, mmu_idx, ra);
+ }
+ i += esize, pg >>= esize;
+ addr += 4 * msize;
+ } while (i & 15);
+ }
+ set_helper_retaddr(0);
+}
+
+#define DO_STN_1(N, NAME, ESIZE) \
+void __attribute__((flatten)) HELPER(sve_st##N##NAME##_r) \
+ (CPUARMState *env, void *vg, target_ulong addr, uint32_t desc) \
+{ \
+ sve_st##N##_r(env, vg, addr, desc, GETPC(), ESIZE, 1, \
+ sve_st1##NAME##_tlb); \
+}
+
+#define DO_STN_2(N, NAME, ESIZE, MSIZE) \
+void __attribute__((flatten)) HELPER(sve_st##N##NAME##_r) \
+ (CPUARMState *env, void *vg, target_ulong addr, uint32_t desc) \
+{ \
+ sve_st##N##_r(env, vg, addr, desc, GETPC(), ESIZE, MSIZE, \
+ arm_cpu_data_is_big_endian(env) \
+ ? sve_st1##NAME##_be_tlb : sve_st1##NAME##_le_tlb); \
+}
+
+DO_STN_1(1, bb, 1)
+DO_STN_1(1, bh, 2)
+DO_STN_1(1, bs, 4)
+DO_STN_1(1, bd, 8)
+DO_STN_1(2, bb, 1)
+DO_STN_1(3, bb, 1)
+DO_STN_1(4, bb, 1)
+
+DO_STN_2(1, hh, 2, 2)
+DO_STN_2(1, hs, 4, 2)
+DO_STN_2(1, hd, 8, 2)
+DO_STN_2(2, hh, 2, 2)
+DO_STN_2(3, hh, 2, 2)
+DO_STN_2(4, hh, 2, 2)
+
+DO_STN_2(1, ss, 4, 4)
+DO_STN_2(1, sd, 8, 4)
+DO_STN_2(2, ss, 4, 4)
+DO_STN_2(3, ss, 4, 4)
+DO_STN_2(4, ss, 4, 4)
+
+DO_STN_2(1, dd, 8, 8)
+DO_STN_2(2, dd, 8, 8)
+DO_STN_2(3, dd, 8, 8)
+DO_STN_2(4, dd, 8, 8)
+
+#undef DO_STN_1
+#undef DO_STN_2
+
/* Loads with a vector index. */
#define DO_LD1_ZPZ_S(NAME, TYPEI, TYPEM, FN) \
--
2.17.1
* [Qemu-devel] [PATCH 15/20] target/arm: Split contiguous loads for endianness
2018-08-09 4:21 [Qemu-devel] [PATCH 00/20] target/arm: sve system mode patches Richard Henderson
` (13 preceding siblings ...)
2018-08-09 4:22 ` [Qemu-devel] [PATCH 14/20] target/arm: Rewrite helper_sve_st[1234]*_r Richard Henderson
@ 2018-08-09 4:22 ` Richard Henderson
2018-08-11 5:40 ` Philippe Mathieu-Daudé
2018-08-09 4:22 ` [Qemu-devel] [PATCH 16/20] target/arm: Split contiguous stores " Richard Henderson
` (7 subsequent siblings)
22 siblings, 1 reply; 51+ messages in thread
From: Richard Henderson @ 2018-08-09 4:22 UTC (permalink / raw)
To: qemu-devel; +Cc: laurent.desnogues, peter.maydell, alex.bennee
We can choose the endianness at translation time, rather than
re-computing it at execution time.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
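As a rough illustration of the idea in the commit message, the sketch below picks a little-endian or big-endian loader once, from a table indexed by the data endianness and the memory size, so the per-element path makes no endianness test. The names ld_fn and select_ld and the table layout are hypothetical and are not taken from translate-sve.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t ld_fn(const uint8_t *p);

/* Assemble n bytes, least significant byte first. */
static uint64_t ld_le(const uint8_t *p, int n)
{
    uint64_t v = 0;
    for (int i = n - 1; i >= 0; i--) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* Assemble n bytes, most significant byte first. */
static uint64_t ld_be(const uint8_t *p, int n)
{
    uint64_t v = 0;
    for (int i = 0; i < n; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

static uint64_t ld16_le(const uint8_t *p) { return ld_le(p, 2); }
static uint64_t ld32_le(const uint8_t *p) { return ld_le(p, 4); }
static uint64_t ld64_le(const uint8_t *p) { return ld_le(p, 8); }
static uint64_t ld16_be(const uint8_t *p) { return ld_be(p, 2); }
static uint64_t ld32_be(const uint8_t *p) { return ld_be(p, 4); }
static uint64_t ld64_be(const uint8_t *p) { return ld_be(p, 8); }

/*
 * "Translation time" selection: msz is log2 of the memory element
 * size, 1..3 here since byte loads have no endianness.
 */
static ld_fn *select_ld(bool be_data, int msz)
{
    static ld_fn * const fns[2][3] = {
        { ld16_le, ld32_le, ld64_le },
        { ld16_be, ld32_be, ld64_be },
    };

    assert(msz >= 1 && msz <= 3);
    return fns[be_data][msz - 1];
}
```

The returned pointer can then be called for every element without consulting the CPU state again, which is the saving the commit message describes.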
target/arm/helper-sve.h | 117 +++++++++++++++-------
target/arm/sve_helper.c | 70 ++++++-------
target/arm/translate-sve.c | 196 +++++++++++++++++++++++++------------
3 files changed, 252 insertions(+), 131 deletions(-)
diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index 023952a9a4..526caec8da 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -1128,20 +1128,35 @@ DEF_HELPER_FLAGS_4(sve_ld2bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
DEF_HELPER_FLAGS_4(sve_ld3bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
DEF_HELPER_FLAGS_4(sve_ld4bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_ld1hh_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_ld2hh_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_ld3hh_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_ld4hh_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld1hh_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld2hh_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld3hh_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld4hh_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_ld1ss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_ld2ss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_ld3ss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_ld4ss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld1hh_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld2hh_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld3hh_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld4hh_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_ld1dd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_ld2dd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_ld3dd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_ld4dd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld1ss_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld2ss_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld3ss_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld4ss_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+
+DEF_HELPER_FLAGS_4(sve_ld1ss_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld2ss_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld3ss_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld4ss_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+
+DEF_HELPER_FLAGS_4(sve_ld1dd_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld2dd_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld3dd_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld4dd_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+
+DEF_HELPER_FLAGS_4(sve_ld1dd_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld2dd_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld3dd_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld4dd_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
DEF_HELPER_FLAGS_4(sve_ld1bhu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
DEF_HELPER_FLAGS_4(sve_ld1bsu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
@@ -1150,13 +1165,21 @@ DEF_HELPER_FLAGS_4(sve_ld1bhs_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
DEF_HELPER_FLAGS_4(sve_ld1bss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
DEF_HELPER_FLAGS_4(sve_ld1bds_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_ld1hsu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_ld1hdu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_ld1hss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_ld1hds_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld1hsu_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld1hdu_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld1hss_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld1hds_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_ld1sdu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_ld1sds_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld1hsu_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld1hdu_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld1hss_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld1hds_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+
+DEF_HELPER_FLAGS_4(sve_ld1sdu_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld1sds_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+
+DEF_HELPER_FLAGS_4(sve_ld1sdu_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ld1sds_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
DEF_HELPER_FLAGS_4(sve_ldff1bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
DEF_HELPER_FLAGS_4(sve_ldff1bhu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
@@ -1166,17 +1189,28 @@ DEF_HELPER_FLAGS_4(sve_ldff1bhs_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
DEF_HELPER_FLAGS_4(sve_ldff1bss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
DEF_HELPER_FLAGS_4(sve_ldff1bds_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_ldff1hh_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_ldff1hsu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_ldff1hdu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_ldff1hss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_ldff1hds_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ldff1hh_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ldff1hsu_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ldff1hdu_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ldff1hss_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ldff1hds_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_ldff1ss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_ldff1sdu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_ldff1sds_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ldff1hh_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ldff1hsu_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ldff1hdu_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ldff1hss_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ldff1hds_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_ldff1dd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ldff1ss_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ldff1sdu_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ldff1sds_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+
+DEF_HELPER_FLAGS_4(sve_ldff1ss_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ldff1sdu_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ldff1sds_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+
+DEF_HELPER_FLAGS_4(sve_ldff1dd_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ldff1dd_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
DEF_HELPER_FLAGS_4(sve_ldnf1bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
DEF_HELPER_FLAGS_4(sve_ldnf1bhu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
@@ -1186,17 +1220,28 @@ DEF_HELPER_FLAGS_4(sve_ldnf1bhs_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
DEF_HELPER_FLAGS_4(sve_ldnf1bss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
DEF_HELPER_FLAGS_4(sve_ldnf1bds_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_ldnf1hh_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_ldnf1hsu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_ldnf1hdu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_ldnf1hss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_ldnf1hds_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ldnf1hh_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ldnf1hsu_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ldnf1hdu_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ldnf1hss_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ldnf1hds_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_ldnf1ss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_ldnf1sdu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_ldnf1sds_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ldnf1hh_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ldnf1hsu_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ldnf1hdu_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ldnf1hss_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ldnf1hds_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_ldnf1dd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ldnf1ss_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ldnf1sdu_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ldnf1sds_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+
+DEF_HELPER_FLAGS_4(sve_ldnf1ss_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ldnf1sdu_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ldnf1sds_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+
+DEF_HELPER_FLAGS_4(sve_ldnf1dd_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_ldnf1dd_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
DEF_HELPER_FLAGS_4(sve_st1bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
DEF_HELPER_FLAGS_4(sve_st2bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 4eae6569cc..56e2f523c5 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -4362,18 +4362,18 @@ void HELPER(sve_##NAME##_r)(CPUARMState *env, void *vg, \
sve_##NAME##_host, sve_##NAME##_tlb); \
}
-/* TODO: Propagate the endian check back to the translator. */
#define DO_LD1_2(NAME, ESZ, MSZ) \
-void HELPER(sve_##NAME##_r)(CPUARMState *env, void *vg, \
- target_ulong addr, uint32_t desc) \
-{ \
- if (arm_cpu_data_is_big_endian(env)) { \
- sve_ld1_r(env, vg, addr, desc, GETPC(), ESZ, MSZ, \
- sve_##NAME##_be_host, sve_##NAME##_be_tlb); \
- } else { \
- sve_ld1_r(env, vg, addr, desc, GETPC(), ESZ, MSZ, \
- sve_##NAME##_le_host, sve_##NAME##_le_tlb); \
- } \
+void HELPER(sve_##NAME##_le_r)(CPUARMState *env, void *vg, \
+ target_ulong addr, uint32_t desc) \
+{ \
+ sve_ld1_r(env, vg, addr, desc, GETPC(), ESZ, MSZ, \
+ sve_##NAME##_le_host, sve_##NAME##_le_tlb); \
+} \
+void HELPER(sve_##NAME##_be_r)(CPUARMState *env, void *vg, \
+ target_ulong addr, uint32_t desc) \
+{ \
+ sve_ld1_r(env, vg, addr, desc, GETPC(), ESZ, MSZ, \
+ sve_##NAME##_be_host, sve_##NAME##_be_tlb); \
}
DO_LD1_1(ld1bb, 0)
@@ -4500,12 +4500,17 @@ void __attribute__((flatten)) HELPER(sve_ld##N##bb_r) \
}
#define DO_LDN_2(N, SUFF, SIZE) \
-void __attribute__((flatten)) HELPER(sve_ld##N##SUFF##_r) \
+void __attribute__((flatten)) HELPER(sve_ld##N##SUFF##_le_r) \
(CPUARMState *env, void *vg, target_ulong addr, uint32_t desc) \
{ \
sve_ld##N##_r(env, vg, addr, desc, SIZE, GETPC(), \
- arm_cpu_data_is_big_endian(env) \
- ? sve_ld1##SUFF##_be_tlb : sve_ld1##SUFF##_le_tlb); \
+ sve_ld1##SUFF##_le_tlb); \
+} \
+void __attribute__((flatten)) HELPER(sve_ld##N##SUFF##_be_r) \
+ (CPUARMState *env, void *vg, target_ulong addr, uint32_t desc) \
+{ \
+ sve_ld##N##_r(env, vg, addr, desc, SIZE, GETPC(), \
+ sve_ld1##SUFF##_be_tlb); \
}
DO_LDN_1(2)
@@ -4722,29 +4727,28 @@ void HELPER(sve_ldnf1##PART##_r)(CPUARMState *env, void *vg, \
sve_ldnf1_r(env, vg, addr, desc, ESZ, 0, sve_ld1##PART##_host); \
}
-/* TODO: Propagate the endian check back to the translator. */
#define DO_LDFF1_LDNF1_2(PART, ESZ, MSZ) \
-void HELPER(sve_ldff1##PART##_r)(CPUARMState *env, void *vg, \
- target_ulong addr, uint32_t desc) \
+void HELPER(sve_ldff1##PART##_le_r)(CPUARMState *env, void *vg, \
+ target_ulong addr, uint32_t desc) \
{ \
- if (arm_cpu_data_is_big_endian(env)) { \
- sve_ldff1_r(env, vg, addr, desc, GETPC(), ESZ, MSZ, \
- sve_ld1##PART##_be_host, sve_ld1##PART##_be_tlb); \
- } else { \
- sve_ldff1_r(env, vg, addr, desc, GETPC(), ESZ, MSZ, \
- sve_ld1##PART##_le_host, sve_ld1##PART##_le_tlb); \
- } \
+ sve_ldff1_r(env, vg, addr, desc, GETPC(), ESZ, MSZ, \
+ sve_ld1##PART##_le_host, sve_ld1##PART##_le_tlb); \
} \
-void HELPER(sve_ldnf1##PART##_r)(CPUARMState *env, void *vg, \
- target_ulong addr, uint32_t desc) \
+void HELPER(sve_ldnf1##PART##_le_r)(CPUARMState *env, void *vg, \
+ target_ulong addr, uint32_t desc) \
{ \
- if (arm_cpu_data_is_big_endian(env)) { \
- sve_ldnf1_r(env, vg, addr, desc, ESZ, MSZ, \
- sve_ld1##PART##_be_host); \
- } else { \
- sve_ldnf1_r(env, vg, addr, desc, ESZ, MSZ, \
- sve_ld1##PART##_le_host); \
- } \
+ sve_ldnf1_r(env, vg, addr, desc, ESZ, MSZ, sve_ld1##PART##_le_host); \
+} \
+void HELPER(sve_ldff1##PART##_be_r)(CPUARMState *env, void *vg, \
+ target_ulong addr, uint32_t desc) \
+{ \
+ sve_ldff1_r(env, vg, addr, desc, GETPC(), ESZ, MSZ, \
+ sve_ld1##PART##_be_host, sve_ld1##PART##_be_tlb); \
+} \
+void HELPER(sve_ldnf1##PART##_be_r)(CPUARMState *env, void *vg, \
+ target_ulong addr, uint32_t desc) \
+{ \
+ sve_ldnf1_r(env, vg, addr, desc, ESZ, MSZ, sve_ld1##PART##_be_host); \
}
DO_LDFF1_LDNF1_1(bb, 0)
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index bef6b8242d..de12c01e7d 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -4624,32 +4624,58 @@ static void do_mem_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr,
static void do_ld_zpa(DisasContext *s, int zt, int pg,
TCGv_i64 addr, int dtype, int nreg)
{
- static gen_helper_gvec_mem * const fns[16][4] = {
- { gen_helper_sve_ld1bb_r, gen_helper_sve_ld2bb_r,
- gen_helper_sve_ld3bb_r, gen_helper_sve_ld4bb_r },
- { gen_helper_sve_ld1bhu_r, NULL, NULL, NULL },
- { gen_helper_sve_ld1bsu_r, NULL, NULL, NULL },
- { gen_helper_sve_ld1bdu_r, NULL, NULL, NULL },
+ static gen_helper_gvec_mem * const fns[2][16][4] = {
+ /* Little-endian */
+ { { gen_helper_sve_ld1bb_r, gen_helper_sve_ld2bb_r,
+ gen_helper_sve_ld3bb_r, gen_helper_sve_ld4bb_r },
+ { gen_helper_sve_ld1bhu_r, NULL, NULL, NULL },
+ { gen_helper_sve_ld1bsu_r, NULL, NULL, NULL },
+ { gen_helper_sve_ld1bdu_r, NULL, NULL, NULL },
- { gen_helper_sve_ld1sds_r, NULL, NULL, NULL },
- { gen_helper_sve_ld1hh_r, gen_helper_sve_ld2hh_r,
- gen_helper_sve_ld3hh_r, gen_helper_sve_ld4hh_r },
- { gen_helper_sve_ld1hsu_r, NULL, NULL, NULL },
- { gen_helper_sve_ld1hdu_r, NULL, NULL, NULL },
+ { gen_helper_sve_ld1sds_le_r, NULL, NULL, NULL },
+ { gen_helper_sve_ld1hh_le_r, gen_helper_sve_ld2hh_le_r,
+ gen_helper_sve_ld3hh_le_r, gen_helper_sve_ld4hh_le_r },
+ { gen_helper_sve_ld1hsu_le_r, NULL, NULL, NULL },
+ { gen_helper_sve_ld1hdu_le_r, NULL, NULL, NULL },
- { gen_helper_sve_ld1hds_r, NULL, NULL, NULL },
- { gen_helper_sve_ld1hss_r, NULL, NULL, NULL },
- { gen_helper_sve_ld1ss_r, gen_helper_sve_ld2ss_r,
- gen_helper_sve_ld3ss_r, gen_helper_sve_ld4ss_r },
- { gen_helper_sve_ld1sdu_r, NULL, NULL, NULL },
+ { gen_helper_sve_ld1hds_le_r, NULL, NULL, NULL },
+ { gen_helper_sve_ld1hss_le_r, NULL, NULL, NULL },
+ { gen_helper_sve_ld1ss_le_r, gen_helper_sve_ld2ss_le_r,
+ gen_helper_sve_ld3ss_le_r, gen_helper_sve_ld4ss_le_r },
+ { gen_helper_sve_ld1sdu_le_r, NULL, NULL, NULL },
- { gen_helper_sve_ld1bds_r, NULL, NULL, NULL },
- { gen_helper_sve_ld1bss_r, NULL, NULL, NULL },
- { gen_helper_sve_ld1bhs_r, NULL, NULL, NULL },
- { gen_helper_sve_ld1dd_r, gen_helper_sve_ld2dd_r,
- gen_helper_sve_ld3dd_r, gen_helper_sve_ld4dd_r },
+ { gen_helper_sve_ld1bds_r, NULL, NULL, NULL },
+ { gen_helper_sve_ld1bss_r, NULL, NULL, NULL },
+ { gen_helper_sve_ld1bhs_r, NULL, NULL, NULL },
+ { gen_helper_sve_ld1dd_le_r, gen_helper_sve_ld2dd_le_r,
+ gen_helper_sve_ld3dd_le_r, gen_helper_sve_ld4dd_le_r } },
+
+ /* Big-endian */
+ { { gen_helper_sve_ld1bb_r, gen_helper_sve_ld2bb_r,
+ gen_helper_sve_ld3bb_r, gen_helper_sve_ld4bb_r },
+ { gen_helper_sve_ld1bhu_r, NULL, NULL, NULL },
+ { gen_helper_sve_ld1bsu_r, NULL, NULL, NULL },
+ { gen_helper_sve_ld1bdu_r, NULL, NULL, NULL },
+
+ { gen_helper_sve_ld1sds_be_r, NULL, NULL, NULL },
+ { gen_helper_sve_ld1hh_be_r, gen_helper_sve_ld2hh_be_r,
+ gen_helper_sve_ld3hh_be_r, gen_helper_sve_ld4hh_be_r },
+ { gen_helper_sve_ld1hsu_be_r, NULL, NULL, NULL },
+ { gen_helper_sve_ld1hdu_be_r, NULL, NULL, NULL },
+
+ { gen_helper_sve_ld1hds_be_r, NULL, NULL, NULL },
+ { gen_helper_sve_ld1hss_be_r, NULL, NULL, NULL },
+ { gen_helper_sve_ld1ss_be_r, gen_helper_sve_ld2ss_be_r,
+ gen_helper_sve_ld3ss_be_r, gen_helper_sve_ld4ss_be_r },
+ { gen_helper_sve_ld1sdu_be_r, NULL, NULL, NULL },
+
+ { gen_helper_sve_ld1bds_r, NULL, NULL, NULL },
+ { gen_helper_sve_ld1bss_r, NULL, NULL, NULL },
+ { gen_helper_sve_ld1bhs_r, NULL, NULL, NULL },
+ { gen_helper_sve_ld1dd_be_r, gen_helper_sve_ld2dd_be_r,
+ gen_helper_sve_ld3dd_be_r, gen_helper_sve_ld4dd_be_r } }
};
- gen_helper_gvec_mem *fn = fns[dtype][nreg];
+ gen_helper_gvec_mem *fn = fns[s->be_data == MO_BE][dtype][nreg];
/* While there are holes in the table, they are not
* accessible via the instruction encoding.
@@ -4689,59 +4715,103 @@ static bool trans_LD_zpri(DisasContext *s, arg_rpri_load *a, uint32_t insn)
static bool trans_LDFF1_zprr(DisasContext *s, arg_rprr_load *a, uint32_t insn)
{
- static gen_helper_gvec_mem * const fns[16] = {
- gen_helper_sve_ldff1bb_r,
- gen_helper_sve_ldff1bhu_r,
- gen_helper_sve_ldff1bsu_r,
- gen_helper_sve_ldff1bdu_r,
+ static gen_helper_gvec_mem * const fns[2][16] = {
+ /* Little-endian */
+ { gen_helper_sve_ldff1bb_r,
+ gen_helper_sve_ldff1bhu_r,
+ gen_helper_sve_ldff1bsu_r,
+ gen_helper_sve_ldff1bdu_r,
- gen_helper_sve_ldff1sds_r,
- gen_helper_sve_ldff1hh_r,
- gen_helper_sve_ldff1hsu_r,
- gen_helper_sve_ldff1hdu_r,
+ gen_helper_sve_ldff1sds_le_r,
+ gen_helper_sve_ldff1hh_le_r,
+ gen_helper_sve_ldff1hsu_le_r,
+ gen_helper_sve_ldff1hdu_le_r,
- gen_helper_sve_ldff1hds_r,
- gen_helper_sve_ldff1hss_r,
- gen_helper_sve_ldff1ss_r,
- gen_helper_sve_ldff1sdu_r,
+ gen_helper_sve_ldff1hds_le_r,
+ gen_helper_sve_ldff1hss_le_r,
+ gen_helper_sve_ldff1ss_le_r,
+ gen_helper_sve_ldff1sdu_le_r,
- gen_helper_sve_ldff1bds_r,
- gen_helper_sve_ldff1bss_r,
- gen_helper_sve_ldff1bhs_r,
- gen_helper_sve_ldff1dd_r,
+ gen_helper_sve_ldff1bds_r,
+ gen_helper_sve_ldff1bss_r,
+ gen_helper_sve_ldff1bhs_r,
+ gen_helper_sve_ldff1dd_le_r },
+
+ /* Big-endian */
+ { gen_helper_sve_ldff1bb_r,
+ gen_helper_sve_ldff1bhu_r,
+ gen_helper_sve_ldff1bsu_r,
+ gen_helper_sve_ldff1bdu_r,
+
+ gen_helper_sve_ldff1sds_be_r,
+ gen_helper_sve_ldff1hh_be_r,
+ gen_helper_sve_ldff1hsu_be_r,
+ gen_helper_sve_ldff1hdu_be_r,
+
+ gen_helper_sve_ldff1hds_be_r,
+ gen_helper_sve_ldff1hss_be_r,
+ gen_helper_sve_ldff1ss_be_r,
+ gen_helper_sve_ldff1sdu_be_r,
+
+ gen_helper_sve_ldff1bds_r,
+ gen_helper_sve_ldff1bss_r,
+ gen_helper_sve_ldff1bhs_r,
+ gen_helper_sve_ldff1dd_be_r },
};
if (sve_access_check(s)) {
TCGv_i64 addr = new_tmp_a64(s);
tcg_gen_shli_i64(addr, cpu_reg(s, a->rm), dtype_msz(a->dtype));
tcg_gen_add_i64(addr, addr, cpu_reg_sp(s, a->rn));
- do_mem_zpa(s, a->rd, a->pg, addr, fns[a->dtype]);
+ do_mem_zpa(s, a->rd, a->pg, addr, fns[s->be_data == MO_BE][a->dtype]);
}
return true;
}
static bool trans_LDNF1_zpri(DisasContext *s, arg_rpri_load *a, uint32_t insn)
{
- static gen_helper_gvec_mem * const fns[16] = {
- gen_helper_sve_ldnf1bb_r,
- gen_helper_sve_ldnf1bhu_r,
- gen_helper_sve_ldnf1bsu_r,
- gen_helper_sve_ldnf1bdu_r,
+ static gen_helper_gvec_mem * const fns[2][16] = {
+ /* Little-endian */
+ { gen_helper_sve_ldnf1bb_r,
+ gen_helper_sve_ldnf1bhu_r,
+ gen_helper_sve_ldnf1bsu_r,
+ gen_helper_sve_ldnf1bdu_r,
- gen_helper_sve_ldnf1sds_r,
- gen_helper_sve_ldnf1hh_r,
- gen_helper_sve_ldnf1hsu_r,
- gen_helper_sve_ldnf1hdu_r,
+ gen_helper_sve_ldnf1sds_le_r,
+ gen_helper_sve_ldnf1hh_le_r,
+ gen_helper_sve_ldnf1hsu_le_r,
+ gen_helper_sve_ldnf1hdu_le_r,
- gen_helper_sve_ldnf1hds_r,
- gen_helper_sve_ldnf1hss_r,
- gen_helper_sve_ldnf1ss_r,
- gen_helper_sve_ldnf1sdu_r,
+ gen_helper_sve_ldnf1hds_le_r,
+ gen_helper_sve_ldnf1hss_le_r,
+ gen_helper_sve_ldnf1ss_le_r,
+ gen_helper_sve_ldnf1sdu_le_r,
- gen_helper_sve_ldnf1bds_r,
- gen_helper_sve_ldnf1bss_r,
- gen_helper_sve_ldnf1bhs_r,
- gen_helper_sve_ldnf1dd_r,
+ gen_helper_sve_ldnf1bds_r,
+ gen_helper_sve_ldnf1bss_r,
+ gen_helper_sve_ldnf1bhs_r,
+ gen_helper_sve_ldnf1dd_le_r },
+
+ /* Big-endian */
+ { gen_helper_sve_ldnf1bb_r,
+ gen_helper_sve_ldnf1bhu_r,
+ gen_helper_sve_ldnf1bsu_r,
+ gen_helper_sve_ldnf1bdu_r,
+
+ gen_helper_sve_ldnf1sds_be_r,
+ gen_helper_sve_ldnf1hh_be_r,
+ gen_helper_sve_ldnf1hsu_be_r,
+ gen_helper_sve_ldnf1hdu_be_r,
+
+ gen_helper_sve_ldnf1hds_be_r,
+ gen_helper_sve_ldnf1hss_be_r,
+ gen_helper_sve_ldnf1ss_be_r,
+ gen_helper_sve_ldnf1sdu_be_r,
+
+ gen_helper_sve_ldnf1bds_r,
+ gen_helper_sve_ldnf1bss_r,
+ gen_helper_sve_ldnf1bhs_r,
+ gen_helper_sve_ldnf1dd_be_r },
};
if (sve_access_check(s)) {
@@ -4751,16 +4821,18 @@ static bool trans_LDNF1_zpri(DisasContext *s, arg_rpri_load *a, uint32_t insn)
TCGv_i64 addr = new_tmp_a64(s);
tcg_gen_addi_i64(addr, cpu_reg_sp(s, a->rn), off);
- do_mem_zpa(s, a->rd, a->pg, addr, fns[a->dtype]);
+ do_mem_zpa(s, a->rd, a->pg, addr, fns[s->be_data == MO_BE][a->dtype]);
}
return true;
}
static void do_ldrq(DisasContext *s, int zt, int pg, TCGv_i64 addr, int msz)
{
- static gen_helper_gvec_mem * const fns[4] = {
- gen_helper_sve_ld1bb_r, gen_helper_sve_ld1hh_r,
- gen_helper_sve_ld1ss_r, gen_helper_sve_ld1dd_r,
+ static gen_helper_gvec_mem * const fns[2][4] = {
+ { gen_helper_sve_ld1bb_r, gen_helper_sve_ld1hh_le_r,
+ gen_helper_sve_ld1ss_le_r, gen_helper_sve_ld1dd_le_r },
+ { gen_helper_sve_ld1bb_r, gen_helper_sve_ld1hh_be_r,
+ gen_helper_sve_ld1ss_be_r, gen_helper_sve_ld1dd_be_r },
};
unsigned vsz = vec_full_reg_size(s);
TCGv_ptr t_pg;
@@ -4792,7 +4864,7 @@ static void do_ldrq(DisasContext *s, int zt, int pg, TCGv_i64 addr, int msz)
t_pg = tcg_temp_new_ptr();
tcg_gen_addi_ptr(t_pg, cpu_env, poff);
- fns[msz](cpu_env, t_pg, addr, desc);
+ fns[s->be_data == MO_BE][msz](cpu_env, t_pg, addr, desc);
tcg_temp_free_ptr(t_pg);
tcg_temp_free_i32(desc);
--
2.17.1
* [Qemu-devel] [PATCH 16/20] target/arm: Split contiguous stores for endianness
2018-08-09 4:21 [Qemu-devel] [PATCH 00/20] target/arm: sve system mode patches Richard Henderson
` (14 preceding siblings ...)
2018-08-09 4:22 ` [Qemu-devel] [PATCH 15/20] target/arm: Split contiguous loads for endianness Richard Henderson
@ 2018-08-09 4:22 ` Richard Henderson
2018-08-11 5:41 ` Philippe Mathieu-Daudé
2018-08-09 4:22 ` [Qemu-devel] [PATCH 17/20] target/arm: Rewrite vector gather loads Richard Henderson
` (6 subsequent siblings)
22 siblings, 1 reply; 51+ messages in thread
From: Richard Henderson @ 2018-08-09 4:22 UTC (permalink / raw)
To: qemu-devel; +Cc: laurent.desnogues, peter.maydell, alex.bennee
We can choose the endianness at translation time, rather than
re-computing it at execution time.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
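Editorial note, not part of the patch: the idea in the commit message can be
sketched in isolation. This is a minimal illustration under stated
assumptions; store_fn, st_h_le, st_h_be, st_h_fns and select_store are
hypothetical names that do not exist in QEMU. The endianness flag is known
when the code is generated, so the helper is picked once from a table indexed
by that flag, and the helper itself contains no endianness branch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a store helper: writes one 16-bit element. */
typedef void store_fn(uint8_t *dst, uint16_t val);

/* Little-endian variant: low byte first, no runtime endian check. */
static void st_h_le(uint8_t *dst, uint16_t val)
{
    dst[0] = val & 0xff;
    dst[1] = val >> 8;
}

/* Big-endian variant: high byte first, no runtime endian check. */
static void st_h_be(uint8_t *dst, uint16_t val)
{
    dst[0] = val >> 8;
    dst[1] = val & 0xff;
}

/* Indexed by [big-endian?], the same shape as the leading [2] dimension
 * that the patch adds to its helper tables. */
static store_fn * const st_h_fns[2] = { st_h_le, st_h_be };

/* "Translation time": resolve the helper once from a flag known up front. */
static store_fn *select_store(int be)
{
    store_fn *fn = st_h_fns[be != 0];
    assert(fn != NULL);
    return fn;
}
```

Calling select_store(0)(buf, 0x1234) leaves the bytes 0x34, 0x12 in buf, and
select_store(1)(buf, 0x1234) leaves 0x12, 0x34. The cost is one extra table
dimension and twice as many helper symbols.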
target/arm/helper-sve.h | 48 +++++++++++++++++--------
target/arm/sve_helper.c | 11 ++++--
target/arm/translate-sve.c | 72 +++++++++++++++++++++++++++++---------
3 files changed, 96 insertions(+), 35 deletions(-)
diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index 526caec8da..1ad043101a 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -1248,29 +1248,47 @@ DEF_HELPER_FLAGS_4(sve_st2bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
DEF_HELPER_FLAGS_4(sve_st3bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
DEF_HELPER_FLAGS_4(sve_st4bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_st1hh_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_st2hh_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_st3hh_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_st4hh_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_st1hh_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_st2hh_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_st3hh_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_st4hh_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_st1ss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_st2ss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_st3ss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_st4ss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_st1hh_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_st2hh_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_st3hh_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_st4hh_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_st1dd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_st2dd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_st3dd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_st4dd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_st1ss_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_st2ss_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_st3ss_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_st4ss_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+
+DEF_HELPER_FLAGS_4(sve_st1ss_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_st2ss_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_st3ss_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_st4ss_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+
+DEF_HELPER_FLAGS_4(sve_st1dd_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_st2dd_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_st3dd_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_st4dd_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+
+DEF_HELPER_FLAGS_4(sve_st1dd_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_st2dd_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_st3dd_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_st4dd_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
DEF_HELPER_FLAGS_4(sve_st1bh_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
DEF_HELPER_FLAGS_4(sve_st1bs_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
DEF_HELPER_FLAGS_4(sve_st1bd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_st1hs_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_st1hd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_st1hs_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_st1hd_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_st1hs_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_st1hd_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
-DEF_HELPER_FLAGS_4(sve_st1sd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_st1sd_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
+DEF_HELPER_FLAGS_4(sve_st1sd_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
DEF_HELPER_FLAGS_6(sve_ldbsu_zsu, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 56e2f523c5..92c0e961a9 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -4940,12 +4940,17 @@ void __attribute__((flatten)) HELPER(sve_st##N##NAME##_r) \
}
#define DO_STN_2(N, NAME, ESIZE, MSIZE) \
-void __attribute__((flatten)) HELPER(sve_st##N##NAME##_r) \
+void __attribute__((flatten)) HELPER(sve_st##N##NAME##_le_r) \
(CPUARMState *env, void *vg, target_ulong addr, uint32_t desc) \
{ \
sve_st##N##_r(env, vg, addr, desc, GETPC(), ESIZE, MSIZE, \
- arm_cpu_data_is_big_endian(env) \
- ? sve_st1##NAME##_be_tlb : sve_st1##NAME##_le_tlb); \
+ sve_st1##NAME##_le_tlb); \
+} \
+void __attribute__((flatten)) HELPER(sve_st##N##NAME##_be_r) \
+ (CPUARMState *env, void *vg, target_ulong addr, uint32_t desc) \
+{ \
+ sve_st##N##_r(env, vg, addr, desc, GETPC(), ESIZE, MSIZE, \
+ sve_st1##NAME##_be_tlb); \
}
DO_STN_1(1, bb, 1)
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index de12c01e7d..acb85731f8 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -4953,32 +4953,70 @@ static bool trans_LD1R_zpri(DisasContext *s, arg_rpri_load *a, uint32_t insn)
static void do_st_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr,
int msz, int esz, int nreg)
{
- static gen_helper_gvec_mem * const fn_single[4][4] = {
- { gen_helper_sve_st1bb_r, gen_helper_sve_st1bh_r,
- gen_helper_sve_st1bs_r, gen_helper_sve_st1bd_r },
- { NULL, gen_helper_sve_st1hh_r,
- gen_helper_sve_st1hs_r, gen_helper_sve_st1hd_r },
- { NULL, NULL,
- gen_helper_sve_st1ss_r, gen_helper_sve_st1sd_r },
- { NULL, NULL, NULL, gen_helper_sve_st1dd_r },
+ static gen_helper_gvec_mem * const fn_single[2][4][4] = {
+ { { gen_helper_sve_st1bb_r,
+ gen_helper_sve_st1bh_r,
+ gen_helper_sve_st1bs_r,
+ gen_helper_sve_st1bd_r },
+ { NULL,
+ gen_helper_sve_st1hh_le_r,
+ gen_helper_sve_st1hs_le_r,
+ gen_helper_sve_st1hd_le_r },
+ { NULL, NULL,
+ gen_helper_sve_st1ss_le_r,
+ gen_helper_sve_st1sd_le_r },
+ { NULL, NULL, NULL,
+ gen_helper_sve_st1dd_le_r } },
+ { { gen_helper_sve_st1bb_r,
+ gen_helper_sve_st1bh_r,
+ gen_helper_sve_st1bs_r,
+ gen_helper_sve_st1bd_r },
+ { NULL,
+ gen_helper_sve_st1hh_be_r,
+ gen_helper_sve_st1hs_be_r,
+ gen_helper_sve_st1hd_be_r },
+ { NULL, NULL,
+ gen_helper_sve_st1ss_be_r,
+ gen_helper_sve_st1sd_be_r },
+ { NULL, NULL, NULL,
+ gen_helper_sve_st1dd_be_r } },
};
- static gen_helper_gvec_mem * const fn_multiple[3][4] = {
- { gen_helper_sve_st2bb_r, gen_helper_sve_st2hh_r,
- gen_helper_sve_st2ss_r, gen_helper_sve_st2dd_r },
- { gen_helper_sve_st3bb_r, gen_helper_sve_st3hh_r,
- gen_helper_sve_st3ss_r, gen_helper_sve_st3dd_r },
- { gen_helper_sve_st4bb_r, gen_helper_sve_st4hh_r,
- gen_helper_sve_st4ss_r, gen_helper_sve_st4dd_r },
+ static gen_helper_gvec_mem * const fn_multiple[2][3][4] = {
+ { { gen_helper_sve_st2bb_r,
+ gen_helper_sve_st2hh_le_r,
+ gen_helper_sve_st2ss_le_r,
+ gen_helper_sve_st2dd_le_r },
+ { gen_helper_sve_st3bb_r,
+ gen_helper_sve_st3hh_le_r,
+ gen_helper_sve_st3ss_le_r,
+ gen_helper_sve_st3dd_le_r },
+ { gen_helper_sve_st4bb_r,
+ gen_helper_sve_st4hh_le_r,
+ gen_helper_sve_st4ss_le_r,
+ gen_helper_sve_st4dd_le_r } },
+ { { gen_helper_sve_st2bb_r,
+ gen_helper_sve_st2hh_be_r,
+ gen_helper_sve_st2ss_be_r,
+ gen_helper_sve_st2dd_be_r },
+ { gen_helper_sve_st3bb_r,
+ gen_helper_sve_st3hh_be_r,
+ gen_helper_sve_st3ss_be_r,
+ gen_helper_sve_st3dd_be_r },
+ { gen_helper_sve_st4bb_r,
+ gen_helper_sve_st4hh_be_r,
+ gen_helper_sve_st4ss_be_r,
+ gen_helper_sve_st4dd_be_r } },
};
gen_helper_gvec_mem *fn;
+ int be = s->be_data == MO_BE;
if (nreg == 0) {
/* ST1 */
- fn = fn_single[msz][esz];
+ fn = fn_single[be][msz][esz];
} else {
/* ST2, ST3, ST4 -- msz == esz, enforced by encoding */
assert(msz == esz);
- fn = fn_multiple[nreg - 1][msz];
+ fn = fn_multiple[be][nreg - 1][msz];
}
assert(fn != NULL);
do_mem_zpa(s, zt, pg, addr, fn);
--
2.17.1
* [Qemu-devel] [PATCH 17/20] target/arm: Rewrite vector gather loads
2018-08-09 4:21 [Qemu-devel] [PATCH 00/20] target/arm: sve system mode patches Richard Henderson
` (15 preceding siblings ...)
2018-08-09 4:22 ` [Qemu-devel] [PATCH 16/20] target/arm: Split contiguous stores " Richard Henderson
@ 2018-08-09 4:22 ` Richard Henderson
2018-08-23 16:08 ` Peter Maydell
2018-08-09 4:22 ` [Qemu-devel] [PATCH 18/20] target/arm: Rewrite vector gather stores Richard Henderson
` (5 subsequent siblings)
22 siblings, 1 reply; 51+ messages in thread
From: Richard Henderson @ 2018-08-09 4:22 UTC (permalink / raw)
To: qemu-devel; +Cc: laurent.desnogues, peter.maydell, alex.bennee
This fixes the endianness problem for softmmu, and moves the
main loop out of a macro and into an inlined function.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
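Editorial note, not part of the patch: the restructuring described above
(loop in an inlined function, macro reduced to thin wrappers) can be sketched
as follows. All names here (load_fn, ld_h_le, ld_h_be, gather_h, DO_GATHER)
are hypothetical and greatly simplified; the real helpers take a CPU state,
a descriptor and a return address. The sketch assumes that inactive elements
are written as zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-element loader: read one 16-bit value from memory. */
typedef uint16_t load_fn(const uint8_t *p);

static inline uint16_t ld_h_le(const uint8_t *p)
{
    return p[0] | (p[1] << 8);
}

static inline uint16_t ld_h_be(const uint8_t *p)
{
    return (p[0] << 8) | p[1];
}

/* The gather loop is written once, as an ordinary inline function that
 * takes the element loader as a parameter. */
static inline void gather_h(uint16_t *dst, const uint8_t *base,
                            const uint32_t *offsets, const uint8_t *pred,
                            size_t n, load_fn *ld)
{
    for (size_t i = 0; i < n; i++) {
        /* Active elements are loaded; inactive elements become zero. */
        dst[i] = pred[i] ? ld(base + offsets[i]) : 0;
    }
}

/* The macro only stamps out thin entry points, one per endianness. */
#define DO_GATHER(NAME, LD)                                              \
static void NAME(uint16_t *dst, const uint8_t *base,                     \
                 const uint32_t *offsets, const uint8_t *pred, size_t n) \
{                                                                        \
    gather_h(dst, base, offsets, pred, n, LD);                           \
}

DO_GATHER(gather_h_le, ld_h_le)
DO_GATHER(gather_h_be, ld_h_be)
```

Because the loader is a compile-time constant at each expansion site, a
compiler is free to inline it into the loop, so this form should not cost
more than the all-macro version while being easier to read and to step
through in a debugger.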
target/arm/helper-sve.h | 84 +++++++++----
target/arm/sve_helper.c | 218 +++++++++++++++++++++++----------
target/arm/translate-sve.c | 244 +++++++++++++++++++++++++------------
3 files changed, 380 insertions(+), 166 deletions(-)
diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index 1ad043101a..49d1c09e30 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -1292,69 +1292,111 @@ DEF_HELPER_FLAGS_4(sve_st1sd_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
DEF_HELPER_FLAGS_6(sve_ldbsu_zsu, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldhsu_zsu, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldhsu_le_zsu, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldssu_zsu, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldhsu_be_zsu, TCG_CALL_NO_WG,
+ void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldss_le_zsu, TCG_CALL_NO_WG,
+ void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldss_be_zsu, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
DEF_HELPER_FLAGS_6(sve_ldbss_zsu, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldhss_zsu, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldhss_le_zsu, TCG_CALL_NO_WG,
+ void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldhss_be_zsu, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
DEF_HELPER_FLAGS_6(sve_ldbsu_zss, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldhsu_zss, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldhsu_le_zss, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldssu_zss, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldhsu_be_zss, TCG_CALL_NO_WG,
+ void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldss_le_zss, TCG_CALL_NO_WG,
+ void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldss_be_zss, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
DEF_HELPER_FLAGS_6(sve_ldbss_zss, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldhss_zss, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldhss_le_zss, TCG_CALL_NO_WG,
+ void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldhss_be_zss, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
DEF_HELPER_FLAGS_6(sve_ldbdu_zsu, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldhdu_zsu, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldhdu_le_zsu, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldsdu_zsu, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldhdu_be_zsu, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldddu_zsu, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldsdu_le_zsu, TCG_CALL_NO_WG,
+ void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldsdu_be_zsu, TCG_CALL_NO_WG,
+ void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_lddd_le_zsu, TCG_CALL_NO_WG,
+ void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_lddd_be_zsu, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
DEF_HELPER_FLAGS_6(sve_ldbds_zsu, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldhds_zsu, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldhds_le_zsu, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldsds_zsu, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldhds_be_zsu, TCG_CALL_NO_WG,
+ void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldsds_le_zsu, TCG_CALL_NO_WG,
+ void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldsds_be_zsu, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
DEF_HELPER_FLAGS_6(sve_ldbdu_zss, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldhdu_zss, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldhdu_le_zss, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldsdu_zss, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldhdu_be_zss, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldddu_zss, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldsdu_le_zss, TCG_CALL_NO_WG,
+ void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldsdu_be_zss, TCG_CALL_NO_WG,
+ void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_lddd_le_zss, TCG_CALL_NO_WG,
+ void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_lddd_be_zss, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
DEF_HELPER_FLAGS_6(sve_ldbds_zss, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldhds_zss, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldhds_le_zss, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldsds_zss, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldhds_be_zss, TCG_CALL_NO_WG,
+ void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldsds_le_zss, TCG_CALL_NO_WG,
+ void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldsds_be_zss, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
DEF_HELPER_FLAGS_6(sve_ldbdu_zd, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldhdu_zd, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldhdu_le_zd, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldsdu_zd, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldhdu_be_zd, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldddu_zd, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldsdu_le_zd, TCG_CALL_NO_WG,
+ void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldsdu_be_zd, TCG_CALL_NO_WG,
+ void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_lddd_le_zd, TCG_CALL_NO_WG,
+ void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_lddd_be_zd, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
DEF_HELPER_FLAGS_6(sve_ldbds_zd, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldhds_zd, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldhds_le_zd, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldsds_zd, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldhds_be_zd, TCG_CALL_NO_WG,
+ void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldsds_le_zd, TCG_CALL_NO_WG,
+ void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldsds_be_zd, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
DEF_HELPER_FLAGS_6(sve_ldffbsu_zsu, TCG_CALL_NO_WG,
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 92c0e961a9..76d3f021e4 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -4984,80 +4984,166 @@ DO_STN_2(4, dd, 8, 8)
/* Loads with a vector index. */
-#define DO_LD1_ZPZ_S(NAME, TYPEI, TYPEM, FN) \
-void HELPER(NAME)(CPUARMState *env, void *vd, void *vg, void *vm, \
- target_ulong base, uint32_t desc) \
-{ \
- intptr_t i, oprsz = simd_oprsz(desc); \
- unsigned scale = simd_data(desc); \
- uintptr_t ra = GETPC(); \
- for (i = 0; i < oprsz; ) { \
- uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3)); \
- do { \
- TYPEM m = 0; \
- if (pg & 1) { \
- target_ulong off = *(TYPEI *)(vm + H1_4(i)); \
- m = FN(env, base + (off << scale), ra); \
- } \
- *(uint32_t *)(vd + H1_4(i)) = m; \
- i += 4, pg >>= 4; \
- } while (i & 15); \
- } \
+typedef target_ulong zreg_off_fn(void *reg, intptr_t reg_ofs);
+
+static target_ulong off_zsu_s(void *reg, intptr_t reg_ofs)
+{
+ return *(uint32_t *)(reg + H1_4(reg_ofs));
}
-#define DO_LD1_ZPZ_D(NAME, TYPEI, TYPEM, FN) \
-void HELPER(NAME)(CPUARMState *env, void *vd, void *vg, void *vm, \
- target_ulong base, uint32_t desc) \
-{ \
- intptr_t i, oprsz = simd_oprsz(desc) / 8; \
- unsigned scale = simd_data(desc); \
- uintptr_t ra = GETPC(); \
- uint64_t *d = vd, *m = vm; uint8_t *pg = vg; \
- for (i = 0; i < oprsz; i++) { \
- TYPEM mm = 0; \
- if (pg[H1(i)] & 1) { \
- target_ulong off = (TYPEI)m[i]; \
- mm = FN(env, base + (off << scale), ra); \
- } \
- d[i] = mm; \
- } \
+static target_ulong off_zss_s(void *reg, intptr_t reg_ofs)
+{
+ return *(int32_t *)(reg + H1_4(reg_ofs));
}
-DO_LD1_ZPZ_S(sve_ldbsu_zsu, uint32_t, uint8_t, cpu_ldub_data_ra)
-DO_LD1_ZPZ_S(sve_ldhsu_zsu, uint32_t, uint16_t, cpu_lduw_data_ra)
-DO_LD1_ZPZ_S(sve_ldssu_zsu, uint32_t, uint32_t, cpu_ldl_data_ra)
-DO_LD1_ZPZ_S(sve_ldbss_zsu, uint32_t, int8_t, cpu_ldub_data_ra)
-DO_LD1_ZPZ_S(sve_ldhss_zsu, uint32_t, int16_t, cpu_lduw_data_ra)
+static target_ulong off_zsu_d(void *reg, intptr_t reg_ofs)
+{
+ return (uint32_t)*(uint64_t *)(reg + reg_ofs);
+}
-DO_LD1_ZPZ_S(sve_ldbsu_zss, int32_t, uint8_t, cpu_ldub_data_ra)
-DO_LD1_ZPZ_S(sve_ldhsu_zss, int32_t, uint16_t, cpu_lduw_data_ra)
-DO_LD1_ZPZ_S(sve_ldssu_zss, int32_t, uint32_t, cpu_ldl_data_ra)
-DO_LD1_ZPZ_S(sve_ldbss_zss, int32_t, int8_t, cpu_ldub_data_ra)
-DO_LD1_ZPZ_S(sve_ldhss_zss, int32_t, int16_t, cpu_lduw_data_ra)
+static target_ulong off_zss_d(void *reg, intptr_t reg_ofs)
+{
+ return (int32_t)*(uint64_t *)(reg + reg_ofs);
+}
-DO_LD1_ZPZ_D(sve_ldbdu_zsu, uint32_t, uint8_t, cpu_ldub_data_ra)
-DO_LD1_ZPZ_D(sve_ldhdu_zsu, uint32_t, uint16_t, cpu_lduw_data_ra)
-DO_LD1_ZPZ_D(sve_ldsdu_zsu, uint32_t, uint32_t, cpu_ldl_data_ra)
-DO_LD1_ZPZ_D(sve_ldddu_zsu, uint32_t, uint64_t, cpu_ldq_data_ra)
-DO_LD1_ZPZ_D(sve_ldbds_zsu, uint32_t, int8_t, cpu_ldub_data_ra)
-DO_LD1_ZPZ_D(sve_ldhds_zsu, uint32_t, int16_t, cpu_lduw_data_ra)
-DO_LD1_ZPZ_D(sve_ldsds_zsu, uint32_t, int32_t, cpu_ldl_data_ra)
+static target_ulong off_zd_d(void *reg, intptr_t reg_ofs)
+{
+ return *(uint64_t *)(reg + reg_ofs);
+}
-DO_LD1_ZPZ_D(sve_ldbdu_zss, int32_t, uint8_t, cpu_ldub_data_ra)
-DO_LD1_ZPZ_D(sve_ldhdu_zss, int32_t, uint16_t, cpu_lduw_data_ra)
-DO_LD1_ZPZ_D(sve_ldsdu_zss, int32_t, uint32_t, cpu_ldl_data_ra)
-DO_LD1_ZPZ_D(sve_ldddu_zss, int32_t, uint64_t, cpu_ldq_data_ra)
-DO_LD1_ZPZ_D(sve_ldbds_zss, int32_t, int8_t, cpu_ldub_data_ra)
-DO_LD1_ZPZ_D(sve_ldhds_zss, int32_t, int16_t, cpu_lduw_data_ra)
-DO_LD1_ZPZ_D(sve_ldsds_zss, int32_t, int32_t, cpu_ldl_data_ra)
+static void sve_ld1_zs(CPUARMState *env, void *vd, void *vg, void *vm,
+ target_ulong base, uint32_t desc, uintptr_t ra,
+ zreg_off_fn *off_fn, sve_ld1_tlb_fn *tlb_fn)
+{
+ const int mmu_idx = cpu_mmu_index(env, false);
+ intptr_t i, oprsz = simd_oprsz(desc);
+ unsigned scale = simd_data(desc);
+ ARMVectorReg scratch = { };
-DO_LD1_ZPZ_D(sve_ldbdu_zd, uint64_t, uint8_t, cpu_ldub_data_ra)
-DO_LD1_ZPZ_D(sve_ldhdu_zd, uint64_t, uint16_t, cpu_lduw_data_ra)
-DO_LD1_ZPZ_D(sve_ldsdu_zd, uint64_t, uint32_t, cpu_ldl_data_ra)
-DO_LD1_ZPZ_D(sve_ldddu_zd, uint64_t, uint64_t, cpu_ldq_data_ra)
-DO_LD1_ZPZ_D(sve_ldbds_zd, uint64_t, int8_t, cpu_ldub_data_ra)
-DO_LD1_ZPZ_D(sve_ldhds_zd, uint64_t, int16_t, cpu_lduw_data_ra)
-DO_LD1_ZPZ_D(sve_ldsds_zd, uint64_t, int32_t, cpu_ldl_data_ra)
+ set_helper_retaddr(ra);
+ for (i = 0; i < oprsz; ) {
+ uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));
+ do {
+ if (pg & 1) {
+ target_ulong off = off_fn(vm, i);
+ tlb_fn(env, &scratch, i, base + (off << scale), mmu_idx, ra);
+ }
+ i += 4, pg >>= 4;
+ } while (i & 15);
+ }
+ set_helper_retaddr(0);
+
+ /* Wait until all exceptions have been raised to write back. */
+ memcpy(vd, &scratch, oprsz);
+}
+
+static void sve_ld1_zd(CPUARMState *env, void *vd, void *vg, void *vm,
+ target_ulong base, uint32_t desc, uintptr_t ra,
+ zreg_off_fn *off_fn, sve_ld1_tlb_fn *tlb_fn)
+{
+ const int mmu_idx = cpu_mmu_index(env, false);
+ intptr_t i, oprsz = simd_oprsz(desc) / 8;
+ unsigned scale = simd_data(desc);
+ ARMVectorReg scratch = { };
+
+ set_helper_retaddr(ra);
+ for (i = 0; i < oprsz; i++) {
+ uint8_t pg = *(uint8_t *)(vg + H1(i));
+ if (pg & 1) {
+ target_ulong off = off_fn(vm, i * 8);
+ tlb_fn(env, &scratch, i * 8, base + (off << scale), mmu_idx, ra);
+ }
+ }
+ set_helper_retaddr(0);
+
+ /* Wait until all exceptions have been raised to write back. */
+ memcpy(vd, &scratch, oprsz * 8);
+}
+
+#define DO_LD1_ZPZ_S(MEM, OFS) \
+void __attribute__((flatten)) HELPER(sve_ld##MEM##_##OFS) \
+ (CPUARMState *env, void *vd, void *vg, void *vm, \
+ target_ulong base, uint32_t desc) \
+{ \
+ sve_ld1_zs(env, vd, vg, vm, base, desc, GETPC(), \
+ off_##OFS##_s, sve_ld1##MEM##_tlb); \
+}
+
+#define DO_LD1_ZPZ_D(MEM, OFS) \
+void __attribute__((flatten)) HELPER(sve_ld##MEM##_##OFS) \
+ (CPUARMState *env, void *vd, void *vg, void *vm, \
+ target_ulong base, uint32_t desc) \
+{ \
+ sve_ld1_zd(env, vd, vg, vm, base, desc, GETPC(), \
+ off_##OFS##_d, sve_ld1##MEM##_tlb); \
+}
+
+DO_LD1_ZPZ_S(bsu, zsu)
+DO_LD1_ZPZ_S(bsu, zss)
+DO_LD1_ZPZ_D(bdu, zsu)
+DO_LD1_ZPZ_D(bdu, zss)
+DO_LD1_ZPZ_D(bdu, zd)
+
+DO_LD1_ZPZ_S(bss, zsu)
+DO_LD1_ZPZ_S(bss, zss)
+DO_LD1_ZPZ_D(bds, zsu)
+DO_LD1_ZPZ_D(bds, zss)
+DO_LD1_ZPZ_D(bds, zd)
+
+DO_LD1_ZPZ_S(hsu_le, zsu)
+DO_LD1_ZPZ_S(hsu_le, zss)
+DO_LD1_ZPZ_D(hdu_le, zsu)
+DO_LD1_ZPZ_D(hdu_le, zss)
+DO_LD1_ZPZ_D(hdu_le, zd)
+
+DO_LD1_ZPZ_S(hsu_be, zsu)
+DO_LD1_ZPZ_S(hsu_be, zss)
+DO_LD1_ZPZ_D(hdu_be, zsu)
+DO_LD1_ZPZ_D(hdu_be, zss)
+DO_LD1_ZPZ_D(hdu_be, zd)
+
+DO_LD1_ZPZ_S(hss_le, zsu)
+DO_LD1_ZPZ_S(hss_le, zss)
+DO_LD1_ZPZ_D(hds_le, zsu)
+DO_LD1_ZPZ_D(hds_le, zss)
+DO_LD1_ZPZ_D(hds_le, zd)
+
+DO_LD1_ZPZ_S(hss_be, zsu)
+DO_LD1_ZPZ_S(hss_be, zss)
+DO_LD1_ZPZ_D(hds_be, zsu)
+DO_LD1_ZPZ_D(hds_be, zss)
+DO_LD1_ZPZ_D(hds_be, zd)
+
+DO_LD1_ZPZ_S(ss_le, zsu)
+DO_LD1_ZPZ_S(ss_le, zss)
+DO_LD1_ZPZ_D(sdu_le, zsu)
+DO_LD1_ZPZ_D(sdu_le, zss)
+DO_LD1_ZPZ_D(sdu_le, zd)
+
+DO_LD1_ZPZ_S(ss_be, zsu)
+DO_LD1_ZPZ_S(ss_be, zss)
+DO_LD1_ZPZ_D(sdu_be, zsu)
+DO_LD1_ZPZ_D(sdu_be, zss)
+DO_LD1_ZPZ_D(sdu_be, zd)
+
+DO_LD1_ZPZ_D(sds_le, zsu)
+DO_LD1_ZPZ_D(sds_le, zss)
+DO_LD1_ZPZ_D(sds_le, zd)
+
+DO_LD1_ZPZ_D(sds_be, zsu)
+DO_LD1_ZPZ_D(sds_be, zss)
+DO_LD1_ZPZ_D(sds_be, zd)
+
+DO_LD1_ZPZ_D(dd_le, zsu)
+DO_LD1_ZPZ_D(dd_le, zss)
+DO_LD1_ZPZ_D(dd_le, zd)
+
+DO_LD1_ZPZ_D(dd_be, zsu)
+DO_LD1_ZPZ_D(dd_be, zss)
+DO_LD1_ZPZ_D(dd_be, zd)
+
+#undef DO_LD1_ZPZ_S
+#undef DO_LD1_ZPZ_D
/* First fault loads with a vector index. */
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index acb85731f8..d4d7e9d3ae 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -5077,91 +5077,176 @@ static void do_mem_zpz(DisasContext *s, int zt, int pg, int zm, int scale,
tcg_temp_free_i32(desc);
}
-/* Indexed by [ff][xs][u][msz]. */
-static gen_helper_gvec_mem_scatter * const gather_load_fn32[2][2][2][3] = {
- { { { gen_helper_sve_ldbss_zsu,
- gen_helper_sve_ldhss_zsu,
- NULL, },
- { gen_helper_sve_ldbsu_zsu,
- gen_helper_sve_ldhsu_zsu,
- gen_helper_sve_ldssu_zsu, } },
- { { gen_helper_sve_ldbss_zss,
- gen_helper_sve_ldhss_zss,
- NULL, },
- { gen_helper_sve_ldbsu_zss,
- gen_helper_sve_ldhsu_zss,
- gen_helper_sve_ldssu_zss, } } },
+/* Indexed by [be][ff][xs][u][msz]. */
+static gen_helper_gvec_mem_scatter * const gather_load_fn32[2][2][2][2][3] = {
+ /* Little-endian */
+ { { { { gen_helper_sve_ldbss_zsu,
+ gen_helper_sve_ldhss_le_zsu,
+ NULL, },
+ { gen_helper_sve_ldbsu_zsu,
+ gen_helper_sve_ldhsu_le_zsu,
+ gen_helper_sve_ldss_le_zsu, } },
+ { { gen_helper_sve_ldbss_zss,
+ gen_helper_sve_ldhss_le_zss,
+ NULL, },
+ { gen_helper_sve_ldbsu_zss,
+ gen_helper_sve_ldhsu_le_zss,
+ gen_helper_sve_ldss_le_zss, } } },
- { { { gen_helper_sve_ldffbss_zsu,
- gen_helper_sve_ldffhss_zsu,
- NULL, },
- { gen_helper_sve_ldffbsu_zsu,
- gen_helper_sve_ldffhsu_zsu,
- gen_helper_sve_ldffssu_zsu, } },
- { { gen_helper_sve_ldffbss_zss,
- gen_helper_sve_ldffhss_zss,
- NULL, },
- { gen_helper_sve_ldffbsu_zss,
- gen_helper_sve_ldffhsu_zss,
- gen_helper_sve_ldffssu_zss, } } }
+ /* First-fault */
+ { { { gen_helper_sve_ldffbss_zsu,
+ gen_helper_sve_ldffhss_zsu,
+ NULL, },
+ { gen_helper_sve_ldffbsu_zsu,
+ gen_helper_sve_ldffhsu_zsu,
+ gen_helper_sve_ldffssu_zsu, } },
+ { { gen_helper_sve_ldffbss_zss,
+ gen_helper_sve_ldffhss_zss,
+ NULL, },
+ { gen_helper_sve_ldffbsu_zss,
+ gen_helper_sve_ldffhsu_zss,
+ gen_helper_sve_ldffssu_zss, } } } },
+
+ /* Big-endian */
+ { { { { gen_helper_sve_ldbss_zsu,
+ gen_helper_sve_ldhss_be_zsu,
+ NULL, },
+ { gen_helper_sve_ldbsu_zsu,
+ gen_helper_sve_ldhsu_be_zsu,
+ gen_helper_sve_ldss_be_zsu, } },
+ { { gen_helper_sve_ldbss_zss,
+ gen_helper_sve_ldhss_be_zss,
+ NULL, },
+ { gen_helper_sve_ldbsu_zss,
+ gen_helper_sve_ldhsu_be_zss,
+ gen_helper_sve_ldss_be_zss, } } },
+
+ /* First-fault */
+ { { { gen_helper_sve_ldffbss_zsu,
+ gen_helper_sve_ldffhss_zsu,
+ NULL, },
+ { gen_helper_sve_ldffbsu_zsu,
+ gen_helper_sve_ldffhsu_zsu,
+ gen_helper_sve_ldffssu_zsu, } },
+ { { gen_helper_sve_ldffbss_zss,
+ gen_helper_sve_ldffhss_zss,
+ NULL, },
+ { gen_helper_sve_ldffbsu_zss,
+ gen_helper_sve_ldffhsu_zss,
+ gen_helper_sve_ldffssu_zss, } } } },
};
/* Note that we overload xs=2 to indicate 64-bit offset. */
-static gen_helper_gvec_mem_scatter * const gather_load_fn64[2][3][2][4] = {
- { { { gen_helper_sve_ldbds_zsu,
- gen_helper_sve_ldhds_zsu,
- gen_helper_sve_ldsds_zsu,
- NULL, },
- { gen_helper_sve_ldbdu_zsu,
- gen_helper_sve_ldhdu_zsu,
- gen_helper_sve_ldsdu_zsu,
- gen_helper_sve_ldddu_zsu, } },
- { { gen_helper_sve_ldbds_zss,
- gen_helper_sve_ldhds_zss,
- gen_helper_sve_ldsds_zss,
- NULL, },
- { gen_helper_sve_ldbdu_zss,
- gen_helper_sve_ldhdu_zss,
- gen_helper_sve_ldsdu_zss,
- gen_helper_sve_ldddu_zss, } },
- { { gen_helper_sve_ldbds_zd,
- gen_helper_sve_ldhds_zd,
- gen_helper_sve_ldsds_zd,
- NULL, },
- { gen_helper_sve_ldbdu_zd,
- gen_helper_sve_ldhdu_zd,
- gen_helper_sve_ldsdu_zd,
- gen_helper_sve_ldddu_zd, } } },
+static gen_helper_gvec_mem_scatter * const gather_load_fn64[2][2][3][2][4] = {
+ /* Little-endian */
+ { { { { gen_helper_sve_ldbds_zsu,
+ gen_helper_sve_ldhds_le_zsu,
+ gen_helper_sve_ldsds_le_zsu,
+ NULL, },
+ { gen_helper_sve_ldbdu_zsu,
+ gen_helper_sve_ldhdu_le_zsu,
+ gen_helper_sve_ldsdu_le_zsu,
+ gen_helper_sve_lddd_le_zsu, } },
+ { { gen_helper_sve_ldbds_zss,
+ gen_helper_sve_ldhds_le_zss,
+ gen_helper_sve_ldsds_le_zss,
+ NULL, },
+ { gen_helper_sve_ldbdu_zss,
+ gen_helper_sve_ldhdu_le_zss,
+ gen_helper_sve_ldsdu_le_zss,
+ gen_helper_sve_lddd_le_zss, } },
+ { { gen_helper_sve_ldbds_zd,
+ gen_helper_sve_ldhds_le_zd,
+ gen_helper_sve_ldsds_le_zd,
+ NULL, },
+ { gen_helper_sve_ldbdu_zd,
+ gen_helper_sve_ldhdu_le_zd,
+ gen_helper_sve_ldsdu_le_zd,
+ gen_helper_sve_lddd_le_zd, } } },
- { { { gen_helper_sve_ldffbds_zsu,
- gen_helper_sve_ldffhds_zsu,
- gen_helper_sve_ldffsds_zsu,
- NULL, },
- { gen_helper_sve_ldffbdu_zsu,
- gen_helper_sve_ldffhdu_zsu,
- gen_helper_sve_ldffsdu_zsu,
- gen_helper_sve_ldffddu_zsu, } },
- { { gen_helper_sve_ldffbds_zss,
- gen_helper_sve_ldffhds_zss,
- gen_helper_sve_ldffsds_zss,
- NULL, },
- { gen_helper_sve_ldffbdu_zss,
- gen_helper_sve_ldffhdu_zss,
- gen_helper_sve_ldffsdu_zss,
- gen_helper_sve_ldffddu_zss, } },
- { { gen_helper_sve_ldffbds_zd,
- gen_helper_sve_ldffhds_zd,
- gen_helper_sve_ldffsds_zd,
- NULL, },
- { gen_helper_sve_ldffbdu_zd,
- gen_helper_sve_ldffhdu_zd,
- gen_helper_sve_ldffsdu_zd,
- gen_helper_sve_ldffddu_zd, } } }
+ /* First-fault */
+ { { { gen_helper_sve_ldffbds_zsu,
+ gen_helper_sve_ldffhds_zsu,
+ gen_helper_sve_ldffsds_zsu,
+ NULL, },
+ { gen_helper_sve_ldffbdu_zsu,
+ gen_helper_sve_ldffhdu_zsu,
+ gen_helper_sve_ldffsdu_zsu,
+ gen_helper_sve_ldffddu_zsu, } },
+ { { gen_helper_sve_ldffbds_zss,
+ gen_helper_sve_ldffhds_zss,
+ gen_helper_sve_ldffsds_zss,
+ NULL, },
+ { gen_helper_sve_ldffbdu_zss,
+ gen_helper_sve_ldffhdu_zss,
+ gen_helper_sve_ldffsdu_zss,
+ gen_helper_sve_ldffddu_zss, } },
+ { { gen_helper_sve_ldffbds_zd,
+ gen_helper_sve_ldffhds_zd,
+ gen_helper_sve_ldffsds_zd,
+ NULL, },
+ { gen_helper_sve_ldffbdu_zd,
+ gen_helper_sve_ldffhdu_zd,
+ gen_helper_sve_ldffsdu_zd,
+ gen_helper_sve_ldffddu_zd, } } } },
+
+ /* Big-endian */
+ { { { { gen_helper_sve_ldbds_zsu,
+ gen_helper_sve_ldhds_be_zsu,
+ gen_helper_sve_ldsds_be_zsu,
+ NULL, },
+ { gen_helper_sve_ldbdu_zsu,
+ gen_helper_sve_ldhdu_be_zsu,
+ gen_helper_sve_ldsdu_be_zsu,
+ gen_helper_sve_lddd_be_zsu, } },
+ { { gen_helper_sve_ldbds_zss,
+ gen_helper_sve_ldhds_be_zss,
+ gen_helper_sve_ldsds_be_zss,
+ NULL, },
+ { gen_helper_sve_ldbdu_zss,
+ gen_helper_sve_ldhdu_be_zss,
+ gen_helper_sve_ldsdu_be_zss,
+ gen_helper_sve_lddd_be_zss, } },
+ { { gen_helper_sve_ldbds_zd,
+ gen_helper_sve_ldhds_be_zd,
+ gen_helper_sve_ldsds_be_zd,
+ NULL, },
+ { gen_helper_sve_ldbdu_zd,
+ gen_helper_sve_ldhdu_be_zd,
+ gen_helper_sve_ldsdu_be_zd,
+ gen_helper_sve_lddd_be_zd, } } },
+
+ /* First-fault */
+ { { { gen_helper_sve_ldffbds_zsu,
+ gen_helper_sve_ldffhds_zsu,
+ gen_helper_sve_ldffsds_zsu,
+ NULL, },
+ { gen_helper_sve_ldffbdu_zsu,
+ gen_helper_sve_ldffhdu_zsu,
+ gen_helper_sve_ldffsdu_zsu,
+ gen_helper_sve_ldffddu_zsu, } },
+ { { gen_helper_sve_ldffbds_zss,
+ gen_helper_sve_ldffhds_zss,
+ gen_helper_sve_ldffsds_zss,
+ NULL, },
+ { gen_helper_sve_ldffbdu_zss,
+ gen_helper_sve_ldffhdu_zss,
+ gen_helper_sve_ldffsdu_zss,
+ gen_helper_sve_ldffddu_zss, } },
+ { { gen_helper_sve_ldffbds_zd,
+ gen_helper_sve_ldffhds_zd,
+ gen_helper_sve_ldffsds_zd,
+ NULL, },
+ { gen_helper_sve_ldffbdu_zd,
+ gen_helper_sve_ldffhdu_zd,
+ gen_helper_sve_ldffsdu_zd,
+ gen_helper_sve_ldffddu_zd, } } } },
};
static bool trans_LD1_zprz(DisasContext *s, arg_LD1_zprz *a, uint32_t insn)
{
gen_helper_gvec_mem_scatter *fn = NULL;
+ int be = s->be_data == MO_BE;
if (!sve_access_check(s)) {
return true;
@@ -5169,10 +5254,10 @@ static bool trans_LD1_zprz(DisasContext *s, arg_LD1_zprz *a, uint32_t insn)
switch (a->esz) {
case MO_32:
- fn = gather_load_fn32[a->ff][a->xs][a->u][a->msz];
+ fn = gather_load_fn32[be][a->ff][a->xs][a->u][a->msz];
break;
case MO_64:
- fn = gather_load_fn64[a->ff][a->xs][a->u][a->msz];
+ fn = gather_load_fn64[be][a->ff][a->xs][a->u][a->msz];
break;
}
assert(fn != NULL);
@@ -5185,6 +5270,7 @@ static bool trans_LD1_zprz(DisasContext *s, arg_LD1_zprz *a, uint32_t insn)
static bool trans_LD1_zpiz(DisasContext *s, arg_LD1_zpiz *a, uint32_t insn)
{
gen_helper_gvec_mem_scatter *fn = NULL;
+ int be = s->be_data == MO_BE;
TCGv_i64 imm;
if (a->esz < a->msz || (a->esz == a->msz && !a->u)) {
@@ -5196,10 +5282,10 @@ static bool trans_LD1_zpiz(DisasContext *s, arg_LD1_zpiz *a, uint32_t insn)
switch (a->esz) {
case MO_32:
- fn = gather_load_fn32[a->ff][0][a->u][a->msz];
+ fn = gather_load_fn32[be][a->ff][0][a->u][a->msz];
break;
case MO_64:
- fn = gather_load_fn64[a->ff][2][a->u][a->msz];
+ fn = gather_load_fn64[be][a->ff][2][a->u][a->msz];
break;
}
assert(fn != NULL);
--
2.17.1
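
For readers following the sve_helper.c hunk above, here is a minimal
standalone sketch of the same shape: one shared loop, an offset-extraction
callback, a per-element load callback, and a scratch register that is only
copied to the destination once every element has loaded. Everything in it
is a hypothetical stand-in and not QEMU code: the names gather_ld1_s,
off_u32, off_s32 and load_u8_to_u32, the flat mem[] array used as guest
memory, the -1 return that models a fault (QEMU unwinds out of the helper
instead), and the one-byte-per-element predicate (real SVE predicates carry
one bit per vector byte).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* All names below are hypothetical stand-ins, not QEMU APIs. */
enum { VEC_BYTES = 16 };    /* one 128-bit vector of 32-bit elements */

typedef uint64_t off_fn(const void *vm, size_t reg_ofs);
typedef int load_fn(void *vd, size_t reg_ofs,
                    const uint8_t *mem, size_t mem_len, uint64_t addr);

/* Zero-extend a 32-bit offset element (the "zsu" flavour). */
static uint64_t off_u32(const void *vm, size_t reg_ofs)
{
    uint32_t v;
    memcpy(&v, (const uint8_t *)vm + reg_ofs, sizeof(v));
    return v;
}

/* Sign-extend a 32-bit offset element (the "zss" flavour). */
static uint64_t off_s32(const void *vm, size_t reg_ofs)
{
    int32_t v;
    memcpy(&v, (const uint8_t *)vm + reg_ofs, sizeof(v));
    return (uint64_t)(int64_t)v;
}

/* Load one byte, zero-extended to 32 bits; returning -1 models a fault. */
static int load_u8_to_u32(void *vd, size_t reg_ofs,
                          const uint8_t *mem, size_t mem_len, uint64_t addr)
{
    uint32_t v;
    if (addr >= mem_len) {
        return -1;
    }
    v = mem[addr];
    memcpy((uint8_t *)vd + reg_ofs, &v, sizeof(v));
    return 0;
}

/* Shared loop: inactive elements read as zero, active ones are gathered. */
static int gather_ld1_s(void *vd, const uint8_t *pred, const void *vm,
                        const uint8_t *mem, size_t mem_len,
                        uint64_t base, unsigned scale,
                        off_fn *off, load_fn *load)
{
    uint8_t scratch[VEC_BYTES] = { 0 };
    size_t i;

    assert(off != NULL && load != NULL);
    for (i = 0; i < VEC_BYTES; i += 4) {
        if (pred[i / 4]) {
            uint64_t addr = base + (off(vm, i) << scale);
            if (load(scratch, i, mem, mem_len, addr) < 0) {
                return -1;  /* fault: vd has not been modified */
            }
        }
    }
    /* Write back only after every element has loaded without a fault. */
    memcpy(vd, scratch, VEC_BYTES);
    return 0;
}
```

The point of the scratch copy is that a fault part-way through the vector
leaves the destination register exactly as it was, so the instruction can
be restarted cleanly. Passing the callbacks as constants from a thin
wrapper is what lets the compiler specialise the loop per helper, which is
presumably why the wrappers in the patch are marked flatten.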
* [Qemu-devel] [PATCH 18/20] target/arm: Rewrite vector gather stores
2018-08-09 4:21 [Qemu-devel] [PATCH 00/20] target/arm: sve system mode patches Richard Henderson
` (16 preceding siblings ...)
2018-08-09 4:22 ` [Qemu-devel] [PATCH 17/20] target/arm: Rewrite vector gather loads Richard Henderson
@ 2018-08-09 4:22 ` Richard Henderson
2018-08-23 16:09 ` Peter Maydell
2018-08-09 4:22 ` [Qemu-devel] [PATCH 19/20] target/arm: Rewrite vector gather first-fault loads Richard Henderson
` (4 subsequent siblings)
22 siblings, 1 reply; 51+ messages in thread
From: Richard Henderson @ 2018-08-09 4:22 UTC (permalink / raw)
To: qemu-devel; +Cc: laurent.desnogues, peter.maydell, alex.bennee
This fixes the endianness problem for softmmu, and moves
the main loop out of a macro and into an inlined function.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
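
As a reading aid for the translate-sve.c tables below, a minimal standalone
sketch of the dispatch idea: endianness is resolved once, when the helper
is chosen, by adding a leading [be] index to the table, and each helper
then performs a store with a fixed byte order. Byte stores have no
endianness, so both halves of the table share one entry for them. The names
store_tab, scatter_st1_s and st_* are hypothetical, the flat mem[] array
stands in for guest memory, and the predicate is simplified to one byte per
element; none of this is QEMU code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names below are hypothetical stand-ins, not QEMU APIs. */
typedef void store_fn(uint8_t *mem, uint32_t val);

static void st_b(uint8_t *mem, uint32_t val)
{
    mem[0] = (uint8_t)val;
}

static void st_h_le(uint8_t *mem, uint32_t val)
{
    mem[0] = (uint8_t)val;
    mem[1] = (uint8_t)(val >> 8);
}

static void st_h_be(uint8_t *mem, uint32_t val)
{
    mem[0] = (uint8_t)(val >> 8);
    mem[1] = (uint8_t)val;
}

static void st_s_le(uint8_t *mem, uint32_t val)
{
    mem[0] = (uint8_t)val;
    mem[1] = (uint8_t)(val >> 8);
    mem[2] = (uint8_t)(val >> 16);
    mem[3] = (uint8_t)(val >> 24);
}

static void st_s_be(uint8_t *mem, uint32_t val)
{
    mem[0] = (uint8_t)(val >> 24);
    mem[1] = (uint8_t)(val >> 16);
    mem[2] = (uint8_t)(val >> 8);
    mem[3] = (uint8_t)val;
}

/* Indexed by [be][msz], where msz is log2 of the memory access size. */
static store_fn * const store_tab[2][3] = {
    /* Little-endian */
    { st_b, st_h_le, st_s_le },
    /* Big-endian */
    { st_b, st_h_be, st_s_be },
};

/* Scatter the active 32-bit elements of vd to mem at byte offsets idx[]. */
static void scatter_st1_s(const uint32_t *vd, const uint8_t *pred,
                          const uint32_t *idx, size_t n,
                          uint8_t *mem, size_t mem_len, int be, int msz)
{
    store_fn *fn;
    size_t i;

    assert(be == 0 || be == 1);
    assert(msz >= 0 && msz <= 2);
    fn = store_tab[be][msz];    /* chosen once, outside the loop */

    for (i = 0; i < n; i++) {
        if (pred[i]) {
            assert((size_t)idx[i] + ((size_t)1 << msz) <= mem_len);
            fn(mem + idx[i], vd[i]);
        }
    }
}
```

Selecting the function pointer up front keeps the per-element loop free of
byte-order tests, which mirrors how the translator computes be from
s->be_data once per instruction.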
target/arm/helper-sve.h | 52 ++++++++++----
target/arm/sve_helper.c | 139 ++++++++++++++++++++++++-------------
target/arm/translate-sve.c | 74 +++++++++++++-------
3 files changed, 177 insertions(+), 88 deletions(-)
diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index 49d1c09e30..6b9b93af45 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -1468,41 +1468,67 @@ DEF_HELPER_FLAGS_6(sve_ldffsds_zd, TCG_CALL_NO_WG,
DEF_HELPER_FLAGS_6(sve_stbs_zsu, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_sths_zsu, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_sths_le_zsu, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_stss_zsu, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_sths_be_zsu, TCG_CALL_NO_WG,
+ void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_stss_le_zsu, TCG_CALL_NO_WG,
+ void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_stss_be_zsu, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
DEF_HELPER_FLAGS_6(sve_stbs_zss, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_sths_zss, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_sths_le_zss, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_stss_zss, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_sths_be_zss, TCG_CALL_NO_WG,
+ void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_stss_le_zss, TCG_CALL_NO_WG,
+ void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_stss_be_zss, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
DEF_HELPER_FLAGS_6(sve_stbd_zsu, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_sthd_zsu, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_sthd_le_zsu, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_stsd_zsu, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_sthd_be_zsu, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_stdd_zsu, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_stsd_le_zsu, TCG_CALL_NO_WG,
+ void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_stsd_be_zsu, TCG_CALL_NO_WG,
+ void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_stdd_le_zsu, TCG_CALL_NO_WG,
+ void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_stdd_be_zsu, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
DEF_HELPER_FLAGS_6(sve_stbd_zss, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_sthd_zss, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_sthd_le_zss, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_stsd_zss, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_sthd_be_zss, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_stdd_zss, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_stsd_le_zss, TCG_CALL_NO_WG,
+ void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_stsd_be_zss, TCG_CALL_NO_WG,
+ void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_stdd_le_zss, TCG_CALL_NO_WG,
+ void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_stdd_be_zss, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
DEF_HELPER_FLAGS_6(sve_stbd_zd, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_sthd_zd, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_sthd_le_zd, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_stsd_zd, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_sthd_be_zd, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_stdd_zd, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_stsd_le_zd, TCG_CALL_NO_WG,
+ void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_stsd_be_zd, TCG_CALL_NO_WG,
+ void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_stdd_le_zd, TCG_CALL_NO_WG,
+ void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_stdd_be_zd, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 76d3f021e4..0a4756bff9 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -5235,61 +5235,100 @@ DO_LDFF1_ZPZ_D(sve_ldffsds_zd, uint64_t, int32_t, cpu_ldl_data_ra)
/* Stores with a vector index. */
-#define DO_ST1_ZPZ_S(NAME, TYPEI, FN) \
-void HELPER(NAME)(CPUARMState *env, void *vd, void *vg, void *vm, \
- target_ulong base, uint32_t desc) \
-{ \
- intptr_t i, oprsz = simd_oprsz(desc); \
- unsigned scale = simd_data(desc); \
- uintptr_t ra = GETPC(); \
- for (i = 0; i < oprsz; ) { \
- uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3)); \
- do { \
- if (likely(pg & 1)) { \
- target_ulong off = *(TYPEI *)(vm + H1_4(i)); \
- uint32_t d = *(uint32_t *)(vd + H1_4(i)); \
- FN(env, base + (off << scale), d, ra); \
- } \
- i += sizeof(uint32_t), pg >>= sizeof(uint32_t); \
- } while (i & 15); \
- } \
+static void sve_st1_zs(CPUARMState *env, void *vd, void *vg, void *vm,
+ target_ulong base, uint32_t desc, uintptr_t ra,
+ zreg_off_fn *off_fn, sve_ld1_tlb_fn *tlb_fn)
+{
+ const int mmu_idx = cpu_mmu_index(env, false);
+ intptr_t i, oprsz = simd_oprsz(desc);
+ unsigned scale = simd_data(desc);
+
+ set_helper_retaddr(ra);
+ for (i = 0; i < oprsz; ) {
+ uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));
+ do {
+ if (pg & 1) {
+ target_ulong off = off_fn(vm, i);
+ tlb_fn(env, vd, i, base + (off << scale), mmu_idx, ra);
+ }
+ i += 4, pg >>= 4;
+ } while (i & 15);
+ }
+ set_helper_retaddr(0);
}
-#define DO_ST1_ZPZ_D(NAME, TYPEI, FN) \
-void HELPER(NAME)(CPUARMState *env, void *vd, void *vg, void *vm, \
- target_ulong base, uint32_t desc) \
-{ \
- intptr_t i, oprsz = simd_oprsz(desc) / 8; \
- unsigned scale = simd_data(desc); \
- uintptr_t ra = GETPC(); \
- uint64_t *d = vd, *m = vm; uint8_t *pg = vg; \
- for (i = 0; i < oprsz; i++) { \
- if (likely(pg[H1(i)] & 1)) { \
- target_ulong off = (target_ulong)(TYPEI)m[i] << scale; \
- FN(env, base + off, d[i], ra); \
- } \
- } \
+static void sve_st1_zd(CPUARMState *env, void *vd, void *vg, void *vm,
+ target_ulong base, uint32_t desc, uintptr_t ra,
+ zreg_off_fn *off_fn, sve_ld1_tlb_fn *tlb_fn)
+{
+ const int mmu_idx = cpu_mmu_index(env, false);
+ intptr_t i, oprsz = simd_oprsz(desc) / 8;
+ unsigned scale = simd_data(desc);
+
+ set_helper_retaddr(ra);
+ for (i = 0; i < oprsz; i++) {
+ uint8_t pg = *(uint8_t *)(vg + H1(i));
+ if (pg & 1) {
+ target_ulong off = off_fn(vm, i * 8);
+ tlb_fn(env, vd, i * 8, base + (off << scale), mmu_idx, ra);
+ }
+ }
+ set_helper_retaddr(0);
}
-DO_ST1_ZPZ_S(sve_stbs_zsu, uint32_t, cpu_stb_data_ra)
-DO_ST1_ZPZ_S(sve_sths_zsu, uint32_t, cpu_stw_data_ra)
-DO_ST1_ZPZ_S(sve_stss_zsu, uint32_t, cpu_stl_data_ra)
+#define DO_ST1_ZPZ_S(MEM, OFS) \
+void __attribute__((flatten)) HELPER(sve_st##MEM##_##OFS) \
+ (CPUARMState *env, void *vd, void *vg, void *vm, \
+ target_ulong base, uint32_t desc) \
+{ \
+ sve_st1_zs(env, vd, vg, vm, base, desc, GETPC(), \
+ off_##OFS##_s, sve_st1##MEM##_tlb); \
+}
-DO_ST1_ZPZ_S(sve_stbs_zss, int32_t, cpu_stb_data_ra)
-DO_ST1_ZPZ_S(sve_sths_zss, int32_t, cpu_stw_data_ra)
-DO_ST1_ZPZ_S(sve_stss_zss, int32_t, cpu_stl_data_ra)
+#define DO_ST1_ZPZ_D(MEM, OFS) \
+void __attribute__((flatten)) HELPER(sve_st##MEM##_##OFS) \
+ (CPUARMState *env, void *vd, void *vg, void *vm, \
+ target_ulong base, uint32_t desc) \
+{ \
+ sve_st1_zd(env, vd, vg, vm, base, desc, GETPC(), \
+ off_##OFS##_d, sve_st1##MEM##_tlb); \
+}
-DO_ST1_ZPZ_D(sve_stbd_zsu, uint32_t, cpu_stb_data_ra)
-DO_ST1_ZPZ_D(sve_sthd_zsu, uint32_t, cpu_stw_data_ra)
-DO_ST1_ZPZ_D(sve_stsd_zsu, uint32_t, cpu_stl_data_ra)
-DO_ST1_ZPZ_D(sve_stdd_zsu, uint32_t, cpu_stq_data_ra)
+DO_ST1_ZPZ_S(bs, zsu)
+DO_ST1_ZPZ_S(hs_le, zsu)
+DO_ST1_ZPZ_S(hs_be, zsu)
+DO_ST1_ZPZ_S(ss_le, zsu)
+DO_ST1_ZPZ_S(ss_be, zsu)
-DO_ST1_ZPZ_D(sve_stbd_zss, int32_t, cpu_stb_data_ra)
-DO_ST1_ZPZ_D(sve_sthd_zss, int32_t, cpu_stw_data_ra)
-DO_ST1_ZPZ_D(sve_stsd_zss, int32_t, cpu_stl_data_ra)
-DO_ST1_ZPZ_D(sve_stdd_zss, int32_t, cpu_stq_data_ra)
+DO_ST1_ZPZ_S(bs, zss)
+DO_ST1_ZPZ_S(hs_le, zss)
+DO_ST1_ZPZ_S(hs_be, zss)
+DO_ST1_ZPZ_S(ss_le, zss)
+DO_ST1_ZPZ_S(ss_be, zss)
-DO_ST1_ZPZ_D(sve_stbd_zd, uint64_t, cpu_stb_data_ra)
-DO_ST1_ZPZ_D(sve_sthd_zd, uint64_t, cpu_stw_data_ra)
-DO_ST1_ZPZ_D(sve_stsd_zd, uint64_t, cpu_stl_data_ra)
-DO_ST1_ZPZ_D(sve_stdd_zd, uint64_t, cpu_stq_data_ra)
+DO_ST1_ZPZ_D(bd, zsu)
+DO_ST1_ZPZ_D(hd_le, zsu)
+DO_ST1_ZPZ_D(hd_be, zsu)
+DO_ST1_ZPZ_D(sd_le, zsu)
+DO_ST1_ZPZ_D(sd_be, zsu)
+DO_ST1_ZPZ_D(dd_le, zsu)
+DO_ST1_ZPZ_D(dd_be, zsu)
+
+DO_ST1_ZPZ_D(bd, zss)
+DO_ST1_ZPZ_D(hd_le, zss)
+DO_ST1_ZPZ_D(hd_be, zss)
+DO_ST1_ZPZ_D(sd_le, zss)
+DO_ST1_ZPZ_D(sd_be, zss)
+DO_ST1_ZPZ_D(dd_le, zss)
+DO_ST1_ZPZ_D(dd_be, zss)
+
+DO_ST1_ZPZ_D(bd, zd)
+DO_ST1_ZPZ_D(hd_le, zd)
+DO_ST1_ZPZ_D(hd_be, zd)
+DO_ST1_ZPZ_D(sd_le, zd)
+DO_ST1_ZPZ_D(sd_be, zd)
+DO_ST1_ZPZ_D(dd_le, zd)
+DO_ST1_ZPZ_D(dd_be, zd)
+
+#undef DO_ST1_ZPZ_S
+#undef DO_ST1_ZPZ_D
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index d4d7e9d3ae..fdd9b9b3a0 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -5299,35 +5299,58 @@ static bool trans_LD1_zpiz(DisasContext *s, arg_LD1_zpiz *a, uint32_t insn)
return true;
}
-/* Indexed by [xs][msz]. */
-static gen_helper_gvec_mem_scatter * const scatter_store_fn32[2][3] = {
- { gen_helper_sve_stbs_zsu,
- gen_helper_sve_sths_zsu,
- gen_helper_sve_stss_zsu, },
- { gen_helper_sve_stbs_zss,
- gen_helper_sve_sths_zss,
- gen_helper_sve_stss_zss, },
+/* Indexed by [be][xs][msz]. */
+static gen_helper_gvec_mem_scatter * const scatter_store_fn32[2][2][3] = {
+ /* Little-endian */
+ { { gen_helper_sve_stbs_zsu,
+ gen_helper_sve_sths_le_zsu,
+ gen_helper_sve_stss_le_zsu, },
+ { gen_helper_sve_stbs_zss,
+ gen_helper_sve_sths_le_zss,
+ gen_helper_sve_stss_le_zss, } },
+ /* Big-endian */
+ { { gen_helper_sve_stbs_zsu,
+ gen_helper_sve_sths_be_zsu,
+ gen_helper_sve_stss_be_zsu, },
+ { gen_helper_sve_stbs_zss,
+ gen_helper_sve_sths_be_zss,
+ gen_helper_sve_stss_be_zss, } },
};
/* Note that we overload xs=2 to indicate 64-bit offset. */
-static gen_helper_gvec_mem_scatter * const scatter_store_fn64[3][4] = {
- { gen_helper_sve_stbd_zsu,
- gen_helper_sve_sthd_zsu,
- gen_helper_sve_stsd_zsu,
- gen_helper_sve_stdd_zsu, },
- { gen_helper_sve_stbd_zss,
- gen_helper_sve_sthd_zss,
- gen_helper_sve_stsd_zss,
- gen_helper_sve_stdd_zss, },
- { gen_helper_sve_stbd_zd,
- gen_helper_sve_sthd_zd,
- gen_helper_sve_stsd_zd,
- gen_helper_sve_stdd_zd, },
+static gen_helper_gvec_mem_scatter * const scatter_store_fn64[2][3][4] = {
+ /* Little-endian */
+ { { gen_helper_sve_stbd_zsu,
+ gen_helper_sve_sthd_le_zsu,
+ gen_helper_sve_stsd_le_zsu,
+ gen_helper_sve_stdd_le_zsu, },
+ { gen_helper_sve_stbd_zss,
+ gen_helper_sve_sthd_le_zss,
+ gen_helper_sve_stsd_le_zss,
+ gen_helper_sve_stdd_le_zss, },
+ { gen_helper_sve_stbd_zd,
+ gen_helper_sve_sthd_le_zd,
+ gen_helper_sve_stsd_le_zd,
+ gen_helper_sve_stdd_le_zd, } },
+ /* Big-endian */
+ { { gen_helper_sve_stbd_zsu,
+ gen_helper_sve_sthd_be_zsu,
+ gen_helper_sve_stsd_be_zsu,
+ gen_helper_sve_stdd_be_zsu, },
+ { gen_helper_sve_stbd_zss,
+ gen_helper_sve_sthd_be_zss,
+ gen_helper_sve_stsd_be_zss,
+ gen_helper_sve_stdd_be_zss, },
+ { gen_helper_sve_stbd_zd,
+ gen_helper_sve_sthd_be_zd,
+ gen_helper_sve_stsd_be_zd,
+ gen_helper_sve_stdd_be_zd, } },
};
static bool trans_ST1_zprz(DisasContext *s, arg_ST1_zprz *a, uint32_t insn)
{
gen_helper_gvec_mem_scatter *fn;
+ int be = s->be_data == MO_BE;
if (a->esz < a->msz || (a->msz == 0 && a->scale)) {
return false;
@@ -5337,10 +5360,10 @@ static bool trans_ST1_zprz(DisasContext *s, arg_ST1_zprz *a, uint32_t insn)
}
switch (a->esz) {
case MO_32:
- fn = scatter_store_fn32[a->xs][a->msz];
+ fn = scatter_store_fn32[be][a->xs][a->msz];
break;
case MO_64:
- fn = scatter_store_fn64[a->xs][a->msz];
+ fn = scatter_store_fn64[be][a->xs][a->msz];
break;
default:
g_assert_not_reached();
@@ -5353,6 +5376,7 @@ static bool trans_ST1_zprz(DisasContext *s, arg_ST1_zprz *a, uint32_t insn)
static bool trans_ST1_zpiz(DisasContext *s, arg_ST1_zpiz *a, uint32_t insn)
{
gen_helper_gvec_mem_scatter *fn = NULL;
+ int be = s->be_data == MO_BE;
TCGv_i64 imm;
if (a->esz < a->msz) {
@@ -5364,10 +5388,10 @@ static bool trans_ST1_zpiz(DisasContext *s, arg_ST1_zpiz *a, uint32_t insn)
switch (a->esz) {
case MO_32:
- fn = scatter_store_fn32[0][a->msz];
+ fn = scatter_store_fn32[be][0][a->msz];
break;
case MO_64:
- fn = scatter_store_fn64[2][a->msz];
+ fn = scatter_store_fn64[be][2][a->msz];
break;
}
assert(fn != NULL);
--
2.17.1
* [Qemu-devel] [PATCH 19/20] target/arm: Rewrite vector gather first-fault loads
2018-08-09 4:21 [Qemu-devel] [PATCH 00/20] target/arm: sve system mode patches Richard Henderson
` (17 preceding siblings ...)
2018-08-09 4:22 ` [Qemu-devel] [PATCH 18/20] target/arm: Rewrite vector gather stores Richard Henderson
@ 2018-08-09 4:22 ` Richard Henderson
2018-08-23 16:10 ` Peter Maydell
2018-08-09 4:22 ` [Qemu-devel] [PATCH 20/20] target/arm: Pass TCGMemOpIdx to sve memory helpers Richard Henderson
` (3 subsequent siblings)
22 siblings, 1 reply; 51+ messages in thread
From: Richard Henderson @ 2018-08-09 4:22 UTC (permalink / raw)
To: qemu-devel; +Cc: laurent.desnogues, peter.maydell, alex.bennee
This implements the feature for softmmu, and moves the
main loop out of a macro and into a function.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
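For readers following the restructuring, the control flow of the new
common function can be sketched outside QEMU. This is only a toy model
under stated assumptions: toy_mem, toy_load_nf and toy_ldff1 are
hypothetical names, guest memory is a small array, an out-of-range index
stands in for an unmapped page, and the FFR is modelled as an array of
bool. The first active element is loaded with a normal (faulting)
access; each later active element is probed without faulting, and the
first failure clears the FFR from that element onward.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NELEM    8    /* elements per toy vector (assumption) */
#define MEM_SIZE 16   /* indexes >= MEM_SIZE model an unmapped page */

/* Hypothetical toy guest memory. */
static const uint32_t toy_mem[MEM_SIZE] = {
    10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25
};

/* Non-faulting probe: report failure instead of trapping. */
static bool toy_load_nf(uint32_t idx, uint32_t *out)
{
    if (idx >= MEM_SIZE) {
        return false;
    }
    *out = toy_mem[idx];
    return true;
}

/*
 * Toy first-fault gather.  Returns false when the first active element
 * cannot be loaded (the real helper would raise the fault instead).
 * Otherwise fills VD and clears FFR from the first failing element on.
 */
static bool toy_ldff1(uint32_t vd[NELEM], bool ffr[NELEM],
                      const bool pg[NELEM], const uint32_t idx[NELEM])
{
    int i = 0, j;

    assert(vd && ffr && pg && idx);

    /* Skip to the first true predicate. */
    while (i < NELEM && !pg[i]) {
        i++;
    }
    /* One normal read, which faults or not. */
    if (i < NELEM && !toy_load_nf(idx[i], &vd[i])) {
        return false;
    }
    /* Zero the leading predicated-false elements. */
    for (j = 0; j < i; j++) {
        vd[j] = 0;
    }
    /* The rest of the reads are non-faulting. */
    for (i++; i < NELEM; i++) {
        if (!pg[i]) {
            vd[i] = 0;
        } else if (!toy_load_nf(idx[i], &vd[i])) {
            for (j = i; j < NELEM; j++) {
                ffr[j] = false;   /* record the fault */
            }
            break;
        }
    }
    return true;
}
```

Elements at and after the failing position are left untouched in the
destination, which mirrors the early break in the loop of the patch.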
target/arm/helper-sve.h | 84 ++++++++---
target/arm/sve_helper.c | 290 +++++++++++++++++++++++++++----------
target/arm/translate-sve.c | 84 +++++------
3 files changed, 321 insertions(+), 137 deletions(-)
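The softmmu non-faulting load in the sve_helper.c changes must not read
across a page boundary, since a single tlb_vaddr_to_host lookup covers
only one page. The expression -(addr | TARGET_PAGE_MASK) used there
evaluates to the number of bytes from addr to the end of its page. A
minimal standalone sketch of that arithmetic, assuming 4 KiB pages and
using hypothetical SKETCH_* and sketch_* names in place of the QEMU
definitions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages; the mask has all page-number bits set. */
#define SKETCH_PAGE_BITS 12
#define SKETCH_PAGE_MASK (~(((uint64_t)1 << SKETCH_PAGE_BITS) - 1))

/*
 * OR-ing in the mask sets every bit above the page offset, so the
 * unsigned negation yields page_size - offset: the bytes left on
 * the page starting at ADDR.
 */
static uint64_t sketch_bytes_to_page_end(uint64_t addr)
{
    return -(addr | SKETCH_PAGE_MASK);
}

/* True if an access of SIZE bytes at ADDR stays within one page. */
static bool sketch_fits_in_page(uint64_t addr, size_t size)
{
    return sketch_bytes_to_page_end(addr) >= size;
}
```

For example, with these assumptions an address of 0x1ffd leaves 3 bytes
on its page, so a 4-byte element there would span two pages.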
diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
index 6b9b93af45..9e79182ab4 100644
--- a/target/arm/helper-sve.h
+++ b/target/arm/helper-sve.h
@@ -1401,69 +1401,111 @@ DEF_HELPER_FLAGS_6(sve_ldsds_be_zd, TCG_CALL_NO_WG,
DEF_HELPER_FLAGS_6(sve_ldffbsu_zsu, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldffhsu_zsu, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldffhsu_le_zsu, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldffssu_zsu, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldffhsu_be_zsu, TCG_CALL_NO_WG,
+ void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldffss_le_zsu, TCG_CALL_NO_WG,
+ void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldffss_be_zsu, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
DEF_HELPER_FLAGS_6(sve_ldffbss_zsu, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldffhss_zsu, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldffhss_le_zsu, TCG_CALL_NO_WG,
+ void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldffhss_be_zsu, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
DEF_HELPER_FLAGS_6(sve_ldffbsu_zss, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldffhsu_zss, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldffhsu_le_zss, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldffssu_zss, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldffhsu_be_zss, TCG_CALL_NO_WG,
+ void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldffss_le_zss, TCG_CALL_NO_WG,
+ void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldffss_be_zss, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
DEF_HELPER_FLAGS_6(sve_ldffbss_zss, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldffhss_zss, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldffhss_le_zss, TCG_CALL_NO_WG,
+ void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldffhss_be_zss, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
DEF_HELPER_FLAGS_6(sve_ldffbdu_zsu, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldffhdu_zsu, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldffhdu_le_zsu, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldffsdu_zsu, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldffhdu_be_zsu, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldffddu_zsu, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldffsdu_le_zsu, TCG_CALL_NO_WG,
+ void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldffsdu_be_zsu, TCG_CALL_NO_WG,
+ void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldffdd_le_zsu, TCG_CALL_NO_WG,
+ void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldffdd_be_zsu, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
DEF_HELPER_FLAGS_6(sve_ldffbds_zsu, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldffhds_zsu, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldffhds_le_zsu, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldffsds_zsu, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldffhds_be_zsu, TCG_CALL_NO_WG,
+ void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldffsds_le_zsu, TCG_CALL_NO_WG,
+ void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldffsds_be_zsu, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
DEF_HELPER_FLAGS_6(sve_ldffbdu_zss, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldffhdu_zss, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldffhdu_le_zss, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldffsdu_zss, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldffhdu_be_zss, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldffddu_zss, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldffsdu_le_zss, TCG_CALL_NO_WG,
+ void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldffsdu_be_zss, TCG_CALL_NO_WG,
+ void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldffdd_le_zss, TCG_CALL_NO_WG,
+ void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldffdd_be_zss, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
DEF_HELPER_FLAGS_6(sve_ldffbds_zss, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldffhds_zss, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldffhds_le_zss, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldffsds_zss, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldffhds_be_zss, TCG_CALL_NO_WG,
+ void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldffsds_le_zss, TCG_CALL_NO_WG,
+ void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldffsds_be_zss, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
DEF_HELPER_FLAGS_6(sve_ldffbdu_zd, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldffhdu_zd, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldffhdu_le_zd, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldffsdu_zd, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldffhdu_be_zd, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldffddu_zd, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldffsdu_le_zd, TCG_CALL_NO_WG,
+ void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldffsdu_be_zd, TCG_CALL_NO_WG,
+ void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldffdd_le_zd, TCG_CALL_NO_WG,
+ void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldffdd_be_zd, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
DEF_HELPER_FLAGS_6(sve_ldffbds_zd, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldffhds_zd, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldffhds_le_zd, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
-DEF_HELPER_FLAGS_6(sve_ldffsds_zd, TCG_CALL_NO_WG,
+DEF_HELPER_FLAGS_6(sve_ldffhds_be_zd, TCG_CALL_NO_WG,
+ void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldffsds_le_zd, TCG_CALL_NO_WG,
+ void, env, ptr, ptr, ptr, tl, i32)
+DEF_HELPER_FLAGS_6(sve_ldffsds_be_zd, TCG_CALL_NO_WG,
void, env, ptr, ptr, ptr, tl, i32)
DEF_HELPER_FLAGS_6(sve_stbs_zsu, TCG_CALL_NO_WG,
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 0a4756bff9..6728862326 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -5147,91 +5147,233 @@ DO_LD1_ZPZ_D(dd_be, zd)
/* First fault loads with a vector index. */
-#ifdef CONFIG_USER_ONLY
+/* Load one element into VD+REG_OFF from (ENV,VADDR) without faulting.
+ * The controlling predicate is known to be true. Return true if the
+ * load was successful.
+ */
+typedef bool sve_ld1_nf_fn(CPUARMState *env, void *vd, intptr_t reg_off,
+ target_ulong vaddr, int mmu_idx);
-#define DO_LDFF1_ZPZ(NAME, TYPEE, TYPEI, TYPEM, FN, H) \
-void HELPER(NAME)(CPUARMState *env, void *vd, void *vg, void *vm, \
- target_ulong base, uint32_t desc) \
-{ \
- intptr_t i, oprsz = simd_oprsz(desc); \
- unsigned scale = simd_data(desc); \
- uintptr_t ra = GETPC(); \
- bool first = true; \
- mmap_lock(); \
- for (i = 0; i < oprsz; ) { \
- uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3)); \
- do { \
- TYPEM m = 0; \
- if (pg & 1) { \
- target_ulong off = *(TYPEI *)(vm + H(i)); \
- target_ulong addr = base + (off << scale); \
- if (!first && \
- page_check_range(addr, sizeof(TYPEM), PAGE_READ)) { \
- record_fault(env, i, oprsz); \
- goto exit; \
- } \
- m = FN(env, addr, ra); \
- first = false; \
- } \
- *(TYPEE *)(vd + H(i)) = m; \
- i += sizeof(TYPEE), pg >>= sizeof(TYPEE); \
- } while (i & 15); \
- } \
- exit: \
- mmap_unlock(); \
+#ifdef CONFIG_SOFTMMU
+#define DO_LD_NF(NAME, H, TYPEE, TYPEM, HOST) \
+static bool sve_ld##NAME##_nf(CPUARMState *env, void *vd, intptr_t reg_off, \
+ target_ulong addr, int mmu_idx) \
+{ \
+ target_ulong in_page = -(addr | TARGET_PAGE_MASK); \
+ if (likely(in_page >= sizeof(TYPEM))) { \
+ void *host = tlb_vaddr_to_host(env, addr, MMU_DATA_LOAD, mmu_idx); \
+ if (likely(host)) { \
+ TYPEM val = HOST(host); \
+ *(TYPEE *)(vd + H(reg_off)) = val; \
+ return true; \
+ } \
+ } \
+ return false; \
}
-
#else
-
-#define DO_LDFF1_ZPZ(NAME, TYPEE, TYPEI, TYPEM, FN, H) \
-void HELPER(NAME)(CPUARMState *env, void *vd, void *vg, void *vm, \
- target_ulong base, uint32_t desc) \
-{ \
- g_assert_not_reached(); \
+#define DO_LD_NF(NAME, H, TYPEE, TYPEM, HOST) \
+static bool sve_ld##NAME##_nf(CPUARMState *env, void *vd, intptr_t reg_off, \
+ target_ulong addr, int mmu_idx) \
+{ \
+ if (likely(page_check_range(addr, sizeof(TYPEM), PAGE_READ) == 0)) { \
+ TYPEM val = HOST(g2h(addr)); \
+ *(TYPEE *)(vd + H(reg_off)) = val; \
+ return true; \
+ } \
+ return false; \
}
-
#endif
-#define DO_LDFF1_ZPZ_S(NAME, TYPEI, TYPEM, FN) \
- DO_LDFF1_ZPZ(NAME, uint32_t, TYPEI, TYPEM, FN, H1_4)
-#define DO_LDFF1_ZPZ_D(NAME, TYPEI, TYPEM, FN) \
- DO_LDFF1_ZPZ(NAME, uint64_t, TYPEI, TYPEM, FN, )
+DO_LD_NF(bsu, H1_4, uint32_t, uint8_t, ldub_p)
+DO_LD_NF(bss, H1_4, uint32_t, int8_t, ldsb_p)
+DO_LD_NF(bdu, , uint64_t, uint8_t, ldub_p)
+DO_LD_NF(bds, , uint64_t, int8_t, ldsb_p)
-DO_LDFF1_ZPZ_S(sve_ldffbsu_zsu, uint32_t, uint8_t, cpu_ldub_data_ra)
-DO_LDFF1_ZPZ_S(sve_ldffhsu_zsu, uint32_t, uint16_t, cpu_lduw_data_ra)
-DO_LDFF1_ZPZ_S(sve_ldffssu_zsu, uint32_t, uint32_t, cpu_ldl_data_ra)
-DO_LDFF1_ZPZ_S(sve_ldffbss_zsu, uint32_t, int8_t, cpu_ldub_data_ra)
-DO_LDFF1_ZPZ_S(sve_ldffhss_zsu, uint32_t, int16_t, cpu_lduw_data_ra)
+DO_LD_NF(hsu_le, H1_4, uint32_t, uint16_t, lduw_le_p)
+DO_LD_NF(hss_le, H1_4, uint32_t, int16_t, ldsw_le_p)
+DO_LD_NF(hsu_be, H1_4, uint32_t, uint16_t, lduw_be_p)
+DO_LD_NF(hss_be, H1_4, uint32_t, int16_t, ldsw_be_p)
+DO_LD_NF(hdu_le, , uint64_t, uint16_t, lduw_le_p)
+DO_LD_NF(hds_le, , uint64_t, int16_t, ldsw_le_p)
+DO_LD_NF(hdu_be, , uint64_t, uint16_t, lduw_be_p)
+DO_LD_NF(hds_be, , uint64_t, int16_t, ldsw_be_p)
-DO_LDFF1_ZPZ_S(sve_ldffbsu_zss, int32_t, uint8_t, cpu_ldub_data_ra)
-DO_LDFF1_ZPZ_S(sve_ldffhsu_zss, int32_t, uint16_t, cpu_lduw_data_ra)
-DO_LDFF1_ZPZ_S(sve_ldffssu_zss, int32_t, uint32_t, cpu_ldl_data_ra)
-DO_LDFF1_ZPZ_S(sve_ldffbss_zss, int32_t, int8_t, cpu_ldub_data_ra)
-DO_LDFF1_ZPZ_S(sve_ldffhss_zss, int32_t, int16_t, cpu_lduw_data_ra)
+DO_LD_NF(ss_le, H1_4, uint32_t, uint32_t, ldl_le_p)
+DO_LD_NF(ss_be, H1_4, uint32_t, uint32_t, ldl_be_p)
+DO_LD_NF(sdu_le, , uint64_t, uint32_t, ldl_le_p)
+DO_LD_NF(sds_le, , uint64_t, int32_t, ldl_le_p)
+DO_LD_NF(sdu_be, , uint64_t, uint32_t, ldl_be_p)
+DO_LD_NF(sds_be, , uint64_t, int32_t, ldl_be_p)
-DO_LDFF1_ZPZ_D(sve_ldffbdu_zsu, uint32_t, uint8_t, cpu_ldub_data_ra)
-DO_LDFF1_ZPZ_D(sve_ldffhdu_zsu, uint32_t, uint16_t, cpu_lduw_data_ra)
-DO_LDFF1_ZPZ_D(sve_ldffsdu_zsu, uint32_t, uint32_t, cpu_ldl_data_ra)
-DO_LDFF1_ZPZ_D(sve_ldffddu_zsu, uint32_t, uint64_t, cpu_ldq_data_ra)
-DO_LDFF1_ZPZ_D(sve_ldffbds_zsu, uint32_t, int8_t, cpu_ldub_data_ra)
-DO_LDFF1_ZPZ_D(sve_ldffhds_zsu, uint32_t, int16_t, cpu_lduw_data_ra)
-DO_LDFF1_ZPZ_D(sve_ldffsds_zsu, uint32_t, int32_t, cpu_ldl_data_ra)
+DO_LD_NF(dd_le, , uint64_t, uint64_t, ldq_le_p)
+DO_LD_NF(dd_be, , uint64_t, uint64_t, ldq_be_p)
-DO_LDFF1_ZPZ_D(sve_ldffbdu_zss, int32_t, uint8_t, cpu_ldub_data_ra)
-DO_LDFF1_ZPZ_D(sve_ldffhdu_zss, int32_t, uint16_t, cpu_lduw_data_ra)
-DO_LDFF1_ZPZ_D(sve_ldffsdu_zss, int32_t, uint32_t, cpu_ldl_data_ra)
-DO_LDFF1_ZPZ_D(sve_ldffddu_zss, int32_t, uint64_t, cpu_ldq_data_ra)
-DO_LDFF1_ZPZ_D(sve_ldffbds_zss, int32_t, int8_t, cpu_ldub_data_ra)
-DO_LDFF1_ZPZ_D(sve_ldffhds_zss, int32_t, int16_t, cpu_lduw_data_ra)
-DO_LDFF1_ZPZ_D(sve_ldffsds_zss, int32_t, int32_t, cpu_ldl_data_ra)
+/*
+ * Common helper for all gather first-faulting loads.
+ */
+static inline void sve_ldff1_zs(CPUARMState *env, void *vd, void *vg, void *vm,
+ target_ulong base, uint32_t desc, uintptr_t ra,
+ zreg_off_fn *off_fn, sve_ld1_tlb_fn *tlb_fn,
+ sve_ld1_nf_fn *nonfault_fn)
+{
+ const int mmu_idx = cpu_mmu_index(env, false);
+ intptr_t reg_off, reg_max = simd_oprsz(desc);
+ unsigned scale = simd_data(desc);
+ target_ulong addr;
-DO_LDFF1_ZPZ_D(sve_ldffbdu_zd, uint64_t, uint8_t, cpu_ldub_data_ra)
-DO_LDFF1_ZPZ_D(sve_ldffhdu_zd, uint64_t, uint16_t, cpu_lduw_data_ra)
-DO_LDFF1_ZPZ_D(sve_ldffsdu_zd, uint64_t, uint32_t, cpu_ldl_data_ra)
-DO_LDFF1_ZPZ_D(sve_ldffddu_zd, uint64_t, uint64_t, cpu_ldq_data_ra)
-DO_LDFF1_ZPZ_D(sve_ldffbds_zd, uint64_t, int8_t, cpu_ldub_data_ra)
-DO_LDFF1_ZPZ_D(sve_ldffhds_zd, uint64_t, int16_t, cpu_lduw_data_ra)
-DO_LDFF1_ZPZ_D(sve_ldffsds_zd, uint64_t, int32_t, cpu_ldl_data_ra)
+ /* Skip to the first true predicate. */
+ reg_off = find_next_active(vg, 0, reg_max, MO_32);
+ if (likely(reg_off < reg_max)) {
+ /* Perform one normal read, which will fault or not. */
+ set_helper_retaddr(ra);
+ addr = off_fn(vm, reg_off);
+ addr = base + (addr << scale);
+ tlb_fn(env, vd, reg_off, addr, mmu_idx, ra);
+
+ /* The rest of the reads will be non-faulting. */
+ set_helper_retaddr(0);
+ }
+
+ /* After any fault, zero the leading predicated false elements. */
+ swap_memzero(vd, reg_off);
+
+ while (likely((reg_off += 4) < reg_max)) {
+ uint64_t pg = *(uint64_t *)(vg + (reg_off >> 6) * 8);
+ if (likely((pg >> (reg_off & 63)) & 1)) {
+ addr = off_fn(vm, reg_off);
+ addr = base + (addr << scale);
+ if (!nonfault_fn(env, vd, reg_off, addr, mmu_idx)) {
+ record_fault(env, reg_off, reg_max);
+ break;
+ }
+ } else {
+ *(uint32_t *)(vd + H1_4(reg_off)) = 0;
+ }
+ }
+}
+
+static inline void sve_ldff1_zd(CPUARMState *env, void *vd, void *vg, void *vm,
+ target_ulong base, uint32_t desc, uintptr_t ra,
+ zreg_off_fn *off_fn, sve_ld1_tlb_fn *tlb_fn,
+ sve_ld1_nf_fn *nonfault_fn)
+{
+ const int mmu_idx = cpu_mmu_index(env, false);
+ intptr_t reg_off, reg_max = simd_oprsz(desc);
+ unsigned scale = simd_data(desc);
+ target_ulong addr;
+
+ /* Skip to the first true predicate. */
+ reg_off = find_next_active(vg, 0, reg_max, MO_64);
+ if (likely(reg_off < reg_max)) {
+ /* Perform one normal read, which will fault or not. */
+ set_helper_retaddr(ra);
+ addr = off_fn(vm, reg_off);
+ addr = base + (addr << scale);
+ tlb_fn(env, vd, reg_off, addr, mmu_idx, ra);
+
+ /* The rest of the reads will be non-faulting. */
+ set_helper_retaddr(0);
+ }
+
+ /* After any fault, zero the leading predicated false elements. */
+ swap_memzero(vd, reg_off);
+
+ while (likely((reg_off += 8) < reg_max)) {
+ uint8_t pg = *(uint8_t *)(vg + H1(reg_off >> 3));
+ if (likely(pg & 1)) {
+ addr = off_fn(vm, reg_off);
+ addr = base + (addr << scale);
+ if (!nonfault_fn(env, vd, reg_off, addr, mmu_idx)) {
+ record_fault(env, reg_off, reg_max);
+ break;
+ }
+ } else {
+ *(uint64_t *)(vd + reg_off) = 0;
+ }
+ }
+}
+
+#define DO_LDFF1_ZPZ_S(MEM, OFS) \
+void HELPER(sve_ldff##MEM##_##OFS) \
+ (CPUARMState *env, void *vd, void *vg, void *vm, \
+ target_ulong base, uint32_t desc) \
+{ \
+ sve_ldff1_zs(env, vd, vg, vm, base, desc, GETPC(), \
+ off_##OFS##_s, sve_ld1##MEM##_tlb, sve_ld##MEM##_nf); \
+}
+
+#define DO_LDFF1_ZPZ_D(MEM, OFS) \
+void HELPER(sve_ldff##MEM##_##OFS) \
+ (CPUARMState *env, void *vd, void *vg, void *vm, \
+ target_ulong base, uint32_t desc) \
+{ \
+ sve_ldff1_zd(env, vd, vg, vm, base, desc, GETPC(), \
+ off_##OFS##_d, sve_ld1##MEM##_tlb, sve_ld##MEM##_nf); \
+}
+
+DO_LDFF1_ZPZ_S(bsu, zsu)
+DO_LDFF1_ZPZ_S(bsu, zss)
+DO_LDFF1_ZPZ_D(bdu, zsu)
+DO_LDFF1_ZPZ_D(bdu, zss)
+DO_LDFF1_ZPZ_D(bdu, zd)
+
+DO_LDFF1_ZPZ_S(bss, zsu)
+DO_LDFF1_ZPZ_S(bss, zss)
+DO_LDFF1_ZPZ_D(bds, zsu)
+DO_LDFF1_ZPZ_D(bds, zss)
+DO_LDFF1_ZPZ_D(bds, zd)
+
+DO_LDFF1_ZPZ_S(hsu_le, zsu)
+DO_LDFF1_ZPZ_S(hsu_le, zss)
+DO_LDFF1_ZPZ_D(hdu_le, zsu)
+DO_LDFF1_ZPZ_D(hdu_le, zss)
+DO_LDFF1_ZPZ_D(hdu_le, zd)
+
+DO_LDFF1_ZPZ_S(hsu_be, zsu)
+DO_LDFF1_ZPZ_S(hsu_be, zss)
+DO_LDFF1_ZPZ_D(hdu_be, zsu)
+DO_LDFF1_ZPZ_D(hdu_be, zss)
+DO_LDFF1_ZPZ_D(hdu_be, zd)
+
+DO_LDFF1_ZPZ_S(hss_le, zsu)
+DO_LDFF1_ZPZ_S(hss_le, zss)
+DO_LDFF1_ZPZ_D(hds_le, zsu)
+DO_LDFF1_ZPZ_D(hds_le, zss)
+DO_LDFF1_ZPZ_D(hds_le, zd)
+
+DO_LDFF1_ZPZ_S(hss_be, zsu)
+DO_LDFF1_ZPZ_S(hss_be, zss)
+DO_LDFF1_ZPZ_D(hds_be, zsu)
+DO_LDFF1_ZPZ_D(hds_be, zss)
+DO_LDFF1_ZPZ_D(hds_be, zd)
+
+DO_LDFF1_ZPZ_S(ss_le, zsu)
+DO_LDFF1_ZPZ_S(ss_le, zss)
+DO_LDFF1_ZPZ_D(sdu_le, zsu)
+DO_LDFF1_ZPZ_D(sdu_le, zss)
+DO_LDFF1_ZPZ_D(sdu_le, zd)
+
+DO_LDFF1_ZPZ_S(ss_be, zsu)
+DO_LDFF1_ZPZ_S(ss_be, zss)
+DO_LDFF1_ZPZ_D(sdu_be, zsu)
+DO_LDFF1_ZPZ_D(sdu_be, zss)
+DO_LDFF1_ZPZ_D(sdu_be, zd)
+
+DO_LDFF1_ZPZ_D(sds_le, zsu)
+DO_LDFF1_ZPZ_D(sds_le, zss)
+DO_LDFF1_ZPZ_D(sds_le, zd)
+
+DO_LDFF1_ZPZ_D(sds_be, zsu)
+DO_LDFF1_ZPZ_D(sds_be, zss)
+DO_LDFF1_ZPZ_D(sds_be, zd)
+
+DO_LDFF1_ZPZ_D(dd_le, zsu)
+DO_LDFF1_ZPZ_D(dd_le, zss)
+DO_LDFF1_ZPZ_D(dd_le, zd)
+
+DO_LDFF1_ZPZ_D(dd_be, zsu)
+DO_LDFF1_ZPZ_D(dd_be, zss)
+DO_LDFF1_ZPZ_D(dd_be, zd)
/* Stores with a vector index. */
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index fdd9b9b3a0..20492e9b8b 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -5095,17 +5095,17 @@ static gen_helper_gvec_mem_scatter * const gather_load_fn32[2][2][2][2][3] = {
/* First-fault */
{ { { gen_helper_sve_ldffbss_zsu,
- gen_helper_sve_ldffhss_zsu,
+ gen_helper_sve_ldffhss_le_zsu,
NULL, },
{ gen_helper_sve_ldffbsu_zsu,
- gen_helper_sve_ldffhsu_zsu,
- gen_helper_sve_ldffssu_zsu, } },
+ gen_helper_sve_ldffhsu_le_zsu,
+ gen_helper_sve_ldffss_le_zsu, } },
{ { gen_helper_sve_ldffbss_zss,
- gen_helper_sve_ldffhss_zss,
+ gen_helper_sve_ldffhss_le_zss,
NULL, },
{ gen_helper_sve_ldffbsu_zss,
- gen_helper_sve_ldffhsu_zss,
- gen_helper_sve_ldffssu_zss, } } } },
+ gen_helper_sve_ldffhsu_le_zss,
+ gen_helper_sve_ldffss_le_zss, } } } },
/* Big-endian */
{ { { { gen_helper_sve_ldbss_zsu,
@@ -5123,17 +5123,17 @@ static gen_helper_gvec_mem_scatter * const gather_load_fn32[2][2][2][2][3] = {
/* First-fault */
{ { { gen_helper_sve_ldffbss_zsu,
- gen_helper_sve_ldffhss_zsu,
+ gen_helper_sve_ldffhss_be_zsu,
NULL, },
{ gen_helper_sve_ldffbsu_zsu,
- gen_helper_sve_ldffhsu_zsu,
- gen_helper_sve_ldffssu_zsu, } },
+ gen_helper_sve_ldffhsu_be_zsu,
+ gen_helper_sve_ldffss_be_zsu, } },
{ { gen_helper_sve_ldffbss_zss,
- gen_helper_sve_ldffhss_zss,
+ gen_helper_sve_ldffhss_be_zss,
NULL, },
{ gen_helper_sve_ldffbsu_zss,
- gen_helper_sve_ldffhsu_zss,
- gen_helper_sve_ldffssu_zss, } } } },
+ gen_helper_sve_ldffhsu_be_zss,
+ gen_helper_sve_ldffss_be_zss, } } } },
};
/* Note that we overload xs=2 to indicate 64-bit offset. */
@@ -5166,29 +5166,29 @@ static gen_helper_gvec_mem_scatter * const gather_load_fn64[2][2][3][2][4] = {
/* First-fault */
{ { { gen_helper_sve_ldffbds_zsu,
- gen_helper_sve_ldffhds_zsu,
- gen_helper_sve_ldffsds_zsu,
+ gen_helper_sve_ldffhds_le_zsu,
+ gen_helper_sve_ldffsds_le_zsu,
NULL, },
{ gen_helper_sve_ldffbdu_zsu,
- gen_helper_sve_ldffhdu_zsu,
- gen_helper_sve_ldffsdu_zsu,
- gen_helper_sve_ldffddu_zsu, } },
+ gen_helper_sve_ldffhdu_le_zsu,
+ gen_helper_sve_ldffsdu_le_zsu,
+ gen_helper_sve_ldffdd_le_zsu, } },
{ { gen_helper_sve_ldffbds_zss,
- gen_helper_sve_ldffhds_zss,
- gen_helper_sve_ldffsds_zss,
+ gen_helper_sve_ldffhds_le_zss,
+ gen_helper_sve_ldffsds_le_zss,
NULL, },
{ gen_helper_sve_ldffbdu_zss,
- gen_helper_sve_ldffhdu_zss,
- gen_helper_sve_ldffsdu_zss,
- gen_helper_sve_ldffddu_zss, } },
+ gen_helper_sve_ldffhdu_le_zss,
+ gen_helper_sve_ldffsdu_le_zss,
+ gen_helper_sve_ldffdd_le_zss, } },
{ { gen_helper_sve_ldffbds_zd,
- gen_helper_sve_ldffhds_zd,
- gen_helper_sve_ldffsds_zd,
+ gen_helper_sve_ldffhds_le_zd,
+ gen_helper_sve_ldffsds_le_zd,
NULL, },
{ gen_helper_sve_ldffbdu_zd,
- gen_helper_sve_ldffhdu_zd,
- gen_helper_sve_ldffsdu_zd,
- gen_helper_sve_ldffddu_zd, } } } },
+ gen_helper_sve_ldffhdu_le_zd,
+ gen_helper_sve_ldffsdu_le_zd,
+ gen_helper_sve_ldffdd_le_zd, } } } },
/* Big-endian */
{ { { { gen_helper_sve_ldbds_zsu,
@@ -5218,29 +5218,29 @@ static gen_helper_gvec_mem_scatter * const gather_load_fn64[2][2][3][2][4] = {
/* First-fault */
{ { { gen_helper_sve_ldffbds_zsu,
- gen_helper_sve_ldffhds_zsu,
- gen_helper_sve_ldffsds_zsu,
+ gen_helper_sve_ldffhds_be_zsu,
+ gen_helper_sve_ldffsds_be_zsu,
NULL, },
{ gen_helper_sve_ldffbdu_zsu,
- gen_helper_sve_ldffhdu_zsu,
- gen_helper_sve_ldffsdu_zsu,
- gen_helper_sve_ldffddu_zsu, } },
+ gen_helper_sve_ldffhdu_be_zsu,
+ gen_helper_sve_ldffsdu_be_zsu,
+ gen_helper_sve_ldffdd_be_zsu, } },
{ { gen_helper_sve_ldffbds_zss,
- gen_helper_sve_ldffhds_zss,
- gen_helper_sve_ldffsds_zss,
+ gen_helper_sve_ldffhds_be_zss,
+ gen_helper_sve_ldffsds_be_zss,
NULL, },
{ gen_helper_sve_ldffbdu_zss,
- gen_helper_sve_ldffhdu_zss,
- gen_helper_sve_ldffsdu_zss,
- gen_helper_sve_ldffddu_zss, } },
+ gen_helper_sve_ldffhdu_be_zss,
+ gen_helper_sve_ldffsdu_be_zss,
+ gen_helper_sve_ldffdd_be_zss, } },
{ { gen_helper_sve_ldffbds_zd,
- gen_helper_sve_ldffhds_zd,
- gen_helper_sve_ldffsds_zd,
+ gen_helper_sve_ldffhds_be_zd,
+ gen_helper_sve_ldffsds_be_zd,
NULL, },
{ gen_helper_sve_ldffbdu_zd,
- gen_helper_sve_ldffhdu_zd,
- gen_helper_sve_ldffsdu_zd,
- gen_helper_sve_ldffddu_zd, } } } },
+ gen_helper_sve_ldffhdu_be_zd,
+ gen_helper_sve_ldffsdu_be_zd,
+ gen_helper_sve_ldffdd_be_zd, } } } },
};
static bool trans_LD1_zprz(DisasContext *s, arg_LD1_zprz *a, uint32_t insn)
--
2.17.1
* [Qemu-devel] [PATCH 20/20] target/arm: Pass TCGMemOpIdx to sve memory helpers
2018-08-09 4:21 [Qemu-devel] [PATCH 00/20] target/arm: sve system mode patches Richard Henderson
` (18 preceding siblings ...)
2018-08-09 4:22 ` [Qemu-devel] [PATCH 19/20] target/arm: Rewrite vector gather first-fault loads Richard Henderson
@ 2018-08-09 4:22 ` Richard Henderson
2018-08-23 16:23 ` Peter Maydell
2018-08-09 5:48 ` [Qemu-devel] [PATCH 00/20] target/arm: sve system mode patches Laurent Desnogues
` (2 subsequent siblings)
22 siblings, 1 reply; 51+ messages in thread
From: Richard Henderson @ 2018-08-09 4:22 UTC (permalink / raw)
To: qemu-devel; +Cc: laurent.desnogues, peter.maydell, alex.bennee
There is quite a lot of code required to compute cpu_mmu_index,
or even put together the full TCGMemOpIdx. This can easily be
done at translation time.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
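As a rough illustration of the descriptor layout the helpers decode,
the sketch below packs an 8-bit TCGMemOpIdx and a 5-bit register number
into the data field and extracts them again. It is a standalone model,
not the QEMU code: all sketch_* and SKETCH_* names are hypothetical, the
value of SKETCH_SIMD_DATA_SHIFT is an assumption, the packing side is
inferred from the extract32 calls in the helpers, and the memop/mmu_idx
split follows the comment added to internals.h (4 bits for mmu_idx).

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_SIMD_DATA_SHIFT 10  /* assumption: start of data in desc */
#define SKETCH_MEMOPIDX_SHIFT  8   /* width of the oi field, per this patch */

/* Local stand-in for extract32; LENGTH must be in 1..32. */
static uint32_t sketch_extract32(uint32_t value, int start, int length)
{
    return (value >> start) & (~0u >> (32 - length));
}

/* Assumed layout: memop in the upper bits, mmu_idx in the low 4 bits. */
static uint32_t sketch_make_memop_idx(uint32_t memop, uint32_t mmu_idx)
{
    return (memop << 4) | mmu_idx;
}

/* Translation-time side: place oi below rd inside the data field. */
static uint32_t sketch_pack_desc(uint32_t size_bits, uint32_t oi, uint32_t rd)
{
    uint32_t data = (rd << SKETCH_MEMOPIDX_SHIFT) | oi;
    return size_bits | (data << SKETCH_SIMD_DATA_SHIFT);
}

/* Helper side: recover oi, then mmu_idx and rd, as the helpers do. */
static uint32_t sketch_desc_oi(uint32_t desc)
{
    return sketch_extract32(desc, SKETCH_SIMD_DATA_SHIFT,
                            SKETCH_MEMOPIDX_SHIFT);
}

static uint32_t sketch_desc_mmu_idx(uint32_t desc)
{
    return sketch_desc_oi(desc) & 15;
}

static uint32_t sketch_desc_rd(uint32_t desc)
{
    return sketch_extract32(desc, SKETCH_SIMD_DATA_SHIFT +
                            SKETCH_MEMOPIDX_SHIFT, 5);
}
```

With this layout the helper no longer needs to derive the mmu index or
rebuild the memop at run time; both come out of the descriptor with a
shift and a mask.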
target/arm/internals.h | 5 ++
target/arm/sve_helper.c | 138 +++++++++++++++++++------------------
target/arm/translate-sve.c | 67 +++++++++++-------
3 files changed, 121 insertions(+), 89 deletions(-)
diff --git a/target/arm/internals.h b/target/arm/internals.h
index dc9357766c..24c0444c8d 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -796,4 +796,9 @@ static inline uint32_t arm_debug_exception_fsr(CPUARMState *env)
}
}
+/* Note make_memop_idx reserves 4 bits for mmu_idx, and MO_BSWAP is bit 3.
+ * Thus a TCGMemOpIdx, without any MO_ALIGN bits, fits in 8 bits.
+ */
+#define MEMOPIDX_SHIFT 8
+
#endif
diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 6728862326..5bae600d17 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -19,6 +19,7 @@
#include "qemu/osdep.h"
#include "cpu.h"
+#include "internals.h"
#include "exec/exec-all.h"
#include "exec/cpu_ldst.h"
#include "exec/helper-proto.h"
@@ -3986,7 +3987,7 @@ typedef intptr_t sve_ld1_host_fn(void *vd, void *vg, void *host,
* The controlling predicate is known to be true.
*/
typedef void sve_ld1_tlb_fn(CPUARMState *env, void *vd, intptr_t reg_off,
- target_ulong vaddr, int mmu_idx, uintptr_t ra);
+ target_ulong vaddr, TCGMemOpIdx oi, uintptr_t ra);
typedef sve_ld1_tlb_fn sve_st1_tlb_fn;
/*
@@ -4013,16 +4014,15 @@ static intptr_t sve_##NAME##_host(void *vd, void *vg, void *host, \
#ifdef CONFIG_SOFTMMU
#define DO_LD_TLB(NAME, H, TYPEE, TYPEM, HOST, MOEND, TLB) \
static void sve_##NAME##_tlb(CPUARMState *env, void *vd, intptr_t reg_off, \
- target_ulong addr, int mmu_idx, uintptr_t ra) \
+ target_ulong addr, TCGMemOpIdx oi, uintptr_t ra) \
{ \
- TCGMemOpIdx oi = make_memop_idx(ctz32(sizeof(TYPEM)) | MOEND, mmu_idx); \
TYPEM val = TLB(env, addr, oi, ra); \
*(TYPEE *)(vd + H(reg_off)) = val; \
}
#else
#define DO_LD_TLB(NAME, H, TYPEE, TYPEM, HOST, MOEND, TLB) \
static void sve_##NAME##_tlb(CPUARMState *env, void *vd, intptr_t reg_off, \
- target_ulong addr, int mmu_idx, uintptr_t ra) \
+ target_ulong addr, TCGMemOpIdx oi, uintptr_t ra) \
{ \
TYPEM val = HOST(g2h(addr)); \
*(TYPEE *)(vd + H(reg_off)) = val; \
@@ -4287,11 +4287,13 @@ static void sve_ld1_r(CPUARMState *env, void *vg, const target_ulong addr,
sve_ld1_host_fn *host_fn,
sve_ld1_tlb_fn *tlb_fn)
{
- void *vd = &env->vfp.zregs[simd_data(desc)];
+ const TCGMemOpIdx oi = extract32(desc, SIMD_DATA_SHIFT, MEMOPIDX_SHIFT);
+ const int mmu_idx = get_mmuidx(oi);
+ const unsigned rd = extract32(desc, SIMD_DATA_SHIFT + MEMOPIDX_SHIFT, 5);
+ void *vd = &env->vfp.zregs[rd];
const int diffsz = esz - msz;
const intptr_t reg_max = simd_oprsz(desc);
const intptr_t mem_max = reg_max >> diffsz;
- const int mmu_idx = cpu_mmu_index(env, false);
ARMVectorReg scratch;
void *host, *result;
intptr_t split;
@@ -4345,7 +4347,7 @@ static void sve_ld1_r(CPUARMState *env, void *vg, const target_ulong addr,
* on I/O memory, it may succeed but not bring in the TLB entry.
* But even then we have still made forward progress.
*/
- tlb_fn(env, result, reg_off, addr + mem_off, mmu_idx, retaddr);
+ tlb_fn(env, result, reg_off, addr + mem_off, oi, retaddr);
reg_off += 1 << esz;
}
#endif
@@ -4406,9 +4408,9 @@ static void sve_ld2_r(CPUARMState *env, void *vg, target_ulong addr,
uint32_t desc, int size, uintptr_t ra,
sve_ld1_tlb_fn *tlb_fn)
{
- const int mmu_idx = cpu_mmu_index(env, false);
+ const TCGMemOpIdx oi = extract32(desc, SIMD_DATA_SHIFT, MEMOPIDX_SHIFT);
+ const unsigned rd = extract32(desc, SIMD_DATA_SHIFT + MEMOPIDX_SHIFT, 5);
intptr_t i, oprsz = simd_oprsz(desc);
- unsigned rd = simd_data(desc);
ARMVectorReg scratch[2] = { };
set_helper_retaddr(ra);
@@ -4416,8 +4418,8 @@ static void sve_ld2_r(CPUARMState *env, void *vg, target_ulong addr,
uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));
do {
if (pg & 1) {
- tlb_fn(env, &scratch[0], i, addr, mmu_idx, ra);
- tlb_fn(env, &scratch[1], i, addr + size, mmu_idx, ra);
+ tlb_fn(env, &scratch[0], i, addr, oi, ra);
+ tlb_fn(env, &scratch[1], i, addr + size, oi, ra);
}
i += size, pg >>= size;
addr += 2 * size;
@@ -4434,9 +4436,9 @@ static void sve_ld3_r(CPUARMState *env, void *vg, target_ulong addr,
uint32_t desc, int size, uintptr_t ra,
sve_ld1_tlb_fn *tlb_fn)
{
- const int mmu_idx = cpu_mmu_index(env, false);
+ const TCGMemOpIdx oi = extract32(desc, SIMD_DATA_SHIFT, MEMOPIDX_SHIFT);
+ const unsigned rd = extract32(desc, SIMD_DATA_SHIFT + MEMOPIDX_SHIFT, 5);
intptr_t i, oprsz = simd_oprsz(desc);
- unsigned rd = simd_data(desc);
ARMVectorReg scratch[3] = { };
set_helper_retaddr(ra);
@@ -4444,9 +4446,9 @@ static void sve_ld3_r(CPUARMState *env, void *vg, target_ulong addr,
uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));
do {
if (pg & 1) {
- tlb_fn(env, &scratch[0], i, addr, mmu_idx, ra);
- tlb_fn(env, &scratch[1], i, addr + size, mmu_idx, ra);
- tlb_fn(env, &scratch[2], i, addr + 2 * size, mmu_idx, ra);
+ tlb_fn(env, &scratch[0], i, addr, oi, ra);
+ tlb_fn(env, &scratch[1], i, addr + size, oi, ra);
+ tlb_fn(env, &scratch[2], i, addr + 2 * size, oi, ra);
}
i += size, pg >>= size;
addr += 3 * size;
@@ -4464,9 +4466,9 @@ static void sve_ld4_r(CPUARMState *env, void *vg, target_ulong addr,
uint32_t desc, int size, uintptr_t ra,
sve_ld1_tlb_fn *tlb_fn)
{
- const int mmu_idx = cpu_mmu_index(env, false);
+ const TCGMemOpIdx oi = extract32(desc, SIMD_DATA_SHIFT, MEMOPIDX_SHIFT);
+ const unsigned rd = extract32(desc, SIMD_DATA_SHIFT + MEMOPIDX_SHIFT, 5);
intptr_t i, oprsz = simd_oprsz(desc);
- unsigned rd = simd_data(desc);
ARMVectorReg scratch[4] = { };
set_helper_retaddr(ra);
@@ -4474,10 +4476,10 @@ static void sve_ld4_r(CPUARMState *env, void *vg, target_ulong addr,
uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));
do {
if (pg & 1) {
- tlb_fn(env, &scratch[0], i, addr, mmu_idx, ra);
- tlb_fn(env, &scratch[1], i, addr + size, mmu_idx, ra);
- tlb_fn(env, &scratch[2], i, addr + 2 * size, mmu_idx, ra);
- tlb_fn(env, &scratch[3], i, addr + 3 * size, mmu_idx, ra);
+ tlb_fn(env, &scratch[0], i, addr, oi, ra);
+ tlb_fn(env, &scratch[1], i, addr + size, oi, ra);
+ tlb_fn(env, &scratch[2], i, addr + 2 * size, oi, ra);
+ tlb_fn(env, &scratch[3], i, addr + 3 * size, oi, ra);
}
i += size, pg >>= size;
addr += 4 * size;
@@ -4572,11 +4574,13 @@ static void sve_ldff1_r(CPUARMState *env, void *vg, const target_ulong addr,
sve_ld1_host_fn *host_fn,
sve_ld1_tlb_fn *tlb_fn)
{
- void *vd = &env->vfp.zregs[simd_data(desc)];
+ const TCGMemOpIdx oi = extract32(desc, SIMD_DATA_SHIFT, MEMOPIDX_SHIFT);
+ const int mmu_idx = get_mmuidx(oi);
+ const unsigned rd = extract32(desc, SIMD_DATA_SHIFT + MEMOPIDX_SHIFT, 5);
+ void *vd = &env->vfp.zregs[rd];
const int diffsz = esz - msz;
const intptr_t reg_max = simd_oprsz(desc);
const intptr_t mem_max = reg_max >> diffsz;
- const int mmu_idx = cpu_mmu_index(env, false);
intptr_t split, reg_off, mem_off;
void *host;
@@ -4620,7 +4624,7 @@ static void sve_ldff1_r(CPUARMState *env, void *vg, const target_ulong addr,
/* Perform one normal read, which will fault or not.
* But it is likely to bring the page into the tlb.
*/
- tlb_fn(env, vd, reg_off, addr + mem_off, mmu_idx, retaddr);
+ tlb_fn(env, vd, reg_off, addr + mem_off, oi, retaddr);
/* After any fault, zero any leading predicated false elts. */
swap_memzero(vd, reg_off);
@@ -4649,7 +4653,8 @@ static void sve_ldnf1_r(CPUARMState *env, void *vg, const target_ulong addr,
uint32_t desc, const int esz, const int msz,
sve_ld1_host_fn *host_fn)
{
- void *vd = &env->vfp.zregs[simd_data(desc)];
+ const unsigned rd = extract32(desc, SIMD_DATA_SHIFT + MEMOPIDX_SHIFT, 5);
+ void *vd = &env->vfp.zregs[rd];
const int diffsz = esz - msz;
const intptr_t reg_max = simd_oprsz(desc);
const intptr_t mem_max = reg_max >> diffsz;
@@ -4781,15 +4786,14 @@ DO_LDFF1_LDNF1_2(dd, 3, 3)
#ifdef CONFIG_SOFTMMU
#define DO_ST_TLB(NAME, H, TYPEM, HOST, MOEND, TLB) \
static void sve_##NAME##_tlb(CPUARMState *env, void *vd, intptr_t reg_off, \
- target_ulong addr, int mmu_idx, uintptr_t ra) \
+ target_ulong addr, TCGMemOpIdx oi, uintptr_t ra) \
{ \
- TCGMemOpIdx oi = make_memop_idx(ctz32(sizeof(TYPEM)) | MOEND, mmu_idx); \
TLB(env, addr, *(TYPEM *)(vd + H(reg_off)), oi, ra); \
}
#else
#define DO_ST_TLB(NAME, H, TYPEM, HOST, MOEND, TLB) \
static void sve_##NAME##_tlb(CPUARMState *env, void *vd, intptr_t reg_off, \
- target_ulong addr, int mmu_idx, uintptr_t ra) \
+ target_ulong addr, TCGMemOpIdx oi, uintptr_t ra) \
{ \
HOST(g2h(addr), *(TYPEM *)(vd + H(reg_off))); \
}
@@ -4828,9 +4832,9 @@ static void sve_st1_r(CPUARMState *env, void *vg, target_ulong addr,
const int esize, const int msize,
sve_st1_tlb_fn *tlb_fn)
{
- const int mmu_idx = cpu_mmu_index(env, false);
+ const TCGMemOpIdx oi = extract32(desc, SIMD_DATA_SHIFT, MEMOPIDX_SHIFT);
+ const unsigned rd = extract32(desc, SIMD_DATA_SHIFT + MEMOPIDX_SHIFT, 5);
intptr_t i, oprsz = simd_oprsz(desc);
- unsigned rd = simd_data(desc);
void *vd = &env->vfp.zregs[rd];
set_helper_retaddr(ra);
@@ -4838,7 +4842,7 @@ static void sve_st1_r(CPUARMState *env, void *vg, target_ulong addr,
uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));
do {
if (pg & 1) {
- tlb_fn(env, vd, i, addr, mmu_idx, ra);
+ tlb_fn(env, vd, i, addr, oi, ra);
}
i += esize, pg >>= esize;
addr += msize;
@@ -4852,9 +4856,9 @@ static void sve_st2_r(CPUARMState *env, void *vg, target_ulong addr,
const int esize, const int msize,
sve_st1_tlb_fn *tlb_fn)
{
- const int mmu_idx = cpu_mmu_index(env, false);
+ const TCGMemOpIdx oi = extract32(desc, SIMD_DATA_SHIFT, MEMOPIDX_SHIFT);
+ const unsigned rd = extract32(desc, SIMD_DATA_SHIFT + MEMOPIDX_SHIFT, 5);
intptr_t i, oprsz = simd_oprsz(desc);
- unsigned rd = simd_data(desc);
void *d1 = &env->vfp.zregs[rd];
void *d2 = &env->vfp.zregs[(rd + 1) & 31];
@@ -4863,8 +4867,8 @@ static void sve_st2_r(CPUARMState *env, void *vg, target_ulong addr,
uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));
do {
if (pg & 1) {
- tlb_fn(env, d1, i, addr, mmu_idx, ra);
- tlb_fn(env, d2, i, addr + msize, mmu_idx, ra);
+ tlb_fn(env, d1, i, addr, oi, ra);
+ tlb_fn(env, d2, i, addr + msize, oi, ra);
}
i += esize, pg >>= esize;
addr += 2 * msize;
@@ -4878,9 +4882,9 @@ static void sve_st3_r(CPUARMState *env, void *vg, target_ulong addr,
const int esize, const int msize,
sve_st1_tlb_fn *tlb_fn)
{
- const int mmu_idx = cpu_mmu_index(env, false);
+ const TCGMemOpIdx oi = extract32(desc, SIMD_DATA_SHIFT, MEMOPIDX_SHIFT);
+ const unsigned rd = extract32(desc, SIMD_DATA_SHIFT + MEMOPIDX_SHIFT, 5);
intptr_t i, oprsz = simd_oprsz(desc);
- unsigned rd = simd_data(desc);
void *d1 = &env->vfp.zregs[rd];
void *d2 = &env->vfp.zregs[(rd + 1) & 31];
void *d3 = &env->vfp.zregs[(rd + 2) & 31];
@@ -4890,9 +4894,9 @@ static void sve_st3_r(CPUARMState *env, void *vg, target_ulong addr,
uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));
do {
if (pg & 1) {
- tlb_fn(env, d1, i, addr, mmu_idx, ra);
- tlb_fn(env, d2, i, addr + msize, mmu_idx, ra);
- tlb_fn(env, d3, i, addr + 2 * msize, mmu_idx, ra);
+ tlb_fn(env, d1, i, addr, oi, ra);
+ tlb_fn(env, d2, i, addr + msize, oi, ra);
+ tlb_fn(env, d3, i, addr + 2 * msize, oi, ra);
}
i += esize, pg >>= esize;
addr += 3 * msize;
@@ -4906,9 +4910,9 @@ static void sve_st4_r(CPUARMState *env, void *vg, target_ulong addr,
const int esize, const int msize,
sve_st1_tlb_fn *tlb_fn)
{
- const int mmu_idx = cpu_mmu_index(env, false);
+ const TCGMemOpIdx oi = extract32(desc, SIMD_DATA_SHIFT, MEMOPIDX_SHIFT);
+ const unsigned rd = extract32(desc, SIMD_DATA_SHIFT + MEMOPIDX_SHIFT, 5);
intptr_t i, oprsz = simd_oprsz(desc);
- unsigned rd = simd_data(desc);
void *d1 = &env->vfp.zregs[rd];
void *d2 = &env->vfp.zregs[(rd + 1) & 31];
void *d3 = &env->vfp.zregs[(rd + 2) & 31];
@@ -4919,10 +4923,10 @@ static void sve_st4_r(CPUARMState *env, void *vg, target_ulong addr,
uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));
do {
if (pg & 1) {
- tlb_fn(env, d1, i, addr, mmu_idx, ra);
- tlb_fn(env, d2, i, addr + msize, mmu_idx, ra);
- tlb_fn(env, d3, i, addr + 2 * msize, mmu_idx, ra);
- tlb_fn(env, d4, i, addr + 3 * msize, mmu_idx, ra);
+ tlb_fn(env, d1, i, addr, oi, ra);
+ tlb_fn(env, d2, i, addr + msize, oi, ra);
+ tlb_fn(env, d3, i, addr + 2 * msize, oi, ra);
+ tlb_fn(env, d4, i, addr + 3 * msize, oi, ra);
}
i += esize, pg >>= esize;
addr += 4 * msize;
@@ -5015,9 +5019,9 @@ static void sve_ld1_zs(CPUARMState *env, void *vd, void *vg, void *vm,
target_ulong base, uint32_t desc, uintptr_t ra,
zreg_off_fn *off_fn, sve_ld1_tlb_fn *tlb_fn)
{
- const int mmu_idx = cpu_mmu_index(env, false);
+ const TCGMemOpIdx oi = extract32(desc, SIMD_DATA_SHIFT, MEMOPIDX_SHIFT);
+ const int scale = extract32(desc, SIMD_DATA_SHIFT + MEMOPIDX_SHIFT, 2);
intptr_t i, oprsz = simd_oprsz(desc);
- unsigned scale = simd_data(desc);
ARMVectorReg scratch = { };
set_helper_retaddr(ra);
@@ -5026,7 +5030,7 @@ static void sve_ld1_zs(CPUARMState *env, void *vd, void *vg, void *vm,
do {
if (pg & 1) {
target_ulong off = off_fn(vm, i);
- tlb_fn(env, &scratch, i, base + (off << scale), mmu_idx, ra);
+ tlb_fn(env, &scratch, i, base + (off << scale), oi, ra);
}
i += 4, pg >>= 4;
} while (i & 15);
@@ -5041,9 +5045,9 @@ static void sve_ld1_zd(CPUARMState *env, void *vd, void *vg, void *vm,
target_ulong base, uint32_t desc, uintptr_t ra,
zreg_off_fn *off_fn, sve_ld1_tlb_fn *tlb_fn)
{
- const int mmu_idx = cpu_mmu_index(env, false);
+ const TCGMemOpIdx oi = extract32(desc, SIMD_DATA_SHIFT, MEMOPIDX_SHIFT);
+ const int scale = extract32(desc, SIMD_DATA_SHIFT + MEMOPIDX_SHIFT, 2);
intptr_t i, oprsz = simd_oprsz(desc) / 8;
- unsigned scale = simd_data(desc);
ARMVectorReg scratch = { };
set_helper_retaddr(ra);
@@ -5051,7 +5055,7 @@ static void sve_ld1_zd(CPUARMState *env, void *vd, void *vg, void *vm,
uint8_t pg = *(uint8_t *)(vg + H1(i));
if (pg & 1) {
target_ulong off = off_fn(vm, i * 8);
- tlb_fn(env, &scratch, i * 8, base + (off << scale), mmu_idx, ra);
+ tlb_fn(env, &scratch, i * 8, base + (off << scale), oi, ra);
}
}
set_helper_retaddr(0);
@@ -5157,7 +5161,7 @@ typedef bool sve_ld1_nf_fn(CPUARMState *env, void *vd, intptr_t reg_off,
#ifdef CONFIG_SOFTMMU
#define DO_LD_NF(NAME, H, TYPEE, TYPEM, HOST) \
static bool sve_ld##NAME##_nf(CPUARMState *env, void *vd, intptr_t reg_off, \
- target_ulong addr, int mmu_idx) \
+ target_ulong addr, int mmu_idx) \
{ \
target_ulong next_page = -(addr | TARGET_PAGE_MASK); \
if (likely(next_page - addr >= sizeof(TYPEM))) { \
@@ -5216,9 +5220,10 @@ static inline void sve_ldff1_zs(CPUARMState *env, void *vd, void *vg, void *vm,
zreg_off_fn *off_fn, sve_ld1_tlb_fn *tlb_fn,
sve_ld1_nf_fn *nonfault_fn)
{
- const int mmu_idx = cpu_mmu_index(env, false);
+ const TCGMemOpIdx oi = extract32(desc, SIMD_DATA_SHIFT, MEMOPIDX_SHIFT);
+ const int mmu_idx = get_mmuidx(oi);
+ const int scale = extract32(desc, SIMD_DATA_SHIFT + MEMOPIDX_SHIFT, 2);
intptr_t reg_off, reg_max = simd_oprsz(desc);
- unsigned scale = simd_data(desc);
target_ulong addr;
/* Skip to the first true predicate. */
@@ -5228,7 +5233,7 @@ static inline void sve_ldff1_zs(CPUARMState *env, void *vd, void *vg, void *vm,
set_helper_retaddr(ra);
addr = off_fn(vm, reg_off);
addr = base + (addr << scale);
- tlb_fn(env, vd, reg_off, addr, mmu_idx, ra);
+ tlb_fn(env, vd, reg_off, addr, oi, ra);
/* The rest of the reads will be non-faulting. */
set_helper_retaddr(0);
@@ -5257,9 +5262,10 @@ static inline void sve_ldff1_zd(CPUARMState *env, void *vd, void *vg, void *vm,
zreg_off_fn *off_fn, sve_ld1_tlb_fn *tlb_fn,
sve_ld1_nf_fn *nonfault_fn)
{
- const int mmu_idx = cpu_mmu_index(env, false);
+ const TCGMemOpIdx oi = extract32(desc, SIMD_DATA_SHIFT, MEMOPIDX_SHIFT);
+ const int mmu_idx = get_mmuidx(oi);
+ const int scale = extract32(desc, SIMD_DATA_SHIFT + MEMOPIDX_SHIFT, 2);
intptr_t reg_off, reg_max = simd_oprsz(desc);
- unsigned scale = simd_data(desc);
target_ulong addr;
/* Skip to the first true predicate. */
@@ -5269,7 +5275,7 @@ static inline void sve_ldff1_zd(CPUARMState *env, void *vd, void *vg, void *vm,
set_helper_retaddr(ra);
addr = off_fn(vm, reg_off);
addr = base + (addr << scale);
- tlb_fn(env, vd, reg_off, addr, mmu_idx, ra);
+ tlb_fn(env, vd, reg_off, addr, oi, ra);
/* The rest of the reads will be non-faulting. */
set_helper_retaddr(0);
@@ -5381,9 +5387,9 @@ static void sve_st1_zs(CPUARMState *env, void *vd, void *vg, void *vm,
target_ulong base, uint32_t desc, uintptr_t ra,
zreg_off_fn *off_fn, sve_ld1_tlb_fn *tlb_fn)
{
- const int mmu_idx = cpu_mmu_index(env, false);
+ const TCGMemOpIdx oi = extract32(desc, SIMD_DATA_SHIFT, MEMOPIDX_SHIFT);
+ const int scale = extract32(desc, SIMD_DATA_SHIFT + MEMOPIDX_SHIFT, 2);
intptr_t i, oprsz = simd_oprsz(desc);
- unsigned scale = simd_data(desc);
set_helper_retaddr(ra);
for (i = 0; i < oprsz; ) {
@@ -5391,7 +5397,7 @@ static void sve_st1_zs(CPUARMState *env, void *vd, void *vg, void *vm,
do {
if (pg & 1) {
target_ulong off = off_fn(vm, i);
- tlb_fn(env, vd, i, base + (off << scale), mmu_idx, ra);
+ tlb_fn(env, vd, i, base + (off << scale), oi, ra);
}
i += 4, pg >>= 4;
} while (i & 15);
@@ -5403,16 +5409,16 @@ static void sve_st1_zd(CPUARMState *env, void *vd, void *vg, void *vm,
target_ulong base, uint32_t desc, uintptr_t ra,
zreg_off_fn *off_fn, sve_ld1_tlb_fn *tlb_fn)
{
- const int mmu_idx = cpu_mmu_index(env, false);
+ const TCGMemOpIdx oi = extract32(desc, SIMD_DATA_SHIFT, MEMOPIDX_SHIFT);
+ const int scale = extract32(desc, SIMD_DATA_SHIFT + MEMOPIDX_SHIFT, 2);
intptr_t i, oprsz = simd_oprsz(desc) / 8;
- unsigned scale = simd_data(desc);
set_helper_retaddr(ra);
for (i = 0; i < oprsz; i++) {
uint8_t pg = *(uint8_t *)(vg + H1(i));
if (pg & 1) {
target_ulong off = off_fn(vm, i * 8);
- tlb_fn(env, vd, i * 8, base + (off << scale), mmu_idx, ra);
+ tlb_fn(env, vd, i * 8, base + (off << scale), oi, ra);
}
}
set_helper_retaddr(0);
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index 20492e9b8b..05ba0518c8 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -4600,25 +4600,34 @@ static const uint8_t dtype_esz[16] = {
3, 2, 1, 3
};
+static TCGMemOpIdx sve_memopidx(DisasContext *s, int dtype)
+{
+ return make_memop_idx(s->be_data | dtype_mop[dtype], get_mem_index(s));
+}
+
static void do_mem_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr,
- gen_helper_gvec_mem *fn)
+ int dtype, gen_helper_gvec_mem *fn)
{
unsigned vsz = vec_full_reg_size(s);
TCGv_ptr t_pg;
- TCGv_i32 desc;
+ TCGv_i32 t_desc;
+ int desc;
/* For e.g. LD4, there are not enough arguments to pass all 4
* registers as pointers, so encode the regno into the data field.
* For consistency, do this even for LD1.
*/
- desc = tcg_const_i32(simd_desc(vsz, vsz, zt));
+ desc = sve_memopidx(s, dtype);
+ desc |= zt << MEMOPIDX_SHIFT;
+ desc = simd_desc(vsz, vsz, desc);
+ t_desc = tcg_const_i32(desc);
t_pg = tcg_temp_new_ptr();
tcg_gen_addi_ptr(t_pg, cpu_env, pred_full_reg_offset(s, pg));
- fn(cpu_env, t_pg, addr, desc);
+ fn(cpu_env, t_pg, addr, t_desc);
tcg_temp_free_ptr(t_pg);
- tcg_temp_free_i32(desc);
+ tcg_temp_free_i32(t_desc);
}
static void do_ld_zpa(DisasContext *s, int zt, int pg,
@@ -4681,7 +4690,7 @@ static void do_ld_zpa(DisasContext *s, int zt, int pg,
* accessible via the instruction encoding.
*/
assert(fn != NULL);
- do_mem_zpa(s, zt, pg, addr, fn);
+ do_mem_zpa(s, zt, pg, addr, dtype, fn);
}
static bool trans_LD_zprr(DisasContext *s, arg_rprr_load *a, uint32_t insn)
@@ -4763,7 +4772,8 @@ static bool trans_LDFF1_zprr(DisasContext *s, arg_rprr_load *a, uint32_t insn)
TCGv_i64 addr = new_tmp_a64(s);
tcg_gen_shli_i64(addr, cpu_reg(s, a->rm), dtype_msz(a->dtype));
tcg_gen_add_i64(addr, addr, cpu_reg_sp(s, a->rn));
- do_mem_zpa(s, a->rd, a->pg, addr, fns[s->be_data == MO_BE][a->dtype]);
+ do_mem_zpa(s, a->rd, a->pg, addr, a->dtype,
+ fns[s->be_data == MO_BE][a->dtype]);
}
return true;
}
@@ -4821,7 +4831,8 @@ static bool trans_LDNF1_zpri(DisasContext *s, arg_rpri_load *a, uint32_t insn)
TCGv_i64 addr = new_tmp_a64(s);
tcg_gen_addi_i64(addr, cpu_reg_sp(s, a->rn), off);
- do_mem_zpa(s, a->rd, a->pg, addr, fns[s->be_data == MO_BE][a->dtype]);
+ do_mem_zpa(s, a->rd, a->pg, addr, a->dtype,
+ fns[s->be_data == MO_BE][a->dtype]);
}
return true;
}
@@ -4836,11 +4847,14 @@ static void do_ldrq(DisasContext *s, int zt, int pg, TCGv_i64 addr, int msz)
};
unsigned vsz = vec_full_reg_size(s);
TCGv_ptr t_pg;
- TCGv_i32 desc;
- int poff;
+ TCGv_i32 t_desc;
+ int desc, poff;
/* Load the first quadword using the normal predicated load helpers. */
- desc = tcg_const_i32(simd_desc(16, 16, zt));
+ desc = sve_memopidx(s, msz_dtype(msz));
+ desc |= zt << MEMOPIDX_SHIFT;
+ desc = simd_desc(16, 16, desc);
+ t_desc = tcg_const_i32(desc);
poff = pred_full_reg_offset(s, pg);
if (vsz > 16) {
@@ -4864,10 +4878,10 @@ static void do_ldrq(DisasContext *s, int zt, int pg, TCGv_i64 addr, int msz)
t_pg = tcg_temp_new_ptr();
tcg_gen_addi_ptr(t_pg, cpu_env, poff);
- fns[s->be_data == MO_BE][msz](cpu_env, t_pg, addr, desc);
+ fns[s->be_data == MO_BE][msz](cpu_env, t_pg, addr, t_desc);
tcg_temp_free_ptr(t_pg);
- tcg_temp_free_i32(desc);
+ tcg_temp_free_i32(t_desc);
/* Replicate that first quadword. */
if (vsz > 16) {
@@ -5019,7 +5033,7 @@ static void do_st_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr,
fn = fn_multiple[be][nreg - 1][msz];
}
assert(fn != NULL);
- do_mem_zpa(s, zt, pg, addr, fn);
+ do_mem_zpa(s, zt, pg, addr, msz_dtype(msz), fn);
}
static bool trans_ST_zprr(DisasContext *s, arg_rprr_store *a, uint32_t insn)
@@ -5057,24 +5071,31 @@ static bool trans_ST_zpri(DisasContext *s, arg_rpri_store *a, uint32_t insn)
*** SVE gather loads / scatter stores
*/
-static void do_mem_zpz(DisasContext *s, int zt, int pg, int zm, int scale,
- TCGv_i64 scalar, gen_helper_gvec_mem_scatter *fn)
+static void do_mem_zpz(DisasContext *s, int zt, int pg, int zm,
+ int scale, TCGv_i64 scalar, int msz,
+ gen_helper_gvec_mem_scatter *fn)
{
unsigned vsz = vec_full_reg_size(s);
- TCGv_i32 desc = tcg_const_i32(simd_desc(vsz, vsz, scale));
TCGv_ptr t_zm = tcg_temp_new_ptr();
TCGv_ptr t_pg = tcg_temp_new_ptr();
TCGv_ptr t_zt = tcg_temp_new_ptr();
+ TCGv_i32 t_desc;
+ int desc;
+
+ desc = sve_memopidx(s, msz_dtype(msz));
+ desc |= scale << MEMOPIDX_SHIFT;
+ desc = simd_desc(vsz, vsz, desc);
+ t_desc = tcg_const_i32(desc);
tcg_gen_addi_ptr(t_pg, cpu_env, pred_full_reg_offset(s, pg));
tcg_gen_addi_ptr(t_zm, cpu_env, vec_full_reg_offset(s, zm));
tcg_gen_addi_ptr(t_zt, cpu_env, vec_full_reg_offset(s, zt));
- fn(cpu_env, t_zt, t_pg, t_zm, scalar, desc);
+ fn(cpu_env, t_zt, t_pg, t_zm, scalar, t_desc);
tcg_temp_free_ptr(t_zt);
tcg_temp_free_ptr(t_zm);
tcg_temp_free_ptr(t_pg);
- tcg_temp_free_i32(desc);
+ tcg_temp_free_i32(t_desc);
}
/* Indexed by [be][ff][xs][u][msz]. */
@@ -5263,7 +5284,7 @@ static bool trans_LD1_zprz(DisasContext *s, arg_LD1_zprz *a, uint32_t insn)
assert(fn != NULL);
do_mem_zpz(s, a->rd, a->pg, a->rm, a->scale * a->msz,
- cpu_reg_sp(s, a->rn), fn);
+ cpu_reg_sp(s, a->rn), a->msz, fn);
return true;
}
@@ -5294,7 +5315,7 @@ static bool trans_LD1_zpiz(DisasContext *s, arg_LD1_zpiz *a, uint32_t insn)
* by loading the immediate into the scalar parameter.
*/
imm = tcg_const_i64(a->imm << a->msz);
- do_mem_zpz(s, a->rd, a->pg, a->rn, 0, imm, fn);
+ do_mem_zpz(s, a->rd, a->pg, a->rn, 0, imm, a->msz, fn);
tcg_temp_free_i64(imm);
return true;
}
@@ -5369,7 +5390,7 @@ static bool trans_ST1_zprz(DisasContext *s, arg_ST1_zprz *a, uint32_t insn)
g_assert_not_reached();
}
do_mem_zpz(s, a->rd, a->pg, a->rm, a->scale * a->msz,
- cpu_reg_sp(s, a->rn), fn);
+ cpu_reg_sp(s, a->rn), a->msz, fn);
return true;
}
@@ -5400,7 +5421,7 @@ static bool trans_ST1_zpiz(DisasContext *s, arg_ST1_zpiz *a, uint32_t insn)
* by loading the immediate into the scalar parameter.
*/
imm = tcg_const_i64(a->imm << a->msz);
- do_mem_zpz(s, a->rd, a->pg, a->rn, 0, imm, fn);
+ do_mem_zpz(s, a->rd, a->pg, a->rn, 0, imm, a->msz, fn);
tcg_temp_free_i64(imm);
return true;
}
--
2.17.1
^ permalink raw reply related [flat|nested] 51+ messages in thread
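The descriptor packing used throughout the patch above can be sketched in isolation. This is a minimal illustration, not QEMU code: the two shift values are assumptions chosen for the example, and every sketch_* name is hypothetical. It only shows the layout the helpers rely on, with the TCGMemOpIdx in the low bits of the data field and the register number (or scale) directly above it.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, for illustration only; the real definitions live in
 * QEMU's headers and may differ. */
#define SIMD_DATA_SHIFT 10  /* assumption: first bit of the desc data field */
#define MEMOPIDX_SHIFT  8   /* assumption: bits reserved for the memop index */

/* Stand-in for extract32(): LENGTH bits of VALUE starting at bit START. */
static uint32_t sketch_extract32(uint32_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 32 - start);
    return (value >> start) & (~0u >> (32 - length));
}

/* Pack the memop index below EXTRA (a register number or a scale),
 * then shift both into the data field of the descriptor. */
static uint32_t sketch_pack_data(uint32_t oi, uint32_t extra)
{
    assert(oi < (1u << MEMOPIDX_SHIFT));
    return (oi | (extra << MEMOPIDX_SHIFT)) << SIMD_DATA_SHIFT;
}

/* Helper side: recover the memop index. */
static uint32_t sketch_desc_oi(uint32_t desc)
{
    return sketch_extract32(desc, SIMD_DATA_SHIFT, MEMOPIDX_SHIFT);
}

/* Helper side: recover the 5-bit register number stored above it. */
static uint32_t sketch_desc_rd(uint32_t desc)
{
    return sketch_extract32(desc, SIMD_DATA_SHIFT + MEMOPIDX_SHIFT, 5);
}
```

Because the index is built once at translation time, the helper no longer needs to call cpu_mmu_index() or rebuild the memop on every element.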
* Re: [Qemu-devel] [PATCH 00/20] target/arm: sve system mode patches
2018-08-09 4:21 [Qemu-devel] [PATCH 00/20] target/arm: sve system mode patches Richard Henderson
` (19 preceding siblings ...)
2018-08-09 4:22 ` [Qemu-devel] [PATCH 20/20] target/arm: Pass TCGMemOpIdx to sve memory helpers Richard Henderson
@ 2018-08-09 5:48 ` Laurent Desnogues
2018-08-18 9:15 ` no-reply
2018-08-18 10:01 ` no-reply
22 siblings, 0 replies; 51+ messages in thread
From: Laurent Desnogues @ 2018-08-09 5:48 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel@nongnu.org, Peter Maydell, Alex Bennée
Hello,
On Thu, Aug 9, 2018 at 6:21 AM, Richard Henderson
<richard.henderson@linaro.org> wrote:
> This is my current set of patches for running SVE in system mode.
>
> The first half deals with the system registers that affect SVE.
> I recall that Peter has said he'd like the first patch to be
> done a different way, but we haven't had a chance to talk about
> what form it should take. I've left it as-is since it does what
> I need for now.
>
> The second half re-implements the SVE memory operations.
> The FF and NF loads had been stubbed out. Getting those to work
> requires some infrastructure that can be reused to speed up normal
> loads -- one guest-to-host tlb lookup can be reused for the rest
> of the page.
I did not review every patch individually but tested the whole and
found no issue.
Tested-by: Laurent Desnogues <laurent.desnogues@gmail.com>
Thanks,
Laurent
>
> r~
>
>
> Based-on: <20180809034033.10579-1-richard.henderson@linaro.org>
> Richard Henderson (20):
> target/arm: Set ISAR bits for -cpu max
> target/arm: Set ID_AA64PFR0 bits for SVE for -cpu max
> target/arm: Define ID_AA64ZFR0_EL1
> target/arm: Adjust sve_exception_el
> target/arm: Fix arm_cpu_data_is_big_endian for aa64 user-only
> target/arm: Fix arm_current_el for user-only
> target/arm: Fix is_a64 for user-only
> target/arm: Pass in current_el to fp and sve_exception_el
> target/arm: Handle SVE vector length changes in system mode
> target/arm: Adjust aarch64_cpu_dump_state for system mode SVE
> target/arm: Clear unused predicate bits for LD1RQ
> target/arm: Rewrite helper_sve_ld1*_r using pages
> target/arm: Rewrite helper_sve_ld[234]*_r
> target/arm: Rewrite helper_sve_st[1234]*_r
> target/arm: Split contiguous loads for endianness
> target/arm: Split contiguous stores for endianness
> target/arm: Rewrite vector gather loads
> target/arm: Rewrite vector gather stores
> target/arm: Rewrite vector gather first-fault loads
> target/arm: Pass TCGMemOpIdx to sve memory helpers
>
> target/arm/cpu.h | 47 +-
> target/arm/helper-sve.h | 385 +++++--
> target/arm/internals.h | 5 +
> target/arm/cpu.c | 24 +-
> target/arm/cpu64.c | 93 +-
> target/arm/helper.c | 237 +++--
> target/arm/op_helper.c | 1 +
> target/arm/sve_helper.c | 2062 +++++++++++++++++++++++++-----------
> target/arm/translate-a64.c | 8 +-
> target/arm/translate-sve.c | 670 ++++++++----
> 10 files changed, 2453 insertions(+), 1079 deletions(-)
>
> --
> 2.17.1
>
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [Qemu-devel] [PATCH 08/20] target/arm: Pass in current_el to fp and sve_exception_el
2018-08-09 4:21 ` [Qemu-devel] [PATCH 08/20] target/arm: Pass in current_el to fp and sve_exception_el Richard Henderson
@ 2018-08-09 18:01 ` Alex Bennée
2018-08-09 18:50 ` Richard Henderson
0 siblings, 1 reply; 51+ messages in thread
From: Alex Bennée @ 2018-08-09 18:01 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, laurent.desnogues, peter.maydell
Richard Henderson <richard.henderson@linaro.org> writes:
> We are going to want to determine whether sve is enabled
> for EL other than current.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Are these patches meant to apply to origin/master or on top of the
user-mode fixes? This didn't apply for me:
> @@ -12385,11 +12382,12 @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
> target_ulong *cs_base, uint32_t *pflags)
> {
> ARMMMUIdx mmu_idx = core_to_arm_mmu_idx(env, cpu_mmu_index(env, false));
> - int fp_el = fp_exception_el(env);
> + int current_el = arm_current_el(env);
> + int fp_el = fp_exception_el(env, current_el);
> uint32_t flags;
>
> if (is_a64(env)) {
> - int sve_el = sve_exception_el(env);
> + int sve_el = sve_exception_el(env, current_el);
> uint32_t zcr_len;
>
> *pc = env->pc;
> @@ -12404,7 +12402,6 @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
> if (sve_el != 0 && fp_el == 0) {
> zcr_len = 0;
> } else {
> - int current_el = arm_current_el(env);
> ARMCPU *cpu = arm_env_get_cpu(env);
>
> zcr_len = cpu->sve_max_vq - 1;
++<<<<<<< HEAD
+ int current_el = arm_current_el(env);
++=======
+ ARMCPU *cpu = arm_env_get_cpu(env);
--
Alex Bennée
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [Qemu-devel] [PATCH 08/20] target/arm: Pass in current_el to fp and sve_exception_el
2018-08-09 18:01 ` Alex Bennée
@ 2018-08-09 18:50 ` Richard Henderson
0 siblings, 0 replies; 51+ messages in thread
From: Richard Henderson @ 2018-08-09 18:50 UTC (permalink / raw)
To: Alex Bennée; +Cc: qemu-devel, laurent.desnogues, peter.maydell
On 08/09/2018 11:01 AM, Alex Bennée wrote:
>
> Richard Henderson <richard.henderson@linaro.org> writes:
>
>> We are going to want to determine whether sve is enabled
>> for EL other than current.
>>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>
> Are these patches meant to apply to origin/master or on top of the
> user-mode fixes? This didn't apply for me:
On top of the user-mode fixes.
Based-on: <20180809034033.10579-1-richard.henderson@linaro.org>
And of course you can see it all on my sve-3.1 branch.
r~
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [Qemu-devel] [PATCH 12/20] target/arm: Rewrite helper_sve_ld1*_r using pages
2018-08-09 4:21 ` [Qemu-devel] [PATCH 12/20] target/arm: Rewrite helper_sve_ld1*_r using pages Richard Henderson
@ 2018-08-10 9:13 ` Alex Bennée
2018-08-10 19:15 ` Richard Henderson
2018-08-23 16:01 ` Peter Maydell
1 sibling, 1 reply; 51+ messages in thread
From: Alex Bennée @ 2018-08-10 9:13 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, laurent.desnogues, peter.maydell
Richard Henderson <richard.henderson@linaro.org> writes:
> Uses tlb_vaddr_to_host for correct operation with softmmu.
> Optimize for accesses within a single page or pair of pages.
>
> Perf report comparison for cortex-strings test-strlen
> with aarch64-linux-user:
>
<snip>
> +/*
> + * Common helper for all contiguous one-register predicated loads.
> + */
> +static void sve_ld1_r(CPUARMState *env, void *vg, const target_ulong addr,
> + uint32_t desc, const uintptr_t retaddr,
> + const int esz, const int msz,
> + sve_ld1_host_fn *host_fn,
> + sve_ld1_tlb_fn *tlb_fn)
> +{
> + void *vd = &env->vfp.zregs[simd_data(desc)];
> + const int diffsz = esz - msz;
> + const intptr_t reg_max = simd_oprsz(desc);
> + const intptr_t mem_max = reg_max >> diffsz;
> + const int mmu_idx = cpu_mmu_index(env, false);
> + ARMVectorReg scratch;
> + void *host, *result;
> + intptr_t split;
> +
> + set_helper_retaddr(retaddr);
> +
> + host = tlb_vaddr_to_host(env, addr, MMU_DATA_LOAD, mmu_idx);
> + if (test_host_page(host)) {
> + split = max_for_page(addr, 0, mem_max);
> + if (likely(split == mem_max)) {
> + /* The load is entirely within a valid page. For softmmu,
> + * no faults. For user-only, if the first byte does not
> + * fault then none of them will fault, so Vd will never be
> + * partially modified.
> + */
> + host_fn(vd, vg, host, 0, mem_max);
> + set_helper_retaddr(0);
> + return;
> + }
> + }
> +
> + /* Perform the predicated read into a temporary, thus ensuring
> + * if the load of the last element faults, Vd is not modified.
> + */
> + result = &scratch;
> +#ifdef CONFIG_USER_ONLY
> + host_fn(vd, vg, host, 0, mem_max);
> +#else
> + memset(result, 0, reg_max);
> + for (intptr_t reg_off = find_next_active(vg, 0, reg_max, esz);
Hmm this blew up CI complaining about c99-isms, but QEMU is supposed to
be c99 compliant.
https://travis-ci.org/stsquad/qemu/builds/414248994
--
Alex Bennée
^ permalink raw reply [flat|nested] 51+ messages in thread
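The "single page or pair of pages" optimization in the quoted patch depends on knowing how many bytes of an access fit before the next page boundary. A rough sketch of that arithmetic follows, assuming 4 KiB pages; the sketch_* names and the page size are hypothetical, and this is not the patch's max_for_page().

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_BITS 12  /* assumption: 4 KiB pages */
#define SKETCH_PAGE_MASK (~(((uint64_t)1 << SKETCH_PAGE_BITS) - 1))

/* Bytes from ADDR up to the end of its page.  OR-ing in the page mask
 * sets every bit above the page offset, so the unsigned negation yields
 * page_size - offset. */
static uint64_t sketch_bytes_to_page_end(uint64_t addr)
{
    return -(addr | SKETCH_PAGE_MASK);
}

/* How many of the LEN bytes starting at ADDR lie on the first page.
 * A result equal to LEN means the access does not cross a page. */
static uint64_t sketch_split(uint64_t addr, uint64_t len)
{
    uint64_t room = sketch_bytes_to_page_end(addr);
    return room < len ? room : len;
}

/* Small self-check of the arithmetic above. */
static void sketch_selfcheck(void)
{
    assert(sketch_bytes_to_page_end(0x1ff0) == 16);
    assert(sketch_bytes_to_page_end(0x2000) == 4096);
}
```

When the split equals the full length, one guest-to-host lookup covers the whole access; otherwise the remainder falls on the following page and needs its own lookup.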
* Re: [Qemu-devel] [PATCH 12/20] target/arm: Rewrite helper_sve_ld1*_r using pages
2018-08-10 9:13 ` Alex Bennée
@ 2018-08-10 19:15 ` Richard Henderson
0 siblings, 0 replies; 51+ messages in thread
From: Richard Henderson @ 2018-08-10 19:15 UTC (permalink / raw)
To: Alex Bennée; +Cc: qemu-devel, laurent.desnogues, peter.maydell
On 08/10/2018 02:13 AM, Alex Bennée wrote:
>> + for (intptr_t reg_off = find_next_active(vg, 0, reg_max, esz);
>
> Hmm this blew up CI complaining about c99-isms, but QEMU is supposed to
> be c99 compliant.
>
> https://travis-ci.org/stsquad/qemu/builds/414248994
Bah. That's what I get for doing two things at once on
different projects with different standards.
On the other hand, would anyone seriously object to me
adding -std=gnu99 to the compile flags? C99's 20th
birthday is coming up next year...
r~
^ permalink raw reply [flat|nested] 51+ messages in thread
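For readers unfamiliar with the "c99-ism" in question: declaring the loop variable inside the for statement is a C99 feature, and compilers that default to a C89/gnu89 dialect reject it unless a newer -std= is passed. A small sketch of both spellings follows; the function names are hypothetical and the loop body is unrelated to the patch.

```c
#include <assert.h>
#include <stddef.h>

/* C99 form: the loop variable is declared in the for-init clause. */
static size_t sketch_count_active_c99(const unsigned char *pred, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        count += pred[i] & 1;
    }
    return count;
}

/* C89-compatible form: the declaration is hoisted to the top of the block. */
static size_t sketch_count_active_c89(const unsigned char *pred, size_t n)
{
    size_t i, count = 0;
    for (i = 0; i < n; i++) {
        count += pred[i] & 1;
    }
    return count;
}
```

Both functions behave identically; only the second builds under a C89 dialect, which is the form the series needs unless -std=gnu99 is added to the compile flags.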
* Re: [Qemu-devel] [PATCH 15/20] target/arm: Split contiguous loads for endianness
2018-08-09 4:22 ` [Qemu-devel] [PATCH 15/20] target/arm: Split contiguous loads for endianness Richard Henderson
@ 2018-08-11 5:40 ` Philippe Mathieu-Daudé
0 siblings, 0 replies; 51+ messages in thread
From: Philippe Mathieu-Daudé @ 2018-08-11 5:40 UTC (permalink / raw)
To: Richard Henderson, qemu-devel
Cc: laurent.desnogues, peter.maydell, alex.bennee
On 08/09/2018 01:22 AM, Richard Henderson wrote:
> We can choose the endianness at translation time, rather than
> re-computing it at execution time.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
> ---
> target/arm/helper-sve.h | 117 +++++++++++++++-------
> target/arm/sve_helper.c | 70 ++++++-------
> target/arm/translate-sve.c | 196 +++++++++++++++++++++++++------------
> 3 files changed, 252 insertions(+), 131 deletions(-)
>
> diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
> index 023952a9a4..526caec8da 100644
> --- a/target/arm/helper-sve.h
> +++ b/target/arm/helper-sve.h
> @@ -1128,20 +1128,35 @@ DEF_HELPER_FLAGS_4(sve_ld2bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> DEF_HELPER_FLAGS_4(sve_ld3bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> DEF_HELPER_FLAGS_4(sve_ld4bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
>
> -DEF_HELPER_FLAGS_4(sve_ld1hh_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> -DEF_HELPER_FLAGS_4(sve_ld2hh_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> -DEF_HELPER_FLAGS_4(sve_ld3hh_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> -DEF_HELPER_FLAGS_4(sve_ld4hh_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ld1hh_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ld2hh_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ld3hh_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ld4hh_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
>
> -DEF_HELPER_FLAGS_4(sve_ld1ss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> -DEF_HELPER_FLAGS_4(sve_ld2ss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> -DEF_HELPER_FLAGS_4(sve_ld3ss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> -DEF_HELPER_FLAGS_4(sve_ld4ss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ld1hh_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ld2hh_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ld3hh_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ld4hh_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
>
> -DEF_HELPER_FLAGS_4(sve_ld1dd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> -DEF_HELPER_FLAGS_4(sve_ld2dd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> -DEF_HELPER_FLAGS_4(sve_ld3dd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> -DEF_HELPER_FLAGS_4(sve_ld4dd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ld1ss_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ld2ss_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ld3ss_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ld4ss_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +
> +DEF_HELPER_FLAGS_4(sve_ld1ss_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ld2ss_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ld3ss_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ld4ss_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +
> +DEF_HELPER_FLAGS_4(sve_ld1dd_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ld2dd_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ld3dd_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ld4dd_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +
> +DEF_HELPER_FLAGS_4(sve_ld1dd_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ld2dd_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ld3dd_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ld4dd_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
>
> DEF_HELPER_FLAGS_4(sve_ld1bhu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> DEF_HELPER_FLAGS_4(sve_ld1bsu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> @@ -1150,13 +1165,21 @@ DEF_HELPER_FLAGS_4(sve_ld1bhs_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> DEF_HELPER_FLAGS_4(sve_ld1bss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> DEF_HELPER_FLAGS_4(sve_ld1bds_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
>
> -DEF_HELPER_FLAGS_4(sve_ld1hsu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> -DEF_HELPER_FLAGS_4(sve_ld1hdu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> -DEF_HELPER_FLAGS_4(sve_ld1hss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> -DEF_HELPER_FLAGS_4(sve_ld1hds_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ld1hsu_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ld1hdu_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ld1hss_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ld1hds_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
>
> -DEF_HELPER_FLAGS_4(sve_ld1sdu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> -DEF_HELPER_FLAGS_4(sve_ld1sds_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ld1hsu_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ld1hdu_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ld1hss_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ld1hds_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +
> +DEF_HELPER_FLAGS_4(sve_ld1sdu_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ld1sds_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +
> +DEF_HELPER_FLAGS_4(sve_ld1sdu_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ld1sds_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
>
> DEF_HELPER_FLAGS_4(sve_ldff1bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> DEF_HELPER_FLAGS_4(sve_ldff1bhu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> @@ -1166,17 +1189,28 @@ DEF_HELPER_FLAGS_4(sve_ldff1bhs_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> DEF_HELPER_FLAGS_4(sve_ldff1bss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> DEF_HELPER_FLAGS_4(sve_ldff1bds_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
>
> -DEF_HELPER_FLAGS_4(sve_ldff1hh_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> -DEF_HELPER_FLAGS_4(sve_ldff1hsu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> -DEF_HELPER_FLAGS_4(sve_ldff1hdu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> -DEF_HELPER_FLAGS_4(sve_ldff1hss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> -DEF_HELPER_FLAGS_4(sve_ldff1hds_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ldff1hh_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ldff1hsu_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ldff1hdu_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ldff1hss_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ldff1hds_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
>
> -DEF_HELPER_FLAGS_4(sve_ldff1ss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> -DEF_HELPER_FLAGS_4(sve_ldff1sdu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> -DEF_HELPER_FLAGS_4(sve_ldff1sds_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ldff1hh_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ldff1hsu_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ldff1hdu_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ldff1hss_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ldff1hds_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
>
> -DEF_HELPER_FLAGS_4(sve_ldff1dd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ldff1ss_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ldff1sdu_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ldff1sds_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +
> +DEF_HELPER_FLAGS_4(sve_ldff1ss_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ldff1sdu_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ldff1sds_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +
> +DEF_HELPER_FLAGS_4(sve_ldff1dd_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ldff1dd_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
>
> DEF_HELPER_FLAGS_4(sve_ldnf1bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> DEF_HELPER_FLAGS_4(sve_ldnf1bhu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> @@ -1186,17 +1220,28 @@ DEF_HELPER_FLAGS_4(sve_ldnf1bhs_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> DEF_HELPER_FLAGS_4(sve_ldnf1bss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> DEF_HELPER_FLAGS_4(sve_ldnf1bds_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
>
> -DEF_HELPER_FLAGS_4(sve_ldnf1hh_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> -DEF_HELPER_FLAGS_4(sve_ldnf1hsu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> -DEF_HELPER_FLAGS_4(sve_ldnf1hdu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> -DEF_HELPER_FLAGS_4(sve_ldnf1hss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> -DEF_HELPER_FLAGS_4(sve_ldnf1hds_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ldnf1hh_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ldnf1hsu_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ldnf1hdu_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ldnf1hss_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ldnf1hds_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
>
> -DEF_HELPER_FLAGS_4(sve_ldnf1ss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> -DEF_HELPER_FLAGS_4(sve_ldnf1sdu_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> -DEF_HELPER_FLAGS_4(sve_ldnf1sds_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ldnf1hh_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ldnf1hsu_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ldnf1hdu_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ldnf1hss_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ldnf1hds_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
>
> -DEF_HELPER_FLAGS_4(sve_ldnf1dd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ldnf1ss_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ldnf1sdu_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ldnf1sds_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +
> +DEF_HELPER_FLAGS_4(sve_ldnf1ss_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ldnf1sdu_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ldnf1sds_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +
> +DEF_HELPER_FLAGS_4(sve_ldnf1dd_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_ldnf1dd_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
>
> DEF_HELPER_FLAGS_4(sve_st1bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> DEF_HELPER_FLAGS_4(sve_st2bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
> index 4eae6569cc..56e2f523c5 100644
> --- a/target/arm/sve_helper.c
> +++ b/target/arm/sve_helper.c
> @@ -4362,18 +4362,18 @@ void HELPER(sve_##NAME##_r)(CPUARMState *env, void *vg, \
> sve_##NAME##_host, sve_##NAME##_tlb); \
> }
>
> -/* TODO: Propagate the endian check back to the translator. */
> #define DO_LD1_2(NAME, ESZ, MSZ) \
> -void HELPER(sve_##NAME##_r)(CPUARMState *env, void *vg, \
> - target_ulong addr, uint32_t desc) \
> -{ \
> - if (arm_cpu_data_is_big_endian(env)) { \
> - sve_ld1_r(env, vg, addr, desc, GETPC(), ESZ, MSZ, \
> - sve_##NAME##_be_host, sve_##NAME##_be_tlb); \
> - } else { \
> - sve_ld1_r(env, vg, addr, desc, GETPC(), ESZ, MSZ, \
> - sve_##NAME##_le_host, sve_##NAME##_le_tlb); \
> - } \
> +void HELPER(sve_##NAME##_le_r)(CPUARMState *env, void *vg, \
> + target_ulong addr, uint32_t desc) \
> +{ \
> + sve_ld1_r(env, vg, addr, desc, GETPC(), ESZ, MSZ, \
> + sve_##NAME##_le_host, sve_##NAME##_le_tlb); \
> +} \
> +void HELPER(sve_##NAME##_be_r)(CPUARMState *env, void *vg, \
> + target_ulong addr, uint32_t desc) \
> +{ \
> + sve_ld1_r(env, vg, addr, desc, GETPC(), ESZ, MSZ, \
> + sve_##NAME##_be_host, sve_##NAME##_be_tlb); \
> }
>
> DO_LD1_1(ld1bb, 0)
> @@ -4500,12 +4500,17 @@ void __attribute__((flatten)) HELPER(sve_ld##N##bb_r) \
> }
>
> #define DO_LDN_2(N, SUFF, SIZE) \
> -void __attribute__((flatten)) HELPER(sve_ld##N##SUFF##_r) \
> +void __attribute__((flatten)) HELPER(sve_ld##N##SUFF##_le_r) \
> (CPUARMState *env, void *vg, target_ulong addr, uint32_t desc) \
> { \
> sve_ld##N##_r(env, vg, addr, desc, SIZE, GETPC(), \
> - arm_cpu_data_is_big_endian(env) \
> - ? sve_ld1##SUFF##_be_tlb : sve_ld1##SUFF##_le_tlb); \
> + sve_ld1##SUFF##_le_tlb); \
> +} \
> +void __attribute__((flatten)) HELPER(sve_ld##N##SUFF##_be_r) \
> + (CPUARMState *env, void *vg, target_ulong addr, uint32_t desc) \
> +{ \
> + sve_ld##N##_r(env, vg, addr, desc, SIZE, GETPC(), \
> + sve_ld1##SUFF##_be_tlb); \
> }
>
> DO_LDN_1(2)
> @@ -4722,29 +4727,28 @@ void HELPER(sve_ldnf1##PART##_r)(CPUARMState *env, void *vg, \
> sve_ldnf1_r(env, vg, addr, desc, ESZ, 0, sve_ld1##PART##_host); \
> }
>
> -/* TODO: Propagate the endian check back to the translator. */
> #define DO_LDFF1_LDNF1_2(PART, ESZ, MSZ) \
> -void HELPER(sve_ldff1##PART##_r)(CPUARMState *env, void *vg, \
> - target_ulong addr, uint32_t desc) \
> +void HELPER(sve_ldff1##PART##_le_r)(CPUARMState *env, void *vg, \
> + target_ulong addr, uint32_t desc) \
> { \
> - if (arm_cpu_data_is_big_endian(env)) { \
> - sve_ldff1_r(env, vg, addr, desc, GETPC(), ESZ, MSZ, \
> - sve_ld1##PART##_be_host, sve_ld1##PART##_be_tlb); \
> - } else { \
> - sve_ldff1_r(env, vg, addr, desc, GETPC(), ESZ, MSZ, \
> - sve_ld1##PART##_le_host, sve_ld1##PART##_le_tlb); \
> - } \
> + sve_ldff1_r(env, vg, addr, desc, GETPC(), ESZ, MSZ, \
> + sve_ld1##PART##_le_host, sve_ld1##PART##_le_tlb); \
> } \
> -void HELPER(sve_ldnf1##PART##_r)(CPUARMState *env, void *vg, \
> - target_ulong addr, uint32_t desc) \
> +void HELPER(sve_ldnf1##PART##_le_r)(CPUARMState *env, void *vg, \
> + target_ulong addr, uint32_t desc) \
> { \
> - if (arm_cpu_data_is_big_endian(env)) { \
> - sve_ldnf1_r(env, vg, addr, desc, ESZ, MSZ, \
> - sve_ld1##PART##_be_host); \
> - } else { \
> - sve_ldnf1_r(env, vg, addr, desc, ESZ, MSZ, \
> - sve_ld1##PART##_le_host); \
> - } \
> + sve_ldnf1_r(env, vg, addr, desc, ESZ, MSZ, sve_ld1##PART##_le_host); \
> +} \
> +void HELPER(sve_ldff1##PART##_be_r)(CPUARMState *env, void *vg, \
> + target_ulong addr, uint32_t desc) \
> +{ \
> + sve_ldff1_r(env, vg, addr, desc, GETPC(), ESZ, MSZ, \
> + sve_ld1##PART##_be_host, sve_ld1##PART##_be_tlb); \
> +} \
> +void HELPER(sve_ldnf1##PART##_be_r)(CPUARMState *env, void *vg, \
> + target_ulong addr, uint32_t desc) \
> +{ \
> + sve_ldnf1_r(env, vg, addr, desc, ESZ, MSZ, sve_ld1##PART##_be_host); \
> }
>
> DO_LDFF1_LDNF1_1(bb, 0)
> diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
> index bef6b8242d..de12c01e7d 100644
> --- a/target/arm/translate-sve.c
> +++ b/target/arm/translate-sve.c
> @@ -4624,32 +4624,58 @@ static void do_mem_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr,
> static void do_ld_zpa(DisasContext *s, int zt, int pg,
> TCGv_i64 addr, int dtype, int nreg)
> {
> - static gen_helper_gvec_mem * const fns[16][4] = {
> - { gen_helper_sve_ld1bb_r, gen_helper_sve_ld2bb_r,
> - gen_helper_sve_ld3bb_r, gen_helper_sve_ld4bb_r },
> - { gen_helper_sve_ld1bhu_r, NULL, NULL, NULL },
> - { gen_helper_sve_ld1bsu_r, NULL, NULL, NULL },
> - { gen_helper_sve_ld1bdu_r, NULL, NULL, NULL },
> + static gen_helper_gvec_mem * const fns[2][16][4] = {
> + /* Little-endian */
> + { { gen_helper_sve_ld1bb_r, gen_helper_sve_ld2bb_r,
> + gen_helper_sve_ld3bb_r, gen_helper_sve_ld4bb_r },
> + { gen_helper_sve_ld1bhu_r, NULL, NULL, NULL },
> + { gen_helper_sve_ld1bsu_r, NULL, NULL, NULL },
> + { gen_helper_sve_ld1bdu_r, NULL, NULL, NULL },
>
> - { gen_helper_sve_ld1sds_r, NULL, NULL, NULL },
> - { gen_helper_sve_ld1hh_r, gen_helper_sve_ld2hh_r,
> - gen_helper_sve_ld3hh_r, gen_helper_sve_ld4hh_r },
> - { gen_helper_sve_ld1hsu_r, NULL, NULL, NULL },
> - { gen_helper_sve_ld1hdu_r, NULL, NULL, NULL },
> + { gen_helper_sve_ld1sds_le_r, NULL, NULL, NULL },
> + { gen_helper_sve_ld1hh_le_r, gen_helper_sve_ld2hh_le_r,
> + gen_helper_sve_ld3hh_le_r, gen_helper_sve_ld4hh_le_r },
> + { gen_helper_sve_ld1hsu_le_r, NULL, NULL, NULL },
> + { gen_helper_sve_ld1hdu_le_r, NULL, NULL, NULL },
>
> - { gen_helper_sve_ld1hds_r, NULL, NULL, NULL },
> - { gen_helper_sve_ld1hss_r, NULL, NULL, NULL },
> - { gen_helper_sve_ld1ss_r, gen_helper_sve_ld2ss_r,
> - gen_helper_sve_ld3ss_r, gen_helper_sve_ld4ss_r },
> - { gen_helper_sve_ld1sdu_r, NULL, NULL, NULL },
> + { gen_helper_sve_ld1hds_le_r, NULL, NULL, NULL },
> + { gen_helper_sve_ld1hss_le_r, NULL, NULL, NULL },
> + { gen_helper_sve_ld1ss_le_r, gen_helper_sve_ld2ss_le_r,
> + gen_helper_sve_ld3ss_le_r, gen_helper_sve_ld4ss_le_r },
> + { gen_helper_sve_ld1sdu_le_r, NULL, NULL, NULL },
>
> - { gen_helper_sve_ld1bds_r, NULL, NULL, NULL },
> - { gen_helper_sve_ld1bss_r, NULL, NULL, NULL },
> - { gen_helper_sve_ld1bhs_r, NULL, NULL, NULL },
> - { gen_helper_sve_ld1dd_r, gen_helper_sve_ld2dd_r,
> - gen_helper_sve_ld3dd_r, gen_helper_sve_ld4dd_r },
> + { gen_helper_sve_ld1bds_r, NULL, NULL, NULL },
> + { gen_helper_sve_ld1bss_r, NULL, NULL, NULL },
> + { gen_helper_sve_ld1bhs_r, NULL, NULL, NULL },
> + { gen_helper_sve_ld1dd_le_r, gen_helper_sve_ld2dd_le_r,
> + gen_helper_sve_ld3dd_le_r, gen_helper_sve_ld4dd_le_r } },
> +
> + /* Big-endian */
> + { { gen_helper_sve_ld1bb_r, gen_helper_sve_ld2bb_r,
> + gen_helper_sve_ld3bb_r, gen_helper_sve_ld4bb_r },
> + { gen_helper_sve_ld1bhu_r, NULL, NULL, NULL },
> + { gen_helper_sve_ld1bsu_r, NULL, NULL, NULL },
> + { gen_helper_sve_ld1bdu_r, NULL, NULL, NULL },
> +
> + { gen_helper_sve_ld1sds_be_r, NULL, NULL, NULL },
> + { gen_helper_sve_ld1hh_be_r, gen_helper_sve_ld2hh_be_r,
> + gen_helper_sve_ld3hh_be_r, gen_helper_sve_ld4hh_be_r },
> + { gen_helper_sve_ld1hsu_be_r, NULL, NULL, NULL },
> + { gen_helper_sve_ld1hdu_be_r, NULL, NULL, NULL },
> +
> + { gen_helper_sve_ld1hds_be_r, NULL, NULL, NULL },
> + { gen_helper_sve_ld1hss_be_r, NULL, NULL, NULL },
> + { gen_helper_sve_ld1ss_be_r, gen_helper_sve_ld2ss_be_r,
> + gen_helper_sve_ld3ss_be_r, gen_helper_sve_ld4ss_be_r },
> + { gen_helper_sve_ld1sdu_be_r, NULL, NULL, NULL },
> +
> + { gen_helper_sve_ld1bds_r, NULL, NULL, NULL },
> + { gen_helper_sve_ld1bss_r, NULL, NULL, NULL },
> + { gen_helper_sve_ld1bhs_r, NULL, NULL, NULL },
> + { gen_helper_sve_ld1dd_be_r, gen_helper_sve_ld2dd_be_r,
> + gen_helper_sve_ld3dd_be_r, gen_helper_sve_ld4dd_be_r } }
> };
> - gen_helper_gvec_mem *fn = fns[dtype][nreg];
> + gen_helper_gvec_mem *fn = fns[s->be_data == MO_BE][dtype][nreg];
>
> /* While there are holes in the table, they are not
> * accessible via the instruction encoding.
> @@ -4689,59 +4715,103 @@ static bool trans_LD_zpri(DisasContext *s, arg_rpri_load *a, uint32_t insn)
>
> static bool trans_LDFF1_zprr(DisasContext *s, arg_rprr_load *a, uint32_t insn)
> {
> - static gen_helper_gvec_mem * const fns[16] = {
> - gen_helper_sve_ldff1bb_r,
> - gen_helper_sve_ldff1bhu_r,
> - gen_helper_sve_ldff1bsu_r,
> - gen_helper_sve_ldff1bdu_r,
> + static gen_helper_gvec_mem * const fns[2][16] = {
> + /* Little-endian */
> + { gen_helper_sve_ldff1bb_r,
> + gen_helper_sve_ldff1bhu_r,
> + gen_helper_sve_ldff1bsu_r,
> + gen_helper_sve_ldff1bdu_r,
>
> - gen_helper_sve_ldff1sds_r,
> - gen_helper_sve_ldff1hh_r,
> - gen_helper_sve_ldff1hsu_r,
> - gen_helper_sve_ldff1hdu_r,
> + gen_helper_sve_ldff1sds_le_r,
> + gen_helper_sve_ldff1hh_le_r,
> + gen_helper_sve_ldff1hsu_le_r,
> + gen_helper_sve_ldff1hdu_le_r,
>
> - gen_helper_sve_ldff1hds_r,
> - gen_helper_sve_ldff1hss_r,
> - gen_helper_sve_ldff1ss_r,
> - gen_helper_sve_ldff1sdu_r,
> + gen_helper_sve_ldff1hds_le_r,
> + gen_helper_sve_ldff1hss_le_r,
> + gen_helper_sve_ldff1ss_le_r,
> + gen_helper_sve_ldff1sdu_le_r,
>
> - gen_helper_sve_ldff1bds_r,
> - gen_helper_sve_ldff1bss_r,
> - gen_helper_sve_ldff1bhs_r,
> - gen_helper_sve_ldff1dd_r,
> + gen_helper_sve_ldff1bds_r,
> + gen_helper_sve_ldff1bss_r,
> + gen_helper_sve_ldff1bhs_r,
> + gen_helper_sve_ldff1dd_le_r },
> +
> + /* Big-endian */
> + { gen_helper_sve_ldff1bb_r,
> + gen_helper_sve_ldff1bhu_r,
> + gen_helper_sve_ldff1bsu_r,
> + gen_helper_sve_ldff1bdu_r,
> +
> + gen_helper_sve_ldff1sds_be_r,
> + gen_helper_sve_ldff1hh_be_r,
> + gen_helper_sve_ldff1hsu_be_r,
> + gen_helper_sve_ldff1hdu_be_r,
> +
> + gen_helper_sve_ldff1hds_be_r,
> + gen_helper_sve_ldff1hss_be_r,
> + gen_helper_sve_ldff1ss_be_r,
> + gen_helper_sve_ldff1sdu_be_r,
> +
> + gen_helper_sve_ldff1bds_r,
> + gen_helper_sve_ldff1bss_r,
> + gen_helper_sve_ldff1bhs_r,
> + gen_helper_sve_ldff1dd_be_r },
> };
>
> if (sve_access_check(s)) {
> TCGv_i64 addr = new_tmp_a64(s);
> tcg_gen_shli_i64(addr, cpu_reg(s, a->rm), dtype_msz(a->dtype));
> tcg_gen_add_i64(addr, addr, cpu_reg_sp(s, a->rn));
> - do_mem_zpa(s, a->rd, a->pg, addr, fns[a->dtype]);
> + do_mem_zpa(s, a->rd, a->pg, addr, fns[s->be_data == MO_BE][a->dtype]);
> }
> return true;
> }
>
> static bool trans_LDNF1_zpri(DisasContext *s, arg_rpri_load *a, uint32_t insn)
> {
> - static gen_helper_gvec_mem * const fns[16] = {
> - gen_helper_sve_ldnf1bb_r,
> - gen_helper_sve_ldnf1bhu_r,
> - gen_helper_sve_ldnf1bsu_r,
> - gen_helper_sve_ldnf1bdu_r,
> + static gen_helper_gvec_mem * const fns[2][16] = {
> + /* Little-endian */
> + { gen_helper_sve_ldnf1bb_r,
> + gen_helper_sve_ldnf1bhu_r,
> + gen_helper_sve_ldnf1bsu_r,
> + gen_helper_sve_ldnf1bdu_r,
>
> - gen_helper_sve_ldnf1sds_r,
> - gen_helper_sve_ldnf1hh_r,
> - gen_helper_sve_ldnf1hsu_r,
> - gen_helper_sve_ldnf1hdu_r,
> + gen_helper_sve_ldnf1sds_le_r,
> + gen_helper_sve_ldnf1hh_le_r,
> + gen_helper_sve_ldnf1hsu_le_r,
> + gen_helper_sve_ldnf1hdu_le_r,
>
> - gen_helper_sve_ldnf1hds_r,
> - gen_helper_sve_ldnf1hss_r,
> - gen_helper_sve_ldnf1ss_r,
> - gen_helper_sve_ldnf1sdu_r,
> + gen_helper_sve_ldnf1hds_le_r,
> + gen_helper_sve_ldnf1hss_le_r,
> + gen_helper_sve_ldnf1ss_le_r,
> + gen_helper_sve_ldnf1sdu_le_r,
>
> - gen_helper_sve_ldnf1bds_r,
> - gen_helper_sve_ldnf1bss_r,
> - gen_helper_sve_ldnf1bhs_r,
> - gen_helper_sve_ldnf1dd_r,
> + gen_helper_sve_ldnf1bds_r,
> + gen_helper_sve_ldnf1bss_r,
> + gen_helper_sve_ldnf1bhs_r,
> + gen_helper_sve_ldnf1dd_le_r },
> +
> + /* Big-endian */
> + { gen_helper_sve_ldnf1bb_r,
> + gen_helper_sve_ldnf1bhu_r,
> + gen_helper_sve_ldnf1bsu_r,
> + gen_helper_sve_ldnf1bdu_r,
> +
> + gen_helper_sve_ldnf1sds_be_r,
> + gen_helper_sve_ldnf1hh_be_r,
> + gen_helper_sve_ldnf1hsu_be_r,
> + gen_helper_sve_ldnf1hdu_be_r,
> +
> + gen_helper_sve_ldnf1hds_be_r,
> + gen_helper_sve_ldnf1hss_be_r,
> + gen_helper_sve_ldnf1ss_be_r,
> + gen_helper_sve_ldnf1sdu_be_r,
> +
> + gen_helper_sve_ldnf1bds_r,
> + gen_helper_sve_ldnf1bss_r,
> + gen_helper_sve_ldnf1bhs_r,
> + gen_helper_sve_ldnf1dd_be_r },
> };
>
> if (sve_access_check(s)) {
> @@ -4751,16 +4821,18 @@ static bool trans_LDNF1_zpri(DisasContext *s, arg_rpri_load *a, uint32_t insn)
> TCGv_i64 addr = new_tmp_a64(s);
>
> tcg_gen_addi_i64(addr, cpu_reg_sp(s, a->rn), off);
> - do_mem_zpa(s, a->rd, a->pg, addr, fns[a->dtype]);
> + do_mem_zpa(s, a->rd, a->pg, addr, fns[s->be_data == MO_BE][a->dtype]);
> }
> return true;
> }
>
> static void do_ldrq(DisasContext *s, int zt, int pg, TCGv_i64 addr, int msz)
> {
> - static gen_helper_gvec_mem * const fns[4] = {
> - gen_helper_sve_ld1bb_r, gen_helper_sve_ld1hh_r,
> - gen_helper_sve_ld1ss_r, gen_helper_sve_ld1dd_r,
> + static gen_helper_gvec_mem * const fns[2][4] = {
> + { gen_helper_sve_ld1bb_r, gen_helper_sve_ld1hh_le_r,
> + gen_helper_sve_ld1ss_le_r, gen_helper_sve_ld1dd_le_r },
> + { gen_helper_sve_ld1bb_r, gen_helper_sve_ld1hh_be_r,
> + gen_helper_sve_ld1ss_be_r, gen_helper_sve_ld1dd_be_r },
> };
> unsigned vsz = vec_full_reg_size(s);
> TCGv_ptr t_pg;
> @@ -4792,7 +4864,7 @@ static void do_ldrq(DisasContext *s, int zt, int pg, TCGv_i64 addr, int msz)
> t_pg = tcg_temp_new_ptr();
> tcg_gen_addi_ptr(t_pg, cpu_env, poff);
>
> - fns[msz](cpu_env, t_pg, addr, desc);
> + fns[s->be_data == MO_BE][msz](cpu_env, t_pg, addr, desc);
>
> tcg_temp_free_ptr(t_pg);
> tcg_temp_free_i32(desc);
>
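For readers following the load side of this change, the pattern can be sketched outside QEMU. This is a toy model under stated assumptions, not the QEMU implementation: `load16_le`, `load16_be`, `DEF_LOAD_PAIR` and `select_ld1hh` are hypothetical names standing in for the per-endianness host accessors, the `DO_LD1_2` macro and the `fns[s->be_data == MO_BE]` lookup in the patch.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t load_fn(const uint8_t *p);

/* Hypothetical stand-ins for the per-endianness element loaders. */
static uint16_t load16_le(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint16_t load16_be(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* One expansion emits both variants, so neither helper needs a
 * run-time endianness test in its body. */
#define DEF_LOAD_PAIR(NAME)                                  \
static uint16_t helper_##NAME##_le(const uint8_t *p)         \
{                                                            \
    return load16_le(p);                                     \
}                                                            \
static uint16_t helper_##NAME##_be(const uint8_t *p)         \
{                                                            \
    return load16_be(p);                                     \
}

DEF_LOAD_PAIR(ld1hh)

/* Selection happens once, when the instruction is translated.
 * Index 0 is little-endian, index 1 is big-endian. */
load_fn *select_ld1hh(int big_endian)
{
    static load_fn * const fns[2] = { helper_ld1hh_le, helper_ld1hh_be };

    assert(big_endian == 0 || big_endian == 1);
    return fns[big_endian];
}
```

The point of the split is that the branch on endianness moves out of the helper, which runs on every execution, and into the selection step, which runs once per translated instruction.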
* Re: [Qemu-devel] [PATCH 16/20] target/arm: Split contiguous stores for endianness
2018-08-09 4:22 ` [Qemu-devel] [PATCH 16/20] target/arm: Split contiguous stores " Richard Henderson
@ 2018-08-11 5:41 ` Philippe Mathieu-Daudé
0 siblings, 0 replies; 51+ messages in thread
From: Philippe Mathieu-Daudé @ 2018-08-11 5:41 UTC (permalink / raw)
To: Richard Henderson, qemu-devel
Cc: laurent.desnogues, peter.maydell, alex.bennee
On 08/09/2018 01:22 AM, Richard Henderson wrote:
> We can choose the endianness at translation time, rather than
> re-computing it at execution time.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
> ---
> target/arm/helper-sve.h | 48 +++++++++++++++++--------
> target/arm/sve_helper.c | 11 ++++--
> target/arm/translate-sve.c | 72 +++++++++++++++++++++++++++++---------
> 3 files changed, 96 insertions(+), 35 deletions(-)
>
> diff --git a/target/arm/helper-sve.h b/target/arm/helper-sve.h
> index 526caec8da..1ad043101a 100644
> --- a/target/arm/helper-sve.h
> +++ b/target/arm/helper-sve.h
> @@ -1248,29 +1248,47 @@ DEF_HELPER_FLAGS_4(sve_st2bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> DEF_HELPER_FLAGS_4(sve_st3bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> DEF_HELPER_FLAGS_4(sve_st4bb_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
>
> -DEF_HELPER_FLAGS_4(sve_st1hh_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> -DEF_HELPER_FLAGS_4(sve_st2hh_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> -DEF_HELPER_FLAGS_4(sve_st3hh_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> -DEF_HELPER_FLAGS_4(sve_st4hh_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_st1hh_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_st2hh_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_st3hh_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_st4hh_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
>
> -DEF_HELPER_FLAGS_4(sve_st1ss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> -DEF_HELPER_FLAGS_4(sve_st2ss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> -DEF_HELPER_FLAGS_4(sve_st3ss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> -DEF_HELPER_FLAGS_4(sve_st4ss_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_st1hh_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_st2hh_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_st3hh_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_st4hh_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
>
> -DEF_HELPER_FLAGS_4(sve_st1dd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> -DEF_HELPER_FLAGS_4(sve_st2dd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> -DEF_HELPER_FLAGS_4(sve_st3dd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> -DEF_HELPER_FLAGS_4(sve_st4dd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_st1ss_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_st2ss_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_st3ss_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_st4ss_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +
> +DEF_HELPER_FLAGS_4(sve_st1ss_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_st2ss_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_st3ss_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_st4ss_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +
> +DEF_HELPER_FLAGS_4(sve_st1dd_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_st2dd_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_st3dd_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_st4dd_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +
> +DEF_HELPER_FLAGS_4(sve_st1dd_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_st2dd_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_st3dd_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_st4dd_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
>
> DEF_HELPER_FLAGS_4(sve_st1bh_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> DEF_HELPER_FLAGS_4(sve_st1bs_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> DEF_HELPER_FLAGS_4(sve_st1bd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
>
> -DEF_HELPER_FLAGS_4(sve_st1hs_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> -DEF_HELPER_FLAGS_4(sve_st1hd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_st1hs_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_st1hd_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_st1hs_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_st1hd_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
>
> -DEF_HELPER_FLAGS_4(sve_st1sd_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_st1sd_le_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
> +DEF_HELPER_FLAGS_4(sve_st1sd_be_r, TCG_CALL_NO_WG, void, env, ptr, tl, i32)
>
> DEF_HELPER_FLAGS_6(sve_ldbsu_zsu, TCG_CALL_NO_WG,
> void, env, ptr, ptr, ptr, tl, i32)
> diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
> index 56e2f523c5..92c0e961a9 100644
> --- a/target/arm/sve_helper.c
> +++ b/target/arm/sve_helper.c
> @@ -4940,12 +4940,17 @@ void __attribute__((flatten)) HELPER(sve_st##N##NAME##_r) \
> }
>
> #define DO_STN_2(N, NAME, ESIZE, MSIZE) \
> -void __attribute__((flatten)) HELPER(sve_st##N##NAME##_r) \
> +void __attribute__((flatten)) HELPER(sve_st##N##NAME##_le_r) \
> (CPUARMState *env, void *vg, target_ulong addr, uint32_t desc) \
> { \
> sve_st##N##_r(env, vg, addr, desc, GETPC(), ESIZE, MSIZE, \
> - arm_cpu_data_is_big_endian(env) \
> - ? sve_st1##NAME##_be_tlb : sve_st1##NAME##_le_tlb); \
> + sve_st1##NAME##_le_tlb); \
> +} \
> +void __attribute__((flatten)) HELPER(sve_st##N##NAME##_be_r) \
> + (CPUARMState *env, void *vg, target_ulong addr, uint32_t desc) \
> +{ \
> + sve_st##N##_r(env, vg, addr, desc, GETPC(), ESIZE, MSIZE, \
> + sve_st1##NAME##_be_tlb); \
> }
>
> DO_STN_1(1, bb, 1)
> diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
> index de12c01e7d..acb85731f8 100644
> --- a/target/arm/translate-sve.c
> +++ b/target/arm/translate-sve.c
> @@ -4953,32 +4953,70 @@ static bool trans_LD1R_zpri(DisasContext *s, arg_rpri_load *a, uint32_t insn)
> static void do_st_zpa(DisasContext *s, int zt, int pg, TCGv_i64 addr,
> int msz, int esz, int nreg)
> {
> - static gen_helper_gvec_mem * const fn_single[4][4] = {
> - { gen_helper_sve_st1bb_r, gen_helper_sve_st1bh_r,
> - gen_helper_sve_st1bs_r, gen_helper_sve_st1bd_r },
> - { NULL, gen_helper_sve_st1hh_r,
> - gen_helper_sve_st1hs_r, gen_helper_sve_st1hd_r },
> - { NULL, NULL,
> - gen_helper_sve_st1ss_r, gen_helper_sve_st1sd_r },
> - { NULL, NULL, NULL, gen_helper_sve_st1dd_r },
> + static gen_helper_gvec_mem * const fn_single[2][4][4] = {
> + { { gen_helper_sve_st1bb_r,
> + gen_helper_sve_st1bh_r,
> + gen_helper_sve_st1bs_r,
> + gen_helper_sve_st1bd_r },
> + { NULL,
> + gen_helper_sve_st1hh_le_r,
> + gen_helper_sve_st1hs_le_r,
> + gen_helper_sve_st1hd_le_r },
> + { NULL, NULL,
> + gen_helper_sve_st1ss_le_r,
> + gen_helper_sve_st1sd_le_r },
> + { NULL, NULL, NULL,
> + gen_helper_sve_st1dd_le_r } },
> + { { gen_helper_sve_st1bb_r,
> + gen_helper_sve_st1bh_r,
> + gen_helper_sve_st1bs_r,
> + gen_helper_sve_st1bd_r },
> + { NULL,
> + gen_helper_sve_st1hh_be_r,
> + gen_helper_sve_st1hs_be_r,
> + gen_helper_sve_st1hd_be_r },
> + { NULL, NULL,
> + gen_helper_sve_st1ss_be_r,
> + gen_helper_sve_st1sd_be_r },
> + { NULL, NULL, NULL,
> + gen_helper_sve_st1dd_be_r } },
> };
> - static gen_helper_gvec_mem * const fn_multiple[3][4] = {
> - { gen_helper_sve_st2bb_r, gen_helper_sve_st2hh_r,
> - gen_helper_sve_st2ss_r, gen_helper_sve_st2dd_r },
> - { gen_helper_sve_st3bb_r, gen_helper_sve_st3hh_r,
> - gen_helper_sve_st3ss_r, gen_helper_sve_st3dd_r },
> - { gen_helper_sve_st4bb_r, gen_helper_sve_st4hh_r,
> - gen_helper_sve_st4ss_r, gen_helper_sve_st4dd_r },
> + static gen_helper_gvec_mem * const fn_multiple[2][3][4] = {
> + { { gen_helper_sve_st2bb_r,
> + gen_helper_sve_st2hh_le_r,
> + gen_helper_sve_st2ss_le_r,
> + gen_helper_sve_st2dd_le_r },
> + { gen_helper_sve_st3bb_r,
> + gen_helper_sve_st3hh_le_r,
> + gen_helper_sve_st3ss_le_r,
> + gen_helper_sve_st3dd_le_r },
> + { gen_helper_sve_st4bb_r,
> + gen_helper_sve_st4hh_le_r,
> + gen_helper_sve_st4ss_le_r,
> + gen_helper_sve_st4dd_le_r } },
> + { { gen_helper_sve_st2bb_r,
> + gen_helper_sve_st2hh_be_r,
> + gen_helper_sve_st2ss_be_r,
> + gen_helper_sve_st2dd_be_r },
> + { gen_helper_sve_st3bb_r,
> + gen_helper_sve_st3hh_be_r,
> + gen_helper_sve_st3ss_be_r,
> + gen_helper_sve_st3dd_be_r },
> + { gen_helper_sve_st4bb_r,
> + gen_helper_sve_st4hh_be_r,
> + gen_helper_sve_st4ss_be_r,
> + gen_helper_sve_st4dd_be_r } },
> };
> gen_helper_gvec_mem *fn;
> + int be = s->be_data == MO_BE;
>
> if (nreg == 0) {
> /* ST1 */
> - fn = fn_single[msz][esz];
> + fn = fn_single[be][msz][esz];
> } else {
> /* ST2, ST3, ST4 -- msz == esz, enforced by encoding */
> assert(msz == esz);
> - fn = fn_multiple[nreg - 1][msz];
> + fn = fn_multiple[be][nreg - 1][msz];
> }
> assert(fn != NULL);
> do_mem_zpa(s, zt, pg, addr, fn);
>
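The table layout used by do_st_zpa can be illustrated with a reduced sketch. This is a toy model under stated assumptions, not the QEMU code: it keeps only byte and halfword sizes, and `pick_store` plus the enum-returning stand-in helpers are hypothetical. It shows the two properties the patch relies on: byte-sized stores share one helper across both endianness rows, and combinations where the memory size exceeds the element size stay NULL because the encoding cannot reach them.

```c
#include <assert.h>
#include <stddef.h>

enum store_id { ST1BB, ST1BH, ST1HH_LE, ST1HH_BE };

typedef enum store_id store_fn(void);

/* Hypothetical stand-ins for the generated store helpers. */
static enum store_id st1bb(void)    { return ST1BB; }
static enum store_id st1bh(void)    { return ST1BH; }
static enum store_id st1hh_le(void) { return ST1HH_LE; }
static enum store_id st1hh_be(void) { return ST1HH_BE; }

/* Indexed as [big_endian][msz][esz], with 0 = byte and 1 = halfword. */
store_fn *pick_store(int big_endian, int msz, int esz)
{
    static store_fn * const fn_single[2][2][2] = {
        /* Little-endian */
        { { st1bb, st1bh },
          { NULL,  st1hh_le } },
        /* Big-endian: byte stores have no byte order, so they repeat */
        { { st1bb, st1bh },
          { NULL,  st1hh_be } },
    };
    store_fn *fn;

    assert(msz >= 0 && msz < 2 && esz >= 0 && esz < 2);
    fn = fn_single[big_endian != 0][msz][esz];

    /* Holes in the table must not be reachable by a valid encoding. */
    assert(fn != NULL);
    return fn;
}
```

Duplicating the byte rows costs a few pointers of read-only data, but it lets every lookup use the same three-index form without special-casing msz == 0.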
* Re: [Qemu-devel] [PATCH 03/20] target/arm: Define ID_AA64ZFR0_EL1
2018-08-09 4:21 ` [Qemu-devel] [PATCH 03/20] target/arm: Define ID_AA64ZFR0_EL1 Richard Henderson
@ 2018-08-17 15:50 ` Peter Maydell
0 siblings, 0 replies; 51+ messages in thread
From: Peter Maydell @ 2018-08-17 15:50 UTC (permalink / raw)
To: Richard Henderson; +Cc: QEMU Developers, Laurent Desnogues, Alex Bennée
On 9 August 2018 at 05:21, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Given that the only field defined for this new register may only
> be 0, we don't actually need to change anything except the name.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
thanks
-- PMM
* Re: [Qemu-devel] [PATCH 04/20] target/arm: Adjust sve_exception_el
2018-08-09 4:21 ` [Qemu-devel] [PATCH 04/20] target/arm: Adjust sve_exception_el Richard Henderson
@ 2018-08-17 15:57 ` Peter Maydell
0 siblings, 0 replies; 51+ messages in thread
From: Peter Maydell @ 2018-08-17 15:57 UTC (permalink / raw)
To: Richard Henderson; +Cc: QEMU Developers, Laurent Desnogues, Alex Bennée
On 9 August 2018 at 05:21, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Check for EL3 before testing CPTR_EL3.EZ. Return 0 when the exception
> should be routed via AdvSIMDFPAccessTrap. Mirror the structure of
> CheckSVEEnabled more closely.
>
> Fixes: 5be5e8eda78
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> target/arm/helper.c | 96 ++++++++++++++++++++++-----------------------
> 1 file changed, 46 insertions(+), 50 deletions(-)
>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
thanks
-- PMM
* Re: [Qemu-devel] [PATCH 05/20] target/arm: Fix arm_cpu_data_is_big_endian for aa64 user-only
2018-08-09 4:21 ` [Qemu-devel] [PATCH 05/20] target/arm: Fix arm_cpu_data_is_big_endian for aa64 user-only Richard Henderson
@ 2018-08-17 16:02 ` Peter Maydell
2018-08-17 16:47 ` Richard Henderson
0 siblings, 1 reply; 51+ messages in thread
From: Peter Maydell @ 2018-08-17 16:02 UTC (permalink / raw)
To: Richard Henderson; +Cc: QEMU Developers, Laurent Desnogues, Alex Bennée
On 9 August 2018 at 05:21, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Unlike aa32, endianness cannot be adjusted by userland in aa64.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> target/arm/cpu.h | 27 +++++++++++++++++----------
> 1 file changed, 17 insertions(+), 10 deletions(-)
>
> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> index 9526ed27cb..2d6d7d03aa 100644
> --- a/target/arm/cpu.h
> +++ b/target/arm/cpu.h
> @@ -2709,8 +2709,6 @@ static inline bool arm_sctlr_b(CPUARMState *env)
> /* Return true if the processor is in big-endian mode. */
> static inline bool arm_cpu_data_is_big_endian(CPUARMState *env)
> {
> - int cur_el;
> -
> /* In 32bit endianness is determined by looking at CPSR's E bit */
> if (!is_a64(env)) {
> return
> @@ -2729,15 +2727,24 @@ static inline bool arm_cpu_data_is_big_endian(CPUARMState *env)
> arm_sctlr_b(env) ||
> #endif
> ((env->uncached_cpsr & CPSR_E) ? 1 : 0);
> + } else {
> +#ifdef CONFIG_USER_ONLY
> + /* AArch64 does not have a SETEND instruction; endianness
> + * for usermode is fixed at compile-time.
> + */
> +# ifdef TARGET_WORDS_BIGENDIAN
> + return true;
> +# else
> + return false;
> +# endif
> +#else
> + int cur_el = arm_current_el(env);
> + if (cur_el == 0) {
> + return (env->cp15.sctlr_el[1] & SCTLR_E0E) != 0;
> + }
> + return (env->cp15.sctlr_el[cur_el] & SCTLR_EE) != 0;
> +#endif
> }
> -
> - cur_el = arm_current_el(env);
> -
> - if (cur_el == 0) {
> - return (env->cp15.sctlr_el[1] & SCTLR_E0E) != 0;
> - }
> -
> - return (env->cp15.sctlr_el[cur_el] & SCTLR_EE) != 0;
> }
>
When does this make a difference? For user-mode, we've already
dealt with the "aa32" case, so the code here is aa64-only.
In linux-user/aarch64/cpu_loop.c we set sctlr_el[1]'s E0E bit
if TARGET_WORDS_BIGENDIAN is defined, and cur_el is definitely
zero, so we should already be returning true from this function
if TARGET_WORDS_BIGENDIAN and false otherwise.
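To make the argument concrete, here is a toy model of the claim (an illustration under stated assumptions, not QEMU code). `struct toy_env`, `toy_user_init` and `toy_data_is_big_endian` are hypothetical; the bit positions follow the architectural SCTLR layout (E0E is bit 24, EE is bit 25). If user-mode setup sets E0E exactly when the target is big-endian and the current EL is always 0, the run-time check returns the same answer as the compile-time constant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SCTLR_E0E (1u << 24)  /* data endianness at EL0 */
#define SCTLR_EE  (1u << 25)  /* data endianness at the register's own EL */

/* Hypothetical, heavily reduced CPU state. */
struct toy_env {
    int cur_el;
    uint64_t sctlr_el[4];
};

/* Models the user-mode setup described above: EL0 only, with E0E
 * mirroring the build-time target endianness. */
void toy_user_init(struct toy_env *env, bool target_big_endian)
{
    for (int i = 0; i < 4; i++) {
        env->sctlr_el[i] = 0;
    }
    env->cur_el = 0;
    env->sctlr_el[1] = target_big_endian ? SCTLR_E0E : 0;
}

/* Models the AArch64 branch of the run-time endianness check. */
bool toy_data_is_big_endian(const struct toy_env *env)
{
    assert(env->cur_el >= 0 && env->cur_el <= 3);
    if (env->cur_el == 0) {
        return (env->sctlr_el[1] & SCTLR_E0E) != 0;
    }
    return (env->sctlr_el[env->cur_el] & SCTLR_EE) != 0;
}
```

Under those assumptions the two versions agree on the result, so any remaining difference would be in generated code size or speed rather than in behaviour.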
thanks
-- PMM
* Re: [Qemu-devel] [PATCH 06/20] target/arm: Fix arm_current_el for user-only
2018-08-09 4:21 ` [Qemu-devel] [PATCH 06/20] target/arm: Fix arm_current_el for user-only Richard Henderson
@ 2018-08-17 16:03 ` Peter Maydell
2018-08-17 16:51 ` Richard Henderson
0 siblings, 1 reply; 51+ messages in thread
From: Peter Maydell @ 2018-08-17 16:03 UTC (permalink / raw)
To: Richard Henderson; +Cc: QEMU Developers, Laurent Desnogues, Alex Bennée
On 9 August 2018 at 05:21, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Saves about 12k code size in qemu-aarch64.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> target/arm/cpu.h | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> index 2d6d7d03aa..aedaf2631e 100644
> --- a/target/arm/cpu.h
> +++ b/target/arm/cpu.h
> @@ -1958,6 +1958,9 @@ static inline bool arm_v7m_is_handler_mode(CPUARMState *env)
> */
> static inline int arm_current_el(CPUARMState *env)
> {
> +#ifdef CONFIG_USER_ONLY
> + return 0;
> +#else
> if (arm_feature(env, ARM_FEATURE_M)) {
> return arm_v7m_is_handler_mode(env) ||
> !(env->v7m.control[env->v7m.secure] & 1);
> @@ -1984,6 +1987,7 @@ static inline int arm_current_el(CPUARMState *env)
>
> return 1;
> }
> +#endif
Again, the #ifdeffery here should be unnecessary? env->pstate,
env->uncached_cpsr, etc. should already be set so that we return the
right thing.
thanks
-- PMM
* Re: [Qemu-devel] [PATCH 07/20] target/arm: Fix is_a64 for user-only
2018-08-09 4:21 ` [Qemu-devel] [PATCH 07/20] target/arm: Fix is_a64 " Richard Henderson
@ 2018-08-17 16:03 ` Peter Maydell
2018-08-17 16:10 ` Laurent Desnogues
0 siblings, 1 reply; 51+ messages in thread
From: Peter Maydell @ 2018-08-17 16:03 UTC (permalink / raw)
To: Richard Henderson; +Cc: QEMU Developers, Laurent Desnogues, Alex Bennée
On 9 August 2018 at 05:21, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Saves about 8k code size in qemu-aarch64.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> target/arm/cpu.h | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> index aedaf2631e..ed51a2f5aa 100644
> --- a/target/arm/cpu.h
> +++ b/target/arm/cpu.h
> @@ -918,7 +918,15 @@ void aarch64_sync_64_to_32(CPUARMState *env);
>
> static inline bool is_a64(CPUARMState *env)
> {
> +#ifdef CONFIG_USER_ONLY
> +# ifdef TARGET_AARCH64
> + return true;
> +# else
> + return false;
> +# endif
> +#else
> return env->aarch64;
> +#endif
> }
And again. I don't want to pepper the code with ifdefs if
we can do the right thing without them.
thanks
-- PMM
* Re: [Qemu-devel] [PATCH 07/20] target/arm: Fix is_a64 for user-only
2018-08-17 16:03 ` Peter Maydell
@ 2018-08-17 16:10 ` Laurent Desnogues
2018-08-17 16:23 ` Peter Maydell
0 siblings, 1 reply; 51+ messages in thread
From: Laurent Desnogues @ 2018-08-17 16:10 UTC (permalink / raw)
To: Peter Maydell; +Cc: Richard Henderson, qemu-devel@nongnu.org, Alex Bennée
Hello,
On Fri, Aug 17, 2018 at 6:04 PM Peter Maydell <peter.maydell@linaro.org> wrote:
>
> On 9 August 2018 at 05:21, Richard Henderson
> <richard.henderson@linaro.org> wrote:
> > Saves about 8k code size in qemu-aarch64.
> >
> > Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> > ---
> > target/arm/cpu.h | 8 ++++++++
> > 1 file changed, 8 insertions(+)
> >
> > diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> > index aedaf2631e..ed51a2f5aa 100644
> > --- a/target/arm/cpu.h
> > +++ b/target/arm/cpu.h
> > @@ -918,7 +918,15 @@ void aarch64_sync_64_to_32(CPUARMState *env);
> >
> > static inline bool is_a64(CPUARMState *env)
> > {
> > +#ifdef CONFIG_USER_ONLY
> > +# ifdef TARGET_AARCH64
> > + return true;
> > +# else
> > + return false;
> > +# endif
> > +#else
> > return env->aarch64;
> > +#endif
> > }
>
> And again. I don't want to pepper the code with ifdefs if
> we can do the right thing without them.
FWIW I find it more readable with the ifdefs (here and in the
previous patches), and I guess that helps the compiler too.
Thanks,
Laurent
* Re: [Qemu-devel] [PATCH 09/20] target/arm: Handle SVE vector length changes in system mode
2018-08-09 4:21 ` [Qemu-devel] [PATCH 09/20] target/arm: Handle SVE vector length changes in system mode Richard Henderson
@ 2018-08-17 16:22 ` Peter Maydell
2018-08-25 19:41 ` Richard Henderson
0 siblings, 1 reply; 51+ messages in thread
From: Peter Maydell @ 2018-08-17 16:22 UTC (permalink / raw)
To: Richard Henderson; +Cc: QEMU Developers, Laurent Desnogues, Alex Bennée
On 9 August 2018 at 05:21, Richard Henderson
<richard.henderson@linaro.org> wrote:
> SVE vector length can change when changing EL, or when writing
> to one of the ZCR_ELn registers.
>
> For correctness, our implementation requires that predicate bits
> that are inaccessible are never set. Which means noticing length
> changes and zeroing the appropriate register bits.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> +/*
> + * Notice a change in SVE vector size when changing EL.
> + */
> +void aarch64_sve_change_el(CPUARMState *env, int old_el, int new_el)
> +{
> +    int old_len, new_len;
> +
> +    /* Nothing to do if no SVE. */
> +    if (!arm_feature(env, ARM_FEATURE_SVE)) {
> +        return;
> +    }
> +
> +    /* Nothing to do if FP is disabled in either EL. */
> +    if (fp_exception_el(env, old_el) || fp_exception_el(env, new_el)) {
> +        return;
> +    }
> +
> +    /*
> +     * When FP is enabled, but SVE is disabled, the effective len is 0.
> +     * ??? How should sve_exception_el interact with AArch32 state?
> +     * That isn't included in the CheckSVEEnabled pseudocode, so is the
> +     * host kernel required to explicitly disable SVE for an EL using aa32?
> +     */
I'm not clear what you're asking here. If the EL is AArch32
then why does it make a difference if SVE is enabled or disabled?
You can't get at it...
> +    old_len = (sve_exception_el(env, old_el)
> +               ? 0 : sve_zcr_len_for_el(env, old_el));
> +    new_len = (sve_exception_el(env, new_el)
> +               ? 0 : sve_zcr_len_for_el(env, new_el));
> +
> +    /* When changing vector length, clear inaccessible state. */
> +    if (new_len < old_len) {
> +        aarch64_sve_narrow_vq(env, new_len + 1);
> +    }
> +}
> +#endif
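For readers following the length-change logic, here is a minimal sketch of
what "clear inaccessible state" could look like for a single predicate
register. This is not the patch's aarch64_sve_narrow_vq; the sketch_* names
and the flat uint64_t layout are assumptions. It relies on SVE having one
predicate bit per vector byte, so each 128-bit quadword (vq) contributes 16
predicate bits, with an architectural maximum of 16 quadwords. The quoted
code passes new_len + 1, which fits a length field that stores vq - 1.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed maximum: 16 quadwords -> 256 predicate bits -> 4 words. */
#define SKETCH_MAX_VQ     16
#define SKETCH_PRED_WORDS (SKETCH_MAX_VQ * 16 / 64)

/* Zero every predicate bit that is not reachable with 'vq' quadwords. */
static void sketch_narrow_pred(uint64_t p[SKETCH_PRED_WORDS], unsigned vq)
{
    unsigned live_bits = 16 * vq;   /* one predicate bit per vector byte */

    for (unsigned w = 0; w < SKETCH_PRED_WORDS; w++) {
        unsigned lo = 64 * w;       /* index of the first bit in word w */

        if (live_bits >= lo + 64) {
            continue;               /* whole word is still accessible */
        } else if (live_bits <= lo) {
            p[w] = 0;               /* whole word is now inaccessible */
        } else {
            /* Partial word: shift count is in 1..63, so this is defined. */
            p[w] &= (UINT64_C(1) << (live_bits - lo)) - 1;
        }
    }
}
```

Narrowing is only needed when the length shrinks, which matches the
new_len < old_len test in the patch: growing the vector exposes bits that
earlier narrowing has already left at zero.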
thanks
-- PMM
* Re: [Qemu-devel] [PATCH 07/20] target/arm: Fix is_a64 for user-only
2018-08-17 16:10 ` Laurent Desnogues
@ 2018-08-17 16:23 ` Peter Maydell
0 siblings, 0 replies; 51+ messages in thread
From: Peter Maydell @ 2018-08-17 16:23 UTC (permalink / raw)
To: Laurent Desnogues
Cc: Richard Henderson, qemu-devel@nongnu.org, Alex Bennée
On 17 August 2018 at 17:10, Laurent Desnogues
<laurent.desnogues@gmail.com> wrote:
> Hello,
>
> On Fri, Aug 17, 2018 at 6:04 PM Peter Maydell <peter.maydell@linaro.org> wrote:
>> And again. I don't want to pepper the code with ifdefs if
>> we can do the right thing without them.
>
> FWIW I find it more readable with the ifdef's (here and in the
> previous patches) and I guess that helps the compiler too.
Hmm. I prefer to think of user-mode as a funny variant on
system emulation where we make the minimal changes required,
and mostly work just by having the CPU being in the state it
would be for system-emulation EL0 and not being able to get out
of it.
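A small sketch of the two styles being discussed, under stated assumptions:
the struct, the sketch_* names and the SKETCH_TARGET_AARCH64 macro are
hypothetical and only stand in for env->aarch64 and the real build-time
configuration. The state-driven accessor reads a flag that reset code sets
once; the compile-time accessor ignores the state entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical build-time switch; defaults to an AArch64 target. */
#ifndef SKETCH_TARGET_AARCH64
#define SKETCH_TARGET_AARCH64 1
#endif

struct sketch_state {
    bool aarch64;
};

/* State-driven style: user-mode reset puts the CPU in the EL0 state once. */
static void sketch_user_reset(struct sketch_state *s, bool target_is_aarch64)
{
    s->aarch64 = target_is_aarch64;
}

/* The accessor just reads the state, exactly as system emulation would. */
static inline bool sketch_is_a64_runtime(const struct sketch_state *s)
{
    return s->aarch64;
}

/* Compile-time style: the answer is a constant the compiler can fold. */
static inline bool sketch_is_a64_static(const struct sketch_state *s)
{
    (void)s;
    return SKETCH_TARGET_AARCH64 != 0;
}
```

Both accessors return the same value as long as nothing can change the flag
after reset, so the choice is between fewer special cases in the source and
less generated code.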
thanks
-- PMM
* Re: [Qemu-devel] [PATCH 10/20] target/arm: Adjust aarch64_cpu_dump_state for system mode SVE
2018-08-09 4:21 ` [Qemu-devel] [PATCH 10/20] target/arm: Adjust aarch64_cpu_dump_state for system mode SVE Richard Henderson
@ 2018-08-17 16:35 ` Peter Maydell
0 siblings, 0 replies; 51+ messages in thread
From: Peter Maydell @ 2018-08-17 16:35 UTC (permalink / raw)
To: Richard Henderson; +Cc: QEMU Developers, Laurent Desnogues, Alex Bennée
On 9 August 2018 at 05:21, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Use the existing helpers to determine if (1) the fpu is enabled,
> (2) sve state is enabled, and (3) the current sve vector length.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> target/arm/cpu.h | 4 ++++
> target/arm/helper.c | 6 +++---
> target/arm/translate-a64.c | 8 ++++++--
> 3 files changed, 13 insertions(+), 5 deletions(-)
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
I can't think of a reason why we'd want to look at the FPU
state with the FPU disabled, so I guess this is ok...
thanks
-- PMM
* Re: [Qemu-devel] [PATCH 05/20] target/arm: Fix arm_cpu_data_is_big_endian for aa64 user-only
2018-08-17 16:02 ` Peter Maydell
@ 2018-08-17 16:47 ` Richard Henderson
0 siblings, 0 replies; 51+ messages in thread
From: Richard Henderson @ 2018-08-17 16:47 UTC (permalink / raw)
To: Peter Maydell; +Cc: QEMU Developers, Laurent Desnogues, Alex Bennée
On 08/17/2018 09:02 AM, Peter Maydell wrote:
> On 9 August 2018 at 05:21, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>> Unlike aa32, endianness cannot be adjusted by userland in aa64.
>>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>> target/arm/cpu.h | 27 +++++++++++++++++----------
>> 1 file changed, 17 insertions(+), 10 deletions(-)
>>
>> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
>> index 9526ed27cb..2d6d7d03aa 100644
>> --- a/target/arm/cpu.h
>> +++ b/target/arm/cpu.h
>> @@ -2709,8 +2709,6 @@ static inline bool arm_sctlr_b(CPUARMState *env)
>> /* Return true if the processor is in big-endian mode. */
>> static inline bool arm_cpu_data_is_big_endian(CPUARMState *env)
>> {
>> - int cur_el;
>> -
>> /* In 32bit endianness is determined by looking at CPSR's E bit */
>> if (!is_a64(env)) {
>> return
>> @@ -2729,15 +2727,24 @@ static inline bool arm_cpu_data_is_big_endian(CPUARMState *env)
>> arm_sctlr_b(env) ||
>> #endif
>> ((env->uncached_cpsr & CPSR_E) ? 1 : 0);
>> + } else {
>> +#ifdef CONFIG_USER_ONLY
>> + /* AArch64 does not have a SETEND instruction; endianness
>> + * for usermode is fixed at compile-time.
>> + */
>> +# ifdef TARGET_WORDS_BIGENDIAN
>> + return true;
>> +# else
>> + return false;
>> +# endif
>> +#else
>> + int cur_el = arm_current_el(env);
>> + if (cur_el == 0) {
>> + return (env->cp15.sctlr_el[1] & SCTLR_E0E) != 0;
>> + }
>> + return (env->cp15.sctlr_el[cur_el] & SCTLR_EE) != 0;
>> +#endif
>> }
>> -
>> - cur_el = arm_current_el(env);
>> -
>> - if (cur_el == 0) {
>> - return (env->cp15.sctlr_el[1] & SCTLR_E0E) != 0;
>> - }
>> -
>> - return (env->cp15.sctlr_el[cur_el] & SCTLR_EE) != 0;
>> }
>>
>
> When does this make a difference? For user-mode, we've already
> dealt with the "aa32" case, so the code here is aa64-only.
> In linux-user/aarch64/cpu_loop.c we set sctlr_el[1]'s E0E bit
> if TARGET_WORDS_BIGENDIAN is defined, and cur_el is definitely
> zero, so we should already be returning true from this function
> if TARGET_WORDS_BIGENDIAN and false otherwise.
I should have reordered this patch after the following
simplifications to see if it still matters. But I was
after a code-size reduction.
r~
* Re: [Qemu-devel] [PATCH 06/20] target/arm: Fix arm_current_el for user-only
2018-08-17 16:03 ` Peter Maydell
@ 2018-08-17 16:51 ` Richard Henderson
0 siblings, 0 replies; 51+ messages in thread
From: Richard Henderson @ 2018-08-17 16:51 UTC (permalink / raw)
To: Peter Maydell; +Cc: QEMU Developers, Laurent Desnogues, Alex Bennée
On 08/17/2018 09:03 AM, Peter Maydell wrote:
> On 9 August 2018 at 05:21, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>> Saves about 12k code size in qemu-aarch64.
>>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>> target/arm/cpu.h | 4 ++++
>> 1 file changed, 4 insertions(+)
>>
>> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
>> index 2d6d7d03aa..aedaf2631e 100644
>> --- a/target/arm/cpu.h
>> +++ b/target/arm/cpu.h
>> @@ -1958,6 +1958,9 @@ static inline bool arm_v7m_is_handler_mode(CPUARMState *env)
>> */
>> static inline int arm_current_el(CPUARMState *env)
>> {
>> +#ifdef CONFIG_USER_ONLY
>> + return 0;
>> +#else
>> if (arm_feature(env, ARM_FEATURE_M)) {
>> return arm_v7m_is_handler_mode(env) ||
>> !(env->v7m.control[env->v7m.secure] & 1);
>> @@ -1984,6 +1987,7 @@ static inline int arm_current_el(CPUARMState *env)
>>
>> return 1;
>> }
>> +#endif
>
> Again, the #ifdeffery here should be unnecessary ? env->pstate,
> env->uncached_cpsr, etc should be set so that we return the
> right thing.
We get the right result, but you should have a look at how large the expansion
of this function is.
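A rough sketch of the code-size point, as I read it: the function is inline,
so every call site carries the whole chain of tests unless the compiler can
prove the result. The code below is a simplified stand-in, not the real
arm_current_el. The struct, the sketch_* names and the SKETCH_USER_ONLY
macro are hypothetical, and the M-profile and secure-state cases are left
out. The mode values are the AArch32 CPSR mode encodings, and the exception
level is assumed to sit in bits [3:2] of pstate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* AArch32 mode encodings (CPSR.M); the struct itself is hypothetical. */
#define SKETCH_MODE_USR 0x10u
#define SKETCH_MODE_MON 0x16u
#define SKETCH_MODE_HYP 0x1au

struct sketch_cpu {
    bool aarch64;
    uint32_t pstate;
    uint32_t cpsr_mode;
};

static inline int sketch_current_el(const struct sketch_cpu *cpu)
{
#ifdef SKETCH_USER_ONLY
    /* User-only build: the answer is a constant, nothing is expanded. */
    (void)cpu;
    return 0;
#else
    if (cpu->aarch64) {
        return (cpu->pstate >> 2) & 3;
    }
    switch (cpu->cpsr_mode) {
    case SKETCH_MODE_USR:
        return 0;
    case SKETCH_MODE_HYP:
        return 2;
    case SKETCH_MODE_MON:
        return 3;
    default:
        return 1;
    }
#endif
}

/* A caller whose EL test folds to a constant when the EL is known. */
static inline bool sketch_is_privileged(const struct sketch_cpu *cpu)
{
    return sketch_current_el(cpu) != 0;
}
```

Built with -DSKETCH_USER_ONLY, sketch_is_privileged reduces to "return
false" and any branch depending on it disappears; without the macro each
caller inlines the loads and the switch.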
r~
* Re: [Qemu-devel] [PATCH 00/20] target/arm: sve system mode patches
2018-08-09 4:21 [Qemu-devel] [PATCH 00/20] target/arm: sve system mode patches Richard Henderson
` (20 preceding siblings ...)
2018-08-09 5:48 ` [Qemu-devel] [PATCH 00/20] target/arm: sve system mode patches Laurent Desnogues
@ 2018-08-18 9:15 ` no-reply
2018-08-18 10:01 ` no-reply
22 siblings, 0 replies; 51+ messages in thread
From: no-reply @ 2018-08-18 9:15 UTC (permalink / raw)
To: richard.henderson
Cc: famz, qemu-devel, laurent.desnogues, peter.maydell, alex.bennee
Hi,
This series seems to have some coding style problems. See output below for
more information:
Type: series
Message-id: 20180809042206.15726-1-richard.henderson@linaro.org
Subject: [Qemu-devel] [PATCH 00/20] target/arm: sve system mode patches
=== TEST SCRIPT BEGIN ===
#!/bin/bash
BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
    echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
    if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
        failed=1
        echo
    fi
    n=$((n+1))
done
exit $failed
=== TEST SCRIPT END ===
Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
35e9c8f21d target/arm: Pass TCGMemOpIdx to sve memory helpers
6b726db20c target/arm: Rewrite vector gather first-fault loads
42f47a681a target/arm: Rewrite vector gather stores
ebdf36bdd6 target/arm: Rewrite vector gather loads
bea4bda9bd target/arm: Split contiguous stores for endianness
f74a87d369 target/arm: Split contiguous loads for endianness
522240fd71 target/arm: Rewrite helper_sve_st[1234]*_r
291c9a4079 target/arm: Rewrite helper_sve_ld[234]*_r
761fe6b96c target/arm: Rewrite helper_sve_ld1*_r using pages
439f82f39c target/arm: Clear unused predicate bits for LD1RQ
9f664be291 target/arm: Adjust aarch64_cpu_dump_state for system mode SVE
72b8c608a0 target/arm: Handle SVE vector length changes in system mode
4d25343973 target/arm: Pass in current_el to fp and sve_exception_el
f63e45c476 target/arm: Fix is_a64 for user-only
77c7e3327f target/arm: Fix arm_current_el for user-only
065eea0432 target/arm: Fix arm_cpu_data_is_big_endian for aa64 user-only
4fb82ef6d0 target/arm: Adjust sve_exception_el
bc8fa3f868 target/arm: Define ID_AA64ZFR0_EL1
3233806b21 target/arm: Set ID_AA64PFR0 bits for SVE for -cpu max
2989413056 target/arm: Set ISAR bits for -cpu max
=== OUTPUT BEGIN ===
Checking PATCH 1/20: target/arm: Set ISAR bits for -cpu max...
Checking PATCH 2/20: target/arm: Set ID_AA64PFR0 bits for SVE for -cpu max...
Checking PATCH 3/20: target/arm: Define ID_AA64ZFR0_EL1...
Checking PATCH 4/20: target/arm: Adjust sve_exception_el...
ERROR: return is not a function, parentheses are not required
#57: FILE: target/arm/helper.c:4367:
+ return (arm_feature(env, ARM_FEATURE_EL2)
total: 1 errors, 0 warnings, 113 lines checked
Your patch has style problems, please review. If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
Checking PATCH 5/20: target/arm: Fix arm_cpu_data_is_big_endian for aa64 user-only...
Checking PATCH 6/20: target/arm: Fix arm_current_el for user-only...
Checking PATCH 7/20: target/arm: Fix is_a64 for user-only...
Checking PATCH 8/20: target/arm: Pass in current_el to fp and sve_exception_el...
Checking PATCH 9/20: target/arm: Handle SVE vector length changes in system mode...
Checking PATCH 10/20: target/arm: Adjust aarch64_cpu_dump_state for system mode SVE...
Checking PATCH 11/20: target/arm: Clear unused predicate bits for LD1RQ...
Checking PATCH 12/20: target/arm: Rewrite helper_sve_ld1*_r using pages...
Checking PATCH 13/20: target/arm: Rewrite helper_sve_ld[234]*_r...
Checking PATCH 14/20: target/arm: Rewrite helper_sve_st[1234]*_r...
ERROR: spaces required around that '*' (ctx:WxV)
#215: FILE: target/arm/sve_helper.c:4825:
+ sve_st1_tlb_fn *tlb_fn)
^
total: 1 errors, 0 warnings, 392 lines checked
Your patch has style problems, please review. If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
Checking PATCH 15/20: target/arm: Split contiguous loads for endianness...
Checking PATCH 16/20: target/arm: Split contiguous stores for endianness...
Checking PATCH 17/20: target/arm: Rewrite vector gather loads...
Checking PATCH 18/20: target/arm: Rewrite vector gather stores...
Checking PATCH 19/20: target/arm: Rewrite vector gather first-fault loads...
ERROR: spaces required around that '*' (ctx:WxV)
#292: FILE: target/arm/sve_helper.c:5216:
+ zreg_off_fn *off_fn, sve_ld1_tlb_fn *tlb_fn,
^
ERROR: spaces required around that '*' (ctx:WxV)
#292: FILE: target/arm/sve_helper.c:5216:
+ zreg_off_fn *off_fn, sve_ld1_tlb_fn *tlb_fn,
^
ERROR: spaces required around that '*' (ctx:WxV)
#293: FILE: target/arm/sve_helper.c:5217:
+ sve_ld1_nf_fn *nonfault_fn)
^
total: 3 errors, 0 warnings, 573 lines checked
Your patch has style problems, please review. If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
Checking PATCH 20/20: target/arm: Pass TCGMemOpIdx to sve memory helpers...
=== OUTPUT END ===
Test command exited with code: 1
---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-devel@redhat.com
* Re: [Qemu-devel] [PATCH 00/20] target/arm: sve system mode patches
2018-08-09 4:21 [Qemu-devel] [PATCH 00/20] target/arm: sve system mode patches Richard Henderson
` (21 preceding siblings ...)
2018-08-18 9:15 ` no-reply
@ 2018-08-18 10:01 ` no-reply
22 siblings, 0 replies; 51+ messages in thread
From: no-reply @ 2018-08-18 10:01 UTC (permalink / raw)
To: richard.henderson
Cc: famz, qemu-devel, laurent.desnogues, peter.maydell, alex.bennee
Hi,
This series failed docker-quick@centos7 build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.
Type: series
Message-id: 20180809042206.15726-1-richard.henderson@linaro.org
Subject: [Qemu-devel] [PATCH 00/20] target/arm: sve system mode patches
=== TEST SCRIPT BEGIN ===
#!/bin/bash
time make docker-test-quick@centos7 SHOW_ENV=1 J=8
=== TEST SCRIPT END ===
Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
35e9c8f21d target/arm: Pass TCGMemOpIdx to sve memory helpers
6b726db20c target/arm: Rewrite vector gather first-fault loads
42f47a681a target/arm: Rewrite vector gather stores
ebdf36bdd6 target/arm: Rewrite vector gather loads
bea4bda9bd target/arm: Split contiguous stores for endianness
f74a87d369 target/arm: Split contiguous loads for endianness
522240fd71 target/arm: Rewrite helper_sve_st[1234]*_r
291c9a4079 target/arm: Rewrite helper_sve_ld[234]*_r
761fe6b96c target/arm: Rewrite helper_sve_ld1*_r using pages
439f82f39c target/arm: Clear unused predicate bits for LD1RQ
9f664be291 target/arm: Adjust aarch64_cpu_dump_state for system mode SVE
72b8c608a0 target/arm: Handle SVE vector length changes in system mode
4d25343973 target/arm: Pass in current_el to fp and sve_exception_el
f63e45c476 target/arm: Fix is_a64 for user-only
77c7e3327f target/arm: Fix arm_current_el for user-only
065eea0432 target/arm: Fix arm_cpu_data_is_big_endian for aa64 user-only
4fb82ef6d0 target/arm: Adjust sve_exception_el
bc8fa3f868 target/arm: Define ID_AA64ZFR0_EL1
3233806b21 target/arm: Set ID_AA64PFR0 bits for SVE for -cpu max
2989413056 target/arm: Set ISAR bits for -cpu max
=== OUTPUT BEGIN ===
BUILD centos7
make[1]: Entering directory '/var/tmp/patchew-tester-tmp-6uc3dv26/src'
GEN /var/tmp/patchew-tester-tmp-6uc3dv26/src/docker-src.2018-08-18-05.57.51.13090/qemu.tar
Cloning into '/var/tmp/patchew-tester-tmp-6uc3dv26/src/docker-src.2018-08-18-05.57.51.13090/qemu.tar.vroot'...
done.
Your branch is up-to-date with 'origin/test'.
Submodule 'dtc' (git://git.qemu-project.org/dtc.git) registered for path 'dtc'
Cloning into '/var/tmp/patchew-tester-tmp-6uc3dv26/src/docker-src.2018-08-18-05.57.51.13090/qemu.tar.vroot/dtc'...
Submodule path 'dtc': checked out 'e54388015af1fb4bf04d0bca99caba1074d9cc42'
Submodule 'ui/keycodemapdb' (git://git.qemu.org/keycodemapdb.git) registered for path 'ui/keycodemapdb'
Cloning into '/var/tmp/patchew-tester-tmp-6uc3dv26/src/docker-src.2018-08-18-05.57.51.13090/qemu.tar.vroot/ui/keycodemapdb'...
Submodule path 'ui/keycodemapdb': checked out '6b3d716e2b6472eb7189d3220552280ef3d832ce'
COPY RUNNER
RUN test-quick in qemu:centos7
Packages installed:
SDL-devel-1.2.15-14.el7.x86_64
bison-3.0.4-1.el7.x86_64
bzip2-devel-1.0.6-13.el7.x86_64
ccache-3.3.4-1.el7.x86_64
csnappy-devel-0-6.20150729gitd7bc683.el7.x86_64
flex-2.5.37-3.el7.x86_64
gcc-4.8.5-16.el7_4.2.x86_64
gettext-0.19.8.1-2.el7.x86_64
git-1.8.3.1-12.el7_4.x86_64
glib2-devel-2.50.3-3.el7.x86_64
libepoxy-devel-1.3.1-1.el7.x86_64
libfdt-devel-1.4.6-1.el7.x86_64
lzo-devel-2.06-8.el7.x86_64
make-3.82-23.el7.x86_64
mesa-libEGL-devel-17.0.1-6.20170307.el7.x86_64
mesa-libgbm-devel-17.0.1-6.20170307.el7.x86_64
package g++ is not installed
package librdmacm-devel is not installed
pixman-devel-0.34.0-1.el7.x86_64
spice-glib-devel-0.33-6.el7_4.1.x86_64
spice-server-devel-0.12.8-2.el7.1.x86_64
tar-1.26-32.el7.x86_64
vte-devel-0.28.2-10.el7.x86_64
xen-devel-4.6.6-10.el7.x86_64
zlib-devel-1.2.7-17.el7.x86_64
Environment variables:
PACKAGES=bison bzip2-devel ccache csnappy-devel flex g++ gcc gettext git glib2-devel libepoxy-devel libfdt-devel librdmacm-devel lzo-devel make mesa-libEGL-devel mesa-libgbm-devel pixman-devel SDL-devel spice-glib-devel spice-server-devel tar vte-devel xen-devel zlib-devel
HOSTNAME=78baf9183a69
MAKEFLAGS= -j8
J=8
CCACHE_DIR=/var/tmp/ccache
EXTRA_CONFIGURE_OPTS=
V=
SHOW_ENV=1
PATH=/usr/lib/ccache:/usr/lib64/ccache:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PWD=/
TARGET_LIST=
SHLVL=1
HOME=/home/patchew
TEST_DIR=/tmp/qemu-test
FEATURES= dtc
DEBUG=
_=/usr/bin/env
Configure options:
--enable-werror --target-list=x86_64-softmmu,aarch64-softmmu --prefix=/tmp/qemu-test/install
No C++ compiler available; disabling C++ specific optional code
Install prefix /tmp/qemu-test/install
BIOS directory /tmp/qemu-test/install/share/qemu
firmware path /tmp/qemu-test/install/share/qemu-firmware
binary directory /tmp/qemu-test/install/bin
library directory /tmp/qemu-test/install/lib
module directory /tmp/qemu-test/install/lib/qemu
libexec directory /tmp/qemu-test/install/libexec
include directory /tmp/qemu-test/install/include
config directory /tmp/qemu-test/install/etc
local state directory /tmp/qemu-test/install/var
Manual directory /tmp/qemu-test/install/share/man
ELF interp prefix /usr/gnemul/qemu-%M
Source path /tmp/qemu-test/src
GIT binary git
GIT submodules
C compiler cc
Host C compiler cc
C++ compiler
Objective-C compiler cc
ARFLAGS rv
CFLAGS -O2 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -g
QEMU_CFLAGS -I/usr/include/pixman-1 -Werror -pthread -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -fwrapv -Wendif-labels -Wno-missing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-strong -Wno-missing-braces -I/usr/include/libpng15 -I/usr/include/spice-server -I/usr/include/cacard -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -I/usr/include/pixman-1 -I/usr/include/nss3 -I/usr/include/nspr4 -I/usr/include/spice-1
LDFLAGS -Wl,--warn-common -Wl,-z,relro -Wl,-z,now -pie -m64 -g
QEMU_LDFLAGS
make make
install install
python python -B
smbd /usr/sbin/smbd
module support no
host CPU x86_64
host big endian no
target list x86_64-softmmu aarch64-softmmu
gprof enabled no
sparse enabled no
strip binaries yes
profiler no
static build no
SDL support yes (1.2.15)
GTK support yes (2.24.31)
GTK GL support no
VTE support yes (0.28.2)
TLS priority NORMAL
GNUTLS support no
GNUTLS rnd no
libgcrypt no
libgcrypt kdf no
nettle no
nettle kdf no
libtasn1 no
curses support yes
virgl support no
curl support no
mingw32 support no
Audio drivers oss
Block whitelist (rw)
Block whitelist (ro)
VirtFS support no
Multipath support no
VNC support yes
VNC SASL support no
VNC JPEG support no
VNC PNG support yes
xen support yes
xen ctrl version 40600
pv dom build no
brlapi support no
bluez support no
Documentation no
PIE yes
vde support no
netmap support no
Linux AIO support no
ATTR/XATTR support yes
Install blobs yes
KVM support yes
HAX support no
HVF support no
WHPX support no
TCG support yes
TCG debug enabled no
TCG interpreter no
malloc trim support yes
RDMA support yes
fdt support system
membarrier no
preadv support yes
fdatasync yes
madvise yes
posix_madvise yes
posix_memalign yes
libcap-ng support no
vhost-net support yes
vhost-crypto support yes
vhost-scsi support yes
vhost-vsock support yes
vhost-user support yes
Trace backends log
spice support yes (0.12.12/0.12.8)
rbd support no
xfsctl support no
smartcard support yes
libusb no
usb net redir no
OpenGL support yes
OpenGL dmabufs yes
libiscsi support no
libnfs support no
build guest agent yes
QGA VSS support no
QGA w32 disk info no
QGA MSI support no
seccomp support no
coroutine backend ucontext
coroutine pool yes
debug stack usage no
mutex debugging no
crypto afalg no
GlusterFS support no
gcov gcov
gcov enabled no
TPM support yes
libssh2 support no
TPM passthrough yes
TPM emulator yes
QOM debugging yes
Live block migration yes
lzo support yes
snappy support no
bzip2 support yes
NUMA host support no
libxml2 no
tcmalloc support no
jemalloc support no
avx2 optimization yes
replication support yes
VxHS block device no
capstone no
docker no
WARNING: Use of GTK 2.0 is deprecated and will be removed in
WARNING: future releases. Please switch to using GTK 3.0
WARNING: Use of SDL 1.2 is deprecated and will be removed in
WARNING: future releases. Please switch to using SDL 2.0
NOTE: cross-compilers enabled: 'cc'
GEN x86_64-softmmu/config-devices.mak.tmp
GEN aarch64-softmmu/config-devices.mak.tmp
GEN config-host.h
GEN qapi-gen
GEN qemu-options.def
GEN trace/generated-tcg-tracers.h
GEN trace/generated-helpers-wrappers.h
GEN trace/generated-helpers.h
GEN trace/generated-helpers.c
GEN module_block.h
GEN x86_64-softmmu/config-devices.mak
GEN aarch64-softmmu/config-devices.mak
GEN ui/input-keymap-atset1-to-qcode.c
GEN ui/input-keymap-linux-to-qcode.c
GEN ui/input-keymap-qcode-to-atset1.c
GEN ui/input-keymap-qcode-to-atset2.c
GEN ui/input-keymap-qcode-to-atset3.c
GEN ui/input-keymap-qcode-to-linux.c
GEN ui/input-keymap-qcode-to-qnum.c
GEN ui/input-keymap-qcode-to-sun.c
GEN ui/input-keymap-qnum-to-qcode.c
GEN ui/input-keymap-usb-to-qcode.c
GEN ui/input-keymap-win32-to-qcode.c
GEN ui/input-keymap-x11-to-qcode.c
GEN ui/input-keymap-xorgevdev-to-qcode.c
GEN ui/input-keymap-xorgxquartz-to-qcode.c
GEN ui/input-keymap-xorgkbd-to-qcode.c
GEN ui/input-keymap-xorgxwin-to-qcode.c
GEN ui/input-keymap-osx-to-qcode.c
GEN tests/test-qapi-gen
GEN trace-root.h
GEN accel/kvm/trace.h
GEN accel/tcg/trace.h
GEN audio/trace.h
GEN block/trace.h
GEN chardev/trace.h
GEN crypto/trace.h
GEN hw/9pfs/trace.h
GEN hw/acpi/trace.h
GEN hw/alpha/trace.h
GEN hw/arm/trace.h
GEN hw/audio/trace.h
GEN hw/block/trace.h
GEN hw/block/dataplane/trace.h
GEN hw/char/trace.h
GEN hw/display/trace.h
GEN hw/dma/trace.h
GEN hw/hppa/trace.h
GEN hw/i2c/trace.h
GEN hw/i386/trace.h
GEN hw/i386/xen/trace.h
GEN hw/ide/trace.h
GEN hw/input/trace.h
GEN hw/intc/trace.h
GEN hw/isa/trace.h
GEN hw/mem/trace.h
GEN hw/misc/trace.h
GEN hw/misc/macio/trace.h
GEN hw/net/trace.h
GEN hw/nvram/trace.h
GEN hw/pci/trace.h
GEN hw/pci-host/trace.h
GEN hw/ppc/trace.h
GEN hw/rdma/trace.h
GEN hw/rdma/vmw/trace.h
GEN hw/s390x/trace.h
GEN hw/scsi/trace.h
GEN hw/sd/trace.h
GEN hw/sparc/trace.h
GEN hw/sparc64/trace.h
GEN hw/timer/trace.h
GEN hw/tpm/trace.h
GEN hw/usb/trace.h
GEN hw/vfio/trace.h
GEN hw/virtio/trace.h
GEN hw/xen/trace.h
GEN io/trace.h
GEN linux-user/trace.h
GEN migration/trace.h
GEN nbd/trace.h
GEN net/trace.h
GEN qapi/trace.h
GEN qom/trace.h
GEN scsi/trace.h
GEN target/arm/trace.h
GEN target/i386/trace.h
GEN target/mips/trace.h
GEN target/ppc/trace.h
GEN target/s390x/trace.h
GEN target/sparc/trace.h
GEN ui/trace.h
GEN util/trace.h
GEN trace-root.c
GEN accel/kvm/trace.c
GEN accel/tcg/trace.c
GEN audio/trace.c
GEN block/trace.c
GEN chardev/trace.c
GEN crypto/trace.c
GEN hw/9pfs/trace.c
GEN hw/acpi/trace.c
GEN hw/alpha/trace.c
GEN hw/arm/trace.c
GEN hw/audio/trace.c
GEN hw/block/trace.c
GEN hw/block/dataplane/trace.c
GEN hw/char/trace.c
GEN hw/display/trace.c
GEN hw/dma/trace.c
GEN hw/hppa/trace.c
GEN hw/i2c/trace.c
GEN hw/i386/trace.c
GEN hw/i386/xen/trace.c
GEN hw/ide/trace.c
GEN hw/input/trace.c
GEN hw/intc/trace.c
GEN hw/isa/trace.c
GEN hw/mem/trace.c
GEN hw/misc/trace.c
GEN hw/misc/macio/trace.c
GEN hw/net/trace.c
GEN hw/nvram/trace.c
GEN hw/pci/trace.c
GEN hw/pci-host/trace.c
GEN hw/ppc/trace.c
GEN hw/rdma/trace.c
GEN hw/rdma/vmw/trace.c
GEN hw/s390x/trace.c
GEN hw/scsi/trace.c
GEN hw/sd/trace.c
GEN hw/sparc/trace.c
GEN hw/sparc64/trace.c
GEN hw/timer/trace.c
GEN hw/tpm/trace.c
GEN hw/usb/trace.c
GEN hw/vfio/trace.c
GEN hw/virtio/trace.c
GEN hw/xen/trace.c
GEN io/trace.c
GEN linux-user/trace.c
GEN migration/trace.c
GEN nbd/trace.c
GEN net/trace.c
GEN qapi/trace.c
GEN qom/trace.c
GEN scsi/trace.c
GEN target/arm/trace.c
GEN target/i386/trace.c
GEN target/mips/trace.c
GEN target/ppc/trace.c
GEN target/s390x/trace.c
GEN target/sparc/trace.c
GEN ui/trace.c
GEN util/trace.c
GEN config-all-devices.mak
CC tests/qemu-iotests/socket_scm_helper.o
GEN qga/qapi-generated/qapi-gen
CC qapi/qapi-types.o
CC qapi/qapi-builtin-types.o
CC qapi/qapi-types-block-core.o
CC qapi/qapi-types-block.o
CC qapi/qapi-types-char.o
CC qapi/qapi-types-common.o
CC qapi/qapi-types-introspect.o
CC qapi/qapi-types-crypto.o
CC qapi/qapi-types-job.o
CC qapi/qapi-types-migration.o
CC qapi/qapi-types-misc.o
CC qapi/qapi-types-net.o
CC qapi/qapi-types-rocker.o
CC qapi/qapi-types-run-state.o
CC qapi/qapi-types-sockets.o
CC qapi/qapi-types-tpm.o
CC qapi/qapi-types-trace.o
CC qapi/qapi-types-transaction.o
CC qapi/qapi-types-ui.o
CC qapi/qapi-builtin-visit.o
CC qapi/qapi-visit.o
CC qapi/qapi-visit-block-core.o
CC qapi/qapi-visit-block.o
CC qapi/qapi-visit-char.o
CC qapi/qapi-visit-common.o
CC qapi/qapi-visit-crypto.o
CC qapi/qapi-visit-introspect.o
CC qapi/qapi-visit-job.o
CC qapi/qapi-visit-migration.o
CC qapi/qapi-visit-misc.o
CC qapi/qapi-visit-net.o
CC qapi/qapi-visit-rocker.o
CC qapi/qapi-visit-run-state.o
CC qapi/qapi-visit-sockets.o
CC qapi/qapi-visit-tpm.o
CC qapi/qapi-visit-trace.o
CC qapi/qapi-visit-transaction.o
CC qapi/qapi-visit-ui.o
CC qapi/qapi-events.o
CC qapi/qapi-events-block-core.o
CC qapi/qapi-events-block.o
CC qapi/qapi-events-char.o
CC qapi/qapi-events-common.o
CC qapi/qapi-events-crypto.o
CC qapi/qapi-events-introspect.o
CC qapi/qapi-events-job.o
CC qapi/qapi-events-migration.o
CC qapi/qapi-events-misc.o
CC qapi/qapi-events-net.o
CC qapi/qapi-events-rocker.o
CC qapi/qapi-events-run-state.o
CC qapi/qapi-events-sockets.o
CC qapi/qapi-events-tpm.o
CC qapi/qapi-events-trace.o
CC qapi/qapi-events-transaction.o
CC qapi/qapi-events-ui.o
CC qapi/qapi-introspect.o
CC qapi/qapi-visit-core.o
CC qapi/qapi-dealloc-visitor.o
CC qapi/qobject-input-visitor.o
CC qapi/qobject-output-visitor.o
CC qapi/qmp-registry.o
CC qapi/qmp-dispatch.o
CC qapi/string-input-visitor.o
CC qapi/string-output-visitor.o
CC qapi/opts-visitor.o
CC qapi/qapi-clone-visitor.o
CC qapi/qmp-event.o
CC qapi/qapi-util.o
CC qobject/qnull.o
CC qobject/qnum.o
CC qobject/qstring.o
CC qobject/qdict.o
CC qobject/qlist.o
CC qobject/qbool.o
CC qobject/qlit.o
CC qobject/qjson.o
CC qobject/qobject.o
CC qobject/json-lexer.o
CC qobject/json-streamer.o
CC qobject/json-parser.o
CC qobject/block-qdict.o
CC trace/control.o
CC trace/qmp.o
CC util/osdep.o
CC util/cutils.o
CC util/unicode.o
CC util/qemu-timer-common.o
CC util/bufferiszero.o
CC util/lockcnt.o
CC util/aiocb.o
CC util/async.o
CC util/aio-wait.o
CC util/thread-pool.o
CC util/qemu-timer.o
CC util/main-loop.o
CC util/iohandler.o
CC util/aio-posix.o
CC util/compatfd.o
CC util/event_notifier-posix.o
CC util/mmap-alloc.o
CC util/oslib-posix.o
CC util/qemu-openpty.o
CC util/qemu-thread-posix.o
CC util/memfd.o
CC util/envlist.o
CC util/path.o
CC util/module.o
CC util/host-utils.o
CC util/bitmap.o
CC util/bitops.o
CC util/hbitmap.o
CC util/fifo8.o
CC util/acl.o
CC util/cacheinfo.o
CC util/error.o
CC util/qemu-error.o
CC util/id.o
CC util/iov.o
CC util/qemu-config.o
CC util/qemu-sockets.o
CC util/uri.o
CC util/notify.o
CC util/qemu-option.o
CC util/qemu-progress.o
CC util/keyval.o
CC util/hexdump.o
CC util/crc32c.o
CC util/uuid.o
CC util/throttle.o
CC util/getauxval.o
CC util/readline.o
CC util/rcu.o
CC util/qemu-coroutine.o
CC util/qemu-coroutine-lock.o
CC util/qemu-coroutine-io.o
CC util/qemu-coroutine-sleep.o
CC util/coroutine-ucontext.o
CC util/buffer.o
CC util/timed-average.o
CC util/base64.o
CC util/log.o
CC util/pagesize.o
CC util/qdist.o
CC util/qht.o
CC util/range.o
CC util/stats64.o
CC util/systemd.o
CC util/iova-tree.o
CC util/vfio-helpers.o
CC trace-root.o
CC accel/kvm/trace.o
CC accel/tcg/trace.o
CC audio/trace.o
CC block/trace.o
CC chardev/trace.o
CC crypto/trace.o
CC hw/9pfs/trace.o
CC hw/acpi/trace.o
CC hw/alpha/trace.o
CC hw/arm/trace.o
CC hw/audio/trace.o
CC hw/block/trace.o
CC hw/block/dataplane/trace.o
CC hw/char/trace.o
CC hw/display/trace.o
CC hw/dma/trace.o
CC hw/hppa/trace.o
CC hw/i2c/trace.o
CC hw/i386/trace.o
CC hw/i386/xen/trace.o
CC hw/ide/trace.o
CC hw/input/trace.o
CC hw/intc/trace.o
CC hw/isa/trace.o
CC hw/mem/trace.o
CC hw/misc/trace.o
CC hw/misc/macio/trace.o
CC hw/net/trace.o
CC hw/nvram/trace.o
CC hw/pci/trace.o
CC hw/pci-host/trace.o
CC hw/ppc/trace.o
CC hw/rdma/trace.o
CC hw/rdma/vmw/trace.o
CC hw/s390x/trace.o
CC hw/scsi/trace.o
CC hw/sd/trace.o
CC hw/sparc/trace.o
CC hw/sparc64/trace.o
CC hw/timer/trace.o
CC hw/tpm/trace.o
CC hw/usb/trace.o
CC hw/vfio/trace.o
CC hw/virtio/trace.o
CC hw/xen/trace.o
CC io/trace.o
CC linux-user/trace.o
CC migration/trace.o
CC nbd/trace.o
CC net/trace.o
CC qapi/trace.o
CC qom/trace.o
CC scsi/trace.o
CC target/arm/trace.o
CC target/i386/trace.o
CC target/mips/trace.o
CC target/ppc/trace.o
CC target/s390x/trace.o
CC target/sparc/trace.o
CC ui/trace.o
CC util/trace.o
CC crypto/pbkdf-stub.o
CC stubs/arch-query-cpu-def.o
CC stubs/arch-query-cpu-model-expansion.o
CC stubs/arch-query-cpu-model-comparison.o
CC stubs/arch-query-cpu-model-baseline.o
CC stubs/bdrv-next-monitor-owned.o
CC stubs/blk-commit-all.o
CC stubs/blockdev-close-all-bdrv-states.o
CC stubs/clock-warp.o
CC stubs/cpu-get-clock.o
CC stubs/cpu-get-icount.o
CC stubs/dump.o
CC stubs/error-printf.o
CC stubs/fdset.o
CC stubs/gdbstub.o
CC stubs/get-vm-name.o
CC stubs/iothread.o
CC stubs/iothread-lock.o
CC stubs/is-daemonized.o
CC stubs/machine-init-done.o
CC stubs/migr-blocker.o
CC stubs/change-state-handler.o
CC stubs/monitor.o
CC stubs/notify-event.o
CC stubs/qtest.o
CC stubs/replay.o
CC stubs/runstate-check.o
CC stubs/set-fd-handler.o
CC stubs/slirp.o
CC stubs/sysbus.o
CC stubs/tpm.o
CC stubs/trace-control.o
CC stubs/uuid.o
CC stubs/vm-stop.o
CC stubs/vmstate.o
CC stubs/qmp_memory_device.o
CC stubs/target-monitor-defs.o
CC stubs/target-get-monitor-def.o
CC stubs/pc_madt_cpu_entry.o
CC stubs/vmgenid.o
CC stubs/xen-common.o
CC stubs/xen-hvm.o
CC stubs/pci-host-piix.o
CC stubs/ram-block.o
CC contrib/ivshmem-client/ivshmem-client.o
CC contrib/ivshmem-client/main.o
CC contrib/ivshmem-server/ivshmem-server.o
CC contrib/ivshmem-server/main.o
CC qemu-nbd.o
CC block.o
CC blockjob.o
CC job.o
CC qemu-io-cmds.o
CC replication.o
CC block/raw-format.o
CC block/qcow.o
CC block/vdi.o
CC block/vmdk.o
CC block/cloop.o
CC block/bochs.o
CC block/vpc.o
CC block/vvfat.o
CC block/dmg.o
CC block/qcow2.o
CC block/qcow2-refcount.o
CC block/qcow2-cluster.o
CC block/qcow2-snapshot.o
CC block/qcow2-cache.o
CC block/qcow2-bitmap.o
CC block/qed.o
CC block/qed-table.o
CC block/qed-l2-cache.o
CC block/qed-cluster.o
CC block/qed-check.o
CC block/vhdx.o
CC block/vhdx-endian.o
CC block/vhdx-log.o
CC block/quorum.o
CC block/parallels.o
CC block/blkdebug.o
CC block/blkverify.o
CC block/blkreplay.o
CC block/blklogwrites.o
CC block/block-backend.o
CC block/snapshot.o
CC block/qapi.o
CC block/file-posix.o
CC block/null.o
CC block/mirror.o
CC block/commit.o
CC block/io.o
CC block/create.o
CC block/throttle-groups.o
CC block/nvme.o
CC block/nbd.o
CC block/nbd-client.o
CC block/sheepdog.o
CC block/accounting.o
CC block/dirty-bitmap.o
CC block/write-threshold.o
CC block/backup.o
CC block/replication.o
CC block/throttle.o
CC block/copy-on-read.o
CC block/crypto.o
CC nbd/server.o
CC nbd/client.o
CC nbd/common.o
CC scsi/utils.o
CC scsi/pr-manager.o
CC scsi/pr-manager-helper.o
CC block/dmg-bz2.o
CC crypto/init.o
CC crypto/hash.o
CC crypto/hash-glib.o
CC crypto/hmac.o
CC crypto/hmac-glib.o
CC crypto/aes.o
CC crypto/desrfb.o
CC crypto/cipher.o
CC crypto/tlscreds.o
CC crypto/tlscredsanon.o
CC crypto/tlscredspsk.o
CC crypto/tlscredsx509.o
CC crypto/tlssession.o
CC crypto/secret.o
CC crypto/random-platform.o
CC crypto/pbkdf.o
CC crypto/ivgen.o
CC crypto/ivgen-essiv.o
CC crypto/ivgen-plain.o
CC crypto/ivgen-plain64.o
CC crypto/afsplit.o
CC crypto/xts.o
CC crypto/block.o
CC crypto/block-luks.o
CC crypto/block-qcow.o
CC io/channel.o
CC io/channel-buffer.o
CC io/channel-command.o
CC io/channel-file.o
CC io/channel-socket.o
CC io/channel-tls.o
CC io/channel-watch.o
CC io/channel-websock.o
CC io/channel-util.o
CC io/dns-resolver.o
CC io/net-listener.o
CC io/task.o
CC qom/object.o
CC qom/container.o
CC qom/qom-qobject.o
CC qom/object_interfaces.o
GEN qemu-img-cmds.h
CC qemu-io.o
CC scsi/qemu-pr-helper.o
CC qemu-bridge-helper.o
CC blockdev.o
CC blockdev-nbd.o
CC bootdevice.o
CC iothread.o
CC job-qmp.o
CC qdev-monitor.o
CC device-hotplug.o
CC os-posix.o
CC bt-host.o
CC bt-vhci.o
CC dma-helpers.o
CC vl.o
CC tpm.o
CC device_tree.o
CC qapi/qapi-commands.o
CC qapi/qapi-commands-block-core.o
CC qapi/qapi-commands-block.o
CC qapi/qapi-commands-char.o
CC qapi/qapi-commands-common.o
CC qapi/qapi-commands-crypto.o
CC qapi/qapi-commands-introspect.o
CC qapi/qapi-commands-job.o
CC qapi/qapi-commands-migration.o
CC qapi/qapi-commands-misc.o
CC qapi/qapi-commands-net.o
CC qapi/qapi-commands-rocker.o
CC qapi/qapi-commands-run-state.o
CC qapi/qapi-commands-sockets.o
CC qapi/qapi-commands-tpm.o
CC qapi/qapi-commands-trace.o
CC qapi/qapi-commands-transaction.o
CC qapi/qapi-commands-ui.o
CC qmp.o
CC hmp.o
CC cpus-common.o
CC audio/audio.o
CC audio/noaudio.o
CC audio/wavaudio.o
CC audio/mixeng.o
CC audio/spiceaudio.o
CC audio/wavcapture.o
CC backends/rng.o
CC backends/rng-egd.o
CC backends/rng-random.o
CC backends/tpm.o
CC backends/hostmem.o
CC backends/hostmem-ram.o
CC backends/hostmem-file.o
CC backends/cryptodev.o
CC backends/cryptodev-builtin.o
CC backends/cryptodev-vhost.o
CC backends/cryptodev-vhost-user.o
CC backends/hostmem-memfd.o
CC block/stream.o
CC chardev/msmouse.o
CC chardev/wctablet.o
CC chardev/testdev.o
CC chardev/spice.o
CC disas/arm.o
CC disas/i386.o
CC fsdev/qemu-fsdev-dummy.o
CC fsdev/qemu-fsdev-opts.o
CC fsdev/qemu-fsdev-throttle.o
CC hw/acpi/core.o
CC hw/acpi/piix4.o
CC hw/acpi/pcihp.o
CC hw/acpi/ich9.o
CC hw/acpi/tco.o
CC hw/acpi/cpu_hotplug.o
CC hw/acpi/memory_hotplug.o
CC hw/acpi/cpu.o
CC hw/acpi/nvdimm.o
CC hw/acpi/vmgenid.o
CC hw/acpi/acpi_interface.o
CC hw/acpi/bios-linker-loader.o
CC hw/acpi/aml-build.o
CC hw/acpi/ipmi.o
CC hw/acpi/acpi-stub.o
CC hw/acpi/ipmi-stub.o
CC hw/audio/sb16.o
CC hw/audio/es1370.o
CC hw/audio/ac97.o
CC hw/audio/fmopl.o
CC hw/audio/adlib.o
CC hw/audio/gus.o
CC hw/audio/gusemu_hal.o
CC hw/audio/gusemu_mixer.o
CC hw/audio/cs4231a.o
CC hw/audio/intel-hda.o
CC hw/audio/hda-codec.o
CC hw/audio/pcspk.o
CC hw/audio/wm8750.o
CC hw/audio/pl041.o
CC hw/audio/lm4549.o
CC hw/audio/marvell_88w8618.o
CC hw/audio/soundhw.o
CC hw/block/block.o
CC hw/block/cdrom.o
CC hw/block/hd-geometry.o
CC hw/block/fdc.o
CC hw/block/m25p80.o
CC hw/block/nand.o
CC hw/block/pflash_cfi01.o
CC hw/block/pflash_cfi02.o
CC hw/block/xen_disk.o
CC hw/block/ecc.o
CC hw/block/onenand.o
CC hw/block/nvme.o
CC hw/bt/core.o
CC hw/bt/l2cap.o
CC hw/bt/sdp.o
CC hw/bt/hci.o
CC hw/bt/hid.o
CC hw/bt/hci-csr.o
CC hw/char/ipoctal232.o
CC hw/char/parallel.o
CC hw/char/parallel-isa.o
CC hw/char/pl011.o
CC hw/char/serial.o
CC hw/char/serial-isa.o
CC hw/char/serial-pci.o
CC hw/char/virtio-console.o
CC hw/char/xen_console.o
CC hw/char/cadence_uart.o
CC hw/char/cmsdk-apb-uart.o
CC hw/char/debugcon.o
CC hw/char/imx_serial.o
CC hw/core/qdev.o
CC hw/core/qdev-properties.o
CC hw/core/bus.o
CC hw/core/reset.o
CC hw/core/qdev-fw.o
CC hw/core/fw-path-provider.o
CC hw/core/irq.o
CC hw/core/hotplug.o
CC hw/core/nmi.o
CC hw/core/stream.o
CC hw/core/ptimer.o
CC hw/core/sysbus.o
CC hw/core/machine.o
CC hw/core/loader.o
CC hw/core/qdev-properties-system.o
CC hw/core/register.o
CC hw/core/or-irq.o
CC hw/core/split-irq.o
CC hw/core/platform-bus.o
CC hw/cpu/core.o
CC hw/display/ramfb.o
CC hw/display/ramfb-standalone.o
CC hw/display/ads7846.o
CC hw/display/cirrus_vga.o
CC hw/display/pl110.o
CC hw/display/sii9022.o
CC hw/display/ssd0303.o
CC hw/display/ssd0323.o
CC hw/display/xenfb.o
CC hw/display/vga-pci.o
CC hw/display/bochs-display.o
CC hw/display/vga-isa.o
CC hw/display/vmware_vga.o
CC hw/display/blizzard.o
CC hw/display/exynos4210_fimd.o
CC hw/display/framebuffer.o
CC hw/display/tc6393xb.o
CC hw/display/qxl.o
CC hw/display/qxl-logger.o
CC hw/display/qxl-render.o
CC hw/dma/pl080.o
CC hw/dma/pl330.o
CC hw/dma/i8257.o
CC hw/dma/xilinx_axidma.o
CC hw/dma/xlnx-zynq-devcfg.o
CC hw/dma/xlnx-zdma.o
CC hw/gpio/max7310.o
CC hw/gpio/pl061.o
CC hw/gpio/zaurus.o
CC hw/gpio/gpio_key.o
CC hw/i2c/core.o
CC hw/i2c/smbus.o
CC hw/i2c/smbus_eeprom.o
CC hw/i2c/i2c-ddc.o
CC hw/i2c/versatile_i2c.o
CC hw/i2c/smbus_ich9.o
CC hw/i2c/pm_smbus.o
CC hw/i2c/bitbang_i2c.o
CC hw/i2c/exynos4210_i2c.o
CC hw/i2c/imx_i2c.o
CC hw/i2c/aspeed_i2c.o
CC hw/ide/core.o
CC hw/ide/atapi.o
CC hw/ide/qdev.o
CC hw/ide/pci.o
CC hw/ide/isa.o
CC hw/ide/piix.o
CC hw/ide/microdrive.o
CC hw/ide/ahci.o
CC hw/ide/ich.o
CC hw/ide/ahci-allwinner.o
CC hw/input/hid.o
CC hw/input/lm832x.o
CC hw/input/pckbd.o
CC hw/input/pl050.o
CC hw/input/ps2.o
CC hw/input/stellaris_input.o
CC hw/input/tsc2005.o
CC hw/input/virtio-input.o
CC hw/input/virtio-input-hid.o
CC hw/input/virtio-input-host.o
CC hw/intc/i8259_common.o
CC hw/intc/i8259.o
CC hw/intc/pl190.o
CC hw/intc/xlnx-pmu-iomod-intc.o
CC hw/intc/xlnx-zynqmp-ipi.o
CC hw/intc/imx_avic.o
CC hw/intc/imx_gpcv2.o
CC hw/intc/realview_gic.o
CC hw/intc/ioapic_common.o
CC hw/intc/arm_gic_common.o
CC hw/intc/arm_gic.o
CC hw/intc/arm_gicv2m.o
CC hw/intc/arm_gicv3_common.o
CC hw/intc/arm_gicv3.o
CC hw/intc/arm_gicv3_dist.o
CC hw/intc/arm_gicv3_redist.o
CC hw/intc/arm_gicv3_its_common.o
CC hw/intc/intc.o
CC hw/ipack/ipack.o
CC hw/ipack/tpci200.o
CC hw/ipmi/ipmi.o
CC hw/ipmi/ipmi_bmc_sim.o
CC hw/ipmi/ipmi_bmc_extern.o
CC hw/ipmi/isa_ipmi_kcs.o
CC hw/ipmi/isa_ipmi_bt.o
CC hw/isa/isa-bus.o
CC hw/isa/isa-superio.o
CC hw/isa/smc37c669-superio.o
CC hw/isa/apm.o
CC hw/mem/pc-dimm.o
CC hw/mem/memory-device.o
CC hw/mem/nvdimm.o
CC hw/misc/applesmc.o
CC hw/misc/max111x.o
CC hw/misc/tmp105.o
CC hw/misc/tmp421.o
CC hw/misc/debugexit.o
CC hw/misc/sga.o
CC hw/misc/pc-testdev.o
CC hw/misc/pci-testdev.o
CC hw/misc/edu.o
CC hw/misc/pca9552.o
CC hw/misc/unimp.o
CC hw/misc/vmcoreinfo.o
CC hw/misc/arm_l2x0.o
CC hw/misc/arm_integrator_debug.o
CC hw/misc/a9scu.o
CC hw/misc/arm11scu.o
CC hw/net/xen_nic.o
CC hw/net/ne2000.o
CC hw/net/eepro100.o
CC hw/net/pcnet-pci.o
CC hw/net/pcnet.o
CC hw/net/e1000.o
CC hw/net/e1000x_common.o
CC hw/net/net_tx_pkt.o
CC hw/net/net_rx_pkt.o
CC hw/net/e1000e.o
CC hw/net/e1000e_core.o
CC hw/net/rtl8139.o
CC hw/net/vmxnet3.o
CC hw/net/smc91c111.o
CC hw/net/lan9118.o
CC hw/net/ne2000-isa.o
CC hw/net/xgmac.o
CC hw/net/xilinx_axienet.o
CC hw/net/allwinner_emac.o
CC hw/net/imx_fec.o
CC hw/net/cadence_gem.o
CC hw/net/stellaris_enet.o
CC hw/net/ftgmac100.o
CC hw/net/rocker/rocker.o
CC hw/net/rocker/rocker_fp.o
CC hw/net/rocker/rocker_desc.o
CC hw/net/rocker/rocker_world.o
CC hw/net/rocker/rocker_of_dpa.o
CC hw/net/can/can_sja1000.o
CC hw/net/can/can_kvaser_pci.o
CC hw/net/can/can_pcm3680_pci.o
CC hw/net/can/can_mioe3680_pci.o
CC hw/nvram/eeprom93xx.o
CC hw/nvram/eeprom_at24c.o
CC hw/nvram/fw_cfg.o
CC hw/nvram/chrp_nvram.o
CC hw/pci-bridge/pci_bridge_dev.o
CC hw/pci-bridge/pcie_root_port.o
CC hw/pci-bridge/gen_pcie_root_port.o
CC hw/pci-bridge/pcie_pci_bridge.o
CC hw/pci-bridge/pci_expander_bridge.o
CC hw/pci-bridge/xio3130_upstream.o
CC hw/pci-bridge/xio3130_downstream.o
CC hw/pci-bridge/ioh3420.o
CC hw/pci-bridge/i82801b11.o
CC hw/pci-host/pam.o
CC hw/pci-host/versatile.o
CC hw/pci-host/piix.o
CC hw/pci-host/q35.o
CC hw/pci-host/gpex.o
CC hw/pci-host/designware.o
CC hw/pci/pci.o
CC hw/pci/pci_bridge.o
CC hw/pci/msix.o
CC hw/pci/msi.o
CC hw/pci/shpc.o
CC hw/pci/slotid_cap.o
CC hw/pci/pci_host.o
CC hw/pci/pcie_host.o
CC hw/pci/pcie.o
CC hw/pci/pcie_aer.o
CC hw/pci/pcie_port.o
CC hw/pci/pci-stub.o
CC hw/pcmcia/pcmcia.o
CC hw/scsi/scsi-disk.o
CC hw/scsi/scsi-generic.o
CC hw/scsi/scsi-bus.o
CC hw/scsi/lsi53c895a.o
CC hw/scsi/mptsas.o
CC hw/scsi/mptconfig.o
CC hw/scsi/mptendian.o
CC hw/scsi/megasas.o
CC hw/scsi/vmw_pvscsi.o
CC hw/scsi/esp.o
CC hw/scsi/esp-pci.o
CC hw/sd/pl181.o
CC hw/sd/ssi-sd.o
CC hw/sd/sd.o
CC hw/sd/core.o
CC hw/sd/sdmmc-internal.o
CC hw/sd/sdhci.o
CC hw/smbios/smbios.o
CC hw/smbios/smbios_type_38.o
CC hw/smbios/smbios-stub.o
CC hw/smbios/smbios_type_38-stub.o
CC hw/ssi/pl022.o
CC hw/ssi/ssi.o
CC hw/ssi/xilinx_spips.o
CC hw/ssi/aspeed_smc.o
CC hw/ssi/stm32f2xx_spi.o
CC hw/ssi/mss-spi.o
CC hw/timer/arm_timer.o
CC hw/timer/arm_mptimer.o
CC hw/timer/armv7m_systick.o
CC hw/timer/a9gtimer.o
CC hw/timer/cadence_ttc.o
CC hw/timer/ds1338.o
CC hw/timer/hpet.o
CC hw/timer/i8254_common.o
CC hw/timer/i8254.o
CC hw/timer/pl031.o
CC hw/timer/twl92230.o
CC hw/timer/imx_epit.o
CC hw/timer/imx_gpt.o
CC hw/timer/xlnx-zynqmp-rtc.o
CC hw/timer/stm32f2xx_timer.o
CC hw/timer/aspeed_timer.o
CC hw/timer/cmsdk-apb-timer.o
CC hw/timer/mss-timer.o
CC hw/tpm/tpm_util.o
CC hw/tpm/tpm_tis.o
CC hw/tpm/tpm_crb.o
CC hw/tpm/tpm_passthrough.o
CC hw/tpm/tpm_emulator.o
CC hw/usb/core.o
CC hw/usb/combined-packet.o
CC hw/usb/bus.o
CC hw/usb/libhw.o
CC hw/usb/desc.o
CC hw/usb/desc-msos.o
CC hw/usb/hcd-uhci.o
CC hw/usb/hcd-ohci.o
CC hw/usb/hcd-ehci.o
CC hw/usb/hcd-ehci-pci.o
CC hw/usb/hcd-ehci-sysbus.o
CC hw/usb/hcd-xhci.o
CC hw/usb/hcd-xhci-nec.o
CC hw/usb/hcd-musb.o
CC hw/usb/dev-hub.o
CC hw/usb/dev-hid.o
CC hw/usb/dev-wacom.o
CC hw/usb/dev-storage.o
CC hw/usb/dev-uas.o
CC hw/usb/dev-audio.o
CC hw/usb/dev-serial.o
CC hw/usb/dev-network.o
CC hw/usb/dev-bluetooth.o
CC hw/usb/dev-smartcard-reader.o
CC hw/usb/ccid-card-passthru.o
CC hw/usb/ccid-card-emulated.o
CC hw/usb/dev-mtp.o
CC hw/usb/host-stub.o
CC hw/virtio/virtio-bus.o
CC hw/virtio/virtio-rng.o
CC hw/virtio/virtio-pci.o
CC hw/virtio/virtio-mmio.o
CC hw/virtio/vhost-stub.o
CC hw/watchdog/watchdog.o
CC hw/watchdog/wdt_i6300esb.o
CC hw/watchdog/wdt_ib700.o
CC hw/watchdog/wdt_aspeed.o
CC hw/xen/xen_backend.o
CC hw/xen/xen_devconfig.o
CC hw/xen/xen_pvdev.o
CC hw/xen/xen-common.o
CC migration/migration.o
CC migration/socket.o
CC migration/fd.o
CC migration/exec.o
CC migration/tls.o
CC migration/channel.o
CC migration/savevm.o
CC migration/colo-comm.o
CC migration/colo.o
CC migration/colo-failover.o
CC migration/vmstate.o
CC migration/vmstate-types.o
CC migration/page_cache.o
CC migration/qemu-file.o
CC migration/global_state.o
CC migration/qemu-file-channel.o
CC migration/xbzrle.o
CC migration/postcopy-ram.o
CC migration/qjson.o
CC migration/block-dirty-bitmap.o
CC migration/rdma.o
CC migration/block.o
CC net/net.o
CC net/queue.o
CC net/checksum.o
CC net/util.o
CC net/hub.o
CC net/socket.o
CC net/dump.o
CC net/eth.o
CC net/l2tpv3.o
CC net/vhost-user.o
CC net/slirp.o
CC net/filter.o
CC net/filter-buffer.o
CC net/filter-mirror.o
CC net/colo-compare.o
CC net/colo.o
CC net/filter-rewriter.o
CC net/filter-replay.o
CC net/tap.o
CC net/tap-linux.o
CC net/can/can_core.o
CC net/can/can_host.o
CC net/can/can_socketcan.o
CC qom/cpu.o
CC replay/replay.o
CC replay/replay-internal.o
CC replay/replay-events.o
CC replay/replay-time.o
CC replay/replay-input.o
CC replay/replay-char.o
CC replay/replay-snapshot.o
CC replay/replay-net.o
CC replay/replay-audio.o
CC slirp/cksum.o
CC slirp/if.o
CC slirp/ip_icmp.o
CC slirp/ip6_icmp.o
CC slirp/ip6_input.o
CC slirp/ip6_output.o
CC slirp/ip_input.o
CC slirp/ip_output.o
CC slirp/dnssearch.o
CC slirp/dhcpv6.o
CC slirp/slirp.o
CC slirp/mbuf.o
CC slirp/misc.o
CC slirp/sbuf.o
CC slirp/socket.o
CC slirp/tcp_input.o
CC slirp/tcp_output.o
CC slirp/tcp_subr.o
CC slirp/tcp_timer.o
CC slirp/udp.o
CC slirp/udp6.o
CC slirp/bootp.o
CC slirp/tftp.o
CC slirp/arp_table.o
CC slirp/ndp_table.o
CC slirp/ncsi.o
CC ui/keymaps.o
CC ui/console.o
CC ui/cursor.o
CC ui/qemu-pixman.o
CC ui/input.o
CC ui/input-keymap.o
CC ui/input-legacy.o
CC ui/input-linux.o
CC ui/spice-core.o
CC ui/spice-input.o
CC ui/spice-display.o
CC ui/vnc.o
CC ui/vnc-enc-zlib.o
CC ui/vnc-enc-hextile.o
CC ui/vnc-enc-tight.o
CC ui/vnc-palette.o
CC ui/vnc-enc-zrle.o
CC ui/vnc-auth-vencrypt.o
CC ui/vnc-ws.o
CC ui/vnc-jobs.o
VERT ui/shader/texture-blit-vert.h
VERT ui/shader/texture-blit-flip-vert.h
FRAG ui/shader/texture-blit-frag.h
CC ui/console-gl.o
CC ui/egl-helpers.o
CC ui/egl-context.o
CC ui/egl-headless.o
CC audio/ossaudio.o
CC ui/sdl.o
CC ui/sdl_zoom.o
CC ui/x_keymap.o
CC ui/gtk.o
CC ui/gtk-egl.o
CC ui/curses.o
CC chardev/char.o
CC chardev/char-fd.o
CC chardev/char-fe.o
CC chardev/char-file.o
CC chardev/char-io.o
CC chardev/char-mux.o
CC chardev/char-null.o
CC chardev/char-parallel.o
CC chardev/char-pipe.o
CC chardev/char-pty.o
CC chardev/char-ringbuf.o
CC chardev/char-serial.o
CC chardev/char-socket.o
CC chardev/char-stdio.o
CC chardev/char-udp.o
LINK tests/qemu-iotests/socket_scm_helper
CC qga/commands.o
CC qga/guest-agent-command-state.o
AS optionrom/multiboot.o
AS optionrom/linuxboot.o
CC optionrom/linuxboot_dma.o
AS optionrom/kvmvapic.o
BUILD optionrom/multiboot.img
BUILD optionrom/linuxboot.img
BUILD optionrom/kvmvapic.img
CC qga/main.o
BUILD optionrom/multiboot.raw
BUILD optionrom/linuxboot.raw
BUILD optionrom/linuxboot_dma.img
BUILD optionrom/kvmvapic.raw
SIGN optionrom/multiboot.bin
SIGN optionrom/linuxboot.bin
BUILD optionrom/linuxboot_dma.raw
SIGN optionrom/kvmvapic.bin
CC qga/commands-posix.o
SIGN optionrom/linuxboot_dma.bin
CC qga/channel-posix.o
CC qga/qapi-generated/qga-qapi-types.o
CC qga/qapi-generated/qga-qapi-visit.o
CC qga/qapi-generated/qga-qapi-commands.o
AR libqemuutil.a
CC qemu-img.o
LINK qemu-io
LINK scsi/qemu-pr-helper
LINK qemu-bridge-helper
CC ui/shader.o
LINK ivshmem-client
LINK ivshmem-server
LINK qemu-nbd
LINK qemu-ga
GEN x86_64-softmmu/hmp-commands.h
GEN x86_64-softmmu/hmp-commands-info.h
GEN x86_64-softmmu/config-target.h
LINK qemu-img
GEN aarch64-softmmu/hmp-commands-info.h
GEN aarch64-softmmu/config-target.h
GEN aarch64-softmmu/hmp-commands.h
CC x86_64-softmmu/tcg/tcg.o
CC x86_64-softmmu/tcg/tcg-op.o
CC x86_64-softmmu/exec.o
CC x86_64-softmmu/tcg/tcg-op-vec.o
CC x86_64-softmmu/tcg/tcg-op-gvec.o
CC x86_64-softmmu/tcg/tcg-common.o
CC aarch64-softmmu/exec.o
CC x86_64-softmmu/tcg/optimize.o
CC x86_64-softmmu/fpu/softfloat.o
CC x86_64-softmmu/disas.o
GEN x86_64-softmmu/gdbstub-xml.c
CC x86_64-softmmu/arch_init.o
CC x86_64-softmmu/cpus.o
CC x86_64-softmmu/monitor.o
CC x86_64-softmmu/gdbstub.o
CC x86_64-softmmu/balloon.o
CC x86_64-softmmu/ioport.o
CC aarch64-softmmu/tcg/tcg.o
CC x86_64-softmmu/numa.o
CC x86_64-softmmu/qtest.o
CC x86_64-softmmu/memory.o
CC x86_64-softmmu/memory_mapping.o
CC x86_64-softmmu/dump.o
CC x86_64-softmmu/win_dump.o
CC x86_64-softmmu/migration/ram.o
CC x86_64-softmmu/accel/accel.o
CC x86_64-softmmu/accel/kvm/kvm-all.o
CC x86_64-softmmu/accel/stubs/hax-stub.o
CC x86_64-softmmu/accel/stubs/hvf-stub.o
CC x86_64-softmmu/accel/stubs/whpx-stub.o
CC x86_64-softmmu/accel/tcg/tcg-all.o
CC x86_64-softmmu/accel/tcg/cputlb.o
CC x86_64-softmmu/accel/tcg/tcg-runtime.o
CC x86_64-softmmu/accel/tcg/tcg-runtime-gvec.o
CC aarch64-softmmu/tcg/tcg-op.o
CC x86_64-softmmu/accel/tcg/cpu-exec.o
CC x86_64-softmmu/accel/tcg/cpu-exec-common.o
CC x86_64-softmmu/accel/tcg/translate-all.o
CC x86_64-softmmu/accel/tcg/translator.o
CC x86_64-softmmu/hw/block/virtio-blk.o
CC x86_64-softmmu/hw/block/vhost-user-blk.o
CC aarch64-softmmu/tcg/tcg-op-vec.o
CC aarch64-softmmu/tcg/tcg-op-gvec.o
CC x86_64-softmmu/hw/block/dataplane/virtio-blk.o
CC aarch64-softmmu/tcg/tcg-common.o
CC aarch64-softmmu/tcg/optimize.o
CC aarch64-softmmu/fpu/softfloat.o
CC x86_64-softmmu/hw/char/virtio-serial-bus.o
CC x86_64-softmmu/hw/core/generic-loader.o
CC x86_64-softmmu/hw/core/null-machine.o
CC aarch64-softmmu/disas.o
CC x86_64-softmmu/hw/display/vga.o
CC x86_64-softmmu/hw/display/virtio-gpu.o
GEN aarch64-softmmu/gdbstub-xml.c
CC x86_64-softmmu/hw/display/virtio-gpu-3d.o
CC aarch64-softmmu/arch_init.o
CC x86_64-softmmu/hw/display/virtio-gpu-pci.o
CC x86_64-softmmu/hw/display/virtio-vga.o
CC aarch64-softmmu/cpus.o
CC aarch64-softmmu/monitor.o
CC x86_64-softmmu/hw/intc/apic.o
CC aarch64-softmmu/gdbstub.o
CC x86_64-softmmu/hw/intc/apic_common.o
CC x86_64-softmmu/hw/intc/ioapic.o
CC aarch64-softmmu/balloon.o
CC aarch64-softmmu/ioport.o
CC x86_64-softmmu/hw/isa/lpc_ich9.o
CC aarch64-softmmu/numa.o
CC aarch64-softmmu/qtest.o
CC aarch64-softmmu/memory.o
CC aarch64-softmmu/memory_mapping.o
CC aarch64-softmmu/dump.o
CC aarch64-softmmu/migration/ram.o
CC x86_64-softmmu/hw/misc/ivshmem.o
CC aarch64-softmmu/accel/accel.o
CC aarch64-softmmu/accel/stubs/hax-stub.o
CC aarch64-softmmu/accel/stubs/hvf-stub.o
CC aarch64-softmmu/accel/stubs/whpx-stub.o
CC aarch64-softmmu/accel/stubs/kvm-stub.o
CC x86_64-softmmu/hw/misc/pvpanic.o
CC aarch64-softmmu/accel/tcg/tcg-all.o
CC aarch64-softmmu/accel/tcg/cputlb.o
CC aarch64-softmmu/accel/tcg/tcg-runtime.o
CC aarch64-softmmu/accel/tcg/tcg-runtime-gvec.o
CC x86_64-softmmu/hw/misc/hyperv_testdev.o
CC aarch64-softmmu/accel/tcg/cpu-exec.o
CC aarch64-softmmu/accel/tcg/cpu-exec-common.o
CC aarch64-softmmu/accel/tcg/translate-all.o
CC x86_64-softmmu/hw/misc/mmio_interface.o
CC x86_64-softmmu/hw/net/virtio-net.o
CC aarch64-softmmu/accel/tcg/translator.o
CC x86_64-softmmu/hw/net/vhost_net.o
CC x86_64-softmmu/hw/rdma/rdma_utils.o
CC x86_64-softmmu/hw/rdma/rdma_backend.o
CC x86_64-softmmu/hw/rdma/rdma_rm.o
CC x86_64-softmmu/hw/rdma/vmw/pvrdma_dev_ring.o
CC aarch64-softmmu/hw/adc/stm32f2xx_adc.o
CC aarch64-softmmu/hw/block/virtio-blk.o
CC x86_64-softmmu/hw/rdma/vmw/pvrdma_cmd.o
CC x86_64-softmmu/hw/rdma/vmw/pvrdma_qp_ops.o
CC x86_64-softmmu/hw/rdma/vmw/pvrdma_main.o
CC aarch64-softmmu/hw/block/vhost-user-blk.o
CC x86_64-softmmu/hw/scsi/virtio-scsi.o
CC aarch64-softmmu/hw/block/dataplane/virtio-blk.o
CC x86_64-softmmu/hw/scsi/virtio-scsi-dataplane.o
CC x86_64-softmmu/hw/scsi/vhost-scsi-common.o
CC aarch64-softmmu/hw/char/exynos4210_uart.o
CC x86_64-softmmu/hw/scsi/vhost-scsi.o
CC aarch64-softmmu/hw/char/omap_uart.o
CC aarch64-softmmu/hw/char/digic-uart.o
CC x86_64-softmmu/hw/scsi/vhost-user-scsi.o
CC x86_64-softmmu/hw/timer/mc146818rtc.o
CC aarch64-softmmu/hw/char/stm32f2xx_usart.o
CC x86_64-softmmu/hw/vfio/common.o
CC x86_64-softmmu/hw/vfio/pci.o
CC aarch64-softmmu/hw/char/bcm2835_aux.o
CC aarch64-softmmu/hw/char/virtio-serial-bus.o
CC x86_64-softmmu/hw/vfio/pci-quirks.o
CC aarch64-softmmu/hw/core/generic-loader.o
CC aarch64-softmmu/hw/core/null-machine.o
CC x86_64-softmmu/hw/vfio/display.o
CC aarch64-softmmu/hw/cpu/arm11mpcore.o
CC aarch64-softmmu/hw/cpu/realview_mpcore.o
CC aarch64-softmmu/hw/cpu/a9mpcore.o
CC x86_64-softmmu/hw/vfio/platform.o
CC aarch64-softmmu/hw/cpu/a15mpcore.o
CC x86_64-softmmu/hw/vfio/spapr.o
CC aarch64-softmmu/hw/display/omap_dss.o
CC aarch64-softmmu/hw/display/omap_lcdc.o
CC aarch64-softmmu/hw/display/pxa2xx_lcd.o
CC x86_64-softmmu/hw/virtio/virtio.o
CC x86_64-softmmu/hw/virtio/virtio-balloon.o
CC x86_64-softmmu/hw/virtio/virtio-crypto.o
CC aarch64-softmmu/hw/display/bcm2835_fb.o
CC x86_64-softmmu/hw/virtio/virtio-crypto-pci.o
CC aarch64-softmmu/hw/display/vga.o
CC aarch64-softmmu/hw/display/virtio-gpu.o
CC x86_64-softmmu/hw/virtio/vhost.o
CC x86_64-softmmu/hw/virtio/vhost-backend.o
CC x86_64-softmmu/hw/virtio/vhost-user.o
CC x86_64-softmmu/hw/virtio/vhost-vsock.o
CC x86_64-softmmu/hw/xen/xen-host-pci-device.o
CC aarch64-softmmu/hw/display/virtio-gpu-3d.o
CC x86_64-softmmu/hw/xen/xen_pt.o
CC aarch64-softmmu/hw/display/virtio-gpu-pci.o
CC aarch64-softmmu/hw/display/dpcd.o
CC aarch64-softmmu/hw/display/xlnx_dp.o
CC x86_64-softmmu/hw/xen/xen_pt_config_init.o
CC x86_64-softmmu/hw/xen/xen_pt_graphics.o
CC x86_64-softmmu/hw/xen/xen_pt_msi.o
CC x86_64-softmmu/hw/xen/xen_pt_load_rom.o
CC aarch64-softmmu/hw/dma/xlnx_dpdma.o
CC x86_64-softmmu/hw/i386/multiboot.o
CC x86_64-softmmu/hw/i386/pc.o
CC aarch64-softmmu/hw/dma/omap_dma.o
CC aarch64-softmmu/hw/dma/soc_dma.o
CC aarch64-softmmu/hw/dma/pxa2xx_dma.o
CC aarch64-softmmu/hw/dma/bcm2835_dma.o
CC x86_64-softmmu/hw/i386/pc_piix.o
CC x86_64-softmmu/hw/i386/pc_q35.o
CC x86_64-softmmu/hw/i386/pc_sysfw.o
CC x86_64-softmmu/hw/i386/x86-iommu.o
CC aarch64-softmmu/hw/gpio/omap_gpio.o
CC aarch64-softmmu/hw/gpio/imx_gpio.o
CC x86_64-softmmu/hw/i386/intel_iommu.o
CC x86_64-softmmu/hw/i386/amd_iommu.o
CC x86_64-softmmu/hw/i386/vmport.o
CC x86_64-softmmu/hw/i386/vmmouse.o
CC x86_64-softmmu/hw/i386/kvmvapic.o
CC x86_64-softmmu/hw/i386/acpi-build.o
CC aarch64-softmmu/hw/gpio/bcm2835_gpio.o
CC x86_64-softmmu/hw/i386/../xenpv/xen_machine_pv.o
CC x86_64-softmmu/hw/i386/kvm/clock.o
CC x86_64-softmmu/hw/i386/kvm/apic.o
CC x86_64-softmmu/hw/i386/kvm/i8259.o
CC aarch64-softmmu/hw/i2c/omap_i2c.o
CC aarch64-softmmu/hw/input/pxa2xx_keypad.o
CC x86_64-softmmu/hw/i386/kvm/ioapic.o
CC aarch64-softmmu/hw/input/tsc210x.o
CC aarch64-softmmu/hw/intc/armv7m_nvic.o
CC x86_64-softmmu/hw/i386/kvm/i8254.o
CC aarch64-softmmu/hw/intc/exynos4210_gic.o
CC aarch64-softmmu/hw/intc/exynos4210_combiner.o
CC x86_64-softmmu/hw/i386/xen/xen_platform.o
CC aarch64-softmmu/hw/intc/omap_intc.o
CC x86_64-softmmu/hw/i386/xen/xen_apic.o
CC x86_64-softmmu/hw/i386/xen/xen_pvdevice.o
CC aarch64-softmmu/hw/intc/bcm2835_ic.o
CC x86_64-softmmu/hw/i386/xen/xen-hvm.o
CC aarch64-softmmu/hw/intc/bcm2836_control.o
CC x86_64-softmmu/hw/i386/xen/xen-mapcache.o
CC x86_64-softmmu/target/i386/helper.o
CC x86_64-softmmu/target/i386/cpu.o
CC aarch64-softmmu/hw/intc/allwinner-a10-pic.o
CC aarch64-softmmu/hw/intc/aspeed_vic.o
CC aarch64-softmmu/hw/intc/arm_gicv3_cpuif.o
CC aarch64-softmmu/hw/misc/ivshmem.o
CC aarch64-softmmu/hw/misc/arm_sysctl.o
CC aarch64-softmmu/hw/misc/cbus.o
CC x86_64-softmmu/target/i386/gdbstub.o
CC x86_64-softmmu/target/i386/xsave_helper.o
CC x86_64-softmmu/target/i386/translate.o
CC aarch64-softmmu/hw/misc/exynos4210_pmu.o
CC aarch64-softmmu/hw/misc/exynos4210_clk.o
CC x86_64-softmmu/target/i386/bpt_helper.o
CC aarch64-softmmu/hw/misc/exynos4210_rng.o
CC x86_64-softmmu/target/i386/cc_helper.o
CC aarch64-softmmu/hw/misc/imx_ccm.o
CC aarch64-softmmu/hw/misc/imx31_ccm.o
CC aarch64-softmmu/hw/misc/imx25_ccm.o
CC x86_64-softmmu/target/i386/excp_helper.o
CC x86_64-softmmu/target/i386/fpu_helper.o
CC aarch64-softmmu/hw/misc/imx6_ccm.o
CC aarch64-softmmu/hw/misc/imx6_src.o
CC aarch64-softmmu/hw/misc/imx7_ccm.o
CC x86_64-softmmu/target/i386/int_helper.o
CC aarch64-softmmu/hw/misc/imx2_wdt.o
CC x86_64-softmmu/target/i386/mem_helper.o
CC aarch64-softmmu/hw/misc/imx7_snvs.o
CC aarch64-softmmu/hw/misc/imx7_gpr.o
CC aarch64-softmmu/hw/misc/mst_fpga.o
CC aarch64-softmmu/hw/misc/omap_clk.o
CC x86_64-softmmu/target/i386/misc_helper.o
CC aarch64-softmmu/hw/misc/omap_gpmc.o
CC aarch64-softmmu/hw/misc/omap_l4.o
CC x86_64-softmmu/target/i386/mpx_helper.o
CC x86_64-softmmu/target/i386/seg_helper.o
CC aarch64-softmmu/hw/misc/omap_sdrc.o
CC aarch64-softmmu/hw/misc/omap_tap.o
CC aarch64-softmmu/hw/misc/bcm2835_mbox.o
CC aarch64-softmmu/hw/misc/bcm2835_property.o
CC x86_64-softmmu/target/i386/smm_helper.o
CC aarch64-softmmu/hw/misc/bcm2835_rng.o
CC x86_64-softmmu/target/i386/svm_helper.o
CC aarch64-softmmu/hw/misc/zynq_slcr.o
CC aarch64-softmmu/hw/misc/zynq-xadc.o
CC aarch64-softmmu/hw/misc/stm32f2xx_syscfg.o
CC aarch64-softmmu/hw/misc/mps2-fpgaio.o
CC x86_64-softmmu/target/i386/machine.o
CC aarch64-softmmu/hw/misc/mps2-scc.o
CC aarch64-softmmu/hw/misc/tz-mpc.o
CC x86_64-softmmu/target/i386/arch_memory_mapping.o
CC aarch64-softmmu/hw/misc/tz-ppc.o
CC x86_64-softmmu/target/i386/arch_dump.o
CC aarch64-softmmu/hw/misc/iotkit-secctl.o
CC aarch64-softmmu/hw/misc/auxbus.o
CC x86_64-softmmu/target/i386/monitor.o
CC aarch64-softmmu/hw/misc/aspeed_scu.o
CC x86_64-softmmu/target/i386/hyperv.o
CC x86_64-softmmu/target/i386/kvm.o
CC aarch64-softmmu/hw/misc/aspeed_sdmc.o
CC aarch64-softmmu/hw/misc/mmio_interface.o
CC aarch64-softmmu/hw/misc/msf2-sysreg.o
CC aarch64-softmmu/hw/net/virtio-net.o
CC x86_64-softmmu/target/i386/sev.o
CC aarch64-softmmu/hw/net/vhost_net.o
CC aarch64-softmmu/hw/pcmcia/pxa2xx.o
GEN trace/generated-helpers.c
CC aarch64-softmmu/hw/rdma/rdma_utils.o
CC aarch64-softmmu/hw/rdma/rdma_backend.o
CC aarch64-softmmu/hw/rdma/rdma_rm.o
CC x86_64-softmmu/trace/control-target.o
CC aarch64-softmmu/hw/rdma/vmw/pvrdma_dev_ring.o
CC aarch64-softmmu/hw/rdma/vmw/pvrdma_cmd.o
CC x86_64-softmmu/gdbstub-xml.o
CC aarch64-softmmu/hw/rdma/vmw/pvrdma_qp_ops.o
CC aarch64-softmmu/hw/rdma/vmw/pvrdma_main.o
CC aarch64-softmmu/hw/scsi/virtio-scsi.o
CC x86_64-softmmu/trace/generated-helpers.o
CC aarch64-softmmu/hw/scsi/virtio-scsi-dataplane.o
CC aarch64-softmmu/hw/scsi/vhost-scsi-common.o
CC aarch64-softmmu/hw/scsi/vhost-scsi.o
CC aarch64-softmmu/hw/scsi/vhost-user-scsi.o
CC aarch64-softmmu/hw/sd/omap_mmc.o
CC aarch64-softmmu/hw/sd/pxa2xx_mmci.o
CC aarch64-softmmu/hw/sd/bcm2835_sdhost.o
CC aarch64-softmmu/hw/ssi/omap_spi.o
CC aarch64-softmmu/hw/ssi/imx_spi.o
CC aarch64-softmmu/hw/timer/exynos4210_mct.o
CC aarch64-softmmu/hw/timer/exynos4210_pwm.o
CC aarch64-softmmu/hw/timer/exynos4210_rtc.o
CC aarch64-softmmu/hw/timer/omap_gptimer.o
CC aarch64-softmmu/hw/timer/omap_synctimer.o
CC aarch64-softmmu/hw/timer/pxa2xx_timer.o
CC aarch64-softmmu/hw/timer/digic-timer.o
CC aarch64-softmmu/hw/timer/allwinner-a10-pit.o
CC aarch64-softmmu/hw/usb/tusb6010.o
CC aarch64-softmmu/hw/usb/chipidea.o
CC aarch64-softmmu/hw/vfio/common.o
CC aarch64-softmmu/hw/vfio/pci.o
CC aarch64-softmmu/hw/vfio/pci-quirks.o
CC aarch64-softmmu/hw/vfio/display.o
CC aarch64-softmmu/hw/vfio/platform.o
CC aarch64-softmmu/hw/vfio/calxeda-xgmac.o
CC aarch64-softmmu/hw/vfio/amd-xgbe.o
CC aarch64-softmmu/hw/vfio/spapr.o
CC aarch64-softmmu/hw/virtio/virtio.o
CC aarch64-softmmu/hw/virtio/virtio-balloon.o
CC aarch64-softmmu/hw/virtio/virtio-crypto.o
CC aarch64-softmmu/hw/virtio/virtio-crypto-pci.o
CC aarch64-softmmu/hw/virtio/vhost.o
CC aarch64-softmmu/hw/virtio/vhost-backend.o
CC aarch64-softmmu/hw/virtio/vhost-user.o
LINK x86_64-softmmu/qemu-system-x86_64
CC aarch64-softmmu/hw/virtio/vhost-vsock.o
CC aarch64-softmmu/hw/arm/boot.o
CC aarch64-softmmu/hw/arm/virt.o
CC aarch64-softmmu/hw/arm/sysbus-fdt.o
CC aarch64-softmmu/hw/arm/virt-acpi-build.o
CC aarch64-softmmu/hw/arm/digic_boards.o
CC aarch64-softmmu/hw/arm/exynos4_boards.o
CC aarch64-softmmu/hw/arm/highbank.o
CC aarch64-softmmu/hw/arm/integratorcp.o
CC aarch64-softmmu/hw/arm/mainstone.o
CC aarch64-softmmu/hw/arm/musicpal.o
CC aarch64-softmmu/hw/arm/netduino2.o
CC aarch64-softmmu/hw/arm/nseries.o
CC aarch64-softmmu/hw/arm/omap_sx1.o
CC aarch64-softmmu/hw/arm/palm.o
CC aarch64-softmmu/hw/arm/gumstix.o
CC aarch64-softmmu/hw/arm/spitz.o
CC aarch64-softmmu/hw/arm/tosa.o
CC aarch64-softmmu/hw/arm/z2.o
CC aarch64-softmmu/hw/arm/realview.o
CC aarch64-softmmu/hw/arm/stellaris.o
CC aarch64-softmmu/hw/arm/collie.o
CC aarch64-softmmu/hw/arm/vexpress.o
CC aarch64-softmmu/hw/arm/versatilepb.o
CC aarch64-softmmu/hw/arm/xilinx_zynq.o
CC aarch64-softmmu/hw/arm/armv7m.o
CC aarch64-softmmu/hw/arm/exynos4210.o
CC aarch64-softmmu/hw/arm/pxa2xx.o
CC aarch64-softmmu/hw/arm/pxa2xx_gpio.o
CC aarch64-softmmu/hw/arm/pxa2xx_pic.o
CC aarch64-softmmu/hw/arm/digic.o
CC aarch64-softmmu/hw/arm/omap1.o
CC aarch64-softmmu/hw/arm/omap2.o
CC aarch64-softmmu/hw/arm/strongarm.o
CC aarch64-softmmu/hw/arm/allwinner-a10.o
CC aarch64-softmmu/hw/arm/cubieboard.o
CC aarch64-softmmu/hw/arm/bcm2835_peripherals.o
CC aarch64-softmmu/hw/arm/bcm2836.o
CC aarch64-softmmu/hw/arm/raspi.o
CC aarch64-softmmu/hw/arm/stm32f205_soc.o
CC aarch64-softmmu/hw/arm/xlnx-zynqmp.o
CC aarch64-softmmu/hw/arm/xlnx-zcu102.o
CC aarch64-softmmu/hw/arm/fsl-imx25.o
CC aarch64-softmmu/hw/arm/imx25_pdk.o
CC aarch64-softmmu/hw/arm/fsl-imx31.o
CC aarch64-softmmu/hw/arm/kzm.o
CC aarch64-softmmu/hw/arm/fsl-imx6.o
CC aarch64-softmmu/hw/arm/sabrelite.o
CC aarch64-softmmu/hw/arm/aspeed_soc.o
CC aarch64-softmmu/hw/arm/aspeed.o
CC aarch64-softmmu/hw/arm/mps2.o
CC aarch64-softmmu/hw/arm/mps2-tz.o
CC aarch64-softmmu/hw/arm/msf2-soc.o
CC aarch64-softmmu/hw/arm/msf2-som.o
CC aarch64-softmmu/hw/arm/iotkit.o
CC aarch64-softmmu/hw/arm/fsl-imx7.o
CC aarch64-softmmu/hw/arm/mcimx7d-sabre.o
CC aarch64-softmmu/hw/arm/smmu-common.o
CC aarch64-softmmu/hw/arm/smmuv3.o
CC aarch64-softmmu/target/arm/arm-semi.o
CC aarch64-softmmu/target/arm/machine.o
CC aarch64-softmmu/target/arm/psci.o
CC aarch64-softmmu/target/arm/arch_dump.o
CC aarch64-softmmu/target/arm/monitor.o
CC aarch64-softmmu/target/arm/kvm-stub.o
CC aarch64-softmmu/target/arm/translate.o
CC aarch64-softmmu/target/arm/op_helper.o
CC aarch64-softmmu/target/arm/helper.o
CC aarch64-softmmu/target/arm/cpu.o
CC aarch64-softmmu/target/arm/neon_helper.o
CC aarch64-softmmu/target/arm/iwmmxt_helper.o
CC aarch64-softmmu/target/arm/vec_helper.o
CC aarch64-softmmu/target/arm/gdbstub.o
CC aarch64-softmmu/target/arm/cpu64.o
CC aarch64-softmmu/target/arm/translate-a64.o
CC aarch64-softmmu/target/arm/helper-a64.o
CC aarch64-softmmu/target/arm/gdbstub64.o
CC aarch64-softmmu/target/arm/crypto_helper.o
CC aarch64-softmmu/target/arm/arm-powerctl.o
GEN aarch64-softmmu/target/arm/decode-sve.inc.c
CC aarch64-softmmu/target/arm/sve_helper.o
GEN trace/generated-helpers.c
CC aarch64-softmmu/trace/control-target.o
CC aarch64-softmmu/gdbstub-xml.o
CC aarch64-softmmu/target/arm/translate-sve.o
CC aarch64-softmmu/trace/generated-helpers.o
/tmp/qemu-test/src/target/arm/sve_helper.c: In function 'sve_ld1_r':
/tmp/qemu-test/src/target/arm/sve_helper.c:4326:5: error: 'for' loop initial declarations are only allowed in C99 mode
for (intptr_t reg_off = find_next_active(vg, 0, reg_max, esz);
^
/tmp/qemu-test/src/target/arm/sve_helper.c:4326:5: note: use option -std=c99 or -std=gnu99 to compile your code
make[1]: *** [target/arm/sve_helper.o] Error 1
make[1]: *** Waiting for unfinished jobs....
make: *** [subdir-aarch64-softmmu] Error 2
Traceback (most recent call last):
File "./tests/docker/docker.py", line 565, in <module>
sys.exit(main())
File "./tests/docker/docker.py", line 562, in main
return args.cmdobj.run(args, argv)
File "./tests/docker/docker.py", line 308, in run
return Docker().run(argv, args.keep, quiet=args.quiet)
File "./tests/docker/docker.py", line 276, in run
quiet=quiet)
File "./tests/docker/docker.py", line 183, in _do_check
return subprocess.check_call(self._command + cmd, **kwargs)
File "/usr/lib64/python2.7/subprocess.py", line 186, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', '--label', 'com.qemu.instance.uuid=31b00c3aa2cd11e8b3b752540069c830', '-u', '1000', '--security-opt', 'seccomp=unconfined', '--rm', '--net=none', '-e', 'TARGET_LIST=', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=8', '-e', 'DEBUG=', '-e', 'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', '/home/patchew/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', '/var/tmp/patchew-tester-tmp-6uc3dv26/src/docker-src.2018-08-18-05.57.51.13090:/var/tmp/qemu:z,ro', 'qemu:centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit status 2
make[1]: *** [tests/docker/Makefile.include:213: docker-run] Error 1
make[1]: Leaving directory '/var/tmp/patchew-tester-tmp-6uc3dv26/src'
make: *** [tests/docker/Makefile.include:247: docker-run-test-quick@centos7] Error 2
real 3m54.181s
user 0m5.184s
sys 0m3.798s
=== OUTPUT END ===
Test command exited with code: 2
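
The error above suggests the compiler in this image defaults to a pre-C99
dialect, so a declaration inside the 'for' initializer is rejected. A
minimal sketch of the usual workaround is to hoist the declaration to block
scope. The names next_active() and count_active() are hypothetical
stand-ins, not the functions from the patch:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for find_next_active(): return the index of the
 * first set bit in PG at or after OFF, or MAX if there is none.
 */
static intptr_t next_active(uint64_t pg, intptr_t off, intptr_t max)
{
    assert(off >= 0 && max <= 64);
    while (off < max && !((pg >> off) & 1)) {
        off++;
    }
    return off;
}

/* Count the active elements below MAX. */
static int count_active(uint64_t pg, intptr_t max)
{
    intptr_t off;   /* declared at block scope, not in the 'for' */
    int n = 0;

    for (off = next_active(pg, 0, max);
         off < max;
         off = next_active(pg, off + 1, max)) {
        n++;
    }
    return n;
}
```

The loop keeps the same shape as the one the compiler flagged; only the
declaration moves.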
---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-devel@redhat.com
* Re: [Qemu-devel] [PATCH 11/20] target/arm: Clear unused predicate bits for LD1RQ
2018-08-09 4:21 ` [Qemu-devel] [PATCH 11/20] target/arm: Clear unused predicate bits for LD1RQ Richard Henderson
@ 2018-08-23 15:21 ` Peter Maydell
2018-08-23 15:37 ` Richard Henderson
0 siblings, 1 reply; 51+ messages in thread
From: Peter Maydell @ 2018-08-23 15:21 UTC (permalink / raw)
To: Richard Henderson; +Cc: QEMU Developers, Laurent Desnogues, Alex Bennée
On 9 August 2018 at 05:21, Richard Henderson
<richard.henderson@linaro.org> wrote:
> The 16-byte load only uses 16 predicate bits. But while
> reusing the other load infrastructure, we find other bits
> that are set and trigger an assert. To avoid this and
> retain the assert, zero-extend the predicate that we pass
> to the LD1 helper.
>
> Reported-by: Laurent Desnogues <laurent.desnogues@gmail.com>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> target/arm/translate-sve.c | 25 +++++++++++++++++++++++--
> 1 file changed, 23 insertions(+), 2 deletions(-)
>
> diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
> index d27bc8c946..bef6b8242d 100644
> --- a/target/arm/translate-sve.c
> +++ b/target/arm/translate-sve.c
> @@ -4765,12 +4765,33 @@ static void do_ldrq(DisasContext *s, int zt, int pg, TCGv_i64 addr, int msz)
> unsigned vsz = vec_full_reg_size(s);
> TCGv_ptr t_pg;
> TCGv_i32 desc;
> + int poff;
>
> /* Load the first quadword using the normal predicated load helpers. */
> desc = tcg_const_i32(simd_desc(16, 16, zt));
> - t_pg = tcg_temp_new_ptr();
>
> - tcg_gen_addi_ptr(t_pg, cpu_env, pred_full_reg_offset(s, pg));
> + poff = pred_full_reg_offset(s, pg);
> + if (vsz > 16) {
> + /*
> + * Zero-extend the first 16 bits of the predicate into a temporary.
> + * This avoids triggering an assert making sure we don't have bits
> + * set within a predicate beyond VQ, but we have lowered VQ to 1
> + * for this load operation.
> + */
> + TCGv_i64 tmp = tcg_temp_new_i64();
> +#ifdef HOST_WORDS_BIGENDIAN
> + poff += 6;
> +#endif
> + tcg_gen_ld16u_i64(tmp, cpu_env, poff);
> +
> + poff = offsetof(CPUARMState, vfp.preg_tmp);
> + tcg_gen_st_i64(tmp, cpu_env, poff);
> + tcg_temp_free_i64(tmp);
> + }
> +
> + t_pg = tcg_temp_new_ptr();
> + tcg_gen_addi_ptr(t_pg, cpu_env, poff);
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
The bigendian #ifdef in the middle of the code is a little
ugly, though -- I don't suppose it's possible to avoid it
(or abstract it away) somehow?
thanks
-- PMM
* Re: [Qemu-devel] [PATCH 11/20] target/arm: Clear unused predicate bits for LD1RQ
2018-08-23 15:21 ` Peter Maydell
@ 2018-08-23 15:37 ` Richard Henderson
0 siblings, 0 replies; 51+ messages in thread
From: Richard Henderson @ 2018-08-23 15:37 UTC (permalink / raw)
To: Peter Maydell; +Cc: QEMU Developers, Laurent Desnogues, Alex Bennée
On 08/23/2018 08:21 AM, Peter Maydell wrote:
> On 9 August 2018 at 05:21, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>> The 16-byte load only uses 16 predicate bits. But while
>> reusing the other load infrastructure, we find other bits
>> that are set and trigger an assert. To avoid this and
>> retain the assert, zero-extend the predicate that we pass
>> to the LD1 helper.
>>
>> Reported-by: Laurent Desnogues <laurent.desnogues@gmail.com>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>> target/arm/translate-sve.c | 25 +++++++++++++++++++++++--
>> 1 file changed, 23 insertions(+), 2 deletions(-)
>>
>> diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
>> index d27bc8c946..bef6b8242d 100644
>> --- a/target/arm/translate-sve.c
>> +++ b/target/arm/translate-sve.c
>> @@ -4765,12 +4765,33 @@ static void do_ldrq(DisasContext *s, int zt, int pg, TCGv_i64 addr, int msz)
>> unsigned vsz = vec_full_reg_size(s);
>> TCGv_ptr t_pg;
>> TCGv_i32 desc;
>> + int poff;
>>
>> /* Load the first quadword using the normal predicated load helpers. */
>> desc = tcg_const_i32(simd_desc(16, 16, zt));
>> - t_pg = tcg_temp_new_ptr();
>>
>> - tcg_gen_addi_ptr(t_pg, cpu_env, pred_full_reg_offset(s, pg));
>> + poff = pred_full_reg_offset(s, pg);
>> + if (vsz > 16) {
>> + /*
>> + * Zero-extend the first 16 bits of the predicate into a temporary.
>> + * This avoids triggering an assert making sure we don't have bits
>> + * set within a predicate beyond VQ, but we have lowered VQ to 1
>> + * for this load operation.
>> + */
>> + TCGv_i64 tmp = tcg_temp_new_i64();
>> +#ifdef HOST_WORDS_BIGENDIAN
>> + poff += 6;
>> +#endif
>> + tcg_gen_ld16u_i64(tmp, cpu_env, poff);
>> +
>> + poff = offsetof(CPUARMState, vfp.preg_tmp);
>> + tcg_gen_st_i64(tmp, cpu_env, poff);
>> + tcg_temp_free_i64(tmp);
>> + }
>> +
>> + t_pg = tcg_temp_new_ptr();
>> + tcg_gen_addi_ptr(t_pg, cpu_env, poff);
>
> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
>
> The bigendian #ifdef in the middle of the code is a little
> ugly, though -- I don't suppose it's possible to avoid it
> (or abstract it away) somehow?
Adding a helper function for this one use didn't seem worthwhile. But I
certainly can duplicate the form of vec_reg_offset if you prefer.
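
For illustration, such a helper might look roughly like the sketch below.
It wraps the same "+6 on a big-endian host" adjustment as the quoted hunk.
The names low16_offset() and load_low16() are hypothetical, and the runtime
endianness probe is only there so the sketch compiles outside the tree; in
QEMU the choice would presumably stay a compile-time
#ifdef HOST_WORDS_BIGENDIAN inside the helper.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Runtime probe; an assumption made so the sketch is self-contained. */
static int host_is_big_endian(void)
{
    const uint16_t probe = 1;
    uint8_t first;

    memcpy(&first, &probe, 1);
    return first == 0;
}

/*
 * Hypothetical helper: byte offset of the low 16 bits of the
 * host-endian 64-bit word that starts at byte offset BASE.
 */
static int low16_offset(int base)
{
    assert(base >= 0);
    return base + (host_is_big_endian() ? 6 : 0);
}

/* Load those 16 bits from a block of host memory. */
static uint16_t load_low16(const void *env, int base)
{
    uint16_t v;

    memcpy(&v, (const char *)env + low16_offset(base), sizeof(v));
    return v;
}
```

With something like this, the call site would compute the offset once and
carry no #ifdef of its own.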
r~
* Re: [Qemu-devel] [PATCH 12/20] target/arm: Rewrite helper_sve_ld1*_r using pages
2018-08-09 4:21 ` [Qemu-devel] [PATCH 12/20] target/arm: Rewrite helper_sve_ld1*_r using pages Richard Henderson
2018-08-10 9:13 ` Alex Bennée
@ 2018-08-23 16:01 ` Peter Maydell
1 sibling, 0 replies; 51+ messages in thread
From: Peter Maydell @ 2018-08-23 16:01 UTC (permalink / raw)
To: Richard Henderson; +Cc: QEMU Developers, Laurent Desnogues, Alex Bennée
On 9 August 2018 at 05:21, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Uses tlb_vaddr_to_host for correct operation with softmmu.
> Optimize for accesses within a single page or pair of pages.
>
> Perf report comparison for cortex-strings test-strlen
> with aarch64-linux-user:
>
> before:
> 1.59% qemu-aarch64 qemu-aarch64 [.] do_sve_ld1bb_r
> 0.86% qemu-aarch64 qemu-aarch64 [.] do_sve_ldff1bb_r
> after:
> 0.09% qemu-aarch64 qemu-aarch64 [.] helper_sve_ldff1bb_r
> 0.01% qemu-aarch64 qemu-aarch64 [.] sve_ld1bb_host
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> target/arm/sve_helper.c | 839 ++++++++++++++++++++++++++++++++--------
> 1 file changed, 675 insertions(+), 164 deletions(-)
Oof, this is a large patch...
> diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
> index e03f954a26..4ca9412e20 100644
> --- a/target/arm/sve_helper.c
> +++ b/target/arm/sve_helper.c
> @@ -1688,6 +1688,45 @@ static void swap_memmove(void *vd, void *vs, size_t n)
> }
> }
>
> +/* Similarly for memset of 0. */
> +static void swap_memzero(void *vd, size_t n)
> +{
> + uintptr_t d = (uintptr_t)vd;
> + uintptr_t o = (d | n) & 7;
> + size_t i;
> +
> + if (likely(n == 0)) {
Why is "caller asked us to do nothing" the likely case?
> + return;
> + }
> +#ifndef HOST_WORDS_BIGENDIAN
> + o = 0;
> +#endif
> + switch (o) {
> + case 0:
> + memset(vd, 0, n);
> + break;
> +
> + case 4:
> + for (i = 0; i < n; i += 4) {
> + *(uint32_t *)H1_4(d + i) = 0;
> + }
> + break;
> +
> + case 2:
> + case 6:
> + for (i = 0; i < n; i += 2) {
> + *(uint16_t *)H1_2(d + i) = 0;
> + }
> + break;
> +
> + default:
> + for (i = 0; i < n; i++) {
> + *(uint8_t *)H1(d + i) = 0;
> + }
> + break;
> + }
> +}
> +
> void HELPER(sve_ext)(void *vd, void *vn, void *vm, uint32_t desc)
> {
> intptr_t opr_sz = simd_oprsz(desc);
> @@ -3927,32 +3966,438 @@ void HELPER(sve_fcmla_zpzzz_d)(CPUARMState *env, void *vg, uint32_t desc)
> /*
> * Load contiguous data, protected by a governing predicate.
> */
> -#define DO_LD1(NAME, FN, TYPEE, TYPEM, H) \
> -static void do_##NAME(CPUARMState *env, void *vd, void *vg, \
> - target_ulong addr, intptr_t oprsz, \
> - uintptr_t ra) \
> -{ \
> - intptr_t i = 0; \
> - do { \
> - uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3)); \
> - do { \
> - TYPEM m = 0; \
> - if (pg & 1) { \
> - m = FN(env, addr, ra); \
> - } \
> - *(TYPEE *)(vd + H(i)) = m; \
> - i += sizeof(TYPEE), pg >>= sizeof(TYPEE); \
> - addr += sizeof(TYPEM); \
> - } while (i & 15); \
> - } while (i < oprsz); \
> -} \
> -void HELPER(NAME)(CPUARMState *env, void *vg, \
> - target_ulong addr, uint32_t desc) \
> -{ \
> - do_##NAME(env, &env->vfp.zregs[simd_data(desc)], vg, \
> - addr, simd_oprsz(desc), GETPC()); \
> +
> +/* Load elements into VD, controlled by VG, from HOST+MEM_OFS.
> + * Memory is valid through MEM_MAX. The register element indicies
> + * are inferred from MEM_OFS, as modified by the types for which
> + * the helper is built. Return the MEM_OFS of the first element
> + * not loaded (which is MEM_MAX if they are all loaded).
"@vd", "@mem_ofs" is the usual style for naming parameters in comments.
> + *
> + * For softmmu, we have fully validated the guest page. For user-only,
> + * we cannot fully validate without taking the mmap lock, but since we
> + * know the access is within one host page, if any access is valid they
> + * all must be valid. However, it may be that no access is valid and
> + * they have all been predicated false.
> + */
> +typedef intptr_t sve_ld1_host_fn(void *vd, void *vg, void *host,
> + intptr_t mem_ofs, intptr_t mem_max);
> +
> +/* Load one element into VD+REG_OFF from (ENV,VADDR,RA).
> + * The controlling predicate is known to be true.
> + */
> +typedef void sve_ld1_tlb_fn(CPUARMState *env, void *vd, intptr_t reg_off,
> + target_ulong vaddr, int mmu_idx, uintptr_t ra);
> +
> +/*
> + * Generate the above primitives.
> + */
> +
> +#define DO_LD_HOST(NAME, H, TYPEE, TYPEM, HOST) \
> +static intptr_t sve_##NAME##_host(void *vd, void *vg, void *host, \
> + intptr_t mem_off, const intptr_t mem_max) \
> +{ \
> + intptr_t reg_off = mem_off * (sizeof(TYPEE) / sizeof(TYPEM)); \
> + uint64_t *pg = vg; \
> + while (mem_off + sizeof(TYPEM) <= mem_max) { \
> + TYPEM val = 0; \
> + if (likely((pg[reg_off >> 6] >> (reg_off & 63)) & 1)) { \
> + val = HOST(host + mem_off); \
> + } \
> + *(TYPEE *)(vd + H(reg_off)) = val; \
> + mem_off += sizeof(TYPEM), reg_off += sizeof(TYPEE); \
> + } \
> + return mem_off; \
> }
>
> +#ifdef CONFIG_SOFTMMU
> +#define DO_LD_TLB(NAME, H, TYPEE, TYPEM, HOST, MOEND, TLB) \
> +static void sve_##NAME##_tlb(CPUARMState *env, void *vd, intptr_t reg_off, \
> + target_ulong addr, int mmu_idx, uintptr_t ra) \
> +{ \
> + TCGMemOpIdx oi = make_memop_idx(ctz32(sizeof(TYPEM)) | MOEND, mmu_idx); \
> + TYPEM val = TLB(env, addr, oi, ra); \
> + *(TYPEE *)(vd + H(reg_off)) = val; \
> +}
> +#else
> +#define DO_LD_TLB(NAME, H, TYPEE, TYPEM, HOST, MOEND, TLB) \
> +static void sve_##NAME##_tlb(CPUARMState *env, void *vd, intptr_t reg_off, \
> + target_ulong addr, int mmu_idx, uintptr_t ra) \
> +{ \
> + TYPEM val = HOST(g2h(addr)); \
> + *(TYPEE *)(vd + H(reg_off)) = val; \
> +}
> +#endif
> +
> +DO_LD_TLB(ld1bb, H1, uint8_t, uint8_t, ldub_p, 0, helper_ret_ldub_mmu)
> +
> +#define DO_LD_PRIM_1(NAME, H, TE, TM) \
> + DO_LD_HOST(NAME, H, TE, TM, ldub_p) \
> + DO_LD_TLB(NAME, H, TE, TM, ldub_p, 0, helper_ret_ldub_mmu)
> +
> +DO_LD_PRIM_1(ld1bhu, H1_2, uint16_t, uint8_t)
> +DO_LD_PRIM_1(ld1bhs, H1_2, uint16_t, int8_t)
> +DO_LD_PRIM_1(ld1bsu, H1_4, uint32_t, uint8_t)
> +DO_LD_PRIM_1(ld1bss, H1_4, uint32_t, int8_t)
> +DO_LD_PRIM_1(ld1bdu, , uint64_t, uint8_t)
> +DO_LD_PRIM_1(ld1bds, , uint64_t, int8_t)
> +
> +#define DO_LD_PRIM_2(NAME, end, MOEND, H, TE, TM, PH, PT) \
> + DO_LD_HOST(NAME##_##end, H, TE, TM, PH##_##end##_p) \
> + DO_LD_TLB(NAME##_##end, H, TE, TM, PH##_##end##_p, \
> + MOEND, helper_##end##_##PT##_mmu)
> +
> +DO_LD_PRIM_2(ld1hh, le, MO_LE, H1_2, uint16_t, uint16_t, lduw, lduw)
> +DO_LD_PRIM_2(ld1hsu, le, MO_LE, H1_4, uint32_t, uint16_t, lduw, lduw)
> +DO_LD_PRIM_2(ld1hss, le, MO_LE, H1_4, uint32_t, int16_t, lduw, lduw)
> +DO_LD_PRIM_2(ld1hdu, le, MO_LE, , uint64_t, uint16_t, lduw, lduw)
> +DO_LD_PRIM_2(ld1hds, le, MO_LE, , uint64_t, int16_t, lduw, lduw)
> +
> +DO_LD_PRIM_2(ld1ss, le, MO_LE, H1_4, uint32_t, uint32_t, ldl, ldul)
> +DO_LD_PRIM_2(ld1sdu, le, MO_LE, , uint64_t, uint32_t, ldl, ldul)
> +DO_LD_PRIM_2(ld1sds, le, MO_LE, , uint64_t, int32_t, ldl, ldul)
It's a shame that we have two different conventions here
that mean one part of this is using 'ldl' and the other 'ldul'...
> +
> +DO_LD_PRIM_2(ld1dd, le, MO_LE, , uint64_t, uint64_t, ldq, ldq)
> +
> +DO_LD_PRIM_2(ld1hh, be, MO_BE, H1_2, uint16_t, uint16_t, lduw, lduw)
> +DO_LD_PRIM_2(ld1hsu, be, MO_BE, H1_4, uint32_t, uint16_t, lduw, lduw)
> +DO_LD_PRIM_2(ld1hss, be, MO_BE, H1_4, uint32_t, int16_t, lduw, lduw)
> +DO_LD_PRIM_2(ld1hdu, be, MO_BE, , uint64_t, uint16_t, lduw, lduw)
> +DO_LD_PRIM_2(ld1hds, be, MO_BE, , uint64_t, int16_t, lduw, lduw)
> +
> +DO_LD_PRIM_2(ld1ss, be, MO_BE, H1_4, uint32_t, uint32_t, ldl, ldul)
> +DO_LD_PRIM_2(ld1sdu, be, MO_BE, , uint64_t, uint32_t, ldl, ldul)
> +DO_LD_PRIM_2(ld1sds, be, MO_BE, , uint64_t, int32_t, ldl, ldul)
> +
> +DO_LD_PRIM_2(ld1dd, be, MO_BE, , uint64_t, uint64_t, ldq, ldq)
> +
> +#undef DO_LD_TLB
> +#undef DO_LD_HOST
> +#undef DO_LD_PRIM_1
> +#undef DO_LD_PRIM_2
> +
> +/*
> + * Special case contiguous loads of bytes to accellerate strings.
"accelerate"
> + *
> + * The assumption is that the governing predicate will be mostly true.
> + * When it is not all true, it has been set by whilelo and so has a
> + * block of true elements followed by a block of false elements.
> + * Thus anything we can do to handle as many bytes as possible in one
> + * step will pay dividends.
> + *
> + * Because of how vector registers are represented in CPUARMState,
> + * each block of 8 can be read with a little-endian load to be stored
> + * into the vector register in host-endian order.
> + *
> + * TODO: For LE host and LE guest (by far the most common combination),
> + * the only difference for other non-extending loads is the controlling
> + * predicate. Even for other combinations, it might be fastest to use
> + * this primitive to block load all of the data and then reorder the
> + * bytes afterward.
> + */
> +
> +/* For user-only, conditionally load and mask from HOST, returning 0
> + * if the predicate is false. This is required because, as described
> + * above, we have not fully validated the page, and faults are not
> + * permitted when the predicate is false.
> + * For softmmu, we never arrive here with invalid host memory; just mask.
> + */
> +static inline uint64_t ldq_le_pred_b(uint8_t pg, void *host)
> +{
> +#ifdef CONFIG_USER_ONLY
> + if (pg == 0) {
> + return 0;
> + }
> +#endif
> + return ldq_le_p(host) & expand_pred_b(pg);
> +}
> +
> +static inline uint8_t ldub_pred(uint8_t pg, void *host)
> +{
> +#ifdef CONFIG_USER_ONLY
> + return pg & 1 ? ldub_p(host) : 0;
> +#else
> + return ldub_p(host) & -(pg & 1);
> +#endif
> +}
> +
> +static intptr_t sve_ld1bb_host(void *vd, void *vg, void *host,
> + intptr_t off, const intptr_t max)
> +{
> + uint64_t *d = vd;
> + uint8_t *g = vg;
> +
> + /* Assuming OFF and MAX may be misaligned, but also the most common
> + * case is an entire vector register: OFF == 0, MAX % 16 == 0.
> + */
> + if (likely(off + 8 <= max)) {
> + const intptr_t max_div_8 = max >> 3;
> + intptr_t off_div_8 = off >> 3;
> + uint64_t data;
> +
> + if (unlikely(off & 63)) {
> + /* Align for a loop-of-8. We know from the range check
> + * above that we have enough remaining to load 8 bytes.
> + */
> + if (unlikely(off & 7)) {
> + int off_7 = off & 7;
> + uint8_t pg = g[H1(off_div_8)] >> off_7;
> +
> + off_7 *= 8;
> + data = ldq_le_pred_b(pg, host + off);
> + data = deposit64(d[off_div_8], off_7, 64 - off_7, data);
> + d[off_div_8] = data;
> +
> + off_div_8 += 1;
> + }
> +
> + /* If there are not sufficient bytes to align for 64
> + * and also execute that loop at least once, skip to tail.
> + */
> + if (ROUND_UP(off_div_8, 8) + 8 > max_div_8) {
> + goto skip_64;
> + }
> +
> + /* Align for the loop-of-64. */
> + if (unlikely(off_div_8 & 7)) {
> + do {
> + uint8_t pg = g[off_div_8];
> + data = ldq_le_pred_b(pg, host + off_div_8 * 8);
> + d[off_div_8] = data;
> + } while (++off_div_8 & 7);
> + }
> + }
> +
> + /* While we have blocks of 64 remaining, we can perform tests
> + * against large blocks of predicates at once.
> + */
> + for (; off_div_8 + 8 <= max_div_8; off_div_8 += 8) {
> + uint64_t pg = *(uint64_t *)(g + off_div_8);
> + if (likely(pg == -1ULL)) {
> +#ifndef HOST_WORDS_BIGENDIAN
> + memcpy(d + off_div_8, host + off_div_8 * 8, 64);
> +#else
> + intptr_t j;
> + for (j = 0; j < 8; j++) {
> + data = ldq_le_p(host + (off_div_8 + j) * 8);
> + d[off_div_8 + j] = data;
> + }
> +#endif
> + } else if (pg == 0) {
> + memset(d + off_div_8, 0, 64);
> + } else {
> + intptr_t j;
> + for (j = 0; j < 8; j++) {
> + data = ldq_le_pred_b(pg >> (j * 8),
> + host + (off_div_8 + j) * 8);
> + d[off_div_8 + j] = data;
> + }
> + }
> + }
> +
> + skip_64:
> + /* Final tail or a copy smaller than 64 bytes. */
> + for (; off_div_8 < max_div_8; off_div_8++) {
> + uint8_t pg = g[H1(off_div_8)];
> + data = ldq_le_pred_b(pg, host + off_div_8 * 8);
> + d[off_div_8] = data;
> + }
> +
> + /* Restore using OFF. */
> + off = off_div_8 * 8;
> + }
> +
> + /* Final tail or a really small copy. */
> + if (unlikely(off < max)) {
> + do {
> + uint8_t pg = g[H1(off >> 3)] >> (off & 7);
> + ((uint8_t *)vd)[H1(off)] = ldub_pred(pg, host + off);
> + } while (++off < max);
> + }
> +
> + return max;
> +}
This is an awful lot of cleverness for optimisation purposes.
Can't we start with "simple but works" and add the optimisation
later?
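
As a rough idea of what "simple but works" could mean here, the sketch
below loads one byte per predicate bit and zeroes inactive bytes. It is not
taken from the patch: simple_ld1bb_host() is a hypothetical name, and it
assumes a little-endian host so that the H1() index adjustment can be left
out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical, unoptimised stand-in for sve_ld1bb_host().
 * One predicate bit governs each byte; inactive bytes are written as 0.
 * Host memory is only read when the governing bit is set, so a false
 * predicate never touches the page.
 * Assumes a little-endian host (no H1() adjustment).
 */
static intptr_t simple_ld1bb_host(void *vd, const void *vg, const void *host,
                                  intptr_t off, intptr_t max)
{
    uint8_t *d = vd;
    const uint8_t *g = vg;
    const uint8_t *h = host;

    assert(off >= 0 && off <= max);
    for (; off < max; off++) {
        int active = (g[off >> 3] >> (off & 7)) & 1;

        d[off] = active ? h[off] : 0;
    }
    return max;
}
```

The 8-byte and 64-byte fast paths in the quoted function could then be
layered on top of a loop like this in a follow-up patch.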
> +
> +/* Skip through a sequence of inactive elements in the guarding predicate VG,
> + * beginning at REG_OFF bounded by REG_MAX. Return the offset of the active
> + * element >= REG_OFF, or REG_MAX if there were no active elements at all.
> + */
> +static intptr_t find_next_active(uint64_t *vg, intptr_t reg_off,
> + intptr_t reg_max, int esz)
> +{
> + uint64_t pg_mask = pred_esz_masks[esz];
> + uint64_t pg = (vg[reg_off >> 6] & pg_mask) >> (reg_off & 63);
> +
> + /* In normal usage, the first element is active. */
> + if (likely(pg & 1)) {
> + return reg_off;
> + }
> +
> + if (pg == 0) {
> + reg_off &= -64;
> + do {
> + reg_off += 64;
> + if (unlikely(reg_off >= reg_max)) {
> + /* The entire predicate was false. */
> + return reg_max;
> + }
> + pg = vg[reg_off >> 6] & pg_mask;
> + } while (pg == 0);
> + }
> + reg_off += ctz64(pg);
> +
> + /* We should never see an out of range predicate bit set. */
> + tcg_debug_assert(reg_off < reg_max);
> + return reg_off;
> +}
> +
> +/* Return the maximum offset <= MEM_MAX which is still within the page
> + * referenced by BASE+MEM_OFF.
> + */
> +static intptr_t max_for_page(target_ulong base, intptr_t mem_off,
> + intptr_t mem_max)
> +{
> + target_ulong addr = base + mem_off;
> + intptr_t split = -(intptr_t)(addr | TARGET_PAGE_MASK);
> + return MIN(split, mem_max - mem_off) + mem_off;
> +}
> +
> +static inline void set_helper_retaddr(uintptr_t ra)
> +{
> +#ifdef CONFIG_USER_ONLY
> + helper_retaddr = ra;
> +#endif
> +}
> +
> +static inline bool test_host_page(void *host)
> +{
> +#ifdef CONFIG_USER_ONLY
> + return true;
> +#else
> + return likely(host != NULL);
> +#endif
> +}
> +
> +/*
> + * Common helper for all contiguous one-register predicated loads.
> + */
> +static void sve_ld1_r(CPUARMState *env, void *vg, const target_ulong addr,
> + uint32_t desc, const uintptr_t retaddr,
> + const int esz, const int msz,
> + sve_ld1_host_fn *host_fn,
> + sve_ld1_tlb_fn *tlb_fn)
> +{
> + void *vd = &env->vfp.zregs[simd_data(desc)];
> + const int diffsz = esz - msz;
> + const intptr_t reg_max = simd_oprsz(desc);
> + const intptr_t mem_max = reg_max >> diffsz;
> + const int mmu_idx = cpu_mmu_index(env, false);
> + ARMVectorReg scratch;
> + void *host, *result;
> + intptr_t split;
> +
> + set_helper_retaddr(retaddr);
> +
> + host = tlb_vaddr_to_host(env, addr, MMU_DATA_LOAD, mmu_idx);
> + if (test_host_page(host)) {
> + split = max_for_page(addr, 0, mem_max);
> + if (likely(split == mem_max)) {
> + /* The load is entirely within a valid page. For softmmu,
> + * no faults. For user-only, if the first byte does not
> + * fault then none of them will fault, so Vd will never be
> + * partially modified.
> + */
> + host_fn(vd, vg, host, 0, mem_max);
> + set_helper_retaddr(0);
> + return;
> + }
> + }
> +
> + /* Perform the predicated read into a temporary, thus ensuring
> + * if the load of the last element faults, Vd is not modified.
> + */
> + result = &scratch;
> +#ifdef CONFIG_USER_ONLY
> + host_fn(vd, vg, host, 0, mem_max);
> +#else
> + memset(result, 0, reg_max);
> + for (intptr_t reg_off = find_next_active(vg, 0, reg_max, esz);
> + reg_off < reg_max;
> + reg_off = find_next_active(vg, reg_off, reg_max, esz)) {
> + intptr_t mem_off = reg_off >> diffsz;
> +
> + split = max_for_page(addr, mem_off, mem_max);
> + if (msz == 0 || split - mem_off >= (1 << msz)) {
> + /* At least one whole element on this page. */
> + host = tlb_vaddr_to_host(env, addr + mem_off,
> + MMU_DATA_LOAD, mmu_idx);
> + if (host) {
> + mem_off = host_fn(result, vg, host - mem_off, mem_off, split);
> + reg_off = mem_off << diffsz;
> + continue;
> + }
> + }
> +
> + /* Perform one normal read. This may fault, longjmping out to the
> + * main loop in order to raise an exception. It may succeed, and
> + * as a side-effect load the TLB entry for the next round. Finally,
> + * in the extremely unlikely case we're performing this operation
> + * on I/O memory, it may succeed but not bring in the TLB entry.
> + * But even then we have still made forward progress.
> + */
> + tlb_fn(env, result, reg_off, addr + mem_off, mmu_idx, retaddr);
> + reg_off += 1 << esz;
> + }
> +#endif
> +
> + set_helper_retaddr(0);
> + memcpy(vd, result, reg_max);
> +}
> +
> +#define DO_LD1_1(NAME, ESZ) \
> +void HELPER(sve_##NAME##_r)(CPUARMState *env, void *vg, \
> + target_ulong addr, uint32_t desc) \
> +{ \
> + sve_ld1_r(env, vg, addr, desc, GETPC(), ESZ, 0, \
> + sve_##NAME##_host, sve_##NAME##_tlb); \
> +}
> +
> +/* TODO: Propagate the endian check back to the translator. */
> +#define DO_LD1_2(NAME, ESZ, MSZ) \
> +void HELPER(sve_##NAME##_r)(CPUARMState *env, void *vg, \
> + target_ulong addr, uint32_t desc) \
> +{ \
> + if (arm_cpu_data_is_big_endian(env)) { \
> + sve_ld1_r(env, vg, addr, desc, GETPC(), ESZ, MSZ, \
> + sve_##NAME##_be_host, sve_##NAME##_be_tlb); \
> + } else { \
> + sve_ld1_r(env, vg, addr, desc, GETPC(), ESZ, MSZ, \
> + sve_##NAME##_le_host, sve_##NAME##_le_tlb); \
> + } \
> +}
> +
> +DO_LD1_1(ld1bb, 0)
> +DO_LD1_1(ld1bhu, 1)
> +DO_LD1_1(ld1bhs, 1)
> +DO_LD1_1(ld1bsu, 2)
> +DO_LD1_1(ld1bss, 2)
> +DO_LD1_1(ld1bdu, 3)
> +DO_LD1_1(ld1bds, 3)
> +
> +DO_LD1_2(ld1hh, 1, 1)
> +DO_LD1_2(ld1hsu, 2, 1)
> +DO_LD1_2(ld1hss, 2, 1)
> +DO_LD1_2(ld1hdu, 3, 1)
> +DO_LD1_2(ld1hds, 3, 1)
> +
> +DO_LD1_2(ld1ss, 2, 2)
> +DO_LD1_2(ld1sdu, 3, 2)
> +DO_LD1_2(ld1sds, 3, 2)
> +
> +DO_LD1_2(ld1dd, 3, 3)
> +
> +#undef DO_LD1_1
> +#undef DO_LD1_2
> +
> #define DO_LD2(NAME, FN, TYPEE, TYPEM, H) \
> void HELPER(NAME)(CPUARMState *env, void *vg, \
> target_ulong addr, uint32_t desc) \
> @@ -4037,52 +4482,40 @@ void HELPER(NAME)(CPUARMState *env, void *vg, \
> } \
> }
>
> -DO_LD1(sve_ld1bhu_r, cpu_ldub_data_ra, uint16_t, uint8_t, H1_2)
> -DO_LD1(sve_ld1bhs_r, cpu_ldsb_data_ra, uint16_t, int8_t, H1_2)
> -DO_LD1(sve_ld1bsu_r, cpu_ldub_data_ra, uint32_t, uint8_t, H1_4)
> -DO_LD1(sve_ld1bss_r, cpu_ldsb_data_ra, uint32_t, int8_t, H1_4)
> -DO_LD1(sve_ld1bdu_r, cpu_ldub_data_ra, uint64_t, uint8_t, )
> -DO_LD1(sve_ld1bds_r, cpu_ldsb_data_ra, uint64_t, int8_t, )
> -
> -DO_LD1(sve_ld1hsu_r, cpu_lduw_data_ra, uint32_t, uint16_t, H1_4)
> -DO_LD1(sve_ld1hss_r, cpu_ldsw_data_ra, uint32_t, int16_t, H1_4)
> -DO_LD1(sve_ld1hdu_r, cpu_lduw_data_ra, uint64_t, uint16_t, )
> -DO_LD1(sve_ld1hds_r, cpu_ldsw_data_ra, uint64_t, int16_t, )
> -
> -DO_LD1(sve_ld1sdu_r, cpu_ldl_data_ra, uint64_t, uint32_t, )
> -DO_LD1(sve_ld1sds_r, cpu_ldl_data_ra, uint64_t, int32_t, )
> -
> -DO_LD1(sve_ld1bb_r, cpu_ldub_data_ra, uint8_t, uint8_t, H1)
> DO_LD2(sve_ld2bb_r, cpu_ldub_data_ra, uint8_t, uint8_t, H1)
> DO_LD3(sve_ld3bb_r, cpu_ldub_data_ra, uint8_t, uint8_t, H1)
> DO_LD4(sve_ld4bb_r, cpu_ldub_data_ra, uint8_t, uint8_t, H1)
>
> -DO_LD1(sve_ld1hh_r, cpu_lduw_data_ra, uint16_t, uint16_t, H1_2)
> DO_LD2(sve_ld2hh_r, cpu_lduw_data_ra, uint16_t, uint16_t, H1_2)
> DO_LD3(sve_ld3hh_r, cpu_lduw_data_ra, uint16_t, uint16_t, H1_2)
> DO_LD4(sve_ld4hh_r, cpu_lduw_data_ra, uint16_t, uint16_t, H1_2)
>
> -DO_LD1(sve_ld1ss_r, cpu_ldl_data_ra, uint32_t, uint32_t, H1_4)
> DO_LD2(sve_ld2ss_r, cpu_ldl_data_ra, uint32_t, uint32_t, H1_4)
> DO_LD3(sve_ld3ss_r, cpu_ldl_data_ra, uint32_t, uint32_t, H1_4)
> DO_LD4(sve_ld4ss_r, cpu_ldl_data_ra, uint32_t, uint32_t, H1_4)
>
> -DO_LD1(sve_ld1dd_r, cpu_ldq_data_ra, uint64_t, uint64_t, )
> DO_LD2(sve_ld2dd_r, cpu_ldq_data_ra, uint64_t, uint64_t, )
> DO_LD3(sve_ld3dd_r, cpu_ldq_data_ra, uint64_t, uint64_t, )
> DO_LD4(sve_ld4dd_r, cpu_ldq_data_ra, uint64_t, uint64_t, )
>
> -#undef DO_LD1
> #undef DO_LD2
> #undef DO_LD3
> #undef DO_LD4
>
> /*
> * Load contiguous data, first-fault and no-fault.
> + *
> + * For user-only, one could argue that we should hold the mmap_lock during
> + * the operation so that there is no race between page_check_range and the
> + * load operation. However, unmapping pages out from under operating thread
Missing "an" before "operating thread"?
> + * is extrodinarily unlikely. This theoretical race condition also affects
"extraordinarily"
> + * linux-user/ in its get_user/put_user macros.
> + *
> + * TODO: Construct some helpers, written in assembly, that interact with
> + * handle_cpu_signal to produce memory ops which can properly report errors
> + * without racing.
> */
>
> -#ifdef CONFIG_USER_ONLY
> -
> /* Fault on byte I. All bits in FFR from I are cleared. The vector
> * result from I is CONSTRAINED UNPREDICTABLE; we choose the MERGE
> * option, which leaves subsequent data unchanged.
> @@ -4092,147 +4525,225 @@ static void record_fault(CPUARMState *env, uintptr_t i, uintptr_t oprsz)
> uint64_t *ffr = env->vfp.pregs[FFR_PRED_NUM].p;
>
> if (i & 63) {
> - ffr[i / 64] &= MAKE_64BIT_MASK(0, i & 63);
> + ffr[i >> 6] &= MAKE_64BIT_MASK(0, i & 63);
> i = ROUND_UP(i, 64);
> }
> for (; i < oprsz; i += 64) {
> - ffr[i / 64] = 0;
> + ffr[i >> 6] = 0;
> }
> }
Should this be in a different patch?
>
> -/* Hold the mmap lock during the operation so that there is no race
> - * between page_check_range and the load operation. We expect the
> - * usual case to have no faults at all, so we check the whole range
> - * first and if successful defer to the normal load operation.
> - *
> - * TODO: Change mmap_lock to a rwlock so that multiple readers
> - * can run simultaneously. This will probably help other uses
> - * within QEMU as well.
> +/*
> + * Common helper for all contiguous first-fault loads.
> */
> -#define DO_LDFF1(PART, FN, TYPEE, TYPEM, H) \
> -static void do_sve_ldff1##PART(CPUARMState *env, void *vd, void *vg, \
> - target_ulong addr, intptr_t oprsz, \
> - bool first, uintptr_t ra) \
> -{ \
> - intptr_t i = 0; \
> - do { \
> - uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3)); \
> - do { \
> - TYPEM m = 0; \
> - if (pg & 1) { \
> - if (!first && \
> - unlikely(page_check_range(addr, sizeof(TYPEM), \
> - PAGE_READ))) { \
> - record_fault(env, i, oprsz); \
> - return; \
> - } \
> - m = FN(env, addr, ra); \
> - first = false; \
> - } \
> - *(TYPEE *)(vd + H(i)) = m; \
> - i += sizeof(TYPEE), pg >>= sizeof(TYPEE); \
> - addr += sizeof(TYPEM); \
> - } while (i & 15); \
> - } while (i < oprsz); \
> -} \
> -void HELPER(sve_ldff1##PART)(CPUARMState *env, void *vg, \
> - target_ulong addr, uint32_t desc) \
> -{ \
> - intptr_t oprsz = simd_oprsz(desc); \
> - unsigned rd = simd_data(desc); \
> - void *vd = &env->vfp.zregs[rd]; \
> - mmap_lock(); \
> - if (likely(page_check_range(addr, oprsz, PAGE_READ) == 0)) { \
> - do_sve_ld1##PART(env, vd, vg, addr, oprsz, GETPC()); \
> - } else { \
> - do_sve_ldff1##PART(env, vd, vg, addr, oprsz, true, GETPC()); \
> - } \
> - mmap_unlock(); \
> -}
> +static void sve_ldff1_r(CPUARMState *env, void *vg, const target_ulong addr,
> + uint32_t desc, const uintptr_t retaddr,
> + const int esz, const int msz,
> + sve_ld1_host_fn *host_fn,
> + sve_ld1_tlb_fn *tlb_fn)
> +{
> + void *vd = &env->vfp.zregs[simd_data(desc)];
> + const int diffsz = esz - msz;
> + const intptr_t reg_max = simd_oprsz(desc);
> + const intptr_t mem_max = reg_max >> diffsz;
> + const int mmu_idx = cpu_mmu_index(env, false);
> + intptr_t split, reg_off, mem_off;
> + void *host;
>
> -/* No-fault loads are like first-fault loads without the
> - * first faulting special case.
> - */
> -#define DO_LDNF1(PART) \
> -void HELPER(sve_ldnf1##PART)(CPUARMState *env, void *vg, \
> - target_ulong addr, uint32_t desc) \
> -{ \
> - intptr_t oprsz = simd_oprsz(desc); \
> - unsigned rd = simd_data(desc); \
> - void *vd = &env->vfp.zregs[rd]; \
> - mmap_lock(); \
> - if (likely(page_check_range(addr, oprsz, PAGE_READ) == 0)) { \
> - do_sve_ld1##PART(env, vd, vg, addr, oprsz, GETPC()); \
> - } else { \
> - do_sve_ldff1##PART(env, vd, vg, addr, oprsz, false, GETPC()); \
> - } \
> - mmap_unlock(); \
> -}
> + set_helper_retaddr(retaddr);
>
> + split = max_for_page(addr, 0, mem_max);
> + if (likely(split == mem_max)) {
> + /* The entire operation is within one page. */
> + host = tlb_vaddr_to_host(env, addr, MMU_DATA_LOAD, mmu_idx);
> + if (test_host_page(host)) {
> + mem_off = host_fn(vd, vg, host, 0, mem_max);
> + tcg_debug_assert(mem_off == mem_max);
> + set_helper_retaddr(0);
> + return;
> + }
> + }
> +
> + /* Skip to the first true predicate. */
> + reg_off = find_next_active(vg, 0, reg_max, esz);
> + if (unlikely(reg_off == reg_max)) {
> + /* The entire predicate was false; no load occurs. */
> + set_helper_retaddr(0);
> + memset(vd, 0, reg_max);
> + return;
> + }
> + mem_off = reg_off >> diffsz;
> +
> +#ifdef CONFIG_USER_ONLY
> + /* The page(s) containing this first element at ADDR+MEM_OFF must
> + * be valid. Considering that this first element may be misaligned
> + * and cross a page boundary itself, take the rest of the page from
> + * the last byte of the element.
> + */
> + split = max_for_page(addr, mem_off + (1 << msz) - 1, mem_max);
> + mem_off = host_fn(vd, vg, g2h(addr), mem_off, split);
> +
> + /* After any fault, zero any leading predicated false elts. */
> + swap_memzero(vd, reg_off);
> + reg_off = mem_off << diffsz;
> #else
> + /* Perform one normal read, which will fault or not.
> + * But it is likely to bring the page into the tlb.
> + */
> + tlb_fn(env, vd, reg_off, addr + mem_off, mmu_idx, retaddr);
>
> -/* TODO: System mode is not yet supported.
> - * This would probably use tlb_vaddr_to_host.
> - */
> -#define DO_LDFF1(PART, FN, TYPEE, TYPEM, H) \
> -void HELPER(sve_ldff1##PART)(CPUARMState *env, void *vg, \
> - target_ulong addr, uint32_t desc) \
> -{ \
> - g_assert_not_reached(); \
> -}
> -
> -#define DO_LDNF1(PART) \
> -void HELPER(sve_ldnf1##PART)(CPUARMState *env, void *vg, \
> - target_ulong addr, uint32_t desc) \
> -{ \
> - g_assert_not_reached(); \
> -}
> + /* After any fault, zero any leading predicated false elts. */
> + swap_memzero(vd, reg_off);
> + mem_off += 1 << msz;
> + reg_off += 1 << esz;
>
> + /* Try again to read the balance of the page. */
> + split = max_for_page(addr, mem_off - 1, mem_max);
> + if (split >= (1 << msz)) {
> + host = tlb_vaddr_to_host(env, addr + mem_off, MMU_DATA_LOAD, mmu_idx);
> + if (host) {
> + mem_off = host_fn(vd, vg, host - mem_off, mem_off, split);
> + reg_off = mem_off << diffsz;
> + }
> + }
> #endif
>
> -DO_LDFF1(bb_r, cpu_ldub_data_ra, uint8_t, uint8_t, H1)
> -DO_LDFF1(bhu_r, cpu_ldub_data_ra, uint16_t, uint8_t, H1_2)
> -DO_LDFF1(bhs_r, cpu_ldsb_data_ra, uint16_t, int8_t, H1_2)
> -DO_LDFF1(bsu_r, cpu_ldub_data_ra, uint32_t, uint8_t, H1_4)
> -DO_LDFF1(bss_r, cpu_ldsb_data_ra, uint32_t, int8_t, H1_4)
> -DO_LDFF1(bdu_r, cpu_ldub_data_ra, uint64_t, uint8_t, )
> -DO_LDFF1(bds_r, cpu_ldsb_data_ra, uint64_t, int8_t, )
> + set_helper_retaddr(0);
> + record_fault(env, reg_off, reg_max);
> +}
>
> -DO_LDFF1(hh_r, cpu_lduw_data_ra, uint16_t, uint16_t, H1_2)
> -DO_LDFF1(hsu_r, cpu_lduw_data_ra, uint32_t, uint16_t, H1_4)
> -DO_LDFF1(hss_r, cpu_ldsw_data_ra, uint32_t, int8_t, H1_4)
> -DO_LDFF1(hdu_r, cpu_lduw_data_ra, uint64_t, uint16_t, )
> -DO_LDFF1(hds_r, cpu_ldsw_data_ra, uint64_t, int16_t, )
> +/*
> + * Common helper for all contiguous no-fault loads.
> + */
> +static void sve_ldnf1_r(CPUARMState *env, void *vg, const target_ulong addr,
> + uint32_t desc, const int esz, const int msz,
> + sve_ld1_host_fn *host_fn)
> +{
> + void *vd = &env->vfp.zregs[simd_data(desc)];
> + const int diffsz = esz - msz;
> + const intptr_t reg_max = simd_oprsz(desc);
> + const intptr_t mem_max = reg_max >> diffsz;
> + intptr_t split, reg_off, mem_off;
> + void *host;
>
> -DO_LDFF1(ss_r, cpu_ldl_data_ra, uint32_t, uint32_t, H1_4)
> -DO_LDFF1(sdu_r, cpu_ldl_data_ra, uint64_t, uint32_t, )
> -DO_LDFF1(sds_r, cpu_ldl_data_ra, uint64_t, int32_t, )
> +#ifdef CONFIG_USER_ONLY
> + /* Do not set helper_retaddr as there should be no fault. */
> + host = g2h(addr);
> + if (likely(page_check_range(addr, mem_max, PAGE_READ) == 0)) {
> + /* The entire operation is valid. */
> + host_fn(vd, vg, host, 0, mem_max);
> + return;
> + }
> +#else
> + const int mmu_idx = extract32(desc, SIMD_DATA_SHIFT, 4);
> + /* Unless we can load the entire vector from the same page,
> + * we need to search for the first active element.
> + */
> + split = max_for_page(addr, 0, mem_max);
> + if (likely(split == mem_max)) {
> + host = tlb_vaddr_to_host(env, addr, MMU_DATA_LOAD, mmu_idx);
> + if (host) {
> + host_fn(vd, vg, host, 0, mem_max);
> + return;
> + }
> + }
> +#endif
>
> -DO_LDFF1(dd_r, cpu_ldq_data_ra, uint64_t, uint64_t, )
> + /* There will be no fault, so we may modify in advance. */
> + memset(vd, 0, reg_max);
>
> -#undef DO_LDFF1
> + /* Skip to the first true predicate. */
> + reg_off = find_next_active(vg, 0, reg_max, esz);
> + if (unlikely(reg_off == reg_max)) {
> + /* The entire predicate was false; no load occurs. */
> + return;
> + }
> + mem_off = reg_off >> diffsz;
>
> -DO_LDNF1(bb_r)
> -DO_LDNF1(bhu_r)
> -DO_LDNF1(bhs_r)
> -DO_LDNF1(bsu_r)
> -DO_LDNF1(bss_r)
> -DO_LDNF1(bdu_r)
> -DO_LDNF1(bds_r)
> +#ifdef CONFIG_USER_ONLY
> + if (page_check_range(addr + mem_off, 1 << msz, PAGE_READ) == 0) {
> + /* At least one load is valid; take the rest of the page. */
> + split = max_for_page(addr, mem_off + (1 << msz) - 1, mem_max);
> + mem_off = host_fn(vd, vg, host, mem_off, split);
> + reg_off = mem_off << diffsz;
> + }
> +#else
> + /* If the address is not in the TLB, we have no way to bring the
> + * entry into the TLB without also risking a fault. Note that
> + * the corollary is that we never load from an address not in RAM.
> + * ??? This last may be out of spec.
Yes, it is out of spec, in a slightly weird corner case.
Per the MemNF/MemSingleNF pseudocode, an NF load from
Device memory mustn't actually go externally -- it returns
UNKNOWN data instead. But if you map non-RAM with Normal
memory attributes and do an NF load then it should really
access the bus. (Nobody's going to actually do this in the
real world, obviously.)
To do this fully correctly you would need to do a get_phys_addr()
then an address_space_ld*(), checking for either VA->PA failure or
getting back a non-MEMTX_OK result on the physical access.
You'd also need to make get_phys_addr() tell you the Arm memory
attributes so you could avoid doing the access on Device memory.
There are probably also annoying special cases with interactions
with watchpoints set up via the architectural debug.
We can probably ignore all this for now apart from making
a TODO comment...
> + */
> + host = tlb_vaddr_to_host(env, addr + mem_off, MMU_DATA_LOAD, mmu_idx);
> + split = max_for_page(addr, mem_off, mem_max);
> + if (host && split >= (1 << msz)) {
> + mem_off = host_fn(vd, vg, host - mem_off, mem_off, split);
> + reg_off = mem_off << diffsz;
> + }
> +#endif
>
> -DO_LDNF1(hh_r)
> -DO_LDNF1(hsu_r)
> -DO_LDNF1(hss_r)
> -DO_LDNF1(hdu_r)
> -DO_LDNF1(hds_r)
> + record_fault(env, reg_off, reg_max);
> +}
>
> -DO_LDNF1(ss_r)
> -DO_LDNF1(sdu_r)
> -DO_LDNF1(sds_r)
> +#define DO_LDFF1_LDNF1_1(PART, ESZ) \
> +void HELPER(sve_ldff1##PART##_r)(CPUARMState *env, void *vg, \
> + target_ulong addr, uint32_t desc) \
> +{ \
> + sve_ldff1_r(env, vg, addr, desc, GETPC(), ESZ, 0, \
> + sve_ld1##PART##_host, sve_ld1##PART##_tlb); \
> +} \
> +void HELPER(sve_ldnf1##PART##_r)(CPUARMState *env, void *vg, \
> + target_ulong addr, uint32_t desc) \
> +{ \
> + sve_ldnf1_r(env, vg, addr, desc, ESZ, 0, sve_ld1##PART##_host); \
> +}
>
> -DO_LDNF1(dd_r)
> +/* TODO: Propagate the endian check back to the translator. */
> +#define DO_LDFF1_LDNF1_2(PART, ESZ, MSZ) \
> +void HELPER(sve_ldff1##PART##_r)(CPUARMState *env, void *vg, \
> + target_ulong addr, uint32_t desc) \
> +{ \
> + if (arm_cpu_data_is_big_endian(env)) { \
> + sve_ldff1_r(env, vg, addr, desc, GETPC(), ESZ, MSZ, \
> + sve_ld1##PART##_be_host, sve_ld1##PART##_be_tlb); \
> + } else { \
> + sve_ldff1_r(env, vg, addr, desc, GETPC(), ESZ, MSZ, \
> + sve_ld1##PART##_le_host, sve_ld1##PART##_le_tlb); \
> + } \
> +} \
> +void HELPER(sve_ldnf1##PART##_r)(CPUARMState *env, void *vg, \
> + target_ulong addr, uint32_t desc) \
> +{ \
> + if (arm_cpu_data_is_big_endian(env)) { \
> + sve_ldnf1_r(env, vg, addr, desc, ESZ, MSZ, \
> + sve_ld1##PART##_be_host); \
> + } else { \
> + sve_ldnf1_r(env, vg, addr, desc, ESZ, MSZ, \
> + sve_ld1##PART##_le_host); \
> + } \
> +}
>
> -#undef DO_LDNF1
> +DO_LDFF1_LDNF1_1(bb, 0)
> +DO_LDFF1_LDNF1_1(bhu, 1)
> +DO_LDFF1_LDNF1_1(bhs, 1)
> +DO_LDFF1_LDNF1_1(bsu, 2)
> +DO_LDFF1_LDNF1_1(bss, 2)
> +DO_LDFF1_LDNF1_1(bdu, 3)
> +DO_LDFF1_LDNF1_1(bds, 3)
> +
> +DO_LDFF1_LDNF1_2(hh, 1, 1)
> +DO_LDFF1_LDNF1_2(hsu, 2, 1)
> +DO_LDFF1_LDNF1_2(hss, 2, 1)
> +DO_LDFF1_LDNF1_2(hdu, 3, 1)
> +DO_LDFF1_LDNF1_2(hds, 3, 1)
> +
> +DO_LDFF1_LDNF1_2(ss, 2, 2)
> +DO_LDFF1_LDNF1_2(sdu, 3, 2)
> +DO_LDFF1_LDNF1_2(sds, 3, 2)
> +
> +DO_LDFF1_LDNF1_2(dd, 3, 3)
> +
> +#undef DO_LDFF1_LDNF1_1
> +#undef DO_LDFF1_LDNF1_2
>
> /*
> * Store contiguous data, protected by a governing predicate.
Generally the code looks ok, but there was so much of it that
I kind of stopped checking the details...
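One detail worth spelling out is the page-splitting arithmetic that both new helpers lean on. The real max_for_page() comes from an earlier patch in the series and is not quoted here, so the following is only my reading of how it is used above, assuming 4 KiB guest pages; max_for_page_sketch and the PAGE_*_SKETCH constants are hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_BITS_SKETCH 12   /* assumption: 4 KiB guest pages */
#define PAGE_SIZE_SKETCH ((uint64_t)1 << PAGE_BITS_SKETCH)

/*
 * Return the largest offset M <= mem_max such that the bytes
 * base+mem_off .. base+M-1 all lie on the page containing base+mem_off.
 */
static intptr_t max_for_page_sketch(uint64_t base, intptr_t mem_off,
                                    intptr_t mem_max)
{
    assert(mem_off >= 0 && mem_off <= mem_max);
    uint64_t addr = base + (uint64_t)mem_off;
    /* Bytes left on this page, counting the byte at addr itself. */
    intptr_t to_page_end =
        (intptr_t)(PAGE_SIZE_SKETCH - (addr & (PAGE_SIZE_SKETCH - 1)));
    intptr_t remaining = mem_max - mem_off;
    return mem_off + (to_page_end < remaining ? to_page_end : remaining);
}
```

Passing mem_off + (1 << msz) - 1, as the first-fault path does, measures from the last byte of the first element, so an element that itself straddles a page boundary yields the rest of the second page.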
thanks
-- PMM
* Re: [Qemu-devel] [PATCH 13/20] target/arm: Rewrite helper_sve_ld[234]*_r
2018-08-09 4:21 ` [Qemu-devel] [PATCH 13/20] target/arm: Rewrite helper_sve_ld[234]*_r Richard Henderson
@ 2018-08-23 16:04 ` Peter Maydell
0 siblings, 0 replies; 51+ messages in thread
From: Peter Maydell @ 2018-08-23 16:04 UTC (permalink / raw)
To: Richard Henderson; +Cc: QEMU Developers, Laurent Desnogues, Alex Bennée
On 9 August 2018 at 05:21, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Use the same *_tlb primitives as we use for ld1. This is not
> a significant change, but does (for linux-user) hoist the set
> of helper_retaddr, and (for softmmu) hoist the computation of
> the current mmu_idx outside the loop.
>
> This does fix the endianness problem for softmmu, and does
> move the main loop out of a macro and into an inlined function.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
thanks
-- PMM
* Re: [Qemu-devel] [PATCH 14/20] target/arm: Rewrite helper_sve_st[1234]*_r
2018-08-09 4:22 ` [Qemu-devel] [PATCH 14/20] target/arm: Rewrite helper_sve_st[1234]*_r Richard Henderson
@ 2018-08-23 16:06 ` Peter Maydell
0 siblings, 0 replies; 51+ messages in thread
From: Peter Maydell @ 2018-08-23 16:06 UTC (permalink / raw)
To: Richard Henderson; +Cc: QEMU Developers, Laurent Desnogues, Alex Bennée
On 9 August 2018 at 05:22, Richard Henderson
<richard.henderson@linaro.org> wrote:
> This fixes the endianness problem for softmmu, and does
> move the main loop out of a macro and into an inlined function.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
thanks
-- PMM
* Re: [Qemu-devel] [PATCH 17/20] target/arm: Rewrite vector gather loads
2018-08-09 4:22 ` [Qemu-devel] [PATCH 17/20] target/arm: Rewrite vector gather loads Richard Henderson
@ 2018-08-23 16:08 ` Peter Maydell
0 siblings, 0 replies; 51+ messages in thread
From: Peter Maydell @ 2018-08-23 16:08 UTC (permalink / raw)
To: Richard Henderson; +Cc: QEMU Developers, Laurent Desnogues, Alex Bennée
On 9 August 2018 at 05:22, Richard Henderson
<richard.henderson@linaro.org> wrote:
> This fixes the endianness problem for softmmu, and does
> move the main loop out of a macro and into an inlined function.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> target/arm/helper-sve.h | 84 +++++++++----
> target/arm/sve_helper.c | 218 +++++++++++++++++++++++----------
> target/arm/translate-sve.c | 244 +++++++++++++++++++++++++------------
> 3 files changed, 380 insertions(+), 166 deletions(-)
>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
thanks
-- PMM
* Re: [Qemu-devel] [PATCH 18/20] target/arm: Rewrite vector gather stores
2018-08-09 4:22 ` [Qemu-devel] [PATCH 18/20] target/arm: Rewrite vector gather stores Richard Henderson
@ 2018-08-23 16:09 ` Peter Maydell
0 siblings, 0 replies; 51+ messages in thread
From: Peter Maydell @ 2018-08-23 16:09 UTC (permalink / raw)
To: Richard Henderson; +Cc: QEMU Developers, Laurent Desnogues, Alex Bennée
On 9 August 2018 at 05:22, Richard Henderson
<richard.henderson@linaro.org> wrote:
> This fixes the endianness problem for softmmu, and does
> move the main loop out of a macro and into an inlined function.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> target/arm/helper-sve.h | 52 ++++++++++----
> target/arm/sve_helper.c | 139 ++++++++++++++++++++++++-------------
> target/arm/translate-sve.c | 74 +++++++++++++-------
> 3 files changed, 177 insertions(+), 88 deletions(-)
>
> diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
> index 76d3f021e4..0a4756bff9 100644
> --- a/target/arm/sve_helper.c
> +++ b/target/arm/sve_helper.c
> @@ -5235,61 +5235,100 @@ DO_LDFF1_ZPZ_D(sve_ldffsds_zd, uint64_t, int32_t, cpu_ldl_data_ra)
>
> /* Stores with a vector index. */
>
> -#define DO_ST1_ZPZ_S(NAME, TYPEI, FN) \
> -void HELPER(NAME)(CPUARMState *env, void *vd, void *vg, void *vm, \
> - target_ulong base, uint32_t desc) \
> -{ \
> - intptr_t i, oprsz = simd_oprsz(desc); \
> - unsigned scale = simd_data(desc); \
> - uintptr_t ra = GETPC(); \
> - for (i = 0; i < oprsz; ) { \
> - uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3)); \
> - do { \
> - if (likely(pg & 1)) { \
> - target_ulong off = *(TYPEI *)(vm + H1_4(i)); \
> - uint32_t d = *(uint32_t *)(vd + H1_4(i)); \
> - FN(env, base + (off << scale), d, ra); \
> - } \
> - i += sizeof(uint32_t), pg >>= sizeof(uint32_t); \
> - } while (i & 15); \
> - } \
> +static void sve_st1_zs(CPUARMState *env, void *vd, void *vg, void *vm,
> + target_ulong base, uint32_t desc, uintptr_t ra,
> + zreg_off_fn *off_fn, sve_ld1_tlb_fn *tlb_fn)
> +{
> + const int mmu_idx = cpu_mmu_index(env, false);
> + intptr_t i, oprsz = simd_oprsz(desc);
> + unsigned scale = simd_data(desc);
> +
> + set_helper_retaddr(ra);
> + for (i = 0; i < oprsz; ) {
> + uint16_t pg = *(uint16_t *)(vg + H1_2(i >> 3));
> + do {
> + if (pg & 1) {
Was dropping the likely() from this conditional intentional?
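The hint only influences code layout, not results, so either form is functionally fine. A minimal sketch of the predicate walk with and without the hint in mind; likely_sketch and count_active_words are hypothetical names for this note, and the fallback branch is an assumption for compilers without __builtin_expect:

```c
#include <assert.h>
#include <stdint.h>

#if defined(__GNUC__)
#define likely_sketch(x)   __builtin_expect(!!(x), 1)
#else
#define likely_sketch(x)   (!!(x))   /* assumption: no hint available */
#endif

/*
 * Count the active 32-bit elements governed by one 16-bit predicate chunk.
 * There is one predicate bit per vector byte, and the lowest bit of each
 * group of four decides whether that 32-bit element is active.
 */
static int count_active_words(uint16_t pg)
{
    int n = 0;
    assert(sizeof(uint32_t) == 4);
    for (int i = 0; i < 16; i += 4, pg >>= 4) {
        if (likely_sketch(pg & 1)) {
            n++;
        }
    }
    return n;
}
```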
> + target_ulong off = off_fn(vm, i);
> + tlb_fn(env, vd, i, base + (off << scale), mmu_idx, ra);
> + }
> + i += 4, pg >>= 4;
> + } while (i & 15);
> + }
> + set_helper_retaddr(0);
> }
>
Either way,
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
thanks
-- PMM
* Re: [Qemu-devel] [PATCH 19/20] target/arm: Rewrite vector gather first-fault loads
2018-08-09 4:22 ` [Qemu-devel] [PATCH 19/20] target/arm: Rewrite vector gather first-fault loads Richard Henderson
@ 2018-08-23 16:10 ` Peter Maydell
0 siblings, 0 replies; 51+ messages in thread
From: Peter Maydell @ 2018-08-23 16:10 UTC (permalink / raw)
To: Richard Henderson; +Cc: QEMU Developers, Laurent Desnogues, Alex Bennée
On 9 August 2018 at 05:22, Richard Henderson
<richard.henderson@linaro.org> wrote:
> This implements the feature for softmmu, and moves the
> main loop out of a macro and into a function.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> target/arm/helper-sve.h | 84 ++++++++---
> target/arm/sve_helper.c | 290 +++++++++++++++++++++++++++----------
> target/arm/translate-sve.c | 84 +++++------
> 3 files changed, 321 insertions(+), 137 deletions(-)
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
thanks
-- PMM
* Re: [Qemu-devel] [PATCH 20/20] target/arm: Pass TCGMemOpIdx to sve memory helpers
2018-08-09 4:22 ` [Qemu-devel] [PATCH 20/20] target/arm: Pass TCGMemOpIdx to sve memory helpers Richard Henderson
@ 2018-08-23 16:23 ` Peter Maydell
0 siblings, 0 replies; 51+ messages in thread
From: Peter Maydell @ 2018-08-23 16:23 UTC (permalink / raw)
To: Richard Henderson; +Cc: QEMU Developers, Laurent Desnogues, Alex Bennée
On 9 August 2018 at 05:22, Richard Henderson
<richard.henderson@linaro.org> wrote:
> There is quite a lot of code required to compute cpu_mem_index,
> or even put together the full TCGMemOpIdx. This can easily be
> done at translation time.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> target/arm/internals.h | 5 ++
> target/arm/sve_helper.c | 138 +++++++++++++++++++------------------
> target/arm/translate-sve.c | 67 +++++++++++-------
> 3 files changed, 121 insertions(+), 89 deletions(-)
>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
thanks
-- PMM
* Re: [Qemu-devel] [PATCH 09/20] target/arm: Handle SVE vector length changes in system mode
2018-08-17 16:22 ` Peter Maydell
@ 2018-08-25 19:41 ` Richard Henderson
0 siblings, 0 replies; 51+ messages in thread
From: Richard Henderson @ 2018-08-25 19:41 UTC (permalink / raw)
To: Peter Maydell; +Cc: QEMU Developers, Laurent Desnogues, Alex Bennée
On 08/17/2018 09:22 AM, Peter Maydell wrote:
>> + /*
>> + * When FP is enabled, but SVE is disabled, the effective len is 0.
>> + * ??? How should sve_exception_el interact with AArch32 state?
>> + * That isn't included in the CheckSVEEnabled pseudocode, so is the
>> + * host kernel required to explicitly disable SVE for an EL using aa32?
>> + */
> I'm not clear what you're asking here. If the EL is AArch32
> then why does it make a difference if SVE is enabled or disabled?
> You can't get at it...
>
>> + old_len = (sve_exception_el(env, old_el)
>> + ? 0 : sve_zcr_len_for_el(env, old_el));
>> + new_len = (sve_exception_el(env, new_el)
>> + ? 0 : sve_zcr_len_for_el(env, new_el));
Yes, the registers are inaccessible. But...
It may be that we must produce old_len/new_len == 0 if old_el/new_el is in
32-bit mode, so that the high part of the SVE registers is zeroed.
However, it may be UNDEFINED what happens if the OS switches to an EL in 32-bit
mode while CPACR.ZEN == 3. If so, there may be no point in adding an
additional test here.
So far I have re-worded the comment as:
* ??? Do we need a conditional for old_el/new_el in aa32 state?
* That isn't included in the CheckSVEEnabled pseudocode, so is the
* host kernel required to explicitly disable SVE for an EL using aa32?
Thoughts on the underlying issue?
r~
Thread overview: 51+ messages
2018-08-09 4:21 [Qemu-devel] [PATCH 00/20] target/arm: sve system mode patches Richard Henderson
2018-08-09 4:21 ` [Qemu-devel] [PATCH 01/20] target/arm: Set ISAR bits for -cpu max Richard Henderson
2018-08-09 4:21 ` [Qemu-devel] [PATCH 02/20] target/arm: Set ID_AA64PFR0 bits for SVE " Richard Henderson
2018-08-09 4:21 ` [Qemu-devel] [PATCH 03/20] target/arm: Define ID_AA64ZFR0_EL1 Richard Henderson
2018-08-17 15:50 ` Peter Maydell
2018-08-09 4:21 ` [Qemu-devel] [PATCH 04/20] target/arm: Adjust sve_exception_el Richard Henderson
2018-08-17 15:57 ` Peter Maydell
2018-08-09 4:21 ` [Qemu-devel] [PATCH 05/20] target/arm: Fix arm_cpu_data_is_big_endian for aa64 user-only Richard Henderson
2018-08-17 16:02 ` Peter Maydell
2018-08-17 16:47 ` Richard Henderson
2018-08-09 4:21 ` [Qemu-devel] [PATCH 06/20] target/arm: Fix arm_current_el for user-only Richard Henderson
2018-08-17 16:03 ` Peter Maydell
2018-08-17 16:51 ` Richard Henderson
2018-08-09 4:21 ` [Qemu-devel] [PATCH 07/20] target/arm: Fix is_a64 " Richard Henderson
2018-08-17 16:03 ` Peter Maydell
2018-08-17 16:10 ` Laurent Desnogues
2018-08-17 16:23 ` Peter Maydell
2018-08-09 4:21 ` [Qemu-devel] [PATCH 08/20] target/arm: Pass in current_el to fp and sve_exception_el Richard Henderson
2018-08-09 18:01 ` Alex Bennée
2018-08-09 18:50 ` Richard Henderson
2018-08-09 4:21 ` [Qemu-devel] [PATCH 09/20] target/arm: Handle SVE vector length changes in system mode Richard Henderson
2018-08-17 16:22 ` Peter Maydell
2018-08-25 19:41 ` Richard Henderson
2018-08-09 4:21 ` [Qemu-devel] [PATCH 10/20] target/arm: Adjust aarch64_cpu_dump_state for system mode SVE Richard Henderson
2018-08-17 16:35 ` Peter Maydell
2018-08-09 4:21 ` [Qemu-devel] [PATCH 11/20] target/arm: Clear unused predicate bits for LD1RQ Richard Henderson
2018-08-23 15:21 ` Peter Maydell
2018-08-23 15:37 ` Richard Henderson
2018-08-09 4:21 ` [Qemu-devel] [PATCH 12/20] target/arm: Rewrite helper_sve_ld1*_r using pages Richard Henderson
2018-08-10 9:13 ` Alex Bennée
2018-08-10 19:15 ` Richard Henderson
2018-08-23 16:01 ` Peter Maydell
2018-08-09 4:21 ` [Qemu-devel] [PATCH 13/20] target/arm: Rewrite helper_sve_ld[234]*_r Richard Henderson
2018-08-23 16:04 ` Peter Maydell
2018-08-09 4:22 ` [Qemu-devel] [PATCH 14/20] target/arm: Rewrite helper_sve_st[1234]*_r Richard Henderson
2018-08-23 16:06 ` Peter Maydell
2018-08-09 4:22 ` [Qemu-devel] [PATCH 15/20] target/arm: Split contiguous loads for endianness Richard Henderson
2018-08-11 5:40 ` Philippe Mathieu-Daudé
2018-08-09 4:22 ` [Qemu-devel] [PATCH 16/20] target/arm: Split contiguous stores " Richard Henderson
2018-08-11 5:41 ` Philippe Mathieu-Daudé
2018-08-09 4:22 ` [Qemu-devel] [PATCH 17/20] target/arm: Rewrite vector gather loads Richard Henderson
2018-08-23 16:08 ` Peter Maydell
2018-08-09 4:22 ` [Qemu-devel] [PATCH 18/20] target/arm: Rewrite vector gather stores Richard Henderson
2018-08-23 16:09 ` Peter Maydell
2018-08-09 4:22 ` [Qemu-devel] [PATCH 19/20] target/arm: Rewrite vector gather first-fault loads Richard Henderson
2018-08-23 16:10 ` Peter Maydell
2018-08-09 4:22 ` [Qemu-devel] [PATCH 20/20] target/arm: Pass TCGMemOpIdx to sve memory helpers Richard Henderson
2018-08-23 16:23 ` Peter Maydell
2018-08-09 5:48 ` [Qemu-devel] [PATCH 00/20] target/arm: sve system mode patches Laurent Desnogues
2018-08-18 9:15 ` no-reply
2018-08-18 10:01 ` no-reply