* [Qemu-devel] [PATCH 1/9] target/arm: Mark disas_set_insn_syndrome inline
2017-12-18 17:30 [Qemu-devel] [PATCH 0/9] target/arm: Prepatory work for SVE Richard Henderson
@ 2017-12-18 17:30 ` Richard Henderson
2017-12-18 20:27 ` [Qemu-devel] [Qemu-arm] " Philippe Mathieu-Daudé
2018-01-11 18:45 ` [Qemu-devel] " Peter Maydell
2017-12-18 17:30 ` [Qemu-devel] [PATCH 2/9] target/arm: Use pointers in crypto helpers Richard Henderson
` (7 subsequent siblings)
8 siblings, 2 replies; 31+ messages in thread
From: Richard Henderson @ 2017-12-18 17:30 UTC (permalink / raw)
To: qemu-devel; +Cc: peter.maydell, qemu-arm
If it isn't used when translate.h is included,
we'll get a compiler Werror.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/translate.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/target/arm/translate.h b/target/arm/translate.h
index cd7313ace7..3f4df91e5e 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -108,7 +108,7 @@ static inline int default_exception_el(DisasContext *s)
? 3 : MAX(1, s->current_el);
}
-static void disas_set_insn_syndrome(DisasContext *s, uint32_t syn)
+static inline void disas_set_insn_syndrome(DisasContext *s, uint32_t syn)
{
/* We don't need to save all of the syndrome so we mask and shift
* out unneeded bits to help the sleb128 encoder do a better job.
--
2.14.3
^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [Qemu-devel] [Qemu-arm] [PATCH 1/9] target/arm: Mark disas_set_insn_syndrome inline
2017-12-18 17:30 ` [Qemu-devel] [PATCH 1/9] target/arm: Mark disas_set_insn_syndrome inline Richard Henderson
@ 2017-12-18 20:27 ` Philippe Mathieu-Daudé
2018-01-11 18:45 ` [Qemu-devel] " Peter Maydell
1 sibling, 0 replies; 31+ messages in thread
From: Philippe Mathieu-Daudé @ 2017-12-18 20:27 UTC (permalink / raw)
To: Richard Henderson, qemu-devel; +Cc: peter.maydell, qemu-arm
On 12/18/2017 02:30 PM, Richard Henderson wrote:
> If it isn't used when translate.h is included,
> we'll get a compiler Werror.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
> ---
> target/arm/translate.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/target/arm/translate.h b/target/arm/translate.h
> index cd7313ace7..3f4df91e5e 100644
> --- a/target/arm/translate.h
> +++ b/target/arm/translate.h
> @@ -108,7 +108,7 @@ static inline int default_exception_el(DisasContext *s)
> ? 3 : MAX(1, s->current_el);
> }
>
> -static void disas_set_insn_syndrome(DisasContext *s, uint32_t syn)
> +static inline void disas_set_insn_syndrome(DisasContext *s, uint32_t syn)
> {
> /* We don't need to save all of the syndrome so we mask and shift
> * out unneeded bits to help the sleb128 encoder do a better job.
>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [Qemu-devel] [PATCH 1/9] target/arm: Mark disas_set_insn_syndrome inline
2017-12-18 17:30 ` [Qemu-devel] [PATCH 1/9] target/arm: Mark disas_set_insn_syndrome inline Richard Henderson
2017-12-18 20:27 ` [Qemu-devel] [Qemu-arm] " Philippe Mathieu-Daudé
@ 2018-01-11 18:45 ` Peter Maydell
1 sibling, 0 replies; 31+ messages in thread
From: Peter Maydell @ 2018-01-11 18:45 UTC (permalink / raw)
To: Richard Henderson; +Cc: QEMU Developers, qemu-arm
On 18 December 2017 at 17:30, Richard Henderson
<richard.henderson@linaro.org> wrote:
> If it isn't used when translate.h is included,
> we'll get a compiler Werror.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> target/arm/translate.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/target/arm/translate.h b/target/arm/translate.h
> index cd7313ace7..3f4df91e5e 100644
> --- a/target/arm/translate.h
> +++ b/target/arm/translate.h
> @@ -108,7 +108,7 @@ static inline int default_exception_el(DisasContext *s)
> ? 3 : MAX(1, s->current_el);
> }
>
> -static void disas_set_insn_syndrome(DisasContext *s, uint32_t syn)
> +static inline void disas_set_insn_syndrome(DisasContext *s, uint32_t syn)
> {
> /* We don't need to save all of the syndrome so we mask and shift
> * out unneeded bits to help the sleb128 encoder do a better job.
> --
> 2.14.3
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
thanks
-- PMM
^ permalink raw reply [flat|nested] 31+ messages in thread
* [Qemu-devel] [PATCH 2/9] target/arm: Use pointers in crypto helpers
2017-12-18 17:30 [Qemu-devel] [PATCH 0/9] target/arm: Prepatory work for SVE Richard Henderson
2017-12-18 17:30 ` [Qemu-devel] [PATCH 1/9] target/arm: Mark disas_set_insn_syndrome inline Richard Henderson
@ 2017-12-18 17:30 ` Richard Henderson
2018-01-11 18:45 ` Peter Maydell
2017-12-18 17:30 ` [Qemu-devel] [PATCH 3/9] target/arm: Use pointers in neon zip/uzp helpers Richard Henderson
` (6 subsequent siblings)
8 siblings, 1 reply; 31+ messages in thread
From: Richard Henderson @ 2017-12-18 17:30 UTC (permalink / raw)
To: qemu-devel; +Cc: peter.maydell, qemu-arm
Rather than passing regnos to the helpers, pass pointers to the
vector registers directly. This eliminates the need to pass in
the environment pointer and reduces the number of places that
directly access env->vfp.regs[].
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/helper.h | 18 ++---
target/arm/crypto_helper.c | 184 +++++++++++++++++----------------------------
target/arm/translate-a64.c | 68 +++++++++--------
target/arm/translate.c | 68 +++++++++--------
4 files changed, 154 insertions(+), 184 deletions(-)
diff --git a/target/arm/helper.h b/target/arm/helper.h
index 5b6333347d..f9dd7432c1 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -539,17 +539,17 @@ DEF_HELPER_3(neon_qzip8, void, env, i32, i32)
DEF_HELPER_3(neon_qzip16, void, env, i32, i32)
DEF_HELPER_3(neon_qzip32, void, env, i32, i32)
-DEF_HELPER_4(crypto_aese, void, env, i32, i32, i32)
-DEF_HELPER_4(crypto_aesmc, void, env, i32, i32, i32)
+DEF_HELPER_FLAGS_3(crypto_aese, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
+DEF_HELPER_FLAGS_3(crypto_aesmc, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
-DEF_HELPER_5(crypto_sha1_3reg, void, env, i32, i32, i32, i32)
-DEF_HELPER_3(crypto_sha1h, void, env, i32, i32)
-DEF_HELPER_3(crypto_sha1su1, void, env, i32, i32)
+DEF_HELPER_FLAGS_4(crypto_sha1_3reg, TCG_CALL_NO_RWG, void, ptr, ptr, ptr, i32)
+DEF_HELPER_FLAGS_2(crypto_sha1h, TCG_CALL_NO_RWG, void, ptr, ptr)
+DEF_HELPER_FLAGS_2(crypto_sha1su1, TCG_CALL_NO_RWG, void, ptr, ptr)
-DEF_HELPER_4(crypto_sha256h, void, env, i32, i32, i32)
-DEF_HELPER_4(crypto_sha256h2, void, env, i32, i32, i32)
-DEF_HELPER_3(crypto_sha256su0, void, env, i32, i32)
-DEF_HELPER_4(crypto_sha256su1, void, env, i32, i32, i32)
+DEF_HELPER_FLAGS_3(crypto_sha256h, TCG_CALL_NO_RWG, void, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_3(crypto_sha256h2, TCG_CALL_NO_RWG, void, ptr, ptr, ptr)
+DEF_HELPER_FLAGS_2(crypto_sha256su0, TCG_CALL_NO_RWG, void, ptr, ptr)
+DEF_HELPER_FLAGS_3(crypto_sha256su1, TCG_CALL_NO_RWG, void, ptr, ptr, ptr)
DEF_HELPER_FLAGS_3(crc32, TCG_CALL_NO_RWG_SE, i32, i32, i32, i32)
DEF_HELPER_FLAGS_3(crc32c, TCG_CALL_NO_RWG_SE, i32, i32, i32, i32)
diff --git a/target/arm/crypto_helper.c b/target/arm/crypto_helper.c
index 3b6df3f41a..9ca0bdead7 100644
--- a/target/arm/crypto_helper.c
+++ b/target/arm/crypto_helper.c
@@ -30,20 +30,14 @@ union CRYPTO_STATE {
#define CR_ST_WORD(state, i) (state.words[i])
#endif
-void HELPER(crypto_aese)(CPUARMState *env, uint32_t rd, uint32_t rm,
- uint32_t decrypt)
+void HELPER(crypto_aese)(void *vd, void *vm, uint32_t decrypt)
{
static uint8_t const * const sbox[2] = { AES_sbox, AES_isbox };
static uint8_t const * const shift[2] = { AES_shifts, AES_ishifts };
-
- union CRYPTO_STATE rk = { .l = {
- float64_val(env->vfp.regs[rm]),
- float64_val(env->vfp.regs[rm + 1])
- } };
- union CRYPTO_STATE st = { .l = {
- float64_val(env->vfp.regs[rd]),
- float64_val(env->vfp.regs[rd + 1])
- } };
+ uint64_t *rd = vd;
+ uint64_t *rm = vm;
+ union CRYPTO_STATE rk = { .l = { rm[0], rm[1] } };
+ union CRYPTO_STATE st = { .l = { rd[0], rd[1] } };
int i;
assert(decrypt < 2);
@@ -57,12 +51,11 @@ void HELPER(crypto_aese)(CPUARMState *env, uint32_t rd, uint32_t rm,
CR_ST_BYTE(st, i) = sbox[decrypt][CR_ST_BYTE(rk, shift[decrypt][i])];
}
- env->vfp.regs[rd] = make_float64(st.l[0]);
- env->vfp.regs[rd + 1] = make_float64(st.l[1]);
+ rd[0] = st.l[0];
+ rd[1] = st.l[1];
}
-void HELPER(crypto_aesmc)(CPUARMState *env, uint32_t rd, uint32_t rm,
- uint32_t decrypt)
+void HELPER(crypto_aesmc)(void *vd, void *vm, uint32_t decrypt)
{
static uint32_t const mc[][256] = { {
/* MixColumns lookup table */
@@ -197,10 +190,10 @@ void HELPER(crypto_aesmc)(CPUARMState *env, uint32_t rd, uint32_t rm,
0x92b479a7, 0x99b970a9, 0x84ae6bbb, 0x8fa362b5,
0xbe805d9f, 0xb58d5491, 0xa89a4f83, 0xa397468d,
} };
- union CRYPTO_STATE st = { .l = {
- float64_val(env->vfp.regs[rm]),
- float64_val(env->vfp.regs[rm + 1])
- } };
+
+ uint64_t *rd = vd;
+ uint64_t *rm = vm;
+ union CRYPTO_STATE st = { .l = { rm[0], rm[1] } };
int i;
assert(decrypt < 2);
@@ -213,8 +206,8 @@ void HELPER(crypto_aesmc)(CPUARMState *env, uint32_t rd, uint32_t rm,
rol32(mc[decrypt][CR_ST_BYTE(st, i + 3)], 24);
}
- env->vfp.regs[rd] = make_float64(st.l[0]);
- env->vfp.regs[rd + 1] = make_float64(st.l[1]);
+ rd[0] = st.l[0];
+ rd[1] = st.l[1];
}
/*
@@ -236,21 +229,14 @@ static uint32_t maj(uint32_t x, uint32_t y, uint32_t z)
return (x & y) | ((x | y) & z);
}
-void HELPER(crypto_sha1_3reg)(CPUARMState *env, uint32_t rd, uint32_t rn,
- uint32_t rm, uint32_t op)
+void HELPER(crypto_sha1_3reg)(void *vd, void *vn, void *vm, uint32_t op)
{
- union CRYPTO_STATE d = { .l = {
- float64_val(env->vfp.regs[rd]),
- float64_val(env->vfp.regs[rd + 1])
- } };
- union CRYPTO_STATE n = { .l = {
- float64_val(env->vfp.regs[rn]),
- float64_val(env->vfp.regs[rn + 1])
- } };
- union CRYPTO_STATE m = { .l = {
- float64_val(env->vfp.regs[rm]),
- float64_val(env->vfp.regs[rm + 1])
- } };
+ uint64_t *rd = vd;
+ uint64_t *rn = vn;
+ uint64_t *rm = vm;
+ union CRYPTO_STATE d = { .l = { rd[0], rd[1] } };
+ union CRYPTO_STATE n = { .l = { rn[0], rn[1] } };
+ union CRYPTO_STATE m = { .l = { rm[0], rm[1] } };
if (op == 3) { /* sha1su0 */
d.l[0] ^= d.l[1] ^ m.l[0];
@@ -284,42 +270,37 @@ void HELPER(crypto_sha1_3reg)(CPUARMState *env, uint32_t rd, uint32_t rn,
CR_ST_WORD(d, 0) = t;
}
}
- env->vfp.regs[rd] = make_float64(d.l[0]);
- env->vfp.regs[rd + 1] = make_float64(d.l[1]);
+ rd[0] = d.l[0];
+ rd[1] = d.l[1];
}
-void HELPER(crypto_sha1h)(CPUARMState *env, uint32_t rd, uint32_t rm)
+void HELPER(crypto_sha1h)(void *vd, void *vm)
{
- union CRYPTO_STATE m = { .l = {
- float64_val(env->vfp.regs[rm]),
- float64_val(env->vfp.regs[rm + 1])
- } };
+ uint64_t *rd = vd;
+ uint64_t *rm = vm;
+ union CRYPTO_STATE m = { .l = { rm[0], rm[1] } };
CR_ST_WORD(m, 0) = ror32(CR_ST_WORD(m, 0), 2);
CR_ST_WORD(m, 1) = CR_ST_WORD(m, 2) = CR_ST_WORD(m, 3) = 0;
- env->vfp.regs[rd] = make_float64(m.l[0]);
- env->vfp.regs[rd + 1] = make_float64(m.l[1]);
+ rd[0] = m.l[0];
+ rd[1] = m.l[1];
}
-void HELPER(crypto_sha1su1)(CPUARMState *env, uint32_t rd, uint32_t rm)
+void HELPER(crypto_sha1su1)(void *vd, void *vm)
{
- union CRYPTO_STATE d = { .l = {
- float64_val(env->vfp.regs[rd]),
- float64_val(env->vfp.regs[rd + 1])
- } };
- union CRYPTO_STATE m = { .l = {
- float64_val(env->vfp.regs[rm]),
- float64_val(env->vfp.regs[rm + 1])
- } };
+ uint64_t *rd = vd;
+ uint64_t *rm = vm;
+ union CRYPTO_STATE d = { .l = { rd[0], rd[1] } };
+ union CRYPTO_STATE m = { .l = { rm[0], rm[1] } };
CR_ST_WORD(d, 0) = rol32(CR_ST_WORD(d, 0) ^ CR_ST_WORD(m, 1), 1);
CR_ST_WORD(d, 1) = rol32(CR_ST_WORD(d, 1) ^ CR_ST_WORD(m, 2), 1);
CR_ST_WORD(d, 2) = rol32(CR_ST_WORD(d, 2) ^ CR_ST_WORD(m, 3), 1);
CR_ST_WORD(d, 3) = rol32(CR_ST_WORD(d, 3) ^ CR_ST_WORD(d, 0), 1);
- env->vfp.regs[rd] = make_float64(d.l[0]);
- env->vfp.regs[rd + 1] = make_float64(d.l[1]);
+ rd[0] = d.l[0];
+ rd[1] = d.l[1];
}
/*
@@ -347,21 +328,14 @@ static uint32_t s1(uint32_t x)
return ror32(x, 17) ^ ror32(x, 19) ^ (x >> 10);
}
-void HELPER(crypto_sha256h)(CPUARMState *env, uint32_t rd, uint32_t rn,
- uint32_t rm)
+void HELPER(crypto_sha256h)(void *vd, void *vn, void *vm)
{
- union CRYPTO_STATE d = { .l = {
- float64_val(env->vfp.regs[rd]),
- float64_val(env->vfp.regs[rd + 1])
- } };
- union CRYPTO_STATE n = { .l = {
- float64_val(env->vfp.regs[rn]),
- float64_val(env->vfp.regs[rn + 1])
- } };
- union CRYPTO_STATE m = { .l = {
- float64_val(env->vfp.regs[rm]),
- float64_val(env->vfp.regs[rm + 1])
- } };
+ uint64_t *rd = vd;
+ uint64_t *rn = vn;
+ uint64_t *rm = vm;
+ union CRYPTO_STATE d = { .l = { rd[0], rd[1] } };
+ union CRYPTO_STATE n = { .l = { rn[0], rn[1] } };
+ union CRYPTO_STATE m = { .l = { rm[0], rm[1] } };
int i;
for (i = 0; i < 4; i++) {
@@ -383,25 +357,18 @@ void HELPER(crypto_sha256h)(CPUARMState *env, uint32_t rd, uint32_t rn,
CR_ST_WORD(d, 0) = t;
}
- env->vfp.regs[rd] = make_float64(d.l[0]);
- env->vfp.regs[rd + 1] = make_float64(d.l[1]);
+ rd[0] = d.l[0];
+ rd[1] = d.l[1];
}
-void HELPER(crypto_sha256h2)(CPUARMState *env, uint32_t rd, uint32_t rn,
- uint32_t rm)
+void HELPER(crypto_sha256h2)(void *vd, void *vn, void *vm)
{
- union CRYPTO_STATE d = { .l = {
- float64_val(env->vfp.regs[rd]),
- float64_val(env->vfp.regs[rd + 1])
- } };
- union CRYPTO_STATE n = { .l = {
- float64_val(env->vfp.regs[rn]),
- float64_val(env->vfp.regs[rn + 1])
- } };
- union CRYPTO_STATE m = { .l = {
- float64_val(env->vfp.regs[rm]),
- float64_val(env->vfp.regs[rm + 1])
- } };
+ uint64_t *rd = vd;
+ uint64_t *rn = vn;
+ uint64_t *rm = vm;
+ union CRYPTO_STATE d = { .l = { rd[0], rd[1] } };
+ union CRYPTO_STATE n = { .l = { rn[0], rn[1] } };
+ union CRYPTO_STATE m = { .l = { rm[0], rm[1] } };
int i;
for (i = 0; i < 4; i++) {
@@ -415,51 +382,40 @@ void HELPER(crypto_sha256h2)(CPUARMState *env, uint32_t rd, uint32_t rn,
CR_ST_WORD(d, 0) = CR_ST_WORD(n, 3 - i) + t;
}
- env->vfp.regs[rd] = make_float64(d.l[0]);
- env->vfp.regs[rd + 1] = make_float64(d.l[1]);
+ rd[0] = d.l[0];
+ rd[1] = d.l[1];
}
-void HELPER(crypto_sha256su0)(CPUARMState *env, uint32_t rd, uint32_t rm)
+void HELPER(crypto_sha256su0)(void *vd, void *vm)
{
- union CRYPTO_STATE d = { .l = {
- float64_val(env->vfp.regs[rd]),
- float64_val(env->vfp.regs[rd + 1])
- } };
- union CRYPTO_STATE m = { .l = {
- float64_val(env->vfp.regs[rm]),
- float64_val(env->vfp.regs[rm + 1])
- } };
+ uint64_t *rd = vd;
+ uint64_t *rm = vm;
+ union CRYPTO_STATE d = { .l = { rd[0], rd[1] } };
+ union CRYPTO_STATE m = { .l = { rm[0], rm[1] } };
CR_ST_WORD(d, 0) += s0(CR_ST_WORD(d, 1));
CR_ST_WORD(d, 1) += s0(CR_ST_WORD(d, 2));
CR_ST_WORD(d, 2) += s0(CR_ST_WORD(d, 3));
CR_ST_WORD(d, 3) += s0(CR_ST_WORD(m, 0));
- env->vfp.regs[rd] = make_float64(d.l[0]);
- env->vfp.regs[rd + 1] = make_float64(d.l[1]);
+ rd[0] = d.l[0];
+ rd[1] = d.l[1];
}
-void HELPER(crypto_sha256su1)(CPUARMState *env, uint32_t rd, uint32_t rn,
- uint32_t rm)
+void HELPER(crypto_sha256su1)(void *vd, void *vn, void *vm)
{
- union CRYPTO_STATE d = { .l = {
- float64_val(env->vfp.regs[rd]),
- float64_val(env->vfp.regs[rd + 1])
- } };
- union CRYPTO_STATE n = { .l = {
- float64_val(env->vfp.regs[rn]),
- float64_val(env->vfp.regs[rn + 1])
- } };
- union CRYPTO_STATE m = { .l = {
- float64_val(env->vfp.regs[rm]),
- float64_val(env->vfp.regs[rm + 1])
- } };
+ uint64_t *rd = vd;
+ uint64_t *rn = vn;
+ uint64_t *rm = vm;
+ union CRYPTO_STATE d = { .l = { rd[0], rd[1] } };
+ union CRYPTO_STATE n = { .l = { rn[0], rn[1] } };
+ union CRYPTO_STATE m = { .l = { rm[0], rm[1] } };
CR_ST_WORD(d, 0) += s1(CR_ST_WORD(m, 2)) + CR_ST_WORD(n, 1);
CR_ST_WORD(d, 1) += s1(CR_ST_WORD(m, 3)) + CR_ST_WORD(n, 2);
CR_ST_WORD(d, 2) += s1(CR_ST_WORD(d, 0)) + CR_ST_WORD(n, 3);
CR_ST_WORD(d, 3) += s1(CR_ST_WORD(d, 1)) + CR_ST_WORD(m, 0);
- env->vfp.regs[rd] = make_float64(d.l[0]);
- env->vfp.regs[rd + 1] = make_float64(d.l[1]);
+ rd[0] = d.l[0];
+ rd[1] = d.l[1];
}
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 79fede35c1..d246e8f6b5 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -81,8 +81,9 @@ typedef void NeonGenWidenFn(TCGv_i64, TCGv_i32);
typedef void NeonGenTwoSingleOPFn(TCGv_i32, TCGv_i32, TCGv_i32, TCGv_ptr);
typedef void NeonGenTwoDoubleOPFn(TCGv_i64, TCGv_i64, TCGv_i64, TCGv_ptr);
typedef void NeonGenOneOpFn(TCGv_i64, TCGv_i64);
-typedef void CryptoTwoOpEnvFn(TCGv_ptr, TCGv_i32, TCGv_i32);
-typedef void CryptoThreeOpEnvFn(TCGv_ptr, TCGv_i32, TCGv_i32, TCGv_i32);
+typedef void CryptoTwoOpFn(TCGv_ptr, TCGv_ptr);
+typedef void CryptoThreeOpIntFn(TCGv_ptr, TCGv_ptr, TCGv_i32);
+typedef void CryptoThreeOpFn(TCGv_ptr, TCGv_ptr, TCGv_ptr);
/* Note that the gvec expanders operate on offsets + sizes. */
typedef void GVecGen2Fn(unsigned, uint32_t, uint32_t, uint32_t, uint32_t);
@@ -556,6 +557,14 @@ static inline int vec_full_reg_size(DisasContext *s)
return 128 / 8;
}
+/* Return a newly allocated pointer to the vector register. */
+static TCGv_ptr vec_full_reg_ptr(DisasContext *s, int regno)
+{
+ TCGv_ptr ret = tcg_temp_new_ptr();
+ tcg_gen_addi_ptr(ret, cpu_env, vec_full_reg_offset(s, regno));
+ return ret;
+}
+
/* Return the offset into CPUARMState of a slice (from
* the least significant end) of FP register Qn (ie
* Dn, Sn, Hn or Bn).
@@ -12455,8 +12464,9 @@ static void disas_crypto_aes(DisasContext *s, uint32_t insn)
int rn = extract32(insn, 5, 5);
int rd = extract32(insn, 0, 5);
int decrypt;
- TCGv_i32 tcg_rd_regno, tcg_rn_regno, tcg_decrypt;
- CryptoThreeOpEnvFn *genfn;
+ TCGv_ptr tcg_rd_ptr, tcg_rn_ptr;
+ TCGv_i32 tcg_decrypt;
+ CryptoThreeOpIntFn *genfn;
if (!arm_dc_feature(s, ARM_FEATURE_V8_AES)
|| size != 0) {
@@ -12490,18 +12500,14 @@ static void disas_crypto_aes(DisasContext *s, uint32_t insn)
return;
}
- /* Note that we convert the Vx register indexes into the
- * index within the vfp.regs[] array, so we can share the
- * helper with the AArch32 instructions.
- */
- tcg_rd_regno = tcg_const_i32(rd << 1);
- tcg_rn_regno = tcg_const_i32(rn << 1);
+ tcg_rd_ptr = vec_full_reg_ptr(s, rd);
+ tcg_rn_ptr = vec_full_reg_ptr(s, rn);
tcg_decrypt = tcg_const_i32(decrypt);
- genfn(cpu_env, tcg_rd_regno, tcg_rn_regno, tcg_decrypt);
+ genfn(tcg_rd_ptr, tcg_rn_ptr, tcg_decrypt);
- tcg_temp_free_i32(tcg_rd_regno);
- tcg_temp_free_i32(tcg_rn_regno);
+ tcg_temp_free_ptr(tcg_rd_ptr);
+ tcg_temp_free_ptr(tcg_rn_ptr);
tcg_temp_free_i32(tcg_decrypt);
}
@@ -12518,8 +12524,8 @@ static void disas_crypto_three_reg_sha(DisasContext *s, uint32_t insn)
int rm = extract32(insn, 16, 5);
int rn = extract32(insn, 5, 5);
int rd = extract32(insn, 0, 5);
- CryptoThreeOpEnvFn *genfn;
- TCGv_i32 tcg_rd_regno, tcg_rn_regno, tcg_rm_regno;
+ CryptoThreeOpFn *genfn;
+ TCGv_ptr tcg_rd_ptr, tcg_rn_ptr, tcg_rm_ptr;
int feature = ARM_FEATURE_V8_SHA256;
if (size != 0) {
@@ -12558,23 +12564,23 @@ static void disas_crypto_three_reg_sha(DisasContext *s, uint32_t insn)
return;
}
- tcg_rd_regno = tcg_const_i32(rd << 1);
- tcg_rn_regno = tcg_const_i32(rn << 1);
- tcg_rm_regno = tcg_const_i32(rm << 1);
+ tcg_rd_ptr = vec_full_reg_ptr(s, rd);
+ tcg_rn_ptr = vec_full_reg_ptr(s, rn);
+ tcg_rm_ptr = vec_full_reg_ptr(s, rm);
if (genfn) {
- genfn(cpu_env, tcg_rd_regno, tcg_rn_regno, tcg_rm_regno);
+ genfn(tcg_rd_ptr, tcg_rn_ptr, tcg_rm_ptr);
} else {
TCGv_i32 tcg_opcode = tcg_const_i32(opcode);
- gen_helper_crypto_sha1_3reg(cpu_env, tcg_rd_regno,
- tcg_rn_regno, tcg_rm_regno, tcg_opcode);
+ gen_helper_crypto_sha1_3reg(tcg_rd_ptr, tcg_rn_ptr,
+ tcg_rm_ptr, tcg_opcode);
tcg_temp_free_i32(tcg_opcode);
}
- tcg_temp_free_i32(tcg_rd_regno);
- tcg_temp_free_i32(tcg_rn_regno);
- tcg_temp_free_i32(tcg_rm_regno);
+ tcg_temp_free_ptr(tcg_rd_ptr);
+ tcg_temp_free_ptr(tcg_rn_ptr);
+ tcg_temp_free_ptr(tcg_rm_ptr);
}
/* Crypto two-reg SHA
@@ -12589,9 +12595,9 @@ static void disas_crypto_two_reg_sha(DisasContext *s, uint32_t insn)
int opcode = extract32(insn, 12, 5);
int rn = extract32(insn, 5, 5);
int rd = extract32(insn, 0, 5);
- CryptoTwoOpEnvFn *genfn;
+ CryptoTwoOpFn *genfn;
int feature;
- TCGv_i32 tcg_rd_regno, tcg_rn_regno;
+ TCGv_ptr tcg_rd_ptr, tcg_rn_ptr;
if (size != 0) {
unallocated_encoding(s);
@@ -12625,13 +12631,13 @@ static void disas_crypto_two_reg_sha(DisasContext *s, uint32_t insn)
return;
}
- tcg_rd_regno = tcg_const_i32(rd << 1);
- tcg_rn_regno = tcg_const_i32(rn << 1);
+ tcg_rd_ptr = vec_full_reg_ptr(s, rd);
+ tcg_rn_ptr = vec_full_reg_ptr(s, rn);
- genfn(cpu_env, tcg_rd_regno, tcg_rn_regno);
+ genfn(tcg_rd_ptr, tcg_rn_ptr);
- tcg_temp_free_i32(tcg_rd_regno);
- tcg_temp_free_i32(tcg_rn_regno);
+ tcg_temp_free_ptr(tcg_rd_ptr);
+ tcg_temp_free_ptr(tcg_rn_ptr);
}
/* C3.6 Data processing - SIMD, inc Crypto
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 490e120b0b..0c35c39069 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -1560,6 +1560,13 @@ static inline void neon_store_reg64(TCGv_i64 var, int reg)
tcg_gen_st_i64(var, cpu_env, vfp_reg_offset(1, reg));
}
+static TCGv_ptr vfp_reg_ptr(bool dp, int reg)
+{
+ TCGv_ptr ret = tcg_temp_new_ptr();
+ tcg_gen_addi_ptr(ret, cpu_env, vfp_reg_offset(dp, reg));
+ return ret;
+}
+
#define tcg_gen_ld_f32 tcg_gen_ld_i32
#define tcg_gen_ld_f64 tcg_gen_ld_i64
#define tcg_gen_st_f32 tcg_gen_st_i32
@@ -5614,6 +5621,7 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
int u;
uint32_t imm, mask;
TCGv_i32 tmp, tmp2, tmp3, tmp4, tmp5;
+ TCGv_ptr ptr1, ptr2, ptr3;
TCGv_i64 tmp64;
/* FIXME: this access check should not take precedence over UNDEF
@@ -5660,34 +5668,34 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
if (!arm_dc_feature(s, ARM_FEATURE_V8_SHA1)) {
return 1;
}
- tmp = tcg_const_i32(rd);
- tmp2 = tcg_const_i32(rn);
- tmp3 = tcg_const_i32(rm);
+ ptr1 = vfp_reg_ptr(true, rd);
+ ptr2 = vfp_reg_ptr(true, rn);
+ ptr3 = vfp_reg_ptr(true, rm);
tmp4 = tcg_const_i32(size);
- gen_helper_crypto_sha1_3reg(cpu_env, tmp, tmp2, tmp3, tmp4);
+ gen_helper_crypto_sha1_3reg(ptr1, ptr2, ptr3, tmp4);
tcg_temp_free_i32(tmp4);
} else { /* SHA-256 */
if (!arm_dc_feature(s, ARM_FEATURE_V8_SHA256) || size == 3) {
return 1;
}
- tmp = tcg_const_i32(rd);
- tmp2 = tcg_const_i32(rn);
- tmp3 = tcg_const_i32(rm);
+ ptr1 = vfp_reg_ptr(true, rd);
+ ptr2 = vfp_reg_ptr(true, rn);
+ ptr3 = vfp_reg_ptr(true, rm);
switch (size) {
case 0:
- gen_helper_crypto_sha256h(cpu_env, tmp, tmp2, tmp3);
+ gen_helper_crypto_sha256h(ptr1, ptr2, ptr3);
break;
case 1:
- gen_helper_crypto_sha256h2(cpu_env, tmp, tmp2, tmp3);
+ gen_helper_crypto_sha256h2(ptr1, ptr2, ptr3);
break;
case 2:
- gen_helper_crypto_sha256su1(cpu_env, tmp, tmp2, tmp3);
+ gen_helper_crypto_sha256su1(ptr1, ptr2, ptr3);
break;
}
}
- tcg_temp_free_i32(tmp);
- tcg_temp_free_i32(tmp2);
- tcg_temp_free_i32(tmp3);
+ tcg_temp_free_ptr(ptr1);
+ tcg_temp_free_ptr(ptr2);
+ tcg_temp_free_ptr(ptr3);
return 0;
case NEON_3R_VPADD_VQRDMLAH:
@@ -7238,8 +7246,8 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
|| ((rm | rd) & 1)) {
return 1;
}
- tmp = tcg_const_i32(rd);
- tmp2 = tcg_const_i32(rm);
+ ptr1 = vfp_reg_ptr(true, rd);
+ ptr2 = vfp_reg_ptr(true, rm);
/* Bit 6 is the lowest opcode bit; it distinguishes between
* encryption (AESE/AESMC) and decryption (AESD/AESIMC)
@@ -7247,12 +7255,12 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
tmp3 = tcg_const_i32(extract32(insn, 6, 1));
if (op == NEON_2RM_AESE) {
- gen_helper_crypto_aese(cpu_env, tmp, tmp2, tmp3);
+ gen_helper_crypto_aese(ptr1, ptr2, tmp3);
} else {
- gen_helper_crypto_aesmc(cpu_env, tmp, tmp2, tmp3);
+ gen_helper_crypto_aesmc(ptr1, ptr2, tmp3);
}
- tcg_temp_free_i32(tmp);
- tcg_temp_free_i32(tmp2);
+ tcg_temp_free_ptr(ptr1);
+ tcg_temp_free_ptr(ptr2);
tcg_temp_free_i32(tmp3);
break;
case NEON_2RM_SHA1H:
@@ -7260,13 +7268,13 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
|| ((rm | rd) & 1)) {
return 1;
}
- tmp = tcg_const_i32(rd);
- tmp2 = tcg_const_i32(rm);
+ ptr1 = vfp_reg_ptr(true, rd);
+ ptr2 = vfp_reg_ptr(true, rm);
- gen_helper_crypto_sha1h(cpu_env, tmp, tmp2);
+ gen_helper_crypto_sha1h(ptr1, ptr2);
- tcg_temp_free_i32(tmp);
- tcg_temp_free_i32(tmp2);
+ tcg_temp_free_ptr(ptr1);
+ tcg_temp_free_ptr(ptr2);
break;
case NEON_2RM_SHA1SU1:
if ((rm | rd) & 1) {
@@ -7280,15 +7288,15 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
} else if (!arm_dc_feature(s, ARM_FEATURE_V8_SHA1)) {
return 1;
}
- tmp = tcg_const_i32(rd);
- tmp2 = tcg_const_i32(rm);
+ ptr1 = vfp_reg_ptr(true, rd);
+ ptr2 = vfp_reg_ptr(true, rm);
if (q) {
- gen_helper_crypto_sha256su0(cpu_env, tmp, tmp2);
+ gen_helper_crypto_sha256su0(ptr1, ptr2);
} else {
- gen_helper_crypto_sha1su1(cpu_env, tmp, tmp2);
+ gen_helper_crypto_sha1su1(ptr1, ptr2);
}
- tcg_temp_free_i32(tmp);
- tcg_temp_free_i32(tmp2);
+ tcg_temp_free_ptr(ptr1);
+ tcg_temp_free_ptr(ptr2);
break;
default:
elementwise:
--
2.14.3
^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [Qemu-devel] [PATCH 2/9] target/arm: Use pointers in crypto helpers
2017-12-18 17:30 ` [Qemu-devel] [PATCH 2/9] target/arm: Use pointers in crypto helpers Richard Henderson
@ 2018-01-11 18:45 ` Peter Maydell
0 siblings, 0 replies; 31+ messages in thread
From: Peter Maydell @ 2018-01-11 18:45 UTC (permalink / raw)
To: Richard Henderson; +Cc: QEMU Developers, qemu-arm
On 18 December 2017 at 17:30, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Rather than passing regnos to the helpers, pass pointers to the
> vector registers directly. This eliminates the need to pass in
> the environment pointer and reduces the number of places that
> directly access env->vfp.regs[].
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> target/arm/helper.h | 18 ++---
> target/arm/crypto_helper.c | 184 +++++++++++++++++----------------------------
> target/arm/translate-a64.c | 68 +++++++++--------
> target/arm/translate.c | 68 +++++++++--------
> 4 files changed, 154 insertions(+), 184 deletions(-)
>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
thanks
-- PMM
^ permalink raw reply [flat|nested] 31+ messages in thread
* [Qemu-devel] [PATCH 3/9] target/arm: Use pointers in neon zip/uzp helpers
2017-12-18 17:30 [Qemu-devel] [PATCH 0/9] target/arm: Prepatory work for SVE Richard Henderson
2017-12-18 17:30 ` [Qemu-devel] [PATCH 1/9] target/arm: Mark disas_set_insn_syndrome inline Richard Henderson
2017-12-18 17:30 ` [Qemu-devel] [PATCH 2/9] target/arm: Use pointers in crypto helpers Richard Henderson
@ 2017-12-18 17:30 ` Richard Henderson
2018-01-11 18:46 ` Peter Maydell
2017-12-18 17:30 ` [Qemu-devel] [PATCH 4/9] target/arm: Use pointers in neon tbl helper Richard Henderson
` (5 subsequent siblings)
8 siblings, 1 reply; 31+ messages in thread
From: Richard Henderson @ 2017-12-18 17:30 UTC (permalink / raw)
To: qemu-devel; +Cc: peter.maydell, qemu-arm
Rather than passing regnos to the helpers, pass pointers to the
vector registers directly. This eliminates the need to pass in
the environment pointer and reduces the number of places that
directly access env->vfp.regs[].
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/helper.h | 20 +++---
target/arm/neon_helper.c | 162 +++++++++++++++++++++++++----------------------
target/arm/translate.c | 42 ++++++------
3 files changed, 120 insertions(+), 104 deletions(-)
diff --git a/target/arm/helper.h b/target/arm/helper.h
index f9dd7432c1..d39ca11cbd 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -528,16 +528,16 @@ DEF_HELPER_3(iwmmxt_muladdsl, i64, i64, i32, i32)
DEF_HELPER_3(iwmmxt_muladdsw, i64, i64, i32, i32)
DEF_HELPER_3(iwmmxt_muladdswl, i64, i64, i32, i32)
-DEF_HELPER_3(neon_unzip8, void, env, i32, i32)
-DEF_HELPER_3(neon_unzip16, void, env, i32, i32)
-DEF_HELPER_3(neon_qunzip8, void, env, i32, i32)
-DEF_HELPER_3(neon_qunzip16, void, env, i32, i32)
-DEF_HELPER_3(neon_qunzip32, void, env, i32, i32)
-DEF_HELPER_3(neon_zip8, void, env, i32, i32)
-DEF_HELPER_3(neon_zip16, void, env, i32, i32)
-DEF_HELPER_3(neon_qzip8, void, env, i32, i32)
-DEF_HELPER_3(neon_qzip16, void, env, i32, i32)
-DEF_HELPER_3(neon_qzip32, void, env, i32, i32)
+DEF_HELPER_FLAGS_2(neon_unzip8, TCG_CALL_NO_RWG, void, ptr, ptr)
+DEF_HELPER_FLAGS_2(neon_unzip16, TCG_CALL_NO_RWG, void, ptr, ptr)
+DEF_HELPER_FLAGS_2(neon_qunzip8, TCG_CALL_NO_RWG, void, ptr, ptr)
+DEF_HELPER_FLAGS_2(neon_qunzip16, TCG_CALL_NO_RWG, void, ptr, ptr)
+DEF_HELPER_FLAGS_2(neon_qunzip32, TCG_CALL_NO_RWG, void, ptr, ptr)
+DEF_HELPER_FLAGS_2(neon_zip8, TCG_CALL_NO_RWG, void, ptr, ptr)
+DEF_HELPER_FLAGS_2(neon_zip16, TCG_CALL_NO_RWG, void, ptr, ptr)
+DEF_HELPER_FLAGS_2(neon_qzip8, TCG_CALL_NO_RWG, void, ptr, ptr)
+DEF_HELPER_FLAGS_2(neon_qzip16, TCG_CALL_NO_RWG, void, ptr, ptr)
+DEF_HELPER_FLAGS_2(neon_qzip32, TCG_CALL_NO_RWG, void, ptr, ptr)
DEF_HELPER_FLAGS_3(crypto_aese, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
DEF_HELPER_FLAGS_3(crypto_aesmc, TCG_CALL_NO_RWG, void, ptr, ptr, i32)
diff --git a/target/arm/neon_helper.c b/target/arm/neon_helper.c
index ebdf7c9b10..689491cad3 100644
--- a/target/arm/neon_helper.c
+++ b/target/arm/neon_helper.c
@@ -2027,12 +2027,12 @@ uint64_t HELPER(neon_acgt_f64)(uint64_t a, uint64_t b, void *fpstp)
#define ELEM(V, N, SIZE) (((V) >> ((N) * (SIZE))) & ((1ull << (SIZE)) - 1))
-void HELPER(neon_qunzip8)(CPUARMState *env, uint32_t rd, uint32_t rm)
+void HELPER(neon_qunzip8)(void *vd, void *vm)
{
- uint64_t zm0 = float64_val(env->vfp.regs[rm]);
- uint64_t zm1 = float64_val(env->vfp.regs[rm + 1]);
- uint64_t zd0 = float64_val(env->vfp.regs[rd]);
- uint64_t zd1 = float64_val(env->vfp.regs[rd + 1]);
+ uint64_t *rd = vd, *rm = vm;
+ uint64_t zd0 = rd[0], zd1 = rd[1];
+ uint64_t zm0 = rm[0], zm1 = rm[1];
+
uint64_t d0 = ELEM(zd0, 0, 8) | (ELEM(zd0, 2, 8) << 8)
| (ELEM(zd0, 4, 8) << 16) | (ELEM(zd0, 6, 8) << 24)
| (ELEM(zd1, 0, 8) << 32) | (ELEM(zd1, 2, 8) << 40)
@@ -2049,18 +2049,19 @@ void HELPER(neon_qunzip8)(CPUARMState *env, uint32_t rd, uint32_t rm)
| (ELEM(zm0, 5, 8) << 16) | (ELEM(zm0, 7, 8) << 24)
| (ELEM(zm1, 1, 8) << 32) | (ELEM(zm1, 3, 8) << 40)
| (ELEM(zm1, 5, 8) << 48) | (ELEM(zm1, 7, 8) << 56);
- env->vfp.regs[rm] = make_float64(m0);
- env->vfp.regs[rm + 1] = make_float64(m1);
- env->vfp.regs[rd] = make_float64(d0);
- env->vfp.regs[rd + 1] = make_float64(d1);
+
+ rm[0] = m0;
+ rm[1] = m1;
+ rd[0] = d0;
+ rd[1] = d1;
}
-void HELPER(neon_qunzip16)(CPUARMState *env, uint32_t rd, uint32_t rm)
+void HELPER(neon_qunzip16)(void *vd, void *vm)
{
- uint64_t zm0 = float64_val(env->vfp.regs[rm]);
- uint64_t zm1 = float64_val(env->vfp.regs[rm + 1]);
- uint64_t zd0 = float64_val(env->vfp.regs[rd]);
- uint64_t zd1 = float64_val(env->vfp.regs[rd + 1]);
+ uint64_t *rd = vd, *rm = vm;
+ uint64_t zd0 = rd[0], zd1 = rd[1];
+ uint64_t zm0 = rm[0], zm1 = rm[1];
+
uint64_t d0 = ELEM(zd0, 0, 16) | (ELEM(zd0, 2, 16) << 16)
| (ELEM(zd1, 0, 16) << 32) | (ELEM(zd1, 2, 16) << 48);
uint64_t d1 = ELEM(zm0, 0, 16) | (ELEM(zm0, 2, 16) << 16)
@@ -2069,32 +2070,35 @@ void HELPER(neon_qunzip16)(CPUARMState *env, uint32_t rd, uint32_t rm)
| (ELEM(zd1, 1, 16) << 32) | (ELEM(zd1, 3, 16) << 48);
uint64_t m1 = ELEM(zm0, 1, 16) | (ELEM(zm0, 3, 16) << 16)
| (ELEM(zm1, 1, 16) << 32) | (ELEM(zm1, 3, 16) << 48);
- env->vfp.regs[rm] = make_float64(m0);
- env->vfp.regs[rm + 1] = make_float64(m1);
- env->vfp.regs[rd] = make_float64(d0);
- env->vfp.regs[rd + 1] = make_float64(d1);
+
+ rm[0] = m0;
+ rm[1] = m1;
+ rd[0] = d0;
+ rd[1] = d1;
}
-void HELPER(neon_qunzip32)(CPUARMState *env, uint32_t rd, uint32_t rm)
+void HELPER(neon_qunzip32)(void *vd, void *vm)
{
- uint64_t zm0 = float64_val(env->vfp.regs[rm]);
- uint64_t zm1 = float64_val(env->vfp.regs[rm + 1]);
- uint64_t zd0 = float64_val(env->vfp.regs[rd]);
- uint64_t zd1 = float64_val(env->vfp.regs[rd + 1]);
+ uint64_t *rd = vd, *rm = vm;
+ uint64_t zd0 = rd[0], zd1 = rd[1];
+ uint64_t zm0 = rm[0], zm1 = rm[1];
+
uint64_t d0 = ELEM(zd0, 0, 32) | (ELEM(zd1, 0, 32) << 32);
uint64_t d1 = ELEM(zm0, 0, 32) | (ELEM(zm1, 0, 32) << 32);
uint64_t m0 = ELEM(zd0, 1, 32) | (ELEM(zd1, 1, 32) << 32);
uint64_t m1 = ELEM(zm0, 1, 32) | (ELEM(zm1, 1, 32) << 32);
- env->vfp.regs[rm] = make_float64(m0);
- env->vfp.regs[rm + 1] = make_float64(m1);
- env->vfp.regs[rd] = make_float64(d0);
- env->vfp.regs[rd + 1] = make_float64(d1);
+
+ rm[0] = m0;
+ rm[1] = m1;
+ rd[0] = d0;
+ rd[1] = d1;
}
-void HELPER(neon_unzip8)(CPUARMState *env, uint32_t rd, uint32_t rm)
+void HELPER(neon_unzip8)(void *vd, void *vm)
{
- uint64_t zm = float64_val(env->vfp.regs[rm]);
- uint64_t zd = float64_val(env->vfp.regs[rd]);
+ uint64_t *rd = vd, *rm = vm;
+ uint64_t zd = rd[0], zm = rm[0];
+
uint64_t d0 = ELEM(zd, 0, 8) | (ELEM(zd, 2, 8) << 8)
| (ELEM(zd, 4, 8) << 16) | (ELEM(zd, 6, 8) << 24)
| (ELEM(zm, 0, 8) << 32) | (ELEM(zm, 2, 8) << 40)
@@ -2103,28 +2107,31 @@ void HELPER(neon_unzip8)(CPUARMState *env, uint32_t rd, uint32_t rm)
| (ELEM(zd, 5, 8) << 16) | (ELEM(zd, 7, 8) << 24)
| (ELEM(zm, 1, 8) << 32) | (ELEM(zm, 3, 8) << 40)
| (ELEM(zm, 5, 8) << 48) | (ELEM(zm, 7, 8) << 56);
- env->vfp.regs[rm] = make_float64(m0);
- env->vfp.regs[rd] = make_float64(d0);
+
+ rm[0] = m0;
+ rd[0] = d0;
}
-void HELPER(neon_unzip16)(CPUARMState *env, uint32_t rd, uint32_t rm)
+void HELPER(neon_unzip16)(void *vd, void *vm)
{
- uint64_t zm = float64_val(env->vfp.regs[rm]);
- uint64_t zd = float64_val(env->vfp.regs[rd]);
+ uint64_t *rd = vd, *rm = vm;
+ uint64_t zd = rd[0], zm = rm[0];
+
uint64_t d0 = ELEM(zd, 0, 16) | (ELEM(zd, 2, 16) << 16)
| (ELEM(zm, 0, 16) << 32) | (ELEM(zm, 2, 16) << 48);
uint64_t m0 = ELEM(zd, 1, 16) | (ELEM(zd, 3, 16) << 16)
| (ELEM(zm, 1, 16) << 32) | (ELEM(zm, 3, 16) << 48);
- env->vfp.regs[rm] = make_float64(m0);
- env->vfp.regs[rd] = make_float64(d0);
+
+ rm[0] = m0;
+ rd[0] = d0;
}
-void HELPER(neon_qzip8)(CPUARMState *env, uint32_t rd, uint32_t rm)
+void HELPER(neon_qzip8)(void *vd, void *vm)
{
- uint64_t zm0 = float64_val(env->vfp.regs[rm]);
- uint64_t zm1 = float64_val(env->vfp.regs[rm + 1]);
- uint64_t zd0 = float64_val(env->vfp.regs[rd]);
- uint64_t zd1 = float64_val(env->vfp.regs[rd + 1]);
+ uint64_t *rd = vd, *rm = vm;
+ uint64_t zd0 = rd[0], zd1 = rd[1];
+ uint64_t zm0 = rm[0], zm1 = rm[1];
+
uint64_t d0 = ELEM(zd0, 0, 8) | (ELEM(zm0, 0, 8) << 8)
| (ELEM(zd0, 1, 8) << 16) | (ELEM(zm0, 1, 8) << 24)
| (ELEM(zd0, 2, 8) << 32) | (ELEM(zm0, 2, 8) << 40)
@@ -2141,18 +2148,19 @@ void HELPER(neon_qzip8)(CPUARMState *env, uint32_t rd, uint32_t rm)
| (ELEM(zd1, 5, 8) << 16) | (ELEM(zm1, 5, 8) << 24)
| (ELEM(zd1, 6, 8) << 32) | (ELEM(zm1, 6, 8) << 40)
| (ELEM(zd1, 7, 8) << 48) | (ELEM(zm1, 7, 8) << 56);
- env->vfp.regs[rm] = make_float64(m0);
- env->vfp.regs[rm + 1] = make_float64(m1);
- env->vfp.regs[rd] = make_float64(d0);
- env->vfp.regs[rd + 1] = make_float64(d1);
+
+ rm[0] = m0;
+ rm[1] = m1;
+ rd[0] = d0;
+ rd[1] = d1;
}
-void HELPER(neon_qzip16)(CPUARMState *env, uint32_t rd, uint32_t rm)
+void HELPER(neon_qzip16)(void *vd, void *vm)
{
- uint64_t zm0 = float64_val(env->vfp.regs[rm]);
- uint64_t zm1 = float64_val(env->vfp.regs[rm + 1]);
- uint64_t zd0 = float64_val(env->vfp.regs[rd]);
- uint64_t zd1 = float64_val(env->vfp.regs[rd + 1]);
+ uint64_t *rd = vd, *rm = vm;
+ uint64_t zd0 = rd[0], zd1 = rd[1];
+ uint64_t zm0 = rm[0], zm1 = rm[1];
+
uint64_t d0 = ELEM(zd0, 0, 16) | (ELEM(zm0, 0, 16) << 16)
| (ELEM(zd0, 1, 16) << 32) | (ELEM(zm0, 1, 16) << 48);
uint64_t d1 = ELEM(zd0, 2, 16) | (ELEM(zm0, 2, 16) << 16)
@@ -2161,32 +2169,35 @@ void HELPER(neon_qzip16)(CPUARMState *env, uint32_t rd, uint32_t rm)
| (ELEM(zd1, 1, 16) << 32) | (ELEM(zm1, 1, 16) << 48);
uint64_t m1 = ELEM(zd1, 2, 16) | (ELEM(zm1, 2, 16) << 16)
| (ELEM(zd1, 3, 16) << 32) | (ELEM(zm1, 3, 16) << 48);
- env->vfp.regs[rm] = make_float64(m0);
- env->vfp.regs[rm + 1] = make_float64(m1);
- env->vfp.regs[rd] = make_float64(d0);
- env->vfp.regs[rd + 1] = make_float64(d1);
+
+ rm[0] = m0;
+ rm[1] = m1;
+ rd[0] = d0;
+ rd[1] = d1;
}
-void HELPER(neon_qzip32)(CPUARMState *env, uint32_t rd, uint32_t rm)
+void HELPER(neon_qzip32)(void *vd, void *vm)
{
- uint64_t zm0 = float64_val(env->vfp.regs[rm]);
- uint64_t zm1 = float64_val(env->vfp.regs[rm + 1]);
- uint64_t zd0 = float64_val(env->vfp.regs[rd]);
- uint64_t zd1 = float64_val(env->vfp.regs[rd + 1]);
+ uint64_t *rd = vd, *rm = vm;
+ uint64_t zd0 = rd[0], zd1 = rd[1];
+ uint64_t zm0 = rm[0], zm1 = rm[1];
+
uint64_t d0 = ELEM(zd0, 0, 32) | (ELEM(zm0, 0, 32) << 32);
uint64_t d1 = ELEM(zd0, 1, 32) | (ELEM(zm0, 1, 32) << 32);
uint64_t m0 = ELEM(zd1, 0, 32) | (ELEM(zm1, 0, 32) << 32);
uint64_t m1 = ELEM(zd1, 1, 32) | (ELEM(zm1, 1, 32) << 32);
- env->vfp.regs[rm] = make_float64(m0);
- env->vfp.regs[rm + 1] = make_float64(m1);
- env->vfp.regs[rd] = make_float64(d0);
- env->vfp.regs[rd + 1] = make_float64(d1);
+
+ rm[0] = m0;
+ rm[1] = m1;
+ rd[0] = d0;
+ rd[1] = d1;
}
-void HELPER(neon_zip8)(CPUARMState *env, uint32_t rd, uint32_t rm)
+void HELPER(neon_zip8)(void *vd, void *vm)
{
- uint64_t zm = float64_val(env->vfp.regs[rm]);
- uint64_t zd = float64_val(env->vfp.regs[rd]);
+ uint64_t *rd = vd, *rm = vm;
+ uint64_t zd = rd[0], zm = rm[0];
+
uint64_t d0 = ELEM(zd, 0, 8) | (ELEM(zm, 0, 8) << 8)
| (ELEM(zd, 1, 8) << 16) | (ELEM(zm, 1, 8) << 24)
| (ELEM(zd, 2, 8) << 32) | (ELEM(zm, 2, 8) << 40)
@@ -2195,20 +2206,23 @@ void HELPER(neon_zip8)(CPUARMState *env, uint32_t rd, uint32_t rm)
| (ELEM(zd, 5, 8) << 16) | (ELEM(zm, 5, 8) << 24)
| (ELEM(zd, 6, 8) << 32) | (ELEM(zm, 6, 8) << 40)
| (ELEM(zd, 7, 8) << 48) | (ELEM(zm, 7, 8) << 56);
- env->vfp.regs[rm] = make_float64(m0);
- env->vfp.regs[rd] = make_float64(d0);
+
+ rm[0] = m0;
+ rd[0] = d0;
}
-void HELPER(neon_zip16)(CPUARMState *env, uint32_t rd, uint32_t rm)
+void HELPER(neon_zip16)(void *vd, void *vm)
{
- uint64_t zm = float64_val(env->vfp.regs[rm]);
- uint64_t zd = float64_val(env->vfp.regs[rd]);
+ uint64_t *rd = vd, *rm = vm;
+ uint64_t zd = rd[0], zm = rm[0];
+
uint64_t d0 = ELEM(zd, 0, 16) | (ELEM(zm, 0, 16) << 16)
| (ELEM(zd, 1, 16) << 32) | (ELEM(zm, 1, 16) << 48);
uint64_t m0 = ELEM(zd, 2, 16) | (ELEM(zm, 2, 16) << 16)
| (ELEM(zd, 3, 16) << 32) | (ELEM(zm, 3, 16) << 48);
- env->vfp.regs[rm] = make_float64(m0);
- env->vfp.regs[rd] = make_float64(d0);
+
+ rm[0] = m0;
+ rd[0] = d0;
}
/* Helper function for 64 bit polynomial multiply case:
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 0c35c39069..68e928640f 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -4688,22 +4688,23 @@ static inline TCGv_i32 neon_get_scalar(int size, int reg)
static int gen_neon_unzip(int rd, int rm, int size, int q)
{
- TCGv_i32 tmp, tmp2;
+ TCGv_ptr pd, pm;
+
if (!q && size == 2) {
return 1;
}
- tmp = tcg_const_i32(rd);
- tmp2 = tcg_const_i32(rm);
+ pd = vfp_reg_ptr(true, rd);
+ pm = vfp_reg_ptr(true, rm);
if (q) {
switch (size) {
case 0:
- gen_helper_neon_qunzip8(cpu_env, tmp, tmp2);
+ gen_helper_neon_qunzip8(pd, pm);
break;
case 1:
- gen_helper_neon_qunzip16(cpu_env, tmp, tmp2);
+ gen_helper_neon_qunzip16(pd, pm);
break;
case 2:
- gen_helper_neon_qunzip32(cpu_env, tmp, tmp2);
+ gen_helper_neon_qunzip32(pd, pm);
break;
default:
abort();
@@ -4711,38 +4712,39 @@ static int gen_neon_unzip(int rd, int rm, int size, int q)
} else {
switch (size) {
case 0:
- gen_helper_neon_unzip8(cpu_env, tmp, tmp2);
+ gen_helper_neon_unzip8(pd, pm);
break;
case 1:
- gen_helper_neon_unzip16(cpu_env, tmp, tmp2);
+ gen_helper_neon_unzip16(pd, pm);
break;
default:
abort();
}
}
- tcg_temp_free_i32(tmp);
- tcg_temp_free_i32(tmp2);
+ tcg_temp_free_ptr(pd);
+ tcg_temp_free_ptr(pm);
return 0;
}
static int gen_neon_zip(int rd, int rm, int size, int q)
{
- TCGv_i32 tmp, tmp2;
+ TCGv_ptr pd, pm;
+
if (!q && size == 2) {
return 1;
}
- tmp = tcg_const_i32(rd);
- tmp2 = tcg_const_i32(rm);
+ pd = vfp_reg_ptr(true, rd);
+ pm = vfp_reg_ptr(true, rm);
if (q) {
switch (size) {
case 0:
- gen_helper_neon_qzip8(cpu_env, tmp, tmp2);
+ gen_helper_neon_qzip8(pd, pm);
break;
case 1:
- gen_helper_neon_qzip16(cpu_env, tmp, tmp2);
+ gen_helper_neon_qzip16(pd, pm);
break;
case 2:
- gen_helper_neon_qzip32(cpu_env, tmp, tmp2);
+ gen_helper_neon_qzip32(pd, pm);
break;
default:
abort();
@@ -4750,17 +4752,17 @@ static int gen_neon_zip(int rd, int rm, int size, int q)
} else {
switch (size) {
case 0:
- gen_helper_neon_zip8(cpu_env, tmp, tmp2);
+ gen_helper_neon_zip8(pd, pm);
break;
case 1:
- gen_helper_neon_zip16(cpu_env, tmp, tmp2);
+ gen_helper_neon_zip16(pd, pm);
break;
default:
abort();
}
}
- tcg_temp_free_i32(tmp);
- tcg_temp_free_i32(tmp2);
+ tcg_temp_free_ptr(pd);
+ tcg_temp_free_ptr(pm);
return 0;
}
--
2.14.3
^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [Qemu-devel] [PATCH 3/9] target/arm: Use pointers in neon zip/uzp helpers
2017-12-18 17:30 ` [Qemu-devel] [PATCH 3/9] target/arm: Use pointers in neon zip/uzp helpers Richard Henderson
@ 2018-01-11 18:46 ` Peter Maydell
0 siblings, 0 replies; 31+ messages in thread
From: Peter Maydell @ 2018-01-11 18:46 UTC (permalink / raw)
To: Richard Henderson; +Cc: QEMU Developers, qemu-arm
On 18 December 2017 at 17:30, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Rather than passing regnos to the helpers, pass pointers to the
> vector registers directly. This eliminates the need to pass in
> the environment pointer and reduces the number of places that
> directly access env->vfp.regs[].
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> target/arm/helper.h | 20 +++---
> target/arm/neon_helper.c | 162 +++++++++++++++++++++++++----------------------
> target/arm/translate.c | 42 ++++++------
> 3 files changed, 120 insertions(+), 104 deletions(-)
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
thanks
-- PMM
^ permalink raw reply [flat|nested] 31+ messages in thread
* [Qemu-devel] [PATCH 4/9] target/arm: Use pointers in neon tbl helper
2017-12-18 17:30 [Qemu-devel] [PATCH 0/9] target/arm: Prepatory work for SVE Richard Henderson
` (2 preceding siblings ...)
2017-12-18 17:30 ` [Qemu-devel] [PATCH 3/9] target/arm: Use pointers in neon zip/uzp helpers Richard Henderson
@ 2017-12-18 17:30 ` Richard Henderson
2018-01-11 18:46 ` Peter Maydell
2017-12-18 17:30 ` [Qemu-devel] [PATCH 5/9] target/arm: Add aa32_vfp_dreg/aa64_vfp_qreg helpers Richard Henderson
` (4 subsequent siblings)
8 siblings, 1 reply; 31+ messages in thread
From: Richard Henderson @ 2017-12-18 17:30 UTC (permalink / raw)
To: qemu-devel; +Cc: peter.maydell, qemu-arm
Rather than passing a regno to the helper, pass pointers to the
vector register directly. This eliminates the need to pass in
the environment pointer and reduces the number of places that
directly access env->vfp.regs[].
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/helper.h | 2 +-
target/arm/op_helper.c | 17 +++++++----------
target/arm/translate.c | 8 ++++----
3 files changed, 12 insertions(+), 15 deletions(-)
diff --git a/target/arm/helper.h b/target/arm/helper.h
index d39ca11cbd..206e39a207 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -201,7 +201,7 @@ DEF_HELPER_FLAGS_2(rsqrte_f32, TCG_CALL_NO_RWG, f32, f32, ptr)
DEF_HELPER_FLAGS_2(rsqrte_f64, TCG_CALL_NO_RWG, f64, f64, ptr)
DEF_HELPER_2(recpe_u32, i32, i32, ptr)
DEF_HELPER_FLAGS_2(rsqrte_u32, TCG_CALL_NO_RWG, i32, i32, ptr)
-DEF_HELPER_5(neon_tbl, i32, env, i32, i32, i32, i32)
+DEF_HELPER_FLAGS_4(neon_tbl, TCG_CALL_NO_RWG, i32, i32, i32, ptr, i32)
DEF_HELPER_3(shl_cc, i32, env, i32, i32)
DEF_HELPER_3(shr_cc, i32, env, i32, i32)
diff --git a/target/arm/op_helper.c b/target/arm/op_helper.c
index c2bb4f3a43..df3aab170b 100644
--- a/target/arm/op_helper.c
+++ b/target/arm/op_helper.c
@@ -54,20 +54,17 @@ static int exception_target_el(CPUARMState *env)
return target_el;
}
-uint32_t HELPER(neon_tbl)(CPUARMState *env, uint32_t ireg, uint32_t def,
- uint32_t rn, uint32_t maxindex)
+uint32_t HELPER(neon_tbl)(uint32_t ireg, uint32_t def, void *vn,
+ uint32_t maxindex)
{
- uint32_t val;
- uint32_t tmp;
- int index;
- int shift;
- uint64_t *table;
- table = (uint64_t *)&env->vfp.regs[rn];
+ uint32_t val, shift;
+ uint64_t *table = vn;
+
val = 0;
for (shift = 0; shift < 32; shift += 8) {
- index = (ireg >> shift) & 0xff;
+ uint32_t index = (ireg >> shift) & 0xff;
if (index < maxindex) {
- tmp = (table[index >> 3] >> ((index & 7) << 3)) & 0xff;
+ uint32_t tmp = (table[index >> 3] >> ((index & 7) << 3)) & 0xff;
val |= tmp << shift;
} else {
val |= def & (0xff << shift);
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 68e928640f..55afd29b21 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -7623,9 +7623,9 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
tcg_gen_movi_i32(tmp, 0);
}
tmp2 = neon_load_reg(rm, 0);
- tmp4 = tcg_const_i32(rn);
+ ptr1 = vfp_reg_ptr(true, rn);
tmp5 = tcg_const_i32(n);
- gen_helper_neon_tbl(tmp2, cpu_env, tmp2, tmp, tmp4, tmp5);
+ gen_helper_neon_tbl(tmp2, tmp2, tmp, ptr1, tmp5);
tcg_temp_free_i32(tmp);
if (insn & (1 << 6)) {
tmp = neon_load_reg(rd, 1);
@@ -7634,9 +7634,9 @@ static int disas_neon_data_insn(DisasContext *s, uint32_t insn)
tcg_gen_movi_i32(tmp, 0);
}
tmp3 = neon_load_reg(rm, 1);
- gen_helper_neon_tbl(tmp3, cpu_env, tmp3, tmp, tmp4, tmp5);
+ gen_helper_neon_tbl(tmp3, tmp3, tmp, ptr1, tmp5);
tcg_temp_free_i32(tmp5);
- tcg_temp_free_i32(tmp4);
+ tcg_temp_free_ptr(ptr1);
neon_store_reg(rd, 0, tmp2);
neon_store_reg(rd, 1, tmp3);
tcg_temp_free_i32(tmp);
--
2.14.3
^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [Qemu-devel] [PATCH 4/9] target/arm: Use pointers in neon tbl helper
2017-12-18 17:30 ` [Qemu-devel] [PATCH 4/9] target/arm: Use pointers in neon tbl helper Richard Henderson
@ 2018-01-11 18:46 ` Peter Maydell
0 siblings, 0 replies; 31+ messages in thread
From: Peter Maydell @ 2018-01-11 18:46 UTC (permalink / raw)
To: Richard Henderson; +Cc: QEMU Developers, qemu-arm
On 18 December 2017 at 17:30, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Rather than passing a regno to the helper, pass pointers to the
> vector register directly. This eliminates the need to pass in
> the environment pointer and reduces the number of places that
> directly access env->vfp.regs[].
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> target/arm/helper.h | 2 +-
> target/arm/op_helper.c | 17 +++++++----------
> target/arm/translate.c | 8 ++++----
> 3 files changed, 12 insertions(+), 15 deletions(-)
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
thanks
-- PMM
^ permalink raw reply [flat|nested] 31+ messages in thread
* [Qemu-devel] [PATCH 5/9] target/arm: Add aa32_vfp_dreg/aa64_vfp_qreg helpers
2017-12-18 17:30 [Qemu-devel] [PATCH 0/9] target/arm: Prepatory work for SVE Richard Henderson
` (3 preceding siblings ...)
2017-12-18 17:30 ` [Qemu-devel] [PATCH 4/9] target/arm: Use pointers in neon tbl helper Richard Henderson
@ 2017-12-18 17:30 ` Richard Henderson
2018-01-11 18:39 ` Peter Maydell
` (2 more replies)
2017-12-18 17:30 ` [Qemu-devel] [PATCH 6/9] vmstate: Add VMSTATE_UINT64_SUB_ARRAY Richard Henderson
` (3 subsequent siblings)
8 siblings, 3 replies; 31+ messages in thread
From: Richard Henderson @ 2017-12-18 17:30 UTC (permalink / raw)
To: qemu-devel; +Cc: peter.maydell, qemu-arm
Helpers that return a pointer into env->vfp.regs so that we isolate
the logic of how to index the regs array for different cpu modes.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/cpu.h | 20 +++++++++++++++++++-
linux-user/signal.c | 22 ++++++++++++----------
target/arm/arch_dump.c | 8 +++++---
target/arm/helper-a64.c | 13 +++++++------
target/arm/helper.c | 32 ++++++++++++++++++++------------
target/arm/kvm32.c | 4 ++--
target/arm/kvm64.c | 31 ++++++++++---------------------
target/arm/machine.c | 2 +-
target/arm/translate-a64.c | 25 ++++++++-----------------
target/arm/translate.c | 16 +++++++++-------
10 files changed, 93 insertions(+), 80 deletions(-)
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 7a705a09a1..e1a8e2880d 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -493,7 +493,7 @@ typedef struct CPUARMState {
* the two execution states, and means we do not need to explicitly
* map these registers when changing states.
*/
- float64 regs[64] QEMU_ALIGNED(16);
+ uint64_t regs[64] QEMU_ALIGNED(16);
uint32_t xregs[16];
/* We store these fpcsr fields separately for convenience. */
@@ -2899,4 +2899,22 @@ static inline void *arm_get_el_change_hook_opaque(ARMCPU *cpu)
return cpu->el_change_hook_opaque;
}
+/**
+ * aa32_vfp_dreg:
+ * Return a pointer to the Dn register within env in 32-bit mode.
+ */
+static inline uint64_t *aa32_vfp_dreg(CPUARMState *env, unsigned regno)
+{
+ return &env->vfp.regs[regno];
+}
+
+/**
+ * aa64_vfp_dreg:
+ * Return a pointer to the Qn register within env in 64-bit mode.
+ */
+static inline uint64_t *aa64_vfp_qreg(CPUARMState *env, unsigned regno)
+{
+ return &env->vfp.regs[2 * regno];
+}
+
#endif
diff --git a/linux-user/signal.c b/linux-user/signal.c
index cf35473671..a9a3f41721 100644
--- a/linux-user/signal.c
+++ b/linux-user/signal.c
@@ -1487,12 +1487,13 @@ static int target_setup_sigframe(struct target_rt_sigframe *sf,
}
for (i = 0; i < 32; i++) {
+ uint64_t *q = aa64_vfp_qreg(env, i);
#ifdef TARGET_WORDS_BIGENDIAN
- __put_user(env->vfp.regs[i * 2], &aux->fpsimd.vregs[i * 2 + 1]);
- __put_user(env->vfp.regs[i * 2 + 1], &aux->fpsimd.vregs[i * 2]);
+ __put_user(q[0], &aux->fpsimd.vregs[i * 2 + 1]);
+ __put_user(q[1], &aux->fpsimd.vregs[i * 2]);
#else
- __put_user(env->vfp.regs[i * 2], &aux->fpsimd.vregs[i * 2]);
- __put_user(env->vfp.regs[i * 2 + 1], &aux->fpsimd.vregs[i * 2 + 1]);
+ __put_user(q[0], &aux->fpsimd.vregs[i * 2]);
+ __put_user(q[1], &aux->fpsimd.vregs[i * 2 + 1]);
#endif
}
__put_user(vfp_get_fpsr(env), &aux->fpsimd.fpsr);
@@ -1539,12 +1540,13 @@ static int target_restore_sigframe(CPUARMState *env,
}
for (i = 0; i < 32; i++) {
+ uint64_t *q = aa64_vfp_qreg(env, i);
#ifdef TARGET_WORDS_BIGENDIAN
- __get_user(env->vfp.regs[i * 2], &aux->fpsimd.vregs[i * 2 + 1]);
- __get_user(env->vfp.regs[i * 2 + 1], &aux->fpsimd.vregs[i * 2]);
+ __get_user(q[0], &aux->fpsimd.vregs[i * 2 + 1]);
+ __get_user(q[1], &aux->fpsimd.vregs[i * 2]);
#else
- __get_user(env->vfp.regs[i * 2], &aux->fpsimd.vregs[i * 2]);
- __get_user(env->vfp.regs[i * 2 + 1], &aux->fpsimd.vregs[i * 2 + 1]);
+ __get_user(q[0], &aux->fpsimd.vregs[i * 2]);
+ __get_user(q[1], &aux->fpsimd.vregs[i * 2 + 1]);
#endif
}
__get_user(fpsr, &aux->fpsimd.fpsr);
@@ -1899,7 +1901,7 @@ static abi_ulong *setup_sigframe_v2_vfp(abi_ulong *regspace, CPUARMState *env)
__put_user(TARGET_VFP_MAGIC, &vfpframe->magic);
__put_user(sizeof(*vfpframe), &vfpframe->size);
for (i = 0; i < 32; i++) {
- __put_user(float64_val(env->vfp.regs[i]), &vfpframe->ufp.fpregs[i]);
+ __put_user(*aa32_vfp_dreg(env, i), &vfpframe->ufp.fpregs[i]);
}
__put_user(vfp_get_fpscr(env), &vfpframe->ufp.fpscr);
__put_user(env->vfp.xregs[ARM_VFP_FPEXC], &vfpframe->ufp_exc.fpexc);
@@ -2206,7 +2208,7 @@ static abi_ulong *restore_sigframe_v2_vfp(CPUARMState *env, abi_ulong *regspace)
return 0;
}
for (i = 0; i < 32; i++) {
- __get_user(float64_val(env->vfp.regs[i]), &vfpframe->ufp.fpregs[i]);
+ __get_user(*aa32_vfp_dreg(env, i), &vfpframe->ufp.fpregs[i]);
}
__get_user(fpscr, &vfpframe->ufp.fpscr);
vfp_set_fpscr(env, fpscr);
diff --git a/target/arm/arch_dump.c b/target/arm/arch_dump.c
index 9e5b2fb31c..26a2c09868 100644
--- a/target/arm/arch_dump.c
+++ b/target/arm/arch_dump.c
@@ -99,8 +99,10 @@ static int aarch64_write_elf64_prfpreg(WriteCoreDumpFunction f,
aarch64_note_init(¬e, s, "CORE", 5, NT_PRFPREG, sizeof(note.vfp));
- for (i = 0; i < 64; ++i) {
- note.vfp.vregs[i] = cpu_to_dump64(s, float64_val(env->vfp.regs[i]));
+ for (i = 0; i < 32; ++i) {
+ uint64_t *q = aa64_vfp_qreg(env, i);
+ note.vfp.vregs[2*i + 0] = cpu_to_dump64(s, q[0]);
+ note.vfp.vregs[2*i + 1] = cpu_to_dump64(s, q[1]);
}
if (s->dump_info.d_endian == ELFDATA2MSB) {
@@ -229,7 +231,7 @@ static int arm_write_elf32_vfp(WriteCoreDumpFunction f, CPUARMState *env,
arm_note_init(¬e, s, "LINUX", 6, NT_ARM_VFP, sizeof(note.vfp));
for (i = 0; i < 32; ++i) {
- note.vfp.vregs[i] = cpu_to_dump64(s, float64_val(env->vfp.regs[i]));
+ note.vfp.vregs[i] = cpu_to_dump64(s, *aa32_vfp_dreg(env, i));
}
note.vfp.fpscr = cpu_to_dump32(s, vfp_get_fpscr(env));
diff --git a/target/arm/helper-a64.c b/target/arm/helper-a64.c
index 0bcf02eeb5..750a088803 100644
--- a/target/arm/helper-a64.c
+++ b/target/arm/helper-a64.c
@@ -146,20 +146,21 @@ uint64_t HELPER(simd_tbl)(CPUARMState *env, uint64_t result, uint64_t indices,
* the table starts, and numregs the number of registers in the table.
* We return the results of the lookups.
*/
- int shift;
+ unsigned shift;
for (shift = 0; shift < 64; shift += 8) {
- int index = extract64(indices, shift, 8);
+ unsigned index = extract64(indices, shift, 8);
if (index < 16 * numregs) {
/* Convert index (a byte offset into the virtual table
* which is a series of 128-bit vectors concatenated)
- * into the correct vfp.regs[] element plus a bit offset
+ * into the correct register element plus a bit offset
* into that element, bearing in mind that the table
* can wrap around from V31 to V0.
*/
- int elt = (rn * 2 + (index >> 3)) % 64;
- int bitidx = (index & 7) * 8;
- uint64_t val = extract64(env->vfp.regs[elt], bitidx, 8);
+ unsigned elt = (rn * 2 + (index >> 3)) % 64;
+ unsigned bitidx = (index & 7) * 8;
+ uint64_t *q = aa64_vfp_qreg(env, elt >> 1);
+ uint64_t val = extract64(q[elt & 1], bitidx, 8);
result = deposit64(result, shift, 8, val);
}
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 02d1b57501..7f304111f0 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -64,15 +64,16 @@ static int vfp_gdb_get_reg(CPUARMState *env, uint8_t *buf, int reg)
/* VFP data registers are always little-endian. */
nregs = arm_feature(env, ARM_FEATURE_VFP3) ? 32 : 16;
if (reg < nregs) {
- stfq_le_p(buf, env->vfp.regs[reg]);
+ stfq_le_p(buf, *aa32_vfp_dreg(env, reg));
return 8;
}
if (arm_feature(env, ARM_FEATURE_NEON)) {
/* Aliases for Q regs. */
nregs += 16;
if (reg < nregs) {
- stfq_le_p(buf, env->vfp.regs[(reg - 32) * 2]);
- stfq_le_p(buf + 8, env->vfp.regs[(reg - 32) * 2 + 1]);
+ uint64_t *q = aa32_vfp_dreg(env, (reg - 32) * 2);
+ stfq_le_p(buf, q[0]);
+ stfq_le_p(buf + 8, q[1]);
return 16;
}
}
@@ -90,14 +91,15 @@ static int vfp_gdb_set_reg(CPUARMState *env, uint8_t *buf, int reg)
nregs = arm_feature(env, ARM_FEATURE_VFP3) ? 32 : 16;
if (reg < nregs) {
- env->vfp.regs[reg] = ldfq_le_p(buf);
+ *aa32_vfp_dreg(env, reg) = ldfq_le_p(buf);
return 8;
}
if (arm_feature(env, ARM_FEATURE_NEON)) {
nregs += 16;
if (reg < nregs) {
- env->vfp.regs[(reg - 32) * 2] = ldfq_le_p(buf);
- env->vfp.regs[(reg - 32) * 2 + 1] = ldfq_le_p(buf + 8);
+ uint64_t *q = aa32_vfp_dreg(env, (reg - 32) * 2);
+ q[0] = ldfq_le_p(buf);
+ q[1] = ldfq_le_p(buf + 8);
return 16;
}
}
@@ -114,9 +116,12 @@ static int aarch64_fpu_gdb_get_reg(CPUARMState *env, uint8_t *buf, int reg)
switch (reg) {
case 0 ... 31:
/* 128 bit FP register */
- stfq_le_p(buf, env->vfp.regs[reg * 2]);
- stfq_le_p(buf + 8, env->vfp.regs[reg * 2 + 1]);
- return 16;
+ {
+ uint64_t *q = aa64_vfp_qreg(env, reg);
+ stfq_le_p(buf, q[0]);
+ stfq_le_p(buf + 8, q[1]);
+ return 16;
+ }
case 32:
/* FPSR */
stl_p(buf, vfp_get_fpsr(env));
@@ -135,9 +140,12 @@ static int aarch64_fpu_gdb_set_reg(CPUARMState *env, uint8_t *buf, int reg)
switch (reg) {
case 0 ... 31:
/* 128 bit FP register */
- env->vfp.regs[reg * 2] = ldfq_le_p(buf);
- env->vfp.regs[reg * 2 + 1] = ldfq_le_p(buf + 8);
- return 16;
+ {
+ uint64_t *q = aa64_vfp_qreg(env, reg);
+ q[0] = ldfq_le_p(buf);
+ q[1] = ldfq_le_p(buf + 8);
+ return 16;
+ }
case 32:
/* FPSR */
vfp_set_fpsr(env, ldl_p(buf));
diff --git a/target/arm/kvm32.c b/target/arm/kvm32.c
index f925a21481..f77c9c494b 100644
--- a/target/arm/kvm32.c
+++ b/target/arm/kvm32.c
@@ -358,7 +358,7 @@ int kvm_arch_put_registers(CPUState *cs, int level)
/* VFP registers */
r.id = KVM_REG_ARM | KVM_REG_SIZE_U64 | KVM_REG_ARM_VFP;
for (i = 0; i < 32; i++) {
- r.addr = (uintptr_t)(&env->vfp.regs[i]);
+ r.addr = (uintptr_t)aa32_vfp_dreg(env, i);
ret = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, &r);
if (ret) {
return ret;
@@ -445,7 +445,7 @@ int kvm_arch_get_registers(CPUState *cs)
/* VFP registers */
r.id = KVM_REG_ARM | KVM_REG_SIZE_U64 | KVM_REG_ARM_VFP;
for (i = 0; i < 32; i++) {
- r.addr = (uintptr_t)(&env->vfp.regs[i]);
+ r.addr = (uintptr_t)aa32_vfp_dreg(env, i);
ret = kvm_vcpu_ioctl(cs, KVM_GET_ONE_REG, &r);
if (ret) {
return ret;
diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
index 6554c30007..ac728494a4 100644
--- a/target/arm/kvm64.c
+++ b/target/arm/kvm64.c
@@ -696,21 +696,16 @@ int kvm_arch_put_registers(CPUState *cs, int level)
}
}
- /* Advanced SIMD and FP registers
- * We map Qn = regs[2n+1]:regs[2n]
- */
+ /* Advanced SIMD and FP registers. */
for (i = 0; i < 32; i++) {
- int rd = i << 1;
- uint64_t fp_val[2];
+ uint64_t *q = aa64_vfp_qreg(env, i);
#ifdef HOST_WORDS_BIGENDIAN
- fp_val[0] = env->vfp.regs[rd + 1];
- fp_val[1] = env->vfp.regs[rd];
+ uint64_t fp_val[2] = { q[1], q[0] };
+ reg.addr = (uintptr_t)fp_val;
#else
- fp_val[1] = env->vfp.regs[rd + 1];
- fp_val[0] = env->vfp.regs[rd];
+ reg.addr = (uintptr_t)q;
#endif
reg.id = AARCH64_SIMD_CORE_REG(fp_regs.vregs[i]);
- reg.addr = (uintptr_t)(&fp_val);
ret = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, ®);
if (ret) {
return ret;
@@ -837,24 +832,18 @@ int kvm_arch_get_registers(CPUState *cs)
env->spsr = env->banked_spsr[i];
}
- /* Advanced SIMD and FP registers
- * We map Qn = regs[2n+1]:regs[2n]
- */
+ /* Advanced SIMD and FP registers */
for (i = 0; i < 32; i++) {
- uint64_t fp_val[2];
+ uint64_t *q = aa64_vfp_qreg(env, i);
reg.id = AARCH64_SIMD_CORE_REG(fp_regs.vregs[i]);
- reg.addr = (uintptr_t)(&fp_val);
+ reg.addr = (uintptr_t)q;
ret = kvm_vcpu_ioctl(cs, KVM_GET_ONE_REG, ®);
if (ret) {
return ret;
} else {
- int rd = i << 1;
#ifdef HOST_WORDS_BIGENDIAN
- env->vfp.regs[rd + 1] = fp_val[0];
- env->vfp.regs[rd] = fp_val[1];
-#else
- env->vfp.regs[rd + 1] = fp_val[1];
- env->vfp.regs[rd] = fp_val[0];
+ uint64_t t;
+ t = q[0], q[0] = q[1], q[1] = t;
#endif
}
}
diff --git a/target/arm/machine.c b/target/arm/machine.c
index 176274629c..a85c2430d3 100644
--- a/target/arm/machine.c
+++ b/target/arm/machine.c
@@ -50,7 +50,7 @@ static const VMStateDescription vmstate_vfp = {
.minimum_version_id = 3,
.needed = vfp_needed,
.fields = (VMStateField[]) {
- VMSTATE_FLOAT64_ARRAY(env.vfp.regs, ARMCPU, 64),
+ VMSTATE_UINT64_ARRAY(env.vfp.regs, ARMCPU, 64),
/* The xregs array is a little awkward because element 1 (FPSCR)
* requires a specific accessor, so we have to split it up in
* the vmstate:
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index d246e8f6b5..e17c7269d4 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -170,15 +170,12 @@ void aarch64_cpu_dump_state(CPUState *cs, FILE *f,
if (flags & CPU_DUMP_FPU) {
int numvfpregs = 32;
- for (i = 0; i < numvfpregs; i += 2) {
- uint64_t vlo = float64_val(env->vfp.regs[i * 2]);
- uint64_t vhi = float64_val(env->vfp.regs[(i * 2) + 1]);
- cpu_fprintf(f, "q%02d=%016" PRIx64 ":%016" PRIx64 " ",
- i, vhi, vlo);
- vlo = float64_val(env->vfp.regs[(i + 1) * 2]);
- vhi = float64_val(env->vfp.regs[((i + 1) * 2) + 1]);
- cpu_fprintf(f, "q%02d=%016" PRIx64 ":%016" PRIx64 "\n",
- i + 1, vhi, vlo);
+ for (i = 0; i < numvfpregs; i++) {
+ uint64_t *q = aa64_vfp_qreg(env, i);
+ uint64_t vlo = q[0];
+ uint64_t vhi = q[1];
+ cpu_fprintf(f, "q%02d=%016" PRIx64 ":%016" PRIx64 "%c",
+ i, vhi, vlo, (i & 1 ? '\n' : ' '));
}
cpu_fprintf(f, "FPCR: %08x FPSR: %08x\n",
vfp_get_fpcr(env), vfp_get_fpsr(env));
@@ -572,19 +569,13 @@ static TCGv_ptr vec_full_reg_ptr(DisasContext *s, int regno)
*/
static inline int fp_reg_offset(DisasContext *s, int regno, TCGMemOp size)
{
- int offs = offsetof(CPUARMState, vfp.regs[regno * 2]);
-#ifdef HOST_WORDS_BIGENDIAN
- offs += (8 - (1 << size));
-#endif
- assert_fp_access_checked(s);
- return offs;
+ return vec_reg_offset(s, regno, 0, size);
}
/* Offset of the high half of the 128 bit vector Qn */
static inline int fp_reg_hi_offset(DisasContext *s, int regno)
{
- assert_fp_access_checked(s);
- return offsetof(CPUARMState, vfp.regs[regno * 2 + 1]);
+ return vec_reg_offset(s, regno, 1, MO_64);
}
/* Convenience accessors for reading and writing single and double
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 55afd29b21..a93e26498e 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -1516,14 +1516,16 @@ static inline void gen_vfp_st(DisasContext *s, int dp, TCGv_i32 addr)
static inline long
vfp_reg_offset (int dp, int reg)
{
- if (dp)
+ if (dp) {
return offsetof(CPUARMState, vfp.regs[reg]);
- else if (reg & 1) {
- return offsetof(CPUARMState, vfp.regs[reg >> 1])
- + offsetof(CPU_DoubleU, l.upper);
} else {
- return offsetof(CPUARMState, vfp.regs[reg >> 1])
- + offsetof(CPU_DoubleU, l.lower);
+ long ofs = offsetof(CPUARMState, vfp.regs[reg >> 1]);
+ if (reg & 1) {
+ ofs += offsetof(CPU_DoubleU, l.upper);
+ } else {
+ ofs += offsetof(CPU_DoubleU, l.lower);
+ }
+ return ofs;
}
}
@@ -12770,7 +12772,7 @@ void arm_cpu_dump_state(CPUState *cs, FILE *f, fprintf_function cpu_fprintf,
numvfpregs += 16;
}
for (i = 0; i < numvfpregs; i++) {
- uint64_t v = float64_val(env->vfp.regs[i]);
+ uint64_t v = *aa32_vfp_dreg(env, i);
cpu_fprintf(f, "s%02d=%08x s%02d=%08x d%02d=%016" PRIx64 "\n",
i * 2, (uint32_t)v,
i * 2 + 1, (uint32_t)(v >> 32),
--
2.14.3
^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [Qemu-devel] [PATCH 5/9] target/arm: Add aa32_vfp_dreg/aa64_vfp_qreg helpers
2017-12-18 17:30 ` [Qemu-devel] [PATCH 5/9] target/arm: Add aa32_vfp_dreg/aa64_vfp_qreg helpers Richard Henderson
@ 2018-01-11 18:39 ` Peter Maydell
2018-01-11 19:29 ` Richard Henderson
2018-01-12 18:24 ` Peter Maydell
2018-01-12 18:44 ` Peter Maydell
2 siblings, 1 reply; 31+ messages in thread
From: Peter Maydell @ 2018-01-11 18:39 UTC (permalink / raw)
To: Richard Henderson; +Cc: QEMU Developers, qemu-arm
On 18 December 2017 at 17:30, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Helpers that return a pointer into env->vfp.regs so that we isolate
> the logic of how to index the regs array for different cpu modes.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> target/arm/cpu.h | 20 +++++++++++++++++++-
> linux-user/signal.c | 22 ++++++++++++----------
> target/arm/arch_dump.c | 8 +++++---
> target/arm/helper-a64.c | 13 +++++++------
> target/arm/helper.c | 32 ++++++++++++++++++++------------
> target/arm/kvm32.c | 4 ++--
> target/arm/kvm64.c | 31 ++++++++++---------------------
> target/arm/machine.c | 2 +-
> target/arm/translate-a64.c | 25 ++++++++-----------------
> target/arm/translate.c | 16 +++++++++-------
> 10 files changed, 93 insertions(+), 80 deletions(-)
>
> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> index 7a705a09a1..e1a8e2880d 100644
> --- a/target/arm/cpu.h
> +++ b/target/arm/cpu.h
> @@ -493,7 +493,7 @@ typedef struct CPUARMState {
> * the two execution states, and means we do not need to explicitly
> * map these registers when changing states.
> */
> - float64 regs[64] QEMU_ALIGNED(16);
> + uint64_t regs[64] QEMU_ALIGNED(16);
Why are we changing the type of this field ?
thanks
-- PMM
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [Qemu-devel] [PATCH 5/9] target/arm: Add aa32_vfp_dreg/aa64_vfp_qreg helpers
2018-01-11 18:39 ` Peter Maydell
@ 2018-01-11 19:29 ` Richard Henderson
0 siblings, 0 replies; 31+ messages in thread
From: Richard Henderson @ 2018-01-11 19:29 UTC (permalink / raw)
To: Peter Maydell; +Cc: QEMU Developers, qemu-arm
On 01/11/2018 10:39 AM, Peter Maydell wrote:
> On 18 December 2017 at 17:30, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>> Helpers that return a pointer into env->vfp.regs so that we isolate
>> the logic of how to index the regs array for different cpu modes.
>>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>> target/arm/cpu.h | 20 +++++++++++++++++++-
>> linux-user/signal.c | 22 ++++++++++++----------
>> target/arm/arch_dump.c | 8 +++++---
>> target/arm/helper-a64.c | 13 +++++++------
>> target/arm/helper.c | 32 ++++++++++++++++++++------------
>> target/arm/kvm32.c | 4 ++--
>> target/arm/kvm64.c | 31 ++++++++++---------------------
>> target/arm/machine.c | 2 +-
>> target/arm/translate-a64.c | 25 ++++++++-----------------
>> target/arm/translate.c | 16 +++++++++-------
>> 10 files changed, 93 insertions(+), 80 deletions(-)
>>
>> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
>> index 7a705a09a1..e1a8e2880d 100644
>> --- a/target/arm/cpu.h
>> +++ b/target/arm/cpu.h
>> @@ -493,7 +493,7 @@ typedef struct CPUARMState {
>> * the two execution states, and means we do not need to explicitly
>> * map these registers when changing states.
>> */
>> - float64 regs[64] QEMU_ALIGNED(16);
>> + uint64_t regs[64] QEMU_ALIGNED(16);
>
> Why are we changing the type of this field ?
Because that's how it's actually used. We were using casts before.
r~
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [Qemu-devel] [PATCH 5/9] target/arm: Add aa32_vfp_dreg/aa64_vfp_qreg helpers
2017-12-18 17:30 ` [Qemu-devel] [PATCH 5/9] target/arm: Add aa32_vfp_dreg/aa64_vfp_qreg helpers Richard Henderson
2018-01-11 18:39 ` Peter Maydell
@ 2018-01-12 18:24 ` Peter Maydell
2018-01-12 18:44 ` Richard Henderson
2018-01-12 18:44 ` Peter Maydell
2 siblings, 1 reply; 31+ messages in thread
From: Peter Maydell @ 2018-01-12 18:24 UTC (permalink / raw)
To: Richard Henderson; +Cc: QEMU Developers, qemu-arm
On 18 December 2017 at 17:30, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Helpers that return a pointer into env->vfp.regs so that we isolate
> the logic of how to index the regs array for different cpu modes.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> target/arm/cpu.h | 20 +++++++++++++++++++-
> linux-user/signal.c | 22 ++++++++++++----------
> target/arm/arch_dump.c | 8 +++++---
> target/arm/helper-a64.c | 13 +++++++------
> target/arm/helper.c | 32 ++++++++++++++++++++------------
> target/arm/kvm32.c | 4 ++--
> target/arm/kvm64.c | 31 ++++++++++---------------------
> target/arm/machine.c | 2 +-
> target/arm/translate-a64.c | 25 ++++++++-----------------
> target/arm/translate.c | 16 +++++++++-------
> 10 files changed, 93 insertions(+), 80 deletions(-)
>
> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> index 7a705a09a1..e1a8e2880d 100644
> --- a/target/arm/cpu.h
> +++ b/target/arm/cpu.h
> @@ -493,7 +493,7 @@ typedef struct CPUARMState {
> * the two execution states, and means we do not need to explicitly
> * map these registers when changing states.
> */
> - float64 regs[64] QEMU_ALIGNED(16);
> + uint64_t regs[64] QEMU_ALIGNED(16);
>
> uint32_t xregs[16];
> /* We store these fpcsr fields separately for convenience. */
> @@ -2899,4 +2899,22 @@ static inline void *arm_get_el_change_hook_opaque(ARMCPU *cpu)
> return cpu->el_change_hook_opaque;
> }
>
> +/**
> + * aa32_vfp_dreg:
> + * Return a pointer to the Dn register within env in 32-bit mode.
> + */
> +static inline uint64_t *aa32_vfp_dreg(CPUARMState *env, unsigned regno)
> +{
> + return &env->vfp.regs[regno];
> +}
> +
> +/**
> + * aa64_vfp_dreg:
> + * Return a pointer to the Qn register within env in 64-bit mode.
> + */
> +static inline uint64_t *aa64_vfp_qreg(CPUARMState *env, unsigned regno)
Comment and code disagree about the name of the function...
> +{
> + return &env->vfp.regs[2 * regno];
> +}
> +
> #endif
> diff --git a/linux-user/signal.c b/linux-user/signal.c
> index cf35473671..a9a3f41721 100644
> --- a/linux-user/signal.c
> +++ b/linux-user/signal.c
> @@ -1487,12 +1487,13 @@ static int target_setup_sigframe(struct target_rt_sigframe *sf,
> }
>
> for (i = 0; i < 32; i++) {
> + uint64_t *q = aa64_vfp_qreg(env, i);
> #ifdef TARGET_WORDS_BIGENDIAN
> - __put_user(env->vfp.regs[i * 2], &aux->fpsimd.vregs[i * 2 + 1]);
> - __put_user(env->vfp.regs[i * 2 + 1], &aux->fpsimd.vregs[i * 2]);
> + __put_user(q[0], &aux->fpsimd.vregs[i * 2 + 1]);
> + __put_user(q[1], &aux->fpsimd.vregs[i * 2]);
> #else
> - __put_user(env->vfp.regs[i * 2], &aux->fpsimd.vregs[i * 2]);
> - __put_user(env->vfp.regs[i * 2 + 1], &aux->fpsimd.vregs[i * 2 + 1]);
> + __put_user(q[0], &aux->fpsimd.vregs[i * 2]);
> + __put_user(q[1], &aux->fpsimd.vregs[i * 2 + 1]);
> #endif
> }
> __put_user(vfp_get_fpsr(env), &aux->fpsimd.fpsr);
> @@ -1539,12 +1540,13 @@ static int target_restore_sigframe(CPUARMState *env,
> }
>
> for (i = 0; i < 32; i++) {
> + uint64_t *q = aa64_vfp_qreg(env, i);
> #ifdef TARGET_WORDS_BIGENDIAN
> - __get_user(env->vfp.regs[i * 2], &aux->fpsimd.vregs[i * 2 + 1]);
> - __get_user(env->vfp.regs[i * 2 + 1], &aux->fpsimd.vregs[i * 2]);
> + __get_user(q[0], &aux->fpsimd.vregs[i * 2 + 1]);
> + __get_user(q[1], &aux->fpsimd.vregs[i * 2]);
> #else
> - __get_user(env->vfp.regs[i * 2], &aux->fpsimd.vregs[i * 2]);
> - __get_user(env->vfp.regs[i * 2 + 1], &aux->fpsimd.vregs[i * 2 + 1]);
> + __get_user(q[0], &aux->fpsimd.vregs[i * 2]);
> + __get_user(q[1], &aux->fpsimd.vregs[i * 2 + 1]);
> #endif
> }
> __get_user(fpsr, &aux->fpsimd.fpsr);
> @@ -1899,7 +1901,7 @@ static abi_ulong *setup_sigframe_v2_vfp(abi_ulong *regspace, CPUARMState *env)
> __put_user(TARGET_VFP_MAGIC, &vfpframe->magic);
> __put_user(sizeof(*vfpframe), &vfpframe->size);
> for (i = 0; i < 32; i++) {
> - __put_user(float64_val(env->vfp.regs[i]), &vfpframe->ufp.fpregs[i]);
> + __put_user(*aa32_vfp_dreg(env, i), &vfpframe->ufp.fpregs[i]);
> }
> __put_user(vfp_get_fpscr(env), &vfpframe->ufp.fpscr);
> __put_user(env->vfp.xregs[ARM_VFP_FPEXC], &vfpframe->ufp_exc.fpexc);
> @@ -2206,7 +2208,7 @@ static abi_ulong *restore_sigframe_v2_vfp(CPUARMState *env, abi_ulong *regspace)
> return 0;
> }
> for (i = 0; i < 32; i++) {
> - __get_user(float64_val(env->vfp.regs[i]), &vfpframe->ufp.fpregs[i]);
> + __get_user(*aa32_vfp_dreg(env, i), &vfpframe->ufp.fpregs[i]);
> }
> __get_user(fpscr, &vfpframe->ufp.fpscr);
> vfp_set_fpscr(env, fpscr);
> diff --git a/target/arm/arch_dump.c b/target/arm/arch_dump.c
> index 9e5b2fb31c..26a2c09868 100644
> --- a/target/arm/arch_dump.c
> +++ b/target/arm/arch_dump.c
> @@ -99,8 +99,10 @@ static int aarch64_write_elf64_prfpreg(WriteCoreDumpFunction f,
>
> aarch64_note_init(¬e, s, "CORE", 5, NT_PRFPREG, sizeof(note.vfp));
>
> - for (i = 0; i < 64; ++i) {
> - note.vfp.vregs[i] = cpu_to_dump64(s, float64_val(env->vfp.regs[i]));
> + for (i = 0; i < 32; ++i) {
> + uint64_t *q = aa64_vfp_qreg(env, i);
> + note.vfp.vregs[2*i + 0] = cpu_to_dump64(s, q[0]);
> + note.vfp.vregs[2*i + 1] = cpu_to_dump64(s, q[1]);
This doesn't change the behaviour but I suspect it's wrong for big-endian...
> }
>
> if (s->dump_info.d_endian == ELFDATA2MSB) {
> @@ -229,7 +231,7 @@ static int arm_write_elf32_vfp(WriteCoreDumpFunction f, CPUARMState *env,
> arm_note_init(¬e, s, "LINUX", 6, NT_ARM_VFP, sizeof(note.vfp));
>
> for (i = 0; i < 32; ++i) {
> - note.vfp.vregs[i] = cpu_to_dump64(s, float64_val(env->vfp.regs[i]));
> + note.vfp.vregs[i] = cpu_to_dump64(s, *aa32_vfp_dreg(env, i));
> }
>
> note.vfp.fpscr = cpu_to_dump32(s, vfp_get_fpscr(env));
> diff --git a/target/arm/helper-a64.c b/target/arm/helper-a64.c
> index 0bcf02eeb5..750a088803 100644
> --- a/target/arm/helper-a64.c
> +++ b/target/arm/helper-a64.c
> @@ -146,20 +146,21 @@ uint64_t HELPER(simd_tbl)(CPUARMState *env, uint64_t result, uint64_t indices,
> * the table starts, and numregs the number of registers in the table.
> * We return the results of the lookups.
> */
> - int shift;
> + unsigned shift;
>
> for (shift = 0; shift < 64; shift += 8) {
> - int index = extract64(indices, shift, 8);
> + unsigned index = extract64(indices, shift, 8);
> if (index < 16 * numregs) {
> /* Convert index (a byte offset into the virtual table
> * which is a series of 128-bit vectors concatenated)
> - * into the correct vfp.regs[] element plus a bit offset
> + * into the correct register element plus a bit offset
> * into that element, bearing in mind that the table
> * can wrap around from V31 to V0.
> */
> - int elt = (rn * 2 + (index >> 3)) % 64;
> - int bitidx = (index & 7) * 8;
> - uint64_t val = extract64(env->vfp.regs[elt], bitidx, 8);
> + unsigned elt = (rn * 2 + (index >> 3)) % 64;
> + unsigned bitidx = (index & 7) * 8;
> + uint64_t *q = aa64_vfp_qreg(env, elt >> 1);
> + uint64_t val = extract64(q[elt & 1], bitidx, 8);
Any particular reason for changing all these ints to unsigned ?
>
> result = deposit64(result, shift, 8, val);
> }
> diff --git a/target/arm/helper.c b/target/arm/helper.c
> index 02d1b57501..7f304111f0 100644
> --- a/target/arm/helper.c
> +++ b/target/arm/helper.c
> @@ -64,15 +64,16 @@ static int vfp_gdb_get_reg(CPUARMState *env, uint8_t *buf, int reg)
> /* VFP data registers are always little-endian. */
> nregs = arm_feature(env, ARM_FEATURE_VFP3) ? 32 : 16;
> if (reg < nregs) {
> - stfq_le_p(buf, env->vfp.regs[reg]);
> + stfq_le_p(buf, *aa32_vfp_dreg(env, reg));
> return 8;
> }
> if (arm_feature(env, ARM_FEATURE_NEON)) {
> /* Aliases for Q regs. */
> nregs += 16;
> if (reg < nregs) {
> - stfq_le_p(buf, env->vfp.regs[(reg - 32) * 2]);
> - stfq_le_p(buf + 8, env->vfp.regs[(reg - 32) * 2 + 1]);
> + uint64_t *q = aa32_vfp_dreg(env, (reg - 32) * 2);
> + stfq_le_p(buf, q[0]);
> + stfq_le_p(buf + 8, q[1]);
> return 16;
> }
> }
> @@ -90,14 +91,15 @@ static int vfp_gdb_set_reg(CPUARMState *env, uint8_t *buf, int reg)
>
> nregs = arm_feature(env, ARM_FEATURE_VFP3) ? 32 : 16;
> if (reg < nregs) {
> - env->vfp.regs[reg] = ldfq_le_p(buf);
> + *aa32_vfp_dreg(env, reg) = ldfq_le_p(buf);
> return 8;
> }
> if (arm_feature(env, ARM_FEATURE_NEON)) {
> nregs += 16;
> if (reg < nregs) {
> - env->vfp.regs[(reg - 32) * 2] = ldfq_le_p(buf);
> - env->vfp.regs[(reg - 32) * 2 + 1] = ldfq_le_p(buf + 8);
> + uint64_t *q = aa32_vfp_dreg(env, (reg - 32) * 2);
> + q[0] = ldfq_le_p(buf);
> + q[1] = ldfq_le_p(buf + 8);
> return 16;
> }
> }
> @@ -114,9 +116,12 @@ static int aarch64_fpu_gdb_get_reg(CPUARMState *env, uint8_t *buf, int reg)
> switch (reg) {
> case 0 ... 31:
> /* 128 bit FP register */
> - stfq_le_p(buf, env->vfp.regs[reg * 2]);
> - stfq_le_p(buf + 8, env->vfp.regs[reg * 2 + 1]);
> - return 16;
> + {
> + uint64_t *q = aa64_vfp_qreg(env, reg);
> + stfq_le_p(buf, q[0]);
> + stfq_le_p(buf + 8, q[1]);
> + return 16;
> + }
> case 32:
> /* FPSR */
> stl_p(buf, vfp_get_fpsr(env));
> @@ -135,9 +140,12 @@ static int aarch64_fpu_gdb_set_reg(CPUARMState *env, uint8_t *buf, int reg)
> switch (reg) {
> case 0 ... 31:
> /* 128 bit FP register */
> - env->vfp.regs[reg * 2] = ldfq_le_p(buf);
> - env->vfp.regs[reg * 2 + 1] = ldfq_le_p(buf + 8);
> - return 16;
> + {
> + uint64_t *q = aa64_vfp_qreg(env, reg);
> + q[0] = ldfq_le_p(buf);
> + q[1] = ldfq_le_p(buf + 8);
> + return 16;
> + }
> case 32:
> /* FPSR */
> vfp_set_fpsr(env, ldl_p(buf));
> diff --git a/target/arm/kvm32.c b/target/arm/kvm32.c
> index f925a21481..f77c9c494b 100644
> --- a/target/arm/kvm32.c
> +++ b/target/arm/kvm32.c
> @@ -358,7 +358,7 @@ int kvm_arch_put_registers(CPUState *cs, int level)
> /* VFP registers */
> r.id = KVM_REG_ARM | KVM_REG_SIZE_U64 | KVM_REG_ARM_VFP;
> for (i = 0; i < 32; i++) {
> - r.addr = (uintptr_t)(&env->vfp.regs[i]);
> + r.addr = (uintptr_t)aa32_vfp_dreg(env, i);
> ret = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, &r);
> if (ret) {
> return ret;
> @@ -445,7 +445,7 @@ int kvm_arch_get_registers(CPUState *cs)
> /* VFP registers */
> r.id = KVM_REG_ARM | KVM_REG_SIZE_U64 | KVM_REG_ARM_VFP;
> for (i = 0; i < 32; i++) {
> - r.addr = (uintptr_t)(&env->vfp.regs[i]);
> + r.addr = (uintptr_t)aa32_vfp_dreg(env, i);
> ret = kvm_vcpu_ioctl(cs, KVM_GET_ONE_REG, &r);
> if (ret) {
> return ret;
> diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
> index 6554c30007..ac728494a4 100644
> --- a/target/arm/kvm64.c
> +++ b/target/arm/kvm64.c
> @@ -696,21 +696,16 @@ int kvm_arch_put_registers(CPUState *cs, int level)
> }
> }
>
> - /* Advanced SIMD and FP registers
> - * We map Qn = regs[2n+1]:regs[2n]
> - */
> + /* Advanced SIMD and FP registers. */
> for (i = 0; i < 32; i++) {
> - int rd = i << 1;
> - uint64_t fp_val[2];
> + uint64_t *q = aa64_vfp_qreg(env, i);
> #ifdef HOST_WORDS_BIGENDIAN
> - fp_val[0] = env->vfp.regs[rd + 1];
> - fp_val[1] = env->vfp.regs[rd];
> + uint64_t fp_val[2] = { q[1], q[0] };
> + reg.addr = (uintptr_t)fp_val;
> #else
> - fp_val[1] = env->vfp.regs[rd + 1];
> - fp_val[0] = env->vfp.regs[rd];
> + reg.addr = (uintptr_t)q;
> #endif
> reg.id = AARCH64_SIMD_CORE_REG(fp_regs.vregs[i]);
> - reg.addr = (uintptr_t)(&fp_val);
> ret = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, ®);
> if (ret) {
> return ret;
> @@ -837,24 +832,18 @@ int kvm_arch_get_registers(CPUState *cs)
> env->spsr = env->banked_spsr[i];
> }
>
> - /* Advanced SIMD and FP registers
> - * We map Qn = regs[2n+1]:regs[2n]
> - */
> + /* Advanced SIMD and FP registers */
> for (i = 0; i < 32; i++) {
> - uint64_t fp_val[2];
> + uint64_t *q = aa64_vfp_qreg(env, i);
> reg.id = AARCH64_SIMD_CORE_REG(fp_regs.vregs[i]);
> - reg.addr = (uintptr_t)(&fp_val);
> + reg.addr = (uintptr_t)q;
> ret = kvm_vcpu_ioctl(cs, KVM_GET_ONE_REG, ®);
> if (ret) {
> return ret;
> } else {
> - int rd = i << 1;
> #ifdef HOST_WORDS_BIGENDIAN
> - env->vfp.regs[rd + 1] = fp_val[0];
> - env->vfp.regs[rd] = fp_val[1];
> -#else
> - env->vfp.regs[rd + 1] = fp_val[1];
> - env->vfp.regs[rd] = fp_val[0];
> + uint64_t t;
> + t = q[0], q[0] = q[1], q[1] = t;
> #endif
> }
> }
> diff --git a/target/arm/machine.c b/target/arm/machine.c
> index 176274629c..a85c2430d3 100644
> --- a/target/arm/machine.c
> +++ b/target/arm/machine.c
> @@ -50,7 +50,7 @@ static const VMStateDescription vmstate_vfp = {
> .minimum_version_id = 3,
> .needed = vfp_needed,
> .fields = (VMStateField[]) {
> - VMSTATE_FLOAT64_ARRAY(env.vfp.regs, ARMCPU, 64),
> + VMSTATE_UINT64_ARRAY(env.vfp.regs, ARMCPU, 64),
> /* The xregs array is a little awkward because element 1 (FPSCR)
> * requires a specific accessor, so we have to split it up in
> * the vmstate:
> diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
> index d246e8f6b5..e17c7269d4 100644
> --- a/target/arm/translate-a64.c
> +++ b/target/arm/translate-a64.c
> @@ -170,15 +170,12 @@ void aarch64_cpu_dump_state(CPUState *cs, FILE *f,
>
> if (flags & CPU_DUMP_FPU) {
> int numvfpregs = 32;
> - for (i = 0; i < numvfpregs; i += 2) {
> - uint64_t vlo = float64_val(env->vfp.regs[i * 2]);
> - uint64_t vhi = float64_val(env->vfp.regs[(i * 2) + 1]);
> - cpu_fprintf(f, "q%02d=%016" PRIx64 ":%016" PRIx64 " ",
> - i, vhi, vlo);
> - vlo = float64_val(env->vfp.regs[(i + 1) * 2]);
> - vhi = float64_val(env->vfp.regs[((i + 1) * 2) + 1]);
> - cpu_fprintf(f, "q%02d=%016" PRIx64 ":%016" PRIx64 "\n",
> - i + 1, vhi, vlo);
> + for (i = 0; i < numvfpregs; i++) {
> + uint64_t *q = aa64_vfp_qreg(env, i);
> + uint64_t vlo = q[0];
> + uint64_t vhi = q[1];
> + cpu_fprintf(f, "q%02d=%016" PRIx64 ":%016" PRIx64 "%c",
> + i, vhi, vlo, (i & 1 ? '\n' : ' '));
> }
> cpu_fprintf(f, "FPCR: %08x FPSR: %08x\n",
> vfp_get_fpcr(env), vfp_get_fpsr(env));
> @@ -572,19 +569,13 @@ static TCGv_ptr vec_full_reg_ptr(DisasContext *s, int regno)
> */
> static inline int fp_reg_offset(DisasContext *s, int regno, TCGMemOp size)
> {
> - int offs = offsetof(CPUARMState, vfp.regs[regno * 2]);
> -#ifdef HOST_WORDS_BIGENDIAN
> - offs += (8 - (1 << size));
> -#endif
> - assert_fp_access_checked(s);
> - return offs;
> + return vec_reg_offset(s, regno, 0, size);
> }
>
> /* Offset of the high half of the 128 bit vector Qn */
> static inline int fp_reg_hi_offset(DisasContext *s, int regno)
> {
> - assert_fp_access_checked(s);
> - return offsetof(CPUARMState, vfp.regs[regno * 2 + 1]);
> + return vec_reg_offset(s, regno, 1, MO_64);
> }
>
> /* Convenience accessors for reading and writing single and double
> diff --git a/target/arm/translate.c b/target/arm/translate.c
> index 55afd29b21..a93e26498e 100644
> --- a/target/arm/translate.c
> +++ b/target/arm/translate.c
> @@ -1516,14 +1516,16 @@ static inline void gen_vfp_st(DisasContext *s, int dp, TCGv_i32 addr)
> static inline long
> vfp_reg_offset (int dp, int reg)
> {
> - if (dp)
> + if (dp) {
> return offsetof(CPUARMState, vfp.regs[reg]);
> - else if (reg & 1) {
> - return offsetof(CPUARMState, vfp.regs[reg >> 1])
> - + offsetof(CPU_DoubleU, l.upper);
> } else {
> - return offsetof(CPUARMState, vfp.regs[reg >> 1])
> - + offsetof(CPU_DoubleU, l.lower);
> + long ofs = offsetof(CPUARMState, vfp.regs[reg >> 1]);
> + if (reg & 1) {
> + ofs += offsetof(CPU_DoubleU, l.upper);
> + } else {
> + ofs += offsetof(CPU_DoubleU, l.lower);
> + }
> + return ofs;
> }
This hunk seems to just be rearranging the if-logic without
doing anything else, right?
> }
>
> @@ -12770,7 +12772,7 @@ void arm_cpu_dump_state(CPUState *cs, FILE *f, fprintf_function cpu_fprintf,
> numvfpregs += 16;
> }
> for (i = 0; i < numvfpregs; i++) {
> - uint64_t v = float64_val(env->vfp.regs[i]);
> + uint64_t v = *aa32_vfp_dreg(env, i);
> cpu_fprintf(f, "s%02d=%08x s%02d=%08x d%02d=%016" PRIx64 "\n",
> i * 2, (uint32_t)v,
> i * 2 + 1, (uint32_t)(v >> 32),
> --
> 2.14.3
>
thanks
-- PMM
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [Qemu-devel] [PATCH 5/9] target/arm: Add aa32_vfp_dreg/aa64_vfp_qreg helpers
2018-01-12 18:24 ` Peter Maydell
@ 2018-01-12 18:44 ` Richard Henderson
0 siblings, 0 replies; 31+ messages in thread
From: Richard Henderson @ 2018-01-12 18:44 UTC (permalink / raw)
To: Peter Maydell; +Cc: QEMU Developers, qemu-arm
On 01/12/2018 10:24 AM, Peter Maydell wrote:
>> +/**
>> + * aa32_vfp_dreg:
>> + * Return a pointer to the Dn register within env in 32-bit mode.
>> + */
>> +static inline uint64_t *aa32_vfp_dreg(CPUARMState *env, unsigned regno)
>> +{
>> + return &env->vfp.regs[regno];
>> +}
>> +
>> +/**
>> + * aa64_vfp_dreg:
>> + * Return a pointer to the Qn register within env in 64-bit mode.
>> + */
>> +static inline uint64_t *aa64_vfp_qreg(CPUARMState *env, unsigned regno)
>
> Comment and code disagree about the name of the function...
Oops.
>> @@ -99,8 +99,10 @@ static int aarch64_write_elf64_prfpreg(WriteCoreDumpFunction f,
>>
>> aarch64_note_init(¬e, s, "CORE", 5, NT_PRFPREG, sizeof(note.vfp));
>>
>> - for (i = 0; i < 64; ++i) {
>> - note.vfp.vregs[i] = cpu_to_dump64(s, float64_val(env->vfp.regs[i]));
>> + for (i = 0; i < 32; ++i) {
>> + uint64_t *q = aa64_vfp_qreg(env, i);
>> + note.vfp.vregs[2*i + 0] = cpu_to_dump64(s, q[0]);
>> + note.vfp.vregs[2*i + 1] = cpu_to_dump64(s, q[1]);
>
> This doesn't change the behaviour but I suspect it's wrong for big-endian...
Perhaps. Since this doesn't change behaviour I'll leave this patch as-is.
That said, how does one go about testing aarch64-big-endian? I don't think
I've seen a distro for that...
>> - int elt = (rn * 2 + (index >> 3)) % 64;
>> - int bitidx = (index & 7) * 8;
>> - uint64_t val = extract64(env->vfp.regs[elt], bitidx, 8);
>> + unsigned elt = (rn * 2 + (index >> 3)) % 64;
>> + unsigned bitidx = (index & 7) * 8;
>> + uint64_t *q = aa64_vfp_qreg(env, elt >> 1);
>> + uint64_t val = extract64(q[elt & 1], bitidx, 8);
>
>
> Any particular reason for changing all these ints to unsigned ?
*shrug* unsigned division by power-of-two is simpler than signed. And of
course we know these values must be positive. I can drop that change if you want.
>> vfp_reg_offset (int dp, int reg)
>> {
>> - if (dp)
>> + if (dp) {
>> return offsetof(CPUARMState, vfp.regs[reg]);
>> - else if (reg & 1) {
>> - return offsetof(CPUARMState, vfp.regs[reg >> 1])
>> - + offsetof(CPU_DoubleU, l.upper);
>> } else {
>> - return offsetof(CPUARMState, vfp.regs[reg >> 1])
>> - + offsetof(CPU_DoubleU, l.lower);
>> + long ofs = offsetof(CPUARMState, vfp.regs[reg >> 1]);
>> + if (reg & 1) {
>> + ofs += offsetof(CPU_DoubleU, l.upper);
>> + } else {
>> + ofs += offsetof(CPU_DoubleU, l.lower);
>> + }
>> + return ofs;
>> }
>
> This hunk seems to just be rearranging the if-logic without
> doing anything else, right?
There is a name change in there. But otherwise it's a re-factor to avoid
over-long lines and reduce duplication.
r~
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [Qemu-devel] [PATCH 5/9] target/arm: Add aa32_vfp_dreg/aa64_vfp_qreg helpers
2017-12-18 17:30 ` [Qemu-devel] [PATCH 5/9] target/arm: Add aa32_vfp_dreg/aa64_vfp_qreg helpers Richard Henderson
2018-01-11 18:39 ` Peter Maydell
2018-01-12 18:24 ` Peter Maydell
@ 2018-01-12 18:44 ` Peter Maydell
2018-01-12 18:47 ` Richard Henderson
2 siblings, 1 reply; 31+ messages in thread
From: Peter Maydell @ 2018-01-12 18:44 UTC (permalink / raw)
To: Richard Henderson; +Cc: QEMU Developers, qemu-arm
On 18 December 2017 at 17:30, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Helpers that return a pointer into env->vfp.regs so that we isolate
> the logic of how to index the regs array for different cpu modes.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> --- a/target/arm/helper.c
> +++ b/target/arm/helper.c
> @@ -64,15 +64,16 @@ static int vfp_gdb_get_reg(CPUARMState *env, uint8_t *buf, int reg)
> /* VFP data registers are always little-endian. */
> nregs = arm_feature(env, ARM_FEATURE_VFP3) ? 32 : 16;
> if (reg < nregs) {
> - stfq_le_p(buf, env->vfp.regs[reg]);
> + stfq_le_p(buf, *aa32_vfp_dreg(env, reg));
> return 8;
> }
> if (arm_feature(env, ARM_FEATURE_NEON)) {
> /* Aliases for Q regs. */
> nregs += 16;
> if (reg < nregs) {
> - stfq_le_p(buf, env->vfp.regs[(reg - 32) * 2]);
> - stfq_le_p(buf + 8, env->vfp.regs[(reg - 32) * 2 + 1]);
> + uint64_t *q = aa32_vfp_dreg(env, (reg - 32) * 2);
> + stfq_le_p(buf, q[0]);
> + stfq_le_p(buf + 8, q[1]);
> return 16;
> }
> }
> @@ -90,14 +91,15 @@ static int vfp_gdb_set_reg(CPUARMState *env, uint8_t *buf, int reg)
>
> nregs = arm_feature(env, ARM_FEATURE_VFP3) ? 32 : 16;
> if (reg < nregs) {
> - env->vfp.regs[reg] = ldfq_le_p(buf);
> + *aa32_vfp_dreg(env, reg) = ldfq_le_p(buf);
> return 8;
> }
> if (arm_feature(env, ARM_FEATURE_NEON)) {
> nregs += 16;
> if (reg < nregs) {
> - env->vfp.regs[(reg - 32) * 2] = ldfq_le_p(buf);
> - env->vfp.regs[(reg - 32) * 2 + 1] = ldfq_le_p(buf + 8);
> + uint64_t *q = aa32_vfp_dreg(env, (reg - 32) * 2);
> + q[0] = ldfq_le_p(buf);
> + q[1] = ldfq_le_p(buf + 8);
> return 16;
> }
> }
After reading patch 7 I came back to this one. I wonder if these two
(which I think are the only ones) justify an aa32_vfp_qreg() ?
thanks
-- PMM
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [Qemu-devel] [PATCH 5/9] target/arm: Add aa32_vfp_dreg/aa64_vfp_qreg helpers
2018-01-12 18:44 ` Peter Maydell
@ 2018-01-12 18:47 ` Richard Henderson
0 siblings, 0 replies; 31+ messages in thread
From: Richard Henderson @ 2018-01-12 18:47 UTC (permalink / raw)
To: Peter Maydell; +Cc: QEMU Developers, qemu-arm
On 01/12/2018 10:44 AM, Peter Maydell wrote:
>> if (arm_feature(env, ARM_FEATURE_NEON)) {
>> nregs += 16;
>> if (reg < nregs) {
>> - env->vfp.regs[(reg - 32) * 2] = ldfq_le_p(buf);
>> - env->vfp.regs[(reg - 32) * 2 + 1] = ldfq_le_p(buf + 8);
>> + uint64_t *q = aa32_vfp_dreg(env, (reg - 32) * 2);
>> + q[0] = ldfq_le_p(buf);
>> + q[1] = ldfq_le_p(buf + 8);
>> return 16;
>> }
>> }
>
> After reading patch 7 I came back to this one. I wonder if these two
> (which I think are the only ones) justify an aa32_vfp_qreg() ?
That does sound like a nice cleanup. I'll include that next round.
r~
^ permalink raw reply [flat|nested] 31+ messages in thread
* [Qemu-devel] [PATCH 6/9] vmstate: Add VMSTATE_UINT64_SUB_ARRAY
2017-12-18 17:30 [Qemu-devel] [PATCH 0/9] target/arm: Prepatory work for SVE Richard Henderson
` (4 preceding siblings ...)
2017-12-18 17:30 ` [Qemu-devel] [PATCH 5/9] target/arm: Add aa32_vfp_dreg/aa64_vfp_qreg helpers Richard Henderson
@ 2017-12-18 17:30 ` Richard Henderson
2017-12-18 20:29 ` [Qemu-devel] [Qemu-arm] " Philippe Mathieu-Daudé
2018-01-11 18:46 ` [Qemu-devel] " Peter Maydell
2017-12-18 17:30 ` [Qemu-devel] [PATCH 7/9] target/arm: Expand vector registers for SVE Richard Henderson
` (2 subsequent siblings)
8 siblings, 2 replies; 31+ messages in thread
From: Richard Henderson @ 2017-12-18 17:30 UTC (permalink / raw)
To: qemu-devel; +Cc: peter.maydell, qemu-arm
At the same time, move VMSTATE_UINT32_SUB_ARRAY
beside the other UINT32 definitions.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
include/migration/vmstate.h | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index 88b55df5ae..8c3889433c 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -905,6 +905,9 @@ extern const VMStateInfo vmstate_info_qtailq;
#define VMSTATE_UINT32_ARRAY(_f, _s, _n) \
VMSTATE_UINT32_ARRAY_V(_f, _s, _n, 0)
+#define VMSTATE_UINT32_SUB_ARRAY(_f, _s, _start, _num) \
+ VMSTATE_SUB_ARRAY(_f, _s, _start, _num, 0, vmstate_info_uint32, uint32_t)
+
#define VMSTATE_UINT32_2DARRAY(_f, _s, _n1, _n2) \
VMSTATE_UINT32_2DARRAY_V(_f, _s, _n1, _n2, 0)
@@ -914,6 +917,9 @@ extern const VMStateInfo vmstate_info_qtailq;
#define VMSTATE_UINT64_ARRAY(_f, _s, _n) \
VMSTATE_UINT64_ARRAY_V(_f, _s, _n, 0)
+#define VMSTATE_UINT64_SUB_ARRAY(_f, _s, _start, _num) \
+ VMSTATE_SUB_ARRAY(_f, _s, _start, _num, 0, vmstate_info_uint64, uint64_t)
+
#define VMSTATE_UINT64_2DARRAY(_f, _s, _n1, _n2) \
VMSTATE_UINT64_2DARRAY_V(_f, _s, _n1, _n2, 0)
@@ -932,9 +938,6 @@ extern const VMStateInfo vmstate_info_qtailq;
#define VMSTATE_INT32_ARRAY(_f, _s, _n) \
VMSTATE_INT32_ARRAY_V(_f, _s, _n, 0)
-#define VMSTATE_UINT32_SUB_ARRAY(_f, _s, _start, _num) \
- VMSTATE_SUB_ARRAY(_f, _s, _start, _num, 0, vmstate_info_uint32, uint32_t)
-
#define VMSTATE_INT64_ARRAY_V(_f, _s, _n, _v) \
VMSTATE_ARRAY(_f, _s, _n, _v, vmstate_info_int64, int64_t)
--
2.14.3
^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [Qemu-devel] [Qemu-arm] [PATCH 6/9] vmstate: Add VMSTATE_UINT64_SUB_ARRAY
2017-12-18 17:30 ` [Qemu-devel] [PATCH 6/9] vmstate: Add VMSTATE_UINT64_SUB_ARRAY Richard Henderson
@ 2017-12-18 20:29 ` Philippe Mathieu-Daudé
2018-01-11 18:46 ` [Qemu-devel] " Peter Maydell
1 sibling, 0 replies; 31+ messages in thread
From: Philippe Mathieu-Daudé @ 2017-12-18 20:29 UTC (permalink / raw)
To: Richard Henderson, qemu-devel; +Cc: peter.maydell, qemu-arm
On 12/18/2017 02:30 PM, Richard Henderson wrote:
> At the same time, move VMSTATE_UINT32_SUB_ARRAY
> beside the other UINT32 definitions.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
> ---
> include/migration/vmstate.h | 9 ++++++---
> 1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
> index 88b55df5ae..8c3889433c 100644
> --- a/include/migration/vmstate.h
> +++ b/include/migration/vmstate.h
> @@ -905,6 +905,9 @@ extern const VMStateInfo vmstate_info_qtailq;
> #define VMSTATE_UINT32_ARRAY(_f, _s, _n) \
> VMSTATE_UINT32_ARRAY_V(_f, _s, _n, 0)
>
> +#define VMSTATE_UINT32_SUB_ARRAY(_f, _s, _start, _num) \
> + VMSTATE_SUB_ARRAY(_f, _s, _start, _num, 0, vmstate_info_uint32, uint32_t)
> +
> #define VMSTATE_UINT32_2DARRAY(_f, _s, _n1, _n2) \
> VMSTATE_UINT32_2DARRAY_V(_f, _s, _n1, _n2, 0)
>
> @@ -914,6 +917,9 @@ extern const VMStateInfo vmstate_info_qtailq;
> #define VMSTATE_UINT64_ARRAY(_f, _s, _n) \
> VMSTATE_UINT64_ARRAY_V(_f, _s, _n, 0)
>
> +#define VMSTATE_UINT64_SUB_ARRAY(_f, _s, _start, _num) \
> + VMSTATE_SUB_ARRAY(_f, _s, _start, _num, 0, vmstate_info_uint64, uint64_t)
> +
> #define VMSTATE_UINT64_2DARRAY(_f, _s, _n1, _n2) \
> VMSTATE_UINT64_2DARRAY_V(_f, _s, _n1, _n2, 0)
>
> @@ -932,9 +938,6 @@ extern const VMStateInfo vmstate_info_qtailq;
> #define VMSTATE_INT32_ARRAY(_f, _s, _n) \
> VMSTATE_INT32_ARRAY_V(_f, _s, _n, 0)
>
> -#define VMSTATE_UINT32_SUB_ARRAY(_f, _s, _start, _num) \
> - VMSTATE_SUB_ARRAY(_f, _s, _start, _num, 0, vmstate_info_uint32, uint32_t)
> -
> #define VMSTATE_INT64_ARRAY_V(_f, _s, _n, _v) \
> VMSTATE_ARRAY(_f, _s, _n, _v, vmstate_info_int64, int64_t)
>
>
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [Qemu-devel] [PATCH 6/9] vmstate: Add VMSTATE_UINT64_SUB_ARRAY
2017-12-18 17:30 ` [Qemu-devel] [PATCH 6/9] vmstate: Add VMSTATE_UINT64_SUB_ARRAY Richard Henderson
2017-12-18 20:29 ` [Qemu-devel] [Qemu-arm] " Philippe Mathieu-Daudé
@ 2018-01-11 18:46 ` Peter Maydell
1 sibling, 0 replies; 31+ messages in thread
From: Peter Maydell @ 2018-01-11 18:46 UTC (permalink / raw)
To: Richard Henderson; +Cc: QEMU Developers, qemu-arm
On 18 December 2017 at 17:30, Richard Henderson
<richard.henderson@linaro.org> wrote:
> At the same time, move VMSTATE_UINT32_SUB_ARRAY
> beside the other UINT32 definitions.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> include/migration/vmstate.h | 9 ++++++---
> 1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
> index 88b55df5ae..8c3889433c 100644
> --- a/include/migration/vmstate.h
> +++ b/include/migration/vmstate.h
> @@ -905,6 +905,9 @@ extern const VMStateInfo vmstate_info_qtailq;
> #define VMSTATE_UINT32_ARRAY(_f, _s, _n) \
> VMSTATE_UINT32_ARRAY_V(_f, _s, _n, 0)
>
> +#define VMSTATE_UINT32_SUB_ARRAY(_f, _s, _start, _num) \
> + VMSTATE_SUB_ARRAY(_f, _s, _start, _num, 0, vmstate_info_uint32, uint32_t)
> +
> #define VMSTATE_UINT32_2DARRAY(_f, _s, _n1, _n2) \
> VMSTATE_UINT32_2DARRAY_V(_f, _s, _n1, _n2, 0)
>
> @@ -914,6 +917,9 @@ extern const VMStateInfo vmstate_info_qtailq;
> #define VMSTATE_UINT64_ARRAY(_f, _s, _n) \
> VMSTATE_UINT64_ARRAY_V(_f, _s, _n, 0)
>
> +#define VMSTATE_UINT64_SUB_ARRAY(_f, _s, _start, _num) \
> + VMSTATE_SUB_ARRAY(_f, _s, _start, _num, 0, vmstate_info_uint64, uint64_t)
> +
> #define VMSTATE_UINT64_2DARRAY(_f, _s, _n1, _n2) \
> VMSTATE_UINT64_2DARRAY_V(_f, _s, _n1, _n2, 0)
>
> @@ -932,9 +938,6 @@ extern const VMStateInfo vmstate_info_qtailq;
> #define VMSTATE_INT32_ARRAY(_f, _s, _n) \
> VMSTATE_INT32_ARRAY_V(_f, _s, _n, 0)
>
> -#define VMSTATE_UINT32_SUB_ARRAY(_f, _s, _start, _num) \
> - VMSTATE_SUB_ARRAY(_f, _s, _start, _num, 0, vmstate_info_uint32, uint32_t)
> -
> #define VMSTATE_INT64_ARRAY_V(_f, _s, _n, _v) \
> VMSTATE_ARRAY(_f, _s, _n, _v, vmstate_info_int64, int64_t)
>
> --
> 2.14.3
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
thanks
-- PMM
^ permalink raw reply [flat|nested] 31+ messages in thread
* [Qemu-devel] [PATCH 7/9] target/arm: Expand vector registers for SVE
2017-12-18 17:30 [Qemu-devel] [PATCH 0/9] target/arm: Prepatory work for SVE Richard Henderson
` (5 preceding siblings ...)
2017-12-18 17:30 ` [Qemu-devel] [PATCH 6/9] vmstate: Add VMSTATE_UINT64_SUB_ARRAY Richard Henderson
@ 2017-12-18 17:30 ` Richard Henderson
2018-01-11 18:44 ` Peter Maydell
2018-01-12 18:38 ` Peter Maydell
2017-12-18 17:30 ` [Qemu-devel] [PATCH 8/9] target/arm: Add ZCR.LEN to tb->flags Richard Henderson
2017-12-18 17:30 ` [Qemu-devel] [PATCH 9/9] target/arm: Add ARM_FEATURE_SVE Richard Henderson
8 siblings, 2 replies; 31+ messages in thread
From: Richard Henderson @ 2017-12-18 17:30 UTC (permalink / raw)
To: qemu-devel; +Cc: peter.maydell, qemu-arm
Change vfp.regs as a uint64_t to vfp.zregs as an ARMVectorReg.
The previous patches have made the change in representation
relatively painless.
Add vfp.pregs as an ARMPredicateReg. Let FFR be P16 to make
it easier to treat it as for any other predicate.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/cpu.h | 64 ++++++++++++++++++++++++++++++++--------------
target/arm/machine.c | 37 ++++++++++++++++++++++++++-
target/arm/translate-a64.c | 8 +++---
target/arm/translate.c | 12 ++++-----
4 files changed, 90 insertions(+), 31 deletions(-)
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index e1a8e2880d..150b0d9d84 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -153,6 +153,47 @@ typedef struct {
uint32_t base_mask;
} TCR;
+/* Define a maximum sized vector register.
+ * For 32-bit, this is a 128-bit NEON/AdvSIMD register.
+ * For 64-bit, this is a 2048-bit SVE register.
+ *
+ * Note that the mapping between S, D, and Q views of the register bank
+ * differs between AArch64 and AArch32.
+ * In AArch32:
+ * Qn = regs[n].d[1]:regs[n].d[0]
+ * Dn = regs[n / 2].d[n & 1]
+ * Sn = regs[n / 4].d[n % 4 / 2],
+ * bits 31..0 for even n, and bits 63..32 for odd n
+ * (and regs[16] to regs[31] are inaccessible)
+ * In AArch64:
+ * Zn = regs[n].d[*]
+ * Qn = regs[n].d[1]:regs[n].d[0]
+ * Dn = regs[n].d[0]
+ * Sn = regs[n].d[0] bits 31..0
+ *
+ * This corresponds to the architecturally defined mapping between
+ * the two execution states, and means we do not need to explicitly
+ * map these registers when changing states.
+ */
+
+#ifdef TARGET_AARCH64
+# define ARM_MAX_VQ 16
+#else
+# define ARM_MAX_VQ 1
+#endif
+
+typedef struct ARMVectorReg {
+ uint64_t d[2 * ARM_MAX_VQ] QEMU_ALIGNED(16);
+} ARMVectorReg;
+
+typedef struct ARMPredicateReg {
+#ifdef TARGET_AARCH64
+ uint64_t p[2 * ARM_MAX_VQ / 8] QEMU_ALIGNED(16);
+#else
+ uint64_t p[0];
+#endif
+} ARMPredicateReg;
+
typedef struct CPUARMState {
/* Regs for current mode. */
uint32_t regs[16];
@@ -477,23 +518,8 @@ typedef struct CPUARMState {
/* VFP coprocessor state. */
struct {
- /* VFP/Neon register state. Note that the mapping between S, D and Q
- * views of the register bank differs between AArch64 and AArch32:
- * In AArch32:
- * Qn = regs[2n+1]:regs[2n]
- * Dn = regs[n]
- * Sn = regs[n/2] bits 31..0 for even n, and bits 63..32 for odd n
- * (and regs[32] to regs[63] are inaccessible)
- * In AArch64:
- * Qn = regs[2n+1]:regs[2n]
- * Dn = regs[2n]
- * Sn = regs[2n] bits 31..0
- * Hn = regs[2n] bits 15..0 for even n, and bits 31..16 for odd n
- * This corresponds to the architecturally defined mapping between
- * the two execution states, and means we do not need to explicitly
- * map these registers when changing states.
- */
- uint64_t regs[64] QEMU_ALIGNED(16);
+ ARMVectorReg zregs[32];
+ ARMPredicateReg pregs[17]; /* ffr is pregs[16] */
uint32_t xregs[16];
/* We store these fpcsr fields separately for convenience. */
@@ -2905,7 +2931,7 @@ static inline void *arm_get_el_change_hook_opaque(ARMCPU *cpu)
*/
static inline uint64_t *aa32_vfp_dreg(CPUARMState *env, unsigned regno)
{
- return &env->vfp.regs[regno];
+ return &env->vfp.zregs[regno >> 1].d[regno & 1];
}
/**
@@ -2914,7 +2940,7 @@ static inline uint64_t *aa32_vfp_dreg(CPUARMState *env, unsigned regno)
*/
static inline uint64_t *aa64_vfp_qreg(CPUARMState *env, unsigned regno)
{
- return &env->vfp.regs[2 * regno];
+ return &env->vfp.zregs[regno].d[0];
}
#endif
diff --git a/target/arm/machine.c b/target/arm/machine.c
index a85c2430d3..39f3123b10 100644
--- a/target/arm/machine.c
+++ b/target/arm/machine.c
@@ -50,7 +50,42 @@ static const VMStateDescription vmstate_vfp = {
.minimum_version_id = 3,
.needed = vfp_needed,
.fields = (VMStateField[]) {
- VMSTATE_UINT64_ARRAY(env.vfp.regs, ARMCPU, 64),
+ /* For compatibility, store Qn out of Zn here. */
+ /* TODO: store the other quadwords in another subsection,
+ along with predicate registers and ZCR_EL[1-3]. */
+ VMSTATE_UINT64_SUB_ARRAY(env.vfp.zregs[0].d, ARMCPU, 0, 2),
+ VMSTATE_UINT64_SUB_ARRAY(env.vfp.zregs[1].d, ARMCPU, 0, 2),
+ VMSTATE_UINT64_SUB_ARRAY(env.vfp.zregs[2].d, ARMCPU, 0, 2),
+ VMSTATE_UINT64_SUB_ARRAY(env.vfp.zregs[3].d, ARMCPU, 0, 2),
+ VMSTATE_UINT64_SUB_ARRAY(env.vfp.zregs[4].d, ARMCPU, 0, 2),
+ VMSTATE_UINT64_SUB_ARRAY(env.vfp.zregs[5].d, ARMCPU, 0, 2),
+ VMSTATE_UINT64_SUB_ARRAY(env.vfp.zregs[6].d, ARMCPU, 0, 2),
+ VMSTATE_UINT64_SUB_ARRAY(env.vfp.zregs[7].d, ARMCPU, 0, 2),
+ VMSTATE_UINT64_SUB_ARRAY(env.vfp.zregs[8].d, ARMCPU, 0, 2),
+ VMSTATE_UINT64_SUB_ARRAY(env.vfp.zregs[9].d, ARMCPU, 0, 2),
+ VMSTATE_UINT64_SUB_ARRAY(env.vfp.zregs[10].d, ARMCPU, 0, 2),
+ VMSTATE_UINT64_SUB_ARRAY(env.vfp.zregs[11].d, ARMCPU, 0, 2),
+ VMSTATE_UINT64_SUB_ARRAY(env.vfp.zregs[12].d, ARMCPU, 0, 2),
+ VMSTATE_UINT64_SUB_ARRAY(env.vfp.zregs[13].d, ARMCPU, 0, 2),
+ VMSTATE_UINT64_SUB_ARRAY(env.vfp.zregs[14].d, ARMCPU, 0, 2),
+ VMSTATE_UINT64_SUB_ARRAY(env.vfp.zregs[15].d, ARMCPU, 0, 2),
+ VMSTATE_UINT64_SUB_ARRAY(env.vfp.zregs[16].d, ARMCPU, 0, 2),
+ VMSTATE_UINT64_SUB_ARRAY(env.vfp.zregs[17].d, ARMCPU, 0, 2),
+ VMSTATE_UINT64_SUB_ARRAY(env.vfp.zregs[18].d, ARMCPU, 0, 2),
+ VMSTATE_UINT64_SUB_ARRAY(env.vfp.zregs[19].d, ARMCPU, 0, 2),
+ VMSTATE_UINT64_SUB_ARRAY(env.vfp.zregs[20].d, ARMCPU, 0, 2),
+ VMSTATE_UINT64_SUB_ARRAY(env.vfp.zregs[21].d, ARMCPU, 0, 2),
+ VMSTATE_UINT64_SUB_ARRAY(env.vfp.zregs[22].d, ARMCPU, 0, 2),
+ VMSTATE_UINT64_SUB_ARRAY(env.vfp.zregs[23].d, ARMCPU, 0, 2),
+ VMSTATE_UINT64_SUB_ARRAY(env.vfp.zregs[24].d, ARMCPU, 0, 2),
+ VMSTATE_UINT64_SUB_ARRAY(env.vfp.zregs[25].d, ARMCPU, 0, 2),
+ VMSTATE_UINT64_SUB_ARRAY(env.vfp.zregs[26].d, ARMCPU, 0, 2),
+ VMSTATE_UINT64_SUB_ARRAY(env.vfp.zregs[27].d, ARMCPU, 0, 2),
+ VMSTATE_UINT64_SUB_ARRAY(env.vfp.zregs[28].d, ARMCPU, 0, 2),
+ VMSTATE_UINT64_SUB_ARRAY(env.vfp.zregs[29].d, ARMCPU, 0, 2),
+ VMSTATE_UINT64_SUB_ARRAY(env.vfp.zregs[30].d, ARMCPU, 0, 2),
+ VMSTATE_UINT64_SUB_ARRAY(env.vfp.zregs[31].d, ARMCPU, 0, 2),
+
/* The xregs array is a little awkward because element 1 (FPSCR)
* requires a specific accessor, so we have to split it up in
* the vmstate:
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index e17c7269d4..487408a7a7 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -523,8 +523,8 @@ static inline int vec_reg_offset(DisasContext *s, int regno,
{
int offs = 0;
#ifdef HOST_WORDS_BIGENDIAN
- /* This is complicated slightly because vfp.regs[2n] is
- * still the low half and vfp.regs[2n+1] the high half
+ /* This is complicated slightly because vfp.zregs[n].d[0] is
+ * still the low half and vfp.zregs[n].d[1] the high half
* of the 128 bit vector, even on big endian systems.
* Calculate the offset assuming a fully bigendian 128 bits,
* then XOR to account for the order of the two 64 bit halves.
@@ -534,7 +534,7 @@ static inline int vec_reg_offset(DisasContext *s, int regno,
#else
offs += element * (1 << size);
#endif
- offs += offsetof(CPUARMState, vfp.regs[regno * 2]);
+ offs += offsetof(CPUARMState, vfp.zregs[regno]);
assert_fp_access_checked(s);
return offs;
}
@@ -543,7 +543,7 @@ static inline int vec_reg_offset(DisasContext *s, int regno,
static inline int vec_full_reg_offset(DisasContext *s, int regno)
{
assert_fp_access_checked(s);
- return offsetof(CPUARMState, vfp.regs[regno * 2]);
+ return offsetof(CPUARMState, vfp.zregs[regno]);
}
/* Return the byte size of the "whole" vector register, VL / 8. */
diff --git a/target/arm/translate.c b/target/arm/translate.c
index a93e26498e..7a5e493333 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -1513,19 +1513,17 @@ static inline void gen_vfp_st(DisasContext *s, int dp, TCGv_i32 addr)
}
}
-static inline long
-vfp_reg_offset (int dp, int reg)
+static inline long vfp_reg_offset(bool dp, unsigned reg)
{
if (dp) {
- return offsetof(CPUARMState, vfp.regs[reg]);
+ return offsetof(CPUARMState, vfp.zregs[reg >> 1].d[reg & 1]);
} else {
- long ofs = offsetof(CPUARMState, vfp.regs[reg >> 1]);
+ long r = offsetof(CPUARMState, vfp.zregs[reg >> 2].d[(reg >> 1) & 1]);
if (reg & 1) {
- ofs += offsetof(CPU_DoubleU, l.upper);
+ return r + offsetof(CPU_DoubleU, l.upper);
} else {
- ofs += offsetof(CPU_DoubleU, l.lower);
+ return r + offsetof(CPU_DoubleU, l.lower);
}
- return ofs;
}
}
--
2.14.3
^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [Qemu-devel] [PATCH 7/9] target/arm: Expand vector registers for SVE
2017-12-18 17:30 ` [Qemu-devel] [PATCH 7/9] target/arm: Expand vector registers for SVE Richard Henderson
@ 2018-01-11 18:44 ` Peter Maydell
2018-01-11 18:58 ` Dr. David Alan Gilbert
2018-01-12 18:38 ` Peter Maydell
1 sibling, 1 reply; 31+ messages in thread
From: Peter Maydell @ 2018-01-11 18:44 UTC (permalink / raw)
To: Richard Henderson; +Cc: QEMU Developers, qemu-arm, Dr. David Alan Gilbert
On 18 December 2017 at 17:30, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Change vfp.regs as a uint64_t to vfp.zregs as an ARMVectorReg.
> The previous patches have made the change in representation
> relatively painless.
>
> Add vfp.pregs as an ARMPredicateReg. Let FFR be P16 to make
> it easier to treat it as for any other predicate.
> diff --git a/target/arm/machine.c b/target/arm/machine.c
> index a85c2430d3..39f3123b10 100644
> --- a/target/arm/machine.c
> +++ b/target/arm/machine.c
> @@ -50,7 +50,42 @@ static const VMStateDescription vmstate_vfp = {
> .minimum_version_id = 3,
> .needed = vfp_needed,
> .fields = (VMStateField[]) {
> - VMSTATE_UINT64_ARRAY(env.vfp.regs, ARMCPU, 64),
> + /* For compatibility, store Qn out of Zn here. */
> + /* TODO: store the other quadwords in another subsection,
> + along with predicate registers and ZCR_EL[1-3]. */
I think we should get the migration state stuff right at the point
where we change the data structure in cpu.h.
> + VMSTATE_UINT64_SUB_ARRAY(env.vfp.zregs[0].d, ARMCPU, 0, 2),
> + VMSTATE_UINT64_SUB_ARRAY(env.vfp.zregs[1].d, ARMCPU, 0, 2),
> + VMSTATE_UINT64_SUB_ARRAY(env.vfp.zregs[2].d, ARMCPU, 0, 2),
David: does the migration code guarantee that we can split an
old VMSTATE_UINT64_ARRAY into being a sequence of SUB_ARRAYs
like this and retain the same on-the-wire format (ie and
keep migration compat) ?
I suspect we've used this clever trick before elsewhere, but thought
it better to check...
thanks
-- PMM
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [Qemu-devel] [PATCH 7/9] target/arm: Expand vector registers for SVE
2018-01-11 18:44 ` Peter Maydell
@ 2018-01-11 18:58 ` Dr. David Alan Gilbert
0 siblings, 0 replies; 31+ messages in thread
From: Dr. David Alan Gilbert @ 2018-01-11 18:58 UTC (permalink / raw)
To: Peter Maydell; +Cc: Richard Henderson, QEMU Developers, qemu-arm
* Peter Maydell (peter.maydell@linaro.org) wrote:
> On 18 December 2017 at 17:30, Richard Henderson
> <richard.henderson@linaro.org> wrote:
> > Change vfp.regs as a uint64_t to vfp.zregs as an ARMVectorReg.
> > The previous patches have made the change in representation
> > relatively painless.
> >
> > Add vfp.pregs as an ARMPredicateReg. Let FFR be P16 to make
> > it easier to treat it as for any other predicate.
>
> > diff --git a/target/arm/machine.c b/target/arm/machine.c
> > index a85c2430d3..39f3123b10 100644
> > --- a/target/arm/machine.c
> > +++ b/target/arm/machine.c
> > @@ -50,7 +50,42 @@ static const VMStateDescription vmstate_vfp = {
> > .minimum_version_id = 3,
> > .needed = vfp_needed,
> > .fields = (VMStateField[]) {
> > - VMSTATE_UINT64_ARRAY(env.vfp.regs, ARMCPU, 64),
> > + /* For compatibility, store Qn out of Zn here. */
> > + /* TODO: store the other quadwords in another subsection,
> > + along with predicate registers and ZCR_EL[1-3]. */
>
> I think we should get the migration state stuff right at the point
> where we change the data structure in cpu.h.
>
>
> > + VMSTATE_UINT64_SUB_ARRAY(env.vfp.zregs[0].d, ARMCPU, 0, 2),
> > + VMSTATE_UINT64_SUB_ARRAY(env.vfp.zregs[1].d, ARMCPU, 0, 2),
> > + VMSTATE_UINT64_SUB_ARRAY(env.vfp.zregs[2].d, ARMCPU, 0, 2),
>
> David: does the migration code guarantee that we can split an
> old VMSTATE_UINT64_ARRAY into being a sequence of SUB_ARRAYs
> like this and retain the same on-the-wire format (ie and
> keep migration compat) ?
There's no on-the-wire protocol for this - it's just the array elements,
so as long as the total adds up the same and it's still uint64's
it should work.
Dave
> I suspect we've used this clever trick before elsewhere, but thought
> it better to check...
>
> thanks
> -- PMM
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [Qemu-devel] [PATCH 7/9] target/arm: Expand vector registers for SVE
2017-12-18 17:30 ` [Qemu-devel] [PATCH 7/9] target/arm: Expand vector registers for SVE Richard Henderson
2018-01-11 18:44 ` Peter Maydell
@ 2018-01-12 18:38 ` Peter Maydell
2018-01-12 18:50 ` Richard Henderson
1 sibling, 1 reply; 31+ messages in thread
From: Peter Maydell @ 2018-01-12 18:38 UTC (permalink / raw)
To: Richard Henderson; +Cc: QEMU Developers, qemu-arm
On 18 December 2017 at 17:30, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Change vfp.regs as a uint64_t to vfp.zregs as an ARMVectorReg.
> The previous patches have made the change in representation
> relatively painless.
>
> Add vfp.pregs as an ARMPredicateReg. Let FFR be P16 to make
> it easier to treat it as for any other predicate.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> target/arm/cpu.h | 64 ++++++++++++++++++++++++++++++++--------------
> target/arm/machine.c | 37 ++++++++++++++++++++++++++-
> target/arm/translate-a64.c | 8 +++---
> target/arm/translate.c | 12 ++++-----
> 4 files changed, 90 insertions(+), 31 deletions(-)
>
> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> index e1a8e2880d..150b0d9d84 100644
> --- a/target/arm/cpu.h
> +++ b/target/arm/cpu.h
> @@ -153,6 +153,47 @@ typedef struct {
> uint32_t base_mask;
> } TCR;
>
> +/* Define a maximum sized vector register.
> + * For 32-bit, this is a 128-bit NEON/AdvSIMD register.
> + * For 64-bit, this is a 2048-bit SVE register.
> + *
> + * Note that the mapping between S, D, and Q views of the register bank
> + * differs between AArch64 and AArch32.
> + * In AArch32:
> + * Qn = regs[n].d[1]:regs[n].d[0]
> + * Dn = regs[n / 2].d[n & 1]
> + * Sn = regs[n / 4].d[n % 4 / 2],
> + * bits 31..0 for even n, and bits 63..32 for odd n
> + * (and regs[16] to regs[31] are inaccessible)
> + * In AArch64:
> + * Zn = regs[n].d[*]
> + * Qn = regs[n].d[1]:regs[n].d[0]
> + * Dn = regs[n].d[0]
> + * Sn = regs[n].d[0] bits 31..0
> + *
> + * This corresponds to the architecturally defined mapping between
> + * the two execution states, and means we do not need to explicitly
> + * map these registers when changing states.
> + */
The transformations on the data structures (ie the meat of this patch
looks good).
> +
> +#ifdef TARGET_AARCH64
> +# define ARM_MAX_VQ 16
> +#else
> +# define ARM_MAX_VQ 1
> +#endif
> +
> +typedef struct ARMVectorReg {
> + uint64_t d[2 * ARM_MAX_VQ] QEMU_ALIGNED(16);
> +} ARMVectorReg;
> +
> +typedef struct ARMPredicateReg {
> +#ifdef TARGET_AARCH64
> + uint64_t p[2 * ARM_MAX_VQ / 8] QEMU_ALIGNED(16);
> +#else
> + uint64_t p[0];
> +#endif
> +} ARMPredicateReg;
I think introducing the predicate registers should go in a separate
patch.
> --- a/target/arm/translate.c
> +++ b/target/arm/translate.c
> @@ -1513,19 +1513,17 @@ static inline void gen_vfp_st(DisasContext *s, int dp, TCGv_i32 addr)
> }
> }
>
> -static inline long
> -vfp_reg_offset (int dp, int reg)
> +static inline long vfp_reg_offset(bool dp, unsigned reg)
> {
> if (dp) {
> - return offsetof(CPUARMState, vfp.regs[reg]);
> + return offsetof(CPUARMState, vfp.zregs[reg >> 1].d[reg & 1]);
> } else {
> - long ofs = offsetof(CPUARMState, vfp.regs[reg >> 1]);
> + long r = offsetof(CPUARMState, vfp.zregs[reg >> 2].d[(reg >> 1) & 1]);
> if (reg & 1) {
> - ofs += offsetof(CPU_DoubleU, l.upper);
> + return r + offsetof(CPU_DoubleU, l.upper);
> } else {
> - ofs += offsetof(CPU_DoubleU, l.lower);
> + return r + offsetof(CPU_DoubleU, l.lower);
> }
> - return ofs;
...I see we're tweaking the logic on this code again. I was
expecting that the changes in the previous patch would have turned
out to be in support of just having to do a one-line change in this
one, but apparently not ?
> }
> }
thanks
-- PMM
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [Qemu-devel] [PATCH 7/9] target/arm: Expand vector registers for SVE
2018-01-12 18:38 ` Peter Maydell
@ 2018-01-12 18:50 ` Richard Henderson
0 siblings, 0 replies; 31+ messages in thread
From: Richard Henderson @ 2018-01-12 18:50 UTC (permalink / raw)
To: Peter Maydell; +Cc: QEMU Developers, qemu-arm
On 01/12/2018 10:38 AM, Peter Maydell wrote:
>> - long ofs = offsetof(CPUARMState, vfp.regs[reg >> 1]);
>> + long r = offsetof(CPUARMState, vfp.zregs[reg >> 2].d[(reg >> 1) & 1]);
>> if (reg & 1) {
>> - ofs += offsetof(CPU_DoubleU, l.upper);
>> + return r + offsetof(CPU_DoubleU, l.upper);
>> } else {
>> - ofs += offsetof(CPU_DoubleU, l.lower);
>> + return r + offsetof(CPU_DoubleU, l.lower);
>> }
>> - return ofs;
> ...I see we're tweaking the logic on this code again. I was
> expecting that the changes in the previous patch would have turned
> out to be in support of just having to do a one-line change in this
> one, but apparently not ?
>
I don't know why I changed my mind about where the return would go. It should
have just been the one line change...
r~
^ permalink raw reply [flat|nested] 31+ messages in thread
* [Qemu-devel] [PATCH 8/9] target/arm: Add ZCR.LEN to tb->flags
2017-12-18 17:30 [Qemu-devel] [PATCH 0/9] target/arm: Prepatory work for SVE Richard Henderson
` (6 preceding siblings ...)
2017-12-18 17:30 ` [Qemu-devel] [PATCH 7/9] target/arm: Expand vector registers for SVE Richard Henderson
@ 2017-12-18 17:30 ` Richard Henderson
2018-01-12 18:49 ` Peter Maydell
2017-12-18 17:30 ` [Qemu-devel] [PATCH 9/9] target/arm: Add ARM_FEATURE_SVE Richard Henderson
8 siblings, 1 reply; 31+ messages in thread
From: Richard Henderson @ 2017-12-18 17:30 UTC (permalink / raw)
To: qemu-devel; +Cc: peter.maydell, qemu-arm
This has a stub(-ish) implementation of ZCR, in that it does not
implement _EL[123], or wire up the system register properly.
However, it is enough for CONFIG_USER_ONLY.
We will need VQ in tb->flags in order to pass the true vector
size to the generic vector expanders.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/cpu.h | 48 ++++++++++++++++++++++++++++++----------------
target/arm/translate.h | 1 +
target/arm/cpu.c | 2 ++
target/arm/translate-a64.c | 5 ++---
4 files changed, 37 insertions(+), 19 deletions(-)
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 150b0d9d84..37b8cef2e2 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -553,6 +553,12 @@ typedef struct CPUARMState {
float_status fp_status;
float_status fp_status_f16;
float_status standard_fp_status;
+
+ /* TODO: For system mode, represent all of zcr_el{1,2,3}.
+ * In the meantime, the composite value exposed to el0.
+ * Only the low 4 bits are defined, and set via syscall.
+ */
+ uint32_t zcr_el;
} vfp;
uint64_t exclusive_addr;
uint64_t exclusive_val;
@@ -2649,6 +2655,8 @@ static inline bool arm_cpu_data_is_big_endian(CPUARMState *env)
#define ARM_TBFLAG_TBI0_MASK (0x1ull << ARM_TBFLAG_TBI0_SHIFT)
#define ARM_TBFLAG_TBI1_SHIFT 1 /* TBI1 for EL0/1 */
#define ARM_TBFLAG_TBI1_MASK (0x1ull << ARM_TBFLAG_TBI1_SHIFT)
+#define ARM_TBFLAG_ZCR_LEN_SHIFT 2 /* Composite ZCR_EL1/2/3.LEN */
+#define ARM_TBFLAG_ZCR_LEN_MASK (0xfull << ARM_TBFLAG_ZCR_LEN_SHIFT)
/* some convenience accessor macros */
#define ARM_TBFLAG_AARCH64_STATE(F) \
@@ -2685,6 +2693,8 @@ static inline bool arm_cpu_data_is_big_endian(CPUARMState *env)
(((F) & ARM_TBFLAG_TBI0_MASK) >> ARM_TBFLAG_TBI0_SHIFT)
#define ARM_TBFLAG_TBI1(F) \
(((F) & ARM_TBFLAG_TBI1_MASK) >> ARM_TBFLAG_TBI1_SHIFT)
+#define ARM_TBFLAG_ZCR_LEN(F) \
+ (((F) & ARM_TBFLAG_ZCR_LEN_MASK) >> ARM_TBFLAG_ZCR_LEN_SHIFT)
static inline bool bswap_code(bool sctlr_b)
{
@@ -2818,34 +2828,39 @@ static inline uint32_t arm_regime_tbi1(CPUARMState *env, ARMMMUIdx mmu_idx)
#endif
static inline void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
- target_ulong *cs_base, uint32_t *flags)
+ target_ulong *cs_base, uint32_t *pflags)
{
ARMMMUIdx mmu_idx = core_to_arm_mmu_idx(env, cpu_mmu_index(env, false));
+ uint32_t flags;
+
if (is_a64(env)) {
*pc = env->pc;
- *flags = ARM_TBFLAG_AARCH64_STATE_MASK;
+ flags = ARM_TBFLAG_AARCH64_STATE_MASK;
/* Get control bits for tagged addresses */
- *flags |= (arm_regime_tbi0(env, mmu_idx) << ARM_TBFLAG_TBI0_SHIFT);
- *flags |= (arm_regime_tbi1(env, mmu_idx) << ARM_TBFLAG_TBI1_SHIFT);
+ flags |= (arm_regime_tbi0(env, mmu_idx) << ARM_TBFLAG_TBI0_SHIFT);
+ flags |= (arm_regime_tbi1(env, mmu_idx) << ARM_TBFLAG_TBI1_SHIFT);
+ /* Get SVE vector length */
+ flags |= ((env->vfp.zcr_el << ARM_TBFLAG_ZCR_LEN_SHIFT)
+ & ARM_TBFLAG_ZCR_LEN_MASK);
} else {
*pc = env->regs[15];
- *flags = (env->thumb << ARM_TBFLAG_THUMB_SHIFT)
+ flags = (env->thumb << ARM_TBFLAG_THUMB_SHIFT)
| (env->vfp.vec_len << ARM_TBFLAG_VECLEN_SHIFT)
| (env->vfp.vec_stride << ARM_TBFLAG_VECSTRIDE_SHIFT)
| (env->condexec_bits << ARM_TBFLAG_CONDEXEC_SHIFT)
| (arm_sctlr_b(env) << ARM_TBFLAG_SCTLR_B_SHIFT);
if (!(access_secure_reg(env))) {
- *flags |= ARM_TBFLAG_NS_MASK;
+ flags |= ARM_TBFLAG_NS_MASK;
}
if (env->vfp.xregs[ARM_VFP_FPEXC] & (1 << 30)
|| arm_el_is_aa64(env, 1)) {
- *flags |= ARM_TBFLAG_VFPEN_MASK;
+ flags |= ARM_TBFLAG_VFPEN_MASK;
}
- *flags |= (extract32(env->cp15.c15_cpar, 0, 2)
- << ARM_TBFLAG_XSCALE_CPAR_SHIFT);
+ flags |= (extract32(env->cp15.c15_cpar, 0, 2)
+ << ARM_TBFLAG_XSCALE_CPAR_SHIFT);
}
- *flags |= (arm_to_core_mmu_idx(mmu_idx) << ARM_TBFLAG_MMUIDX_SHIFT);
+ flags |= (arm_to_core_mmu_idx(mmu_idx) << ARM_TBFLAG_MMUIDX_SHIFT);
/* The SS_ACTIVE and PSTATE_SS bits correspond to the state machine
* states defined in the ARM ARM for software singlestep:
@@ -2855,26 +2870,27 @@ static inline void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
* 1 1 Active-not-pending
*/
if (arm_singlestep_active(env)) {
- *flags |= ARM_TBFLAG_SS_ACTIVE_MASK;
+ flags |= ARM_TBFLAG_SS_ACTIVE_MASK;
if (is_a64(env)) {
if (env->pstate & PSTATE_SS) {
- *flags |= ARM_TBFLAG_PSTATE_SS_MASK;
+ flags |= ARM_TBFLAG_PSTATE_SS_MASK;
}
} else {
if (env->uncached_cpsr & PSTATE_SS) {
- *flags |= ARM_TBFLAG_PSTATE_SS_MASK;
+ flags |= ARM_TBFLAG_PSTATE_SS_MASK;
}
}
}
if (arm_cpu_data_is_big_endian(env)) {
- *flags |= ARM_TBFLAG_BE_DATA_MASK;
+ flags |= ARM_TBFLAG_BE_DATA_MASK;
}
- *flags |= fp_exception_el(env) << ARM_TBFLAG_FPEXC_EL_SHIFT;
+ flags |= fp_exception_el(env) << ARM_TBFLAG_FPEXC_EL_SHIFT;
if (arm_v7m_is_handler_mode(env)) {
- *flags |= ARM_TBFLAG_HANDLER_MASK;
+ flags |= ARM_TBFLAG_HANDLER_MASK;
}
+ *pflags = flags;
*cs_base = 0;
}
diff --git a/target/arm/translate.h b/target/arm/translate.h
index 3f4df91e5e..2b0a2d7aa6 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -34,6 +34,7 @@ typedef struct DisasContext {
bool vfp_enabled; /* FP enabled via FPSCR.EN */
int vec_len;
int vec_stride;
+ int sve_len; /* SVE vector length in bytes */
bool v7m_handler_mode;
bool v8m_secure; /* true if v8M and we're in Secure mode */
/* Immediate value in AArch32 SVC insn; must be set if is_jmp == DISAS_SWI
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 6cd8ae1459..10b68ea16f 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -149,6 +149,8 @@ static void arm_cpu_reset(CPUState *s)
env->cp15.sctlr_el[1] |= SCTLR_UCT | SCTLR_UCI | SCTLR_DZE;
/* and to the FP/Neon instructions */
env->cp15.cpacr_el1 = deposit64(env->cp15.cpacr_el1, 20, 2, 3);
+ /* Set the SVE vector length to maximum. */
+ env->vfp.zcr_el = ARM_MAX_VQ - 1;
#else
/* Reset into the highest available EL */
if (arm_feature(env, ARM_FEATURE_EL3)) {
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 487408a7a7..ecb72e4d9c 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -549,9 +549,7 @@ static inline int vec_full_reg_offset(DisasContext *s, int regno)
/* Return the byte size of the "whole" vector register, VL / 8. */
static inline int vec_full_reg_size(DisasContext *s)
{
- /* FIXME SVE: We should put the composite ZCR_EL* value into tb->flags.
- In the meantime this is just the AdvSIMD length of 128. */
- return 128 / 8;
+ return s->sve_len;
}
/* Return a newly allocated pointer to the vector register. */
@@ -12771,6 +12769,7 @@ static int aarch64_tr_init_disas_context(DisasContextBase *dcbase,
dc->fp_excp_el = ARM_TBFLAG_FPEXC_EL(dc->base.tb->flags);
dc->vec_len = 0;
dc->vec_stride = 0;
+ dc->sve_len = (ARM_TBFLAG_ZCR_LEN(dc->base.tb->flags) + 1) * 16;
dc->cp_regs = arm_cpu->cp_regs;
dc->features = env->features;
--
2.14.3
^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [Qemu-devel] [PATCH 8/9] target/arm: Add ZCR.LEN to tb->flags
2017-12-18 17:30 ` [Qemu-devel] [PATCH 8/9] target/arm: Add ZCR.LEN to tb->flags Richard Henderson
@ 2018-01-12 18:49 ` Peter Maydell
2018-01-15 10:00 ` Peter Maydell
0 siblings, 1 reply; 31+ messages in thread
From: Peter Maydell @ 2018-01-12 18:49 UTC (permalink / raw)
To: Richard Henderson; +Cc: QEMU Developers, qemu-arm
On 18 December 2017 at 17:30, Richard Henderson
<richard.henderson@linaro.org> wrote:
> This has a stub(-ish) implementation of ZCR, in that it does not
> implement _EL[123], or wire up the system register properly.
> However, it is enough for CONFIG_USER_ONLY.
>
> We will need VQ in tb->flags in order to pass the true vector
> size to the generic vector expanders.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> target/arm/cpu.h | 48 ++++++++++++++++++++++++++++++----------------
> target/arm/translate.h | 1 +
> target/arm/cpu.c | 2 ++
> target/arm/translate-a64.c | 5 ++---
> 4 files changed, 37 insertions(+), 19 deletions(-)
>
> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> index 150b0d9d84..37b8cef2e2 100644
> --- a/target/arm/cpu.h
> +++ b/target/arm/cpu.h
> @@ -553,6 +553,12 @@ typedef struct CPUARMState {
> float_status fp_status;
> float_status fp_status_f16;
> float_status standard_fp_status;
> +
> + /* TODO: For system mode, represent all of zcr_el{1,2,3}.
> + * In the meantime, the composite value exposed to el0.
> + * Only the low 4 bits are defined, and set via syscall.
> + */
> + uint32_t zcr_el;
For purposes of not having to mess with migration state too frequently,
I think this patch should define zcr_el[4] (and add that to the
migration subsection for 'has SVE'). We don't need to actually
implement the behaviour of the upper ELs yet.
> } vfp;
> uint64_t exclusive_addr;
> uint64_t exclusive_val;
> @@ -2649,6 +2655,8 @@ static inline bool arm_cpu_data_is_big_endian(CPUARMState *env)
> #define ARM_TBFLAG_TBI0_MASK (0x1ull << ARM_TBFLAG_TBI0_SHIFT)
> #define ARM_TBFLAG_TBI1_SHIFT 1 /* TBI1 for EL0/1 */
> #define ARM_TBFLAG_TBI1_MASK (0x1ull << ARM_TBFLAG_TBI1_SHIFT)
> +#define ARM_TBFLAG_ZCR_LEN_SHIFT 2 /* Composite ZCR_EL1/2/3.LEN */
> +#define ARM_TBFLAG_ZCR_LEN_MASK (0xfull << ARM_TBFLAG_ZCR_LEN_SHIFT)
>
> /* some convenience accessor macros */
> #define ARM_TBFLAG_AARCH64_STATE(F) \
> @@ -2685,6 +2693,8 @@ static inline bool arm_cpu_data_is_big_endian(CPUARMState *env)
> (((F) & ARM_TBFLAG_TBI0_MASK) >> ARM_TBFLAG_TBI0_SHIFT)
> #define ARM_TBFLAG_TBI1(F) \
> (((F) & ARM_TBFLAG_TBI1_MASK) >> ARM_TBFLAG_TBI1_SHIFT)
> +#define ARM_TBFLAG_ZCR_LEN(F) \
> + (((F) & ARM_TBFLAG_ZCR_LEN_MASK) >> ARM_TBFLAG_ZCR_LEN_SHIFT)
>
> static inline bool bswap_code(bool sctlr_b)
> {
> @@ -2818,34 +2828,39 @@ static inline uint32_t arm_regime_tbi1(CPUARMState *env, ARMMMUIdx mmu_idx)
> #endif
>
> static inline void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
> - target_ulong *cs_base, uint32_t *flags)
> + target_ulong *cs_base, uint32_t *pflags)
> {
> ARMMMUIdx mmu_idx = core_to_arm_mmu_idx(env, cpu_mmu_index(env, false));
> + uint32_t flags;
> +
> if (is_a64(env)) {
> *pc = env->pc;
> - *flags = ARM_TBFLAG_AARCH64_STATE_MASK;
> + flags = ARM_TBFLAG_AARCH64_STATE_MASK;
Can you put this kind of refactoring in its own patch, please?
> /* Get control bits for tagged addresses */
> - *flags |= (arm_regime_tbi0(env, mmu_idx) << ARM_TBFLAG_TBI0_SHIFT);
> - *flags |= (arm_regime_tbi1(env, mmu_idx) << ARM_TBFLAG_TBI1_SHIFT);
> + flags |= (arm_regime_tbi0(env, mmu_idx) << ARM_TBFLAG_TBI0_SHIFT);
> + flags |= (arm_regime_tbi1(env, mmu_idx) << ARM_TBFLAG_TBI1_SHIFT);
> + /* Get SVE vector length */
> + flags |= ((env->vfp.zcr_el << ARM_TBFLAG_ZCR_LEN_SHIFT)
> + & ARM_TBFLAG_ZCR_LEN_MASK);
Otherwise looks OK.
thanks
-- PMM
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [Qemu-devel] [PATCH 8/9] target/arm: Add ZCR.LEN to tb->flags
2018-01-12 18:49 ` Peter Maydell
@ 2018-01-15 10:00 ` Peter Maydell
0 siblings, 0 replies; 31+ messages in thread
From: Peter Maydell @ 2018-01-15 10:00 UTC (permalink / raw)
To: Richard Henderson; +Cc: QEMU Developers, qemu-arm
On 12 January 2018 at 18:49, Peter Maydell <peter.maydell@linaro.org> wrote:
> For purposes of not having to mess with migration state too frequently,
> I think this patch should define zcr_el[4] (and add that to the
> migration subsection for 'has SVE').
I realized over the weekend that this suggestion isn't right --
ZCR_ELx are sysregs accessible via the usual msr/mrs, so they'll
be migrated via the sysreg migration mechanism, not with
explicit fields in the migration structs. So I think we're
OK to leave this as TODO for the moment.
thanks
-- PMM
^ permalink raw reply [flat|nested] 31+ messages in thread
* [Qemu-devel] [PATCH 9/9] target/arm: Add ARM_FEATURE_SVE
2017-12-18 17:30 [Qemu-devel] [PATCH 0/9] target/arm: Prepatory work for SVE Richard Henderson
` (7 preceding siblings ...)
2017-12-18 17:30 ` [Qemu-devel] [PATCH 8/9] target/arm: Add ZCR.LEN to tb->flags Richard Henderson
@ 2017-12-18 17:30 ` Richard Henderson
2018-01-11 18:42 ` Peter Maydell
8 siblings, 1 reply; 31+ messages in thread
From: Richard Henderson @ 2017-12-18 17:30 UTC (permalink / raw)
To: qemu-devel; +Cc: peter.maydell, qemu-arm
Enable it for the "any" CPU used by aarch64-linux-user.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/cpu.h | 1 +
target/arm/cpu64.c | 1 +
2 files changed, 2 insertions(+)
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 37b8cef2e2..652e00d957 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -1386,6 +1386,7 @@ enum arm_features {
ARM_FEATURE_V8_1_SIMD, /* has ARMv8.1-SIMD */
ARM_FEATURE_V8_FP16, /* implements v8.2 half-precision float */
ARM_FEATURE_V8_FCMA, /* has complex number part of v8.3 extensions. */
+ ARM_FEATURE_SVE, /* has SVE extension */
};
static inline int arm_feature(CPUARMState *env, int feature)
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index 43b42f95fd..366ab2eeee 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -229,6 +229,7 @@ static void aarch64_any_initfn(Object *obj)
set_feature(&cpu->env, ARM_FEATURE_V8_1_SIMD);
set_feature(&cpu->env, ARM_FEATURE_V8_FP16);
set_feature(&cpu->env, ARM_FEATURE_V8_FCMA);
+ set_feature(&cpu->env, ARM_FEATURE_SVE);
cpu->ctr = 0x80038003; /* 32 byte I and D cacheline size, VIPT icache */
cpu->dcz_blocksize = 7; /* 512 bytes */
}
--
2.14.3
^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [Qemu-devel] [PATCH 9/9] target/arm: Add ARM_FEATURE_SVE
2017-12-18 17:30 ` [Qemu-devel] [PATCH 9/9] target/arm: Add ARM_FEATURE_SVE Richard Henderson
@ 2018-01-11 18:42 ` Peter Maydell
2018-01-11 19:32 ` Richard Henderson
0 siblings, 1 reply; 31+ messages in thread
From: Peter Maydell @ 2018-01-11 18:42 UTC (permalink / raw)
To: Richard Henderson; +Cc: QEMU Developers, qemu-arm
On 18 December 2017 at 17:30, Richard Henderson
<richard.henderson@linaro.org> wrote:
> Enable it for the "any" CPU used by aarch64-linux-user.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> target/arm/cpu.h | 1 +
> target/arm/cpu64.c | 1 +
> 2 files changed, 2 insertions(+)
>
> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> index 37b8cef2e2..652e00d957 100644
> --- a/target/arm/cpu.h
> +++ b/target/arm/cpu.h
> @@ -1386,6 +1386,7 @@ enum arm_features {
> ARM_FEATURE_V8_1_SIMD, /* has ARMv8.1-SIMD */
> ARM_FEATURE_V8_FP16, /* implements v8.2 half-precision float */
> ARM_FEATURE_V8_FCMA, /* has complex number part of v8.3 extensions. */
> + ARM_FEATURE_SVE, /* has SVE extension */
> };
>
> static inline int arm_feature(CPUARMState *env, int feature)
> diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
> index 43b42f95fd..366ab2eeee 100644
> --- a/target/arm/cpu64.c
> +++ b/target/arm/cpu64.c
> @@ -229,6 +229,7 @@ static void aarch64_any_initfn(Object *obj)
> set_feature(&cpu->env, ARM_FEATURE_V8_1_SIMD);
> set_feature(&cpu->env, ARM_FEATURE_V8_FP16);
> set_feature(&cpu->env, ARM_FEATURE_V8_FCMA);
> + set_feature(&cpu->env, ARM_FEATURE_SVE);
> cpu->ctr = 0x80038003; /* 32 byte I and D cacheline size, VIPT icache */
> cpu->dcz_blocksize = 7; /* 512 bytes */
> }
We shouldn't turn this on until it actually works, I think.
thanks
-- PMM
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [Qemu-devel] [PATCH 9/9] target/arm: Add ARM_FEATURE_SVE
2018-01-11 18:42 ` Peter Maydell
@ 2018-01-11 19:32 ` Richard Henderson
0 siblings, 0 replies; 31+ messages in thread
From: Richard Henderson @ 2018-01-11 19:32 UTC (permalink / raw)
To: Peter Maydell; +Cc: QEMU Developers, qemu-arm
On 01/11/2018 10:42 AM, Peter Maydell wrote:
> On 18 December 2017 at 17:30, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>> Enable it for the "any" CPU used by aarch64-linux-user.
>>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>> target/arm/cpu.h | 1 +
>> target/arm/cpu64.c | 1 +
>> 2 files changed, 2 insertions(+)
>>
>> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
>> index 37b8cef2e2..652e00d957 100644
>> --- a/target/arm/cpu.h
>> +++ b/target/arm/cpu.h
>> @@ -1386,6 +1386,7 @@ enum arm_features {
>> ARM_FEATURE_V8_1_SIMD, /* has ARMv8.1-SIMD */
>> ARM_FEATURE_V8_FP16, /* implements v8.2 half-precision float */
>> ARM_FEATURE_V8_FCMA, /* has complex number part of v8.3 extensions. */
>> + ARM_FEATURE_SVE, /* has SVE extension */
>> };
>>
>> static inline int arm_feature(CPUARMState *env, int feature)
>> diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
>> index 43b42f95fd..366ab2eeee 100644
>> --- a/target/arm/cpu64.c
>> +++ b/target/arm/cpu64.c
>> @@ -229,6 +229,7 @@ static void aarch64_any_initfn(Object *obj)
>> set_feature(&cpu->env, ARM_FEATURE_V8_1_SIMD);
>> set_feature(&cpu->env, ARM_FEATURE_V8_FP16);
>> set_feature(&cpu->env, ARM_FEATURE_V8_FCMA);
>> + set_feature(&cpu->env, ARM_FEATURE_SVE);
>> cpu->ctr = 0x80038003; /* 32 byte I and D cacheline size, VIPT icache */
>> cpu->dcz_blocksize = 7; /* 512 bytes */
>> }
>
> We shouldn't turn this on until it actually works, I think.
Yes, probably. One of those things where you need it on early for development
but for final commit you need to sort the fragment to the end.
r~
^ permalink raw reply [flat|nested] 31+ messages in thread