* [Qemu-devel] [PATCH 0/2] target/arm: SVE fixes
@ 2018-07-05 19:19 Richard Henderson
2018-07-05 19:19 ` [Qemu-devel] [PATCH 1/2] tcg: Restrict check_size_impl to multiples of the line size Richard Henderson
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Richard Henderson @ 2018-07-05 19:19 UTC (permalink / raw)
To: qemu-devel; +Cc: peter.maydell, alex.bennee
RISU testing with ARM Fast Models and VQ=16 showed up two bugs,
both affecting the PTRUE instruction.
r~
Richard Henderson (2):
tcg: Restrict check_size_impl to multiples of the line size
target/arm: Fix do_predset for large VL
target/arm/translate-sve.c | 10 ++--------
tcg/tcg-op-gvec.c | 7 +++++--
2 files changed, 7 insertions(+), 10 deletions(-)
--
2.17.1
^ permalink raw reply [flat|nested] 6+ messages in thread
* [Qemu-devel] [PATCH 1/2] tcg: Restrict check_size_impl to multiples of the line size
2018-07-05 19:19 [Qemu-devel] [PATCH 0/2] target/arm: SVE fixes Richard Henderson
@ 2018-07-05 19:19 ` Richard Henderson
2018-07-06 8:08 ` Alex Bennée
2018-07-05 19:19 ` [Qemu-devel] [PATCH 2/2] target/arm: Fix do_predset for large VL Richard Henderson
2018-07-06 10:44 ` [Qemu-devel] [PATCH 0/2] target/arm: SVE fixes Peter Maydell
2 siblings, 1 reply; 6+ messages in thread
From: Richard Henderson @ 2018-07-05 19:19 UTC (permalink / raw)
To: qemu-devel; +Cc: peter.maydell, alex.bennee
Normally this is automatic in the size restrictions that are placed
on vector sizes coming from the implementation. However, for the
legitimate size tuple [oprsz=8, maxsz=32], we need to clear the final
24 bytes of the vector register. Without this check, do_dup selects
TCG_TYPE_V128 and clears only 16 bytes.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
tcg/tcg-op-gvec.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
index 22db1590d5..61c25f5784 100644
--- a/tcg/tcg-op-gvec.c
+++ b/tcg/tcg-op-gvec.c
@@ -287,8 +287,11 @@ void tcg_gen_gvec_4_ptr(uint32_t dofs, uint32_t aofs, uint32_t bofs,
in units of LNSZ. This limits the expansion of inline code. */
static inline bool check_size_impl(uint32_t oprsz, uint32_t lnsz)
{
- uint32_t lnct = oprsz / lnsz;
- return lnct >= 1 && lnct <= MAX_UNROLL;
+ if (oprsz % lnsz == 0) {
+ uint32_t lnct = oprsz / lnsz;
+ return lnct >= 1 && lnct <= MAX_UNROLL;
+ }
+ return false;
}
static void expand_clr(uint32_t dofs, uint32_t maxsz);
--
2.17.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [Qemu-devel] [PATCH 2/2] target/arm: Fix do_predset for large VL
2018-07-05 19:19 [Qemu-devel] [PATCH 0/2] target/arm: SVE fixes Richard Henderson
2018-07-05 19:19 ` [Qemu-devel] [PATCH 1/2] tcg: Restrict check_size_impl to multiples of the line size Richard Henderson
@ 2018-07-05 19:19 ` Richard Henderson
2018-07-06 8:14 ` Alex Bennée
2018-07-06 10:44 ` [Qemu-devel] [PATCH 0/2] target/arm: SVE fixes Peter Maydell
2 siblings, 1 reply; 6+ messages in thread
From: Richard Henderson @ 2018-07-05 19:19 UTC (permalink / raw)
To: qemu-devel; +Cc: peter.maydell, alex.bennee
Use MAKE_64BIT_MASK instead of open-coding. Remove an odd
vector size check that is unlikely to be more profitable
than 3 64-bit integer stores. Correct the iteration for WORD
to avoid writing too much data.
Fixes RISU tests of PTRUE for VL 256.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/arm/translate-sve.c | 10 ++--------
1 file changed, 2 insertions(+), 8 deletions(-)
diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
index d41f1155f9..374051cd20 100644
--- a/target/arm/translate-sve.c
+++ b/target/arm/translate-sve.c
@@ -1438,7 +1438,7 @@ static bool do_predset(DisasContext *s, int esz, int rd, int pat, bool setflag)
setsz = numelem << esz;
lastword = word = pred_esz_masks[esz];
if (setsz % 64) {
- lastword &= ~(-1ull << (setsz % 64));
+ lastword &= MAKE_64BIT_MASK(0, setsz % 64);
}
}
@@ -1457,19 +1457,13 @@ static bool do_predset(DisasContext *s, int esz, int rd, int pat, bool setflag)
tcg_gen_gvec_dup64i(ofs, oprsz, maxsz, word);
goto done;
}
- if (oprsz * 8 == setsz + 8) {
- tcg_gen_gvec_dup64i(ofs, oprsz, maxsz, word);
- tcg_gen_movi_i64(t, 0);
- tcg_gen_st_i64(t, cpu_env, ofs + oprsz - 8);
- goto done;
- }
}
setsz /= 8;
fullsz /= 8;
tcg_gen_movi_i64(t, word);
- for (i = 0; i < setsz; i += 8) {
+ for (i = 0; i < QEMU_ALIGN_DOWN(setsz, 8); i += 8) {
tcg_gen_st_i64(t, cpu_env, ofs + i);
}
if (lastword != word) {
--
2.17.1
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [Qemu-devel] [PATCH 1/2] tcg: Restrict check_size_impl to multiples of the line size
2018-07-05 19:19 ` [Qemu-devel] [PATCH 1/2] tcg: Restrict check_size_impl to multiples of the line size Richard Henderson
@ 2018-07-06 8:08 ` Alex Bennée
0 siblings, 0 replies; 6+ messages in thread
From: Alex Bennée @ 2018-07-06 8:08 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, peter.maydell
Richard Henderson <richard.henderson@linaro.org> writes:
> Normally this is automatic in the size restrictions that are placed
> on vector sizes coming from the implementation. However, for the
> legitimate size tuple [oprsz=8, maxsz=32], we need to clear the final
> 24 bytes of the vector register. Without this check, do_dup selects
> TCG_TYPE_V128 and clears only 16 bytes.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
> ---
> tcg/tcg-op-gvec.c | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/tcg/tcg-op-gvec.c b/tcg/tcg-op-gvec.c
> index 22db1590d5..61c25f5784 100644
> --- a/tcg/tcg-op-gvec.c
> +++ b/tcg/tcg-op-gvec.c
> @@ -287,8 +287,11 @@ void tcg_gen_gvec_4_ptr(uint32_t dofs, uint32_t aofs, uint32_t bofs,
> in units of LNSZ. This limits the expansion of inline code. */
> static inline bool check_size_impl(uint32_t oprsz, uint32_t lnsz)
> {
> - uint32_t lnct = oprsz / lnsz;
> - return lnct >= 1 && lnct <= MAX_UNROLL;
> + if (oprsz % lnsz == 0) {
> + uint32_t lnct = oprsz / lnsz;
> + return lnct >= 1 && lnct <= MAX_UNROLL;
> + }
> + return false;
> }
>
> static void expand_clr(uint32_t dofs, uint32_t maxsz);
--
Alex Bennée
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Qemu-devel] [PATCH 2/2] target/arm: Fix do_predset for large VL
2018-07-05 19:19 ` [Qemu-devel] [PATCH 2/2] target/arm: Fix do_predset for large VL Richard Henderson
@ 2018-07-06 8:14 ` Alex Bennée
0 siblings, 0 replies; 6+ messages in thread
From: Alex Bennée @ 2018-07-06 8:14 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, peter.maydell
Richard Henderson <richard.henderson@linaro.org> writes:
> Use MAKE_64BIT_MASK instead of open-coding. Remove an odd
> vector size check that is unlikely to be more profitable
> than 3 64-bit integer stores. Correct the iteration for WORD
> to avoid writing too much data.
>
> Fixes RISU tests of PTRUE for VL 256.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
> ---
> target/arm/translate-sve.c | 10 ++--------
> 1 file changed, 2 insertions(+), 8 deletions(-)
>
> diff --git a/target/arm/translate-sve.c b/target/arm/translate-sve.c
> index d41f1155f9..374051cd20 100644
> --- a/target/arm/translate-sve.c
> +++ b/target/arm/translate-sve.c
> @@ -1438,7 +1438,7 @@ static bool do_predset(DisasContext *s, int esz, int rd, int pat, bool setflag)
> setsz = numelem << esz;
> lastword = word = pred_esz_masks[esz];
> if (setsz % 64) {
> - lastword &= ~(-1ull << (setsz % 64));
> + lastword &= MAKE_64BIT_MASK(0, setsz % 64);
> }
> }
>
> @@ -1457,19 +1457,13 @@ static bool do_predset(DisasContext *s, int esz, int rd, int pat, bool setflag)
> tcg_gen_gvec_dup64i(ofs, oprsz, maxsz, word);
> goto done;
> }
> - if (oprsz * 8 == setsz + 8) {
> - tcg_gen_gvec_dup64i(ofs, oprsz, maxsz, word);
> - tcg_gen_movi_i64(t, 0);
> - tcg_gen_st_i64(t, cpu_env, ofs + oprsz - 8);
> - goto done;
> - }
> }
>
> setsz /= 8;
> fullsz /= 8;
>
> tcg_gen_movi_i64(t, word);
> - for (i = 0; i < setsz; i += 8) {
> + for (i = 0; i < QEMU_ALIGN_DOWN(setsz, 8); i += 8) {
> tcg_gen_st_i64(t, cpu_env, ofs + i);
> }
> if (lastword != word) {
--
Alex Bennée
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Qemu-devel] [PATCH 0/2] target/arm: SVE fixes
2018-07-05 19:19 [Qemu-devel] [PATCH 0/2] target/arm: SVE fixes Richard Henderson
2018-07-05 19:19 ` [Qemu-devel] [PATCH 1/2] tcg: Restrict check_size_impl to multiples of the line size Richard Henderson
2018-07-05 19:19 ` [Qemu-devel] [PATCH 2/2] target/arm: Fix do_predset for large VL Richard Henderson
@ 2018-07-06 10:44 ` Peter Maydell
2 siblings, 0 replies; 6+ messages in thread
From: Peter Maydell @ 2018-07-06 10:44 UTC (permalink / raw)
To: Richard Henderson; +Cc: QEMU Developers, Alex Bennée
On 5 July 2018 at 20:19, Richard Henderson <richard.henderson@linaro.org> wrote:
> RISU testing with ARM Fast Models and VQ=16 showed up two bugs,
> both affecting the PTRUE instruction.
>
>
> r~
>
>
> Richard Henderson (2):
> tcg: Restrict check_size_impl to multiples of the line size
> target/arm: Fix do_predset for large VL
>
> target/arm/translate-sve.c | 10 ++--------
> tcg/tcg-op-gvec.c | 7 +++++--
> 2 files changed, 7 insertions(+), 10 deletions(-)
Applied to target-arm.next, thanks.
-- PMM
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2018-07-06 10:44 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-07-05 19:19 [Qemu-devel] [PATCH 0/2] target/arm: SVE fixes Richard Henderson
2018-07-05 19:19 ` [Qemu-devel] [PATCH 1/2] tcg: Restrict check_size_impl to multiples of the line size Richard Henderson
2018-07-06 8:08 ` Alex Bennée
2018-07-05 19:19 ` [Qemu-devel] [PATCH 2/2] target/arm: Fix do_predset for large VL Richard Henderson
2018-07-06 8:14 ` Alex Bennée
2018-07-06 10:44 ` [Qemu-devel] [PATCH 0/2] target/arm: SVE fixes Peter Maydell
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.