From: Peter Maydell <peter.maydell@linaro.org>
To: qemu-devel@nongnu.org
Subject: [PULL 08/24] target/arm: Fix bugs in MVE VRMLALDAVH, VRMLSLDAVH
Date: Fri, 2 Jul 2021 13:59:38 +0100
Message-ID: <20210702125954.13247-9-peter.maydell@linaro.org>
In-Reply-To: <20210702125954.13247-1-peter.maydell@linaro.org>

The initial implementation of the MVE VRMLALDAVH and VRMLSLDAVH
insns had some bugs:
 * the 32x32 multiply of elements was being done as 32x32->32,
   not 32x32->64
 * we were incorrectly maintaining the accumulator in its full
   72-bit form across all 4 beats of the insn; in the pseudocode
   it is squashed back into the 64 bits of the RdaHi:RdaLo
   registers after each beat

In particular, fixing the second of these allows us to recast
the implementation to avoid 128-bit arithmetic entirely.

Since the element size here is always 4, we can also drop the
parameterization of ESIZE to make the code a little more readable.

Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20210628135835.6690-3-peter.maydell@linaro.org
---
target/arm/mve_helper.c | 38 +++++++++++++++++++++-----------------
1 file changed, 21 insertions(+), 17 deletions(-)
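
Illustration only, not part of the patch: a minimal standalone C sketch
of the two bugs listed in the commit message. The helper names
(truncating_mul, widening_mul, round_shift8, acc_carried2,
acc_per_beat2) are hypothetical and do not exist in QEMU; only the
expression (mul >> 8) + ((mul >> 7) & 1) is taken from the diff below.
The sketch assumes that >> on a negative int64_t is an arithmetic
shift, and it uses two products rather than four beats to keep the
numbers small.

```c
#include <stdint.h>

/* What a 32x32->32 multiply keeps: only the low 32 bits of the product. */
static uint32_t truncating_mul(uint32_t n, uint32_t m)
{
    return (uint32_t)((uint64_t)n * m);
}

/* 32x32->64: casting one operand first makes the multiply 64 bits wide. */
static int64_t widening_mul(int32_t n, int32_t m)
{
    return (int64_t)n * m;
}

/*
 * Rounding right shift by 8, in the form used by the patch. It equals
 * floor((mul + 128) / 256) but never forms mul + 128, so it cannot
 * overflow. Assumes arithmetic >> for negative values.
 */
static int64_t round_shift8(int64_t mul)
{
    return (mul >> 8) + ((mul >> 7) & 1);
}

/*
 * "Carried" form: the low 8 bits of the wide accumulator survive from
 * one product to the next, and the shift happens once at the end.
 * Only valid for small inputs here, since a * 256 must fit in int64_t.
 */
static int64_t acc_carried2(int64_t a, int64_t mul0, int64_t mul1)
{
    int64_t wide = a * 256;
    wide += mul0 + 128;
    wide += mul1 + 128;
    return wide >> 8;
}

/*
 * "Per-beat" form: the accumulator is squashed back to 64 bits after
 * every product, so no bits below bit 0 of 'a' are carried forward.
 */
static int64_t acc_per_beat2(int64_t a, int64_t mul0, int64_t mul1)
{
    a += round_shift8(mul0);
    a += round_shift8(mul1);
    return a;
}
```

With a = 0 and two products of 64, acc_carried2() gives 1 while
acc_per_beat2() gives 0: the rounding bits of the two products combine
in the carried form but are discarded separately in the per-beat form.
Likewise truncating_mul(0x10000, 0x10000) is 0, whereas
widening_mul(0x10000, 0x10000) is 0x100000000.
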
diff --git a/target/arm/mve_helper.c b/target/arm/mve_helper.c
index 05552ce7eee..85a552fe070 100644
--- a/target/arm/mve_helper.c
+++ b/target/arm/mve_helper.c
@@ -18,7 +18,6 @@
  */
 
 #include "qemu/osdep.h"
-#include "qemu/int128.h"
 #include "cpu.h"
 #include "internals.h"
 #include "vec_internal.h"
@@ -1100,40 +1099,45 @@ DO_LDAV(vmlsldavsw, 4, int32_t, false, +=, -=)
 DO_LDAV(vmlsldavxsw, 4, int32_t, true, +=, -=)
 
 /*
- * Rounding multiply add long dual accumulate high: we must keep
- * a 72-bit internal accumulator value and return the top 64 bits.
+ * Rounding multiply add long dual accumulate high. In the pseudocode
+ * this is implemented with a 72-bit internal accumulator value of which
+ * the top 64 bits are returned. We optimize this to avoid having to
+ * use 128-bit arithmetic -- we can do this because the 72-bit accumulator
+ * is squashed back into 64 bits after each beat.
  */
-#define DO_LDAVH(OP, ESIZE, TYPE, XCHG, EVENACC, ODDACC, TO128)         \
+#define DO_LDAVH(OP, TYPE, LTYPE, XCHG, SUB)                            \
     uint64_t HELPER(glue(mve_, OP))(CPUARMState *env, void *vn,         \
                                     void *vm, uint64_t a)               \
     {                                                                   \
         uint16_t mask = mve_element_mask(env);                          \
         unsigned e;                                                     \
         TYPE *n = vn, *m = vm;                                          \
-        Int128 acc = int128_lshift(TO128(a), 8);                        \
-        for (e = 0; e < 16 / ESIZE; e++, mask >>= ESIZE) {              \
+        for (e = 0; e < 16 / 4; e++, mask >>= 4) {                      \
             if (mask & 1) {                                             \
+                LTYPE mul;                                              \
                 if (e & 1) {                                            \
-                    acc = ODDACC(acc, TO128(n[H##ESIZE(e - 1 * XCHG)] * \
-                                            m[H##ESIZE(e)]));           \
+                    mul = (LTYPE)n[H4(e - 1 * XCHG)] * m[H4(e)];        \
+                    if (SUB) {                                          \
+                        mul = -mul;                                     \
+                    }                                                   \
                 } else {                                                \
-                    acc = EVENACC(acc, TO128(n[H##ESIZE(e + 1 * XCHG)] * \
-                                             m[H##ESIZE(e)]));          \
+                    mul = (LTYPE)n[H4(e + 1 * XCHG)] * m[H4(e)];        \
                 }                                                       \
-                acc = int128_add(acc, int128_make64(1 << 7));           \
+                mul = (mul >> 8) + ((mul >> 7) & 1);                    \
+                a += mul;                                               \
             }                                                           \
         }                                                               \
         mve_advance_vpt(env);                                           \
-        return int128_getlo(int128_rshift(acc, 8));                     \
+        return a;                                                       \
     }
 
-DO_LDAVH(vrmlaldavhsw, 4, int32_t, false, int128_add, int128_add, int128_makes64)
-DO_LDAVH(vrmlaldavhxsw, 4, int32_t, true, int128_add, int128_add, int128_makes64)
+DO_LDAVH(vrmlaldavhsw, int32_t, int64_t, false, false)
+DO_LDAVH(vrmlaldavhxsw, int32_t, int64_t, true, false)
 
-DO_LDAVH(vrmlaldavhuw, 4, uint32_t, false, int128_add, int128_add, int128_make64)
+DO_LDAVH(vrmlaldavhuw, uint32_t, uint64_t, false, false)
 
-DO_LDAVH(vrmlsldavhsw, 4, int32_t, false, int128_add, int128_sub, int128_makes64)
-DO_LDAVH(vrmlsldavhxsw, 4, int32_t, true, int128_add, int128_sub, int128_makes64)
+DO_LDAVH(vrmlsldavhsw, int32_t, int64_t, false, true)
+DO_LDAVH(vrmlsldavhxsw, int32_t, int64_t, true, true)
 
 /* Vector add across vector */
 #define DO_VADDV(OP, ESIZE, TYPE)                                       \
--
2.20.1