[Qemu-devel] [PULL 48/48] target/arm: Fix short-vector increment behaviour

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: Peter Maydell <peter.maydell@linaro.org>
To: qemu-devel@nongnu.org
Subject: [Qemu-devel] [PULL 48/48] target/arm: Fix short-vector increment behaviour
Date: Thu, 13 Jun 2019 13:14:33 +0100	[thread overview]
Message-ID: <20190613121433.5246-49-peter.maydell@linaro.org> (raw)
In-Reply-To: <20190613121433.5246-1-peter.maydell@linaro.org>

For VFP short vectors, the VFP registers are divided into a
series of banks: for single-precision these are s0-s7, s8-s15,
s16-s23 and s24-s31; for double-precision they are d0-d3,
d4-d7, ... d28-d31. Some banks are "scalar" meaning that
use of a register within them triggers a pure-scalar or
mixed vector-scalar operation rather than a full vector
operation. The scalar banks are s0-s7, d0-d3 and d16-d19.
When using a bank as part of a vector operation, we
iterate through it, increasing the register number by
the specified stride each time, and wrapping around to
the beginning of the bank.

Unfortunately our calculation of the "increment" part of this
was incorrect:
 vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask)
will only do the intended thing if bank_mask has exactly
one set high bit. For instance for doubles (bank_mask = 0xc),
if we start with vd = 6 and delta_d = 2 then vd is updated
to 12 rather than the intended 4.

This only causes problems in the unlikely case that the
starting register is not the first in its bank: if the
register number doesn't have to wrap around then the
expression happens to give the right answer.

Fix this bug by abstracting out the "check whether register
is in a scalar bank" and "advance register within bank"
operations to utility functions which use the right
bit masking operations.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/translate-vfp.inc.c | 100 ++++++++++++++++++++-------------
 1 file changed, 60 insertions(+), 40 deletions(-)

diff --git a/target/arm/translate-vfp.inc.c b/target/arm/translate-vfp.inc.c
index 8216dba796e..709fc65374d 100644
--- a/target/arm/translate-vfp.inc.c
+++ b/target/arm/translate-vfp.inc.c
@@ -1139,6 +1139,42 @@ typedef void VFPGen3OpDPFn(TCGv_i64 vd,
 typedef void VFPGen2OpSPFn(TCGv_i32 vd, TCGv_i32 vm);
 typedef void VFPGen2OpDPFn(TCGv_i64 vd, TCGv_i64 vm);
 
+/*
+ * Return true if the specified S reg is in a scalar bank
+ * (ie if it is s0..s7)
+ */
+static inline bool vfp_sreg_is_scalar(int reg)
+{
+    return (reg & 0x18) == 0;
+}
+
+/*
+ * Return true if the specified D reg is in a scalar bank
+ * (ie if it is d0..d3 or d16..d19)
+ */
+static inline bool vfp_dreg_is_scalar(int reg)
+{
+    return (reg & 0xc) == 0;
+}
+
+/*
+ * Advance the S reg number forwards by delta within its bank
+ * (ie increment the low 3 bits but leave the rest the same)
+ */
+static inline int vfp_advance_sreg(int reg, int delta)
+{
+    return ((reg + delta) & 0x7) | (reg & ~0x7);
+}
+
+/*
+ * Advance the D reg number forwards by delta within its bank
+ * (ie increment the low 2 bits but leave the rest the same)
+ */
+static inline int vfp_advance_dreg(int reg, int delta)
+{
+    return ((reg + delta) & 0x3) | (reg & ~0x3);
+}
+
 /*
  * Perform a 3-operand VFP data processing instruction. fn is the
  * callback to do the actual operation; this function deals with the
@@ -1149,7 +1185,6 @@ static bool do_vfp_3op_sp(DisasContext *s, VFPGen3OpSPFn *fn,
 {
     uint32_t delta_m = 0;
     uint32_t delta_d = 0;
-    uint32_t bank_mask = 0;
     int veclen = s->vec_len;
     TCGv_i32 f0, f1, fd;
     TCGv_ptr fpst;
@@ -1164,16 +1199,14 @@ static bool do_vfp_3op_sp(DisasContext *s, VFPGen3OpSPFn *fn,
     }
 
     if (veclen > 0) {
-        bank_mask = 0x18;
-
         /* Figure out what type of vector operation this is.  */
-        if ((vd & bank_mask) == 0) {
+        if (vfp_sreg_is_scalar(vd)) {
             /* scalar */
             veclen = 0;
         } else {
             delta_d = s->vec_stride + 1;
 
-            if ((vm & bank_mask) == 0) {
+            if (vfp_sreg_is_scalar(vm)) {
                 /* mixed scalar/vector */
                 delta_m = 0;
             } else {
@@ -1204,11 +1237,11 @@ static bool do_vfp_3op_sp(DisasContext *s, VFPGen3OpSPFn *fn,
 
         /* Set up the operands for the next iteration */
         veclen--;
-        vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
-        vn = ((vn + delta_d) & (bank_mask - 1)) | (vn & bank_mask);
+        vd = vfp_advance_sreg(vd, delta_d);
+        vn = vfp_advance_sreg(vn, delta_d);
         neon_load_reg32(f0, vn);
         if (delta_m) {
-            vm = ((vm + delta_m) & (bank_mask - 1)) | (vm & bank_mask);
+            vm = vfp_advance_sreg(vm, delta_m);
             neon_load_reg32(f1, vm);
         }
     }
@@ -1226,7 +1259,6 @@ static bool do_vfp_3op_dp(DisasContext *s, VFPGen3OpDPFn *fn,
 {
     uint32_t delta_m = 0;
     uint32_t delta_d = 0;
-    uint32_t bank_mask = 0;
     int veclen = s->vec_len;
     TCGv_i64 f0, f1, fd;
     TCGv_ptr fpst;
@@ -1246,16 +1278,14 @@ static bool do_vfp_3op_dp(DisasContext *s, VFPGen3OpDPFn *fn,
     }
 
     if (veclen > 0) {
-        bank_mask = 0xc;
-
         /* Figure out what type of vector operation this is.  */
-        if ((vd & bank_mask) == 0) {
+        if (vfp_dreg_is_scalar(vd)) {
             /* scalar */
             veclen = 0;
         } else {
             delta_d = (s->vec_stride >> 1) + 1;
 
-            if ((vm & bank_mask) == 0) {
+            if (vfp_dreg_is_scalar(vm)) {
                 /* mixed scalar/vector */
                 delta_m = 0;
             } else {
@@ -1285,11 +1315,11 @@ static bool do_vfp_3op_dp(DisasContext *s, VFPGen3OpDPFn *fn,
         }
         /* Set up the operands for the next iteration */
         veclen--;
-        vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
-        vn = ((vn + delta_d) & (bank_mask - 1)) | (vn & bank_mask);
+        vd = vfp_advance_dreg(vd, delta_d);
+        vn = vfp_advance_dreg(vn, delta_d);
         neon_load_reg64(f0, vn);
         if (delta_m) {
-            vm = ((vm + delta_m) & (bank_mask - 1)) | (vm & bank_mask);
+            vm = vfp_advance_dreg(vm, delta_m);
             neon_load_reg64(f1, vm);
         }
     }
@@ -1306,7 +1336,6 @@ static bool do_vfp_2op_sp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
 {
     uint32_t delta_m = 0;
     uint32_t delta_d = 0;
-    uint32_t bank_mask = 0;
     int veclen = s->vec_len;
     TCGv_i32 f0, fd;
 
@@ -1320,16 +1349,14 @@ static bool do_vfp_2op_sp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
     }
 
     if (veclen > 0) {
-        bank_mask = 0x18;
-
         /* Figure out what type of vector operation this is.  */
-        if ((vd & bank_mask) == 0) {
+        if (vfp_sreg_is_scalar(vd)) {
             /* scalar */
             veclen = 0;
         } else {
             delta_d = s->vec_stride + 1;
 
-            if ((vm & bank_mask) == 0) {
+            if (vfp_sreg_is_scalar(vm)) {
                 /* mixed scalar/vector */
                 delta_m = 0;
             } else {
@@ -1355,7 +1382,7 @@ static bool do_vfp_2op_sp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
         if (delta_m == 0) {
             /* single source one-many */
             while (veclen--) {
-                vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
+                vd = vfp_advance_sreg(vd, delta_d);
                 neon_store_reg32(fd, vd);
             }
             break;
@@ -1363,8 +1390,8 @@ static bool do_vfp_2op_sp(DisasContext *s, VFPGen2OpSPFn *fn, int vd, int vm)
 
         /* Set up the operands for the next iteration */
         veclen--;
-        vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
-        vm = ((vm + delta_m) & (bank_mask - 1)) | (vm & bank_mask);
+        vd = vfp_advance_sreg(vd, delta_d);
+        vm = vfp_advance_sreg(vm, delta_m);
         neon_load_reg32(f0, vm);
     }
 
@@ -1378,7 +1405,6 @@ static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
 {
     uint32_t delta_m = 0;
     uint32_t delta_d = 0;
-    uint32_t bank_mask = 0;
     int veclen = s->vec_len;
     TCGv_i64 f0, fd;
 
@@ -1397,16 +1423,14 @@ static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
     }
 
     if (veclen > 0) {
-        bank_mask = 0xc;
-
         /* Figure out what type of vector operation this is.  */
-        if ((vd & bank_mask) == 0) {
+        if (vfp_dreg_is_scalar(vd)) {
             /* scalar */
             veclen = 0;
         } else {
             delta_d = (s->vec_stride >> 1) + 1;
 
-            if ((vm & bank_mask) == 0) {
+            if (vfp_dreg_is_scalar(vm)) {
                 /* mixed scalar/vector */
                 delta_m = 0;
             } else {
@@ -1432,7 +1456,7 @@ static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
         if (delta_m == 0) {
             /* single source one-many */
             while (veclen--) {
-                vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
+                vd = vfp_advance_dreg(vd, delta_d);
                 neon_store_reg64(fd, vd);
             }
             break;
@@ -1440,8 +1464,8 @@ static bool do_vfp_2op_dp(DisasContext *s, VFPGen2OpDPFn *fn, int vd, int vm)
 
         /* Set up the operands for the next iteration */
         veclen--;
-        vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
-        vm = ((vm + delta_m) & (bank_mask - 1)) | (vm & bank_mask);
+        vd = vfp_advance_dreg(vd, delta_d);
+        vd = vfp_advance_dreg(vm, delta_m);
         neon_load_reg64(f0, vm);
     }
 
@@ -1783,7 +1807,6 @@ static bool trans_VFM_dp(DisasContext *s, arg_VFM_sp *a)
 static bool trans_VMOV_imm_sp(DisasContext *s, arg_VMOV_imm_sp *a)
 {
     uint32_t delta_d = 0;
-    uint32_t bank_mask = 0;
     int veclen = s->vec_len;
     TCGv_i32 fd;
     uint32_t n, i, vd;
@@ -1804,9 +1827,8 @@ static bool trans_VMOV_imm_sp(DisasContext *s, arg_VMOV_imm_sp *a)
     }
 
     if (veclen > 0) {
-        bank_mask = 0x18;
         /* Figure out what type of vector operation this is.  */
-        if ((vd & bank_mask) == 0) {
+        if (vfp_sreg_is_scalar(vd)) {
             /* scalar */
             veclen = 0;
         } else {
@@ -1835,7 +1857,7 @@ static bool trans_VMOV_imm_sp(DisasContext *s, arg_VMOV_imm_sp *a)
 
         /* Set up the operands for the next iteration */
         veclen--;
-        vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
+        vd = vfp_advance_sreg(vd, delta_d);
     }
 
     tcg_temp_free_i32(fd);
@@ -1845,7 +1867,6 @@ static bool trans_VMOV_imm_sp(DisasContext *s, arg_VMOV_imm_sp *a)
 static bool trans_VMOV_imm_dp(DisasContext *s, arg_VMOV_imm_dp *a)
 {
     uint32_t delta_d = 0;
-    uint32_t bank_mask = 0;
     int veclen = s->vec_len;
     TCGv_i64 fd;
     uint32_t n, i, vd;
@@ -1871,9 +1892,8 @@ static bool trans_VMOV_imm_dp(DisasContext *s, arg_VMOV_imm_dp *a)
     }
 
     if (veclen > 0) {
-        bank_mask = 0xc;
         /* Figure out what type of vector operation this is.  */
-        if ((vd & bank_mask) == 0) {
+        if (vfp_dreg_is_scalar(vd)) {
             /* scalar */
             veclen = 0;
         } else {
@@ -1902,7 +1922,7 @@ static bool trans_VMOV_imm_dp(DisasContext *s, arg_VMOV_imm_dp *a)
 
         /* Set up the operands for the next iteration */
         veclen--;
-        vd = ((vd + delta_d) & (bank_mask - 1)) | (vd & bank_mask);
+        vfp_advance_dreg(vd, delta_d);
     }
 
     tcg_temp_free_i64(fd);
-- 
2.20.1

next prev parent reply	other threads:[~2019-06-13 13:51 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-06-13 12:13 [Qemu-devel] [PULL 00/48] target-arm queue Peter Maydell
2019-06-13 12:13 ` [Qemu-devel] [PULL 01/48] target/arm: Vectorize USHL and SSHL Peter Maydell
2019-06-13 14:11   ` Peter Maydell
2019-06-13 12:13 ` [Qemu-devel] [PULL 02/48] target/arm: Use tcg_gen_gvec_bitsel Peter Maydell
2019-06-13 12:13 ` [Qemu-devel] [PULL 03/48] target/arm: Implement NSACR gating of floating point Peter Maydell
2019-06-13 12:13 ` [Qemu-devel] [PULL 04/48] hw/arm/smmuv3: Fix decoding of ID register range Peter Maydell
2019-06-13 12:13 ` [Qemu-devel] [PULL 05/48] hw/core/bus.c: Only the main system bus can have no parent Peter Maydell
2019-06-13 12:13 ` [Qemu-devel] [PULL 06/48] target/arm: Fix output of PAuth Auth Peter Maydell
2019-06-13 12:13 ` [Qemu-devel] [PULL 07/48] decodetree: Fix comparison of Field Peter Maydell
2019-06-13 12:13 ` [Qemu-devel] [PULL 08/48] target/arm: Add stubs for AArch32 VFP decodetree Peter Maydell
2019-06-13 12:13 ` [Qemu-devel] [PULL 09/48] target/arm: Factor out VFP access checking code Peter Maydell
2019-06-13 12:13 ` [Qemu-devel] [PULL 10/48] target/arm: Fix Cortex-R5F MVFR values Peter Maydell
2019-06-13 12:13 ` [Qemu-devel] [PULL 11/48] target/arm: Explicitly enable VFP short-vectors for aarch32 -cpu max Peter Maydell
2019-06-13 12:13 ` [Qemu-devel] [PULL 12/48] target/arm: Convert the VSEL instructions to decodetree Peter Maydell
2019-06-13 12:13 ` [Qemu-devel] [PULL 13/48] target/arm: Convert VMINNM, VMAXNM " Peter Maydell
2019-06-13 12:13 ` [Qemu-devel] [PULL 14/48] target/arm: Convert VRINTA/VRINTN/VRINTP/VRINTM " Peter Maydell
2019-06-13 12:14 ` [Qemu-devel] [PULL 15/48] target/arm: Convert VCVTA/VCVTN/VCVTP/VCVTM " Peter Maydell
2019-06-13 12:14 ` [Qemu-devel] [PULL 16/48] target/arm: Move the VFP trans_* functions to translate-vfp.inc.c Peter Maydell
2019-06-13 12:14 ` [Qemu-devel] [PULL 17/48] target/arm: Add helpers for VFP register loads and stores Peter Maydell
2019-06-13 12:14 ` [Qemu-devel] [PULL 18/48] target/arm: Convert "double-precision" register moves to decodetree Peter Maydell
2019-06-13 12:14 ` [Qemu-devel] [PULL 19/48] target/arm: Convert "single-precision" " Peter Maydell
2019-06-13 12:14 ` [Qemu-devel] [PULL 20/48] target/arm: Convert VFP two-register transfer insns " Peter Maydell
2019-06-13 12:14 ` [Qemu-devel] [PULL 21/48] target/arm: Convert VFP VLDR and VSTR " Peter Maydell
2019-06-13 12:14 ` [Qemu-devel] [PULL 22/48] target/arm: Convert the VFP load/store multiple insns " Peter Maydell
2019-06-13 12:14 ` [Qemu-devel] [PULL 23/48] target/arm: Remove VLDR/VSTR/VLDM/VSTM use of cpu_F0s and cpu_F0d Peter Maydell
2019-06-13 12:14 ` [Qemu-devel] [PULL 24/48] target/arm: Convert VFP VMLA to decodetree Peter Maydell
2019-06-13 12:14 ` [Qemu-devel] [PULL 25/48] target/arm: Convert VFP VMLS " Peter Maydell
2019-06-13 12:14 ` [Qemu-devel] [PULL 26/48] target/arm: Convert VFP VNMLS " Peter Maydell
2019-06-13 12:14 ` [Qemu-devel] [PULL 27/48] target/arm: Convert VFP VNMLA " Peter Maydell
2019-06-13 12:14 ` [Qemu-devel] [PULL 28/48] target/arm: Convert VMUL " Peter Maydell
2019-06-13 12:14 ` [Qemu-devel] [PULL 29/48] target/arm: Convert VNMUL " Peter Maydell
2019-06-13 12:14 ` [Qemu-devel] [PULL 30/48] target/arm: Convert VADD " Peter Maydell
2019-06-13 12:14 ` [Qemu-devel] [PULL 31/48] target/arm: Convert VSUB " Peter Maydell
2019-06-13 12:14 ` [Qemu-devel] [PULL 32/48] target/arm: Convert VDIV " Peter Maydell
2019-06-13 12:14 ` [Qemu-devel] [PULL 33/48] target/arm: Convert VFP fused multiply-add insns " Peter Maydell
2019-06-13 12:14 ` [Qemu-devel] [PULL 34/48] target/arm: Convert VMOV (imm) " Peter Maydell
2019-06-13 12:14 ` [Qemu-devel] [PULL 35/48] target/arm: Convert VABS " Peter Maydell
2019-06-13 12:14 ` [Qemu-devel] [PULL 36/48] target/arm: Convert VNEG " Peter Maydell
2019-06-13 12:14 ` [Qemu-devel] [PULL 37/48] target/arm: Convert VSQRT " Peter Maydell
2019-06-13 12:14 ` [Qemu-devel] [PULL 38/48] target/arm: Convert VMOV (register) " Peter Maydell
2019-06-13 12:14 ` [Qemu-devel] [PULL 39/48] target/arm: Convert VFP comparison insns " Peter Maydell
2019-06-13 12:14 ` [Qemu-devel] [PULL 40/48] target/arm: Convert the VCVT-from-f16 " Peter Maydell
2019-06-13 12:14 ` [Qemu-devel] [PULL 41/48] target/arm: Convert the VCVT-to-f16 " Peter Maydell
2019-06-13 12:14 ` [Qemu-devel] [PULL 42/48] target/arm: Convert VFP round " Peter Maydell
2019-06-13 12:14 ` [Qemu-devel] [PULL 43/48] target/arm: Convert double-single precision conversion " Peter Maydell
2019-06-13 12:14 ` [Qemu-devel] [PULL 44/48] target/arm: Convert integer-to-float " Peter Maydell
2019-06-13 12:14 ` [Qemu-devel] [PULL 45/48] target/arm: Convert VJCVT " Peter Maydell
2019-06-13 12:14 ` [Qemu-devel] [PULL 46/48] target/arm: Convert VCVT fp/fixed-point conversion insns " Peter Maydell
2019-06-13 12:14 ` [Qemu-devel] [PULL 47/48] target/arm: Convert float-to-integer VCVT " Peter Maydell
2019-06-13 12:14 ` Peter Maydell [this message]
2019-06-13 14:18 ` [Qemu-devel] [PULL 00/48] target-arm queue no-reply
2019-06-13 16:51 ` no-reply

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:8216dba796 dfblob:709fc65374 )
 OR (
bs:"[Qemu-devel] [PULL 48/48] target/arm: Fix short-vector increment behaviour" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190613121433.5246-49-peter.maydell@linaro.org \
    --to=peter.maydell@linaro.org \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).