From: Richard Henderson <richard.henderson@linaro.org>
To: qemu-devel@nongnu.org
Cc: Laurent Desnogues <laurent.desnogues@gmail.com>,
	peter.maydell@linaro.org, alex.bennee@linaro.org
Subject: [PATCH v3 02/81] target/arm: Fix sve_zip_p vs odd vector lengths
Date: Fri, 18 Sep 2020 11:36:32 -0700
Message-ID: <20200918183751.2787647-3-richard.henderson@linaro.org>
In-Reply-To: <20200918183751.2787647-1-richard.henderson@linaro.org>

The low-half zip (zip1) wrote too much when vl % 512 != 0.
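
For illustration only (not part of the patch), the arithmetic behind
this can be sketched as below.  A predicate holds one bit per vector
byte, so oprsz = vl / 64 bytes, and the old word-at-a-time loop ran
DIV_ROUND_UP(oprsz, 8) times, storing 8 bytes each time.  The helper
names here are hypothetical, and the sketch assumes only what the diff
shows about the loop bounds.

```c
#include <assert.h>

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Predicate size in bytes for a vector length of vl bits. */
static unsigned pred_bytes(unsigned vl)
{
    /* SVE vector lengths are multiples of 128 bits, up to 2048. */
    assert(vl >= 128 && vl <= 2048 && vl % 128 == 0);
    return vl / 64;
}

/* Bytes stored by the old loop: one uint64_t per iteration. */
static unsigned old_loop_bytes(unsigned oprsz)
{
    return DIV_ROUND_UP(oprsz, 8) * 8;
}

/* Bytes stored by the new loop, or 0 when the byte-wise path is used. */
static unsigned new_loop_bytes(unsigned oprsz)
{
    return (oprsz & 7) == 0 ? (oprsz / 8) * 8 : 0;
}
```

With vl = 640 the predicate is 10 bytes, yet the old loop would store
16; with vl = 1024 both bounds agree at 16 bytes.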

Adjust all of the x + (y << s) to x | (y << s) as a style fix.
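
As a rough sketch of why this is only a style change: after the bits
are spread out, the two operands occupy disjoint bit positions, so
addition can never carry and gives the same result as OR.  The
spread_bits function below is a hypothetical stand-in written for this
note; the body of expand_bits is not shown in this patch, so treat the
masks and loop as an assumption about its behavior.

```c
#include <assert.h>
#include <stdint.h>

/* Masks selecting the even-numbered groups of (1 << i) bits. */
static const uint64_t even_masks[5] = {
    0x5555555555555555ull,
    0x3333333333333333ull,
    0x0f0f0f0f0f0f0f0full,
    0x00ff00ff00ff00ffull,
    0x0000ffff0000ffffull,
};

/*
 * Spread the low 32 bits of x so that every group of (1 << n) bits
 * is followed by an equally wide group of zero bits.
 */
static uint64_t spread_bits(uint64_t x, int n)
{
    assert(n >= 0 && n <= 4);
    x &= 0xffffffffu;
    for (int i = 4; i >= n; i--) {
        int sh = 1 << i;
        x = ((x << sh) | x) & even_masks[i];
    }
    return x;
}

/* Interleave element groups of nn (even slots) and mm (odd slots). */
static uint64_t zip_word(uint32_t nn, uint32_t mm, int esz)
{
    uint64_t lo = spread_bits(nn, esz);
    uint64_t hi = spread_bits(mm, esz) << (1 << esz);

    assert((lo & hi) == 0);   /* disjoint, so lo + hi == lo | hi */
    return lo | hi;
}
```

For example, zip_word(0xffffffff, 0, 0) fills only the even bits and
zip_word(0, 0xffffffff, 0) only the odd ones.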

We only ever have exact overlap between D, M, and N.  Therefore
we only need a single temporary, and we do not need to check for
partial overlap.
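
A simplified sketch of that single-temporary pattern follows.  It
interleaves whole bytes rather than predicate bits, and PredBuf and
zip_lo_bytes are hypothetical names used only for this note; the
aliasing checks mirror the ones in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for ARMPredicateReg: 256 bits of storage. */
typedef struct { uint8_t b[32]; } PredBuf;

/* Write n[0], m[0], n[1], m[1], ... into d, oprsz bytes in total. */
static void zip_lo_bytes(void *vd, const void *vn, const void *vm,
                         size_t oprsz)
{
    PredBuf tmp;
    uint8_t *d = vd;

    assert(oprsz <= sizeof(tmp.b) && oprsz % 2 == 0);

    /*
     * Output is produced faster than input is consumed, so an input
     * that aliases the output must be copied first.  Operands either
     * alias exactly or not at all, so one temporary is enough.
     */
    if (vd == vn) {
        vn = memcpy(&tmp, vn, oprsz);
        if (vd == vm) {
            vm = vn;
        }
    } else if (vd == vm) {
        vm = memcpy(&tmp, vm, oprsz);
    }

    const uint8_t *n = vn, *m = vm;
    for (size_t i = 0; i < oprsz / 2; i++) {
        d[2 * i] = n[i];
        d[2 * i + 1] = m[i];
    }
}
```

Without the copy, zipping {1, 2, 3, 4} with {5, 6, 7, 8} in place
would read back bytes that had already been overwritten.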

Reported-by: Laurent Desnogues <laurent.desnogues@gmail.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/sve_helper.c | 25 ++++++++++++++-----------
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index fcb46f150f..b8651ae173 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -1870,6 +1870,7 @@ void HELPER(sve_zip_p)(void *vd, void *vn, void *vm, uint32_t pred_desc)
     intptr_t oprsz = extract32(pred_desc, 0, SIMD_OPRSZ_BITS) + 2;
     int esz = extract32(pred_desc, SIMD_DATA_SHIFT, 2);
     intptr_t high = extract32(pred_desc, SIMD_DATA_SHIFT + 2, 1);
+    int esize = 1 << esz;
     uint64_t *d = vd;
     intptr_t i;
 
@@ -1882,33 +1883,35 @@ void HELPER(sve_zip_p)(void *vd, void *vn, void *vm, uint32_t pred_desc)
         mm = extract64(mm, high * half, half);
         nn = expand_bits(nn, esz);
         mm = expand_bits(mm, esz);
-        d[0] = nn + (mm << (1 << esz));
+        d[0] = nn | (mm << esize);
     } else {
-        ARMPredicateReg tmp_n, tmp_m;
+        ARMPredicateReg tmp;
 
         /* We produce output faster than we consume input.
            Therefore we must be mindful of possible overlap.  */
-        if ((vn - vd) < (uintptr_t)oprsz) {
-            vn = memcpy(&tmp_n, vn, oprsz);
-        }
-        if ((vm - vd) < (uintptr_t)oprsz) {
-            vm = memcpy(&tmp_m, vm, oprsz);
+        if (vd == vn) {
+            vn = memcpy(&tmp, vn, oprsz);
+            if (vd == vm) {
+                vm = vn;
+            }
+        } else if (vd == vm) {
+            vm = memcpy(&tmp, vm, oprsz);
         }
         if (high) {
             high = oprsz >> 1;
         }
 
-        if ((high & 3) == 0) {
+        if ((oprsz & 7) == 0) {
             uint32_t *n = vn, *m = vm;
             high >>= 2;
 
-            for (i = 0; i < DIV_ROUND_UP(oprsz, 8); i++) {
+            for (i = 0; i < oprsz / 8; i++) {
                 uint64_t nn = n[H4(high + i)];
                 uint64_t mm = m[H4(high + i)];
 
                 nn = expand_bits(nn, esz);
                 mm = expand_bits(mm, esz);
-                d[i] = nn + (mm << (1 << esz));
+                d[i] = nn | (mm << esize);
             }
         } else {
             uint8_t *n = vn, *m = vm;
@@ -1920,7 +1923,7 @@ void HELPER(sve_zip_p)(void *vd, void *vn, void *vm, uint32_t pred_desc)
 
                 nn = expand_bits(nn, esz);
                 mm = expand_bits(mm, esz);
-                d16[H2(i)] = nn + (mm << (1 << esz));
+                d16[H2(i)] = nn | (mm << esize);
             }
         }
     }
-- 
2.25.1


