[PATCH v3 01/81] target/arm: Fix sve_uzp_p vs odd vector lengths

From: Richard Henderson <richard.henderson@linaro.org>
To: qemu-devel@nongnu.org
Cc: Laurent Desnogues <laurent.desnogues@gmail.com>,
	peter.maydell@linaro.org, alex.bennee@linaro.org
Subject: [PATCH v3 01/81] target/arm: Fix sve_uzp_p vs odd vector lengths
Date: Fri, 18 Sep 2020 11:36:31 -0700
Message-ID: <20200918183751.2787647-2-richard.henderson@linaro.org>
In-Reply-To: <20200918183751.2787647-1-richard.henderson@linaro.org>

The helper missed compressing the second half of a predicate
with length vl % 512 > 256.

Change every x + (y << s) to x | (y << s) as a general
style fix.  Drop the extract64 because the input uint64_t
values are known to be already zero-extended from the
current size of the predicate.

Reported-by: Laurent Desnogues <laurent.desnogues@gmail.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/sve_helper.c | 30 +++++++++++++++++++++---------
 1 file changed, 21 insertions(+), 9 deletions(-)
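
Editorial note (not part of the patch): the sketch below illustrates why
the commit message can call the change from + to | a style fix and why
extract64 is redundant in the oprsz <= 8 path. It is a minimal model, not
QEMU code. extract64_ref, compress_even_bits, zero_extend, combine_old,
combine_new and count_mismatches are hypothetical names introduced here,
and compress_even_bits only assumes the esz == 0 behaviour of
compress_bits (keep every other bit).

```c
#include <stdint.h>

/* Stand-in for QEMU's extract64(): `length` bits starting at `start`. */
static uint64_t extract64_ref(uint64_t value, int start, int length)
{
    return (value >> start) & (~0ULL >> (64 - length));
}

/*
 * Assumed, simplified model of compress_bits() for esz == 0:
 * keep bits 0, 2, 4, ... and pack them into the low 32 bits.
 */
static uint64_t compress_even_bits(uint64_t x)
{
    uint64_t r = 0;

    for (int j = 0; j < 32; j++) {
        r |= ((x >> (2 * j)) & 1) << j;
    }
    return r;
}

/* Model a predicate word that is zero beyond oprsz bytes. */
static uint64_t zero_extend(uint64_t x, int oprsz)
{
    return oprsz == 8 ? x : x & ((1ULL << (8 * oprsz)) - 1);
}

/* Form used before the patch: add, then mask with extract64. */
static uint64_t combine_old(uint64_t n0, uint64_t m0, int oprsz)
{
    uint64_t l = compress_even_bits(n0);
    uint64_t h = compress_even_bits(m0);

    return extract64_ref(l + (h << (4 * oprsz)), 0, 8 * oprsz);
}

/* Form used after the patch: plain bitwise OR, no mask. */
static uint64_t combine_new(uint64_t n0, uint64_t m0, int oprsz)
{
    uint64_t l = compress_even_bits(n0);
    uint64_t h = compress_even_bits(m0);

    return l | (h << (4 * oprsz));
}

/* Count disagreements between the two forms on zero-extended inputs. */
static int count_mismatches(void)
{
    static const uint64_t patterns[] = {
        0, ~0ULL, 0x5555555555555555ULL, 0x0123456789abcdefULL
    };
    int bad = 0;

    for (int oprsz = 2; oprsz <= 8; oprsz += 2) {
        for (int a = 0; a < 4; a++) {
            for (int b = 0; b < 4; b++) {
                uint64_t n0 = zero_extend(patterns[a], oprsz);
                uint64_t m0 = zero_extend(patterns[b], oprsz);

                bad += combine_old(n0, m0, oprsz) != combine_new(n0, m0, oprsz);
            }
        }
    }
    return bad;
}
```

Under these assumptions the compressed low half occupies only bits below
4 * oprsz and the shifted high half only bits below 8 * oprsz, so the two
operands never overlap and the result never needs masking. Calling
count_mismatches() is expected to return 0 for the patterns tried here.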

diff --git a/target/arm/sve_helper.c b/target/arm/sve_helper.c
index 4758d46f34..fcb46f150f 100644
--- a/target/arm/sve_helper.c
+++ b/target/arm/sve_helper.c
@@ -1938,7 +1938,7 @@ void HELPER(sve_uzp_p)(void *vd, void *vn, void *vm, uint32_t pred_desc)
     if (oprsz <= 8) {
         l = compress_bits(n[0] >> odd, esz);
         h = compress_bits(m[0] >> odd, esz);
-        d[0] = extract64(l + (h << (4 * oprsz)), 0, 8 * oprsz);
+        d[0] = l | (h << (4 * oprsz));
     } else {
         ARMPredicateReg tmp_m;
         intptr_t oprsz_16 = oprsz / 16;
@@ -1952,23 +1952,35 @@ void HELPER(sve_uzp_p)(void *vd, void *vn, void *vm, uint32_t pred_desc)
             h = n[2 * i + 1];
             l = compress_bits(l >> odd, esz);
             h = compress_bits(h >> odd, esz);
-            d[i] = l + (h << 32);
+            d[i] = l | (h << 32);
         }
 
-        /* For VL which is not a power of 2, the results from M do not
-           align nicely with the uint64_t for D.  Put the aligned results
-           from M into TMP_M and then copy it into place afterward.  */
+        /*
+         * For VL which is not a multiple of 512, the results from M do not
+         * align nicely with the uint64_t for D.  Put the aligned results
+         * from M into TMP_M and then copy it into place afterward.
+         */
         if (oprsz & 15) {
-            d[i] = compress_bits(n[2 * i] >> odd, esz);
+            int final_shift = (oprsz & 15) * 2;
+
+            l = n[2 * i + 0];
+            h = n[2 * i + 1];
+            l = compress_bits(l >> odd, esz);
+            h = compress_bits(h >> odd, esz);
+            d[i] = l | (h << final_shift);
 
             for (i = 0; i < oprsz_16; i++) {
                 l = m[2 * i + 0];
                 h = m[2 * i + 1];
                 l = compress_bits(l >> odd, esz);
                 h = compress_bits(h >> odd, esz);
-                tmp_m.p[i] = l + (h << 32);
+                tmp_m.p[i] = l | (h << 32);
             }
-            tmp_m.p[i] = compress_bits(m[2 * i] >> odd, esz);
+            l = m[2 * i + 0];
+            h = m[2 * i + 1];
+            l = compress_bits(l >> odd, esz);
+            h = compress_bits(h >> odd, esz);
+            tmp_m.p[i] = l | (h << final_shift);
 
             swap_memmove(vd + oprsz / 2, &tmp_m, oprsz / 2);
         } else {
@@ -1977,7 +1989,7 @@ void HELPER(sve_uzp_p)(void *vd, void *vn, void *vm, uint32_t pred_desc)
                 h = m[2 * i + 1];
                 l = compress_bits(l >> odd, esz);
                 h = compress_bits(h >> odd, esz);
-                d[oprsz_16 + i] = l + (h << 32);
+                d[oprsz_16 + i] = l | (h << 32);
             }
         }
     }
-- 
2.25.1

Thread overview: 94+ messages
2020-09-18 18:36 [PATCH v3 00/81] target/arm: Implement SVE2 Richard Henderson
2020-09-18 18:36 ` Richard Henderson [this message]
2020-09-18 18:36 ` [PATCH v3 02/81] target/arm: Fix sve_zip_p vs odd vector lengths Richard Henderson
2020-09-18 18:36 ` [PATCH v3 03/81] target/arm: Fix sve_punpk_p " Richard Henderson
2020-09-18 18:36 ` [PATCH v3 04/81] target/arm: Add ID_AA64ZFR0 fields and isar_feature_aa64_sve2 Richard Henderson
2020-09-18 18:36 ` [PATCH v3 05/81] target/arm: Implement SVE2 Integer Multiply - Unpredicated Richard Henderson
2020-09-18 18:36 ` [PATCH v3 06/81] target/arm: Implement SVE2 integer pairwise add and accumulate long Richard Henderson
2020-09-18 18:36 ` [PATCH v3 07/81] target/arm: Implement SVE2 integer unary operations (predicated) Richard Henderson
2020-09-18 18:36 ` [PATCH v3 08/81] target/arm: Split out saturating/rounding shifts from neon Richard Henderson
2020-09-18 18:36 ` [PATCH v3 09/81] target/arm: Implement SVE2 saturating/rounding bitwise shift left (predicated) Richard Henderson
2020-09-18 18:36 ` [PATCH v3 10/81] target/arm: Implement SVE2 integer halving add/subtract (predicated) Richard Henderson
2020-09-18 18:36 ` [PATCH v3 11/81] target/arm: Implement SVE2 integer pairwise arithmetic Richard Henderson
2020-09-18 18:36 ` [PATCH v3 12/81] target/arm: Implement SVE2 saturating add/subtract (predicated) Richard Henderson
2020-09-18 18:36 ` [PATCH v3 13/81] target/arm: Implement SVE2 integer add/subtract long Richard Henderson
2020-09-18 18:36 ` [PATCH v3 14/81] target/arm: Implement SVE2 integer add/subtract interleaved long Richard Henderson
2020-09-18 18:36 ` [PATCH v3 15/81] target/arm: Implement SVE2 integer add/subtract wide Richard Henderson
2020-09-18 18:36 ` [PATCH v3 16/81] target/arm: Implement SVE2 integer multiply long Richard Henderson
2020-09-18 18:36 ` [PATCH v3 17/81] target/arm: Implement PMULLB and PMULLT Richard Henderson
2020-09-18 18:36 ` [PATCH v3 18/81] target/arm: Implement SVE2 bitwise shift left long Richard Henderson
2020-09-18 18:36 ` [PATCH v3 19/81] target/arm: Implement SVE2 bitwise exclusive-or interleaved Richard Henderson
2020-09-18 18:36 ` [PATCH v3 20/81] target/arm: Implement SVE2 bitwise permute Richard Henderson
2020-09-18 18:36 ` [PATCH v3 21/81] target/arm: Implement SVE2 complex integer add Richard Henderson
2020-09-18 18:36 ` [PATCH v3 22/81] target/arm: Implement SVE2 integer absolute difference and accumulate long Richard Henderson
2020-09-18 18:36 ` [PATCH v3 23/81] target/arm: Implement SVE2 integer add/subtract long with carry Richard Henderson
2020-09-18 18:36 ` [PATCH v3 24/81] target/arm: Implement SVE2 bitwise shift right and accumulate Richard Henderson
2020-09-18 18:36 ` [PATCH v3 25/81] target/arm: Implement SVE2 bitwise shift and insert Richard Henderson
2020-09-18 18:36 ` [PATCH v3 26/81] target/arm: Implement SVE2 integer absolute difference and accumulate Richard Henderson
2020-09-18 18:36 ` [PATCH v3 27/81] target/arm: Implement SVE2 saturating extract narrow Richard Henderson
2020-09-18 18:36 ` [PATCH v3 28/81] target/arm: Implement SVE2 floating-point pairwise Richard Henderson
2020-09-18 18:36 ` [PATCH v3 29/81] target/arm: Implement SVE2 SHRN, RSHRN Richard Henderson
2020-09-18 18:37 ` [PATCH v3 30/81] target/arm: Implement SVE2 SQSHRUN, SQRSHRUN Richard Henderson
2020-09-18 18:37 ` [PATCH v3 31/81] target/arm: Implement SVE2 UQSHRN, UQRSHRN Richard Henderson
2020-09-18 18:37 ` [PATCH v3 32/81] target/arm: Implement SVE2 SQSHRN, SQRSHRN Richard Henderson
2020-09-18 18:37 ` [PATCH v3 33/81] target/arm: Implement SVE2 WHILEGT, WHILEGE, WHILEHI, WHILEHS Richard Henderson
2020-09-18 18:37 ` [PATCH v3 34/81] target/arm: Implement SVE2 WHILERW, WHILEWR Richard Henderson
2020-10-13  2:33   ` LIU Zhiwei
2020-10-19 21:58     ` Richard Henderson
2020-09-18 18:37 ` [PATCH v3 35/81] target/arm: Implement SVE2 bitwise ternary operations Richard Henderson
2020-09-18 18:37 ` [PATCH v3 36/81] target/arm: Implement SVE2 MATCH, NMATCH Richard Henderson
2020-09-18 18:37 ` [PATCH v3 37/81] target/arm: Implement SVE2 saturating multiply-add long Richard Henderson
2020-09-18 18:37 ` [PATCH v3 38/81] target/arm: Implement SVE2 saturating multiply-add high Richard Henderson
2020-09-18 18:37 ` [PATCH v3 39/81] target/arm: Implement SVE2 integer multiply-add long Richard Henderson
2020-09-18 18:37 ` [PATCH v3 40/81] target/arm: Implement SVE2 complex integer multiply-add Richard Henderson
2020-09-18 18:37 ` [PATCH v3 41/81] target/arm: Implement SVE2 ADDHNB, ADDHNT Richard Henderson
2020-09-18 18:37 ` [PATCH v3 42/81] target/arm: Implement SVE2 RADDHNB, RADDHNT Richard Henderson
2020-09-18 18:37 ` [PATCH v3 43/81] target/arm: Implement SVE2 SUBHNB, SUBHNT Richard Henderson
2020-09-18 18:37 ` [PATCH v3 44/81] target/arm: Implement SVE2 RSUBHNB, RSUBHNT Richard Henderson
2020-09-18 18:37 ` [PATCH v3 45/81] target/arm: Implement SVE2 HISTCNT, HISTSEG Richard Henderson
2020-10-09  6:13   ` LIU Zhiwei
2020-10-09 12:35     ` Richard Henderson
2020-09-18 18:37 ` [PATCH v3 46/81] target/arm: Implement SVE2 XAR Richard Henderson
2020-09-18 18:37 ` [PATCH v3 47/81] target/arm: Implement SVE2 scatter store insns Richard Henderson
2020-09-18 18:37 ` [PATCH v3 48/81] target/arm: Implement SVE2 gather load insns Richard Henderson
2020-09-18 18:37 ` [PATCH v3 49/81] target/arm: Implement SVE2 FMMLA Richard Henderson
2020-09-18 18:37 ` [PATCH v3 50/81] target/arm: Implement SVE2 SPLICE, EXT Richard Henderson
2020-09-18 18:37 ` [PATCH v3 51/81] target/arm: Pass separate addend to {U, S}DOT helpers Richard Henderson
2020-09-23 10:01   ` LIU Zhiwei
2020-09-23 14:46     ` Richard Henderson
2020-09-24  1:29       ` LIU Zhiwei
2020-09-23 11:48   ` LIU Zhiwei
2020-10-09 12:42     ` Richard Henderson
2020-09-18 18:37 ` [PATCH v3 52/81] target/arm: Pass separate addend to FCMLA helpers Richard Henderson
2020-09-18 18:37 ` [PATCH v3 53/81] target/arm: Split out formats for 2 vectors + 1 index Richard Henderson
2020-09-18 18:37 ` [PATCH v3 54/81] target/arm: Split out formats for 3 " Richard Henderson
2020-09-18 18:37 ` [PATCH v3 55/81] target/arm: Implement SVE2 integer multiply (indexed) Richard Henderson
2020-09-18 18:37 ` [PATCH v3 56/81] target/arm: Implement SVE2 integer multiply-add (indexed) Richard Henderson
2020-09-18 18:37 ` [PATCH v3 57/81] target/arm: Implement SVE2 saturating multiply-add high (indexed) Richard Henderson
2020-09-18 18:37 ` [PATCH v3 58/81] target/arm: Implement SVE2 saturating multiply-add (indexed) Richard Henderson
2020-09-18 18:37 ` [PATCH v3 59/81] target/arm: Implement SVE2 integer multiply long (indexed) Richard Henderson
2020-09-18 18:37 ` [PATCH v3 60/81] target/arm: Implement SVE2 saturating multiply (indexed) Richard Henderson
2020-09-18 18:37 ` [PATCH v3 61/81] target/arm: Implement SVE2 signed saturating doubling multiply high Richard Henderson
2020-09-18 18:37 ` [PATCH v3 62/81] target/arm: Implement SVE2 saturating multiply high (indexed) Richard Henderson
2020-09-18 18:37 ` [PATCH v3 63/81] target/arm: Implement SVE2 multiply-add long (indexed) Richard Henderson
2020-09-18 18:37 ` [PATCH v3 64/81] target/arm: Implement SVE2 complex integer multiply-add (indexed) Richard Henderson
2020-09-18 18:37 ` [PATCH v3 65/81] target/arm: Implement SVE mixed sign dot product (indexed) Richard Henderson
2020-09-18 18:37 ` [PATCH v3 66/81] target/arm: Implement SVE mixed sign dot product Richard Henderson
2020-09-18 18:37 ` [PATCH v3 67/81] target/arm: Implement SVE2 crypto unary operations Richard Henderson
2020-09-18 18:37 ` [PATCH v3 68/81] target/arm: Implement SVE2 crypto destructive binary operations Richard Henderson
2020-09-18 18:37 ` [PATCH v3 69/81] target/arm: Implement SVE2 crypto constructive " Richard Henderson
2020-09-18 18:37 ` [PATCH v3 70/81] target/arm: Implement SVE2 TBL, TBX Richard Henderson
2020-09-18 18:37 ` [PATCH v3 71/81] target/arm: Implement SVE2 FCVTNT Richard Henderson
2020-09-18 18:37 ` [PATCH v3 72/81] target/arm: Implement SVE2 FCVTLT Richard Henderson
2020-09-18 18:37 ` [PATCH v3 73/81] target/arm: Implement SVE2 FCVTXNT, FCVTX Richard Henderson
2020-09-18 18:37 ` [PATCH v3 74/81] target/arm: Implement SVE2 FLOGB Richard Henderson
2020-09-18 18:37 ` [PATCH v3 75/81] target/arm: Share table of sve load functions Richard Henderson
2020-09-18 18:37 ` [PATCH v3 76/81] target/arm: Implement SVE2 LD1RO Richard Henderson
2020-09-18 18:37 ` [PATCH v3 77/81] target/arm: Implement 128-bit ZIP, UZP, TRN Richard Henderson
2020-09-18 18:37 ` [PATCH v3 78/81] target/arm: Implement SVE2 bitwise shift immediate Richard Henderson
2020-09-18 18:37 ` [PATCH v3 79/81] target/arm: Implement SVE2 fp multiply-add long Richard Henderson
2020-09-18 18:37 ` [PATCH v3 80/81] target/arm: Implement SVE2 complex integer dot product Richard Henderson
2020-09-18 18:37 ` [PATCH v3 81/81] target/arm: Enable SVE2 and some extensions Richard Henderson
2020-11-10 19:55 ` [PATCH v3 00/81] target/arm: Implement SVE2 Stephen Long
2020-11-12 21:06   ` Richard Henderson
2020-11-11 18:17 ` Stephen Long
