qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Richard Henderson <richard.henderson@linaro.org>
To: Peter Maydell <peter.maydell@linaro.org>
Cc: qemu-devel@nongnu.org, qemu-arm@nongnu.org
Subject: Re: [PATCH v3 31/97] target/arm: Implemement SME2 SDOT, UDOT, USDOT,  SUDOT
Date: Thu, 3 Jul 2025 10:25:28 -0600	[thread overview]
Message-ID: <2767ec10-d08d-4de0-95f1-3362e2acf15e@linaro.org> (raw)
In-Reply-To: <CAFEAcA8szLP4mEvkatHhbBJzU5A6w0XGcMRRJYr_HPSNgZmU7Q@mail.gmail.com>

On 7/3/25 03:45, Peter Maydell wrote:
> On Wed, 2 Jul 2025 at 13:34, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> 
> 
> 
>> +/* Similar for 2-way dot product */
>> +#define DO_DOT(NAME, TYPED, TYPEN, TYPEM) \
>> +void HELPER(NAME)(void *vd, void *vn, void *vm, void *va, uint32_t desc)  \
>> +{                                                                         \
>> +    intptr_t i, opr_sz = simd_oprsz(desc);                                \
>> +    TYPED *d = vd, *a = va;                                               \
>> +    TYPEN *n = vn;                                                        \
>> +    TYPEM *m = vm;                                                        \
>> +    for (i = 0; i < opr_sz / sizeof(TYPED); ++i) {                        \
>> +        d[i] = (a[i] +                                                    \
>> +                (TYPED)n[i * 2 + 0] * m[i * 2 + 0] +                      \
>> +                (TYPED)n[i * 2 + 1] * m[i * 2 + 1]);                      \
> 
> Don't we need some H macros here for the big-endian host case?
> (For that matter, the existing 4-way dot product helpers also
> look like they won't work on big-endian...)

The logic here is that all columns are treated identically.

...a0... ...a1...
.n0..n1. .n2..n3.
.m0..m1. .m2..m3.

vs

...a1... ...a0...
.n3..n2. .n1..n0.
.m3..m2. .m1..m0.

d0 = a0 + n0 * m0 + n1 * m1 -- it doesn't matter if n0 or n1 is at the lowest or highest 
address, because it still gets multiplied by the corresponding element in m, and then the 
two products are added to the sum that is addressed the same way.

The existing 4-way dot product uses the same endian independent logic, fwiw.


r~


  reply	other threads:[~2025-07-03 16:26 UTC|newest]

Thread overview: 172+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-07-02 12:32 [PATCH v3 00/97] target/arm: Implement FEAT_SME2p1 Richard Henderson
2025-07-02 12:32 ` [PATCH v3 01/97] target/arm: Introduce FPST_ZA, FPST_ZA_F16 Richard Henderson
2025-07-02 12:32 ` [PATCH v3 02/97] target/arm: Use FPST_ZA for sme_fmopa_[hsd] Richard Henderson
2025-07-02 12:32 ` [PATCH v3 03/97] target/arm: Rename zarray to za_state.za Richard Henderson
2025-07-02 12:32 ` [PATCH v3 04/97] target/arm: Add isar feature tests for SME2p1, SVE2p1 Richard Henderson
2025-07-02 12:32 ` [PATCH v3 05/97] target/arm: Add ZT0 Richard Henderson
2025-07-02 12:32 ` [PATCH v3 06/97] target/arm: Add zt0_excp_el to DisasContext Richard Henderson
2025-07-03  9:32   ` Peter Maydell
2025-07-02 12:32 ` [PATCH v3 07/97] target/arm: Implement SME2 ZERO ZT0 Richard Henderson
2025-07-02 12:32 ` [PATCH v3 08/97] target/arm: Add alignment argument to gen_sve_{ldr, str} Richard Henderson
2025-07-03  9:33   ` Peter Maydell
2025-07-02 12:32 ` [PATCH v3 09/97] target/arm: Implement SME2 LDR/STR ZT0 Richard Henderson
2025-07-02 12:32 ` [PATCH v3 10/97] target/arm: Implement SME2 MOVT Richard Henderson
2025-07-02 12:32 ` [PATCH v3 11/97] target/arm: Split get_tile_rowcol argument tile_index Richard Henderson
2025-07-02 12:32 ` [PATCH v3 12/97] target/arm: Rename MOVA for translate Richard Henderson
2025-07-02 12:32 ` [PATCH v3 13/97] target/arm: Split out get_zarray Richard Henderson
2025-07-02 12:32 ` [PATCH v3 14/97] target/arm: Introduce ARMCPU.sme_max_vq Richard Henderson
2025-07-03  9:33   ` Peter Maydell
2025-07-02 12:32 ` [PATCH v3 15/97] target/arm: Implement SME2 MOVA to/from tile, multiple registers Richard Henderson
2025-07-03  9:33   ` Peter Maydell
2025-07-02 12:32 ` [PATCH v3 16/97] target/arm: Implement SME2 MOVA to/from array, " Richard Henderson
2025-07-02 12:32 ` [PATCH v3 17/97] target/arm: Implement SME2 BMOPA Richard Henderson
2025-07-02 12:32 ` [PATCH v3 18/97] target/arm: Implement SME2 SMOPS, UMOPS (2-way) Richard Henderson
2025-07-02 12:32 ` [PATCH v3 19/97] target/arm: Introduce gen_gvec_sve2_sqdmulh Richard Henderson
2025-07-02 12:32 ` [PATCH v3 20/97] target/arm: Implement SME2 Multiple and Single SVE Destructive Richard Henderson
2025-07-02 12:32 ` [PATCH v3 21/97] target/arm: Implement SME2 Multiple Vectors " Richard Henderson
2025-07-02 12:32 ` [PATCH v3 22/97] target/arm: Implement SME2 ADD/SUB (array results, multiple and single vector) Richard Henderson
2025-07-02 12:32 ` [PATCH v3 23/97] target/arm: Implement SME2 ADD/SUB (array results, multiple vectors) Richard Henderson
2025-07-02 12:32 ` [PATCH v3 24/97] target/arm: Pass ZA to helper_sve2_fmlal_zz[zx]w_s Richard Henderson
2025-07-02 12:32 ` [PATCH v3 25/97] target/arm: Add helper_gvec{_ah}_bfmlsl{_nx} Richard Henderson
2025-07-03  9:34   ` Peter Maydell
2025-07-02 12:32 ` [PATCH v3 26/97] target/arm: Implement SME2 FMLAL, BFMLAL Richard Henderson
2025-07-02 12:33 ` [PATCH v3 27/97] target/arm: Implement SME2 FDOT Richard Henderson
2025-07-02 12:33 ` [PATCH v3 28/97] target/arm: Implement SME2 BFDOT Richard Henderson
2025-07-02 12:33 ` [PATCH v3 29/97] target/arm: Implement SME2 FVDOT, BFVDOT Richard Henderson
2025-07-02 12:33 ` [PATCH v3 30/97] target/arm: Rename helper_gvec_*dot_[bh] to *_4[bh] Richard Henderson
2025-07-02 12:33 ` [PATCH v3 31/97] target/arm: Implemement SME2 SDOT, UDOT, USDOT, SUDOT Richard Henderson
2025-07-03  9:45   ` Peter Maydell
2025-07-03 16:25     ` Richard Henderson [this message]
2025-07-02 12:33 ` [PATCH v3 32/97] target/arm: Rename SVE SDOT and UDOT patterns Richard Henderson
2025-07-03  9:49   ` Peter Maydell
2025-07-02 12:33 ` [PATCH v3 33/97] target/arm: Tighten USDOT (vectors) decode Richard Henderson
2025-07-03  9:49   ` Peter Maydell
2025-07-02 12:33 ` [PATCH v3 34/97] target/arm: Implement SDOT, UDOT (2-way) for SME2/SVE2p1 Richard Henderson
2025-07-03  9:49   ` Peter Maydell
2025-07-02 12:33 ` [PATCH v3 35/97] target/arm: Implement SME2 SVDOT, UVDOT, SUVDOT, USVDOT Richard Henderson
2025-07-02 12:33 ` [PATCH v3 36/97] target/arm: Implement SME2 SMLAL, SMLSL, UMLAL, UMLSL Richard Henderson
2025-07-03  9:50   ` Peter Maydell
2025-07-02 12:33 ` [PATCH v3 37/97] target/arm: Rename gvec_fml[as]_[hs] with _nf_ infix Richard Henderson
2025-07-03  9:50   ` Peter Maydell
2025-07-02 12:33 ` [PATCH v3 38/97] target/arm: Implement SME2 FMLA, FMLS Richard Henderson
2025-07-03  9:50   ` Peter Maydell
2025-07-02 12:33 ` [PATCH v3 39/97] target/arm: Implement SME2 BFMLA, BFMLS Richard Henderson
2025-07-03  9:50   ` Peter Maydell
2025-07-02 12:33 ` [PATCH v3 40/97] target/arm: Implement SME2 FADD, FSUB, BFADD, BFSUB Richard Henderson
2025-07-03  9:51   ` Peter Maydell
2025-07-02 12:33 ` [PATCH v3 41/97] target/arm: Implement SME2 ADD/SUB (array accumulator) Richard Henderson
2025-07-03 10:13   ` Peter Maydell
2025-07-02 12:33 ` [PATCH v3 42/97] target/arm: Implement SME2 BFCVT, BFCVTN, FCVT, FCVTN Richard Henderson
2025-07-03 10:13   ` Peter Maydell
2025-07-02 12:33 ` [PATCH v3 43/97] target/arm: Implement SME2 FCVT (widening), FCVTL Richard Henderson
2025-07-03 10:14   ` Peter Maydell
2025-07-02 12:33 ` [PATCH v3 44/97] target/arm: Implement SME2 FCVTZS, FCVTZU Richard Henderson
2025-07-03 10:14   ` Peter Maydell
2025-07-02 12:33 ` [PATCH v3 45/97] target/arm: Implement SME2 SCVTF, UCVTF Richard Henderson
2025-07-03 10:14   ` Peter Maydell
2025-07-02 12:33 ` [PATCH v3 46/97] target/arm: Implement SME2 FRINTN, FRINTP, FRINTM, FRINTA Richard Henderson
2025-07-03 10:14   ` Peter Maydell
2025-07-02 12:33 ` [PATCH v3 47/97] target/arm: Introduce do_[us]sat_[bhs] macros Richard Henderson
2025-07-03 10:15   ` Peter Maydell
2025-07-02 12:33 ` [PATCH v3 48/97] target/arm: Use do_[us]sat_[bhs] in sve_helper.c Richard Henderson
2025-07-03 10:15   ` Peter Maydell
2025-07-02 12:33 ` [PATCH v3 49/97] target/arm: Implement SME2 SQCVT, UQCVT, SQCVTU Richard Henderson
2025-07-03 10:20   ` Peter Maydell
2025-07-03 15:53     ` Richard Henderson
2025-07-02 12:33 ` [PATCH v3 50/97] target/arm: Implement SQCVTN, UQCVTN, SQCVTUN for SME2/SVE2p1 Richard Henderson
2025-07-03 10:22   ` Peter Maydell
2025-07-02 12:33 ` [PATCH v3 51/97] target/arm: Implement SME2 SUNPK, UUNPK Richard Henderson
2025-07-03 10:23   ` Peter Maydell
2025-07-02 12:33 ` [PATCH v3 52/97] target/arm: Implement SME2 ZIP, UZP (four registers) Richard Henderson
2025-07-03 10:25   ` Peter Maydell
2025-07-02 12:33 ` [PATCH v3 53/97] target/arm: Move do_urshr, do_srshr to vec_internal.h Richard Henderson
2025-07-03 10:25   ` Peter Maydell
2025-07-02 12:33 ` [PATCH v3 54/97] target/arm: Implement SME2 SQRSHR, UQRSHR, SQRSHRN Richard Henderson
2025-07-03 10:28   ` Peter Maydell
2025-07-02 12:33 ` [PATCH v3 55/97] target/arm: Implement SME2 ZIP, UZP (two registers) Richard Henderson
2025-07-03 10:27   ` Peter Maydell
2025-07-02 12:33 ` [PATCH v3 56/97] target/arm: Implement SME2 FCLAMP, SCLAMP, UCLAMP Richard Henderson
2025-07-03 10:31   ` Peter Maydell
2025-07-02 12:33 ` [PATCH v3 57/97] target/arm: Enable SCLAMP, UCLAMP for SVE2p1 Richard Henderson
2025-07-03 10:32   ` Peter Maydell
2025-07-02 12:33 ` [PATCH v3 58/97] target/arm: Implement FCLAMP for SME2, SVE2p1 Richard Henderson
2025-07-03 10:32   ` Peter Maydell
2025-07-02 12:33 ` [PATCH v3 59/97] target/arm: Implement SME2p1 Multiple Zero Richard Henderson
2025-07-03 10:33   ` Peter Maydell
2025-07-02 12:33 ` [PATCH v3 60/97] target/arm: Introduce pred_count_test Richard Henderson
2025-07-02 13:34   ` Richard Henderson
2025-07-02 12:33 ` [PATCH v3 61/97] target/arm: Fold predtest_ones into helper_sve_brkns Richard Henderson
2025-07-03 10:36   ` Peter Maydell
2025-07-02 12:33 ` [PATCH v3 62/97] target/arm: Split out do_whilel from helper_sve_whilel Richard Henderson
2025-07-03 10:38   ` Peter Maydell
2025-07-03 16:02     ` Richard Henderson
2025-07-02 12:33 ` [PATCH v3 63/97] target/arm: Split out do_whileg from helper_sve_whileg Richard Henderson
2025-07-02 12:33 ` [PATCH v3 64/97] target/arm: Move scale by esz into helper_sve_while* Richard Henderson
2025-07-03 12:10   ` Peter Maydell
2025-07-02 12:33 ` [PATCH v3 65/97] target/arm: Split trans_WHILE to lt and gt Richard Henderson
2025-07-03 12:10   ` Peter Maydell
2025-07-02 12:33 ` [PATCH v3 66/97] target/arm: Enable PSEL for SVE2p1 Richard Henderson
2025-07-03 12:10   ` Peter Maydell
2025-07-02 12:33 ` [PATCH v3 67/97] target/arm: Implement SVE2p1 WHILE (predicate pair) Richard Henderson
2025-07-03 12:11   ` Peter Maydell
2025-07-02 12:33 ` [PATCH v3 68/97] target/arm: Implement SVE2p1 WHILE (predicate as counter) Richard Henderson
2025-07-03 12:12   ` Peter Maydell
2025-07-02 12:33 ` [PATCH v3 69/97] target/arm: Implement SVE2p1 PTRUE " Richard Henderson
2025-07-03 12:14   ` Peter Maydell
2025-07-02 12:33 ` [PATCH v3 70/97] target/arm: Implement {ADD, SMIN, SMAX, UMIN, UMAX}QV for SVE2p1 Richard Henderson
2025-07-03 12:15   ` Peter Maydell
2025-07-02 12:33 ` [PATCH v3 71/97] target/arm: Implement SVE2p1 PEXT Richard Henderson
2025-07-03 12:19   ` Peter Maydell
2025-07-02 12:33 ` [PATCH v3 72/97] target/arm: Implement SME2 SEL Richard Henderson
2025-07-03 12:21   ` Peter Maydell
2025-07-02 12:33 ` [PATCH v3 73/97] target/arm: Implement ANDQV, ORQV, EORQV for SVE2p1 Richard Henderson
2025-07-03 12:23   ` Peter Maydell
2025-07-02 12:33 ` [PATCH v3 74/97] target/arm: Implement FADDQV, F{MIN, MAX}{NM}QV " Richard Henderson
2025-07-03 12:24   ` [PATCH v3 74/97] target/arm: Implement FADDQV, F{MIN,MAX}{NM}QV " Peter Maydell
2025-07-02 12:33 ` [PATCH v3 75/97] target/arm: Implement BFMLSLB{L,T} for SME2/SVE2p1 Richard Henderson
2025-07-03 12:24   ` [PATCH v3 75/97] target/arm: Implement BFMLSLB{L, T} " Peter Maydell
2025-07-02 12:33 ` [PATCH v3 76/97] target/arm: Implement CNTP (predicate as counter) " Richard Henderson
2025-07-03 12:25   ` Peter Maydell
2025-07-02 12:33 ` [PATCH v3 77/97] target/arm: Implement DUPQ for SME2p1/SVE2p1 Richard Henderson
2025-07-03 12:27   ` Peter Maydell
2025-07-02 12:33 ` [PATCH v3 78/97] target/arm: Implement EXTQ " Richard Henderson
2025-07-03 12:27   ` Peter Maydell
2025-07-02 12:33 ` [PATCH v3 79/97] target/arm: Implement PMOV " Richard Henderson
2025-07-03 13:24   ` Peter Maydell
2025-07-03 23:27     ` Richard Henderson
2025-07-02 12:33 ` [PATCH v3 80/97] target/arm: Implement ZIPQ, UZPQ " Richard Henderson
2025-07-03 12:28   ` Peter Maydell
2025-07-02 12:33 ` [PATCH v3 81/97] target/arm: Implement TBLQ, TBXQ " Richard Henderson
2025-07-03 12:29   ` Peter Maydell
2025-07-02 12:33 ` [PATCH v3 82/97] target/arm: Implement SME2 counted predicate register load/store Richard Henderson
2025-07-03 13:30   ` Peter Maydell
2025-07-02 12:33 ` [PATCH v3 83/97] target/arm: Split the ST_zpri and ST_zprr patterns Richard Henderson
2025-07-03 12:30   ` Peter Maydell
2025-07-02 12:33 ` [PATCH v3 84/97] target/arm: Implement {LD1, ST1}{W, D} (128-bit element) for SVE2p1 Richard Henderson
2025-07-03 12:33   ` [PATCH v3 84/97] target/arm: Implement {LD1,ST1}{W,D} " Peter Maydell
2025-07-02 12:33 ` [PATCH v3 85/97] target/arm: Move ld1qq and st1qq primitives to sve_ldst_internal.h Richard Henderson
2025-07-03 12:35   ` Peter Maydell
2025-07-02 12:33 ` [PATCH v3 86/97] target/arm: Implement {LD, ST}[234]Q for SME2p1/SVE2p1 Richard Henderson
2025-07-03 12:37   ` Peter Maydell
2025-07-02 12:34 ` [PATCH v3 87/97] target/arm: Implement LD1Q, ST1Q for SVE2p1 Richard Henderson
2025-07-03 12:37   ` Peter Maydell
2025-07-02 12:34 ` [PATCH v3 88/97] target/arm: Implement MOVAZ for SME2p1 Richard Henderson
2025-07-03 12:38   ` Peter Maydell
2025-07-02 12:34 ` [PATCH v3 89/97] target/arm: Implement LUTI2, LUTI4 for SME2/SME2p1 Richard Henderson
2025-07-03 12:42   ` Peter Maydell
2025-07-02 12:34 ` [PATCH v3 90/97] target/arm: Rename FMOPA_h to FMOPA_w_h Richard Henderson
2025-07-02 12:34 ` [PATCH v3 91/97] target/arm: Rename BFMOPA to BFMOPA_w Richard Henderson
2025-07-02 12:34 ` [PATCH v3 92/97] target/arm: Support FPCR.AH in SME FMOPS, BFMOPS Richard Henderson
2025-07-03 12:45   ` Peter Maydell
2025-07-02 12:34 ` [PATCH v3 93/97] target/arm: Implement FMOPA (non-widening) for fp16 Richard Henderson
2025-07-03 17:15   ` Alex Bennée
2025-07-02 12:34 ` [PATCH v3 94/97] target/arm: Implement SME2 BFMOPA (non-widening) Richard Henderson
2025-07-03 17:16   ` Alex Bennée
2025-07-02 12:34 ` [PATCH v3 95/97] target/arm: Enable FEAT_SME2p1 on -cpu max Richard Henderson
2025-07-03 12:46   ` Peter Maydell
2025-07-03 17:17   ` Alex Bennée
2025-07-04  3:12     ` Richard Henderson
2025-07-04  9:30       ` Alex Bennée
2025-07-02 12:34 ` [PATCH v3 96/97] linux-user/aarch64: Set hwcap bits for SME2p1/SVE2p1 Richard Henderson
2025-07-03 12:50   ` Peter Maydell
2025-07-02 12:34 ` [PATCH v3 97/97] tests/tcg/aarch64: Add sme2-matmul test case Richard Henderson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2767ec10-d08d-4de0-95f1-3362e2acf15e@linaro.org \
    --to=richard.henderson@linaro.org \
    --cc=peter.maydell@linaro.org \
    --cc=qemu-arm@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).