From: Richard Henderson <richard.henderson@linaro.org>
To: Peter Maydell <peter.maydell@linaro.org>
Cc: qemu-devel@nongnu.org, qemu-arm@nongnu.org
Subject: Re: [PATCH v3 30/51] target/arm: Implement FMOPA, FMOPS (non-widening)
Date: Fri, 24 Jun 2022 07:16:57 -0700 [thread overview]
Message-ID: <c2eae981-55e4-0430-ee56-ac853cfc930d@linaro.org> (raw)
In-Reply-To: <CAFEAcA-y99PmUdPbdrWSj=_vUy35tRRFOJgkG2Lyg1A_iK6qRQ@mail.gmail.com>
On 6/24/22 05:31, Peter Maydell wrote:
> On Mon, 20 Jun 2022 at 19:07, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>
>> +void HELPER(sme_fmopa_s)(void *vza, void *vzn, void *vzm, void *vpn,
>> + void *vpm, void *vst, uint32_t desc)
>> +{
>> + intptr_t row, col, oprsz = simd_maxsz(desc);
>> + uint32_t neg = simd_data(desc) << 31;
>> + uint16_t *pn = vpn, *pm = vpm;
>> +
>> + bool save_dn = get_default_nan_mode(vst);
>> + set_default_nan_mode(true, vst);
>> +
>> + for (row = 0; row < oprsz; ) {
>> + uint16_t pa = pn[H2(row >> 4)];
>> + do {
>> + if (pa & 1) {
>> + void *vza_row = vza + row * sizeof(ARMVectorReg);
>> + uint32_t n = *(uint32_t *)(vzn + row) ^ neg;
>> +
>> + for (col = 0; col < oprsz; ) {
>> + uint16_t pb = pm[H2(col >> 4)];
>> + do {
>> + if (pb & 1) {
>> + uint32_t *a = vza_row + col;
>> + uint32_t *m = vzm + col;
>> + *a = float32_muladd(n, *m, *a, 0, vst);
>> + }
>> + col += 4;
>> + pb >>= 4;
>> + } while (col & 15);
>> + }
>> + }
>> + row += 4;
>> + pa >>= 4;
>> + } while (row & 15);
>> + }
>
> The code for the double version seems straightforward:
> row counts from 0 up to the number of rows, and we
> do something per row. Why is the single precision version
> doing something with an unrolled loop here? It's confusing
> that 'oprsz' in the two functions isn't the same thing --
> in the double version we divide by the element size, but
> here we don't.
It's all about the predicate addressing. For doubles, the bits are spaced 8 bits apart,
which makes it easy as you see. For singles, the bits are spaced 4 bits apart, which is
inconvenient. Anyway, just as over in sve_helper.c, I load uint16_t at a time and shift
to find each predicate bit.
So it's not unrolled, exactly. There's second loop over predicates. And since this is a
matrix op, we get loops nested 4 deep.
> The pseudocode says that we ignore floating point exceptions
> (ie do not accumulate them in the FPSR) -- it passes fpexc == false
> to FPMulAdd(). Don't we need to do something special to arrange
> for that ?
Oops, somewhere I read that as "do not trap" not "do not accumulate".
But R_TGSKG is very clear on this as accumulate.
r~
next prev parent reply other threads:[~2022-06-24 14:39 UTC|newest]
Thread overview: 81+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-06-20 17:51 [PATCH v3 00/51] target/arm: Scalable Matrix Extension Richard Henderson
2022-06-20 17:51 ` [PATCH v3 01/51] target/arm: Implement TPIDR2_EL0 Richard Henderson
2022-06-20 17:51 ` [PATCH v3 02/51] target/arm: Add SMEEXC_EL to TB flags Richard Henderson
2022-06-20 17:51 ` [PATCH v3 03/51] target/arm: Add syn_smetrap Richard Henderson
2022-06-20 17:51 ` [PATCH v3 04/51] target/arm: Add ARM_CP_SME Richard Henderson
2022-06-20 17:51 ` [PATCH v3 05/51] target/arm: Add SVCR Richard Henderson
2022-06-20 17:51 ` [PATCH v3 06/51] target/arm: Add SMCR_ELx Richard Henderson
2022-06-20 17:51 ` [PATCH v3 07/51] target/arm: Add SMIDR_EL1, SMPRI_EL1, SMPRIMAP_EL2 Richard Henderson
2022-06-20 17:51 ` [PATCH v3 08/51] target/arm: Add PSTATE.{SM,ZA} to TB flags Richard Henderson
2022-06-20 17:51 ` [PATCH v3 09/51] target/arm: Add the SME ZA storage to CPUARMState Richard Henderson
2022-06-21 20:24 ` Peter Maydell
2022-06-20 17:51 ` [PATCH v3 10/51] target/arm: Implement SMSTART, SMSTOP Richard Henderson
2022-06-20 17:51 ` [PATCH v3 11/51] target/arm: Move error for sve%d property to arm_cpu_sve_finalize Richard Henderson
2022-06-20 17:51 ` [PATCH v3 12/51] target/arm: Create ARMVQMap Richard Henderson
2022-06-20 17:51 ` [PATCH v3 13/51] target/arm: Generalize cpu_arm_{get,set}_vq Richard Henderson
2022-06-20 17:51 ` [PATCH v3 14/51] target/arm: Generalize cpu_arm_{get, set}_default_vec_len Richard Henderson
2022-06-20 17:51 ` [PATCH v3 15/51] target/arm: Move arm_cpu_*_finalize to internals.h Richard Henderson
2022-06-20 17:52 ` [PATCH v3 16/51] target/arm: Unexport aarch64_add_*_properties Richard Henderson
2022-06-20 17:52 ` [PATCH v3 17/51] target/arm: Add cpu properties for SME Richard Henderson
2022-06-21 17:13 ` Peter Maydell
2024-04-12 11:36 ` Peter Maydell
2024-04-12 16:17 ` Richard Henderson
2022-06-20 17:52 ` [PATCH v3 18/51] target/arm: Introduce sve_vqm1_for_el_sm Richard Henderson
2022-06-20 17:52 ` [PATCH v3 19/51] target/arm: Add SVL to TB flags Richard Henderson
2022-06-20 17:52 ` [PATCH v3 20/51] target/arm: Move pred_{full, gvec}_reg_{offset, size} to translate-a64.h Richard Henderson
2022-06-20 17:52 ` [PATCH v3 21/51] target/arm: Add infrastructure for disas_sme Richard Henderson
2022-06-20 17:52 ` [PATCH v3 22/51] target/arm: Trap AdvSIMD usage when Streaming SVE is active Richard Henderson
2022-06-24 15:30 ` Peter Maydell
2022-06-24 20:34 ` Richard Henderson
2022-06-24 21:38 ` Peter Maydell
2022-06-26 3:37 ` Richard Henderson
2022-06-20 17:52 ` [PATCH v3 23/51] target/arm: Implement SME RDSVL, ADDSVL, ADDSPL Richard Henderson
2022-06-21 17:23 ` Peter Maydell
2022-06-22 0:58 ` Richard Henderson
2022-06-23 10:12 ` Peter Maydell
2022-06-20 17:52 ` [PATCH v3 24/51] target/arm: Implement SME ZERO Richard Henderson
2022-06-21 20:07 ` Peter Maydell
2022-06-20 17:52 ` [PATCH v3 25/51] target/arm: Implement SME MOVA Richard Henderson
2022-06-23 11:24 ` Peter Maydell
2022-06-23 14:44 ` Richard Henderson
2022-06-20 17:52 ` [PATCH v3 26/51] target/arm: Implement SME LD1, ST1 Richard Henderson
2022-06-23 11:41 ` Peter Maydell
2022-06-23 20:36 ` Richard Henderson
2022-06-24 10:05 ` Peter Maydell
2022-06-20 17:52 ` [PATCH v3 27/51] target/arm: Export unpredicated ld/st from translate-sve.c Richard Henderson
2022-06-23 11:42 ` Peter Maydell
2022-06-20 17:52 ` [PATCH v3 28/51] target/arm: Implement SME LDR, STR Richard Henderson
2022-06-23 11:46 ` Peter Maydell
2022-06-20 17:52 ` [PATCH v3 29/51] target/arm: Implement SME ADDHA, ADDVA Richard Henderson
2022-06-23 12:04 ` Peter Maydell
2022-06-20 17:52 ` [PATCH v3 30/51] target/arm: Implement FMOPA, FMOPS (non-widening) Richard Henderson
2022-06-24 12:31 ` Peter Maydell
2022-06-24 14:16 ` Richard Henderson [this message]
2022-06-20 17:52 ` [PATCH v3 31/51] target/arm: Implement BFMOPA, BFMOPS Richard Henderson
2022-06-20 17:52 ` [PATCH v3 32/51] target/arm: Implement FMOPA, FMOPS (widening) Richard Henderson
2022-06-20 17:52 ` [PATCH v3 33/51] target/arm: Implement SME integer outer product Richard Henderson
2022-06-24 12:39 ` Peter Maydell
2022-06-20 17:52 ` [PATCH v3 34/51] target/arm: Implement PSEL Richard Henderson
2022-06-24 12:51 ` Peter Maydell
2022-06-20 17:52 ` [PATCH v3 35/51] target/arm: Implement REVD Richard Henderson
2022-06-24 12:54 ` Peter Maydell
2022-06-20 17:52 ` [PATCH v3 36/51] target/arm: Implement SCLAMP, UCLAMP Richard Henderson
2022-06-24 13:00 ` Peter Maydell
2022-06-20 17:52 ` [PATCH v3 37/51] target/arm: Reset streaming sve state on exception boundaries Richard Henderson
2022-06-24 13:02 ` Peter Maydell
2022-06-20 17:52 ` [PATCH v3 38/51] target/arm: Enable SME for -cpu max Richard Henderson
2022-06-24 13:03 ` Peter Maydell
2022-06-20 17:52 ` [PATCH v3 39/51] linux-user/aarch64: Clear tpidr2_el0 if CLONE_SETTLS Richard Henderson
2022-06-20 17:52 ` [PATCH v3 40/51] linux-user/aarch64: Reset PSTATE.SM on syscalls Richard Henderson
2022-06-20 17:52 ` [PATCH v3 41/51] linux-user/aarch64: Add SM bit to SVE signal context Richard Henderson
2022-06-20 17:52 ` [PATCH v3 42/51] linux-user/aarch64: Tidy target_restore_sigframe error return Richard Henderson
2022-06-20 17:52 ` [PATCH v3 43/51] linux-user/aarch64: Do not allow duplicate or short sve records Richard Henderson
2022-06-20 17:52 ` [PATCH v3 44/51] linux-user/aarch64: Verify extra record lock succeeded Richard Henderson
2022-06-20 17:52 ` [PATCH v3 45/51] linux-user/aarch64: Move sve record checks into restore Richard Henderson
2022-06-20 17:52 ` [PATCH v3 46/51] linux-user/aarch64: Implement SME signal handling Richard Henderson
2022-06-20 17:52 ` [PATCH v3 47/51] linux-user: Rename sve prctls Richard Henderson
2022-06-20 17:52 ` [PATCH v3 48/51] linux-user/aarch64: Implement PR_SME_GET_VL, PR_SME_SET_VL Richard Henderson
2022-06-20 17:52 ` [PATCH v3 49/51] target/arm: Only set ZEN in reset if SVE present Richard Henderson
2022-06-20 17:52 ` [PATCH v3 50/51] target/arm: Enable SME for user-only Richard Henderson
2022-06-20 17:52 ` [PATCH v3 51/51] linux-user/aarch64: Add SME related hwcap entries Richard Henderson
2022-06-24 15:02 ` [PATCH v3 00/51] target/arm: Scalable Matrix Extension Peter Maydell
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=c2eae981-55e4-0430-ee56-ac853cfc930d@linaro.org \
--to=richard.henderson@linaro.org \
--cc=peter.maydell@linaro.org \
--cc=qemu-arm@nongnu.org \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).