From: Richard Henderson <richard.henderson@linaro.org>
To: Peter Maydell <peter.maydell@linaro.org>
Cc: qemu-devel@nongnu.org, qemu-arm@nongnu.org
Subject: Re: [PATCH v3 30/51] target/arm: Implement FMOPA, FMOPS (non-widening)
Date: Fri, 24 Jun 2022 07:16:57 -0700
Message-ID: <c2eae981-55e4-0430-ee56-ac853cfc930d@linaro.org>
In-Reply-To: <CAFEAcA-y99PmUdPbdrWSj=_vUy35tRRFOJgkG2Lyg1A_iK6qRQ@mail.gmail.com>

On 6/24/22 05:31, Peter Maydell wrote:
> On Mon, 20 Jun 2022 at 19:07, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> 
>> +void HELPER(sme_fmopa_s)(void *vza, void *vzn, void *vzm, void *vpn,
>> +                         void *vpm, void *vst, uint32_t desc)
>> +{
>> +    intptr_t row, col, oprsz = simd_maxsz(desc);
>> +    uint32_t neg = simd_data(desc) << 31;
>> +    uint16_t *pn = vpn, *pm = vpm;
>> +
>> +    bool save_dn = get_default_nan_mode(vst);
>> +    set_default_nan_mode(true, vst);
>> +
>> +    for (row = 0; row < oprsz; ) {
>> +        uint16_t pa = pn[H2(row >> 4)];
>> +        do {
>> +            if (pa & 1) {
>> +                void *vza_row = vza + row * sizeof(ARMVectorReg);
>> +                uint32_t n = *(uint32_t *)(vzn + row) ^ neg;
>> +
>> +                for (col = 0; col < oprsz; ) {
>> +                    uint16_t pb = pm[H2(col >> 4)];
>> +                    do {
>> +                        if (pb & 1) {
>> +                            uint32_t *a = vza_row + col;
>> +                            uint32_t *m = vzm + col;
>> +                            *a = float32_muladd(n, *m, *a, 0, vst);
>> +                        }
>> +                        col += 4;
>> +                        pb >>= 4;
>> +                    } while (col & 15);
>> +                }
>> +            }
>> +            row += 4;
>> +            pa >>= 4;
>> +        } while (row & 15);
>> +    }
> 
> The code for the double version seems straightforward:
> row counts from 0 up to the number of rows, and we
> do something per row. Why is the single precision version
> doing something with an unrolled loop here? It's confusing
> that 'oprsz' in the two functions isn't the same thing --
> in the double version we divide by the element size, but
> here we don't.

It's all about the predicate addressing.  For doubles, the predicate bits are
spaced 8 bits apart, which makes it easy as you see.  For singles, the bits are
spaced 4 bits apart, which is inconvenient.  Anyway, just as over in
sve_helper.c, I load a uint16_t at a time and shift to find each predicate bit.
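
To make the addressing concrete, here is a minimal standalone sketch of the two
predicate walks.  It is not the helper code: count_active_d and count_active_s
are hypothetical names, and the host-endian index fixups (the H1/H2 macros in
the real helpers) are left out, so it assumes the predicate is stored in plain
index order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * A predicate carries one bit per vector byte; an element is active when
 * the lowest bit of its group is set.  oprsz is the vector size in bytes.
 */

/* Doubles: one predicate byte per 8-byte element, so a flat loop works. */
static unsigned count_active_d(const uint8_t *pred, size_t oprsz)
{
    unsigned active = 0;

    for (size_t i = 0; i < oprsz / 8; i++) {
        active += pred[i] & 1;
    }
    return active;
}

/* Singles: four predicate bits per 4-byte element, so load 16 bits, shift. */
static unsigned count_active_s(const uint16_t *pred, size_t oprsz)
{
    unsigned active = 0;
    size_t i = 0;

    assert(oprsz % 16 == 0);
    while (i < oprsz) {
        uint16_t p = pred[i >> 4];      /* predicate bits for 16 vector bytes */
        do {
            active += p & 1;            /* low bit of the current 4-bit group */
            i += 4;
            p >>= 4;
        } while (i & 15);               /* stop after four elements */
    }
    return active;
}
```

The inner do/while is what looks like unrolling in the patch: it only consumes
the four elements covered by one uint16_t load.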

So it's not unrolled, exactly.  There's a second loop over predicates.  And
since this is a matrix op, we get loops nested 4 deep.

> The pseudocode says that we ignore floating point exceptions
> (ie do not accumulate them in the FPSR) -- it passes fpexc == false
> to FPMulAdd(). Don't we need to do something special to arrange
> for that ?

Oops, somewhere I read that as "do not trap", not "do not accumulate".
But R_TGSKG is very clear that this is about accumulation.
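
One way to arrange that, sketched here under assumptions rather than as what
the patch does: run the arithmetic against a private copy of the status block,
so any flags raised land in the copy and are dropped on return.  Everything
below (toy_fp_status, toy_muladd, step_discarding_flags) is a hypothetical
stand-in, not the softfloat API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a softfloat status block. */
typedef struct {
    bool default_nan_mode;
    uint8_t exception_flags;    /* sticky: each operation ORs into this */
} toy_fp_status;

#define TOY_FLAG_INEXACT 0x01

/* Toy multiply-add that always reports "inexact" so the flag is visible. */
static float toy_muladd(float a, float b, float c, toy_fp_status *st)
{
    st->exception_flags |= TOY_FLAG_INEXACT;
    return a * b + c;
}

/* One accumulate step whose flags never reach the caller's status block. */
static float step_discarding_flags(float n, float m, float acc,
                                   const toy_fp_status *env_st)
{
    toy_fp_status local = *env_st;      /* private copy inherits the settings */

    local.default_nan_mode = true;      /* per-operation override, copy only */
    return toy_muladd(n, m, acc, &local);   /* flags land in local, then vanish */
}
```

Working on a copy also means the default-NaN override needs no save/restore of
the caller's setting.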


r~



