From: Richard Henderson <richard.henderson@linaro.org>
To: Peter Maydell <peter.maydell@linaro.org>
Cc: qemu-devel@nongnu.org, qemu-arm@nongnu.org
Subject: Re: [PATCH v4 20/45] target/arm: Implement SME LD1, ST1
Date: Tue, 5 Jul 2022 07:19:39 +0530 [thread overview]
Message-ID: <52d072c4-b6e7-c413-b15e-3aae358b4b00@linaro.org> (raw)
In-Reply-To: <CAFEAcA_qDAR9YmXdZ0wuVVkaBeV3ArGzknjLkU=BqJn7hBAbjw@mail.gmail.com>
On 7/4/22 16:09, Peter Maydell wrote:
>> +static void clear_vertical_h(void *vptr, size_t off, size_t len)
>> +{
>> + uint16_t *ptr = vptr;
>> + size_t i;
>> +
>> + for (i = 0; i < len / 2; ++i) {
>> + ptr[(i + off) * sizeof(ARMVectorReg)] = 0;
>> + }
>
> The code for the bigger-than-byte vertical actions seems wrong:
> because 'ptr' is a uint16_t here this expression is mixing
> byte offsets (off, the multiply by sizeof(ARMVectorReg) with
> halfword offsets (i, the fact we're calculating an index value
> for a uint16_t array).
I agree these are wrong, because they mix 'i' as index and 'off' as byte offset. I think
the correct addressing is always the same as byte addressing. I.e.
for (i = 0; i < len; i += N) {
uintN_t *ptr = vptr + (i + off) * sizeof(ARMVectorReg);
*ptr = 0;
}
so that every iteration advances N rows and writes N bytes.
>> +static void copy_vertical_h(void *vdst, const void *vsrc, size_t len)
>> +{
>> + const uint16_t *src = vsrc;
>> + uint16_t *dst = vdst;
>> + size_t i;
>> +
>> + for (i = 0; i < len / 2; ++i) {
>> + dst[i * sizeof(ARMVectorReg)] = src[i];
>
> Similarly the array index calculation for dst[] here looks wrong.
I don't think so in this case. I'm not mixing indexes and byte offsets like I was above.
Recall that the next vertical tile element is not in the next physical row, but in the Nth
physical row. Therefore there are always sizeof(ARMVectorReg) elements in the host layout
between vertical tile elements.
I agree it looks strange.
>> +#define DO_LD(NAME, TYPE, HOST, TLB) \
>> +static inline void sme_##NAME##_v_host(void *za, intptr_t off, void *host) \
>> +{ \
>> + TYPE val = HOST(host); \
>> + *(TYPE *)(za + off * sizeof(ARMVectorReg)) = val; \
>> +} \
>> +static inline void sme_##NAME##_v_tlb(CPUARMState *env, void *za, \
>> + intptr_t off, target_ulong addr, uintptr_t ra) \
>> +{ \
>> + TYPE val = TLB(env, useronly_clean_ptr(addr), ra); \
>> + *(TYPE *)(za + off * sizeof(ARMVectorReg)) = val; \
>> +}
>
> So in these functions is 'za' pre-adjusted to the start address of the
> vertical column?
Yes. That's true of all of these routines, and what I compute in get_tile_colrow.
> Is 'off' a byte offset here
Yes.
> (in which case the arithmetic is wrong for anything except byte columns)
I don't think so in this case. This is all byte arithmetic. Just as for copy_vertical_*,
there are N rows between elements.
Consider a vertical tile slice of uint64_t:
Element 0 is off=0 -> za + row 0.
Element 1 is off=8 -> za + row 8.
>> + tcg_gen_shli_i64(addr, cpu_reg(s, a->rm), a->esz);
>> + tcg_gen_add_i64(addr, addr, cpu_reg_sp(s, a->rn));
>
> Theoretically we ought to call gen_check_sp_alignment() here
> for rn == 31, but I guess we didn't do that for any of the
> non-base-A64 instructions like SVE either.
Oh yeah. Some day we should make gen_check_sp_alignment do something too.
r~
next prev parent reply other threads:[~2022-07-05 1:50 UTC|newest]
Thread overview: 93+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-06-28 4:20 [PATCH v4 00/45] target/arm: Scalable Matrix Extension Richard Henderson
2022-06-28 4:20 ` [PATCH v4 01/45] target/arm: Handle SME in aarch64_cpu_dump_state Richard Henderson
2022-07-01 10:11 ` Peter Maydell
2022-07-03 8:43 ` Richard Henderson
2022-06-28 4:20 ` [PATCH v4 02/45] target/arm: Add infrastructure for disas_sme Richard Henderson
2022-06-28 4:20 ` [PATCH v4 03/45] target/arm: Trap non-streaming usage when Streaming SVE is active Richard Henderson
2022-07-01 11:06 ` Peter Maydell
2022-07-04 8:28 ` Richard Henderson
2022-07-04 8:33 ` Peter Maydell
2022-06-28 4:20 ` [PATCH v4 04/45] target/arm: Mark ADR as non-streaming Richard Henderson
2022-07-01 11:11 ` Peter Maydell
2022-06-28 4:20 ` [PATCH v4 05/45] target/arm: Mark RDFFR, WRFFR, SETFFR " Richard Henderson
2022-07-01 11:15 ` Peter Maydell
2022-06-28 4:20 ` [PATCH v4 06/45] target/arm: Mark BDEP, BEXT, BGRP, COMPACT, FEXPA, FTSSEL " Richard Henderson
2022-07-01 12:14 ` Peter Maydell
2022-06-28 4:20 ` [PATCH v4 07/45] target/arm: Mark PMULL, FMMLA " Richard Henderson
2022-07-01 12:18 ` Peter Maydell
2022-07-04 8:48 ` Richard Henderson
2022-06-28 4:20 ` [PATCH v4 08/45] target/arm: Mark FTSMUL, FTMAD, FADDA " Richard Henderson
2022-07-01 12:21 ` Peter Maydell
2022-06-28 4:20 ` [PATCH v4 09/45] target/arm: Mark SMMLA, UMMLA, USMMLA " Richard Henderson
2022-07-01 12:22 ` Peter Maydell
2022-06-28 4:20 ` [PATCH v4 10/45] target/arm: Mark string/histo/crypto " Richard Henderson
2022-07-01 12:25 ` Peter Maydell
2022-06-28 4:20 ` [PATCH v4 11/45] target/arm: Mark gather/scatter load/store " Richard Henderson
2022-07-01 12:29 ` Peter Maydell
2022-06-28 4:20 ` [PATCH v4 12/45] target/arm: Mark gather prefetch " Richard Henderson
2022-07-01 12:31 ` Peter Maydell
2022-06-28 4:20 ` [PATCH v4 13/45] target/arm: Mark LDFF1 and LDNF1 " Richard Henderson
2022-07-01 12:33 ` Peter Maydell
2022-06-28 4:20 ` [PATCH v4 14/45] target/arm: Mark LD1RO " Richard Henderson
2022-07-01 13:00 ` Peter Maydell
2022-06-28 4:20 ` [PATCH v4 15/45] target/arm: Add SME enablement checks Richard Henderson
2022-07-01 13:05 ` Peter Maydell
2022-06-28 4:20 ` [PATCH v4 16/45] target/arm: Handle SME in sve_access_check Richard Henderson
2022-07-01 13:07 ` Peter Maydell
2022-06-28 4:20 ` [PATCH v4 17/45] target/arm: Implement SME RDSVL, ADDSVL, ADDSPL Richard Henderson
2022-06-28 4:20 ` [PATCH v4 18/45] target/arm: Implement SME ZERO Richard Henderson
2022-06-28 4:20 ` [PATCH v4 19/45] target/arm: Implement SME MOVA Richard Henderson
2022-07-01 16:19 ` Peter Maydell
2022-07-04 9:08 ` Richard Henderson
2022-07-04 9:31 ` Peter Maydell
2022-07-04 9:43 ` Richard Henderson
2022-06-28 4:20 ` [PATCH v4 20/45] target/arm: Implement SME LD1, ST1 Richard Henderson
2022-07-04 10:39 ` Peter Maydell
2022-07-05 1:49 ` Richard Henderson [this message]
2022-07-05 10:48 ` Peter Maydell
2022-07-05 11:21 ` Richard Henderson
2022-06-28 4:20 ` [PATCH v4 21/45] target/arm: Export unpredicated ld/st from translate-sve.c Richard Henderson
2022-06-28 4:20 ` [PATCH v4 22/45] target/arm: Implement SME LDR, STR Richard Henderson
2022-06-28 4:20 ` [PATCH v4 23/45] target/arm: Implement SME ADDHA, ADDVA Richard Henderson
2022-07-04 10:50 ` Peter Maydell
2022-07-05 2:05 ` Richard Henderson
2022-06-28 4:20 ` [PATCH v4 24/45] target/arm: Implement FMOPA, FMOPS (non-widening) Richard Henderson
2022-06-28 4:20 ` [PATCH v4 25/45] target/arm: Implement BFMOPA, BFMOPS Richard Henderson
2022-06-28 4:20 ` [PATCH v4 26/45] target/arm: Implement FMOPA, FMOPS (widening) Richard Henderson
2022-06-28 4:20 ` [PATCH v4 27/45] target/arm: Implement SME integer outer product Richard Henderson
2022-06-28 4:21 ` [PATCH v4 28/45] target/arm: Implement PSEL Richard Henderson
2022-06-28 4:21 ` [PATCH v4 29/45] target/arm: Implement REVD Richard Henderson
2022-06-28 4:21 ` [PATCH v4 30/45] target/arm: Implement SCLAMP, UCLAMP Richard Henderson
2022-06-28 4:21 ` [PATCH v4 31/45] target/arm: Reset streaming sve state on exception boundaries Richard Henderson
2022-06-28 4:21 ` [PATCH v4 32/45] target/arm: Enable SME for -cpu max Richard Henderson
2022-06-28 4:21 ` [PATCH v4 33/45] linux-user/aarch64: Clear tpidr2_el0 if CLONE_SETTLS Richard Henderson
2022-07-04 11:45 ` Peter Maydell
2022-06-28 4:21 ` [PATCH v4 34/45] linux-user/aarch64: Reset PSTATE.SM on syscalls Richard Henderson
2022-07-04 11:50 ` Peter Maydell
2022-06-28 4:21 ` [PATCH v4 35/45] linux-user/aarch64: Add SM bit to SVE signal context Richard Henderson
2022-07-04 12:02 ` Peter Maydell
2022-07-05 3:24 ` Richard Henderson
2022-06-28 4:21 ` [PATCH v4 36/45] linux-user/aarch64: Tidy target_restore_sigframe error return Richard Henderson
2022-07-04 12:04 ` Peter Maydell
2022-06-28 4:21 ` [PATCH v4 37/45] linux-user/aarch64: Do not allow duplicate or short sve records Richard Henderson
2022-07-04 12:08 ` Peter Maydell
2022-07-05 3:27 ` Richard Henderson
2022-07-05 3:30 ` Richard Henderson
2022-07-05 3:32 ` Richard Henderson
2022-06-28 4:21 ` [PATCH v4 38/45] linux-user/aarch64: Verify extra record lock succeeded Richard Henderson
2022-07-04 12:11 ` Peter Maydell
2022-06-28 4:21 ` [PATCH v4 39/45] linux-user/aarch64: Move sve record checks into restore Richard Henderson
2022-07-04 12:15 ` Peter Maydell
2022-06-28 4:21 ` [PATCH v4 40/45] linux-user/aarch64: Implement SME signal handling Richard Henderson
2022-07-04 13:05 ` Peter Maydell
2022-06-28 4:21 ` [PATCH v4 41/45] linux-user: Rename sve prctls Richard Henderson
2022-07-04 12:16 ` Peter Maydell
2022-06-28 4:21 ` [PATCH v4 42/45] linux-user/aarch64: Implement PR_SME_GET_VL, PR_SME_SET_VL Richard Henderson
2022-07-04 12:20 ` Peter Maydell
2022-06-28 4:21 ` [PATCH v4 43/45] target/arm: Only set ZEN in reset if SVE present Richard Henderson
2022-07-04 12:21 ` Peter Maydell
2022-06-28 4:21 ` [PATCH v4 44/45] target/arm: Enable SME for user-only Richard Henderson
2022-07-04 12:22 ` Peter Maydell
2022-06-28 4:21 ` [PATCH v4 45/45] linux-user/aarch64: Add SME related hwcap entries Richard Henderson
2022-07-04 12:24 ` Peter Maydell
2022-07-04 13:09 ` [PATCH v4 00/45] target/arm: Scalable Matrix Extension Peter Maydell
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=52d072c4-b6e7-c413-b15e-3aae358b4b00@linaro.org \
--to=richard.henderson@linaro.org \
--cc=peter.maydell@linaro.org \
--cc=qemu-arm@nongnu.org \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).