Re: [RFC PATCH v1 03/43] accel/tcg: Add gvec size changing operations

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: Richard Henderson <richard.henderson@linaro.org>
To: Anton Johansson <anjo@rev.ng>
Cc: qemu-devel@nongnu.org, ale@rev.ng, ltaylorsimpson@gmail.com,
	bcain@quicinc.com, philmd@linaro.org, alex.bennee@linaro.org
Subject: Re: [RFC PATCH v1 03/43] accel/tcg: Add gvec size changing operations
Date: Tue, 3 Dec 2024 12:57:57 -0600	[thread overview]
Message-ID: <e4910c71-8220-404b-bb43-0b885914e183@linaro.org> (raw)
In-Reply-To: <v5pkpmxto7vtshg7a5mifaozrzn6n5d7raknvydad3oxk67jeu@i4jydb4wylpb>

On 12/3/24 12:08, Anton Johansson wrote:
> On 22/11/24, Richard Henderson wrote:
>> On 11/20/24 19:49, Anton Johansson wrote:
>>> Adds new functions to the gvec API for truncating, sign- or zero
>>> extending vector elements.  Currently implemented as helper functions,
>>> these may be mapped onto host vector instructions in the future.
>>>
>>> For the time being, allows translation of more complicated vector
>>> instructions by helper-to-tcg.
>>>
>>> Signed-off-by: Anton Johansson <anjo@rev.ng>
>>> ---
>>>    accel/tcg/tcg-runtime-gvec.c     | 41 +++++++++++++++++
>>>    accel/tcg/tcg-runtime.h          | 22 +++++++++
>>>    include/tcg/tcg-op-gvec-common.h | 18 ++++++++
>>>    tcg/tcg-op-gvec.c                | 78 ++++++++++++++++++++++++++++++++
>>>    4 files changed, 159 insertions(+)
>>>
>>> diff --git a/accel/tcg/tcg-runtime-gvec.c b/accel/tcg/tcg-runtime-gvec.c
>>> index afca89baa1..685c991e6a 100644
>>> --- a/accel/tcg/tcg-runtime-gvec.c
>>> +++ b/accel/tcg/tcg-runtime-gvec.c
>>> @@ -1569,3 +1569,44 @@ void HELPER(gvec_bitsel)(void *d, void *a, void *b, void *c, uint32_t desc)
>>>        }
>>>        clear_high(d, oprsz, desc);
>>>    }
>>> +
>>> +#define DO_SZ_OP1(NAME, DSTTY, SRCTY)                                      \
>>> +void HELPER(NAME)(void *d, void *a, uint32_t desc)                         \
>>> +{                                                                          \
>>> +    intptr_t oprsz = simd_oprsz(desc);                                     \
>>> +    intptr_t elsz = oprsz/sizeof(DSTTY);                                   \
>>> +    intptr_t i;                                                            \
>>> +                                                                           \
>>> +    for (i = 0; i < elsz; ++i) {                                           \
>>> +        SRCTY aa = *((SRCTY *) a + i);                                     \
>>> +        *((DSTTY *) d + i) = aa;                                           \
>>> +    }                                                                      \
>>> +    clear_high(d, oprsz, desc);                                            \
>>
>> This formulation is not valid.
>>
>> (1) Generic forms must *always* operate strictly on columns.  This
>> formulation is either expanding a narrow vector to a wider vector or
>> compressing a wider vector to a narrow vector.
>>
>> (2) This takes no care for byte ordering of the data between columns.  This
>> is where sticking strictly to columns helps, in that we can assume that data
>> is host-endian *within the column*, but we cannot assume anything about the
>> element indexing of ptr + i.
> 
> Concerning (1) and (2), is this a limitation imposed on generic vector
> ops. to simplify mapping to host vector instructions where
> padding/alignment of elements might differ?  From my understanding, the
> helper above should be fine since we can assume contiguous elements?

This is a limitation imposed on generic vector ops, because different target/arch/ 
represent their vectors in different ways.

For instance, Arm and RISC-V chunk the vector in to host-endian uint64_t, with the chunks 
indexed little-endian.  But PPC puts the entire 128-bit vector in host-endian bit 
ordering, so the uint64_t chunks are host-endian.

On a big-endian host, ptr+1 may be addressing element i-1 or i-7 instead of i+1.

> I see, I don't think we can make this work for Hexagon vector ops., as
> an example consider V6_vadduwsat which performs an unsigned saturated
> add of 32-bit elements, currently we emit
> 
>      void emit_V6_vadduwsat(intptr_t vec2, intptr_t vec7, intptr_t vec6) {
>          VectorMem mem = {0};
>          intptr_t vec5 = temp_new_gvec(&mem, 256);
>          tcg_gen_gvec_zext(MO_64, MO_32, vec5, vec7, 256, 128, 256);
> 
>          intptr_t vec1 = temp_new_gvec(&mem, 256);
>          tcg_gen_gvec_zext(MO_64, MO_32, vec1, vec6, 256, 128, 256);
> 
>          tcg_gen_gvec_add(MO_64, vec1, vec1, vec5, 256, 256);
> 
>          intptr_t vec3 = temp_new_gvec(&mem, 256);
>          tcg_gen_gvec_dup_imm(MO_64, vec3, 256, 256, 4294967295ull);
> 
>          tcg_gen_gvec_umin(MO_64, vec1, vec1, vec3, 256, 256);
> 
>          tcg_gen_gvec_trunc(MO_32, MO_64, vec2, vec1, 128, 256, 128);
>      }
> 
> so we really do rely on the size-changing property of zext here, the
> input vectors are 128-byte and we expand them to 256-byte.  We could
> expand vector operations within the instruction to the largest vector
> size, but would need to zext and trunc to destination and source
> registers anyway.
Yes, well, this is the output of llvm though, yes?

Did you forget to describe TCG's native saturating operations to the compiler? 
tcg_gen_gvec_usadd performs exactly this operation.

And if you'd like to improve llvm, usadd(a, b) equals umin(a, ~b) + b.
Fewer operations without having to change vector sizes.
Similarly for unsigned saturating subtract: ussub(a, b) equals umax(a, b) - b.


r~

next prev parent reply	other threads:[~2024-12-03 18:58 UTC|newest]

Thread overview: 81+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-11-21  1:49 [RFC PATCH v1 00/43] Introduce helper-to-tcg Anton Johansson via
2024-11-21  1:49 ` [RFC PATCH v1 01/43] Add option to enable/disable helper-to-tcg Anton Johansson via
2024-11-22 17:30   ` Richard Henderson
2024-11-22 18:23     ` Paolo Bonzini
2024-12-03 19:05       ` Anton Johansson via
2024-11-21  1:49 ` [RFC PATCH v1 02/43] accel/tcg: Add bitreverse and funnel-shift runtime helper functions Anton Johansson via
2024-11-22 17:35   ` Richard Henderson
2024-12-03 17:50     ` Anton Johansson via
2024-11-21  1:49 ` [RFC PATCH v1 03/43] accel/tcg: Add gvec size changing operations Anton Johansson via
2024-11-22 17:50   ` Richard Henderson
2024-12-03 18:08     ` Anton Johansson via
2024-12-03 18:57       ` Richard Henderson [this message]
2024-12-03 20:15         ` Anton Johansson via
2024-12-03 21:14           ` Richard Henderson
2024-11-21  1:49 ` [RFC PATCH v1 04/43] tcg: Add gvec functions for creating consant vectors Anton Johansson via
2024-11-22 18:00   ` Richard Henderson
2024-12-03 18:19     ` Anton Johansson via
2024-12-03 19:03       ` Richard Henderson
2024-11-21  1:49 ` [RFC PATCH v1 05/43] tcg: Add helper function dispatcher and hook tcg_gen_callN Anton Johansson via
2024-11-22 18:04   ` Richard Henderson
2024-12-03 18:45     ` Anton Johansson via
2024-11-21  1:49 ` [RFC PATCH v1 06/43] tcg: Introduce tcg-global-mappings Anton Johansson via
2024-11-22 19:14   ` Richard Henderson
2024-11-21  1:49 ` [RFC PATCH v1 07/43] tcg: Increase maximum TB size and maximum temporaries Anton Johansson via
2024-11-22 18:11   ` Richard Henderson
2024-11-21  1:49 ` [RFC PATCH v1 08/43] include/helper-to-tcg: Introduce annotate.h Anton Johansson via
2024-11-22 18:12   ` Richard Henderson
2024-11-25 11:27     ` Philippe Mathieu-Daudé
2024-12-03 19:00       ` Anton Johansson via
2024-11-21  1:49 ` [RFC PATCH v1 09/43] helper-to-tcg: Introduce get-llvm-ir.py Anton Johansson via
2024-11-22 18:14   ` Richard Henderson
2024-12-03 18:49     ` Anton Johansson via
2024-11-21  1:49 ` [RFC PATCH v1 10/43] helper-to-tcg: Add meson.build Anton Johansson via
2024-11-21  1:49 ` [RFC PATCH v1 11/43] helper-to-tcg: Introduce llvm-compat Anton Johansson via
2024-11-21  1:49 ` [RFC PATCH v1 12/43] helper-to-tcg: Introduce custom LLVM pipeline Anton Johansson via
2024-11-21  1:49 ` [RFC PATCH v1 13/43] helper-to-tcg: Introduce Error.h Anton Johansson via
2024-11-21  1:49 ` [RFC PATCH v1 14/43] helper-to-tcg: Introduce PrepareForOptPass Anton Johansson via
2024-11-21  1:49 ` [RFC PATCH v1 15/43] helper-to-tcg: PrepareForOptPass, map annotations Anton Johansson via
2024-11-21  1:49 ` [RFC PATCH v1 16/43] helper-to-tcg: PrepareForOptPass, Cull unused functions Anton Johansson via
2024-11-21  1:49 ` [RFC PATCH v1 17/43] helper-to-tcg: PrepareForOptPass, undef llvm.returnaddress Anton Johansson via
2024-11-21  1:49 ` [RFC PATCH v1 18/43] helper-to-tcg: PrepareForOptPass, Remove noinline attribute Anton Johansson via
2024-11-21  1:49 ` [RFC PATCH v1 19/43] helper-to-tcg: Pipeline, run optimization pass Anton Johansson via
2024-11-21  1:49 ` [RFC PATCH v1 20/43] helper-to-tcg: Introduce pseudo instructions Anton Johansson via
2024-11-21  1:49 ` [RFC PATCH v1 21/43] helper-to-tcg: Introduce PrepareForTcgPass Anton Johansson via
2024-11-21  1:49 ` [RFC PATCH v1 22/43] helper-to-tcg: PrepareForTcgPass, remove functions w. cycles Anton Johansson via
2024-11-21  1:49 ` [RFC PATCH v1 23/43] helper-to-tcg: PrepareForTcgPass, demote phi nodes Anton Johansson via
2024-11-21  1:49 ` [RFC PATCH v1 24/43] helper-to-tcg: PrepareForTcgPass, map TCG globals Anton Johansson via
2024-11-21  1:49 ` [RFC PATCH v1 25/43] helper-to-tcg: PrepareForTcgPass, transform GEPs Anton Johansson via
2024-11-21  1:49 ` [RFC PATCH v1 26/43] helper-to-tcg: PrepareForTcgPass, canonicalize IR Anton Johansson via
2024-11-21  1:49 ` [RFC PATCH v1 27/43] helper-to-tcg: PrepareForTcgPass, identity map trivial expressions Anton Johansson via
2024-11-21  1:49 ` [RFC PATCH v1 28/43] helper-to-tcg: Introduce TcgType.h Anton Johansson via
2024-11-22 18:26   ` Richard Henderson
2024-12-03 18:50     ` Anton Johansson via
2024-11-21  1:49 ` [RFC PATCH v1 29/43] helper-to-tcg: Introduce TCG register allocation Anton Johansson via
2024-11-21  1:49 ` [RFC PATCH v1 30/43] helper-to-tcg: TcgGenPass, introduce TcgEmit.[cpp|h] Anton Johansson via
2024-11-21  1:49 ` [RFC PATCH v1 31/43] helper-to-tcg: Introduce TcgGenPass Anton Johansson via
2024-11-21  1:49 ` [RFC PATCH v1 32/43] helper-to-tcg: Add README Anton Johansson via
2024-11-21  1:49 ` [RFC PATCH v1 33/43] helper-to-tcg: Add end-to-end tests Anton Johansson via
2024-11-21  1:49 ` [RFC PATCH v1 34/43] target/hexagon: Add get_tb_mmu_index() Anton Johansson via
2024-11-22 18:34   ` Richard Henderson
2024-12-03 18:50     ` Anton Johansson via
2024-11-21  1:49 ` [RFC PATCH v1 35/43] target/hexagon: Use argparse in all python scripts Anton Johansson via
2024-12-05 15:23   ` Brian Cain
2024-11-21  1:49 ` [RFC PATCH v1 36/43] target/hexagon: Add temporary vector storage Anton Johansson via
2024-11-22 18:35   ` Richard Henderson
2024-12-03 18:56     ` Anton Johansson via
2024-12-03 20:28       ` Brian Cain
2024-12-04  0:37         ` ltaylorsimpson
2024-11-21  1:49 ` [RFC PATCH v1 37/43] target/hexagon: Make HVX vector args. restrict * Anton Johansson via
2024-11-25 11:36   ` Philippe Mathieu-Daudé
2024-11-25 12:00     ` Paolo Bonzini
2024-12-03 18:57       ` Anton Johansson via
2024-12-03 18:58         ` Brian Cain
2024-11-21  1:49 ` [RFC PATCH v1 38/43] target/hexagon: Use cpu_mapping to map env -> TCG Anton Johansson via
2024-11-21  1:49 ` [RFC PATCH v1 39/43] target/hexagon: Keep gen_slotval/check_noshuf for helper-to-tcg Anton Johansson via
2024-11-21  1:49 ` [RFC PATCH v1 40/43] target/hexagon: Emit annotations for helpers Anton Johansson via
2024-11-21  1:49 ` [RFC PATCH v1 41/43] target/hexagon: Manually call generated HVX instructions Anton Johansson via
2024-11-21  1:49 ` [RFC PATCH v1 42/43] target/hexagon: Only translate w. idef-parser if helper-to-tcg failed Anton Johansson via
2024-11-21  1:49 ` [RFC PATCH v1 43/43] target/hexagon: Use helper-to-tcg Anton Johansson via
2024-11-25 11:34 ` [RFC PATCH v1 00/43] Introduce helper-to-tcg Philippe Mathieu-Daudé
2024-12-03 18:58   ` Anton Johansson via

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e4910c71-8220-404b-bb43-0b885914e183@linaro.org \
    --to=richard.henderson@linaro.org \
    --cc=ale@rev.ng \
    --cc=alex.bennee@linaro.org \
    --cc=anjo@rev.ng \
    --cc=bcain@quicinc.com \
    --cc=ltaylorsimpson@gmail.com \
    --cc=philmd@linaro.org \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).