Re: [PATCH 4/4] target/i386: implement FMA instructions

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: Paolo Bonzini <pbonzini@redhat.com>
To: Richard Henderson <richard.henderson@linaro.org>, qemu-devel@nongnu.org
Subject: Re: [PATCH 4/4] target/i386: implement FMA instructions
Date: Thu, 20 Oct 2022 15:23:22 +0200	[thread overview]
Message-ID: <c729c3ec-5538-d9d4-6eec-3c7ee3d43f31@redhat.com> (raw)
In-Reply-To: <ce91008a-d9d3-853f-fed1-e064e5d49d9a@linaro.org>

On 10/20/22 05:02, Richard Henderson wrote:
> On 10/20/22 01:06, Paolo Bonzini wrote:
>> The only issue with FMA instructions is that there are _a lot_ of them
>> (30 opcodes, each of which comes in up to 4 versions depending on VEX.W
>> and VEX.L).
>>
>> We can reduce the number of helpers to one third by passing four operands
>> (one output and three inputs); the reordering of which operands go to
>> the multiply and which go to the add is done in emit.c.
>>
>> Scalar versions do not do any merging; they only affect the bottom 32
>> or 64 bits of the output operand.  Therefore, there is no separate XMM
>> and YMM of the scalar helpers.
>>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> ---
>>   target/i386/cpu.c                |  5 ++-
>>   target/i386/ops_sse.h            | 63 ++++++++++++++++++++++++++++++++
>>   target/i386/ops_sse_header.h     | 28 ++++++++++++++
>>   target/i386/tcg/decode-new.c.inc | 38 +++++++++++++++++++
>>   target/i386/tcg/decode-new.h     |  1 +
>>   target/i386/tcg/emit.c.inc       | 43 ++++++++++++++++++++++
>>   tests/tcg/i386/test-avx.py       |  2 +-
>>   7 files changed, 177 insertions(+), 3 deletions(-)
>>
>> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
>> index 6292b7e12f..22b681ca37 100644
>> --- a/target/i386/cpu.c
>> +++ b/target/i386/cpu.c
>> @@ -625,10 +625,11 @@ void x86_cpu_vendor_words2str(char *dst, 
>> uint32_t vendor1,
>>             CPUID_EXT_SSE41 | CPUID_EXT_SSE42 | CPUID_EXT_POPCNT | \
>>             CPUID_EXT_XSAVE | /* CPUID_EXT_OSXSAVE is dynamic */   \
>>             CPUID_EXT_MOVBE | CPUID_EXT_AES | CPUID_EXT_HYPERVISOR | \
>> -          CPUID_EXT_RDRAND | CPUID_EXT_AVX | CPUID_EXT_F16C)
>> +          CPUID_EXT_RDRAND | CPUID_EXT_AVX | CPUID_EXT_F16C | \
>> +          CPUID_EXT_FMA)
>>             /* missing:
>>             CPUID_EXT_DTES64, CPUID_EXT_DSCPL, CPUID_EXT_VMX, 
>> CPUID_EXT_SMX,
>> -          CPUID_EXT_EST, CPUID_EXT_TM2, CPUID_EXT_CID, CPUID_EXT_FMA,
>> +          CPUID_EXT_EST, CPUID_EXT_TM2, CPUID_EXT_CID,
>>             CPUID_EXT_XTPR, CPUID_EXT_PDCM, CPUID_EXT_PCID, 
>> CPUID_EXT_DCA,
>>             CPUID_EXT_X2APIC, CPUID_EXT_TSC_DEADLINE_TIMER */
>> diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h
>> index 33c61896ee..041a048a70 100644
>> --- a/target/i386/ops_sse.h
>> +++ b/target/i386/ops_sse.h
>> @@ -2522,6 +2522,69 @@ void helper_vpermd_ymm(Reg *d, Reg *v, Reg *s)
>>   }
>>   #endif
>> +/* FMA3 op helpers */
>> +#if SHIFT == 1
>> +#define SSE_HELPER_FMAS(name, elem, 
>> F)                                         \
>> +    void name(CPUX86State *env, Reg *d, Reg *a, Reg *b, Reg 
>> *c)                \
>> +    
>> {                                                                          \
>> +        d->elem(0) = F(a->elem(0), b->elem(0), 
>> c->elem(0));                    \
>> +    }
>> +#define SSE_HELPER_FMAP(name, elem, num, 
>> F)                                    \
>> +    void glue(name, SUFFIX)(CPUX86State *env, Reg *d, Reg *a, Reg *b, 
>> Reg *c)  \
>> +    
>> {                                                                          \
>> +        int 
>> i;                                                                 \
>> +        for (i = 0; i < num; i++) 
>> {                                            \
>> +            d->elem(i) = F(a->elem(i), b->elem(i), 
>> c->elem(i));                \
>> +        
>> }                                                                      \
>> +    }
>> +
>> +#define FMADD32(a, b, c) float32_muladd(a, b, c, 0, &env->sse_status)
>> +#define FMADD64(a, b, c) float64_muladd(a, b, c, 0, &env->sse_status)
>> +
>> +#define FMNADD32(a, b, c) float32_muladd(a, b, c, float_muladd_negate_product, &env->sse_status)
>> +#define FMNADD64(a, b, c) float64_muladd(a, b, c, float_muladd_negate_product, &env->sse_status)
>> [...]
>
> Would it be worth passing the muladd constant(s) as a parameter to a 
> reduced number of helper functions?

Sure, will do that.

> void fmas_name(..., int flags)
> {
>      d = type_muladd(a, b, c, flags, status);
> }
> 
> void fmap_name(..., int flags2)
> {
>      int f_even = flags2 & 0xf;
>      int f_odd = flags2 >> 4;
>      for (int i = 0; i < num; ) {
>         d(i) = type_muladd(a(i), b(i), c(i), f_even, status);
>         i++;
>         d(i) = type_muladd(a(i), b(i), c(i), f_odd, status);
>         i++;
>      }

Another possibility is to add two arguments for even and odd.

>> +FMA_SSE(VFNMSUB231,   fmnsub,   OP_PTR1, OP_PTR2, OP_PTR0)
>> +FMA_SSE(VFNMSUB213,   fmnsub,   OP_PTR1, OP_PTR0, OP_PTR2)
>> +FMA_SSE(VFNMSUB132,   fmnsub,   OP_PTR0, OP_PTR2, OP_PTR1)
>> +
>> +FMA_SSE_PACKED(VFMSUBADD231, fmsubadd, OP_PTR1, OP_PTR2, OP_PTR0)
>> +FMA_SSE_PACKED(VFMSUBADD213, fmsubadd, OP_PTR1, OP_PTR0, OP_PTR2)
>> +FMA_SSE_PACKED(VFMSUBADD132, fmsubadd, OP_PTR0, OP_PTR2, OP_PTR1)
> 
> Is it more or less confusing to macroize this further?

I think more. :)

Paolo

     prev parent reply	other threads:[~2022-10-20 16:28 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-19 15:06 [PATCH 0/4] target/i386: support x86_64-v3 for user mode applications Paolo Bonzini
2022-10-19 15:06 ` [PATCH 1/4] target/i386: decode-new: avoid out-of-bounds access to xmm_regs[-1] Paolo Bonzini
2022-10-19 19:47   ` Philippe Mathieu-Daudé
2022-10-20  2:21   ` Richard Henderson
2022-10-19 15:06 ` [PATCH 2/4] target/i386: introduce function to set rounding mode from FPCW or MXCSR bits Paolo Bonzini
2022-10-19 19:41   ` Philippe Mathieu-Daudé
2022-10-20  2:22   ` Richard Henderson
2022-10-19 15:06 ` [PATCH 3/4] target/i386: implement F16C instructions Paolo Bonzini
2022-10-20  2:31   ` Richard Henderson
2022-10-19 15:06 ` [PATCH 4/4] target/i386: implement FMA instructions Paolo Bonzini
2022-10-20  3:02   ` Richard Henderson
2022-10-20 13:23     ` Paolo Bonzini [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c729c3ec-5538-d9d4-6eec-3c7ee3d43f31@redhat.com \
    --to=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=richard.henderson@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).