Re: [RFC PATCH] softfloat: use QEMU_FLATTEN to avoid mistaken isra inlining

From: Richard Henderson <richard.henderson@linaro.org>
To: "BALATON Zoltan" <balaton@eik.bme.hu>,
	"Alex Bennée" <alex.bennee@linaro.org>
Cc: qemu-devel@nongnu.org, Aurelien Jarno <aurelien@aurel32.net>,
	Peter Maydell <peter.maydell@linaro.org>
Subject: Re: [RFC PATCH] softfloat: use QEMU_FLATTEN to avoid mistaken isra inlining
Date: Fri, 23 Jun 2023 07:50:39 +0200
Message-ID: <644f6d2e-0c6c-e97e-6930-706d36af24f6@linaro.org>
In-Reply-To: <5082a19d-0fc2-a140-eeb7-8c608b33e410@eik.bme.hu>

On 6/22/23 22:55, BALATON Zoltan wrote:
> Hello,
> 
> What happened to this patch? Will this be merged by somebody?

Thanks for the reminder.  Queued to tcg-next.

r~

> 
> Regards,
> BALATON Zoltan
> 
> On Tue, 23 May 2023, BALATON Zoltan wrote:
>> On Tue, 23 May 2023, Alex Bennée wrote:
>>> Balton discovered that asserts for the extract/deposit calls had a
>>
>> There is an "a" missing from my name, and my given name is Zoltan. (In
>> Hungarian the family name comes first.) Maybe just add a Reported-by tag
>> instead of mentioning it here, if you want to record it.
>>
>>> significant impact on a lame benchmark on qemu-ppc. Replicating with:
>>>
>>>  ./qemu-ppc64 ~/lsrc/tests/lame.git-svn/builds/ppc64/frontend/lame \
>>>    -h pts-trondheim-3.wav pts-trondheim-3.mp3
>>>
>>> showed that the pack/unpack routines were not eliding the assert checks
>>> as they should have, causing them to figure prominently in the profile:
>>>
>>>  11.44%  qemu-ppc64  qemu-ppc64               [.] unpack_raw64.isra.0
>>>  11.03%  qemu-ppc64  qemu-ppc64               [.] parts64_uncanon_normal
>>>   8.26%  qemu-ppc64  qemu-ppc64               [.] helper_compute_fprf_float64
>>>   6.75%  qemu-ppc64  qemu-ppc64               [.] do_float_check_status
>>>   5.34%  qemu-ppc64  qemu-ppc64               [.] parts64_muladd
>>>   4.75%  qemu-ppc64  qemu-ppc64               [.] pack_raw64.isra.0
>>>   4.38%  qemu-ppc64  qemu-ppc64               [.] parts64_canonicalize
>>>   3.62%  qemu-ppc64  qemu-ppc64               [.] float64r32_round_pack_canonical
>>>
>>> After this patch the same test runs 31 seconds faster with a profile
>>> where the generated code dominates more:
>>>
>>> +   14.12%     0.00%  qemu-ppc64  [unknown]                [.] 0x0000004000619420
>>> +   13.30%     0.00%  qemu-ppc64  [unknown]                [.] 0x0000004000616850
>>> +   12.58%    12.19%  qemu-ppc64  qemu-ppc64               [.] parts64_uncanon_normal
>>> +   10.62%     0.00%  qemu-ppc64  [unknown]                [.] 0x000000400061bf70
>>> +    9.91%     9.73%  qemu-ppc64  qemu-ppc64               [.] helper_compute_fprf_float64
>>> +    7.84%     7.82%  qemu-ppc64  qemu-ppc64               [.] do_float_check_status
>>> +    6.47%     5.78%  qemu-ppc64  qemu-ppc64               [.] parts64_canonicalize.constprop.0
>>> +    6.46%     0.00%  qemu-ppc64  [unknown]                [.] 0x0000004000620130
>>> +    6.42%     0.00%  qemu-ppc64  [unknown]                [.] 0x0000004000619400
>>> +    6.17%     6.04%  qemu-ppc64  qemu-ppc64               [.] parts64_muladd
>>> +    5.85%     0.00%  qemu-ppc64  [unknown]                [.] 0x00000040006167e0
>>> +    5.74%     0.00%  qemu-ppc64  [unknown]                [.] 0x0000b693fcffffd3
>>> +    5.45%     4.78%  qemu-ppc64  qemu-ppc64               [.] float64r32_round_pack_canonical
>>>
>>> Suggested-by: Richard Henderson <richard.henderson@linaro.org>
>>> Message-Id: <ec9cfe5a-d5f2-466d-34dc-c35817e7e010@linaro.org>
>>> [AJB: Patchified rth's suggestion]
>>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>>> Cc: BALATON Zoltan <balaton@eik.bme.hu>
>>
>> Replace Cc: with
>> Tested-by: BALATON Zoltan <balaton@eik.bme.hu>
>>
>> This solves the softfloat-related usages. The rest probably have lower
>> overhead: I could not measure any further improvement from removing asserts
>> on top of this patch. I still have these functions high in my profiling
>> result:
>>
>> children  self    command          symbol
>> 11.40%    10.86%  qemu-system-ppc  helper_compute_fprf_float64
>> 11.25%     0.61%  qemu-system-ppc  helper_fmadds
>> 10.01%     3.23%  qemu-system-ppc  float64r32_round_pack_canonical
>>  8.59%     1.80%  qemu-system-ppc  helper_float_check_status
>>  8.34%     7.23%  qemu-system-ppc  parts64_muladd
>>  8.16%     0.67%  qemu-system-ppc  helper_fmuls
>>  8.08%     0.43%  qemu-system-ppc  parts64_uncanon
>>  7.49%     1.78%  qemu-system-ppc  float64r32_mul
>>  7.32%     7.32%  qemu-system-ppc  parts64_uncanon_normal
>>  6.48%     0.52%  qemu-system-ppc  helper_fadds
>>  6.31%     6.31%  qemu-system-ppc  do_float_check_status
>>  5.99%     1.14%  qemu-system-ppc  float64r32_add
>>
>> Any idea on those?
>>
>> Unrelated to this patch, I have also started to see random crashes with a
>> DSI on a dcbz instruction, which did not happen before (or not frequently
>> enough for me to notice). I did not bisect it because it happens randomly,
>> but I wonder if it could be related to the recent unaligned access changes
>> or some other TCG change? Any idea what to check?
>>
>> Regards,
>> BALATON Zoltan
>>
>>> ---
>>> fpu/softfloat.c | 22 +++++++++++-----------
>>> 1 file changed, 11 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/fpu/softfloat.c b/fpu/softfloat.c
>>> index 108f9cb224..42e6c188b4 100644
>>> --- a/fpu/softfloat.c
>>> +++ b/fpu/softfloat.c
>>> @@ -593,27 +593,27 @@ static void unpack_raw64(FloatParts64 *r, const FloatFmt *fmt, uint64_t raw)
>>>     };
>>> }
>>>
>>> -static inline void float16_unpack_raw(FloatParts64 *p, float16 f)
>>> +static void QEMU_FLATTEN float16_unpack_raw(FloatParts64 *p, float16 f)
>>> {
>>>     unpack_raw64(p, &float16_params, f);
>>> }
>>>
>>> -static inline void bfloat16_unpack_raw(FloatParts64 *p, bfloat16 f)
>>> +static void QEMU_FLATTEN bfloat16_unpack_raw(FloatParts64 *p, bfloat16 f)
>>> {
>>>     unpack_raw64(p, &bfloat16_params, f);
>>> }
>>>
>>> -static inline void float32_unpack_raw(FloatParts64 *p, float32 f)
>>> +static void QEMU_FLATTEN float32_unpack_raw(FloatParts64 *p, float32 f)
>>> {
>>>     unpack_raw64(p, &float32_params, f);
>>> }
>>>
>>> -static inline void float64_unpack_raw(FloatParts64 *p, float64 f)
>>> +static void QEMU_FLATTEN float64_unpack_raw(FloatParts64 *p, float64 f)
>>> {
>>>     unpack_raw64(p, &float64_params, f);
>>> }
>>>
>>> -static void floatx80_unpack_raw(FloatParts128 *p, floatx80 f)
>>> +static void QEMU_FLATTEN floatx80_unpack_raw(FloatParts128 *p, floatx80 f)
>>> {
>>>     *p = (FloatParts128) {
>>>         .cls = float_class_unclassified,
>>> @@ -623,7 +623,7 @@ static void floatx80_unpack_raw(FloatParts128 *p, floatx80 f)
>>>     };
>>> }
>>>
>>> -static void float128_unpack_raw(FloatParts128 *p, float128 f)
>>> +static void QEMU_FLATTEN float128_unpack_raw(FloatParts128 *p, float128 f)
>>> {
>>>     const int f_size = float128_params.frac_size - 64;
>>>     const int e_size = float128_params.exp_size;
>>> @@ -650,27 +650,27 @@ static uint64_t pack_raw64(const FloatParts64 *p, const FloatFmt *fmt)
>>>     return ret;
>>> }
>>>
>>> -static inline float16 float16_pack_raw(const FloatParts64 *p)
>>> +static float16 QEMU_FLATTEN float16_pack_raw(const FloatParts64 *p)
>>> {
>>>     return make_float16(pack_raw64(p, &float16_params));
>>> }
>>>
>>> -static inline bfloat16 bfloat16_pack_raw(const FloatParts64 *p)
>>> +static bfloat16 QEMU_FLATTEN bfloat16_pack_raw(const FloatParts64 *p)
>>> {
>>>     return pack_raw64(p, &bfloat16_params);
>>> }
>>>
>>> -static inline float32 float32_pack_raw(const FloatParts64 *p)
>>> +static float32 QEMU_FLATTEN float32_pack_raw(const FloatParts64 *p)
>>> {
>>>     return make_float32(pack_raw64(p, &float32_params));
>>> }
>>>
>>> -static inline float64 float64_pack_raw(const FloatParts64 *p)
>>> +static float64 QEMU_FLATTEN float64_pack_raw(const FloatParts64 *p)
>>> {
>>>     return make_float64(pack_raw64(p, &float64_params));
>>> }
>>>
>>> -static float128 float128_pack_raw(const FloatParts128 *p)
>>> +static float128 QEMU_FLATTEN float128_pack_raw(const FloatParts128 *p)
>>> {
>>>     const int f_size = float128_params.frac_size - 64;
>>>     const int e_size = float128_params.exp_size;
>>
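
For readers following the diff above, here is a minimal sketch of the
pattern it applies: a thin per-format wrapper is marked flatten so that the
generic helper, called with a constant format descriptor, is inlined into
it and the range asserts in the bitfield helper can be folded away. It
assumes QEMU_FLATTEN expands to the GCC/Clang flatten attribute; every
name prefixed with example_ or EXAMPLE_ is a hypothetical stand-in, not
QEMU code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: this mirrors what QEMU_FLATTEN expands to on GCC/Clang. */
#define EXAMPLE_FLATTEN __attribute__((flatten))

/* Hypothetical stand-ins for FloatFmt and FloatParts64. */
typedef struct { int exp_size; int frac_size; } ExampleFmt;
typedef struct { uint64_t frac; int32_t exp; int sign; } ExampleParts;

static const ExampleFmt example_f32 = { .exp_size = 8,  .frac_size = 23 };
static const ExampleFmt example_f64 = { .exp_size = 11, .frac_size = 52 };

/* Bitfield extract with a range assert, in the spirit of extract64(). */
static inline uint64_t example_extract(uint64_t v, int start, int len)
{
    assert(start >= 0 && len > 0 && len <= 64 - start);
    return (v >> start) & (~0ULL >> (64 - len));
}

/*
 * Generic unpack: start and len are only known once fmt is known, so the
 * asserts above cannot be removed unless this is inlined into a caller
 * that passes a constant fmt.
 */
static void example_unpack_raw(ExampleParts *r, const ExampleFmt *fmt,
                               uint64_t raw)
{
    const int f = fmt->frac_size;
    const int e = fmt->exp_size;

    r->sign = (int)example_extract(raw, f + e, 1);
    r->exp  = (int32_t)example_extract(raw, f, e);
    r->frac = example_extract(raw, 0, f);
}

/* Flattened wrappers: callees are inlined here where fmt is constant. */
static void EXAMPLE_FLATTEN example_f32_unpack(ExampleParts *p, uint32_t raw)
{
    example_unpack_raw(p, &example_f32, raw);
}

static void EXAMPLE_FLATTEN example_f64_unpack(ExampleParts *p, uint64_t raw)
{
    example_unpack_raw(p, &example_f64, raw);
}
```

Whether the asserts really disappear depends on the compiler and the
optimization level, so comparing the disassembly or a profile before and
after, as done in the commit message, is the way to confirm it.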

Thread overview: 14+ messages
2023-05-23 13:11 [RFC PATCH] softfloat: use QEMU_FLATTEN to avoid mistaken isra inlining Alex Bennée
2023-05-23 13:57 ` BALATON Zoltan
2023-05-23 14:33   ` Richard Henderson
2023-05-23 17:51     ` BALATON Zoltan
2023-05-25 13:22     ` Paolo Bonzini
2023-05-25 13:30       ` Richard Henderson
2023-05-25 23:15       ` BALATON Zoltan
2023-05-25 13:30     ` Paolo Bonzini
2023-05-25 23:19       ` BALATON Zoltan
2023-05-26 11:56   ` BALATON Zoltan
2023-06-22 20:55   ` BALATON Zoltan
2023-06-23  5:50     ` Richard Henderson [this message]
2023-05-23 14:18 ` Philippe Mathieu-Daudé
2023-05-23 15:34 ` Richard Henderson
