From: Richard Henderson <richard.henderson@linaro.org>
To: "BALATON Zoltan" <balaton@eik.bme.hu>,
"Alex Bennée" <alex.bennee@linaro.org>
Cc: qemu-trivial@nongnu.org, qemu-devel@nongnu.org
Subject: Re: [PATCH] bitops.h: Compile out asserts without --enable-debug
Date: Mon, 22 May 2023 16:34:33 -0700 [thread overview]
Message-ID: <ec9cfe5a-d5f2-466d-34dc-c35817e7e010@linaro.org> (raw)
In-Reply-To: <49c4781a-1e91-85ef-d9cb-6996e9bbb10c@eik.bme.hu>
[-- Attachment #1: Type: text/plain, Size: 6588 bytes --]
On 5/22/23 15:26, BALATON Zoltan wrote:
> On Mon, 22 May 2023, Alex Bennée wrote:
>> (ajb: add Richard for his compiler-fu)
>> BALATON Zoltan <balaton@eik.bme.hu> writes:
>>> On Mon, 22 May 2023, Alex Bennée wrote:
>>>> BALATON Zoltan <balaton@eik.bme.hu> writes:
>>>>
>>>>> The low level extract and deposit funtions provided by bitops.h are
>>>>> used in performance critical places. It crept into target/ppc via
>>>>> FIELD_EX64 and also used by softfloat so PPC code using a lot of FPU
>>>>> where hardfloat is also disabled is doubly affected.
>>>>
>>>> Most of these asserts compile out to nothing if the compiler is able to
>>>> verify the constants are in the range. For example examining
>>>> the start of float64_add:
>>>>
>> <snip>
>>>>
>>>> I don't see any check and abort steps because all the shift and mask
>>>> values are known at compile time. The softfloat compilation certainly
>>>> does have some assert points though:
>>>>
>>>> readelf -s ./libqemu-ppc64-softmmu.fa.p/fpu_softfloat.c.o |grep assert
>>>> 136: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND g_assertion_mess[...]
>>>> 138: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND __assert_fail
>>>>
>>>> but the references are for the ISRA segments so its tricky to know if
>>>> they get used or are just there for LTO purposes.
>>>>
>>>> If there are hot-paths that show up the extract/deposit functions I
>>>> suspect a better approach would be to implement _nocheck variants (or
>>>> maybe _noassert?) and use them where required rather than turning off
>>>> the assert checking for these utility functions.
>>>
>>> Just to clarify again, the asserts are still there when compiled with
>>> --enable-debug. The patch only turns them off for optimised release
>>> builds which I think makes sense if these asserts are to catch
>>> programming errors.
>>
>> Well as Peter said the general policy is to keep asserts in but I
>> appreciate this is a hotpath case.
>>
>>> I think I've also suggested adding noassert
>>> versions of these but that wasn't a popular idea and it may also not
>>> be easy to convert all places to use that like for example the
>>> register fields related usage in target/ppc as that would also affect
>>> other places.
>>
>> Is code generation or device emulation really on the hot-path. Generally
>> a well predicted assert is in the noise for those operations.
>
> They aren't in code generation but in helpers as you can also see in the profile below and
> so they can be on hot path. Also I've noticed that extract8 and extract16 just call
> extract32 after adding another assert on their own in addition to the one in extract32
> which is double overhead for really no reason. I'd delete all these asserts as the
> likelhood of bugs these could catch is very low anyway (how often do you expect somebody
> to call these with out of bound values that would not be obvious from the results
> otherwise?) but leaving them in non-debug builds is totally useless in my opinion.
>
>>> So this seems to be the simplest and most effective
>>> approach.
>>>
>>> The softfloat related usage in these tests I've done seem to mostly
>>> come from unpacking and repacking floats in softfloat which is done
>>> for every operation, e.g. muladd which mp3 encoding mostly uses does 3
>>> unpacks and 1 pack for each call and each unpack is 3 extracts so even
>>> small overheads add app quickly. Just 1 muladd will result in 9
>>> extracts and 2 deposits at least plus updating PPC flags for each FPU
>>> op adds a bunch more. I did some profiling with perf to find these.
>>
>> After some messing about trying to get lame to cross compile to a static
>> binary I was able to replicate what you've seen:
>>
>> 11.44% qemu-ppc64 qemu-ppc64 [.] unpack_raw64.isra.0
>> 11.03% qemu-ppc64 qemu-ppc64 [.] parts64_uncanon_normal
>> 8.26% qemu-ppc64 qemu-ppc64 [.] helper_compute_fprf_float64
>> 6.75% qemu-ppc64 qemu-ppc64 [.] do_float_check_status
>> 5.34% qemu-ppc64 qemu-ppc64 [.] parts64_muladd
>> 4.75% qemu-ppc64 qemu-ppc64 [.] pack_raw64.isra.0
>> 4.38% qemu-ppc64 qemu-ppc64 [.] parts64_canonicalize
>> 3.62% qemu-ppc64 qemu-ppc64 [.] float64r32_round_pack_canonical
>> 3.32% qemu-ppc64 qemu-ppc64 [.] helper_todouble
>> 2.68% qemu-ppc64 qemu-ppc64 [.] float64_add
>> 2.51% qemu-ppc64 qemu-ppc64 [.] float64_hs_compare
>> 2.30% qemu-ppc64 qemu-ppc64 [.] float64r32_muladd
>> 1.80% qemu-ppc64 qemu-ppc64 [.] float64r32_mul
>> 1.40% qemu-ppc64 qemu-ppc64 [.] float64r32_add
>> 1.34% qemu-ppc64 qemu-ppc64 [.] parts64_mul
>> 1.16% qemu-ppc64 qemu-ppc64 [.] parts64_addsub
>> 1.14% qemu-ppc64 qemu-ppc64 [.] helper_reset_fpstatus
>> 1.06% qemu-ppc64 qemu-ppc64 [.] helper_float_check_status
>> 1.04% qemu-ppc64 qemu-ppc64 [.] float64_muladd
>
> I've run 32 bit PPC version in qemu-system-ppc so the profile is a bit different (has more
> system related overhead that I plan to look at separately) but this part is similar to the
> above. I also wonder what makes helper_compute_fprf_float64 a bottleneck as that does not
> seem to have much extract/deposit, only a call to clz but it's hard to tell what it really
> does due to nested calls and macros. I've also seen this function among the top contenders
> in my profiling.
>
>> what I find confusing is the fact the parts extraction and packing
>> should all be known constants which should cause the asserts to
>> disappear. However it looks like the compiler decided to bring in a copy
>> of the whole inline function (ISRA = >interprocedural scalar replacement
>> of aggregates) which obviously can't fold the constants and eliminate
>> the assert.
>
> Could it be related to that while the parts size and start are marked const but pulled out
> of a struct field so the compiler may not know their actual value until run time?
>
>> Richard,
>>
>> Any idea of why the compiler might decide to do something like this?
Try this.
r~
[-- Attachment #2: z.patch --]
[-- Type: text/x-patch, Size: 2697 bytes --]
diff --git a/fpu/softfloat.c b/fpu/softfloat.c
index 108f9cb224..42e6c188b4 100644
--- a/fpu/softfloat.c
+++ b/fpu/softfloat.c
@@ -593,27 +593,27 @@ static void unpack_raw64(FloatParts64 *r, const FloatFmt *fmt, uint64_t raw)
};
}
-static inline void float16_unpack_raw(FloatParts64 *p, float16 f)
+static void QEMU_FLATTEN float16_unpack_raw(FloatParts64 *p, float16 f)
{
unpack_raw64(p, &float16_params, f);
}
-static inline void bfloat16_unpack_raw(FloatParts64 *p, bfloat16 f)
+static void QEMU_FLATTEN bfloat16_unpack_raw(FloatParts64 *p, bfloat16 f)
{
unpack_raw64(p, &bfloat16_params, f);
}
-static inline void float32_unpack_raw(FloatParts64 *p, float32 f)
+static void QEMU_FLATTEN float32_unpack_raw(FloatParts64 *p, float32 f)
{
unpack_raw64(p, &float32_params, f);
}
-static inline void float64_unpack_raw(FloatParts64 *p, float64 f)
+static void QEMU_FLATTEN float64_unpack_raw(FloatParts64 *p, float64 f)
{
unpack_raw64(p, &float64_params, f);
}
-static void floatx80_unpack_raw(FloatParts128 *p, floatx80 f)
+static void QEMU_FLATTEN floatx80_unpack_raw(FloatParts128 *p, floatx80 f)
{
*p = (FloatParts128) {
.cls = float_class_unclassified,
@@ -623,7 +623,7 @@ static void floatx80_unpack_raw(FloatParts128 *p, floatx80 f)
};
}
-static void float128_unpack_raw(FloatParts128 *p, float128 f)
+static void QEMU_FLATTEN float128_unpack_raw(FloatParts128 *p, float128 f)
{
const int f_size = float128_params.frac_size - 64;
const int e_size = float128_params.exp_size;
@@ -650,27 +650,27 @@ static uint64_t pack_raw64(const FloatParts64 *p, const FloatFmt *fmt)
return ret;
}
-static inline float16 float16_pack_raw(const FloatParts64 *p)
+static float16 QEMU_FLATTEN float16_pack_raw(const FloatParts64 *p)
{
return make_float16(pack_raw64(p, &float16_params));
}
-static inline bfloat16 bfloat16_pack_raw(const FloatParts64 *p)
+static bfloat16 QEMU_FLATTEN bfloat16_pack_raw(const FloatParts64 *p)
{
return pack_raw64(p, &bfloat16_params);
}
-static inline float32 float32_pack_raw(const FloatParts64 *p)
+static float32 QEMU_FLATTEN float32_pack_raw(const FloatParts64 *p)
{
return make_float32(pack_raw64(p, &float32_params));
}
-static inline float64 float64_pack_raw(const FloatParts64 *p)
+static float64 QEMU_FLATTEN float64_pack_raw(const FloatParts64 *p)
{
return make_float64(pack_raw64(p, &float64_params));
}
-static float128 float128_pack_raw(const FloatParts128 *p)
+static float128 QEMU_FLATTEN float128_pack_raw(const FloatParts128 *p)
{
const int f_size = float128_params.frac_size - 64;
const int e_size = float128_params.exp_size;
next prev parent reply other threads:[~2023-05-22 23:35 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-05-20 20:54 [PATCH] bitops.h: Compile out asserts without --enable-debug BALATON Zoltan
2023-05-22 9:16 ` Peter Maydell
2023-05-22 12:00 ` BALATON Zoltan
2023-05-22 12:05 ` Peter Maydell
2023-05-22 11:26 ` Alex Bennée
2023-05-22 13:25 ` BALATON Zoltan
2023-05-22 16:48 ` Alex Bennée
2023-05-22 22:26 ` BALATON Zoltan
2023-05-22 23:34 ` Richard Henderson [this message]
2023-05-23 6:43 ` Alex Bennée
2023-06-05 22:06 ` BALATON Zoltan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ec9cfe5a-d5f2-466d-34dc-c35817e7e010@linaro.org \
--to=richard.henderson@linaro.org \
--cc=alex.bennee@linaro.org \
--cc=balaton@eik.bme.hu \
--cc=qemu-devel@nongnu.org \
--cc=qemu-trivial@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).