From: Brian Norris <briannorris@chromium.org>
To: Yury Norov <yury.norov@gmail.com>
Cc: Nathan Chancellor <nathan@kernel.org>,
llvm@lists.linux.dev, linux-kernel@vger.kernel.org,
Bill Wendling <morbo@google.com>,
Justin Stitt <justinstitt@google.com>,
Nick Desaulniers <ndesaulniers@google.com>,
Rasmus Villemoes <linux@rasmusvillemoes.dk>,
Kees Cook <kees@kernel.org>
Subject: Re: [PATCH v2] bitmap: Switch from inline to __always_inline
Date: Fri, 12 Jul 2024 13:42:22 -0700 [thread overview]
Message-ID: <ZpGVLl7NQZScd8aH@google.com> (raw)
In-Reply-To: <ZpFhv5VSYZ6jnsd4@yury-ThinkPad>
Hi Yury,
On Fri, Jul 12, 2024 at 10:02:55AM -0700, Yury Norov wrote:
> On Thu, Jul 11, 2024 at 09:37:11AM -0700, Brian Norris wrote:
> > Based on recent discussions, it seems like a good idea to inline the
> > bitmap functions which make up cpumask*() implementations, as well as a
> > few other bitmap users, to ensure the inlining doesn't break in the
> > future and produce the same problems, as well as to give the best chance
> > that the intended small_const_nbits() optimizations carry through.
>
> small_const_nbits() optimization always carries through. As far as I
> understand this things, the problem is that inliner may make a (wrong)
> no-go decision before unwinding the small_const_nbits() part.
>
> small_const_nbits() always leads to eliminating either inline or outline
> code, but inliner seemingly doesn't understand it. This leads to having
> those tiny functions uninlined.
I think we're more or less saying the same thing -- if for whatever
reason the compiler doesn't inline, then these will never be "small
consts", and therefore the machinery effectively "doesn't carry
through".
Per previous discussions, there are many reasons a compiler may ignore
the 'inline' hint.
> But I'm not sure about that and don't know how to check what happens
> under the compilers' hood. Can compiler gurus please clarify?
>
> > This change has been previously proposed in the past as:
> >
> > Subject: [PATCH 1/3] bitmap: switch from inline to __always_inline
> > https://lore.kernel.org/all/20221027043810.350460-2-yury.norov@gmail.com/
> >
> > In the end, this looks like a rebase of Yury's work, although
> > technically it's a rewrite.
>
> I like it more, when it's split onto several patches. We already have
> my bitmap.h rework and your cpumask.h one with their own reasonable
> motivations. Can you keep that patch structure please? Then, we can add
I'm not quite sure what you mean. I imitated the patch structure of your
patch 1 that I linked, which had the following file-diff list:
include/linux/bitmap.h | 46 ++++++-------
include/linux/cpumask.h | 144 +++++++++++++++++++--------------------
include/linux/find.h | 40 +++++------
include/linux/nodemask.h | 86 +++++++++++------------
Are you suggesting I should split that into 2 (one for cpumask and one
for the rest)?
> separate patches for find.h and nodemask.h. If rebasing the old bitmap.h
> patch isn't clean and takes some effort, feel free to add your
> co-developed-by.
I don't really care who gets authorship, but I did manually rewrite it,
since there were enough conflicts, and it's basically scriptable. Feel
free to suggest which Author/S-o-b/Co-developed combo makes sense for
that, and I can adopt it in v3 :)
> > According to bloat-o-meter, my vmlinux decreased in size by a total of
> > 1538 bytes (-0.01%) with this change.
> >
> > [0] CONFIG_HOTPLUG_PARALLEL=y ('select'ed for x86 as of [1]) and
> > CONFIG_GCOV_PROFILE_ALL.
> >
> > [1] commit 0c7ffa32dbd6 ("x86/smpboot/64: Implement
> > arch_cpuhp_init_parallel_bringup() and enable it")
> >
> > Cc: Yury Norov <yury.norov@gmail.com>
> > Signed-off-by: Brian Norris <briannorris@chromium.org>
>
> So... This whole optimization is only worth the reviewer's time if we
> can prove it's sustainable across different compilers and configurations.
>
> Just saying that under some particular config it works, brings little
> value for those who are not interested in that config. Moreover, if
> one enables GCOV, he likely doesn't care much about the .text size.
> And for those using release configs, it only brings uncertainty.
I think I partially disagree. If it's neutral for one compiler but
helpful for the other, that's still a win. But otherwise, I agree we
should have some accounting of diversity -- not just a single test.
> Let's test it broader? We've got 2 main compilers - gcc and llvm, and
> at least two configurations that may be relevant: defconfig and your
> HOTPLUG_PARALLEL + GCOV_PROFILE_ALL. And it would be nice to prove
> that the optimization is sustainable across compiler versions.
>
> I have the following table in mind:
>
> defconfig your conf
> GCC 9 | -1900 | |
> CLANG 13 | | |
> GCC 13 | | |
> CLANG 18 | -100 | -1538 |
>
> The defconfig nubmers above are from my past experiments. And you can
> add more configs/compilers, of course.
Perhaps I wasn't too clear, but I was actually testing a few things:
* my ChromeOS-y build, with gcov and clang -- to ensure the section
mismatch warning/error disappears
* a pure mainline / semi-defaults build (which happened to use my host
distro's gcc 13), to do size comparisons. I also happened to enable
gcov in the currently-posted experiments.
But agreed, I can get a larger matrix with a bit more specifity in my
commit log. TBD on the ease of getting a Clang 13 or GCC 9 toolchain,
but I think I can convince my distro to provide them.
> > --- a/include/linux/bitmap.h
> > +++ b/include/linux/bitmap.h
> > @@ -203,7 +203,7 @@ unsigned long bitmap_find_next_zero_area_off(unsigned long *map,
> > * the bit offset of all zero areas this function finds is multiples of that
> > * power of 2. A @align_mask of 0 means no alignment is required.
> > */
> > -static inline unsigned long
> > +static __always_inline unsigned long
> > bitmap_find_next_zero_area(unsigned long *map,
> > unsigned long size,
> > unsigned long start,
>
> Let's split them such that return type would be at the same line as
> the function name. It would help to distinguish function declaration
> from invocations in grep output:
Ack. I thought I was being consistent with existing cpumask.h style, but
I got this set wrong. Will fix.
Brian
next prev parent reply other threads:[~2024-07-12 20:42 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-07-11 16:37 [PATCH v2] bitmap: Switch from inline to __always_inline Brian Norris
2024-07-12 17:02 ` Yury Norov
2024-07-12 18:01 ` Kees Cook
2024-07-12 23:54 ` Yury Norov
2024-07-12 20:42 ` Brian Norris [this message]
2024-07-13 0:19 ` Yury Norov
2024-07-13 23:27 ` Nathan Chancellor
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZpGVLl7NQZScd8aH@google.com \
--to=briannorris@chromium.org \
--cc=justinstitt@google.com \
--cc=kees@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux@rasmusvillemoes.dk \
--cc=llvm@lists.linux.dev \
--cc=morbo@google.com \
--cc=nathan@kernel.org \
--cc=ndesaulniers@google.com \
--cc=yury.norov@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox