From: Brian Norris <briannorris@chromium.org>
To: Yury Norov <yury.norov@gmail.com>
Cc: Nathan Chancellor <nathan@kernel.org>,
llvm@lists.linux.dev, linux-kernel@vger.kernel.org,
Bill Wendling <morbo@google.com>,
Justin Stitt <justinstitt@google.com>,
Nick Desaulniers <ndesaulniers@google.com>,
Rasmus Villemoes <linux@rasmusvillemoes.dk>,
Kees Cook <kees@kernel.org>
Subject: Re: [PATCH v2] bitmap: Switch from inline to __always_inline
Date: Fri, 12 Jul 2024 13:42:22 -0700 [thread overview]
Message-ID: <ZpGVLl7NQZScd8aH@google.com> (raw)
In-Reply-To: <ZpFhv5VSYZ6jnsd4@yury-ThinkPad>
Hi Yury,
On Fri, Jul 12, 2024 at 10:02:55AM -0700, Yury Norov wrote:
> On Thu, Jul 11, 2024 at 09:37:11AM -0700, Brian Norris wrote:
> > Based on recent discussions, it seems like a good idea to inline the
> > bitmap functions which make up cpumask*() implementations, as well as a
> > few other bitmap users, to ensure the inlining doesn't break in the
> > future and produce the same problems, as well as to give the best chance
> > that the intended small_const_nbits() optimizations carry through.
>
> small_const_nbits() optimization always carries through. As far as I
> understand this things, the problem is that inliner may make a (wrong)
> no-go decision before unwinding the small_const_nbits() part.
>
> small_const_nbits() always leads to eliminating either inline or outline
> code, but inliner seemingly doesn't understand it. This leads to having
> those tiny functions uninlined.
I think we're more or less saying the same thing -- if for whatever
reason the compiler doesn't inline, then these will never be "small
consts", and therefore the machinery effectively "doesn't carry
through".
Per previous discussions, there are many reasons a compiler may ignore
the 'inline' hint.
> But I'm not sure about that and don't know how to check what happens
> under the compilers' hood. Can compiler gurus please clarify?
>
> > This change has been previously proposed in the past as:
> >
> > Subject: [PATCH 1/3] bitmap: switch from inline to __always_inline
> > https://lore.kernel.org/all/20221027043810.350460-2-yury.norov@gmail.com/
> >
> > In the end, this looks like a rebase of Yury's work, although
> > technically it's a rewrite.
>
> I like it more, when it's split onto several patches. We already have
> my bitmap.h rework and your cpumask.h one with their own reasonable
> motivations. Can you keep that patch structure please? Then, we can add
I'm not quite sure what you mean. I imitated the patch structure of your
patch 1 that I linked, which had the following file-diff list:
include/linux/bitmap.h | 46 ++++++-------
include/linux/cpumask.h | 144 +++++++++++++++++++--------------------
include/linux/find.h | 40 +++++------
include/linux/nodemask.h | 86 +++++++++++------------
Are you suggesting I should split that into 2 (one for cpumask and one
for the rest)?
> separate patches for find.h and nodemask.h. If rebasing the old bitmap.h
> patch isn't clean and takes some effort, feel free to add your
> co-developed-by.
I don't really care who gets authorship, but I did manually rewrite it,
since there were enough conflicts, and it's basically scriptable. Feel
free to suggest which Author/S-o-b/Co-developed combo makes sense for
that, and I can adopt it in v3 :)
> > According to bloat-o-meter, my vmlinux decreased in size by a total of
> > 1538 bytes (-0.01%) with this change.
> >
> > [0] CONFIG_HOTPLUG_PARALLEL=y ('select'ed for x86 as of [1]) and
> > CONFIG_GCOV_PROFILE_ALL.
> >
> > [1] commit 0c7ffa32dbd6 ("x86/smpboot/64: Implement
> > arch_cpuhp_init_parallel_bringup() and enable it")
> >
> > Cc: Yury Norov <yury.norov@gmail.com>
> > Signed-off-by: Brian Norris <briannorris@chromium.org>
>
> So... This whole optimization is only worth the reviewer's time if we
> can prove it's sustainable across different compilers and configurations.
>
> Just saying that under some particular config it works, brings little
> value for those who are not interested in that config. Moreover, if
> one enables GCOV, he likely doesn't care much about the .text size.
> And for those using release configs, it only brings uncertainty.
I think I partially disagree. If it's neutral for one compiler but
helpful for the other, that's still a win. But otherwise, I agree we
should have some accounting of diversity -- not just a single test.
> Let's test it broader? We've got 2 main compilers - gcc and llvm, and
> at least two configurations that may be relevant: defconfig and your
> HOTPLUG_PARALLEL + GCOV_PROFILE_ALL. And it would be nice to prove
> that the optimization is sustainable across compiler versions.
>
> I have the following table in mind:
>
> defconfig your conf
> GCC 9 | -1900 | |
> CLANG 13 | | |
> GCC 13 | | |
> CLANG 18 | -100 | -1538 |
>
> The defconfig nubmers above are from my past experiments. And you can
> add more configs/compilers, of course.
Perhaps I wasn't too clear, but I was actually testing a few things:
* my ChromeOS-y build, with gcov and clang -- to ensure the section
mismatch warning/error disappears
* a pure mainline / semi-defaults build (which happened to use my host
distro's gcc 13), to do size comparisons. I also happened to enable
gcov in the currently-posted experiments.
But agreed, I can get a larger matrix with a bit more specifity in my
commit log. TBD on the ease of getting a Clang 13 or GCC 9 toolchain,
but I think I can convince my distro to provide them.
> > --- a/include/linux/bitmap.h
> > +++ b/include/linux/bitmap.h
> > @@ -203,7 +203,7 @@ unsigned long bitmap_find_next_zero_area_off(unsigned long *map,
> > * the bit offset of all zero areas this function finds is multiples of that
> > * power of 2. A @align_mask of 0 means no alignment is required.
> > */
> > -static inline unsigned long
> > +static __always_inline unsigned long
> > bitmap_find_next_zero_area(unsigned long *map,
> > unsigned long size,
> > unsigned long start,
>
> Let's split them such that return type would be at the same line as
> the function name. It would help to distinguish function declaration
> from invocations in grep output:
Ack. I thought I was being consistent with existing cpumask.h style, but
I got this set wrong. Will fix.
Brian
next prev parent reply other threads:[~2024-07-12 20:42 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-07-11 16:37 [PATCH v2] bitmap: Switch from inline to __always_inline Brian Norris
2024-07-12 17:02 ` Yury Norov
2024-07-12 18:01 ` Kees Cook
2024-07-12 23:54 ` Yury Norov
2024-07-12 20:42 ` Brian Norris [this message]
2024-07-13 0:19 ` Yury Norov
2024-07-13 23:27 ` Nathan Chancellor
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZpGVLl7NQZScd8aH@google.com \
--to=briannorris@chromium.org \
--cc=justinstitt@google.com \
--cc=kees@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux@rasmusvillemoes.dk \
--cc=llvm@lists.linux.dev \
--cc=morbo@google.com \
--cc=nathan@kernel.org \
--cc=ndesaulniers@google.com \
--cc=yury.norov@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.