From: Geert Uytterhoeven <geert@linux-m68k.org>
To: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>, Yury Norov <yury.norov@gmail.com>,
Andy Shevchenko <andriy.shevchenko@linux.intel.com>,
Richard Henderson <rth@twiddle.net>,
Matt Turner <mattst88@gmail.com>, Brian Cain <bcain@quicinc.com>,
Yoshinori Sato <ysato@users.sourceforge.jp>,
Rich Felker <dalias@libc.org>,
"David S. Miller" <davem@davemloft.net>,
Kees Cook <keescook@chromium.org>,
"Peter Zijlstra (Intel)" <peterz@infradead.org>,
Marco Elver <elver@google.com>, Borislav Petkov <bp@suse.de>,
Tony Luck <tony.luck@intel.com>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
alpha <linux-alpha@vger.kernel.org>,
"open list:QUALCOMM HEXAGON..." <linux-hexagon@vger.kernel.org>,
"linux-ia64@vger.kernel.org" <linux-ia64@vger.kernel.org>,
linux-m68k <linux-m68k@lists.linux-m68k.org>,
Linux-sh list <linux-sh@vger.kernel.org>,
sparclinux <sparclinux@vger.kernel.org>,
Linux-Arch <linux-arch@vger.kernel.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 0/6] bitops: let optimize out non-atomic bitops on compile-time constants
Date: Wed, 8 Jun 2022 11:55:08 +0200
Message-ID: <CAMuHMdUeM_ntgZzmeHVMJ_8neyOSRUa_xDNE46eM7cHt_sDj1g@mail.gmail.com>
In-Reply-To: <20220607154759.43549-1-alexandr.lobakin@intel.com>
Hi Olek,
On Tue, Jun 7, 2022 at 5:51 PM Alexander Lobakin
<alexandr.lobakin@intel.com> wrote:
> From: Geert Uytterhoeven <geert@linux-m68k.org>
> > On Mon, Jun 6, 2022 at 1:50 PM Alexander Lobakin
> > <alexandr.lobakin@intel.com> wrote:
> > > While I was working on converting some structure fields from a fixed
> > > type to a bitmap, I started observing a code size increase not only in
> > > places where the code works with the converted structure fields, but
> > > also where the converted vars were on the stack. That is, the
> > > following code:
> > >
> > > DECLARE_BITMAP(foo, BITS_PER_LONG) = { }; // -> unsigned long foo[1];
> > > unsigned long bar = BIT(BAR_BIT);
> > > unsigned long baz = 0;
> > >
> > > __set_bit(FOO_BIT, foo);
> > > baz |= BIT(BAZ_BIT);
> > >
> > > BUILD_BUG_ON(!__builtin_constant_p(test_bit(FOO_BIT, foo)));
> > > BUILD_BUG_ON(!__builtin_constant_p(bar & BAR_BIT));
> > > BUILD_BUG_ON(!__builtin_constant_p(baz & BAZ_BIT));
> > >
> > > triggers the first assertion on x86_64, which means that the
> > > compiler is unable to evaluate it to a compile-time constant
> > > when the architecture-specific bitop is used, even though the
> > > result is obvious.
> > > I found that this is because many architecture-specific
> > > non-atomic bitop implementations use inline asm or other hacks which
> > > are faster or more robust when working with "real" variables (i.e.
> > > fields of structures etc.), but the compilers have no clue how
> > > to optimize them out when called on compile-time constants.
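> > >
> > > For illustration, such an arch helper typically has roughly the
> > > following shape (a simplified sketch in the style of the x86 helpers,
> > > not the exact kernel code):
> > >
> > > static __always_inline void arch___set_bit(unsigned long nr,
> > >                                            volatile unsigned long *addr)
> > > {
> > >         /* The asm modifies memory behind the compiler's back, so the
> > >          * compiler cannot fold the bitmap contents into a constant. */
> > >         asm volatile("btsq %1, %0"
> > >                      : "+m" (*addr)
> > >                      : "r" (nr)
> > >                      : "memory");
> > > }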
> > >
> > > So, in order to let the compiler optimize out such cases, expand the
> > > test_bit() and __*_bit() definitions with a compile-time condition
> > > check, so that they will pick the generic C non-atomic bitop
> > > implementations when all of the arguments passed are compile-time
> > > constants, which means that the result will be a compile-time
> > > constant as well, and the compiler will produce more efficient and
> > > simpler code in 100% of cases (no changes when there's at least one
> > > non-compile-time-constant argument).
> > > The condition itself:
> > >
> > > if (__builtin_constant_p(nr) &&      /* <- bit position is constant */
> > >     __builtin_constant_p(!!addr) &&  /* <- compiler knows whether bitmap addr is NULL or not */
> > >     addr &&                          /* <- bitmap addr is not NULL */
> > >     __builtin_constant_p(*addr))     /* <- compiler knows the value of the target bitmap */
> > >         /* then pick the generic C variant */
> > > else
> > >         /* old code path, arch-specific */
> > >
> > > I also tried __is_constexpr() as suggested by Andy, but it was
> > > always returning 0 ('not a constant') for the 2nd, 3rd and 4th
> > > conditions.
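> > >
> > > Put together, the resulting wrapper is roughly of the following
> > > shape (an illustrative sketch only, not the exact macro from the
> > > series; generic_test_bit() and arch_test_bit() stand in for the
> > > real helper names):
> > >
> > > #define test_bit(nr, addr)                              \
> > >         (__builtin_constant_p(nr) &&                    \
> > >          __builtin_constant_p(!!(addr)) &&              \
> > >          (addr) &&                                      \
> > >          __builtin_constant_p(*(addr)) ?                \
> > >                 generic_test_bit(nr, addr) :            \
> > >                 arch_test_bit(nr, addr))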
> > >
> > > The savings on x86_64 with LLVM are insane (.text):
> > >
> > > $ scripts/bloat-o-meter -c vmlinux.{base,test}
> > > add/remove: 72/75 grow/shrink: 182/518 up/down: 53925/-137810 (-83885)
> > >
> > > $ scripts/bloat-o-meter -c vmlinux.{base,mod}
> > > add/remove: 7/1 grow/shrink: 1/19 up/down: 1135/-4082 (-2947)
> > >
> > > $ scripts/bloat-o-meter -c vmlinux.{base,all}
> > > add/remove: 79/76 grow/shrink: 184/537 up/down: 55076/-141892 (-86816)
> >
> > Thank you!
> >
> > I gave it a try on m68k, and am a bit disappointed seeing an increase
> > in code size:
> >
> > add/remove: 49/13 grow/shrink: 279/138 up/down: 6434/-3342 (3092)
>
> Ufff, that sucks =\
> Could you please try to compile the following code snippet (with the
> series applied)?
>
> unsigned long map;
>
> bitmap_zero(&map, BITS_PER_LONG);
> __set_bit(1, &map);
> BUILD_BUG_ON(!__builtin_constant_p(map));
>
> If it fails during the vmlinux linkage, it will mean that on your
> architecture/setup the compiler is unable to optimize the generic
> implementations to compile-time constants and I'll need to debug
> this more (probably via some compiler explorer).
Builds and links fine.
> You could also check the vmlinux size after applying each patch
> to see which one does this if you feel like it :)
The (incremental) impact of the various patches is shown below:
bitops: unify non-atomic bitops prototypes across architectures
add/remove: 4/11 grow/shrink: 123/160 up/down: 1700/-2786 (-1086)
bitops: let optimize out non-atomic bitops on compile-time constants
add/remove: 50/7 grow/shrink: 280/101 up/down: 6798/-2620 (4178)
I.e. the total impact is -1086 + 4178 = +3092
Looking at the impact of the last change on a single file, with rather
small functions to make it easier to analyze, the results are:
bloat-o-meter net/core/sock.o{.orig,}
add/remove: 3/1 grow/shrink: 20/3 up/down: 286/-68 (218)
Function old new delta
sock_flag - 38 +38
sock_set_flag - 22 +22
sock_reset_flag - 22 +22
sock_recv_errqueue 264 284 +20
sock_alloc_send_pskb 406 424 +18
__sock_set_timestamps 104 122 +18
sock_setsockopt 2412 2428 +16
sock_pfree 52 66 +14
sock_wfree 236 248 +12
sk_wait_data 222 234 +12
sk_destruct 70 82 +12
sk_wake_async 40 50 +10
sk_set_memalloc 74 84 +10
__sock_queue_rcv_skb 254 264 +10
__sk_backlog_rcv 92 102 +10
sock_getsockopt 1734 1742 +8
sock_no_linger 36 42 +6
sk_clone_lock 478 484 +6
sk_clear_memalloc 98 104 +6
__sk_receive_skb 194 200 +6
sock_init_data 344 348 +4
__sock_cmsg_send 196 200 +4
sk_common_release 152 154 +2
sock_set_keepalive 62 60 -2
sock_enable_timestamp 80 72 -8
sock_valbool_flag 34 12 -22
bset_mem_set_bit 36 - -36
Total: Before=18862, After=19080, chg +1.16%
Several small inline functions are no longer inlined.
And e.g. __sk_backlog_rcv() increases as it now calls sock_flag()
out-of-line, and needs to save more on the stack:
__sk_backlog_rcv:
+ move.l %a3,-(%sp) |,
move.l %d2,-(%sp) |,
- move.l 8(%sp),%a0 | sk, sk
-| arch/m68k/include/asm/bitops.h:163: return (addr[nr >> 5] & (1UL << (nr & 31))) != 0;
- move.l 76(%a0),%d0 | MEM[(const long unsigned int *)sk_6(D) + 76B], _14
+ move.l 12(%sp),%a3 | sk, sk
| net/core/sock.c:330: BUG_ON(!sock_flag(sk, SOCK_MEMALLOC));
- btst #14,%d0 |, _14
- jne .L193 |
+ pea 14.w |
+ move.l %a3,-(%sp) | sk,
+ jsr sock_flag |
+ addq.l #8,%sp |,
+ tst.b %d0 | tmp50
+ jne .L192 |
pea 330.w |
pea .LC0 |
pea .LC3 |
jsr _printk |
trap #7
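
For reference, sock_flag() is essentially a one-line test_bit() wrapper
along these lines (simplified; assuming the usual include/net/sock.h
definition):

static inline bool sock_flag(const struct sock *sk, enum sock_flags flag)
{
	return test_bit(flag, &sk->sk_flags);
}

which the compiler previously reduced to the single btst seen above.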
Note that the above is with atari_defconfig, which has
CONFIG_CC_OPTIMIZE_FOR_SIZE=y.
Switching to CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE=y results in a kernel
that is ca. 25% larger, and the net impact of this series is:
add/remove: 24/27 grow/shrink: 227/233 up/down: 7494/-8080 (-586)
i.e. a reduction in size...
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds