From: Michal Nazarewicz <mina86@mina86.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Steven Rostedt <rostedt@goodmis.org>,
Hagen Paul Pfeifer <hagen@jauu.net>,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH] include: kernel.h: rewrite min3, max3 and clamp using min and max
Date: Tue, 17 Jun 2014 15:49:05 +0200 [thread overview]
Message-ID: <xa1tionzwsry.fsf@mina86.com> (raw)
In-Reply-To: <20140616161804.bb06ed842f59371c031d1252@linux-foundation.org>
On Mon, Jun 16 2014, Andrew Morton <akpm@linux-foundation.org> wrote:
> On Mon, 16 Jun 2014 23:07:22 +0200 Michal Nazarewicz <mina86@mina86.com> wrote:
>
>> It appears that gcc is better at optimising a double call to min
>> and max rather than open coded min3 and max3. This can be observed
>> here:
>>
>> ...
>>
>> Furthermore, after ___make allmodconfig && make bzImage modules___ this is the
>> comparison of image and modules sizes:
>>
>> # Without this patch applied
>> $ ls -l arch/x86/boot/bzImage **/*.ko |awk '{size += $5} END {print size}'
>> 350715800
>>
>> # With this patch applied
>> $ ls -l arch/x86/boot/bzImage **/*.ko |awk '{size += $5} END {print size}'
>> 349856528
>
> We saved nearly a megabyte by optimising min3(), max3() and clamp()?
>
> I'm counting a grand total of 182 callsites for those macros. So the
> saving is 4700 bytes per invokation? I don't believe it...
You're absolutely right. I must have messed something up here. This
portion of the commit message should be removed.
So I've redone this just on just the kernel image with allmodconfig:
Linux mpn-glaptop 3.13.0-29-generic #53~precise1-Ubuntu SMP Wed Jun 4 22:06:25 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3
Copyright (C) 2011 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
-rwx------ 1 mpn eng 51224656 Jun 17 14:15 vmlinux.before
-rwx------ 1 mpn eng 51224608 Jun 17 13:57 vmlinux.after
48 bytes reduction. The do_fault_around was a few instruction shorter
and as far as I can tell saved 12 bytes on the stack, i.e.:
$ grep -e rsp -e pop -e push do_fault_around.*
do_fault_around.before.s:push %rbp
do_fault_around.before.s:mov %rsp,%rbp
do_fault_around.before.s:push %r13
do_fault_around.before.s:push %r12
do_fault_around.before.s:push %rbx
do_fault_around.before.s:sub $0x38,%rsp
do_fault_around.before.s:add $0x38,%rsp
do_fault_around.before.s:pop %rbx
do_fault_around.before.s:pop %r12
do_fault_around.before.s:pop %r13
do_fault_around.before.s:pop %rbp
do_fault_around.after.s:push %rbp
do_fault_around.after.s:mov %rsp,%rbp
do_fault_around.after.s:push %r12
do_fault_around.after.s:push %rbx
do_fault_around.after.s:sub $0x30,%rsp
do_fault_around.after.s:add $0x30,%rsp
do_fault_around.after.s:pop %rbx
do_fault_around.after.s:pop %r12
do_fault_around.after.s:pop %rbp
or here side-by-side:
Before After
push %rbp push %rbp
mov %rsp,%rbp mov %rsp,%rbp
push %r13
push %r12 push %r12
push %rbx push %rbx
sub $0x38,%rsp sub $0x30,%rsp
add $0x38,%rsp add $0x30,%rsp
pop %rbx pop %rbx
pop %r12 pop %r12
pop %r13
pop %rbp pop %rbp
There are also fewer branches:
$ grep ^j do_fault_around.*
do_fault_around.before.s:jae ffffffff812079b7
do_fault_around.before.s:jmp ffffffff812079c5
do_fault_around.before.s:jmp ffffffff81207a14
do_fault_around.before.s:ja ffffffff812079f9
do_fault_around.before.s:jb ffffffff81207a10
do_fault_around.before.s:jmp ffffffff81207a63
do_fault_around.before.s:jne ffffffff812079df
do_fault_around.after.s:jmp ffffffff812079fd
do_fault_around.after.s:ja ffffffff812079e2
do_fault_around.after.s:jb ffffffff812079f9
do_fault_around.after.s:jmp ffffffff81207a4c
do_fault_around.after.s:jne ffffffff812079c8
And here's with allyesconfig on a different machine:
$ uname -a; gcc --version; ls -l vmlinux.*
Linux erwin 3.14.7-mn #54 SMP Sun Jun 15 11:25:08 CEST 2014 x86_64 AMD Phenom(tm) II X3 710 Processor AuthenticAMD GNU/Linux
gcc (GCC) 4.8.3
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
-rwx------ 1 mina86 mina86 230616126 Jun 17 15:39 vmlinux.before*
-rwx------ 1 mina86 mina86 230614861 Jun 17 14:36 vmlinux.after*
1265 bytes reduction.
--
Best regards, _ _
.o. | Liege of Serenely Enlightened Majesty of o' \,=./ `o
..o | Computer Science, Michał “mina86” Nazarewicz (o o)
ooo +--<mpn@google.com>--<xmpp:mina86@jabber.org>--ooO--(_)--Ooo--
next prev parent reply other threads:[~2014-06-17 13:49 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-06-16 21:07 [PATCH] include: kernel.h: rewrite min3, max3 and clamp using min and max Michal Nazarewicz
2014-06-16 23:18 ` Andrew Morton
2014-06-16 23:25 ` David Rientjes
2014-06-16 23:35 ` Andrew Morton
2014-06-16 23:54 ` David Rientjes
2014-06-17 0:21 ` Steven Rostedt
2014-06-17 4:01 ` Steven Rostedt
2014-06-17 13:49 ` Michal Nazarewicz [this message]
2014-06-17 11:37 ` Hagen Paul Pfeifer
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=xa1tionzwsry.fsf@mina86.com \
--to=mina86@mina86.com \
--cc=akpm@linux-foundation.org \
--cc=hagen@jauu.net \
--cc=linux-kernel@vger.kernel.org \
--cc=rostedt@goodmis.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox