All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ingo Molnar <mingo@kernel.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: pabeni@redhat.com, Jens Axboe <axboe@kernel.dk>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>,
	bp@alien8.de, Peter Anvin <hpa@zytor.com>,
	the arch/x86 maintainers <x86@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Andrew Lutomirski <luto@kernel.org>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	dvlasenk@redhat.com, brgerst@gmail.com,
	Linux List Kernel Mailing <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] x86: only use ERMS for user copies for larger sizes
Date: Thu, 22 Nov 2018 12:13:41 +0100	[thread overview]
Message-ID: <20181122111341.GA107459@gmail.com> (raw)
In-Reply-To: <20181122103231.GA102790@gmail.com>


* Ingo Molnar <mingo@kernel.org> wrote:

> The kernel text size reduction with Jen's patch is small but real:
> 
>  text		data		bss		dec		hex	filename
>  19572694	11516934	19873888	50963516	309a43c	vmlinux.before
>  19572468	11516934	19873888	50963290	309a35a	vmlinux.after
> 
> But I checked the disassembly, and it's not a real win, the new code is 
> actually more complex than the old one, as expected, but GCC (7.3.0) does 
> some particularly stupid things which bloats the generated code.

So I dug into this some more:

1)

Firstly I tracked down GCC bloating the might_fault() checks and the 
related out-of-line code exception handling which bloats the full 
generated function.

2)

But with even that complication eliminated, there's a size reduction when 
Jen's patch is applied, which is puzzling:

19563640	11516790	19882080	50962510	309a04e	vmlinux.before
19563274	11516790	19882080	50962144	3099ee0	vmlinux.after

but this is entirely due to the .altinstructions section being counted as 
'text' part of the vmlinux - while in reality it's not:

3)

The _real_ part of the vmlinux gets bloated by Jen's patch:

 ffffffff81000000 <_stext>:

 before:  ffffffff81b0e5e0 <__clear_user>
 after:   ffffffff81b0e670 <__clear_user>:

I.e. we get a e5e0 => e670 bloat, as expected.

In the config I tested a later section of the kernel image first aligns 
away the bloat:

 before: ffffffff82fa6321 <.altinstr_aux>:
 after:  ffffffff82fa6321 <.altinstr_aux>:

and then artificially debloats the modified kernel via the 
altinstructions section:

  before: Disassembly of section .exit.text: ffffffff83160798 <intel_uncore_exit>
  after:  Disassembly of section .exit.text: ffffffff83160608 <intel_uncore_exit>

Note that there's a third level of obfuscation here: Jen's patch actually 
*adds* a new altinstructions statement:

+       /*
+        * For smaller copies, don't use ERMS as it's slower.
+        */
+       if (len < 128) {
+               alternative_call(copy_user_generic_unrolled,
+                                copy_user_generic_string, X86_FEATURE_REP_GOOD,
+                                ASM_OUTPUT2("=a" (ret), "=D" (to), "=S" (from),
+                                            "=d" (len)),
+                                "1" (to), "2" (from), "3" (len)
+                                : "memory", "rcx", "r8", "r9", "r10", "r11");
+               return ret;
+       }
+
        /*
         * If CPU has ERMS feature, use copy_user_enhanced_fast_string.
         * Otherwise, if CPU has rep_good feature, use copy_user_generic_string.
         * Otherwise, use copy_user_generic_unrolled.
         */
        alternative_call_2(copy_user_generic_unrolled,
-                        copy_user_generic_string,
-                        X86_FEATURE_REP_GOOD,
-                        copy_user_enhanced_fast_string,
-                        X86_FEATURE_ERMS,
+                        copy_user_generic_string, X86_FEATURE_REP_GOOD,
+                        copy_user_enhanced_fast_string, X86_FEATURE_ERMS,
                         ASM_OUTPUT2("=a" (ret), "=D" (to), "=S" (from),
                                     "=d" (len)),
                         "1" (to), "2" (from), "3" (len)

So how can this change possibly result in a *small* altinstructions 
section?

4)

The reason is GCC's somewhat broken __builtin_constant() logic, which 
leaves ~10% of the constant call sites actually active, but which are 
then optimized by GCC's later stages, and the alternative_call_2() gets 
optimized out and replaced with the alternative_call() call.

This is where Jens's patch 'debloats' the vmlinux and confuses the 'size' 
utility and gains its code reduction street cred.

Note to self: watch out for patches that change altinstructions and don't 
make premature vmlinux size impact assumptions. :-)

Thanks,

	Ingo

  reply	other threads:[~2018-11-22 11:13 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <02bfc577-32a5-66be-64bf-d476b7d447d2@kernel.dk>
2018-11-20 20:24 ` [PATCH] x86: only use ERMS for user copies for larger sizes Jens Axboe
2018-11-21  6:36 ` Ingo Molnar
2018-11-21 13:32   ` Jens Axboe
2018-11-21 13:44     ` Denys Vlasenko
2018-11-22 17:36       ` David Laight
2018-11-22 17:52         ` Linus Torvalds
2018-11-22 18:06           ` Andy Lutomirski
2018-11-22 18:58             ` Linus Torvalds
2018-11-23  9:34               ` David Laight
2018-11-23 10:12                 ` David Laight
2018-11-23 16:36                   ` Linus Torvalds
2018-11-23 17:42                     ` Linus Torvalds
2018-11-23 18:39                       ` Andy Lutomirski
2018-11-23 18:44                         ` Linus Torvalds
2018-11-23 19:11                           ` Andy Lutomirski
2018-11-26 10:12                             ` David Laight
2018-11-26 10:01                     ` David Laight
2018-11-26 10:26                     ` David Laight
2019-01-05  2:38                       ` Linus Torvalds
2019-01-07  9:55                         ` David Laight
2019-01-07 17:43                           ` Linus Torvalds
2019-01-08  9:10                             ` David Laight
2019-01-08 18:01                               ` Linus Torvalds
2018-11-21 13:45     ` Paolo Abeni
2018-11-21 17:27       ` Linus Torvalds
2018-11-21 18:04         ` Jens Axboe
2018-11-21 18:26           ` Andy Lutomirski
2018-11-21 18:43             ` Linus Torvalds
2018-11-21 22:38               ` Andy Lutomirski
2018-11-21 18:16         ` Linus Torvalds
2018-11-21 19:01           ` Linus Torvalds
2018-11-22 10:32             ` Ingo Molnar
2018-11-22 11:13               ` Ingo Molnar [this message]
2018-11-22 11:21                 ` Ingo Molnar
2018-11-23 16:40                 ` Josh Poimboeuf
2018-11-22 16:55               ` Linus Torvalds
2018-11-22 17:26                 ` Andy Lutomirski
2018-11-22 17:35                   ` Linus Torvalds
2018-11-24  6:09           ` Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181122111341.GA107459@gmail.com \
    --to=mingo@kernel.org \
    --cc=a.p.zijlstra@chello.nl \
    --cc=akpm@linux-foundation.org \
    --cc=axboe@kernel.dk \
    --cc=bp@alien8.de \
    --cc=brgerst@gmail.com \
    --cc=dvlasenk@redhat.com \
    --cc=hpa@zytor.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=mingo@redhat.com \
    --cc=pabeni@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.