* [RFC 0/8] Improving compiler inlining decisions
@ 2018-05-15 14:11 Nadav Amit
2018-05-15 14:11 ` [RFC 1/8] x86: objtool: use asm macro for better compiler decisions Nadav Amit
` (4 more replies)
0 siblings, 5 replies; 10+ messages in thread
From: Nadav Amit @ 2018-05-15 14:11 UTC (permalink / raw)
To: linux-kernel
Cc: nadav.amit, Nadav Amit, Alok Kataria, Christopher Li,
H. Peter Anvin, Ingo Molnar, Jan Beulich, Jonathan Corbet,
Josh Poimboeuf, Juergen Gross, Kees Cook, linux-sparse,
Peter Zijlstra, Randy Dunlap, Thomas Gleixner, virtualization,
x86
This patch-set deals with an interesting yet stupid problem: code that
does not get inlined despite its simplicity.
I find 5 classes of causes:
1. Inline assembly blocks in which code and data are added to
alternative sections. The compiler is oblivious to the content of the
blocks and assumes their cost in space and time is proportional to the
number of perceived assembly "instructions", which it counts by the
number of newlines and semicolons. Alternatives, paravirt and other
mechanisms are affected.
2. Inline assembly with redundant new-lines and semicolons. Similarly to
(1) this code is considered "heavier" than it actually is.
3. Code with constant value optimizations. Quite a few parts of the
kernel check whether a variable is constant (using
__builtin_constant_p()) and perform heavy computations in that case.
These computations are eventually optimized out so they do not land in
the binary. However, the cost of these computations is also associated
with the calling function, which might prevent inlining of the calling
function. ilog2() is an example of such a case.
4. Code that is marked with the "cold" attribute, including all the
__init functions. Some may consider it the desired behavior.
5. Code that is marked with a different optimization level. This
affects for example vmx_vcpu_run(), inducing overheads of up to 10% on
exit.
This patch-set deals with some instances of the first 3 classes.
For (1) we insert an assembly macro, and call it from the inline
assembly block. As a result, the compiler sees a single "instruction"
and assigns the more appropriate cost to the code.
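To make the effect concrete, here is a minimal sketch of the counting
heuristic described above. It is a simplified model and not GCC's actual
code: estimate_asm_insns() is a hypothetical helper that charges one
"instruction" per template, plus one for every newline or semicolon. The
two strings are the annotate_reachable() templates before and after
patch 1/8.

```c
#include <stddef.h>

/*
 * Simplified model (an assumption, not GCC's implementation): an asm
 * template costs one "instruction", plus one more for every newline
 * or semicolon found in the template string.
 */
static size_t estimate_asm_insns(const char *tmpl)
{
	size_t count = 1;

	for (; *tmpl != '\0'; tmpl++)
		if (*tmpl == '\n' || *tmpl == ';')
			count++;
	return count;
}

/* Template of annotate_reachable() before the change. */
static const char old_template[] =
	"%c0:\n\t"
	".pushsection .discard.reachable\n\t"
	".long %c0b - .\n\t"
	".popsection\n\t";

/* Template after the change: a single assembly macro invocation. */
static const char new_template[] = "__annotate_reachable %c0";
```

Under this model the old template is charged as 5 "instructions" and the
new one as 1, although neither emits any code into the text section.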
For (2) the solution is trivial: just remove the newlines.
(3) is somewhat tricky. The proposed solution is to use
__builtin_choose_expr() to check whether a variable is actually constant
instead of using an if-condition or the C ternary operator.
__builtin_choose_expr() is evaluated earlier in the compilation, so it
allows the compiler to associate the right cost with the variable case
before the inlining decisions take place. So far so good.
Still, there is a drawback. Since __builtin_choose_expr() is evaluated
earlier, it can fail to recognize constants, which an if-condition would
recognize correctly. As a result, this patch-set only applies it to the
simplest cases.
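As a rough illustration of the idea (a toy sketch, not the actual ilog2()
change in this series), the macro below selects between a constant path
and a runtime path with __builtin_choose_expr(). The names my_ilog2(),
CONST_LOG2() and runtime_log2() are made up for this example, and the
constant path only handles values below 32.

```c
/* Runtime path: a plain loop, used when the argument is not a constant. */
static inline int runtime_log2(unsigned long n)
{
	int r = -1;

	while (n) {
		n >>= 1;
		r++;
	}
	return r;
}

/*
 * Constant path: a chain of comparisons that folds to a single value
 * when n is a compile-time constant (toy version, valid for 1 <= n < 32).
 */
#define CONST_LOG2(n)			\
	((n) < 2 ? 0 :			\
	 (n) < 4 ? 1 :			\
	 (n) < 8 ? 2 :			\
	 (n) < 16 ? 3 : 4)

/*
 * The selection is made while parsing, so only the chosen expression
 * remains in the function body that the inliner later evaluates.
 */
#define my_ilog2(n)						\
	__builtin_choose_expr(__builtin_constant_p(n),		\
			      CONST_LOG2(n),			\
			      runtime_log2(n))
```

With a literal argument such as my_ilog2(8) the constant path is chosen.
With a local variable the runtime path is chosen, even if later
optimization passes could have proven the variable constant, which is
the drawback mentioned above.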
Overall this patch-set slightly increases the kernel size (my build was
done using localmodconfig + localyesconfig for the record):
    text     data     bss      dec     hex filename
18126699 10066728 2936832 31130259 1db0293 ./vmlinux before
18149210 10064048 2936832 31150090 1db500a ./vmlinux after (+0.06%)
The patch-set eliminates many of the static text symbols:
Before: 40033
After: 39632 (-10%)
There is a measurable effect on performance in some cases. A loop of
MADV_DONTNEED/page-fault shows a 2% performance improvement with this
patch-set.
Some inline comments or self-explaining C macros might still be needed.
[1] https://lkml.org/lkml/2018/5/5/159
Cc: Alok Kataria <akataria@vmware.com>
Cc: Christopher Li <sparse@chrisli.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: linux-sparse@vger.kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: virtualization@lists.linux-foundation.org
Cc: x86@kernel.org
Nadav Amit (8):
x86: objtool: use asm macro for better compiler decisions
x86: bug: prevent gcc distortions
x86: alternative: macrofy locks for better inlining
x86: prevent inline distortion by paravirt ops
x86: refcount: prevent gcc distortions
x86: removing unneeded new-lines
ilog2: preventing compiler distortion due to big condition
bitops: prevent compiler inline decision distortion
arch/x86/include/asm/alternative.h | 28 ++++++++++----
arch/x86/include/asm/asm.h | 4 +-
arch/x86/include/asm/bitops.h | 8 ++--
arch/x86/include/asm/bug.h | 48 ++++++++++++++---------
arch/x86/include/asm/cmpxchg.h | 10 ++---
arch/x86/include/asm/paravirt_types.h | 53 +++++++++++++++-----------
arch/x86/include/asm/refcount.h | 55 ++++++++++++++++-----------
arch/x86/include/asm/special_insns.h | 12 +++---
include/linux/compiler.h | 29 ++++++++++----
include/linux/log2.h | 11 +++---
10 files changed, 156 insertions(+), 102 deletions(-)
--
2.17.0
^ permalink raw reply [flat|nested] 10+ messages in thread
* [RFC 1/8] x86: objtool: use asm macro for better compiler decisions
2018-05-15 14:11 [RFC 0/8] Improving compiler inlining decisions Nadav Amit
@ 2018-05-15 14:11 ` Nadav Amit
2018-05-15 21:37 ` Josh Triplett
2018-05-15 14:11 ` [RFC 0/8] Improving compiler inlining decisions Nadav Amit
` (3 subsequent siblings)
4 siblings, 1 reply; 10+ messages in thread
From: Nadav Amit @ 2018-05-15 14:11 UTC (permalink / raw)
To: linux-kernel; +Cc: nadav.amit, Nadav Amit, Christopher Li, linux-sparse
GCC considers the number of statements in inline assembly blocks,
according to new-lines and semicolons, as an indication of the cost of
the block in time and space. This data is distorted by the kernel code,
which puts information in alternative sections. As a result, the
compiler may perform incorrect inlining and branch optimizations.
In the case of objtool, this distortion is extreme, since the objtool
annotations are discarded during linking anyway.
The solution is to define an assembly macro and invoke it from the inline
assembly block. As a result GCC considers the inline assembly block as
a single instruction.
This patch slightly increases the kernel size.
    text     data     bss      dec     hex filename
18126699 10066728 2936832 31130259 1db0293 ./vmlinux before
18126824 10067268 2936832 31130924 1db052c ./vmlinux after (+665)
But allows more aggressive inlining. Static text symbols:
Before: 40033
After: 40015 (-18)
Cc: Christopher Li <sparse@chrisli.org>
Cc: linux-sparse@vger.kernel.org
Signed-off-by: Nadav Amit <namit@vmware.com>
---
include/linux/compiler.h | 29 +++++++++++++++++++++--------
1 file changed, 21 insertions(+), 8 deletions(-)
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index ab4711c63601..369753000541 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -98,17 +98,30 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
* The __COUNTER__ based labels are a hack to make each instance of the macros
* unique, to convince GCC not to merge duplicate inline asm statements.
*/
+
+asm ("\n"
+ ".macro __annotate_reachable counter:req\n"
+ "\\counter:\n\t"
+ ".pushsection .discard.reachable\n\t"
+ ".long \\counter\\()b -.\n\t"
+ ".popsection\n"
+ ".endm\n");
+
#define annotate_reachable() ({ \
- asm volatile("%c0:\n\t" \
- ".pushsection .discard.reachable\n\t" \
- ".long %c0b - .\n\t" \
- ".popsection\n\t" : : "i" (__COUNTER__)); \
+ asm volatile("__annotate_reachable %c0" : : "i" (__COUNTER__)); \
})
+
+asm ("\n"
+ ".macro __annotate_unreachable counter:req\n"
+ "\\counter:\n\t"
+ ".pushsection .discard.unreachable\n\t"
+ ".long \\counter\\()b -.\n\t"
+ ".popsection\n"
+ ".endm");
+
#define annotate_unreachable() ({ \
- asm volatile("%c0:\n\t" \
- ".pushsection .discard.unreachable\n\t" \
- ".long %c0b - .\n\t" \
- ".popsection\n\t" : : "i" (__COUNTER__)); \
+ asm volatile("__annotate_unreachable %c0" : : \
+ "i" (__COUNTER__)); \
})
#define ASM_UNREACHABLE \
"999:\n\t" \
--
2.17.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [RFC 1/8] x86: objtool: use asm macro for better compiler decisions
2018-05-15 14:11 ` [RFC 1/8] x86: objtool: use asm macro for better compiler decisions Nadav Amit
@ 2018-05-15 21:37 ` Josh Triplett
2018-05-15 21:53 ` Nadav Amit
0 siblings, 1 reply; 10+ messages in thread
From: Josh Triplett @ 2018-05-15 21:37 UTC (permalink / raw)
To: Nadav Amit; +Cc: linux-kernel, nadav.amit, Christopher Li, linux-sparse
On Tue, May 15, 2018 at 07:11:08AM -0700, Nadav Amit wrote:
> GCC considers the number of statements in inlined assembly blocks,
> according to new-lines and semicolons, as an indication to the cost of
> the block in time and space. This data is distorted by the kernel code,
> which puts information in alternative sections. As a result, the
> compiler may perform incorrect inlining and branch optimizations.
>
> In the case of objtool, this distortion is extreme, since anyhow the
> annotations of objtool are discarded during linkage.
>
> The solution is to set an assembly macro and call it from the inlined
> assembly block. As a result GCC considers the inline assembly block as
> a single instruction.
>
> This patch slightly increases the kernel size.
>
> text data bss dec hex filename
> 18126699 10066728 2936832 31130259 1db0293 ./vmlinux before
> 18126824 10067268 2936832 31130924 1db052c ./vmlinux after (+665)
With what kernel config? What's the text size change for an allnoconfig?
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC 1/8] x86: objtool: use asm macro for better compiler decisions
2018-05-15 21:37 ` Josh Triplett
@ 2018-05-15 21:53 ` Nadav Amit
2018-05-15 21:55 ` Josh Triplett
0 siblings, 1 reply; 10+ messages in thread
From: Nadav Amit @ 2018-05-15 21:53 UTC (permalink / raw)
To: Josh Triplett; +Cc: linux-kernel, Christopher Li, linux-sparse
Josh Triplett <josh@joshtriplett.org> wrote:
> On Tue, May 15, 2018 at 07:11:08AM -0700, Nadav Amit wrote:
>> GCC considers the number of statements in inlined assembly blocks,
>> according to new-lines and semicolons, as an indication to the cost of
>> the block in time and space. This data is distorted by the kernel code,
>> which puts information in alternative sections. As a result, the
>> compiler may perform incorrect inlining and branch optimizations.
>>
>> In the case of objtool, this distortion is extreme, since anyhow the
>> annotations of objtool are discarded during linkage.
>>
>> The solution is to set an assembly macro and call it from the inlined
>> assembly block. As a result GCC considers the inline assembly block as
>> a single instruction.
>>
>> This patch slightly increases the kernel size.
>>
>> text data bss dec hex filename
>> 18126699 10066728 2936832 31130259 1db0293 ./vmlinux before
>> 18126824 10067268 2936832 31130924 1db052c ./vmlinux after (+665)
>
> With what kernel config? What's the text size change for an allnoconfig?
I use my custom config, set up to include the drivers in the image. Other
configs were not really appropriate: defconfig does not enable paravirt and
allyesconfig is too heavy.
There is no effect with allnoconfig on size:
   text   data     bss     dec    hex filename
1018441 230148 1215604 2464193 2599c1 ./vmlinux
1018441 230148 1215604 2464193 2599c1 ./vmlinux
I think it is expected since CONFIG_STACK_VALIDATION is off. No?
Thanks,
Nadav
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC 1/8] x86: objtool: use asm macro for better compiler decisions
2018-05-15 21:53 ` Nadav Amit
@ 2018-05-15 21:55 ` Josh Triplett
0 siblings, 0 replies; 10+ messages in thread
From: Josh Triplett @ 2018-05-15 21:55 UTC (permalink / raw)
To: Nadav Amit; +Cc: linux-kernel, Christopher Li, linux-sparse
On Tue, May 15, 2018 at 02:53:52PM -0700, Nadav Amit wrote:
> Josh Triplett <josh@joshtriplett.org> wrote:
>
> > On Tue, May 15, 2018 at 07:11:08AM -0700, Nadav Amit wrote:
> >> GCC considers the number of statements in inlined assembly blocks,
> >> according to new-lines and semicolons, as an indication to the cost of
> >> the block in time and space. This data is distorted by the kernel code,
> >> which puts information in alternative sections. As a result, the
> >> compiler may perform incorrect inlining and branch optimizations.
> >>
> >> In the case of objtool, this distortion is extreme, since anyhow the
> >> annotations of objtool are discarded during linkage.
> >>
> >> The solution is to set an assembly macro and call it from the inlined
> >> assembly block. As a result GCC considers the inline assembly block as
> >> a single instruction.
> >>
> >> This patch slightly increases the kernel size.
> >>
> >> text data bss dec hex filename
> >> 18126699 10066728 2936832 31130259 1db0293 ./vmlinux before
> >> 18126824 10067268 2936832 31130924 1db052c ./vmlinux after (+665)
> >
> > With what kernel config? What's the text size change for an allnoconfig?
>
> I use my custom config and to include the drivers in the image. Other
> configs were not too appropriate: defconfig does not enable paravirt and
> allyesconfig is too heavy.
>
> There is no effect with allnoconfig on size:
>
> text data bss dec hex filename
> 1018441 230148 1215604 2464193 2599c1 ./vmlinux
> 1018441 230148 1215604 2464193 2599c1 ./vmlinux
>
> I think it is expected since CONFIG_STACK_VALIDATION is off. No?
Thanks for checking; fine by me then. :)
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC 0/8] Improving compiler inlining decisions
2018-05-15 14:11 [RFC 0/8] Improving compiler inlining decisions Nadav Amit
` (2 preceding siblings ...)
2018-05-15 14:11 ` [RFC 1/8] x86: objtool: use asm macro for better compiler decisions Nadav Amit
@ 2018-05-15 22:14 ` Nadav Amit
2018-05-16 3:48 ` Josh Poimboeuf
4 siblings, 0 replies; 10+ messages in thread
From: Nadav Amit @ 2018-05-15 22:14 UTC (permalink / raw)
To: linux-kernel@vger.kernel.org
Cc: Alok Kataria, Christopher Li, H. Peter Anvin, Ingo Molnar,
Jan Beulich, Jonathan Corbet, Josh Poimboeuf, Juergen Gross,
Kees Cook, linux-sparse@vger.kernel.org, Peter Zijlstra,
Randy Dunlap, Thomas Gleixner,
virtualization@lists.linux-foundation.org, x86@kernel.org
Nadav Amit <namit@vmware.com> wrote:
> This patch-set deals with an interesting yet stupid problem: code that
> does not get inlined despite its simplicity.
>
> I find 5 classes of causes:
>
> 1. Inline assembly blocks in which code and data are added to
> alternative sections. The compiler is oblivious to the content of the
> blocks and assumes their cost in space and time is proportional to the
> number of the perceived assembly "instruction", according to the number
> of newlines and semicolons. Alternatives, paravirt and other mechanisms
> are affected.
>
> 2. Inline assembly with redundant new-lines and semicolons. Similarly to
> (1) this code is considered "heavier" than it actually is.
>
> 3. Code with constant value optimizations. Quite a few parts of the
> kernel check whether a variable is constant (using
> __builtin_constant_p()) and perform heavy computations in that case.
> These computations are eventually optimized out so they do not land in
> the binary. However, the cost of these computations is also associated
> with the calling function, which might prevent inlining of the calling
> function. ilog2() is an example for such case.
>
> 4. Code that is marked with the "cold" attribute, including all the
> __init functions. Some may consider it the desired behavior.
>
> 5. Code that is marked with a different optimization levels. This
> affects for example vmx_vcpu_run(), inducing overheads of up to 10% on
> exit.
>
>
> This patch-set deals with some instances of first 3 classes.
>
> For (1) we insert an assembly macro, and call it from the inline
> assembly block. As a result, the compiler sees a single "instruction"
> and assigns the more appropriate cost to the code.
>
> For (2) the solution is trivial: just remove the newlines.
>
> (3) is somewhat tricky. The proposed solution is to use
> __builtin_choose_expr() to check whether a variable is actually constant
> instead of using an if-condition or the C ternary operator.
> __builtin_choose_expr() is evaluated earlier in the compilation, so it
> allows the compiler to associate the right cost for the variable case
> before the inlining decisions take place. So far so good.
>
> Still, there is a drawback. Since __builtin_choose_expr() is evaluated
> earlier, it can fail to recognize constants, which an if-condition would
> recognize correctly. As a result, this patch-set only applies it to the
> simplest cases.
>
> Overall this patch-set slightly increases the kernel size (my build was
> done using localmodconfig + localyesconfig for the record):
>
> text data bss dec hex filename
> 18126699 10066728 2936832 31130259 1db0293 ./vmlinux before
> 18149210 10064048 2936832 31150090 1db500a ./vmlinux after (+0.06%)
>
> The patch-set eliminates many of the static text symbols:
> Before: 40033
> After: 39632 (-10%)
Oops. Should be -1%...
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC 0/8] Improving compiler inlining decisions
2018-05-15 14:11 [RFC 0/8] Improving compiler inlining decisions Nadav Amit
` (3 preceding siblings ...)
2018-05-15 22:14 ` [RFC 0/8] Improving compiler inlining decisions Nadav Amit
@ 2018-05-16 3:48 ` Josh Poimboeuf
2018-05-16 4:30 ` Nadav Amit
4 siblings, 1 reply; 10+ messages in thread
From: Josh Poimboeuf @ 2018-05-16 3:48 UTC (permalink / raw)
To: Nadav Amit
Cc: Juergen Gross, x86, Kees Cook, Jonathan Corbet, Peter Zijlstra,
Christopher Li, Randy Dunlap, linux-kernel, virtualization,
linux-sparse, Ingo Molnar, Jan Beulich, H. Peter Anvin,
Alok Kataria, nadav.amit, Thomas Gleixner
On Tue, May 15, 2018 at 07:11:07AM -0700, Nadav Amit wrote:
> This patch-set deals with an interesting yet stupid problem: code that
> does not get inlined despite its simplicity.
I got the 0/8 patch twice, and didn't get the 1/8 patch. Was there an
issue with the sending of the patches?
--
Josh
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC 0/8] Improving compiler inlining decisions
2018-05-16 3:48 ` Josh Poimboeuf
@ 2018-05-16 4:30 ` Nadav Amit
0 siblings, 0 replies; 10+ messages in thread
From: Nadav Amit @ 2018-05-16 4:30 UTC (permalink / raw)
To: Josh Poimboeuf
Cc: Linux Kernel Mailing List, Alok Kataria, Christopher Li,
H. Peter Anvin, Ingo Molnar, Jan Beulich, Jonathan Corbet,
Juergen Gross, Kees Cook, linux-sparse@vger.kernel.org,
Peter Zijlstra, Randy Dunlap, Thomas Gleixner,
virtualization@lists.linux-foundation.org, x86@kernel.org
Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> On Tue, May 15, 2018 at 07:11:07AM -0700, Nadav Amit wrote:
>> This patch-set deals with an interesting yet stupid problem: code that
>> does not get inlined despite its simplicity.
>
> I got the 0/8 patch twice, and didn't get the 1/8 patch. Was there an
> issue with the sending of the patches?
Strange. I am not sure why it happened. Anyhow 1/8 is available here:
https://lkml.org/lkml/2018/5/15/961 . I will forward it to you.
Thanks,
Nadav
^ permalink raw reply [flat|nested] 10+ messages in thread