linux-sparse.vger.kernel.org archive mirror
* [PATCH v2 0/9] x86: macrofying inline asm for better compilation
@ 2018-06-04 11:21 Nadav Amit
  2018-06-04 11:21 ` [PATCH v2 2/9] x86: objtool: use asm macro for better compiler decisions Nadav Amit
  2018-06-04 19:05 ` [PATCH v2 0/9] x86: macrofying inline asm for better compilation Josh Poimboeuf
  0 siblings, 2 replies; 6+ messages in thread
From: Nadav Amit @ 2018-06-04 11:21 UTC (permalink / raw)
  To: linux-kernel, x86
  Cc: Nadav Amit, Alok Kataria, Christopher Li, Greg Kroah-Hartman,
	H. Peter Anvin, Ingo Molnar, Jan Beulich, Josh Poimboeuf,
	Juergen Gross, Kate Stewart, Kees Cook, linux-sparse,
	Peter Zijlstra, Philippe Ombredanne, Thomas Gleixner,
	virtualization, Linus Torvalds

This patch-set deals with an interesting yet stupid problem: kernel code
that does not get inlined despite its simplicity. There are several
causes for this behavior: the "cold" attribute on __init; different
function optimization levels; conditional constant computations based on
__builtin_constant_p(); and finally large inline assembly blocks.

This patch-set deals with the inline assembly problem. I separated these
patches from the others (that were sent in the RFC) for easier
inclusion. I also separated out the removal of unnecessary new-lines,
which will be sent separately.

The problem with inline assembly is that the kernel often uses it for
things other than code - for example, assembly directives and data. GCC,
however, is oblivious to the content of the blocks and assumes their
cost in space and time is proportional to the number of perceived
assembly "instructions", which it derives from the number of newlines
and semicolons. Alternatives, paravirt and other mechanisms are
affected, causing code not to be inlined, and degrading compilation
quality in general.

The solution that this patch-set carries for this problem is to create
an assembly macro, and then call it from the inline assembly block.  As
a result, the compiler sees a single "instruction" and assigns the more
appropriate cost to the code.

To avoid uglification of the code, as many noted, the macros are first
precompiled into an assembly file, which is later assembled together
with the C files. This also makes it possible to avoid the duplicate
implementations that previously existed for the asm and C code, as can
be seen in the exception table changes.

Overall this patch-set slightly increases the kernel size (my build was
done using my Ubuntu 18.04 config + localyesconfig for the record):

   text	   data	    bss	    dec	    hex	filename
18140829 10224724 2957312 31322865 1ddf2f1 ./vmlinux before
18163608 10227348 2957312 31348268 1de562c ./vmlinux after (+0.1%)

The number of static functions in the image is reduced by 379, but the
actual inlining improvement is even better. It does not always show in
these numbers: a function may be inlined, causing the calling function
not to be inlined.

The Makefile stuff may not be too clean. Ideas for improvements are
welcome.

v1->v2:	* Compiling the macros into a separate .s file, improving
	  readability (Linus)
	* Improving assembly formatting, applying most of the comments
	  according to my judgment (Jan)
	* Adding exception-table, cpufeature and jump-labels
	* Removing new-line cleanup; to be submitted separately

Cc: Alok Kataria <akataria@vmware.com>
Cc: Christopher Li <sparse@chrisli.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Kate Stewart <kstewart@linuxfoundation.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: linux-sparse@vger.kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Philippe Ombredanne <pombredanne@nexb.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: virtualization@lists.linux-foundation.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: x86@kernel.org

Nadav Amit (9):
  Makefile: Prepare for using macros for inline asm
  x86: objtool: use asm macro for better compiler decisions
  x86: refcount: prevent gcc distortions
  x86: alternatives: macrofy locks for better inlining
  x86: bug: prevent gcc distortions
  x86: prevent inline distortion by paravirt ops
  x86: extable: use macros instead of inline assembly
  x86: cpufeature: use macros instead of inline assembly
  x86: jump-labels: use macros instead of inline assembly

 Makefile                               |  9 ++-
 arch/x86/Makefile                      | 11 ++-
 arch/x86/include/asm/alternative-asm.h | 20 ++++--
 arch/x86/include/asm/alternative.h     | 16 +----
 arch/x86/include/asm/asm.h             | 61 +++++++---------
 arch/x86/include/asm/bug.h             | 98 +++++++++++++++-----------
 arch/x86/include/asm/cpufeature.h      | 82 ++++++++++++---------
 arch/x86/include/asm/jump_label.h      | 65 ++++++++++-------
 arch/x86/include/asm/paravirt_types.h  | 54 +++++++-------
 arch/x86/include/asm/refcount.h        | 73 +++++++++++--------
 arch/x86/kernel/Makefile               |  6 ++
 arch/x86/kernel/macros.S               | 16 +++++
 include/linux/compiler.h               | 60 ++++++++++++----
 scripts/Kbuild.include                 |  4 +-
 14 files changed, 346 insertions(+), 229 deletions(-)
 create mode 100644 arch/x86/kernel/macros.S

-- 
2.17.0


* [PATCH v2 2/9] x86: objtool: use asm macro for better compiler decisions
  2018-06-04 11:21 [PATCH v2 0/9] x86: macrofying inline asm for better compilation Nadav Amit
@ 2018-06-04 11:21 ` Nadav Amit
  2018-06-04 19:04   ` Josh Poimboeuf
  2018-06-05  5:41   ` kbuild test robot
  2018-06-04 19:05 ` [PATCH v2 0/9] x86: macrofying inline asm for better compilation Josh Poimboeuf
  1 sibling, 2 replies; 6+ messages in thread
From: Nadav Amit @ 2018-06-04 11:21 UTC (permalink / raw)
  To: linux-kernel, x86; +Cc: Nadav Amit, Christopher Li, linux-sparse

GCC considers the number of statements in inline assembly blocks,
according to new-lines and semicolons, as an indication of the cost of
the block in time and space. This data is distorted by the kernel code,
which puts information in alternative sections. As a result, the
compiler may perform incorrect inlining and branch optimizations.

In the case of objtool, this distortion is extreme, since the objtool
annotations are discarded at link time anyway.

The solution is to define an assembly macro and call it from the inline
assembly block. As a result, GCC considers the inline assembly block as
a single instruction.

This patch slightly increases the kernel size.

   text	   data	    bss	    dec	    hex	filename
18140829 10224724 2957312 31322865 1ddf2f1 ./vmlinux before
18140970 10225412 2957312 31323694 1ddf62e ./vmlinux after (+829)

Static text symbols:
Before:	40321
After:	40302	(-19)

Cc: Christopher Li <sparse@chrisli.org>
Cc: linux-sparse@vger.kernel.org

Signed-off-by: Nadav Amit <namit@vmware.com>
---
 arch/x86/kernel/macros.S |  2 ++
 include/linux/compiler.h | 60 +++++++++++++++++++++++++++++++---------
 2 files changed, 49 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/macros.S b/arch/x86/kernel/macros.S
index cfc1c7d1a6eb..cee28c3246dc 100644
--- a/arch/x86/kernel/macros.S
+++ b/arch/x86/kernel/macros.S
@@ -5,3 +5,5 @@
  * commonly used. The macros are precompiled into assmebly file which is later
  * assembled together with each compiled file.
  */
+
+#include <linux/compiler.h>
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index ab4711c63601..d10e752036c4 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -91,6 +91,10 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
 # define barrier_before_unreachable() do { } while (0)
 #endif
 
+/* A wrapper to clearly document when a macro is used */
+#define __ASM_MACRO(name, ...)		__stringify(name) __stringify(__VA_ARGS__)
+#define ASM_MACRO(name, ...)		__ASM_MACRO(name, __VA_ARGS__) "\n\t"
+
 /* Unreachable code */
 #ifdef CONFIG_STACK_VALIDATION
 /*
@@ -99,22 +103,13 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
  * unique, to convince GCC not to merge duplicate inline asm statements.
  */
 #define annotate_reachable() ({						\
-	asm volatile("%c0:\n\t"						\
-		     ".pushsection .discard.reachable\n\t"		\
-		     ".long %c0b - .\n\t"				\
-		     ".popsection\n\t" : : "i" (__COUNTER__));		\
+	asm volatile("ANNOTATE_REACHABLE counter=%c0"			\
+		     : : "i" (__COUNTER__));				\
 })
 #define annotate_unreachable() ({					\
-	asm volatile("%c0:\n\t"						\
-		     ".pushsection .discard.unreachable\n\t"		\
-		     ".long %c0b - .\n\t"				\
-		     ".popsection\n\t" : : "i" (__COUNTER__));		\
+	asm volatile("ANNOTATE_UNREACHABLE counter=%c0"		\
+		     : : "i" (__COUNTER__));				\
 })
-#define ASM_UNREACHABLE							\
-	"999:\n\t"							\
-	".pushsection .discard.unreachable\n\t"				\
-	".long 999b - .\n\t"						\
-	".popsection\n\t"
 #else
 #define annotate_reachable()
 #define annotate_unreachable()
@@ -280,6 +275,45 @@ unsigned long read_word_at_a_time(const void *addr)
 
 #endif /* __KERNEL__ */
 
+#else /* __ASSEMBLY__ */
+
+#ifdef __KERNEL__
+#ifndef LINKER_SCRIPT
+
+#ifdef CONFIG_STACK_VALIDATION
+.macro ANNOTATE_UNREACHABLE counter:req
+\counter:
+	.pushsection .discard.unreachable
+	.long \counter\()b -.
+	.popsection
+.endm
+
+.macro ANNOTATE_REACHABLE counter:req
+\counter:
+	.pushsection .discard.reachable
+	.long \counter\()b -.
+	.popsection
+.endm
+
+.macro ASM_UNREACHABLE
+999:
+	.pushsection .discard.unreachable
+	.long 999b - .
+	.popsection
+.endm
+#else /* CONFIG_STACK_VALIDATION */
+.macro ANNOTATE_UNREACHABLE counter:req
+.endm
+
+.macro ANNOTATE_UNREACHABLE counter:req
+.endm
+
+.macro ASM_UNREACHABLE
+.endm /* CONFIG_STACK_VALIDATION */
+#endif
+
+#endif /* LINKER_SCRIPT */
+#endif /* __KERNEL__ */
 #endif /* __ASSEMBLY__ */
 
 #ifndef __optimize
-- 
2.17.0


* Re: [PATCH v2 2/9] x86: objtool: use asm macro for better compiler decisions
  2018-06-04 11:21 ` [PATCH v2 2/9] x86: objtool: use asm macro for better compiler decisions Nadav Amit
@ 2018-06-04 19:04   ` Josh Poimboeuf
  2018-06-05  5:41   ` kbuild test robot
  1 sibling, 0 replies; 6+ messages in thread
From: Josh Poimboeuf @ 2018-06-04 19:04 UTC (permalink / raw)
  To: Nadav Amit; +Cc: linux-kernel, x86, Christopher Li, linux-sparse

On Mon, Jun 04, 2018 at 04:21:24AM -0700, Nadav Amit wrote:
> +#ifdef CONFIG_STACK_VALIDATION
> +.macro ANNOTATE_UNREACHABLE counter:req
> +\counter:
> +	.pushsection .discard.unreachable
> +	.long \counter\()b -.
> +	.popsection
> +.endm
> +
> +.macro ANNOTATE_REACHABLE counter:req
> +\counter:
> +	.pushsection .discard.reachable
> +	.long \counter\()b -.
> +	.popsection
> +.endm
> +
> +.macro ASM_UNREACHABLE
> +999:
> +	.pushsection .discard.unreachable
> +	.long 999b - .
> +	.popsection
> +.endm
> +#else /* CONFIG_STACK_VALIDATION */
> +.macro ANNOTATE_UNREACHABLE counter:req
> +.endm
> +
> +.macro ANNOTATE_UNREACHABLE counter:req
> +.endm
> +
> +.macro ASM_UNREACHABLE
> +.endm /* CONFIG_STACK_VALIDATION */
> +#endif

The '/* CONFIG_STACK_VALIDATION */' comment is on the wrong line.

Otherwise:

Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>

-- 
Josh


* Re: [PATCH v2 0/9] x86: macrofying inline asm for better compilation
  2018-06-04 11:21 [PATCH v2 0/9] x86: macrofying inline asm for better compilation Nadav Amit
  2018-06-04 11:21 ` [PATCH v2 2/9] x86: objtool: use asm macro for better compiler decisions Nadav Amit
@ 2018-06-04 19:05 ` Josh Poimboeuf
  2018-06-04 19:56   ` Nadav Amit
  1 sibling, 1 reply; 6+ messages in thread
From: Josh Poimboeuf @ 2018-06-04 19:05 UTC (permalink / raw)
  To: Nadav Amit
  Cc: Juergen Gross, Kate Stewart, Kees Cook, Peter Zijlstra,
	Greg Kroah-Hartman, Christopher Li, x86, linux-kernel,
	Philippe Ombredanne, virtualization, linux-sparse, Ingo Molnar,
	Jan Beulich, H. Peter Anvin, Alok Kataria, Linus Torvalds,
	Thomas Gleixner

How did you find these issues?  Is there some way to find them
automatically in the future?  Perhaps with a GCC plugin?

-- 
Josh


* Re: [PATCH v2 0/9] x86: macrofying inline asm for better compilation
  2018-06-04 19:05 ` [PATCH v2 0/9] x86: macrofying inline asm for better compilation Josh Poimboeuf
@ 2018-06-04 19:56   ` Nadav Amit
  0 siblings, 0 replies; 6+ messages in thread
From: Nadav Amit @ 2018-06-04 19:56 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Linux Kernel Mailing List, the arch/x86 maintainers, Alok Kataria,
	Christopher Li, Greg Kroah-Hartman, H. Peter Anvin, Ingo Molnar,
	Jan Beulich, Juergen Gross, Kate Stewart, Kees Cook,
	linux-sparse@vger.kernel.org, Peter Zijlstra, Philippe Ombredanne,
	Thomas Gleixner, virtualization@lists.linux-foundation.org,
	Linus Torvalds

Josh Poimboeuf <jpoimboe@redhat.com> wrote:

> How did you find these issues?  Is there some way to find them
> automatically in the future?  Perhaps with a GCC plugin?

Initially I found it while developing something unrelated and seeing the
disassembly going crazy for no good reason.

One way to see problematic functions is to look for duplicate static
functions, which mostly appear when an inline function in a header is not
inlined:

	nm ./vmlinux | grep ' t ' | cut -d' ' -f3 | uniq -c | sort | \
	grep -v '      1'

But for all kinds of reasons (duplicate function names, inline functions
that are assigned to function pointers), it still requires manual work to
filter out the false positives.

Another way is to look at small functions, doing something like:
	nm --print-size ./vmlinux | grep ' t ' | cut -d' ' -f2- | sort | \
	head -n 10000

But again, there are many false positives, so I only looked at functions
that I know, or only considered those that are marked as “inline”.

I don’t know how this process can be fully automated.

Regards,
Nadav



* Re: [PATCH v2 2/9] x86: objtool: use asm macro for better compiler decisions
  2018-06-04 11:21 ` [PATCH v2 2/9] x86: objtool: use asm macro for better compiler decisions Nadav Amit
  2018-06-04 19:04   ` Josh Poimboeuf
@ 2018-06-05  5:41   ` kbuild test robot
  1 sibling, 0 replies; 6+ messages in thread
From: kbuild test robot @ 2018-06-05  5:41 UTC (permalink / raw)
  Cc: kbuild-all, linux-kernel, x86, Nadav Amit, Christopher Li,
	linux-sparse

[-- Attachment #1: Type: text/plain, Size: 1180 bytes --]

Hi Nadav,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.17 next-20180604]
[cannot apply to tip/x86/core]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Nadav-Amit/x86-macrofying-inline-asm-for-better-compilation/20180605-124313
config: c6x-evmc6678_defconfig (attached as .config)
compiler: c6x-elf-gcc (GCC) 8.1.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=c6x 

All errors (new ones prefixed by >>):

   include/linux/compiler.h: Assembler messages:
>> include/linux/compiler.h:308: Error: Macro `annotate_unreachable' was already defined

vim +308 include/linux/compiler.h

   307	
 > 308	.macro ANNOTATE_UNREACHABLE counter:req
   309	.endm
   310	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 4959 bytes --]

