* [PATCH RFC 0/5] x86/asm: Improve code generation
@ 2025-04-08 8:21 Josh Poimboeuf
2025-04-08 8:21 ` [PATCH RFC 1/5] objtool: Remove ANNOTATE_IGNORE_ALTERNATIVE from CLAC/STAC Josh Poimboeuf
` (4 more replies)
0 siblings, 5 replies; 20+ messages in thread
From: Josh Poimboeuf @ 2025-04-08 8:21 UTC (permalink / raw)
To: x86; +Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Peter Zijlstra
Some various improvements for inline asm code generation.
The "RFC" is really patches 4 and 5, they propose a more readable format
for alternatives and objtool annotations. Bikeshedding welcome.
If a general format is agreed upon, we could do similar for other
annotations like WARN/BUG, static branches/calls, etc.
Josh Poimboeuf (5):
objtool: Remove ANNOTATE_IGNORE_ALTERNATIVE from CLAC/STAC
objtool, x86/hweight: Remove ANNOTATE_IGNORE_ALTERNATIVE
noinstr: Use asm_inline() in instrumentation_{begin,end}()
x86/alternative: Improve code generation readability
objtool: Improve code generation readability
arch/x86/include/asm/alternative.h | 88 ++++++++++++++++++--------
arch/x86/include/asm/arch_hweight.h | 6 +-
arch/x86/include/asm/smap.h | 12 ++--
include/linux/instrumentation.h | 4 +-
include/linux/objtool.h | 97 ++++++++++++++---------------
tools/objtool/check.c | 30 ++++++++-
6 files changed, 149 insertions(+), 88 deletions(-)
--
2.49.0
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH RFC 1/5] objtool: Remove ANNOTATE_IGNORE_ALTERNATIVE from CLAC/STAC
2025-04-08 8:21 [PATCH RFC 0/5] x86/asm: Improve code generation Josh Poimboeuf
@ 2025-04-08 8:21 ` Josh Poimboeuf
2025-04-08 18:07 ` Linus Torvalds
2025-04-08 20:45 ` [tip: objtool/urgent] " tip-bot2 for Josh Poimboeuf
2025-04-08 8:21 ` [PATCH RFC 2/5] objtool, x86/hweight: Remove ANNOTATE_IGNORE_ALTERNATIVE Josh Poimboeuf
` (3 subsequent siblings)
4 siblings, 2 replies; 20+ messages in thread
From: Josh Poimboeuf @ 2025-04-08 8:21 UTC (permalink / raw)
To: x86; +Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Peter Zijlstra
ANNOTATE_IGNORE_ALTERNATIVE adds additional noise to the code generated
by CLAC/STAC alternatives, hurting readability for those whose read
uaccess-related code generation on a regular basis.
Remove the annotation specifically for the "NOP patched with CLAC/STAC"
case in favor of a manual check.
Leave the other uses of that annotation in place as they're less common
and more difficult to detect.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
---
arch/x86/include/asm/smap.h | 12 ++++++------
tools/objtool/check.c | 30 +++++++++++++++++++++++++++++-
2 files changed, 35 insertions(+), 7 deletions(-)
diff --git a/arch/x86/include/asm/smap.h b/arch/x86/include/asm/smap.h
index 55a5e656e4b9..4f84d421d1cf 100644
--- a/arch/x86/include/asm/smap.h
+++ b/arch/x86/include/asm/smap.h
@@ -16,23 +16,23 @@
#ifdef __ASSEMBLER__
#define ASM_CLAC \
- ALTERNATIVE __stringify(ANNOTATE_IGNORE_ALTERNATIVE), "clac", X86_FEATURE_SMAP
+ ALTERNATIVE "", "clac", X86_FEATURE_SMAP
#define ASM_STAC \
- ALTERNATIVE __stringify(ANNOTATE_IGNORE_ALTERNATIVE), "stac", X86_FEATURE_SMAP
+ ALTERNATIVE "", "stac", X86_FEATURE_SMAP
#else /* __ASSEMBLER__ */
static __always_inline void clac(void)
{
/* Note: a barrier is implicit in alternative() */
- alternative(ANNOTATE_IGNORE_ALTERNATIVE "", "clac", X86_FEATURE_SMAP);
+ alternative("", "clac", X86_FEATURE_SMAP);
}
static __always_inline void stac(void)
{
/* Note: a barrier is implicit in alternative() */
- alternative(ANNOTATE_IGNORE_ALTERNATIVE "", "stac", X86_FEATURE_SMAP);
+ alternative("", "stac", X86_FEATURE_SMAP);
}
static __always_inline unsigned long smap_save(void)
@@ -59,9 +59,9 @@ static __always_inline void smap_restore(unsigned long flags)
/* These macros can be used in asm() statements */
#define ASM_CLAC \
- ALTERNATIVE(ANNOTATE_IGNORE_ALTERNATIVE "", "clac", X86_FEATURE_SMAP)
+ ALTERNATIVE("", "clac", X86_FEATURE_SMAP)
#define ASM_STAC \
- ALTERNATIVE(ANNOTATE_IGNORE_ALTERNATIVE "", "stac", X86_FEATURE_SMAP)
+ ALTERNATIVE("", "stac", X86_FEATURE_SMAP)
#define ASM_CLAC_UNSAFE \
ALTERNATIVE("", ANNOTATE_IGNORE_ALTERNATIVE "clac", X86_FEATURE_SMAP)
diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index 69f94bc47499..b649049b6a11 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -3505,6 +3505,34 @@ static struct instruction *next_insn_to_validate(struct objtool_file *file,
return next_insn_same_sec(file, alt_group->orig_group->last_insn);
}
+static bool skip_alt_group(struct instruction *insn)
+{
+ struct instruction *alt_insn = insn->alts ? insn->alts->insn : NULL;
+
+ /* ANNOTATE_IGNORE_ALTERNATIVE */
+ if (insn->alt_group && insn->alt_group->ignore)
+ return true;
+
+ /*
+ * For NOP patched with CLAC/STAC, only follow the latter to avoid
+ * impossible code paths combining patched CLAC with unpatched STAC
+ * or vice versa.
+ *
+ * ANNOTATE_IGNORE_ALTERNATIVE could have been used here, but Linus
+ * requested not to do that to avoid hurting .s file readability
+ * around CLAC/STAC alternative sites.
+ */
+
+ if (!alt_insn)
+ return false;
+
+ /* Don't override ASM_{CLAC,STAC}_UNSAFE */
+ if (alt_insn->alt_group && alt_insn->alt_group->ignore)
+ return false;
+
+ return alt_insn->type == INSN_CLAC || alt_insn->type == INSN_STAC;
+}
+
/*
* Follow the branch starting at the given instruction, and recursively follow
* any other branches (jumps). Meanwhile, track the frame pointer state at
@@ -3625,7 +3653,7 @@ static int validate_branch(struct objtool_file *file, struct symbol *func,
}
}
- if (insn->alt_group && insn->alt_group->ignore)
+ if (skip_alt_group(insn))
return 0;
if (handle_insn_ops(insn, next_insn, &state))
--
2.49.0
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH RFC 2/5] objtool, x86/hweight: Remove ANNOTATE_IGNORE_ALTERNATIVE
2025-04-08 8:21 [PATCH RFC 0/5] x86/asm: Improve code generation Josh Poimboeuf
2025-04-08 8:21 ` [PATCH RFC 1/5] objtool: Remove ANNOTATE_IGNORE_ALTERNATIVE from CLAC/STAC Josh Poimboeuf
@ 2025-04-08 8:21 ` Josh Poimboeuf
2025-04-08 8:21 ` [PATCH RFC 3/5] noinstr: Use asm_inline() in instrumentation_{begin,end}() Josh Poimboeuf
` (2 subsequent siblings)
4 siblings, 0 replies; 20+ messages in thread
From: Josh Poimboeuf @ 2025-04-08 8:21 UTC (permalink / raw)
To: x86; +Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Peter Zijlstra
Since objtool's inception, frame pointer warnings have been manually
silenced for __arch_hweight*() to allow those functions' inline asm to
avoid using ASM_CALL_CONSTRAINT.
The potentially dubious reasoning for that decision over nine years ago
was that since !X86_FEATURE_POPCNT is exceedingly rare, it's not worth
hurting the code layout for a function call that will never happen on
the vast majority of systems.
However, those functions actually started using ASM_CALL_CONSTRAINT with
the following commit:
194a613088a8 ("x86/hweight: Use ASM_CALL_CONSTRAINT in inline asm()")
And rightfully so, as it makes the code correct. ASM_CALL_CONSTRAINT
will soon have no effect for non-FP configs anyway.
With ASM_CALL_CONSTRAINT in place, ANNOTATE_IGNORE_ALTERNATIVE no longer
has a purpose for the hweight functions. Remove it.
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
---
arch/x86/include/asm/arch_hweight.h | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/arch/x86/include/asm/arch_hweight.h b/arch/x86/include/asm/arch_hweight.h
index cbc6157f0b4b..b5982b94bdba 100644
--- a/arch/x86/include/asm/arch_hweight.h
+++ b/arch/x86/include/asm/arch_hweight.h
@@ -16,8 +16,7 @@ static __always_inline unsigned int __arch_hweight32(unsigned int w)
{
unsigned int res;
- asm_inline (ALTERNATIVE(ANNOTATE_IGNORE_ALTERNATIVE
- "call __sw_hweight32",
+ asm_inline (ALTERNATIVE("call __sw_hweight32",
"popcntl %[val], %[cnt]", X86_FEATURE_POPCNT)
: [cnt] "=" REG_OUT (res), ASM_CALL_CONSTRAINT
: [val] REG_IN (w));
@@ -46,8 +45,7 @@ static __always_inline unsigned long __arch_hweight64(__u64 w)
{
unsigned long res;
- asm_inline (ALTERNATIVE(ANNOTATE_IGNORE_ALTERNATIVE
- "call __sw_hweight64",
+ asm_inline (ALTERNATIVE("call __sw_hweight64",
"popcntq %[val], %[cnt]", X86_FEATURE_POPCNT)
: [cnt] "=" REG_OUT (res), ASM_CALL_CONSTRAINT
: [val] REG_IN (w));
--
2.49.0
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH RFC 3/5] noinstr: Use asm_inline() in instrumentation_{begin,end}()
2025-04-08 8:21 [PATCH RFC 0/5] x86/asm: Improve code generation Josh Poimboeuf
2025-04-08 8:21 ` [PATCH RFC 1/5] objtool: Remove ANNOTATE_IGNORE_ALTERNATIVE from CLAC/STAC Josh Poimboeuf
2025-04-08 8:21 ` [PATCH RFC 2/5] objtool, x86/hweight: Remove ANNOTATE_IGNORE_ALTERNATIVE Josh Poimboeuf
@ 2025-04-08 8:21 ` Josh Poimboeuf
2025-04-08 8:30 ` Ingo Molnar
2025-04-08 8:21 ` [PATCH RFC 4/5] x86/alternative: Improve code generation readability Josh Poimboeuf
2025-04-08 8:21 ` [PATCH RFC 5/5] objtool: " Josh Poimboeuf
4 siblings, 1 reply; 20+ messages in thread
From: Josh Poimboeuf @ 2025-04-08 8:21 UTC (permalink / raw)
To: x86; +Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Peter Zijlstra
Use asm_inline() to prevent the compiler from making poor inlining
decisions based on the length of the objtool annotations.
For a defconfig kernel built with GCC 14.2.1, bloat-o-meter reports a
0.18% text size increase:
add/remove: 88/433 grow/shrink: 967/487 up/down: 87579/-52630 (34949)
Total: Before=19448407, After=19483356, chg +0.18%
Presumably the text growth is due to increased inlining. A net total of
345 functions were removed.
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
---
include/linux/instrumentation.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/include/linux/instrumentation.h b/include/linux/instrumentation.h
index bf675a8aef8a..b1007407d272 100644
--- a/include/linux/instrumentation.h
+++ b/include/linux/instrumentation.h
@@ -9,7 +9,7 @@
/* Begin/end of an instrumentation safe region */
#define __instrumentation_begin(c) ({ \
- asm volatile(__stringify(c) ": nop\n\t" \
+ asm_inline volatile(__stringify(c) ": nop\n\t" \
ANNOTATE_INSTR_BEGIN(__ASM_BREF(c)) \
: : "i" (c)); \
})
@@ -47,7 +47,7 @@
* part of the condition block and does not escape.
*/
#define __instrumentation_end(c) ({ \
- asm volatile(__stringify(c) ": nop\n\t" \
+ asm_inline volatile(__stringify(c) ": nop\n\t" \
ANNOTATE_INSTR_END(__ASM_BREF(c)) \
: : "i" (c)); \
})
--
2.49.0
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH RFC 4/5] x86/alternative: Improve code generation readability
2025-04-08 8:21 [PATCH RFC 0/5] x86/asm: Improve code generation Josh Poimboeuf
` (2 preceding siblings ...)
2025-04-08 8:21 ` [PATCH RFC 3/5] noinstr: Use asm_inline() in instrumentation_{begin,end}() Josh Poimboeuf
@ 2025-04-08 8:21 ` Josh Poimboeuf
2025-04-09 14:38 ` Peter Zijlstra
2025-04-08 8:21 ` [PATCH RFC 5/5] objtool: " Josh Poimboeuf
4 siblings, 1 reply; 20+ messages in thread
From: Josh Poimboeuf @ 2025-04-08 8:21 UTC (permalink / raw)
To: x86; +Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Peter Zijlstra
Improve the readability and compactness of alternatives code.
---------------------
ALTERNATIVE() before:
---------------------
# ALT: oldinstr
771:
rep movsb
772:
# ALT: padding
.skip -(((775f-774f)-(772b-771b)) > 0) * ((775f-774f)-(772b-771b)),0x90
773:
.pushsection .altinstructions,"a"
.long 771b - .
.long 774f - .
.4byte (((1 << 0) << 16) | ((18*32+ 4)))
.byte 773b-771b
.byte 775f-774f
.popsection
.pushsection .altinstr_replacement, "ax"
# ALT: replacement
774:
call rep_movs_alternative
775:
.popsection
--------------------
ALTERNATIVE() after:
--------------------
# <ALTERNATIVE>
771: rep movsb
772: .skip -(((775f-774f)-(772b-771b)) > 0) * ((775f-774f)-(772b-771b)),0x90
773:
# ALT ENTRY:
.pushsection .altinstructions,"a"; .long 771b - .; .long 774f - .; .4byte (((1 << 0) << 16) | ((18*32+ 4))); .byte 773b-771b; .byte 775f-774f; .popsection
# ALT REPLACEMENT:
.pushsection .altinstr_replacement, "ax"
774: call rep_movs_alternative
775:
.popsection
# </ALTERNATIVE>
-----------------------
ALTERNATIVE_2() before:
-----------------------
# ALT: oldinstr
771:
# ALT: oldinstr
771:
jmp 6f
772:
# ALT: padding
.skip -(((775f-774f)-(772b-771b)) > 0) * ((775f-774f)-(772b-771b)),0x90
773:
.pushsection .altinstructions,"a"
.long 771b - .
.long 774f - .
.4byte ( 3*32+21)
.byte 773b-771b
.byte 775f-774f
.popsection
.pushsection .altinstr_replacement, "ax"
# ALT: replacement
774:
jmp .L4 #
775:
.popsection
772:
# ALT: padding
.skip -(((775f-774f)-(772b-771b)) > 0) * ((775f-774f)-(772b-771b)),0x90
773:
.pushsection .altinstructions,"a"
.long 771b - .
.long 774f - .
.4byte 297 #
.byte 773b-771b
.byte 775f-774f
.popsection
.pushsection .altinstr_replacement, "ax"
# ALT: replacement
774:
775:
.popsection
----------------------
ALTERNATIVE_2() after:
----------------------
# <ALTERNATIVE_2>
771:
771: jmp 6f
772: .skip -(((775f-774f)-(772b-771b)) > 0) * ((775f-774f)-(772b-771b)),0x90
773:
# ALT ENTRY:
.pushsection .altinstructions,"a"; .long 771b - .; .long 774f - .; .4byte ( 3*32+21); .byte 773b-771b; .byte 775f-774f; .popsection
# ALT REPLACEMENT:
.pushsection .altinstr_replacement, "ax"
774: jmp .L4 #
775:
.popsection
772: .skip -(((775f-774f)-(772b-771b)) > 0) * ((775f-774f)-(772b-771b)),0x90
773:
# ALT ENTRY:
.pushsection .altinstructions,"a"; .long 771b - .; .long 774f - .; .4byte 297; .byte 773b-771b; .byte 775f-774f; .popsection #
# ALT REPLACEMENT:
.pushsection .altinstr_replacement, "ax"
774:
775:
.popsection
# </ALTERNATIVE_2>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
---
arch/x86/include/asm/alternative.h | 88 +++++++++++++++++++++---------
1 file changed, 63 insertions(+), 25 deletions(-)
diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h
index 4a37a8bd87fd..6472d53625dc 100644
--- a/arch/x86/include/asm/alternative.h
+++ b/arch/x86/include/asm/alternative.h
@@ -151,46 +151,84 @@ static inline int alternatives_text_reserved(void *start, void *end)
#define alt_rlen "775f-774f"
#define OLDINSTR(oldinstr) \
- "# ALT: oldinstr\n" \
- "771:\n\t" oldinstr "\n772:\n" \
- "# ALT: padding\n" \
- ".skip -(((" alt_rlen ")-(" alt_slen ")) > 0) * " \
- "((" alt_rlen ")-(" alt_slen ")),0x90\n" \
+ "771:" oldinstr "\n" \
+ "772:\t.skip -(((" alt_rlen ")-(" alt_slen ")) > 0) * " \
+ "((" alt_rlen ")-(" alt_slen ")),0x90\n" \
"773:\n"
-#define ALTINSTR_ENTRY(ft_flags) \
- ".pushsection .altinstructions,\"a\"\n" \
- " .long 771b - .\n" /* label */ \
- " .long 774f - .\n" /* new instruction */ \
- " .4byte " __stringify(ft_flags) "\n" /* feature + flags */ \
- " .byte " alt_total_slen "\n" /* source len */ \
- " .byte " alt_rlen "\n" /* replacement len */ \
+#define ALTINSTR_ENTRY(ft_flags) \
+ "# ALT ENTRY:\n" \
+ ".pushsection .altinstructions,\"a\"; " \
+ " .long 771b - .; " /* label */ \
+ " .long 774f - .; " /* new instruction */ \
+ " .4byte " __stringify(ft_flags) "; " /* feature + flags */ \
+ " .byte " alt_total_slen "; " /* source len */ \
+ " .byte " alt_rlen "; " /* replacement len */ \
".popsection\n"
-#define ALTINSTR_REPLACEMENT(newinstr) /* replacement */ \
+#define ALTINSTR_REPLACEMENT(newinstr) \
+ "# ALT REPLACEMENT:\n" \
".pushsection .altinstr_replacement, \"ax\"\n" \
- "# ALT: replacement\n" \
- "774:\n\t" newinstr "\n775:\n" \
- ".popsection\n"
+ "774:\t" newinstr "\n" \
+ "775:\n" \
+ ".popsection"
-/* alternative assembly primitive: */
-#define ALTERNATIVE(oldinstr, newinstr, ft_flags) \
+
+#define __ALTERNATIVE(oldinstr, newinstr, ft_flags) \
OLDINSTR(oldinstr) \
ALTINSTR_ENTRY(ft_flags) \
ALTINSTR_REPLACEMENT(newinstr)
-#define ALTERNATIVE_2(oldinstr, newinstr1, ft_flags1, newinstr2, ft_flags2) \
- ALTERNATIVE(ALTERNATIVE(oldinstr, newinstr1, ft_flags1), newinstr2, ft_flags2)
+#define __ALTERNATIVE_2(oldinstr, \
+ newinstr1, ft_flags1, \
+ newinstr2, ft_flags2) \
+ __ALTERNATIVE("\n" \
+ __ALTERNATIVE(oldinstr, \
+ newinstr1, ft_flags1), \
+ newinstr2, ft_flags2)
+
+#define __ALTERNATIVE_3(oldinstr, \
+ newinstr1, ft_flags1, \
+ newinstr2, ft_flags2, \
+ newinstr3, ft_flags3) \
+ __ALTERNATIVE("\n" \
+ __ALTERNATIVE_2(oldinstr, \
+ newinstr1, ft_flags1, \
+ newinstr2, ft_flags2), \
+ newinstr3, ft_flags3)
+
+
+#define ALTERNATIVE(oldinstr, newinstr, ft_flags) \
+ "\n# <ALTERNATIVE>\n" \
+ __ALTERNATIVE("\t" oldinstr, newinstr, ft_flags) \
+ "\n# </ALTERNATIVE>\n"
+
+#define ALTERNATIVE_2(oldinstr, \
+ newinstr1, ft_flags1, \
+ newinstr2, ft_flags2) \
+ "\n# <ALTERNATIVE_2>\n" \
+ __ALTERNATIVE_2("\t" \
+ oldinstr, \
+ newinstr1, ft_flags1, \
+ newinstr2, ft_flags2) \
+ "\n# </ALTERNATIVE_2>\n"
+
+#define ALTERNATIVE_3(oldinstr, \
+ newinstr1, ft_flags1, \
+ newinstr2, ft_flags2, \
+ newinstr3, ft_flags3) \
+ "\n# <ALTERNATIVE_3>\n" \
+ __ALTERNATIVE_3("\t" \
+ oldinstr, \
+ newinstr1, ft_flags1, \
+ newinstr2, ft_flags2, \
+ newinstr3, ft_flags3) \
+ "\n# </ALTERNATIVE_3>\n"
/* If @feature is set, patch in @newinstr_yes, otherwise @newinstr_no. */
#define ALTERNATIVE_TERNARY(oldinstr, ft_flags, newinstr_yes, newinstr_no) \
ALTERNATIVE_2(oldinstr, newinstr_no, X86_FEATURE_ALWAYS, newinstr_yes, ft_flags)
-#define ALTERNATIVE_3(oldinstr, newinstr1, ft_flags1, newinstr2, ft_flags2, \
- newinstr3, ft_flags3) \
- ALTERNATIVE(ALTERNATIVE_2(oldinstr, newinstr1, ft_flags1, newinstr2, ft_flags2), \
- newinstr3, ft_flags3)
-
/*
* Alternative instructions for different CPU types or capabilities.
*
--
2.49.0
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH RFC 5/5] objtool: Improve code generation readability
2025-04-08 8:21 [PATCH RFC 0/5] x86/asm: Improve code generation Josh Poimboeuf
` (3 preceding siblings ...)
2025-04-08 8:21 ` [PATCH RFC 4/5] x86/alternative: Improve code generation readability Josh Poimboeuf
@ 2025-04-08 8:21 ` Josh Poimboeuf
2025-04-09 14:40 ` Peter Zijlstra
4 siblings, 1 reply; 20+ messages in thread
From: Josh Poimboeuf @ 2025-04-08 8:21 UTC (permalink / raw)
To: x86; +Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Peter Zijlstra
Improve the readability and compactness of the objtool annotations.
This makes it easier to see them and differentiate from other code.
Before:
-------
911:
.pushsection .discard.annotate_insn,"M",@progbits,8
.long 911b - .
.long 1
.popsection
After:
------
# <ANNOTATE_NOENDBR>
911: .pushsection .discard.annotate_insn,"M",@progbits,8; .long 911b - .; .long 1; .popsection
# </ANNOTATE_NOENDBR>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
---
include/linux/objtool.h | 97 ++++++++++++++++++++---------------------
1 file changed, 47 insertions(+), 50 deletions(-)
diff --git a/include/linux/objtool.h b/include/linux/objtool.h
index 366ad004d794..66549603147e 100644
--- a/include/linux/objtool.h
+++ b/include/linux/objtool.h
@@ -8,19 +8,31 @@
#include <asm/asm.h>
+#define __UNWIND_HINT(label, type, sp_reg, sp_offset, signal) \
+ .pushsection .discard.unwind_hints; \
+ /* struct unwind_hint */ \
+ .long label - .; \
+ .short sp_offset; \
+ .byte sp_reg; \
+ .byte type; \
+ .byte signal; \
+ .balign 4; \
+ .popsection
+
+#define __ASM_ANNOTATE(label, type) \
+ .pushsection .discard.annotate_insn,"M",@progbits,8; \
+ .long label - .; \
+ .long type; \
+ .popsection
+
#ifndef __ASSEMBLY__
-#define UNWIND_HINT(type, sp_reg, sp_offset, signal) \
- "987: \n\t" \
- ".pushsection .discard.unwind_hints\n\t" \
- /* struct unwind_hint */ \
- ".long 987b - .\n\t" \
- ".short " __stringify(sp_offset) "\n\t" \
- ".byte " __stringify(sp_reg) "\n\t" \
- ".byte " __stringify(type) "\n\t" \
- ".byte " __stringify(signal) "\n\t" \
- ".balign 4 \n\t" \
- ".popsection\n\t"
+#define UNWIND_HINT(type, sp_reg, sp_offset, signal) \
+ "\n# <UNWIND_HINT>\n" \
+ "987: " \
+ __stringify(__UNWIND_HINT(987b, type, sp_reg, \
+ sp_offset, signal)) \
+ "\n# </UNWIND_HINT>\n\t"
/*
* This macro marks the given function's stack frame as "non-standard", which
@@ -45,23 +57,18 @@
#define STACK_FRAME_NON_STANDARD_FP(func)
#endif
-#define ASM_REACHABLE \
- "998:\n\t" \
- ".pushsection .discard.reachable\n\t" \
- ".long 998b\n\t" \
- ".popsection\n\t"
-
#define __ASM_BREF(label) label ## b
-#define __ASM_ANNOTATE(label, type) \
- ".pushsection .discard.annotate_insn,\"M\",@progbits,8\n\t" \
- ".long " __stringify(label) " - .\n\t" \
- ".long " __stringify(type) "\n\t" \
- ".popsection\n\t"
-
#define ASM_ANNOTATE(type) \
- "911:\n\t" \
- __ASM_ANNOTATE(911b, type)
+ "\n# <ANNOTATE_" __stringify(type) ">\n" \
+ "911:\t" \
+ __stringify(__ASM_ANNOTATE(911b, __PASTE(ANNOTYPE_, type))) \
+ "\n# </ANNOTATE_" __stringify(type) ">\n\t"
+
+#define ASM_ANNOTATE_LABEL(label, type) \
+ "\n# BEGIN ANNOTATE_" __stringify(type) "\n" \
+ __stringify(__ASM_ANNOTATE(label, __PASTE(ANNOTYPE_, type))) \
+ "\n# </ANNOTATE_" __stringify(type) "\n\t"
#else /* __ASSEMBLY__ */
@@ -88,15 +95,7 @@
*/
.macro UNWIND_HINT type:req sp_reg=0 sp_offset=0 signal=0
.Lhere_\@:
- .pushsection .discard.unwind_hints
- /* struct unwind_hint */
- .long .Lhere_\@ - .
- .short \sp_offset
- .byte \sp_reg
- .byte \type
- .byte \signal
- .balign 4
- .popsection
+ __UNWIND_HINT(.Lhere_\@, \type, \sp_reg, \sp_offset, \signal)
.endm
.macro STACK_FRAME_NON_STANDARD func:req
@@ -113,10 +112,7 @@
.macro ANNOTATE type:req
.Lhere_\@:
- .pushsection .discard.annotate_insn,"M",@progbits,8
- .long .Lhere_\@ - .
- .long \type
- .popsection
+ __ASM_ANNOTATE(.Lhere_\@, \type)
.endm
#endif /* __ASSEMBLY__ */
@@ -125,11 +121,12 @@
#ifndef __ASSEMBLY__
-#define UNWIND_HINT(type, sp_reg, sp_offset, signal) "\n\t"
#define STACK_FRAME_NON_STANDARD(func)
#define STACK_FRAME_NON_STANDARD_FP(func)
-#define __ASM_ANNOTATE(label, type) ""
-#define ASM_ANNOTATE(type)
+
+#define UNWIND_HINT(type, sp_reg, sp_offset, signal) ""
+#define ASM_ANNOTATE(type) ""
+#define ASM_ANNOTATE_LABEL(sym, type) ""
#else
.macro UNWIND_HINT type:req sp_reg=0 sp_offset=0 signal=0
.endm
@@ -146,30 +143,30 @@
* Annotate away the various 'relocation to !ENDBR` complaints; knowing that
* these relocations will never be used for indirect calls.
*/
-#define ANNOTATE_NOENDBR ASM_ANNOTATE(ANNOTYPE_NOENDBR)
-#define ANNOTATE_NOENDBR_SYM(sym) asm(__ASM_ANNOTATE(sym, ANNOTYPE_NOENDBR))
+#define ANNOTATE_NOENDBR ASM_ANNOTATE(NOENDBR)
+#define ANNOTATE_NOENDBR_SYM(sym) asm(ASM_ANNOTATE_LABEL(sym, NOENDBR))
/*
* This should be used immediately before an indirect jump/call. It tells
* objtool the subsequent indirect jump/call is vouched safe for retpoline
* builds.
*/
-#define ANNOTATE_RETPOLINE_SAFE ASM_ANNOTATE(ANNOTYPE_RETPOLINE_SAFE)
+#define ANNOTATE_RETPOLINE_SAFE ASM_ANNOTATE(RETPOLINE_SAFE)
/*
* See linux/instrumentation.h
*/
-#define ANNOTATE_INSTR_BEGIN(label) __ASM_ANNOTATE(label, ANNOTYPE_INSTR_BEGIN)
-#define ANNOTATE_INSTR_END(label) __ASM_ANNOTATE(label, ANNOTYPE_INSTR_END)
+#define ANNOTATE_INSTR_BEGIN(label) ASM_ANNOTATE_LABEL(label, INSTR_BEGIN)
+#define ANNOTATE_INSTR_END(label) ASM_ANNOTATE_LABEL(label, INSTR_END)
/*
* objtool annotation to ignore the alternatives and only consider the original
* instruction(s).
*/
-#define ANNOTATE_IGNORE_ALTERNATIVE ASM_ANNOTATE(ANNOTYPE_IGNORE_ALTS)
+#define ANNOTATE_IGNORE_ALTERNATIVE ASM_ANNOTATE(IGNORE_ALTS)
/*
* This macro indicates that the following intra-function call is valid.
* Any non-annotated intra-function call will cause objtool to issue a warning.
*/
-#define ANNOTATE_INTRA_FUNCTION_CALL ASM_ANNOTATE(ANNOTYPE_INTRA_FUNCTION_CALL)
+#define ANNOTATE_INTRA_FUNCTION_CALL ASM_ANNOTATE(INTRA_FUNCTION_CALL)
/*
* Use objtool to validate the entry requirement that all code paths do
* VALIDATE_UNRET_END before RET.
@@ -177,13 +174,13 @@
* NOTE: The macro must be used at the beginning of a global symbol, otherwise
* it will be ignored.
*/
-#define ANNOTATE_UNRET_BEGIN ASM_ANNOTATE(ANNOTYPE_UNRET_BEGIN)
+#define ANNOTATE_UNRET_BEGIN ASM_ANNOTATE(UNRET_BEGIN)
/*
* This should be used to refer to an instruction that is considered
* terminating, like a noreturn CALL or UD2 when we know they are not -- eg
* WARN using UD2.
*/
-#define ANNOTATE_REACHABLE(label) __ASM_ANNOTATE(label, ANNOTYPE_REACHABLE)
+#define ANNOTATE_REACHABLE(label) ASM_ANNOTATE_LABEL(label, REACHABLE)
#else
#define ANNOTATE_NOENDBR ANNOTATE type=ANNOTYPE_NOENDBR
--
2.49.0
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH RFC 3/5] noinstr: Use asm_inline() in instrumentation_{begin,end}()
2025-04-08 8:21 ` [PATCH RFC 3/5] noinstr: Use asm_inline() in instrumentation_{begin,end}() Josh Poimboeuf
@ 2025-04-08 8:30 ` Ingo Molnar
2025-04-08 11:10 ` Uros Bizjak
0 siblings, 1 reply; 20+ messages in thread
From: Ingo Molnar @ 2025-04-08 8:30 UTC (permalink / raw)
To: Josh Poimboeuf
Cc: x86, linux-kernel, Linus Torvalds, Peter Zijlstra, Uros Bizjak
* Josh Poimboeuf <jpoimboe@kernel.org> wrote:
> Use asm_inline() to prevent the compiler from making poor inlining
> decisions based on the length of the objtool annotations.
>
> For a defconfig kernel built with GCC 14.2.1, bloat-o-meter reports a
> 0.18% text size increase:
>
> add/remove: 88/433 grow/shrink: 967/487 up/down: 87579/-52630 (34949)
> Total: Before=19448407, After=19483356, chg +0.18%
>
> Presumably the text growth is due to increased inlining. A net total of
> 345 functions were removed.
Since +0.18% puts this into the 'significant' category of .text size
increases, it would be nice to see a bit more details about the nature
of these function calls removed: were they really explicit calls to
__instrumentation_begin()/end(), or somehow tail-call optimized out, or
something else?
Also, I'm wondering where the 34,949 bytes bloat comes from: with 345
functions removed that's 100 bytes per function? Doesn't sound right.
Also, is the bloat-o-meter output limited to the .text section, or does
it include growth in out-of-line sections too?
Thanks,
Ingo
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH RFC 3/5] noinstr: Use asm_inline() in instrumentation_{begin,end}()
2025-04-08 8:30 ` Ingo Molnar
@ 2025-04-08 11:10 ` Uros Bizjak
2025-04-08 16:46 ` Josh Poimboeuf
0 siblings, 1 reply; 20+ messages in thread
From: Uros Bizjak @ 2025-04-08 11:10 UTC (permalink / raw)
To: Ingo Molnar
Cc: Josh Poimboeuf, x86, linux-kernel, Linus Torvalds, Peter Zijlstra
On Tue, Apr 8, 2025 at 10:30 AM Ingo Molnar <mingo@kernel.org> wrote:
>
>
> * Josh Poimboeuf <jpoimboe@kernel.org> wrote:
>
> > Use asm_inline() to prevent the compiler from making poor inlining
> > decisions based on the length of the objtool annotations.
> >
> > For a defconfig kernel built with GCC 14.2.1, bloat-o-meter reports a
> > 0.18% text size increase:
> >
> > add/remove: 88/433 grow/shrink: 967/487 up/down: 87579/-52630 (34949)
> > Total: Before=19448407, After=19483356, chg +0.18%
> >
> > Presumably the text growth is due to increased inlining. A net total of
> > 345 functions were removed.
>
> Since +0.18% puts this into the 'significant' category of .text size
> increases, it would be nice to see a bit more details about the nature
> of these function calls removed: were they really explicit calls to
> __instrumentation_begin()/end(), or somehow tail-call optimized out, or
> something else?
This increase is mainly due to different inlining decisions.
> Also, I'm wondering where the 34,949 bytes bloat comes from: with 345
> functions removed that's 100 bytes per function? Doesn't sound right.
Please note that removed functions can be inlined at several places. E.g.:
$ grep "<encode_string>" objdump.old
00000000004506e0 <encode_string>:
45113c: e8 9f f5 ff ff call 4506e0 <encode_string>
452bcb: e9 10 db ff ff jmp 4506e0 <encode_string>
453d33: e8 a8 c9 ff ff call 4506e0 <encode_string>
453ef7: e8 e4 c7 ff ff call 4506e0 <encode_string>
45549f: e8 3c b2 ff ff call 4506e0 <encode_string>
455843: e8 98 ae ff ff call 4506e0 <encode_string>
455b37: e8 a4 ab ff ff call 4506e0 <encode_string>
455b47: e8 94 ab ff ff call 4506e0 <encode_string>
4564fa: e8 e1 a1 ff ff call 4506e0 <encode_string>
456669: e8 72 a0 ff ff call 4506e0 <encode_string>
456691: e8 4a a0 ff ff call 4506e0 <encode_string>
4566a0: e8 3b a0 ff ff call 4506e0 <encode_string>
4569aa: e8 31 9d ff ff call 4506e0 <encode_string>
456e79: e9 62 98 ff ff jmp 4506e0 <encode_string>
456efe: e9 dd 97 ff ff jmp 4506e0 <encode_string>
all these calls now inline:
encode_string 58 - -58
where for example encode_putfh() grows by:
encode_putfh 70 118 +48
> Also, is the bloat-o-meter output limited to the .text section, or does
> it include growth in out-of-line sections too?
bloat-o-meter by default looks at all sybol types ("tTdDbBrRvVwW" as
returned by nm). Adding "-c" option will categorize the output by
symbol type (this is x86_64 defconfig change with gcc-4.2.1):
add/remove: 72/350 grow/shrink: 918/348 up/down: 80532/-46857 (33675)
Function old new delta
...
Total: Before=16685068, After=16718743, chg +0.20%
add/remove: 0/0 grow/shrink: 0/0 up/down: 0/0 (0)
Data old new delta
Total: Before=4471889, After=4471889, chg +0.00%
add/remove: 0/1 grow/shrink: 0/0 up/down: 0/-12 (-12)
RO Data old new delta
icl_voltage_level_max_cdclk 12 - -12
Total: Before=1783310, After=1783298, chg -0.00%
Uros.
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH RFC 3/5] noinstr: Use asm_inline() in instrumentation_{begin,end}()
2025-04-08 11:10 ` Uros Bizjak
@ 2025-04-08 16:46 ` Josh Poimboeuf
0 siblings, 0 replies; 20+ messages in thread
From: Josh Poimboeuf @ 2025-04-08 16:46 UTC (permalink / raw)
To: Uros Bizjak
Cc: Ingo Molnar, x86, linux-kernel, Linus Torvalds, Peter Zijlstra
On Tue, Apr 08, 2025 at 01:10:14PM +0200, Uros Bizjak wrote:
> On Tue, Apr 8, 2025 at 10:30 AM Ingo Molnar <mingo@kernel.org> wrote:
> > * Josh Poimboeuf <jpoimboe@kernel.org> wrote:
> >
> > > Use asm_inline() to prevent the compiler from making poor inlining
> > > decisions based on the length of the objtool annotations.
> > >
> > > For a defconfig kernel built with GCC 14.2.1, bloat-o-meter reports a
> > > 0.18% text size increase:
> > >
> > > add/remove: 88/433 grow/shrink: 967/487 up/down: 87579/-52630 (34949)
> > > Total: Before=19448407, After=19483356, chg +0.18%
> > >
> > > Presumably the text growth is due to increased inlining. A net total of
> > > 345 functions were removed.
> >
> > Since +0.18% puts this into the 'significant' category of .text size
> > increases, it would be nice to see a bit more details about the nature
> > of these function calls removed: were they really explicit calls to
> > __instrumentation_begin()/end(), or somehow tail-call optimized out, or
> > something else?
The instrumentation macros are used by WARN*() and lockdep, so this can
affect a lot of functions.
BTW, without the objtool annotations, each of those macros resolves to a
single NOP. So using inline_asm() seems obviously correct here as it
accurately communicates the code size to the compiler. I'm not sure if
that was clear from the description.
> > Also, I'm wondering where the 34,949 bytes bloat comes from: with 345
> > functions removed that's 100 bytes per function? Doesn't sound right.
>
> Please note that removed functions can be inlined at several places. E.g.:
>
> $ grep "<encode_string>" objdump.old
>
> 00000000004506e0 <encode_string>:
> 45113c: e8 9f f5 ff ff call 4506e0 <encode_string>
> 452bcb: e9 10 db ff ff jmp 4506e0 <encode_string>
> 453d33: e8 a8 c9 ff ff call 4506e0 <encode_string>
> 453ef7: e8 e4 c7 ff ff call 4506e0 <encode_string>
> 45549f: e8 3c b2 ff ff call 4506e0 <encode_string>
> 455843: e8 98 ae ff ff call 4506e0 <encode_string>
> 455b37: e8 a4 ab ff ff call 4506e0 <encode_string>
> 455b47: e8 94 ab ff ff call 4506e0 <encode_string>
> 4564fa: e8 e1 a1 ff ff call 4506e0 <encode_string>
> 456669: e8 72 a0 ff ff call 4506e0 <encode_string>
> 456691: e8 4a a0 ff ff call 4506e0 <encode_string>
> 4566a0: e8 3b a0 ff ff call 4506e0 <encode_string>
> 4569aa: e8 31 9d ff ff call 4506e0 <encode_string>
> 456e79: e9 62 98 ff ff jmp 4506e0 <encode_string>
> 456efe: e9 dd 97 ff ff jmp 4506e0 <encode_string>
>
> all these calls now inline:
>
> encode_string 58 - -58
>
> where for example encode_putfh() grows by:
>
> encode_putfh 70 118 +48
Thanks for looking! That makes sense: encode_string() uses
WARN_ON_ONCE() which uses instrumentation_begin()/end().
encode_string() is getting inlined now that GCC has more accurate
information about its size.
> > Also, is the bloat-o-meter output limited to the .text section, or does
> > it include growth in out-of-line sections too?
>
> bloat-o-meter by default looks at all sybol types ("tTdDbBrRvVwW" as
> returned by nm). Adding "-c" option will categorize the output by
> symbol type (this is x86_64 defconfig change with gcc-4.2.1):
>
> add/remove: 72/350 grow/shrink: 918/348 up/down: 80532/-46857 (33675)
> Function old new delta
> ...
> Total: Before=16685068, After=16718743, chg +0.20%
> add/remove: 0/0 grow/shrink: 0/0 up/down: 0/0 (0)
> Data old new delta
> Total: Before=4471889, After=4471889, chg +0.00%
> add/remove: 0/1 grow/shrink: 0/0 up/down: 0/-12 (-12)
> RO Data old new delta
> icl_voltage_level_max_cdclk 12 - -12
> Total: Before=1783310, After=1783298, chg -0.00%
Right. That means that bloat-o-meter's results include any
text.unlikely growth/shrinkage, because that code is contained by
symbols (typically .cold subfunctons).
I suppose it would be more helpful if .text.unlikely were excluded or
separated out. .text.unlikely code growth is always a good thing, as
opposed to .text for which growth can be good or bad.
--
Josh
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH RFC 1/5] objtool: Remove ANNOTATE_IGNORE_ALTERNATIVE from CLAC/STAC
2025-04-08 8:21 ` [PATCH RFC 1/5] objtool: Remove ANNOTATE_IGNORE_ALTERNATIVE from CLAC/STAC Josh Poimboeuf
@ 2025-04-08 18:07 ` Linus Torvalds
2025-04-08 20:45 ` [tip: objtool/urgent] " tip-bot2 for Josh Poimboeuf
1 sibling, 0 replies; 20+ messages in thread
From: Linus Torvalds @ 2025-04-08 18:07 UTC (permalink / raw)
To: Josh Poimboeuf; +Cc: x86, linux-kernel, Ingo Molnar, Peter Zijlstra
On Tue, 8 Apr 2025 at 01:21, Josh Poimboeuf <jpoimboe@kernel.org> wrote:
>
> #define ASM_CLAC \
> - ALTERNATIVE __stringify(ANNOTATE_IGNORE_ALTERNATIVE), "clac", X86_FEATURE_SMAP
> + ALTERNATIVE "", "clac", X86_FEATURE_SMAP
>
> #define ASM_STAC \
> - ALTERNATIVE __stringify(ANNOTATE_IGNORE_ALTERNATIVE), "stac", X86_FEATURE_SMAP
> + ALTERNATIVE "", "stac", X86_FEATURE_SMAP
Thanks. I didn't actually test the patch, but it obviously fixes my
concerns, so I'm acking it without any testing what-so-ever.
Linus
^ permalink raw reply [flat|nested] 20+ messages in thread
* [tip: objtool/urgent] objtool: Remove ANNOTATE_IGNORE_ALTERNATIVE from CLAC/STAC
2025-04-08 8:21 ` [PATCH RFC 1/5] objtool: Remove ANNOTATE_IGNORE_ALTERNATIVE from CLAC/STAC Josh Poimboeuf
2025-04-08 18:07 ` Linus Torvalds
@ 2025-04-08 20:45 ` tip-bot2 for Josh Poimboeuf
1 sibling, 0 replies; 20+ messages in thread
From: tip-bot2 for Josh Poimboeuf @ 2025-04-08 20:45 UTC (permalink / raw)
To: linux-tip-commits
Cc: Linus Torvalds, Josh Poimboeuf, Ingo Molnar, x86, linux-kernel
The following commit has been merged into the objtool/urgent branch of tip:
Commit-ID: 2d12c6fb78753925f494ca9079e2383529e8ae0e
Gitweb: https://git.kernel.org/tip/2d12c6fb78753925f494ca9079e2383529e8ae0e
Author: Josh Poimboeuf <jpoimboe@kernel.org>
AuthorDate: Tue, 08 Apr 2025 01:21:14 -07:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Tue, 08 Apr 2025 22:03:51 +02:00
objtool: Remove ANNOTATE_IGNORE_ALTERNATIVE from CLAC/STAC
ANNOTATE_IGNORE_ALTERNATIVE adds additional noise to the code generated
by CLAC/STAC alternatives, hurting readability for those whose read
uaccess-related code generation on a regular basis.
Remove the annotation specifically for the "NOP patched with CLAC/STAC"
case in favor of a manual check.
Leave the other uses of that annotation in place as they're less common
and more difficult to detect.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/fc972ba4995d826fcfb8d02733a14be8d670900b.1744098446.git.jpoimboe@kernel.org
---
arch/x86/include/asm/smap.h | 12 ++++++------
tools/objtool/check.c | 30 +++++++++++++++++++++++++++++-
2 files changed, 35 insertions(+), 7 deletions(-)
diff --git a/arch/x86/include/asm/smap.h b/arch/x86/include/asm/smap.h
index 55a5e65..4f84d42 100644
--- a/arch/x86/include/asm/smap.h
+++ b/arch/x86/include/asm/smap.h
@@ -16,23 +16,23 @@
#ifdef __ASSEMBLER__
#define ASM_CLAC \
- ALTERNATIVE __stringify(ANNOTATE_IGNORE_ALTERNATIVE), "clac", X86_FEATURE_SMAP
+ ALTERNATIVE "", "clac", X86_FEATURE_SMAP
#define ASM_STAC \
- ALTERNATIVE __stringify(ANNOTATE_IGNORE_ALTERNATIVE), "stac", X86_FEATURE_SMAP
+ ALTERNATIVE "", "stac", X86_FEATURE_SMAP
#else /* __ASSEMBLER__ */
static __always_inline void clac(void)
{
/* Note: a barrier is implicit in alternative() */
- alternative(ANNOTATE_IGNORE_ALTERNATIVE "", "clac", X86_FEATURE_SMAP);
+ alternative("", "clac", X86_FEATURE_SMAP);
}
static __always_inline void stac(void)
{
/* Note: a barrier is implicit in alternative() */
- alternative(ANNOTATE_IGNORE_ALTERNATIVE "", "stac", X86_FEATURE_SMAP);
+ alternative("", "stac", X86_FEATURE_SMAP);
}
static __always_inline unsigned long smap_save(void)
@@ -59,9 +59,9 @@ static __always_inline void smap_restore(unsigned long flags)
/* These macros can be used in asm() statements */
#define ASM_CLAC \
- ALTERNATIVE(ANNOTATE_IGNORE_ALTERNATIVE "", "clac", X86_FEATURE_SMAP)
+ ALTERNATIVE("", "clac", X86_FEATURE_SMAP)
#define ASM_STAC \
- ALTERNATIVE(ANNOTATE_IGNORE_ALTERNATIVE "", "stac", X86_FEATURE_SMAP)
+ ALTERNATIVE("", "stac", X86_FEATURE_SMAP)
#define ASM_CLAC_UNSAFE \
ALTERNATIVE("", ANNOTATE_IGNORE_ALTERNATIVE "clac", X86_FEATURE_SMAP)
diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index 69f94bc..b649049 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -3505,6 +3505,34 @@ next_orig:
return next_insn_same_sec(file, alt_group->orig_group->last_insn);
}
+static bool skip_alt_group(struct instruction *insn)
+{
+ struct instruction *alt_insn = insn->alts ? insn->alts->insn : NULL;
+
+ /* ANNOTATE_IGNORE_ALTERNATIVE */
+ if (insn->alt_group && insn->alt_group->ignore)
+ return true;
+
+ /*
+ * For NOP patched with CLAC/STAC, only follow the latter to avoid
+ * impossible code paths combining patched CLAC with unpatched STAC
+ * or vice versa.
+ *
+ * ANNOTATE_IGNORE_ALTERNATIVE could have been used here, but Linus
+ * requested not to do that to avoid hurting .s file readability
+ * around CLAC/STAC alternative sites.
+ */
+
+ if (!alt_insn)
+ return false;
+
+ /* Don't override ASM_{CLAC,STAC}_UNSAFE */
+ if (alt_insn->alt_group && alt_insn->alt_group->ignore)
+ return false;
+
+ return alt_insn->type == INSN_CLAC || alt_insn->type == INSN_STAC;
+}
+
/*
* Follow the branch starting at the given instruction, and recursively follow
* any other branches (jumps). Meanwhile, track the frame pointer state at
@@ -3625,7 +3653,7 @@ static int validate_branch(struct objtool_file *file, struct symbol *func,
}
}
- if (insn->alt_group && insn->alt_group->ignore)
+ if (skip_alt_group(insn))
return 0;
if (handle_insn_ops(insn, next_insn, &state))
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH RFC 4/5] x86/alternative: Improve code generation readability
2025-04-08 8:21 ` [PATCH RFC 4/5] x86/alternative: Improve code generation readability Josh Poimboeuf
@ 2025-04-09 14:38 ` Peter Zijlstra
2025-04-09 17:41 ` Josh Poimboeuf
0 siblings, 1 reply; 20+ messages in thread
From: Peter Zijlstra @ 2025-04-09 14:38 UTC (permalink / raw)
To: Josh Poimboeuf; +Cc: x86, linux-kernel, Linus Torvalds, Ingo Molnar
On Tue, Apr 08, 2025 at 01:21:17AM -0700, Josh Poimboeuf wrote:
> Improve the readability and compactness of alternatives code.
>
> ---------------------
> ALTERNATIVE() before:
> ---------------------
>
> # ALT: oldinstr
> 771:
> rep movsb
> 772:
> # ALT: padding
> .skip -(((775f-774f)-(772b-771b)) > 0) * ((775f-774f)-(772b-771b)),0x90
> 773:
> .pushsection .altinstructions,"a"
> .long 771b - .
> .long 774f - .
> .4byte (((1 << 0) << 16) | ((18*32+ 4)))
> .byte 773b-771b
> .byte 775f-774f
> .popsection
> .pushsection .altinstr_replacement, "ax"
> # ALT: replacement
> 774:
> call rep_movs_alternative
> 775:
> .popsection
>
> --------------------
> ALTERNATIVE() after:
> --------------------
>
> # <ALTERNATIVE>
> 771: rep movsb
> 772: .skip -(((775f-774f)-(772b-771b)) > 0) * ((775f-774f)-(772b-771b)),0x90
> 773:
> # ALT ENTRY:
> .pushsection .altinstructions,"a"; .long 771b - .; .long 774f - .; .4byte (((1 << 0) << 16) | ((18*32+ 4))); .byte 773b-771b; .byte 775f-774f; .popsection
I find this very hard to read. I prefer the multi line form it had
before.
Other than that, I like the new layout.
> # ALT REPLACEMENT:
> .pushsection .altinstr_replacement, "ax"
> 774: call rep_movs_alternative
> 775:
> .popsection
> # </ALTERNATIVE>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH RFC 5/5] objtool: Improve code generation readability
2025-04-08 8:21 ` [PATCH RFC 5/5] objtool: " Josh Poimboeuf
@ 2025-04-09 14:40 ` Peter Zijlstra
0 siblings, 0 replies; 20+ messages in thread
From: Peter Zijlstra @ 2025-04-09 14:40 UTC (permalink / raw)
To: Josh Poimboeuf; +Cc: x86, linux-kernel, Linus Torvalds, Ingo Molnar
On Tue, Apr 08, 2025 at 01:21:18AM -0700, Josh Poimboeuf wrote:
> Improve the readability and compactness of the objtool annotations.
> This makes it easier to see them and differentiate from other code.
>
> Before:
> -------
>
> 911:
> .pushsection .discard.annotate_insn,"M",@progbits,8
> .long 911b - .
> .long 1
> .popsection
>
> After:
> ------
>
> # <ANNOTATE_NOENDBR>
> 911: .pushsection .discard.annotate_insn,"M",@progbits,8; .long 911b - .; .long 1; .popsection
> # </ANNOTATE_NOENDBR>
Conversely, I don't mind the compact form here too much, since the only
bit that really matters is the annotation type (1, in the above case)
and that is already explicit in the marker (ANNOTATE_NOENDBR).
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH RFC 4/5] x86/alternative: Improve code generation readability
2025-04-09 14:38 ` Peter Zijlstra
@ 2025-04-09 17:41 ` Josh Poimboeuf
2025-04-09 18:02 ` Linus Torvalds
0 siblings, 1 reply; 20+ messages in thread
From: Josh Poimboeuf @ 2025-04-09 17:41 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: x86, linux-kernel, Linus Torvalds, Ingo Molnar
On Wed, Apr 09, 2025 at 04:38:21PM +0200, Peter Zijlstra wrote:
> On Tue, Apr 08, 2025 at 01:21:17AM -0700, Josh Poimboeuf wrote:
> > Improve the readability and compactness of alternatives code.
> >
> > ---------------------
> > ALTERNATIVE() before:
> > ---------------------
> >
> > # ALT: oldinstr
> > 771:
> > rep movsb
> > 772:
> > # ALT: padding
> > .skip -(((775f-774f)-(772b-771b)) > 0) * ((775f-774f)-(772b-771b)),0x90
> > 773:
> > .pushsection .altinstructions,"a"
> > .long 771b - .
> > .long 774f - .
> > .4byte (((1 << 0) << 16) | ((18*32+ 4)))
> > .byte 773b-771b
> > .byte 775f-774f
> > .popsection
> > .pushsection .altinstr_replacement, "ax"
> > # ALT: replacement
> > 774:
> > call rep_movs_alternative
> > 775:
> > .popsection
> >
> > --------------------
> > ALTERNATIVE() after:
> > --------------------
> >
> > # <ALTERNATIVE>
> > 771: rep movsb
> > 772: .skip -(((775f-774f)-(772b-771b)) > 0) * ((775f-774f)-(772b-771b)),0x90
> > 773:
> > # ALT ENTRY:
> > .pushsection .altinstructions,"a"; .long 771b - .; .long 774f - .; .4byte (((1 << 0) << 16) | ((18*32+ 4))); .byte 773b-771b; .byte 775f-774f; .popsection
>
> I find this very hard to read. I prefer the multi line form it had
> before.
I actually think the compactness of putting the entry on a single line
is really nice.
We're usually more interested in reading the code *around* the
alternatives, rather than the alternatives themselves.
Making the alt entry a big sprawling mess, for what typically resolves
to just a few instructions, makes that a LOT harder. And it's *really*
bad for ALTERNATIVE_2() and ALTERNATIVE_3().
Also, 99% of the time, we're not going to be debugging the alternatives
themselves. Spreading out the alt entry directives across multiple
lines is way overkill. It wastes space and cognitive load.
That's what the comments are for. All you really need to care about are
the original instructions, the replacement instructions, and potentially
what feature+flags.
We could even print the feature by adding the '(((1 << 0) << 16) |
((18*32+ 4)))' to one of the comments.
Or even better, we can just show the actual human readable feature:
ALT_NOT(X86_FEATURE_FSRM).
Something like:
# <ALTERNATIVE>
771: rep movsb
772: .skip -(((775f-774f)-(772b-771b)) > 0) * ((775f-774f)-(772b-771b)),0x90
773:
# ALT ENTRY:
.pushsection .altinstructions,"a"; .long 771b - .; .long 774f - .; .4byte (((1 << 0) << 16) | ((18*32+ 4))); .byte 773b-771b; .byte 775f-774f; .popsection
# ALT REPLACEMENT: ALT_NOT(X86_FEATURE_FSRM):
.pushsection .altinstr_replacement, "ax"
774: call rep_movs_alternative
775:
.popsection
# </ALTERNATIVE>
Then with all the relevant information in the comments, there's no
reason to make the alt entry directives themselves very readable, right?
Unless alternatives are broken and you're debugging it. Which should
very much be the exception rather than the rule.
One thing that does bother me slightly is the asymmetry of having the
.pushsection and .popsection on the same line, whereas the replacement
has them on separate lines. We could just put the .popsection on its
own line:
# <ALTERNATIVE>
771: rep movsb
772: .skip -(((775f-774f)-(772b-771b)) > 0) * ((775f-774f)-(772b-771b)),0x90
773:
# ALT ENTRY:
.pushsection .altinstructions,"a"; .long 771b - .; .long 774f - .; .4byte (((1 << 0) << 16) | ((18*32+ 4))); .byte 773b-771b; .byte 775f-774f
.popsection
# ALT REPLACEMENT: ALT_NOT(X86_FEATURE_FSRM):
.pushsection .altinstr_replacement, "ax"
774: call rep_movs_alternative
775:
.popsection
# </ALTERNATIVE>
--
Josh
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH RFC 4/5] x86/alternative: Improve code generation readability
2025-04-09 17:41 ` Josh Poimboeuf
@ 2025-04-09 18:02 ` Linus Torvalds
2025-04-09 19:51 ` Josh Poimboeuf
2025-04-16 23:30 ` Josh Poimboeuf
0 siblings, 2 replies; 20+ messages in thread
From: Linus Torvalds @ 2025-04-09 18:02 UTC (permalink / raw)
To: Josh Poimboeuf; +Cc: Peter Zijlstra, x86, linux-kernel, Ingo Molnar
On Wed, 9 Apr 2025 at 10:41, Josh Poimboeuf <jpoimboe@kernel.org> wrote:
>
> I actually think the compactness of putting the entry on a single line
> is really nice.
Yeah, I think the noise with size calculations in particular is stuff
that I never look at, and making it one long line is better than
multiple lines, I think.
That said, for some of the alternatives, it would be even nicer if we
could make the noise more compact by just hardcoding sizes. Our
generic alternatives macros do result in some *very* verbose output
that is entirely illegible to humans.
And yes, we need that generic case for most of them, since the
instruction size will depend on register choice and the modrm
addressing details etc.
But some of them are kind of pointless.
I did have something that just knew that 'clac'/'stac' were three-byte
instructions (and it was very obvious indeed back when we encoded them
as such, using the ".byte" syntax).
And that avoided a *lot* of noise in the alternatives section, and
removed the subsequent ".skip" part entirely because both sides would
just use the right size without any calculations.
I can't find that hacky patch of mine, and I didn't keep it around
because it *was* hacky, but maybe I'll try to resurrect a cleaner
version of it.
There may not be very many people who care, because yeah, it really
only shows up when looking at the compiler-generated assembler output,
and I agree that that isn't exactly _common_.
Linus
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH RFC 4/5] x86/alternative: Improve code generation readability
2025-04-09 18:02 ` Linus Torvalds
@ 2025-04-09 19:51 ` Josh Poimboeuf
2025-04-09 21:20 ` Linus Torvalds
2025-04-16 23:30 ` Josh Poimboeuf
1 sibling, 1 reply; 20+ messages in thread
From: Josh Poimboeuf @ 2025-04-09 19:51 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Peter Zijlstra, x86, linux-kernel, Ingo Molnar
On Wed, Apr 09, 2025 at 11:02:29AM -0700, Linus Torvalds wrote:
> On Wed, 9 Apr 2025 at 10:41, Josh Poimboeuf <jpoimboe@kernel.org> wrote:
> >
> > I actually think the compactness of putting the entry on a single line
> > is really nice.
>
> Yeah, I think the noise with size calculations in particular is stuff
> that I never look at, and making it one long line is better than
> multiple lines, I think.
>
> That said, for some of the alternatives, it would be even nicer if we
> could make the noise more compact by just hardcoding sizes. Our
> generic alternatives macros do result in some *very* verbose output
> that is entirely illegible to humans.
What if we were to use a global asm() to define an alternative .macro
whenever "alternative.h" gets included?
e.g.:
asm(".macro alternative oldinstr, newinstr, ft_flags\n"
"771:\t\\oldinstr\n"
"772:\t.skip -(((" alt_rlen ")-(" alt_slen ")) > 0) * "
"((" alt_rlen ")-(" alt_slen ")),0x90\n"
"773:\n"
".pushsection .altinstructions,\"a\"\n\t"
".long 771b - .\n\t" /* label */
".long 774f - .\n\t " /* new instruction */
".4byte \\ft_flags\n\t" /* feature + flags */
".byte " alt_total_slen "\n\t" /* source len */
".byte " alt_rlen "\n\t" /* replacement len */
".popsection\n"
".pushsection .altinstr_replacement, \"ax\"\n"
"774:\t\\newinstr\n"
"775:\n"
".popsection\n"
".endm");
#define ALTERNATIVE(oldinstr, newinstr, ft_flags) \
"alternative \"" oldinstr "\", \"" newinstr "\", \"" __stringify(ft_flags) "\"; "
Then it becomes extremely readable:
alternative "", "stac", "( 9*32+20)";
Of course the downside is the macro gets generated even if it's never
used.
--
Josh
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH RFC 4/5] x86/alternative: Improve code generation readability
2025-04-09 19:51 ` Josh Poimboeuf
@ 2025-04-09 21:20 ` Linus Torvalds
2025-04-09 21:27 ` Josh Poimboeuf
0 siblings, 1 reply; 20+ messages in thread
From: Linus Torvalds @ 2025-04-09 21:20 UTC (permalink / raw)
To: Josh Poimboeuf; +Cc: Peter Zijlstra, x86, linux-kernel, Ingo Molnar
On Wed, 9 Apr 2025 at 12:51, Josh Poimboeuf <jpoimboe@kernel.org> wrote:
>
> What if we were to use a global asm() to define an alternative .macro
> whenever "alternative.h" gets included?
Yeah, I wouldn't mind that, but I have this dim memory of us having
tried it at some point and it didn't work.
I think the issue was that the in-compiler assembler was not as
complete as the external one (ie not doing macros at all or something
like that).
Linus
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH RFC 4/5] x86/alternative: Improve code generation readability
2025-04-09 21:20 ` Linus Torvalds
@ 2025-04-09 21:27 ` Josh Poimboeuf
2025-04-09 21:55 ` Josh Poimboeuf
0 siblings, 1 reply; 20+ messages in thread
From: Josh Poimboeuf @ 2025-04-09 21:27 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Peter Zijlstra, x86, linux-kernel, Ingo Molnar
On Wed, Apr 09, 2025 at 02:20:55PM -0700, Linus Torvalds wrote:
> On Wed, 9 Apr 2025 at 12:51, Josh Poimboeuf <jpoimboe@kernel.org> wrote:
> >
> > What if we were to use a global asm() to define an alternative .macro
> > whenever "alternative.h" gets included?
>
> Yeah, I wouldn't mind that, but I have this dim memory of us having
> tried it at some point and it didn't work.
>
> I think the issue was that the in-compiler assembler was not as
> complete as the external one (ie not doing macros at all or something
> like that).
It seems to work with GCC 14 and Clang 18 at least. I can try to find
some old toolchains to test with.
--
Josh
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH RFC 4/5] x86/alternative: Improve code generation readability
2025-04-09 21:27 ` Josh Poimboeuf
@ 2025-04-09 21:55 ` Josh Poimboeuf
0 siblings, 0 replies; 20+ messages in thread
From: Josh Poimboeuf @ 2025-04-09 21:55 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Peter Zijlstra, x86, linux-kernel, Ingo Molnar
On Wed, Apr 09, 2025 at 02:27:28PM -0700, Josh Poimboeuf wrote:
> On Wed, Apr 09, 2025 at 02:20:55PM -0700, Linus Torvalds wrote:
> > On Wed, 9 Apr 2025 at 12:51, Josh Poimboeuf <jpoimboe@kernel.org> wrote:
> > >
> > > What if we were to use a global asm() to define an alternative .macro
> > > whenever "alternative.h" gets included?
> >
> > Yeah, I wouldn't mind that, but I have this dim memory of us having
> > tried it at some point and it didn't work.
> >
> > I think the issue was that the in-compiler assembler was not as
> > complete as the external one (ie not doing macros at all or something
> > like that).
>
> It seems to work with GCC 14 and Clang 18 at least. I can try to find
> some old toolchains to test with.
Actually, Clang *compiled*, but on closer inspection it's actually
silently omitting the inline asm :-/
So yeah, that's not going to work.
--
Josh
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH RFC 4/5] x86/alternative: Improve code generation readability
2025-04-09 18:02 ` Linus Torvalds
2025-04-09 19:51 ` Josh Poimboeuf
@ 2025-04-16 23:30 ` Josh Poimboeuf
1 sibling, 0 replies; 20+ messages in thread
From: Josh Poimboeuf @ 2025-04-16 23:30 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Peter Zijlstra, x86, linux-kernel, Ingo Molnar
On Wed, Apr 09, 2025 at 11:02:29AM -0700, Linus Torvalds wrote:
> On Wed, 9 Apr 2025 at 10:41, Josh Poimboeuf <jpoimboe@kernel.org> wrote:
> >
> > I actually think the compactness of putting the entry on a single line
> > is really nice.
>
> Yeah, I think the noise with size calculations in particular is stuff
> that I never look at, and making it one long line is better than
> multiple lines, I think.
We could just give up on trying to making the code itself readable, and
instead put the code+features in comments, e.g.:
# <ALTERNATIVE>
#
# ORIG: rep movsb
# NEW: call rep_movs_alternative # ALT_NOT(X86_FEATURE_FSRM)
#
771: rep movsb; 772: .skip -(((775f-774f)-(772b-771b)) > 0) * ((775f-774f)-(772b-771b)),0x90; 773: .pushsection .altinstructions,"a"; .long 771b - .; .long 774f - .; .4byte (((1 << 0) << 16) | ((18*32+ 4))); .byte 773b-771b; .byte 775f-774f; .popsection; .pushsection .altinstr_replacement, "ax"; 774: call rep_movs_alternative; 775: .popsection
# </ALTERNATIVE>
--
Josh
^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2025-04-16 23:30 UTC | newest]
Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-08 8:21 [PATCH RFC 0/5] x86/asm: Improve code generation Josh Poimboeuf
2025-04-08 8:21 ` [PATCH RFC 1/5] objtool: Remove ANNOTATE_IGNORE_ALTERNATIVE from CLAC/STAC Josh Poimboeuf
2025-04-08 18:07 ` Linus Torvalds
2025-04-08 20:45 ` [tip: objtool/urgent] " tip-bot2 for Josh Poimboeuf
2025-04-08 8:21 ` [PATCH RFC 2/5] objtool, x86/hweight: Remove ANNOTATE_IGNORE_ALTERNATIVE Josh Poimboeuf
2025-04-08 8:21 ` [PATCH RFC 3/5] noinstr: Use asm_inline() in instrumentation_{begin,end}() Josh Poimboeuf
2025-04-08 8:30 ` Ingo Molnar
2025-04-08 11:10 ` Uros Bizjak
2025-04-08 16:46 ` Josh Poimboeuf
2025-04-08 8:21 ` [PATCH RFC 4/5] x86/alternative: Improve code generation readability Josh Poimboeuf
2025-04-09 14:38 ` Peter Zijlstra
2025-04-09 17:41 ` Josh Poimboeuf
2025-04-09 18:02 ` Linus Torvalds
2025-04-09 19:51 ` Josh Poimboeuf
2025-04-09 21:20 ` Linus Torvalds
2025-04-09 21:27 ` Josh Poimboeuf
2025-04-09 21:55 ` Josh Poimboeuf
2025-04-16 23:30 ` Josh Poimboeuf
2025-04-08 8:21 ` [PATCH RFC 5/5] objtool: " Josh Poimboeuf
2025-04-09 14:40 ` Peter Zijlstra
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).