* [PATCH v3 01/10] x86/cfi: Add warn option
2025-02-19 16:21 [PATCH v3 00/10] x86/ibt: FineIBT-BHI Peter Zijlstra
@ 2025-02-19 16:21 ` Peter Zijlstra
2025-02-19 17:50 ` Kees Cook
2025-02-19 16:21 ` [PATCH v3 02/10] x86/ibt: Add exact_endbr() helper Peter Zijlstra
` (10 subsequent siblings)
11 siblings, 1 reply; 38+ messages in thread
From: Peter Zijlstra @ 2025-02-19 16:21 UTC (permalink / raw)
To: x86
Cc: linux-kernel, peterz, alyssa.milburn, scott.d.constable, joao,
andrew.cooper3, jpoimboe, jose.marchesi, hjl.tools, ndesaulniers,
samitolvanen, nathan, ojeda, kees, alexei.starovoitov, mhiramat,
jmill
Rebuilding with CFI_PERMISSIVE toggled is such a pain, esp. since
clang is so slow.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
arch/x86/include/asm/cfi.h | 1 +
arch/x86/kernel/alternative.c | 4 ++++
arch/x86/kernel/cfi.c | 14 ++++++++++----
3 files changed, 15 insertions(+), 4 deletions(-)
--- a/arch/x86/include/asm/cfi.h
+++ b/arch/x86/include/asm/cfi.h
@@ -100,6 +100,7 @@ enum cfi_mode {
};
extern enum cfi_mode cfi_mode;
+extern bool cfi_warn;
struct pt_regs;
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -916,6 +916,7 @@ void __init_or_module apply_seal_endbr(s
#endif
enum cfi_mode cfi_mode __ro_after_init = __CFI_DEFAULT;
+bool cfi_warn __ro_after_init = false;
#ifdef CONFIG_CFI_CLANG
struct bpf_insn;
@@ -1022,6 +1023,9 @@ static __init int cfi_parse_cmdline(char
cfi_mode = CFI_FINEIBT;
} else if (!strcmp(str, "norand")) {
cfi_rand = false;
+ } else if (!strcmp(str, "warn")) {
+ pr_alert("CFI mismatch non-fatal!\n");
+ cfi_warn = true;
} else {
pr_err("Ignoring unknown cfi option (%s).", str);
}
--- a/arch/x86/kernel/cfi.c
+++ b/arch/x86/kernel/cfi.c
@@ -67,16 +67,17 @@ static bool decode_cfi_insn(struct pt_re
*/
enum bug_trap_type handle_cfi_failure(struct pt_regs *regs)
{
- unsigned long target;
+ unsigned long target, addr = regs->ip;
+ enum bug_trap_type btt;
u32 type;
switch (cfi_mode) {
case CFI_KCFI:
- if (!is_cfi_trap(regs->ip))
+ if (!is_cfi_trap(addr))
return BUG_TRAP_TYPE_NONE;
if (!decode_cfi_insn(regs, &target, &type))
- return report_cfi_failure_noaddr(regs, regs->ip);
+ return report_cfi_failure_noaddr(regs, addr);
break;
@@ -90,7 +91,12 @@ enum bug_trap_type handle_cfi_failure(st
return BUG_TRAP_TYPE_NONE;
}
- return report_cfi_failure(regs, regs->ip, &target, type);
+ btt = report_cfi_failure(regs, addr, &target, type);
+ if (btt == BUG_TRAP_TYPE_BUG && cfi_warn) {
+ __warn(NULL, 0, (void *)addr, 0, regs, NULL);
+ btt = BUG_TRAP_TYPE_WARN;
+ }
+ return btt;
}
/*
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v3 01/10] x86/cfi: Add warn option
2025-02-19 16:21 ` [PATCH v3 01/10] x86/cfi: Add warn option Peter Zijlstra
@ 2025-02-19 17:50 ` Kees Cook
2025-02-19 17:56 ` Peter Zijlstra
0 siblings, 1 reply; 38+ messages in thread
From: Kees Cook @ 2025-02-19 17:50 UTC (permalink / raw)
To: Peter Zijlstra
Cc: x86, linux-kernel, alyssa.milburn, scott.d.constable, joao,
andrew.cooper3, jpoimboe, jose.marchesi, hjl.tools, ndesaulniers,
samitolvanen, nathan, ojeda, alexei.starovoitov, mhiramat, jmill
On Wed, Feb 19, 2025 at 05:21:08PM +0100, Peter Zijlstra wrote:
> Rebuilding with CFI_PERMISSIVE toggled is such a pain, esp. since
> clang is so slow.
This seems too complex; report_cfi_failure() already has the fail/warn
logic test. I would have expected cfi_warn to take CONFIG_CFI_PERMISSIVE
as a default instead, like:
+bool cfi_warn __ro_after_init = IS_ENABLED(CONFIG_CFI_PERMISSIVE);
and then just replace report_cfi_failure()'s check of
CONFIG_CFI_PERMISSIVE with cfi_warn:
- if (IS_ENABLED(CONFIG_CFI_PERMISSIVE)) {
+ if (cfi_warn) {
-Kees
(I do worry about data-only attacks going after page tables and flipping
pages to r/w and changing cfi_warn to 1, but that's probably on the same
order of difficulty as targeting the cfi handler function itself. Hmpf.)
--
Kees Cook
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v3 01/10] x86/cfi: Add warn option
2025-02-19 17:50 ` Kees Cook
@ 2025-02-19 17:56 ` Peter Zijlstra
0 siblings, 0 replies; 38+ messages in thread
From: Peter Zijlstra @ 2025-02-19 17:56 UTC (permalink / raw)
To: Kees Cook
Cc: x86, linux-kernel, alyssa.milburn, scott.d.constable, joao,
andrew.cooper3, jpoimboe, jose.marchesi, hjl.tools, ndesaulniers,
samitolvanen, nathan, ojeda, alexei.starovoitov, mhiramat, jmill
On Wed, Feb 19, 2025 at 09:50:54AM -0800, Kees Cook wrote:
> On Wed, Feb 19, 2025 at 05:21:08PM +0100, Peter Zijlstra wrote:
> > Rebuilding with CFI_PERMISSIVE toggled is such a pain, esp. since
> > clang is so slow.
>
> This seems too complex; report_cfi_failure() already has the fail/warn
> logic test. I would have expected cfi_warn to take CONFIG_CFI_PERMISSIVE
> as a default instead, like:
>
> +bool cfi_warn __ro_after_init = IS_ENABLED(CONFIG_CFI_PERMISSIVE);
In kernel/cfi.c, yes that works.
I somehow got stuck with having cfi_warn in arch/x86 and then not being
able to use it in generic code. Been too busy to debug all the fun fails
to realize I could simply stick the variable in generic code.
^ permalink raw reply [flat|nested] 38+ messages in thread
* [PATCH v3 02/10] x86/ibt: Add exact_endbr() helper
2025-02-19 16:21 [PATCH v3 00/10] x86/ibt: FineIBT-BHI Peter Zijlstra
2025-02-19 16:21 ` [PATCH v3 01/10] x86/cfi: Add warn option Peter Zijlstra
@ 2025-02-19 16:21 ` Peter Zijlstra
2025-02-19 17:51 ` Kees Cook
2025-02-19 16:21 ` [PATCH v3 03/10] x86/traps: Decode 0xEA #UD Peter Zijlstra
` (9 subsequent siblings)
11 siblings, 1 reply; 38+ messages in thread
From: Peter Zijlstra @ 2025-02-19 16:21 UTC (permalink / raw)
To: x86
Cc: linux-kernel, peterz, alyssa.milburn, scott.d.constable, joao,
andrew.cooper3, jpoimboe, jose.marchesi, hjl.tools, ndesaulniers,
samitolvanen, nathan, ojeda, kees, alexei.starovoitov, mhiramat,
jmill
For when we want to exactly match ENDBR, and not everything that we
can scribble it with.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
arch/x86/kernel/alternative.c | 16 +++++++++++++---
1 file changed, 13 insertions(+), 3 deletions(-)
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -863,6 +863,17 @@ __noendbr bool is_endbr(u32 *val)
return false;
}
+static __noendbr bool exact_endbr(u32 *val)
+{
+ u32 endbr;
+
+ __get_kernel_nofault(&endbr, val, u32, Efault);
+ return endbr == gen_endbr();
+
+Efault:
+ return false;
+}
+
static void poison_cfi(void *addr);
static void __init_or_module poison_endbr(void *addr)
@@ -1427,10 +1438,9 @@ static void poison_cfi(void *addr)
bool decode_fineibt_insn(struct pt_regs *regs, unsigned long *target, u32 *type)
{
unsigned long addr = regs->ip - fineibt_preamble_ud2;
- u32 endbr, hash;
+ u32 hash;
- __get_kernel_nofault(&endbr, addr, u32, Efault);
- if (endbr != gen_endbr())
+ if (!exact_endbr((void *)addr))
return false;
*target = addr + fineibt_preamble_size;
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v3 02/10] x86/ibt: Add exact_endbr() helper
2025-02-19 16:21 ` [PATCH v3 02/10] x86/ibt: Add exact_endbr() helper Peter Zijlstra
@ 2025-02-19 17:51 ` Kees Cook
0 siblings, 0 replies; 38+ messages in thread
From: Kees Cook @ 2025-02-19 17:51 UTC (permalink / raw)
To: Peter Zijlstra
Cc: x86, linux-kernel, alyssa.milburn, scott.d.constable, joao,
andrew.cooper3, jpoimboe, jose.marchesi, hjl.tools, ndesaulniers,
samitolvanen, nathan, ojeda, alexei.starovoitov, mhiramat, jmill
On Wed, Feb 19, 2025 at 05:21:09PM +0100, Peter Zijlstra wrote:
> For when we want to exactly match ENDBR, and not everything that we
> can scribble it with.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kees Cook <kees@kernel.org>
--
Kees Cook
^ permalink raw reply [flat|nested] 38+ messages in thread
* [PATCH v3 03/10] x86/traps: Decode 0xEA #UD
2025-02-19 16:21 [PATCH v3 00/10] x86/ibt: FineIBT-BHI Peter Zijlstra
2025-02-19 16:21 ` [PATCH v3 01/10] x86/cfi: Add warn option Peter Zijlstra
2025-02-19 16:21 ` [PATCH v3 02/10] x86/ibt: Add exact_endbr() helper Peter Zijlstra
@ 2025-02-19 16:21 ` Peter Zijlstra
2025-02-19 16:47 ` Andrew Cooper
2025-02-19 17:52 ` Kees Cook
2025-02-19 16:21 ` [PATCH v3 04/10] x86/traps: Allow custom fixups in handle_bug() Peter Zijlstra
` (8 subsequent siblings)
11 siblings, 2 replies; 38+ messages in thread
From: Peter Zijlstra @ 2025-02-19 16:21 UTC (permalink / raw)
To: x86
Cc: linux-kernel, peterz, alyssa.milburn, scott.d.constable, joao,
andrew.cooper3, jpoimboe, jose.marchesi, hjl.tools, ndesaulniers,
samitolvanen, nathan, ojeda, kees, alexei.starovoitov, mhiramat,
jmill
FineIBT will start using 0xEA as #UD
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
arch/x86/include/asm/bug.h | 1 +
arch/x86/kernel/traps.c | 12 ++++++++++++
2 files changed, 13 insertions(+)
--- a/arch/x86/include/asm/bug.h
+++ b/arch/x86/include/asm/bug.h
@@ -25,6 +25,7 @@
#define BUG_UD2 0xfffe
#define BUG_UD1 0xfffd
#define BUG_UD1_UBSAN 0xfffc
+#define BUG_EA 0xffea
#ifdef CONFIG_GENERIC_BUG
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -96,6 +96,7 @@ __always_inline int is_valid_bugaddr(uns
* Check for UD1 or UD2, accounting for Address Size Override Prefixes.
* If it's a UD1, further decode to determine its use:
*
+ * FineIBT: ea (bad)
* UBSan{0}: 67 0f b9 00 ud1 (%eax),%eax
* UBSan{10}: 67 0f b9 40 10 ud1 0x10(%eax),%eax
* static_call: 0f b9 cc ud1 %esp,%ecx
@@ -113,6 +114,10 @@ __always_inline int decode_bug(unsigned
v = *(u8 *)(addr++);
if (v == INSN_ASOP)
v = *(u8 *)(addr++);
+ if (v == 0xea) {
+ *len = addr - start;
+ return BUG_EA;
+ }
if (v != OPCODE_ESCAPE)
return BUG_NONE;
@@ -308,6 +313,13 @@ static noinstr bool handle_bug(struct pt
raw_local_irq_enable();
switch (ud_type) {
+ case BUG_EA:
+ if (handle_cfi_failure(regs) == BUG_TRAP_TYPE_WARN) {
+ regs->ip += ud_len;
+ handled = true;
+ }
+ break;
+
case BUG_UD2:
if (report_bug(regs->ip, regs) == BUG_TRAP_TYPE_WARN ||
handle_cfi_failure(regs) == BUG_TRAP_TYPE_WARN) {
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v3 03/10] x86/traps: Decode 0xEA #UD
2025-02-19 16:21 ` [PATCH v3 03/10] x86/traps: Decode 0xEA #UD Peter Zijlstra
@ 2025-02-19 16:47 ` Andrew Cooper
2025-02-19 16:49 ` Peter Zijlstra
2025-02-19 17:52 ` Kees Cook
1 sibling, 1 reply; 38+ messages in thread
From: Andrew Cooper @ 2025-02-19 16:47 UTC (permalink / raw)
To: Peter Zijlstra, x86
Cc: linux-kernel, alyssa.milburn, scott.d.constable, joao, jpoimboe,
jose.marchesi, hjl.tools, ndesaulniers, samitolvanen, nathan,
ojeda, kees, alexei.starovoitov, mhiramat, jmill
On 19/02/2025 4:21 pm, Peter Zijlstra wrote:
> FineIBT will start using 0xEA as #UD
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
FINEIBT probably ought to gain a "depends on X86_64" too, or this is
going to go wrong in a very fun way.
~Andrew
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v3 03/10] x86/traps: Decode 0xEA #UD
2025-02-19 16:47 ` Andrew Cooper
@ 2025-02-19 16:49 ` Peter Zijlstra
0 siblings, 0 replies; 38+ messages in thread
From: Peter Zijlstra @ 2025-02-19 16:49 UTC (permalink / raw)
To: Andrew Cooper
Cc: x86, linux-kernel, alyssa.milburn, scott.d.constable, joao,
jpoimboe, jose.marchesi, hjl.tools, ndesaulniers, samitolvanen,
nathan, ojeda, kees, alexei.starovoitov, mhiramat, jmill
On Wed, Feb 19, 2025 at 04:47:10PM +0000, Andrew Cooper wrote:
> On 19/02/2025 4:21 pm, Peter Zijlstra wrote:
> > FineIBT will start using 0xEA as #UD
> >
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
>
> FINEIBT probably ought to gain a "depends on X86_64" too, or this is
> going to go wrong in a very fun way.
It already is:
config X86_KERNEL_IBT
depends on X86_64 && CC_HAS_IBT && HAVE_OBJTOOL
Nobody wants to touch i386 if they don't have to :-)
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v3 03/10] x86/traps: Decode 0xEA #UD
2025-02-19 16:21 ` [PATCH v3 03/10] x86/traps: Decode 0xEA #UD Peter Zijlstra
2025-02-19 16:47 ` Andrew Cooper
@ 2025-02-19 17:52 ` Kees Cook
1 sibling, 0 replies; 38+ messages in thread
From: Kees Cook @ 2025-02-19 17:52 UTC (permalink / raw)
To: Peter Zijlstra
Cc: x86, linux-kernel, alyssa.milburn, scott.d.constable, joao,
andrew.cooper3, jpoimboe, jose.marchesi, hjl.tools, ndesaulniers,
samitolvanen, nathan, ojeda, alexei.starovoitov, mhiramat, jmill
On Wed, Feb 19, 2025 at 05:21:10PM +0100, Peter Zijlstra wrote:
> FineIBT will start using 0xEA as #UD
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kees Cook <kees@kernel.org>
--
Kees Cook
^ permalink raw reply [flat|nested] 38+ messages in thread
* [PATCH v3 04/10] x86/traps: Allow custom fixups in handle_bug()
2025-02-19 16:21 [PATCH v3 00/10] x86/ibt: FineIBT-BHI Peter Zijlstra
` (2 preceding siblings ...)
2025-02-19 16:21 ` [PATCH v3 03/10] x86/traps: Decode 0xEA #UD Peter Zijlstra
@ 2025-02-19 16:21 ` Peter Zijlstra
2025-02-19 17:55 ` Kees Cook
2025-02-19 16:21 ` [PATCH v3 05/10] x86/ibt: Optimize FineIBT sequence Peter Zijlstra
` (7 subsequent siblings)
11 siblings, 1 reply; 38+ messages in thread
From: Peter Zijlstra @ 2025-02-19 16:21 UTC (permalink / raw)
To: x86
Cc: linux-kernel, peterz, alyssa.milburn, scott.d.constable, joao,
andrew.cooper3, jpoimboe, jose.marchesi, hjl.tools, ndesaulniers,
samitolvanen, nathan, ojeda, kees, alexei.starovoitov, mhiramat,
jmill
The normal fixup in handle_bug() is simply continuing at the next
instruction. However, upcoming patches make this the wrong thing, so
allow handlers (specifically handle_cfi_failure()) to override
regs->ip.
The callchain is such that the fixup needs to be done before it is
determined whether the exception is fatal; as such, revert any changes
in that case.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
arch/x86/kernel/traps.c | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -287,11 +287,12 @@ static inline void handle_invalid_op(str
static noinstr bool handle_bug(struct pt_regs *regs)
{
+ unsigned long addr = regs->ip;
bool handled = false;
int ud_type, ud_len;
s32 ud_imm;
- ud_type = decode_bug(regs->ip, &ud_imm, &ud_len);
+ ud_type = decode_bug(addr, &ud_imm, &ud_len);
if (ud_type == BUG_NONE)
return handled;
@@ -315,7 +316,8 @@ static noinstr bool handle_bug(struct pt
switch (ud_type) {
case BUG_EA:
if (handle_cfi_failure(regs) == BUG_TRAP_TYPE_WARN) {
- regs->ip += ud_len;
+ if (regs->ip == addr)
+ regs->ip += ud_len;
handled = true;
}
break;
@@ -323,7 +325,8 @@ static noinstr bool handle_bug(struct pt
case BUG_UD2:
if (report_bug(regs->ip, regs) == BUG_TRAP_TYPE_WARN ||
handle_cfi_failure(regs) == BUG_TRAP_TYPE_WARN) {
- regs->ip += ud_len;
+ if (regs->ip == addr)
+ regs->ip += ud_len;
handled = true;
}
break;
@@ -340,6 +343,9 @@ static noinstr bool handle_bug(struct pt
break;
}
+ if (!handled && regs->ip != addr)
+ regs->ip = addr;
+
if (regs->flags & X86_EFLAGS_IF)
raw_local_irq_disable();
instrumentation_end();
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v3 04/10] x86/traps: Allow custom fixups in handle_bug()
2025-02-19 16:21 ` [PATCH v3 04/10] x86/traps: Allow custom fixups in handle_bug() Peter Zijlstra
@ 2025-02-19 17:55 ` Kees Cook
2025-02-19 18:17 ` Peter Zijlstra
0 siblings, 1 reply; 38+ messages in thread
From: Kees Cook @ 2025-02-19 17:55 UTC (permalink / raw)
To: Peter Zijlstra
Cc: x86, linux-kernel, alyssa.milburn, scott.d.constable, joao,
andrew.cooper3, jpoimboe, jose.marchesi, hjl.tools, ndesaulniers,
samitolvanen, nathan, ojeda, alexei.starovoitov, mhiramat, jmill
On Wed, Feb 19, 2025 at 05:21:11PM +0100, Peter Zijlstra wrote:
> The normal fixup in handle_bug() is simply continuing at the next
> instruction. However upcomming patches make this the wrong thing, so
> allow handlers (specifically handle_cfi_failure()) to over-ride
> regs->ip.
>
> The callchain is such that the fixup needs to be done before it is
> determined if the exception is fatal, as such, revert any changes in
> that case.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
> arch/x86/kernel/traps.c | 12 +++++++++---
> 1 file changed, 9 insertions(+), 3 deletions(-)
>
> --- a/arch/x86/kernel/traps.c
> +++ b/arch/x86/kernel/traps.c
> @@ -287,11 +287,12 @@ static inline void handle_invalid_op(str
>
> static noinstr bool handle_bug(struct pt_regs *regs)
> {
> + unsigned long addr = regs->ip;
> bool handled = false;
> int ud_type, ud_len;
> s32 ud_imm;
>
> - ud_type = decode_bug(regs->ip, &ud_imm, &ud_len);
> + ud_type = decode_bug(addr, &ud_imm, &ud_len);
> if (ud_type == BUG_NONE)
> return handled;
>
> @@ -315,7 +316,8 @@ static noinstr bool handle_bug(struct pt
> switch (ud_type) {
> case BUG_EA:
> if (handle_cfi_failure(regs) == BUG_TRAP_TYPE_WARN) {
> - regs->ip += ud_len;
> + if (regs->ip == addr)
> + regs->ip += ud_len;
> handled = true;
> }
> break;
> @@ -323,7 +325,8 @@ static noinstr bool handle_bug(struct pt
> case BUG_UD2:
> if (report_bug(regs->ip, regs) == BUG_TRAP_TYPE_WARN ||
> handle_cfi_failure(regs) == BUG_TRAP_TYPE_WARN) {
> - regs->ip += ud_len;
> + if (regs->ip == addr)
> + regs->ip += ud_len;
> handled = true;
> }
> break;
> @@ -340,6 +343,9 @@ static noinstr bool handle_bug(struct pt
> break;
> }
>
> + if (!handled && regs->ip != addr)
> + regs->ip = addr;
Can you add a comment above this just to help with people scanning
through this code in the future, maybe:
/* Restore failure location if we're not continuing execution. */
> +
> if (regs->flags & X86_EFLAGS_IF)
> raw_local_irq_disable();
> instrumentation_end();
But yeah, seems fine:
Reviewed-by: Kees Cook <kees@kernel.org>
--
Kees Cook
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v3 04/10] x86/traps: Allow custom fixups in handle_bug()
2025-02-19 17:55 ` Kees Cook
@ 2025-02-19 18:17 ` Peter Zijlstra
0 siblings, 0 replies; 38+ messages in thread
From: Peter Zijlstra @ 2025-02-19 18:17 UTC (permalink / raw)
To: Kees Cook
Cc: x86, linux-kernel, alyssa.milburn, scott.d.constable, joao,
andrew.cooper3, jpoimboe, jose.marchesi, hjl.tools, ndesaulniers,
samitolvanen, nathan, ojeda, alexei.starovoitov, mhiramat, jmill
On Wed, Feb 19, 2025 at 09:55:10AM -0800, Kees Cook wrote:
> > @@ -340,6 +343,9 @@ static noinstr bool handle_bug(struct pt
> > break;
> > }
> >
> > + if (!handled && regs->ip != addr)
> > + regs->ip = addr;
>
> Can you add a comment above this just to help with people scanning
> through this code in the future, maybe:
>
> /* Restore failure location if we're not continuing execution. */
>
>
> > +
> > if (regs->flags & X86_EFLAGS_IF)
> > raw_local_irq_disable();
> > instrumentation_end();
Done.
^ permalink raw reply [flat|nested] 38+ messages in thread
* [PATCH v3 05/10] x86/ibt: Optimize FineIBT sequence
2025-02-19 16:21 [PATCH v3 00/10] x86/ibt: FineIBT-BHI Peter Zijlstra
` (3 preceding siblings ...)
2025-02-19 16:21 ` [PATCH v3 04/10] x86/traps: Allow custom fixups in handle_bug() Peter Zijlstra
@ 2025-02-19 16:21 ` Peter Zijlstra
2025-02-19 17:15 ` Andrew Cooper
2025-02-19 18:01 ` Kees Cook
2025-02-19 16:21 ` [PATCH v3 06/10] x86/traps: Decode LOCK Jcc.d8 #UD Peter Zijlstra
` (6 subsequent siblings)
11 siblings, 2 replies; 38+ messages in thread
From: Peter Zijlstra @ 2025-02-19 16:21 UTC (permalink / raw)
To: x86
Cc: linux-kernel, peterz, alyssa.milburn, scott.d.constable, joao,
andrew.cooper3, jpoimboe, jose.marchesi, hjl.tools, ndesaulniers,
samitolvanen, nathan, ojeda, kees, alexei.starovoitov, mhiramat,
jmill
Scott notes that non-taken branches are faster. Abuse overlapping code
that traps instead of explicit UD2 instructions.
And LEA does not modify flags and has fewer dependencies.
Suggested-by: Scott Constable <scott.d.constable@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
arch/x86/kernel/alternative.c | 58 ++++++++++++++++++++++++++----------------
arch/x86/net/bpf_jit_comp.c | 5 +--
2 files changed, 39 insertions(+), 24 deletions(-)
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -1054,9 +1054,9 @@ early_param("cfi", cfi_parse_cmdline);
* __cfi_\func: __cfi_\func:
* movl $0x12345678,%eax // 5 endbr64 // 4
* nop subl $0x12345678,%r10d // 7
- * nop jz 1f // 2
- * nop ud2 // 2
- * nop 1: nop // 1
+ * nop jne __cfi_\func+6 // 2
+ * nop nop3 // 3
+ * nop
* nop
* nop
* nop
@@ -1068,37 +1068,47 @@ early_param("cfi", cfi_parse_cmdline);
*
* caller: caller:
* movl $(-0x12345678),%r10d // 6 movl $0x12345678,%r10d // 6
- * addl $-15(%r11),%r10d // 4 sub $16,%r11 // 4
+ * addl $-15(%r11),%r10d // 4 lea -0x10(%r11),%r11 // 4
* je 1f // 2 nop4 // 4
* ud2 // 2
- * 1: call __x86_indirect_thunk_r11 // 5 call *%r11; nop2; // 5
+ * 1: cs call __x86_indirect_thunk_r11 // 6 call *%r11; nop3; // 6
*
*/
-asm( ".pushsection .rodata \n"
- "fineibt_preamble_start: \n"
- " endbr64 \n"
- " subl $0x12345678, %r10d \n"
- " je fineibt_preamble_end \n"
- "fineibt_preamble_ud2: \n"
- " ud2 \n"
- " nop \n"
- "fineibt_preamble_end: \n"
+/*
+ * <fineibt_preamble_start>:
+ * 0: f3 0f 1e fa endbr64
+ * 4: 41 81 <ea> 78 56 34 12 sub $0x12345678, %r10d
+ * b: 75 f9 jne 6 <fineibt_preamble_start+0x6>
+ * d: 0f 1f 00 nopl (%rax)
+ */
+asm( ".pushsection .rodata \n"
+ "fineibt_preamble_start: \n"
+ " endbr64 \n"
+ " subl $0x12345678, %r10d \n"
+ " jne fineibt_preamble_start+6 \n"
+ ASM_NOP3
+ "fineibt_preamble_end: \n"
".popsection\n"
);
extern u8 fineibt_preamble_start[];
-extern u8 fineibt_preamble_ud2[];
extern u8 fineibt_preamble_end[];
#define fineibt_preamble_size (fineibt_preamble_end - fineibt_preamble_start)
-#define fineibt_preamble_ud2 (fineibt_preamble_ud2 - fineibt_preamble_start)
+#define fineibt_preamble_ud 6
#define fineibt_preamble_hash 7
+/*
+ * <fineibt_caller_start>:
+ * 0: 41 ba 78 56 34 12 mov $0x12345678, %r10d
+ * 6: 4d 8d 5b f0 lea -0x10(%r11), %r11
+ * a: 0f 1f 40 00 nopl 0x0(%rax)
+ */
asm( ".pushsection .rodata \n"
"fineibt_caller_start: \n"
" movl $0x12345678, %r10d \n"
- " sub $16, %r11 \n"
+ " lea -0x10(%r11), %r11 \n"
ASM_NOP4
"fineibt_caller_end: \n"
".popsection \n"
@@ -1429,15 +1439,15 @@ static void poison_cfi(void *addr)
}
/*
- * regs->ip points to a UD2 instruction, return true and fill out target and
- * type when this UD2 is from a FineIBT preamble.
+ * When regs->ip points to a 0xEA byte in the FineIBT preamble,
+ * return true and fill out target and type.
*
* We check the preamble by checking for the ENDBR instruction relative to the
- * UD2 instruction.
+ * 0xEA instruction.
*/
bool decode_fineibt_insn(struct pt_regs *regs, unsigned long *target, u32 *type)
{
- unsigned long addr = regs->ip - fineibt_preamble_ud2;
+ unsigned long addr = regs->ip - fineibt_preamble_ud;
u32 hash;
if (!exact_endbr((void *)addr))
@@ -1448,6 +1458,12 @@ bool decode_fineibt_insn(struct pt_regs
__get_kernel_nofault(&hash, addr + fineibt_preamble_hash, u32, Efault);
*type = (u32)regs->r10 + hash;
+ /*
+ * Since regs->ip points to the middle of an instruction; it cannot
+ * continue with the normal fixup.
+ */
+ regs->ip = *target;
+
return true;
Efault:
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -417,9 +417,8 @@ static void emit_fineibt(u8 **pprog, u32
EMIT_ENDBR();
EMIT3_off32(0x41, 0x81, 0xea, hash); /* subl $hash, %r10d */
- EMIT2(0x74, 0x07); /* jz.d8 +7 */
- EMIT2(0x0f, 0x0b); /* ud2 */
- EMIT1(0x90); /* nop */
+ EMIT2(0x75, 0xf9); /* jne.d8 .-7 */
+ EMIT3(0x0f, 0x1f, 0x00); /* nop3 */
EMIT_ENDBR_POISON();
*pprog = prog;
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v3 05/10] x86/ibt: Optimize FineIBT sequence
2025-02-19 16:21 ` [PATCH v3 05/10] x86/ibt: Optimize FineIBT sequence Peter Zijlstra
@ 2025-02-19 17:15 ` Andrew Cooper
2025-02-20 18:28 ` Constable, Scott D
2025-02-19 18:01 ` Kees Cook
1 sibling, 1 reply; 38+ messages in thread
From: Andrew Cooper @ 2025-02-19 17:15 UTC (permalink / raw)
To: Peter Zijlstra, x86
Cc: linux-kernel, alyssa.milburn, scott.d.constable, joao, jpoimboe,
jose.marchesi, hjl.tools, ndesaulniers, samitolvanen, nathan,
ojeda, kees, alexei.starovoitov, mhiramat, jmill
On 19/02/2025 4:21 pm, Peter Zijlstra wrote:
> Scott notes that non-taken branches are faster. Abuse overlapping code
> that traps instead of explicit UD2 instructions.
>
> And LEA does not modify flags and will have less dependencies.
>
> Suggested-by: Scott Constable <scott.d.constable@intel.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Can we get a bit more info on this "non-taken branches are faster" ?
For modern cores which have branch prediction pre-decode, a branch
unknown to the predictor will behave as non-taken until the Jcc executes[1].
Something the size of Linux is surely going to exceed the branch predictor
capacity, so it's perhaps fair to say that there's a reasonable chance
to miss in the predictor.
But, for a branch known to the predictor, taken branches ought to be
bubble-less these days. At least, this is what the marketing material
claims.
And, this doesn't account for branches which alias in the predictor and
end up with a wrong prediction.
~Andrew
[1] Yes, I know RWC has the reintroduced 0xee prefix with the decode
resteer.
^ permalink raw reply [flat|nested] 38+ messages in thread
* RE: [PATCH v3 05/10] x86/ibt: Optimize FineIBT sequence
2025-02-19 17:15 ` Andrew Cooper
@ 2025-02-20 18:28 ` Constable, Scott D
0 siblings, 0 replies; 38+ messages in thread
From: Constable, Scott D @ 2025-02-20 18:28 UTC (permalink / raw)
To: andrew.cooper3@citrix.com, Peter Zijlstra, x86@kernel.org
Cc: linux-kernel@vger.kernel.org, Milburn, Alyssa,
joao@overdrivepizza.com, jpoimboe@kernel.org,
jose.marchesi@oracle.com, hjl.tools@gmail.com,
ndesaulniers@google.com, samitolvanen@google.com,
nathan@kernel.org, ojeda@kernel.org, kees@kernel.org,
alexei.starovoitov@gmail.com, mhiramat@kernel.org, jmill@asu.edu
Hi Andrew,
I can elaborate, if only "a bit." Your intuition about branches is pretty accurate, and the difference between taken vs. not-taken should, on average, be marginal. I can quote from Intel's software optimization manual: "Conditional branches that are never taken do not consume BTB resources." Additionally, there are some more subtle reasons that not-taken branches can be preferable--these vary by microarchitecture.
Regards,
Scott Constable
-----Original Message-----
From: Andrew Cooper <andrew.cooper3@citrix.com>
Sent: Wednesday, February 19, 2025 9:15 AM
To: Peter Zijlstra <peterz@infradead.org>; x86@kernel.org
Cc: linux-kernel@vger.kernel.org; Milburn, Alyssa <alyssa.milburn@intel.com>; Constable, Scott D <scott.d.constable@intel.com>; joao@overdrivepizza.com; jpoimboe@kernel.org; jose.marchesi@oracle.com; hjl.tools@gmail.com; ndesaulniers@google.com; samitolvanen@google.com; nathan@kernel.org; ojeda@kernel.org; kees@kernel.org; alexei.starovoitov@gmail.com; mhiramat@kernel.org; jmill@asu.edu
Subject: Re: [PATCH v3 05/10] x86/ibt: Optimize FineIBT sequence
On 19/02/2025 4:21 pm, Peter Zijlstra wrote:
> Scott notes that non-taken branches are faster. Abuse overlapping code
> that traps instead of explicit UD2 instructions.
>
> And LEA does not modify flags and will have less dependencies.
>
> Suggested-by: Scott Constable <scott.d.constable@intel.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Can we get a bit more info on this "non-taken branches are faster" ?
For modern cores which have branch prediction pre-decode, a branch unknown to the predictor will behave as non-taken until the Jcc executes[1].
Something size of Linux is surely going to exceed the branch predictor capacity, so it's perhaps fair to say that there's a reasonable chance to miss in the predictor.
But, for a branch known to the predictor, taken branches ought to be bubble-less these days. At least, this is what the marketing material claims.
And, this doesn't account for branches which alias in the predictor and end up with a wrong prediction.
~Andrew
[1] Yes, I know RWC has the reintroduced 0xee prefix with the decode resteer.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v3 05/10] x86/ibt: Optimize FineIBT sequence
2025-02-19 16:21 ` [PATCH v3 05/10] x86/ibt: Optimize FineIBT sequence Peter Zijlstra
2025-02-19 17:15 ` Andrew Cooper
@ 2025-02-19 18:01 ` Kees Cook
2025-02-19 18:18 ` Peter Zijlstra
1 sibling, 1 reply; 38+ messages in thread
From: Kees Cook @ 2025-02-19 18:01 UTC (permalink / raw)
To: Peter Zijlstra
Cc: x86, linux-kernel, alyssa.milburn, scott.d.constable, joao,
andrew.cooper3, jpoimboe, jose.marchesi, hjl.tools, ndesaulniers,
samitolvanen, nathan, ojeda, alexei.starovoitov, mhiramat, jmill
On Wed, Feb 19, 2025 at 05:21:12PM +0100, Peter Zijlstra wrote:
> Scott notes that non-taken branches are faster. Abuse overlapping code
> that traps instead of explicit UD2 instructions.
Some kind of commenting is needed in here to explicitly call out the
embedded EA in the "subl" instruction. There is a tiny hint of it in the
disassembly dump of fineibt_preamble_start, but it's very subtle for
someone trying to understand this fresh.
> And LEA does not modify flags and will have less dependencies.
>
> Suggested-by: Scott Constable <scott.d.constable@intel.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
But it works!
Reviewed-by: Kees Cook <kees@kernel.org>
> ---
> arch/x86/kernel/alternative.c | 58 ++++++++++++++++++++++++++----------------
> arch/x86/net/bpf_jit_comp.c | 5 +--
> 2 files changed, 39 insertions(+), 24 deletions(-)
>
> --- a/arch/x86/kernel/alternative.c
> +++ b/arch/x86/kernel/alternative.c
> @@ -1054,9 +1054,9 @@ early_param("cfi", cfi_parse_cmdline);
> * __cfi_\func: __cfi_\func:
> * movl $0x12345678,%eax // 5 endbr64 // 4
> * nop subl $0x12345678,%r10d // 7
> - * nop jz 1f // 2
> - * nop ud2 // 2
> - * nop 1: nop // 1
> + * nop jne __cfi_\func+6 // 2
> + * nop nop3 // 3
> + * nop
> * nop
> * nop
> * nop
> @@ -1068,37 +1068,47 @@ early_param("cfi", cfi_parse_cmdline);
> *
> * caller: caller:
> * movl $(-0x12345678),%r10d // 6 movl $0x12345678,%r10d // 6
> - * addl $-15(%r11),%r10d // 4 sub $16,%r11 // 4
> + * addl $-15(%r11),%r10d // 4 lea -0x10(%r11),%r11 // 4
> * je 1f // 2 nop4 // 4
> * ud2 // 2
> - * 1: call __x86_indirect_thunk_r11 // 5 call *%r11; nop2; // 5
> + * 1: cs call __x86_indirect_thunk_r11 // 6 call *%r11; nop3; // 6
> *
> */
>
> -asm( ".pushsection .rodata \n"
> - "fineibt_preamble_start: \n"
> - " endbr64 \n"
> - " subl $0x12345678, %r10d \n"
> - " je fineibt_preamble_end \n"
> - "fineibt_preamble_ud2: \n"
> - " ud2 \n"
> - " nop \n"
> - "fineibt_preamble_end: \n"
> +/*
> + * <fineibt_preamble_start>:
> + * 0: f3 0f 1e fa endbr64
> + * 4: 41 81 <ea> 78 56 34 12 sub $0x12345678, %r10d
> + * b: 75 f9 jne 6 <fineibt_preamble_start+0x6>
> + * d: 0f 1f 00 nopl (%rax)
> + */
> +asm( ".pushsection .rodata \n"
> + "fineibt_preamble_start: \n"
> + " endbr64 \n"
> + " subl $0x12345678, %r10d \n"
> + " jne fineibt_preamble_start+6 \n"
> + ASM_NOP3
> + "fineibt_preamble_end: \n"
> ".popsection\n"
> );
>
> extern u8 fineibt_preamble_start[];
> -extern u8 fineibt_preamble_ud2[];
> extern u8 fineibt_preamble_end[];
>
> #define fineibt_preamble_size (fineibt_preamble_end - fineibt_preamble_start)
> -#define fineibt_preamble_ud2 (fineibt_preamble_ud2 - fineibt_preamble_start)
> +#define fineibt_preamble_ud 6
> #define fineibt_preamble_hash 7
>
> +/*
> + * <fineibt_caller_start>:
> + * 0: 41 ba 78 56 34 12 mov $0x12345678, %r10d
> + * 6: 4d 8d 5b f0 lea -0x10(%r11), %r11
> + * a: 0f 1f 40 00 nopl 0x0(%rax)
> + */
> asm( ".pushsection .rodata \n"
> "fineibt_caller_start: \n"
> " movl $0x12345678, %r10d \n"
> - " sub $16, %r11 \n"
> + " lea -0x10(%r11), %r11 \n"
> ASM_NOP4
> "fineibt_caller_end: \n"
> ".popsection \n"
> @@ -1429,15 +1439,15 @@ static void poison_cfi(void *addr)
> }
>
> /*
> - * regs->ip points to a UD2 instruction, return true and fill out target and
> - * type when this UD2 is from a FineIBT preamble.
> + * When regs->ip points to a 0xEA byte in the FineIBT preamble,
> + * return true and fill out target and type.
> *
> * We check the preamble by checking for the ENDBR instruction relative to the
> - * UD2 instruction.
> + * 0xEA instruction.
> */
> bool decode_fineibt_insn(struct pt_regs *regs, unsigned long *target, u32 *type)
> {
> - unsigned long addr = regs->ip - fineibt_preamble_ud2;
> + unsigned long addr = regs->ip - fineibt_preamble_ud;
> u32 hash;
>
> if (!exact_endbr((void *)addr))
> @@ -1448,6 +1458,12 @@ bool decode_fineibt_insn(struct pt_regs
> __get_kernel_nofault(&hash, addr + fineibt_preamble_hash, u32, Efault);
> *type = (u32)regs->r10 + hash;
>
> + /*
> + * Since regs->ip points to the middle of an instruction, it cannot
> + * continue with the normal fixup.
> + */
> + regs->ip = *target;
> +
> return true;
>
> Efault:
> --- a/arch/x86/net/bpf_jit_comp.c
> +++ b/arch/x86/net/bpf_jit_comp.c
> @@ -417,9 +417,8 @@ static void emit_fineibt(u8 **pprog, u32
>
> EMIT_ENDBR();
> EMIT3_off32(0x41, 0x81, 0xea, hash); /* subl $hash, %r10d */
> - EMIT2(0x74, 0x07); /* jz.d8 +7 */
> - EMIT2(0x0f, 0x0b); /* ud2 */
> - EMIT1(0x90); /* nop */
> + EMIT2(0x75, 0xf9); /* jne.d8 .-7 */
> + EMIT3(0x0f, 0x1f, 0x00); /* nop3 */
> EMIT_ENDBR_POISON();
>
> *pprog = prog;
>
>
--
Kees Cook
^ permalink raw reply [flat|nested] 38+ messages in thread

* Re: [PATCH v3 05/10] x86/ibt: Optimize FineIBT sequence
2025-02-19 18:01 ` Kees Cook
@ 2025-02-19 18:18 ` Peter Zijlstra
2025-02-19 18:23 ` Kees Cook
0 siblings, 1 reply; 38+ messages in thread
From: Peter Zijlstra @ 2025-02-19 18:18 UTC (permalink / raw)
To: Kees Cook
Cc: x86, linux-kernel, alyssa.milburn, scott.d.constable, joao,
andrew.cooper3, jpoimboe, jose.marchesi, hjl.tools, ndesaulniers,
samitolvanen, nathan, ojeda, alexei.starovoitov, mhiramat, jmill
On Wed, Feb 19, 2025 at 10:01:15AM -0800, Kees Cook wrote:
> On Wed, Feb 19, 2025 at 05:21:12PM +0100, Peter Zijlstra wrote:
> > Scott notes that non-taken branches are faster. Abuse overlapping code
> > that traps instead of explicit UD2 instructions.
>
> Some kind of commenting is needed in here to explicitly call out the
> embedded EA in the "subl" instruction. There is a tiny hint of it in the
> disassembly dump of fineibt_preamble_start, but it's very subtle for
> someone trying to understand this fresh.
Ah, but you found my clue :-)
How's this?
---
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -1080,6 +1080,9 @@ early_param("cfi", cfi_parse_cmdline);
* 4: 41 81 <ea> 78 56 34 12 sub $0x12345678, %r10d
* b: 75 f9 jne 6 <fineibt_preamble_start+0x6>
* d: 0f 1f 00 nopl (%rax)
+ *
+ * Note that the JNE target is the 0xEA byte inside the SUB; this decodes as
+ * (bad) on x86_64 and raises #UD.
*/
asm( ".pushsection .rodata \n"
"fineibt_preamble_start: \n"
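[Editor's note: to make the overlap concrete, here is a small standalone userspace sketch (not part of the patch) that checks the displacement arithmetic of the preamble above: the jne at offset 0xb carries disp8 0xf9 (-7), so it targets offset 0x6, which is the <ea> ModRM byte inside the sub; a lone 0xEA opcode is invalid in 64-bit mode and raises #UD.]

```c
#include <assert.h>
#include <stdint.h>

/* Byte-for-byte copy of the fineibt_preamble disassembly above,
 * with 0x12345678 as the example hash. */
static const uint8_t preamble[] = {
	0xf3, 0x0f, 0x1e, 0xfa,                   /* 0: endbr64 */
	0x41, 0x81, 0xea, 0x78, 0x56, 0x34, 0x12, /* 4: sub $0x12345678, %r10d */
	0x75, 0xf9,                               /* b: jne 0x6 (disp8 = -7) */
	0x0f, 0x1f, 0x00,                         /* d: nopl (%rax) */
};

/* Jcc.d8 target = address of the next instruction + sign-extended disp8. */
static int jcc_d8_target(const uint8_t *code, int insn_off)
{
	return insn_off + 2 + (int8_t)code[insn_off + 1];
}
```

Running the arithmetic confirms the jne lands exactly on the 0xEA byte embedded in the sub's encoding.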
^ permalink raw reply [flat|nested] 38+ messages in thread

* Re: [PATCH v3 05/10] x86/ibt: Optimize FineIBT sequence
2025-02-19 18:18 ` Peter Zijlstra
@ 2025-02-19 18:23 ` Kees Cook
0 siblings, 0 replies; 38+ messages in thread
From: Kees Cook @ 2025-02-19 18:23 UTC (permalink / raw)
To: Peter Zijlstra
Cc: x86, linux-kernel, alyssa.milburn, scott.d.constable, joao,
andrew.cooper3, jpoimboe, jose.marchesi, hjl.tools, ndesaulniers,
samitolvanen, nathan, ojeda, alexei.starovoitov, mhiramat, jmill
On Wed, Feb 19, 2025 at 07:18:33PM +0100, Peter Zijlstra wrote:
> On Wed, Feb 19, 2025 at 10:01:15AM -0800, Kees Cook wrote:
> > On Wed, Feb 19, 2025 at 05:21:12PM +0100, Peter Zijlstra wrote:
> > > Scott notes that non-taken branches are faster. Abuse overlapping code
> > > that traps instead of explicit UD2 instructions.
> >
> > Some kind of commenting is needed in here to explicitly call out the
> > embedded EA in the "subl" instruction. There is a tiny hint of it in the
> > disassembly dump of fineibt_preamble_start, but it's very subtle for
> > someone trying to understand this fresh.
>
> Ah, but you found my clue :-)
>
> How's this?
>
> ---
> --- a/arch/x86/kernel/alternative.c
> +++ b/arch/x86/kernel/alternative.c
> @@ -1080,6 +1080,9 @@ early_param("cfi", cfi_parse_cmdline);
> * 4: 41 81 <ea> 78 56 34 12 sub $0x12345678, %r10d
> * b: 75 f9 jne 6 <fineibt_preamble_start+0x6>
> * d: 0f 1f 00 nopl (%rax)
> + *
> + * Note that the JNE target is the 0xEA byte inside the SUB; this decodes as
> + * (bad) on x86_64 and raises #UD.
> */
> asm( ".pushsection .rodata \n"
> "fineibt_preamble_start: \n"
Better! Thank you. :)
--
Kees Cook
^ permalink raw reply [flat|nested] 38+ messages in thread
* [PATCH v3 06/10] x86/traps: Decode LOCK Jcc.d8 #UD
2025-02-19 16:21 [PATCH v3 00/10] x86/ibt: FineIBT-BHI Peter Zijlstra
` (4 preceding siblings ...)
2025-02-19 16:21 ` [PATCH v3 05/10] x86/ibt: Optimize FineIBT sequence Peter Zijlstra
@ 2025-02-19 16:21 ` Peter Zijlstra
2025-02-19 16:45 ` Peter Zijlstra
2025-02-19 18:20 ` Kees Cook
2025-02-19 16:21 ` [PATCH v3 07/10] x86/ibt: Add paranoid FineIBT mode Peter Zijlstra
` (5 subsequent siblings)
11 siblings, 2 replies; 38+ messages in thread
From: Peter Zijlstra @ 2025-02-19 16:21 UTC (permalink / raw)
To: x86
Cc: linux-kernel, peterz, alyssa.milburn, scott.d.constable, joao,
andrew.cooper3, jpoimboe, jose.marchesi, hjl.tools, ndesaulniers,
samitolvanen, nathan, ojeda, kees, alexei.starovoitov, mhiramat,
jmill
Because overlapping code sequences are all the rage.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
arch/x86/include/asm/bug.h | 2 ++
arch/x86/kernel/traps.c | 30 +++++++++++++++++++++++++-----
2 files changed, 27 insertions(+), 5 deletions(-)
--- a/arch/x86/include/asm/bug.h
+++ b/arch/x86/include/asm/bug.h
@@ -17,6 +17,7 @@
* In clang we have UD1s reporting UBSAN failures on X86, 64 and 32bit.
*/
#define INSN_ASOP 0x67
+#define INSN_LOCK 0xf0
#define OPCODE_ESCAPE 0x0f
#define SECOND_BYTE_OPCODE_UD1 0xb9
#define SECOND_BYTE_OPCODE_UD2 0x0b
@@ -26,6 +27,7 @@
#define BUG_UD1 0xfffd
#define BUG_UD1_UBSAN 0xfffc
#define BUG_EA 0xffea
+#define BUG_LOCK 0xfff0
#ifdef CONFIG_GENERIC_BUG
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -97,6 +97,7 @@ __always_inline int is_valid_bugaddr(uns
* If it's a UD1, further decode to determine its use:
*
* FineIBT: ea (bad)
+ * FineIBT: f0 75 f9 lock jne . - 6
* UBSan{0}: 67 0f b9 00 ud1 (%eax),%eax
* UBSan{10}: 67 0f b9 40 10 ud1 0x10(%eax),%eax
* static_call: 0f b9 cc ud1 %esp,%ecx
@@ -106,6 +107,7 @@ __always_inline int is_valid_bugaddr(uns
__always_inline int decode_bug(unsigned long addr, s32 *imm, int *len)
{
unsigned long start = addr;
+ bool lock = false;
u8 v;
if (addr < TASK_SIZE_MAX)
@@ -114,12 +116,29 @@ __always_inline int decode_bug(unsigned
v = *(u8 *)(addr++);
if (v == INSN_ASOP)
v = *(u8 *)(addr++);
- if (v == 0xea) {
+
+ if (v == INSN_LOCK) {
+ lock = true;
+ v = *(u8 *)(addr++);
+ }
+
+ switch (v) {
+ case 0x70 ... 0x7f: /* Jcc.d8 */
+ addr += 1; /* d8 */
+ *len = addr - start;
+ WARN_ON_ONCE(!lock);
+ return BUG_LOCK;
+
+ case 0xea:
*len = addr - start;
return BUG_EA;
- }
- if (v != OPCODE_ESCAPE)
+
+ case OPCODE_ESCAPE:
+ break;
+
+ default:
return BUG_NONE;
+ }
v = *(u8 *)(addr++);
if (v == SECOND_BYTE_OPCODE_UD2) {
@@ -315,7 +334,8 @@ static noinstr bool handle_bug(struct pt
switch (ud_type) {
case BUG_EA:
- if (handle_cfi_failure(regs) == BUG_TRAP_TYPE_WARN) {
+ case BUG_LOCK:
+ if (handle_cfi_failure(ud_type, regs) == BUG_TRAP_TYPE_WARN) {
if (regs->ip == addr)
regs->ip += ud_len;
handled = true;
@@ -324,7 +344,7 @@ static noinstr bool handle_bug(struct pt
case BUG_UD2:
if (report_bug(regs->ip, regs) == BUG_TRAP_TYPE_WARN ||
- handle_cfi_failure(regs) == BUG_TRAP_TYPE_WARN) {
+ handle_cfi_failure(ud_type, regs) == BUG_TRAP_TYPE_WARN) {
if (regs->ip == addr)
regs->ip += ud_len;
handled = true;
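[Editor's note: as a sanity check on the dispatch above, here is a simplified userspace model of decode_bug() (illustrative only: UD1/UBSAN immediate decoding is omitted, and where the kernel does WARN_ON_ONCE(!lock) this sketch simply returns BUG_NONE for a Jcc.d8 without a LOCK prefix).]

```c
#include <assert.h>
#include <stdint.h>

/* Constants mirrored from arch/x86/include/asm/bug.h */
#define INSN_ASOP              0x67
#define INSN_LOCK              0xf0
#define OPCODE_ESCAPE          0x0f
#define SECOND_BYTE_OPCODE_UD2 0x0b

#define BUG_NONE 0xffff
#define BUG_UD2  0xfffe
#define BUG_EA   0xffea
#define BUG_LOCK 0xfff0

/* Classify the trapping opcode bytes and report the instruction length. */
static int decode_bug_type(const uint8_t *p, int *len)
{
	const uint8_t *start = p;
	int lock = 0;
	uint8_t v = *p++;

	if (v == INSN_ASOP)
		v = *p++;
	if (v == INSN_LOCK) {
		lock = 1;
		v = *p++;
	}

	if (v >= 0x70 && v <= 0x7f) {	/* Jcc.d8 */
		p++;			/* skip disp8 */
		*len = (int)(p - start);
		return lock ? BUG_LOCK : BUG_NONE;
	}
	if (v == 0xea) {		/* (bad) */
		*len = (int)(p - start);
		return BUG_EA;
	}
	if (v != OPCODE_ESCAPE)
		return BUG_NONE;
	if (*p++ == SECOND_BYTE_OPCODE_UD2) {
		*len = (int)(p - start);
		return BUG_UD2;
	}
	return BUG_NONE;
}
```

Feeding it the three FineIBT-relevant byte sequences classifies them as expected.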
^ permalink raw reply [flat|nested] 38+ messages in thread

* Re: [PATCH v3 06/10] x86/traps: Decode LOCK Jcc.d8 #UD
2025-02-19 16:21 ` [PATCH v3 06/10] x86/traps: Decode LOCK Jcc.d8 #UD Peter Zijlstra
@ 2025-02-19 16:45 ` Peter Zijlstra
2025-02-19 18:20 ` Kees Cook
1 sibling, 0 replies; 38+ messages in thread
From: Peter Zijlstra @ 2025-02-19 16:45 UTC (permalink / raw)
To: x86
Cc: linux-kernel, alyssa.milburn, scott.d.constable, joao,
andrew.cooper3, jpoimboe, jose.marchesi, hjl.tools, ndesaulniers,
samitolvanen, nathan, ojeda, kees, alexei.starovoitov, mhiramat,
jmill
On Wed, Feb 19, 2025 at 05:21:13PM +0100, Peter Zijlstra wrote:
> @@ -315,7 +334,8 @@ static noinstr bool handle_bug(struct pt
>
> switch (ud_type) {
> case BUG_EA:
> - if (handle_cfi_failure(regs) == BUG_TRAP_TYPE_WARN) {
> + case BUG_LOCK:
> + if (handle_cfi_failure(ud_type, regs) == BUG_TRAP_TYPE_WARN) {
> if (regs->ip == addr)
> regs->ip += ud_len;
> handled = true;
> @@ -324,7 +344,7 @@ static noinstr bool handle_bug(struct pt
>
> case BUG_UD2:
> if (report_bug(regs->ip, regs) == BUG_TRAP_TYPE_WARN ||
> - handle_cfi_failure(regs) == BUG_TRAP_TYPE_WARN) {
> + handle_cfi_failure(ud_type, regs) == BUG_TRAP_TYPE_WARN) {
> if (regs->ip == addr)
> regs->ip += ud_len;
> handled = true;
>
Damn, that ud_type change belongs to the next patch. Consider it fixed.
^ permalink raw reply [flat|nested] 38+ messages in thread

* Re: [PATCH v3 06/10] x86/traps: Decode LOCK Jcc.d8 #UD
2025-02-19 16:21 ` [PATCH v3 06/10] x86/traps: Decode LOCK Jcc.d8 #UD Peter Zijlstra
2025-02-19 16:45 ` Peter Zijlstra
@ 2025-02-19 18:20 ` Kees Cook
2025-02-19 18:33 ` Peter Zijlstra
1 sibling, 1 reply; 38+ messages in thread
From: Kees Cook @ 2025-02-19 18:20 UTC (permalink / raw)
To: Peter Zijlstra
Cc: x86, linux-kernel, alyssa.milburn, scott.d.constable, joao,
andrew.cooper3, jpoimboe, jose.marchesi, hjl.tools, ndesaulniers,
samitolvanen, nathan, ojeda, alexei.starovoitov, mhiramat, jmill
On Wed, Feb 19, 2025 at 05:21:13PM +0100, Peter Zijlstra wrote:
> Because overlapping code sequences are all the rage.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kees Cook <kees@kernel.org>
Semi-pointless stream of consciousness below...
> ---
> arch/x86/include/asm/bug.h | 2 ++
> arch/x86/kernel/traps.c | 30 +++++++++++++++++++++++++-----
> 2 files changed, 27 insertions(+), 5 deletions(-)
>
> --- a/arch/x86/include/asm/bug.h
> +++ b/arch/x86/include/asm/bug.h
> @@ -17,6 +17,7 @@
> * In clang we have UD1s reporting UBSAN failures on X86, 64 and 32bit.
> */
> #define INSN_ASOP 0x67
> +#define INSN_LOCK 0xf0
> #define OPCODE_ESCAPE 0x0f
> #define SECOND_BYTE_OPCODE_UD1 0xb9
> #define SECOND_BYTE_OPCODE_UD2 0x0b
> @@ -26,6 +27,7 @@
> #define BUG_UD1 0xfffd
> #define BUG_UD1_UBSAN 0xfffc
> #define BUG_EA 0xffea
> +#define BUG_LOCK 0xfff0
>
> #ifdef CONFIG_GENERIC_BUG
>
> --- a/arch/x86/kernel/traps.c
> +++ b/arch/x86/kernel/traps.c
> @@ -97,6 +97,7 @@ __always_inline int is_valid_bugaddr(uns
> * If it's a UD1, further decode to determine its use:
> *
> * FineIBT: ea (bad)
> + * FineIBT: f0 75 f9 lock jne . - 6
> * UBSan{0}: 67 0f b9 00 ud1 (%eax),%eax
> * UBSan{10}: 67 0f b9 40 10 ud1 0x10(%eax),%eax
> * static_call: 0f b9 cc ud1 %esp,%ecx
> @@ -106,6 +107,7 @@ __always_inline int is_valid_bugaddr(uns
> __always_inline int decode_bug(unsigned long addr, s32 *imm, int *len)
> {
> unsigned long start = addr;
> + bool lock = false;
> u8 v;
>
> if (addr < TASK_SIZE_MAX)
> @@ -114,12 +116,29 @@ __always_inline int decode_bug(unsigned
> v = *(u8 *)(addr++);
> if (v == INSN_ASOP)
> v = *(u8 *)(addr++);
> - if (v == 0xea) {
> +
> + if (v == INSN_LOCK) {
> + lock = true;
> + v = *(u8 *)(addr++);
> + }
> +
> + switch (v) {
> + case 0x70 ... 0x7f: /* Jcc.d8 */
> + addr += 1; /* d8 */
> + *len = addr - start;
> + WARN_ON_ONCE(!lock);
> + return BUG_LOCK;
> +
> + case 0xea:
> *len = addr - start;
> return BUG_EA;
> - }
> - if (v != OPCODE_ESCAPE)
> +
> + case OPCODE_ESCAPE:
> + break;
> +
> + default:
> return BUG_NONE;
> + }
>
> v = *(u8 *)(addr++);
> if (v == SECOND_BYTE_OPCODE_UD2) {
> @@ -315,7 +334,8 @@ static noinstr bool handle_bug(struct pt
>
> switch (ud_type) {
> case BUG_EA:
> - if (handle_cfi_failure(regs) == BUG_TRAP_TYPE_WARN) {
> + case BUG_LOCK:
> + if (handle_cfi_failure(ud_type, regs) == BUG_TRAP_TYPE_WARN) {
> if (regs->ip == addr)
> regs->ip += ud_len;
> handled = true;
> @@ -324,7 +344,7 @@ static noinstr bool handle_bug(struct pt
>
> case BUG_UD2:
> if (report_bug(regs->ip, regs) == BUG_TRAP_TYPE_WARN ||
> - handle_cfi_failure(regs) == BUG_TRAP_TYPE_WARN) {
> + handle_cfi_failure(ud_type, regs) == BUG_TRAP_TYPE_WARN) {
> if (regs->ip == addr)
> regs->ip += ud_len;
> handled = true;
I realize these are misplaced chunks, but passing ud_type into the
handler feels like a layering violation to me. I struggled with this
when making recommendations for the UBSAN handler too, so I'm not sure
I have any better idea. It feels like there should be a way to separate
this logic more cleanly. The handlers are all doing very similar things:
1- find the address where a bad thing happened
2- report about it
3- whether to continue execution
4- where to continue execution
The variability happens with 1 and 4, where it depends on the instruction
sequences. Meh, I dunno. I can't see anything cleaner, so passing down
ud_type does seem best.
--
Kees Cook
^ permalink raw reply [flat|nested] 38+ messages in thread

* Re: [PATCH v3 06/10] x86/traps: Decode LOCK Jcc.d8 #UD
2025-02-19 18:20 ` Kees Cook
@ 2025-02-19 18:33 ` Peter Zijlstra
2025-02-19 19:44 ` Peter Zijlstra
0 siblings, 1 reply; 38+ messages in thread
From: Peter Zijlstra @ 2025-02-19 18:33 UTC (permalink / raw)
To: Kees Cook
Cc: x86, linux-kernel, alyssa.milburn, scott.d.constable, joao,
andrew.cooper3, jpoimboe, jose.marchesi, hjl.tools, ndesaulniers,
samitolvanen, nathan, ojeda, alexei.starovoitov, mhiramat, jmill
On Wed, Feb 19, 2025 at 10:20:25AM -0800, Kees Cook wrote:
> I realize these are misplaced chunks, but passing ud_type into the
> handler feels like a layering violation to me. I struggled with this
> when making recommendations for the UBSAN handler too, so I'm not sure
> I have any better idea. It feels like there should be a way to separate
> this logic more cleanly. The handlers are all doing very similar things:
>
> 1- find the address where a bad thing happened
> 2- report about it
> 3- whether to continue execution
> 4- where to continue execution
>
> The variability happens with 1 and 4, where it depends on the instruction
> sequences. Meh, I dunno. I can't see anything cleaner, so passing down
> ud_type does seem best.
Yeah, agreed. I couldn't get rid of relying on ud_type entirely (it was
worse); I'll see if I can come up with something.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v3 06/10] x86/traps: Decode LOCK Jcc.d8 #UD
2025-02-19 18:33 ` Peter Zijlstra
@ 2025-02-19 19:44 ` Peter Zijlstra
0 siblings, 0 replies; 38+ messages in thread
From: Peter Zijlstra @ 2025-02-19 19:44 UTC (permalink / raw)
To: Kees Cook
Cc: x86, linux-kernel, alyssa.milburn, scott.d.constable, joao,
andrew.cooper3, jpoimboe, jose.marchesi, hjl.tools, ndesaulniers,
samitolvanen, nathan, ojeda, alexei.starovoitov, mhiramat, jmill
On Wed, Feb 19, 2025 at 07:33:30PM +0100, Peter Zijlstra wrote:
> On Wed, Feb 19, 2025 at 10:20:25AM -0800, Kees Cook wrote:
>
> > I realize these are misplaced chunks, but passing ud_type into the
> > handler feels like a layering violation to me. I struggled with this
> > when making recommendations for the UBSAN handler too, so I'm not sure
> > I have any better idea. It feels like there should be a way to separate
> > this logic more cleanly. The handlers are all doing very similar things:
> >
> > 1- find the address where a bad thing happened
> > 2- report about it
> > 3- whether to continue execution
> > 4- where to continue execution
> >
> > The variability happens with 1 and 4, where it depends on the instruction
> > sequences. Meh, I dunno. I can't see anything cleaner, so passing down
> > ud_type does seem best.
>
> Yeah, agreed. I couldn't get rid of relying on ud_type entirely (it was
> worse); I'll see if I can come up with something.
Completely untested, but I think it should work...
diff --git a/arch/x86/include/asm/cfi.h b/arch/x86/include/asm/cfi.h
index 896cf8cf4ea7..2f6a01f098b5 100644
--- a/arch/x86/include/asm/cfi.h
+++ b/arch/x86/include/asm/cfi.h
@@ -109,7 +109,7 @@ extern bhi_thunk __bhi_args_end[];
struct pt_regs;
#ifdef CONFIG_CFI_CLANG
-enum bug_trap_type handle_cfi_failure(int ud_type, struct pt_regs *regs);
+enum bug_trap_type handle_cfi_failure(struct pt_regs *regs);
#define __bpfcall
extern u32 cfi_bpf_hash;
extern u32 cfi_bpf_subprog_hash;
@@ -133,10 +133,10 @@ extern u32 cfi_get_func_hash(void *func);
extern int cfi_get_func_arity(void *func);
#ifdef CONFIG_FINEIBT
-extern bool decode_fineibt_insn(int ud_type, struct pt_regs *regs, unsigned long *target, u32 *type);
+extern bool decode_fineibt_insn(struct pt_regs *regs, unsigned long *target, u32 *type);
#else
static inline bool
-decode_fineibt_insn(int ud_type, struct pt_regs *regs, unsigned long *target, u32 *type)
+decode_fineibt_insn(struct pt_regs *regs, unsigned long *target, u32 *type)
{
return false;
}
@@ -144,7 +144,7 @@ decode_fineibt_insn(int ud_type, struct pt_regs *regs, unsigned long *target, u3
#endif
#else
-static inline enum bug_trap_type handle_cfi_failure(int ud_type, struct pt_regs *regs)
+static inline enum bug_trap_type handle_cfi_failure(struct pt_regs *regs)
{
return BUG_TRAP_TYPE_NONE;
}
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 5bf5c0283259..d00eabf092f2 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -1631,7 +1631,7 @@ static void poison_cfi(void *addr)
* We check the preamble by checking for the ENDBR instruction relative to the
* 0xEA instruction.
*/
-static bool decode_fineibt_preamble(int ud_type, struct pt_regs *regs,
+static bool decode_fineibt_preamble(struct pt_regs *regs,
unsigned long *target, u32 *type)
{
unsigned long addr = regs->ip - fineibt_preamble_ud;
@@ -1660,7 +1660,7 @@ static bool decode_fineibt_preamble(int ud_type, struct pt_regs *regs,
/*
* regs->ip points to one of the UD2 in __bhi_args[].
*/
-static bool decode_fineibt_bhi(int ud_type, struct pt_regs *regs,
+static bool decode_fineibt_bhi(struct pt_regs *regs,
unsigned long *target, u32 *type)
{
unsigned long addr;
@@ -1696,12 +1696,15 @@ static bool decode_fineibt_bhi(int ud_type, struct pt_regs *regs,
* regs->ip points to a LOCK Jcc.d8 instruction from the fineibt_paranoid_start[]
* sequence.
*/
-static bool decode_fineibt_paranoid(int ud_type, struct pt_regs *regs,
+static bool decode_fineibt_paranoid(struct pt_regs *regs,
unsigned long *target, u32 *type)
{
unsigned long addr = regs->ip - fineibt_paranoid_ud;
u32 hash;
+ if (!is_cfi_trap(addr + fineibt_caller_size - LEN_UD2))
+ return false;
+
__get_kernel_nofault(&hash, addr + fineibt_caller_hash, u32, Efault);
*target = regs->r11 + fineibt_preamble_size;
*type = regs->r10;
@@ -1716,15 +1719,17 @@ static bool decode_fineibt_paranoid(int ud_type, struct pt_regs *regs,
return false;
}
-bool decode_fineibt_insn(int ud_type, struct pt_regs *regs,
+bool decode_fineibt_insn(struct pt_regs *regs,
unsigned long *target, u32 *type)
{
- if (ud_type == BUG_LOCK)
- return decode_fineibt_paranoid(ud_type, regs, target, type);
if (regs->ip > (unsigned long)__bhi_args &&
regs->ip < (unsigned long)__bhi_args_end)
- return decode_fineibt_bhi(ud_type, regs, target, type);
- return decode_fineibt_preamble(ud_type, regs, target, type);
+ return decode_fineibt_bhi(regs, target, type);
+
+ if (decode_fineibt_paranoid(regs, target, type))
+ return true;
+
+ return decode_fineibt_preamble(regs, target, type);
}
#else
diff --git a/arch/x86/kernel/cfi.c b/arch/x86/kernel/cfi.c
index a1b03255f4b9..77086cf565ec 100644
--- a/arch/x86/kernel/cfi.c
+++ b/arch/x86/kernel/cfi.c
@@ -65,7 +65,7 @@ static bool decode_cfi_insn(struct pt_regs *regs, unsigned long *target,
* Checks if a ud2 trap is because of a CFI failure, and handles the trap
* if needed. Returns a bug_trap_type value similarly to report_bug.
*/
-enum bug_trap_type handle_cfi_failure(int ud_type, struct pt_regs *regs)
+enum bug_trap_type handle_cfi_failure(struct pt_regs *regs)
{
unsigned long target, addr = regs->ip;
u32 type;
@@ -81,7 +81,7 @@ enum bug_trap_type handle_cfi_failure(int ud_type, struct pt_regs *regs)
break;
case CFI_FINEIBT:
- if (!decode_fineibt_insn(ud_type, regs, &target, &type))
+ if (!decode_fineibt_insn(regs, &target, &type))
return BUG_TRAP_TYPE_NONE;
break;
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 67fa578c451b..8945e3e23240 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -333,21 +333,18 @@ static noinstr bool handle_bug(struct pt_regs *regs)
raw_local_irq_enable();
switch (ud_type) {
- case BUG_EA:
- case BUG_LOCK:
- if (handle_cfi_failure(ud_type, regs) == BUG_TRAP_TYPE_WARN) {
- if (regs->ip == addr)
- regs->ip += ud_len;
+ case BUG_UD2:
> + if (report_bug(regs->ip, regs) == BUG_TRAP_TYPE_WARN) {
handled = true;
+ break;
}
- break;
+ fallthrough;
- case BUG_UD2:
- if (report_bug(regs->ip, regs) == BUG_TRAP_TYPE_WARN ||
- handle_cfi_failure(ud_type, regs) == BUG_TRAP_TYPE_WARN) {
- if (regs->ip == addr)
- regs->ip += ud_len;
+ case BUG_EA:
+ case BUG_LOCK:
+ if (handle_cfi_failure(regs) == BUG_TRAP_TYPE_WARN) {
handled = true;
+ break;
}
break;
@@ -363,6 +360,10 @@ static noinstr bool handle_bug(struct pt_regs *regs)
break;
}
+ /* Continue after the trapping instruction, unless overridden. */
+ if (handled && regs->ip == addr)
+ regs->ip += ud_len;
+
/* Restore failure location if we're not continuing execution. */
if (!handled && regs->ip != addr)
regs->ip = addr;
^ permalink raw reply related [flat|nested] 38+ messages in thread
* [PATCH v3 07/10] x86/ibt: Add paranoid FineIBT mode
2025-02-19 16:21 [PATCH v3 00/10] x86/ibt: FineIBT-BHI Peter Zijlstra
` (5 preceding siblings ...)
2025-02-19 16:21 ` [PATCH v3 06/10] x86/traps: Decode LOCK Jcc.d8 #UD Peter Zijlstra
@ 2025-02-19 16:21 ` Peter Zijlstra
2025-02-19 17:31 ` Andrew Cooper
2025-02-19 18:05 ` Kees Cook
2025-02-19 16:21 ` [PATCH v3 08/10] x86: BHI stubs Peter Zijlstra
` (4 subsequent siblings)
11 siblings, 2 replies; 38+ messages in thread
From: Peter Zijlstra @ 2025-02-19 16:21 UTC (permalink / raw)
To: x86
Cc: linux-kernel, peterz, alyssa.milburn, scott.d.constable, joao,
andrew.cooper3, jpoimboe, jose.marchesi, hjl.tools, ndesaulniers,
samitolvanen, nathan, ojeda, kees, alexei.starovoitov, mhiramat,
jmill
Due to concerns about circumvention attacks against FineIBT on 'naked'
ENDBR, add an additional caller side hash check to FineIBT. This
should make it impossible to pivot over such a 'naked' ENDBR
instruction at the cost of an additional load.
The specific pivot reported was against the SYSCALL entry site and
FRED will have all those holes fixed up.
This specific fineibt_paranoid_start[] sequence was concocted by
Scott.
Reported-by: Jennifer Miller <jmill@asu.edu>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20250215210729.GA25168@noisy.programming.kicks-ass.net
---
arch/x86/include/asm/cfi.h | 8 +-
arch/x86/kernel/alternative.c | 143 ++++++++++++++++++++++++++++++++++++++++--
arch/x86/kernel/cfi.c | 4 -
3 files changed, 143 insertions(+), 12 deletions(-)
--- a/arch/x86/include/asm/cfi.h
+++ b/arch/x86/include/asm/cfi.h
@@ -105,7 +105,7 @@ extern bool cfi_warn;
struct pt_regs;
#ifdef CONFIG_CFI_CLANG
-enum bug_trap_type handle_cfi_failure(struct pt_regs *regs);
+enum bug_trap_type handle_cfi_failure(int ud_type, struct pt_regs *regs);
#define __bpfcall
extern u32 cfi_bpf_hash;
extern u32 cfi_bpf_subprog_hash;
@@ -128,10 +128,10 @@ static inline int cfi_get_offset(void)
extern u32 cfi_get_func_hash(void *func);
#ifdef CONFIG_FINEIBT
-extern bool decode_fineibt_insn(struct pt_regs *regs, unsigned long *target, u32 *type);
+extern bool decode_fineibt_insn(int ud_type, struct pt_regs *regs, unsigned long *target, u32 *type);
#else
static inline bool
-decode_fineibt_insn(struct pt_regs *regs, unsigned long *target, u32 *type)
+decode_fineibt_insn(int ud_type, struct pt_regs *regs, unsigned long *target, u32 *type)
{
return false;
}
@@ -139,7 +139,7 @@ decode_fineibt_insn(struct pt_regs *regs
#endif
#else
-static inline enum bug_trap_type handle_cfi_failure(struct pt_regs *regs)
+static inline enum bug_trap_type handle_cfi_failure(int ud_type, struct pt_regs *regs)
{
return BUG_TRAP_TYPE_NONE;
}
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -741,6 +741,11 @@ void __init_or_module noinline apply_ret
op2 = insn.opcode.bytes[1];
switch (op1) {
+ case 0x70 ... 0x7f: /* Jcc.d8 */
+ /* See cfi_paranoid. */
+ WARN_ON_ONCE(cfi_mode != CFI_FINEIBT);
+ continue;
+
case CALL_INSN_OPCODE:
case JMP32_INSN_OPCODE:
break;
@@ -995,6 +1000,8 @@ u32 cfi_get_func_hash(void *func)
static bool cfi_rand __ro_after_init = true;
static u32 cfi_seed __ro_after_init;
+static bool cfi_paranoid __ro_after_init = false;
+
/*
* Re-hash the CFI hash with a boot-time seed while making sure the result is
* not a valid ENDBR instruction.
@@ -1037,6 +1044,12 @@ static __init int cfi_parse_cmdline(char
} else if (!strcmp(str, "warn")) {
pr_alert("CFI mismatch non-fatal!\n");
cfi_warn = true;
+ } else if (!strcmp(str, "paranoid")) {
+ if (cfi_mode == CFI_FINEIBT) {
+ cfi_paranoid = true;
+ } else {
+ pr_err("Ignoring paranoid; depends on fineibt.\n");
+ }
} else {
pr_err("Ignoring unknown cfi option (%s).", str);
}
@@ -1116,6 +1129,52 @@ extern u8 fineibt_caller_end[];
#define fineibt_caller_jmp (fineibt_caller_size - 2)
+/*
+ * Since FineIBT does hash validation on the callee side it is prone to
+ * circumvention attacks where a 'naked' ENDBR instruction exists that
+ * is not part of the fineibt_preamble sequence.
+ *
+ * Notably the x86 entry points must be ENDBR and equally cannot be
+ * fineibt_preamble.
+ *
+ * The fineibt_paranoid caller sequence adds additional caller side
+ * hash validation. This stops such circumvention attacks dead, but at the cost
+ * of adding a load.
+ *
+ * <fineibt_paranoid_start>:
+ * 0: 41 ba 78 56 34 12 mov $0x12345678, %r10d
+ * 6: 45 3b 53 f7 cmp -0x9(%r11), %r10d
+ * a: 4d 8d 5b <f0> lea -0x10(%r11), %r11
+ * e: 75 fd jne d <fineibt_paranoid_start+0xd>
+ * 10: 41 ff d3 call *%r11
+ * 13: 90 nop
+ *
+ * Notably LEA does not modify flags and can be reordered with the CMP,
+ * avoiding a dependency. Again, using a non-taken (backwards) branch
+ * for the failure case, abusing LEA's displacement byte 0xf0 as LOCK prefix for the
+ * JCC.d8, causing #UD.
+ */
+asm( ".pushsection .rodata \n"
+ "fineibt_paranoid_start: \n"
+ " movl $0x12345678, %r10d \n"
+ " cmpl -9(%r11), %r10d \n"
+ " lea -0x10(%r11), %r11 \n"
+ " jne fineibt_paranoid_start+0xd \n"
+ "fineibt_paranoid_ind: \n"
+ " call *%r11 \n"
+ " nop \n"
+ "fineibt_paranoid_end: \n"
+ ".popsection \n"
+);
+
+extern u8 fineibt_paranoid_start[];
+extern u8 fineibt_paranoid_ind[];
+extern u8 fineibt_paranoid_end[];
+
+#define fineibt_paranoid_size (fineibt_paranoid_end - fineibt_paranoid_start)
+#define fineibt_paranoid_ind (fineibt_paranoid_ind - fineibt_paranoid_start)
+#define fineibt_paranoid_ud 0xd
+
static u32 decode_preamble_hash(void *addr)
{
u8 *p = addr;
@@ -1279,18 +1338,48 @@ static int cfi_rewrite_callers(s32 *star
{
s32 *s;
+ BUG_ON(fineibt_paranoid_size != 20);
+
for (s = start; s < end; s++) {
void *addr = (void *)s + *s;
+ struct insn insn;
+ u8 bytes[20];
u32 hash;
+ int ret;
+ u8 op;
addr -= fineibt_caller_size;
hash = decode_caller_hash(addr);
- if (hash) {
+ if (!hash)
+ continue;
+
+ if (!cfi_paranoid) {
text_poke_early(addr, fineibt_caller_start, fineibt_caller_size);
WARN_ON(*(u32 *)(addr + fineibt_caller_hash) != 0x12345678);
text_poke_early(addr + fineibt_caller_hash, &hash, 4);
+ /* rely on apply_retpolines() */
+ continue;
}
- /* rely on apply_retpolines() */
+
+ /* cfi_paranoid */
+ ret = insn_decode_kernel(&insn, addr + fineibt_caller_size);
+ if (WARN_ON_ONCE(ret < 0))
+ continue;
+
+ op = insn.opcode.bytes[0];
+ if (op != CALL_INSN_OPCODE && op != JMP32_INSN_OPCODE) {
+ WARN_ON_ONCE(1);
+ continue;
+ }
+
+ memcpy(bytes, fineibt_paranoid_start, fineibt_paranoid_size);
+ memcpy(bytes + fineibt_caller_hash, &hash, 4);
+
+ ret = emit_indirect(op, 11, bytes + fineibt_paranoid_ind);
+ if (WARN_ON_ONCE(ret != 3))
+ continue;
+
+ text_poke_early(addr, bytes, fineibt_paranoid_size);
}
return 0;
@@ -1307,8 +1396,15 @@ static void __apply_fineibt(s32 *start_r
if (cfi_mode == CFI_AUTO) {
cfi_mode = CFI_KCFI;
- if (HAS_KERNEL_IBT && cpu_feature_enabled(X86_FEATURE_IBT))
+ if (HAS_KERNEL_IBT && cpu_feature_enabled(X86_FEATURE_IBT)) {
+ /*
+ * FRED has much saner context on exception entry and
+ * is less easy to take advantage of.
+ */
+ if (!cpu_feature_enabled(X86_FEATURE_FRED))
+ cfi_paranoid = true;
cfi_mode = CFI_FINEIBT;
+ }
}
/*
@@ -1365,8 +1461,10 @@ static void __apply_fineibt(s32 *start_r
/* now that nobody targets func()+0, remove ENDBR there */
cfi_rewrite_endbr(start_cfi, end_cfi);
- if (builtin)
- pr_info("Using FineIBT CFI\n");
+ if (builtin) {
+ pr_info("Using %sFineIBT CFI\n",
+ cfi_paranoid ? "paranoid " : "");
+ }
return;
default:
@@ -1439,7 +1537,8 @@ static void poison_cfi(void *addr)
* We check the preamble by checking for the ENDBR instruction relative to the
* 0xEA instruction.
*/
-bool decode_fineibt_insn(struct pt_regs *regs, unsigned long *target, u32 *type)
+static bool decode_fineibt_preamble(int ud_type, struct pt_regs *regs,
+ unsigned long *target, u32 *type)
{
unsigned long addr = regs->ip - fineibt_preamble_ud;
u32 hash;
@@ -1464,6 +1563,38 @@ bool decode_fineibt_insn(struct pt_regs
return false;
}
+/*
+ * regs->ip points to a LOCK Jcc.d8 instruction from the fineibt_paranoid_start[]
+ * sequence.
+ */
+static bool decode_fineibt_paranoid(int ud_type, struct pt_regs *regs,
+ unsigned long *target, u32 *type)
+{
+ unsigned long addr = regs->ip - fineibt_paranoid_ud;
+ u32 hash;
+
+ __get_kernel_nofault(&hash, addr + fineibt_caller_hash, u32, Efault);
+ *target = regs->r11 + fineibt_preamble_size;
+ *type = regs->r10;
+
+ /*
+ * Since the trapping instruction is the exact, but LOCK prefixed,
+ * Jcc.d8 that got us here, the normal fixup will work.
+ */
+ return true;
+
+Efault:
+ return false;
+}
+
+bool decode_fineibt_insn(int ud_type, struct pt_regs *regs,
+ unsigned long *target, u32 *type)
+{
+ if (ud_type == BUG_LOCK)
+ return decode_fineibt_paranoid(ud_type, regs, target, type);
+ return decode_fineibt_preamble(ud_type, regs, target, type);
+}
+
#else
static void __apply_fineibt(s32 *start_retpoline, s32 *end_retpoline,
--- a/arch/x86/kernel/cfi.c
+++ b/arch/x86/kernel/cfi.c
@@ -65,7 +65,7 @@ static bool decode_cfi_insn(struct pt_re
* Checks if a ud2 trap is because of a CFI failure, and handles the trap
* if needed. Returns a bug_trap_type value similarly to report_bug.
*/
-enum bug_trap_type handle_cfi_failure(struct pt_regs *regs)
+enum bug_trap_type handle_cfi_failure(int ud_type, struct pt_regs *regs)
{
unsigned long target, addr = regs->ip;
enum bug_trap_type btt;
@@ -82,7 +82,7 @@ enum bug_trap_type handle_cfi_failure(st
break;
case CFI_FINEIBT:
- if (!decode_fineibt_insn(regs, &target, &type))
+ if (!decode_fineibt_insn(ud_type, regs, &target, &type))
return BUG_TRAP_TYPE_NONE;
break;
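[Editor's note: the fineibt_paranoid_start[] trick is worth spelling out. The following standalone sketch (not kernel code) checks the byte arithmetic from the comment in the patch: the jne at offset 0xe has disp8 0xfd (-3), so it targets offset 0xd, which is the LEA's disp8 byte 0xf0 (-0x10). Re-decoding from 0xd yields f0 75 fd, i.e. a LOCK-prefixed Jcc.d8, and a LOCK prefix on Jcc raises #UD.]

```c
#include <assert.h>
#include <stdint.h>

/* Byte-for-byte copy of the fineibt_paranoid_start disassembly,
 * with 0x12345678 as the example hash. */
static const uint8_t paranoid[] = {
	0x41, 0xba, 0x78, 0x56, 0x34, 0x12, /*  0: mov  $0x12345678, %r10d */
	0x45, 0x3b, 0x53, 0xf7,             /*  6: cmp  -0x9(%r11), %r10d */
	0x4d, 0x8d, 0x5b, 0xf0,             /*  a: lea  -0x10(%r11), %r11 */
	0x75, 0xfd,                         /*  e: jne  0xd */
	0x41, 0xff, 0xd3,                   /* 10: call *%r11 */
	0x90,                               /* 13: nop */
};

/* Jcc.d8 target = address of the next instruction + sign-extended disp8. */
static int jcc_d8_target(const uint8_t *code, int insn_off)
{
	return insn_off + 2 + (int8_t)code[insn_off + 1];
}
```

The checks below confirm the non-taken branch targets the 0xf0 byte inside the LEA, matching fineibt_paranoid_ud == 0xd and the 20-byte BUG_ON in cfi_rewrite_callers().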
^ permalink raw reply [flat|nested] 38+ messages in thread

* Re: [PATCH v3 07/10] x86/ibt: Add paranoid FineIBT mode
2025-02-19 16:21 ` [PATCH v3 07/10] x86/ibt: Add paranoid FineIBT mode Peter Zijlstra
@ 2025-02-19 17:31 ` Andrew Cooper
2025-02-19 20:07 ` Peter Zijlstra
2025-02-21 13:40 ` David Laight
2025-02-19 18:05 ` Kees Cook
1 sibling, 2 replies; 38+ messages in thread
From: Andrew Cooper @ 2025-02-19 17:31 UTC (permalink / raw)
To: Peter Zijlstra, x86
Cc: linux-kernel, alyssa.milburn, scott.d.constable, joao, jpoimboe,
jose.marchesi, hjl.tools, ndesaulniers, samitolvanen, nathan,
ojeda, kees, alexei.starovoitov, mhiramat, jmill
On 19/02/2025 4:21 pm, Peter Zijlstra wrote:
> --- a/arch/x86/include/asm/cfi.h
> +++ b/arch/x86/include/asm/cfi.h
> @@ -1116,6 +1129,52 @@ extern u8 fineibt_caller_end[];
>
> #define fineibt_caller_jmp (fineibt_caller_size - 2)
>
> +/*
> + * Since FineIBT does hash validation on the callee side it is prone to
> + * circumvention attacks where a 'naked' ENDBR instruction exists that
> + * is not part of the fineibt_preamble sequence.
> + *
> + * Notably the x86 entry points must be ENDBR and equally cannot be
> + * fineibt_preamble.
> + *
> + * The fineibt_paranoid caller sequence adds additional caller side
> + * hash validation. This stops such circumvention attacks dead, but at the cost
> + * of adding a load.
> + *
> + * <fineibt_paranoid_start>:
> + * 0: 41 ba 78 56 34 12 mov $0x12345678, %r10d
> + * 6: 45 3b 53 f7 cmp -0x9(%r11), %r10d
> + * a: 4d 8d 5b <f0> lea -0x10(%r11), %r11
> + * e: 75 fd jne d <fineibt_paranoid_start+0xd>
> + * 10: 41 ff d3 call *%r11
> + * 13: 90 nop
> + *
> + * Notably LEA does not modify flags and can be reordered with the CMP,
> + * avoiding a dependency. Again, using a non-taken (backwards) branch
> + * for the failure case, abusing LEA's immediate 0xf0 as LOCK prefix for the
> + * JCC.d8, causing #UD.
> + */
I don't know what to say. This is equal parts horrifying and beautiful.
> +asm( ".pushsection .rodata \n"
> + "fineibt_paranoid_start: \n"
> + " movl $0x12345678, %r10d \n"
> + " cmpl -9(%r11), %r10d \n"
> + " lea -0x10(%r11), %r11 \n"
> + " jne fineibt_paranoid_start+0xd \n"
Maybe `jne . - 3` ?
Or perhaps `1: jne 1b - 1` ?
Both seem marginally less fragile than tying the reference to
fineibt_paranoid_start.
~Andrew
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v3 07/10] x86/ibt: Add paranoid FineIBT mode
2025-02-19 17:31 ` Andrew Cooper
@ 2025-02-19 20:07 ` Peter Zijlstra
2025-02-21 13:40 ` David Laight
1 sibling, 0 replies; 38+ messages in thread
From: Peter Zijlstra @ 2025-02-19 20:07 UTC (permalink / raw)
To: Andrew Cooper
Cc: x86, linux-kernel, alyssa.milburn, scott.d.constable, joao,
jpoimboe, jose.marchesi, hjl.tools, ndesaulniers, samitolvanen,
nathan, ojeda, kees, alexei.starovoitov, mhiramat, jmill
On Wed, Feb 19, 2025 at 05:31:39PM +0000, Andrew Cooper wrote:
> On 19/02/2025 4:21 pm, Peter Zijlstra wrote:
> > --- a/arch/x86/include/asm/cfi.h
> > +++ b/arch/x86/include/asm/cfi.h
> > @@ -1116,6 +1129,52 @@ extern u8 fineibt_caller_end[];
> >
> > #define fineibt_caller_jmp (fineibt_caller_size - 2)
> >
> > +/*
> > + * Since FineIBT does hash validation on the callee side it is prone to
> > + * circumvention attacks where a 'naked' ENDBR instruction exists that
> > + * is not part of the fineibt_preamble sequence.
> > + *
> > + * Notably the x86 entry points must be ENDBR and equally cannot be
> > + * fineibt_preamble.
> > + *
> > + * The fineibt_paranoid caller sequence adds additional caller side
> > + * hash validation. This stops such circumvention attacks dead, but at the cost
> > + * of adding a load.
> > + *
> > + * <fineibt_paranoid_start>:
> > + * 0: 41 ba 78 56 34 12 mov $0x12345678, %r10d
> > + * 6: 45 3b 53 f7 cmp -0x9(%r11), %r10d
> > + * a: 4d 8d 5b <f0> lea -0x10(%r11), %r11
> > + * e: 75 fd jne d <fineibt_paranoid_start+0xd>
> > + * 10: 41 ff d3 call *%r11
> > + * 13: 90 nop
> > + *
> > + * Notably LEA does not modify flags and can be reordered with the CMP,
> > + * avoiding a dependency. Again, using a non-taken (backwards) branch
> > + * for the failure case, abusing LEA's immediate 0xf0 as LOCK prefix for the
> > + * JCC.d8, causing #UD.
> > + */
>
> I don't know what to say. This is equal parts horrifying and beautiful.
>
> > +asm( ".pushsection .rodata \n"
> > + "fineibt_paranoid_start: \n"
> > + " movl $0x12345678, %r10d \n"
> > + " cmpl -9(%r11), %r10d \n"
> > + " lea -0x10(%r11), %r11 \n"
> > + " jne fineibt_paranoid_start+0xd \n"
>
> Maybe `jne . - 3` ?
>
> Or perhaps `1: jne 1b - 1` ?
>
> Both seem marginally less fragile than tying the reference to
> fineibt_paranoid_start.
Right, so I initially had '. - 3' (and '. - 7' in
fineibt_preamble_start), but I ended up going with this form because
that's what you end up with if you run it through an assembler and
disassembler.
So I figured this form was easier to compare vs disassembly, which is
what most people will see if they ever look at this.
Anyway, I'm happy to go '. - 3' again if that's preferred.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v3 07/10] x86/ibt: Add paranoid FineIBT mode
2025-02-19 17:31 ` Andrew Cooper
2025-02-19 20:07 ` Peter Zijlstra
@ 2025-02-21 13:40 ` David Laight
1 sibling, 0 replies; 38+ messages in thread
From: David Laight @ 2025-02-21 13:40 UTC (permalink / raw)
To: Andrew Cooper
Cc: Peter Zijlstra, x86, linux-kernel, alyssa.milburn,
scott.d.constable, joao, jpoimboe, jose.marchesi, hjl.tools,
ndesaulniers, samitolvanen, nathan, ojeda, kees,
alexei.starovoitov, mhiramat, jmill
On Wed, 19 Feb 2025 17:31:39 +0000
Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> On 19/02/2025 4:21 pm, Peter Zijlstra wrote:
> > --- a/arch/x86/include/asm/cfi.h
> > +++ b/arch/x86/include/asm/cfi.h
> > @@ -1116,6 +1129,52 @@ extern u8 fineibt_caller_end[];
> >
> > #define fineibt_caller_jmp (fineibt_caller_size - 2)
> >
> > +/*
> > + * Since FineIBT does hash validation on the callee side it is prone to
> > + * circumvention attacks where a 'naked' ENDBR instruction exists that
> > + * is not part of the fineibt_preamble sequence.
> > + *
> > + * Notably the x86 entry points must be ENDBR and equally cannot be
> > + * fineibt_preamble.
> > + *
> > + * The fineibt_paranoid caller sequence adds additional caller side
> > + * hash validation. This stops such circumvention attacks dead, but at the cost
> > + * of adding a load.
> > + *
> > + * <fineibt_paranoid_start>:
> > + * 0: 41 ba 78 56 34 12 mov $0x12345678, %r10d
> > + * 6: 45 3b 53 f7 cmp -0x9(%r11), %r10d
> > + * a: 4d 8d 5b <f0> lea -0x10(%r11), %r11
I think that 0x10 is the size of the cfi preamble?
There should probably be at least a comment to that effect.
(Maybe there is, but I'm missing the actual patch email.)
> > + * e: 75 fd jne d <fineibt_paranoid_start+0xd>
> > + * 10: 41 ff d3 call *%r11
> > + * 13: 90 nop
> > + *
> > + * Notably LEA does not modify flags and can be reordered with the CMP,
> > + * avoiding a dependency.
Is that even worth saying?
Given that the cpu does 'register renaming' the lea might execute in the
same clock as the mov.
What you do get is a few clocks of stall (maybe 4 if in L1 cache, but
a data read of code memory is unlikely to be there - so it'll be from
the L2 cache) for the memory load.
That means that the jne is speculatively executed (and I think that is
separate from any prefetch speculation), I'll give it 50% taken.
(Or maybe 100% if backwards branches get predicted taken. I don't think
current Intel cpus do that - they just use whatever is in the branch
prediction slot.)
> > + * Again, using a non-taken (backwards) branch
> > + * for the failure case, abusing LEA's immediate 0xf0 as LOCK prefix for the
> > + * JCC.d8, causing #UD.
> > + */
>
> I don't know what to say. This is equal parts horrifying and beautiful.
Agreed.
Are you absolutely sure that all cpus have always raised (and always will
raise) #UD on an unexpected LOCK prefix on a Jcc instruction?
My 80386 book does say it will #UD, but I can imagine it being ignored
or even repurposed.
David
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v3 07/10] x86/ibt: Add paranoid FineIBT mode
2025-02-19 16:21 ` [PATCH v3 07/10] x86/ibt: Add paranoid FineIBT mode Peter Zijlstra
2025-02-19 17:31 ` Andrew Cooper
@ 2025-02-19 18:05 ` Kees Cook
1 sibling, 0 replies; 38+ messages in thread
From: Kees Cook @ 2025-02-19 18:05 UTC (permalink / raw)
To: Peter Zijlstra
Cc: x86, linux-kernel, alyssa.milburn, scott.d.constable, joao,
andrew.cooper3, jpoimboe, jose.marchesi, hjl.tools, ndesaulniers,
samitolvanen, nathan, ojeda, alexei.starovoitov, mhiramat, jmill
On Wed, Feb 19, 2025 at 05:21:14PM +0100, Peter Zijlstra wrote:
> Due to concerns about circumvention attacks against FineIBT on 'naked'
> ENDBR, add an additional caller side hash check to FineIBT. This
> should make it impossible to pivot over such a 'naked' ENDBR
> instruction at the cost of an additional load.
>
> The specific pivot reported was against the SYSCALL entry site and
> FRED will have all those holes fixed up.
>
> This specific fineibt_paranoid_start[] sequence was concocted by
> Scott.
>
> Reported-by: Jennifer Miller <jmill@asu.edu>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
With patch 6's misplaced chunk moved, looks good:
Reviewed-by: Kees Cook <kees@kernel.org>
--
Kees Cook
^ permalink raw reply [flat|nested] 38+ messages in thread
* [PATCH v3 08/10] x86: BHI stubs
2025-02-19 16:21 [PATCH v3 00/10] x86/ibt: FineIBT-BHI Peter Zijlstra
` (6 preceding siblings ...)
2025-02-19 16:21 ` [PATCH v3 07/10] x86/ibt: Add paranoid FineIBT mode Peter Zijlstra
@ 2025-02-19 16:21 ` Peter Zijlstra
2025-02-19 18:07 ` Kees Cook
2025-02-19 16:21 ` [PATCH v3 09/10] x86/ibt: Implement FineIBT-BHI mitigation Peter Zijlstra
` (3 subsequent siblings)
11 siblings, 1 reply; 38+ messages in thread
From: Peter Zijlstra @ 2025-02-19 16:21 UTC (permalink / raw)
To: x86
Cc: linux-kernel, peterz, alyssa.milburn, scott.d.constable, joao,
andrew.cooper3, jpoimboe, jose.marchesi, hjl.tools, ndesaulniers,
samitolvanen, nathan, ojeda, kees, alexei.starovoitov, mhiramat,
jmill
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
arch/x86/include/asm/cfi.h | 4 +
arch/x86/lib/Makefile | 3
arch/x86/lib/bhi.S | 146 +++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 152 insertions(+), 1 deletion(-)
--- a/arch/x86/include/asm/cfi.h
+++ b/arch/x86/include/asm/cfi.h
@@ -102,6 +102,10 @@ enum cfi_mode {
extern enum cfi_mode cfi_mode;
extern bool cfi_warn;
+typedef u8 bhi_thunk[32];
+extern bhi_thunk __bhi_args[];
+extern bhi_thunk __bhi_args_end[];
+
struct pt_regs;
#ifdef CONFIG_CFI_CLANG
--- a/arch/x86/lib/Makefile
+++ b/arch/x86/lib/Makefile
@@ -66,5 +66,6 @@ endif
lib-y += clear_page_64.o copy_page_64.o
lib-y += memmove_64.o memset_64.o
lib-y += copy_user_64.o copy_user_uncached_64.o
- lib-y += cmpxchg16b_emu.o
+ lib-y += cmpxchg16b_emu.o
+ lib-y += bhi.o
endif
--- /dev/null
+++ b/arch/x86/lib/bhi.S
@@ -0,0 +1,146 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#include <linux/linkage.h>
+#include <asm/unwind_hints.h>
+#include <asm/nospec-branch.h>
+
+/*
+ * Notably, the FineIBT preamble calling these will have ZF set and r10 zero.
+ *
+ * The very last element is in fact larger than 32 bytes, but since it's the
+ * last element, this does not matter.
+ *
+ * There are 2 #UD sites, located between elements 0,1-2,3 and 4,5-6,7 such
+ * that they can be reached using Jcc.d8; these elements (1 and 5) have
+ * sufficiently big alignment holes for this not to stagger the array.
+ */
+
+.pushsection .noinstr.text, "ax"
+
+ .align 32
+SYM_CODE_START(__bhi_args)
+
+#ifdef CONFIG_FINEIBT_BHI
+
+ .align 32
+SYM_INNER_LABEL(__bhi_args_0, SYM_L_LOCAL)
+ ANNOTATE_NOENDBR
+ UNWIND_HINT_FUNC
+ jne .Lud_1
+ ANNOTATE_UNRET_SAFE
+ ret
+ int3
+
+ .align 32
+SYM_INNER_LABEL(__bhi_args_1, SYM_L_LOCAL)
+ ANNOTATE_NOENDBR
+ UNWIND_HINT_FUNC
+ jne .Lud_1
+ cmovne %r10, %rdi
+ ANNOTATE_UNRET_SAFE
+ ret
+ int3
+
+ .align 8
+ ANNOTATE_REACHABLE
+.Lud_1: ud2
+ ANNOTATE_UNRET_SAFE
+ ret
+ int3
+
+ .align 32
+SYM_INNER_LABEL(__bhi_args_2, SYM_L_LOCAL)
+ ANNOTATE_NOENDBR
+ UNWIND_HINT_FUNC
+ jne .Lud_1
+ cmovne %r10, %rdi
+ cmovne %r10, %rsi
+ ANNOTATE_UNRET_SAFE
+ ret
+ int3
+
+ .align 32
+SYM_INNER_LABEL(__bhi_args_3, SYM_L_LOCAL)
+ ANNOTATE_NOENDBR
+ UNWIND_HINT_FUNC
+ jne .Lud_1
+ cmovne %r10, %rdi
+ cmovne %r10, %rsi
+ cmovne %r10, %rdx
+ ANNOTATE_UNRET_SAFE
+ ret
+ int3
+
+ .align 32
+SYM_INNER_LABEL(__bhi_args_4, SYM_L_LOCAL)
+ ANNOTATE_NOENDBR
+ UNWIND_HINT_FUNC
+ jne .Lud_2
+ cmovne %r10, %rdi
+ cmovne %r10, %rsi
+ cmovne %r10, %rdx
+ cmovne %r10, %rcx
+ ANNOTATE_UNRET_SAFE
+ ret
+ int3
+
+ .align 32
+SYM_INNER_LABEL(__bhi_args_5, SYM_L_LOCAL)
+ ANNOTATE_NOENDBR
+ UNWIND_HINT_FUNC
+ jne .Lud_2
+ cmovne %r10, %rdi
+ cmovne %r10, %rsi
+ cmovne %r10, %rdx
+ cmovne %r10, %rcx
+ cmovne %r10, %r8
+ ANNOTATE_UNRET_SAFE
+ ret
+ int3
+
+ .align 8
+ ANNOTATE_REACHABLE
+.Lud_2: ud2
+ ANNOTATE_UNRET_SAFE
+ ret
+ int3
+
+ .align 32
+SYM_INNER_LABEL(__bhi_args_6, SYM_L_LOCAL)
+ ANNOTATE_NOENDBR
+ UNWIND_HINT_FUNC
+ jne .Lud_2
+ cmovne %r10, %rdi
+ cmovne %r10, %rsi
+ cmovne %r10, %rdx
+ cmovne %r10, %rcx
+ cmovne %r10, %r8
+ cmovne %r10, %r9
+ ANNOTATE_UNRET_SAFE
+ ret
+ int3
+
+ .align 32
+SYM_INNER_LABEL(__bhi_args_7, SYM_L_LOCAL)
+ ANNOTATE_NOENDBR
+ UNWIND_HINT_FUNC
+ jne .Lud_2
+ cmovne %r10, %rdi
+ cmovne %r10, %rsi
+ cmovne %r10, %rdx
+ cmovne %r10, %rcx
+ cmovne %r10, %r8
+ cmovne %r10, %r9
+ cmovne %r10, %rsp
+ ANNOTATE_UNRET_SAFE
+ ret
+ int3
+
+#endif /* CONFIG_FINEIBT_BHI */
+
+ .align 32
+SYM_INNER_LABEL(__bhi_args_end, SYM_L_GLOBAL)
+ ANNOTATE_NOENDBR
+SYM_CODE_END(__bhi_args)
+
+.popsection
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v3 08/10] x86: BHI stubs
2025-02-19 16:21 ` [PATCH v3 08/10] x86: BHI stubs Peter Zijlstra
@ 2025-02-19 18:07 ` Kees Cook
2025-02-19 18:07 ` Peter Zijlstra
0 siblings, 1 reply; 38+ messages in thread
From: Kees Cook @ 2025-02-19 18:07 UTC (permalink / raw)
To: Peter Zijlstra
Cc: x86, linux-kernel, alyssa.milburn, scott.d.constable, joao,
andrew.cooper3, jpoimboe, jose.marchesi, hjl.tools, ndesaulniers,
samitolvanen, nathan, ojeda, alexei.starovoitov, mhiramat, jmill
On Wed, Feb 19, 2025 at 05:21:15PM +0100, Peter Zijlstra wrote:
> --- a/arch/x86/lib/Makefile
> +++ b/arch/x86/lib/Makefile
> @@ -66,5 +66,6 @@ endif
> lib-y += clear_page_64.o copy_page_64.o
> lib-y += memmove_64.o memset_64.o
> lib-y += copy_user_64.o copy_user_uncached_64.o
> - lib-y += cmpxchg16b_emu.o
> + lib-y += cmpxchg16b_emu.o
> + lib-y += bhi.o
> endif
Unrelated whitespace fix?
--
Kees Cook
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v3 08/10] x86: BHI stubs
2025-02-19 18:07 ` Kees Cook
@ 2025-02-19 18:07 ` Peter Zijlstra
0 siblings, 0 replies; 38+ messages in thread
From: Peter Zijlstra @ 2025-02-19 18:07 UTC (permalink / raw)
To: Kees Cook
Cc: x86, linux-kernel, alyssa.milburn, scott.d.constable, joao,
andrew.cooper3, jpoimboe, jose.marchesi, hjl.tools, ndesaulniers,
samitolvanen, nathan, ojeda, alexei.starovoitov, mhiramat, jmill
On Wed, Feb 19, 2025 at 10:07:04AM -0800, Kees Cook wrote:
> On Wed, Feb 19, 2025 at 05:21:15PM +0100, Peter Zijlstra wrote:
> > --- a/arch/x86/lib/Makefile
> > +++ b/arch/x86/lib/Makefile
> > @@ -66,5 +66,6 @@ endif
> > lib-y += clear_page_64.o copy_page_64.o
> > lib-y += memmove_64.o memset_64.o
> > lib-y += copy_user_64.o copy_user_uncached_64.o
> > - lib-y += cmpxchg16b_emu.o
> > + lib-y += cmpxchg16b_emu.o
> > + lib-y += bhi.o
> > endif
>
> Unrelated whitespace fix?
Yes, I could not leave the cmpxchg16b thing mis-aligned :-)
^ permalink raw reply [flat|nested] 38+ messages in thread
* [PATCH v3 09/10] x86/ibt: Implement FineIBT-BHI mitigation
2025-02-19 16:21 [PATCH v3 00/10] x86/ibt: FineIBT-BHI Peter Zijlstra
` (7 preceding siblings ...)
2025-02-19 16:21 ` [PATCH v3 08/10] x86: BHI stubs Peter Zijlstra
@ 2025-02-19 16:21 ` Peter Zijlstra
2025-02-19 18:11 ` Kees Cook
2025-02-19 16:21 ` [PATCH v3 10/10] x86/ibt: Optimize fineibt-bhi arity 1 case Peter Zijlstra
` (2 subsequent siblings)
11 siblings, 1 reply; 38+ messages in thread
From: Peter Zijlstra @ 2025-02-19 16:21 UTC (permalink / raw)
To: x86
Cc: linux-kernel, peterz, alyssa.milburn, scott.d.constable, joao,
andrew.cooper3, jpoimboe, jose.marchesi, hjl.tools, ndesaulniers,
samitolvanen, nathan, ojeda, kees, alexei.starovoitov, mhiramat,
jmill
While WAIT_FOR_ENDBR is specified to be a full speculation stop, it
has been shown that some implementations are 'leaky' to such an extent
that speculation can escape even the FineIBT preamble.
To deal with this, add additional hardening to the FineIBT preamble.
Notably, using a new LLVM feature:
https://github.com/llvm/llvm-project/commit/e223485c9b38a5579991b8cebb6a200153eee245
which encodes the number of arguments in the kCFI preamble's register.
Using this register<->arity mapping, have the FineIBT preamble CALL
into a stub clobbering the relevant argument registers in the
speculative case.
(This is where Scott goes and gives more details...)
Suggested-by: Scott Constable <scott.d.constable@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
Makefile | 3 +
arch/x86/Kconfig | 8 +++
arch/x86/include/asm/cfi.h | 6 ++
arch/x86/kernel/alternative.c | 101 +++++++++++++++++++++++++++++++++++++-----
arch/x86/net/bpf_jit_comp.c | 29 ++++++++----
5 files changed, 128 insertions(+), 19 deletions(-)
--- a/Makefile
+++ b/Makefile
@@ -1014,6 +1014,9 @@ CC_FLAGS_CFI := -fsanitize=kcfi
ifdef CONFIG_CFI_ICALL_NORMALIZE_INTEGERS
CC_FLAGS_CFI += -fsanitize-cfi-icall-experimental-normalize-integers
endif
+ifdef CONFIG_FINEIBT_BHI
+ CC_FLAGS_CFI += -fsanitize-kcfi-arity
+endif
ifdef CONFIG_RUST
# Always pass -Zsanitizer-cfi-normalize-integers as CONFIG_RUST selects
# CONFIG_CFI_ICALL_NORMALIZE_INTEGERS.
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2473,6 +2473,10 @@ config CC_HAS_RETURN_THUNK
config CC_HAS_ENTRY_PADDING
def_bool $(cc-option,-fpatchable-function-entry=16,16)
+config CC_HAS_KCFI_ARITY
+ def_bool $(cc-option,-fsanitize=kcfi -fsanitize-kcfi-arity)
+ depends on CC_IS_CLANG && !RUST
+
config FUNCTION_PADDING_CFI
int
default 59 if FUNCTION_ALIGNMENT_64B
@@ -2498,6 +2502,10 @@ config FINEIBT
depends on X86_KERNEL_IBT && CFI_CLANG && MITIGATION_RETPOLINE
select CALL_PADDING
+config FINEIBT_BHI
+ def_bool y
+ depends on FINEIBT && CC_HAS_KCFI_ARITY
+
config HAVE_CALL_THUNKS
def_bool y
depends on CC_HAS_ENTRY_PADDING && MITIGATION_RETHUNK && OBJTOOL
--- a/arch/x86/include/asm/cfi.h
+++ b/arch/x86/include/asm/cfi.h
@@ -101,6 +101,7 @@ enum cfi_mode {
extern enum cfi_mode cfi_mode;
extern bool cfi_warn;
+extern bool cfi_bhi;
typedef u8 bhi_thunk[32];
extern bhi_thunk __bhi_args[];
@@ -130,6 +131,7 @@ static inline int cfi_get_offset(void)
#define cfi_get_offset cfi_get_offset
extern u32 cfi_get_func_hash(void *func);
+extern int cfi_get_func_arity(void *func);
#ifdef CONFIG_FINEIBT
extern bool decode_fineibt_insn(int ud_type, struct pt_regs *regs, unsigned long *target, u32 *type);
@@ -153,6 +155,10 @@ static inline u32 cfi_get_func_hash(void
{
return 0;
}
+static inline int cfi_get_func_arity(void *func)
+{
+ return 0;
+}
#endif /* CONFIG_CFI_CLANG */
#if HAS_KERNEL_IBT == 1
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -933,6 +933,7 @@ void __init_or_module apply_seal_endbr(s
enum cfi_mode cfi_mode __ro_after_init = __CFI_DEFAULT;
bool cfi_warn __ro_after_init = false;
+bool cfi_bhi __ro_after_init = false;
#ifdef CONFIG_CFI_CLANG
struct bpf_insn;
@@ -993,6 +994,21 @@ u32 cfi_get_func_hash(void *func)
return hash;
}
+
+int cfi_get_func_arity(void *func)
+{
+ bhi_thunk *target;
+ s32 disp;
+
+ if (cfi_mode != CFI_FINEIBT && !cfi_bhi)
+ return 0;
+
+ if (get_kernel_nofault(disp, func - 4))
+ return 0;
+
+ target = func + disp;
+ return target - __bhi_args;
+}
#endif
#ifdef CONFIG_FINEIBT
@@ -1050,6 +1066,12 @@ static __init int cfi_parse_cmdline(char
} else {
pr_err("Ignoring paranoid; depends on fineibt.\n");
}
+ } else if (!strcmp(str, "bhi")) {
+ if (cfi_mode == CFI_FINEIBT) {
+ cfi_bhi = true;
+ } else {
+ pr_err("Ignoring bhi; depends on fineibt.\n");
+ }
} else {
pr_err("Ignoring unknown cfi option (%s).", str);
}
@@ -1099,6 +1121,7 @@ asm( ".pushsection .rodata \n"
"fineibt_preamble_start: \n"
" endbr64 \n"
" subl $0x12345678, %r10d \n"
+ "fineibt_preamble_bhi: \n"
" jne fineibt_preamble_start+6 \n"
ASM_NOP3
"fineibt_preamble_end: \n"
@@ -1106,9 +1129,11 @@ asm( ".pushsection .rodata \n"
);
extern u8 fineibt_preamble_start[];
+extern u8 fineibt_preamble_bhi[];
extern u8 fineibt_preamble_end[];
#define fineibt_preamble_size (fineibt_preamble_end - fineibt_preamble_start)
+#define fineibt_preamble_bhi (fineibt_preamble_bhi - fineibt_preamble_start)
#define fineibt_preamble_ud 6
#define fineibt_preamble_hash 7
@@ -1181,13 +1206,16 @@ extern u8 fineibt_paranoid_end[];
#define fineibt_paranoid_ind (fineibt_paranoid_ind - fineibt_paranoid_start)
#define fineibt_paranoid_ud 0xd
-static u32 decode_preamble_hash(void *addr)
+static u32 decode_preamble_hash(void *addr, int *reg)
{
u8 *p = addr;
- /* b8 78 56 34 12 mov $0x12345678,%eax */
- if (p[0] == 0xb8)
+ /* b8+reg 78 56 34 12 movl $0x12345678,\reg */
+ if (p[0] >= 0xb8 && p[0] < 0xc0) {
+ if (reg)
+ *reg = p[0] - 0xb8;
return *(u32 *)(addr + 1);
+ }
return 0; /* invalid hash value */
}
@@ -1196,11 +1224,11 @@ static u32 decode_caller_hash(void *addr
{
u8 *p = addr;
- /* 41 ba 78 56 34 12 mov $0x12345678,%r10d */
+ /* 41 ba 88 a9 cb ed mov $(-0x12345678),%r10d */
if (p[0] == 0x41 && p[1] == 0xba)
return -*(u32 *)(addr + 2);
- /* e8 0c 78 56 34 12 jmp.d8 +12 */
+ /* e8 0c 88 a9 cb ed jmp.d8 +12 */
if (p[0] == JMP8_INSN_OPCODE && p[1] == fineibt_caller_jmp)
return -*(u32 *)(addr + 2);
@@ -1265,7 +1293,7 @@ static int cfi_rand_preamble(s32 *start,
void *addr = (void *)s + *s;
u32 hash;
- hash = decode_preamble_hash(addr);
+ hash = decode_preamble_hash(addr, NULL);
if (WARN(!hash, "no CFI hash found at: %pS %px %*ph\n",
addr, addr, 5, addr))
return -EINVAL;
@@ -1283,6 +1311,7 @@ static int cfi_rewrite_preamble(s32 *sta
for (s = start; s < end; s++) {
void *addr = (void *)s + *s;
+ int arity;
u32 hash;
/*
@@ -1293,7 +1322,7 @@ static int cfi_rewrite_preamble(s32 *sta
if (!is_endbr(addr + 16))
continue;
- hash = decode_preamble_hash(addr);
+ hash = decode_preamble_hash(addr, &arity);
if (WARN(!hash, "no CFI hash found at: %pS %px %*ph\n",
addr, addr, 5, addr))
return -EINVAL;
@@ -1301,6 +1330,19 @@ static int cfi_rewrite_preamble(s32 *sta
text_poke_early(addr, fineibt_preamble_start, fineibt_preamble_size);
WARN_ON(*(u32 *)(addr + fineibt_preamble_hash) != 0x12345678);
text_poke_early(addr + fineibt_preamble_hash, &hash, 4);
+
+ WARN_ONCE(!IS_ENABLED(CONFIG_FINEIBT_BHI) && arity,
+ "kCFI preamble has wrong register at: %pS %*ph\n",
+ addr, 5, addr);
+
+ if (!cfi_bhi || !arity)
+ continue;
+
+ text_poke_early(addr + fineibt_preamble_bhi,
+ text_gen_insn(CALL_INSN_OPCODE,
+ addr + fineibt_preamble_bhi,
+ __bhi_args[arity]),
+ CALL_INSN_SIZE);
}
return 0;
@@ -1468,8 +1510,9 @@ static void __apply_fineibt(s32 *start_r
cfi_rewrite_endbr(start_cfi, end_cfi);
if (builtin) {
- pr_info("Using %sFineIBT CFI\n",
- cfi_paranoid ? "paranoid " : "");
+ pr_info("Using %sFineIBT%s CFI\n",
+ cfi_paranoid ? "paranoid " : "",
+ cfi_bhi ? "+BHI" : "");
}
return;
@@ -1520,7 +1563,7 @@ static void poison_cfi(void *addr)
/*
* kCFI prefix should start with a valid hash.
*/
- if (!decode_preamble_hash(addr))
+ if (!decode_preamble_hash(addr, NULL))
break;
/*
@@ -1570,6 +1613,41 @@ static bool decode_fineibt_preamble(int
}
/*
+ * regs->ip points to one of the UD2 in __bhi_args[].
+ */
+static bool decode_fineibt_bhi(int ud_type, struct pt_regs *regs,
+ unsigned long *target, u32 *type)
+{
+ unsigned long addr;
+ u32 hash;
+
+ /*
+ * Fetch the return address from the stack, this points to the
+ * FineIBT preamble. Since the CALL instruction is in the 5 last
+ * bytes of the preamble, the return address is in fact the target
+ * address.
+ */
+ __get_kernel_nofault(&addr, regs->sp, unsigned long, Efault);
+ *target = addr;
+
+ addr -= fineibt_preamble_size;
+ if (!exact_endbr((void *)addr))
+ return false;
+
+ __get_kernel_nofault(&hash, addr + fineibt_preamble_hash, u32, Efault);
+ *type = (u32)regs->r10 + hash;
+
+ /*
+ * The UD2 sites are constructed with a RET immediately following,
+ * as such the non-fatal case can use the regular fixup.
+ */
+ return true;
+
+Efault:
+ return false;
+}
+
+/*
* regs->ip points to a LOCK Jcc.d8 instruction from the fineibt_paranoid_start[]
* sequence.
*/
@@ -1598,6 +1676,9 @@ bool decode_fineibt_insn(int ud_type, st
{
if (ud_type == BUG_LOCK)
return decode_fineibt_paranoid(ud_type, regs, target, type);
+ if (regs->ip > (unsigned long)__bhi_args &&
+ regs->ip < (unsigned long)__bhi_args_end)
+ return decode_fineibt_bhi(ud_type, regs, target, type);
return decode_fineibt_preamble(ud_type, regs, target, type);
}
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -410,15 +410,20 @@ static void emit_nops(u8 **pprog, int le
* Emit the various CFI preambles, see asm/cfi.h and the comments about FineIBT
* in arch/x86/kernel/alternative.c
*/
+static int emit_call(u8 **prog, void *func, void *ip);
-static void emit_fineibt(u8 **pprog, u32 hash)
+static void emit_fineibt(u8 **pprog, u8 *ip, u32 hash, int arity)
{
u8 *prog = *pprog;
EMIT_ENDBR();
EMIT3_off32(0x41, 0x81, 0xea, hash); /* subl $hash, %r10d */
- EMIT2(0x75, 0xf9); /* jne.d8 .-7 */
- EMIT3(0x0f, 0x1f, 0x00); /* nop3 */
+ if (cfi_bhi) {
+ emit_call(&prog, __bhi_args[arity], ip + 11);
+ } else {
+ EMIT2(0x75, 0xf9); /* jne.d8 .-7 */
+ EMIT3(0x0f, 0x1f, 0x00); /* nop3 */
+ }
EMIT_ENDBR_POISON();
*pprog = prog;
@@ -447,13 +452,13 @@ static void emit_kcfi(u8 **pprog, u32 ha
*pprog = prog;
}
-static void emit_cfi(u8 **pprog, u32 hash)
+static void emit_cfi(u8 **pprog, u8 *ip, u32 hash, int arity)
{
u8 *prog = *pprog;
switch (cfi_mode) {
case CFI_FINEIBT:
- emit_fineibt(&prog, hash);
+ emit_fineibt(&prog, ip, hash, arity);
break;
case CFI_KCFI:
@@ -504,13 +509,17 @@ static void emit_prologue_tail_call(u8 *
* bpf_tail_call helper will skip the first X86_TAIL_CALL_OFFSET bytes
* while jumping to another program
*/
-static void emit_prologue(u8 **pprog, u32 stack_depth, bool ebpf_from_cbpf,
+static void emit_prologue(u8 **pprog, u8 *ip, u32 stack_depth, bool ebpf_from_cbpf,
bool tail_call_reachable, bool is_subprog,
bool is_exception_cb)
{
u8 *prog = *pprog;
- emit_cfi(&prog, is_subprog ? cfi_bpf_subprog_hash : cfi_bpf_hash);
+ if (is_subprog) {
+ emit_cfi(&prog, ip, cfi_bpf_subprog_hash, 5);
+ } else {
+ emit_cfi(&prog, ip, cfi_bpf_hash, 1);
+ }
/* BPF trampoline can be made to work without these nops,
* but let's waste 5 bytes for now and optimize later
*/
@@ -1479,7 +1488,7 @@ static int do_jit(struct bpf_prog *bpf_p
detect_reg_usage(insn, insn_cnt, callee_regs_used);
- emit_prologue(&prog, stack_depth,
+ emit_prologue(&prog, image, stack_depth,
bpf_prog_was_classic(bpf_prog), tail_call_reachable,
bpf_is_subprog(bpf_prog), bpf_prog->aux->exception_cb);
/* Exception callback will clobber callee regs for its own use, and
@@ -3046,7 +3055,9 @@ static int __arch_prepare_bpf_trampoline
/*
* Indirect call for bpf_struct_ops
*/
- emit_cfi(&prog, cfi_get_func_hash(func_addr));
+ emit_cfi(&prog, image,
+ cfi_get_func_hash(func_addr),
+ cfi_get_func_arity(func_addr));
} else {
/*
* Direct-call fentry stub, as such it needs accounting for the
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v3 09/10] x86/ibt: Implement FineIBT-BHI mitigation
2025-02-19 16:21 ` [PATCH v3 09/10] x86/ibt: Implement FineIBT-BHI mitigation Peter Zijlstra
@ 2025-02-19 18:11 ` Kees Cook
0 siblings, 0 replies; 38+ messages in thread
From: Kees Cook @ 2025-02-19 18:11 UTC (permalink / raw)
To: Peter Zijlstra
Cc: x86, linux-kernel, alyssa.milburn, scott.d.constable, joao,
andrew.cooper3, jpoimboe, jose.marchesi, hjl.tools, ndesaulniers,
samitolvanen, nathan, ojeda, alexei.starovoitov, mhiramat, jmill
On Wed, Feb 19, 2025 at 05:21:16PM +0100, Peter Zijlstra wrote:
> While WAIT_FOR_ENDBR is specified to be a full speculation stop, it
> has been shown that some implementations are 'leaky' to such an extent
> that speculation can escape even the FineIBT preamble.
>
> To deal with this, add additional hardening to the FineIBT preamble.
>
> Notably, using a new LLVM feature:
>
> https://github.com/llvm/llvm-project/commit/e223485c9b38a5579991b8cebb6a200153eee245
>
> which encodes the number of arguments in the kCFI preamble's register.
>
> Using this register<->arity mapping, have the FineIBT preamble CALL
> into a stub clobbering the relevant argument registers in the
> speculative case.
>
> (This is where Scott goes and gives more details...)
>
> Suggested-by: Scott Constable <scott.d.constable@intel.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kees Cook <kees@kernel.org>
--
Kees Cook
^ permalink raw reply [flat|nested] 38+ messages in thread
* [PATCH v3 10/10] x86/ibt: Optimize fineibt-bhi arity 1 case
2025-02-19 16:21 [PATCH v3 00/10] x86/ibt: FineIBT-BHI Peter Zijlstra
` (8 preceding siblings ...)
2025-02-19 16:21 ` [PATCH v3 09/10] x86/ibt: Implement FineIBT-BHI mitigation Peter Zijlstra
@ 2025-02-19 16:21 ` Peter Zijlstra
2025-02-19 18:21 ` Kees Cook
2025-02-19 17:36 ` [PATCH v3 00/10] x86/ibt: FineIBT-BHI Kees Cook
2025-02-20 11:27 ` Peter Zijlstra
11 siblings, 1 reply; 38+ messages in thread
From: Peter Zijlstra @ 2025-02-19 16:21 UTC (permalink / raw)
To: x86
Cc: linux-kernel, peterz, alyssa.milburn, scott.d.constable, joao,
andrew.cooper3, jpoimboe, jose.marchesi, hjl.tools, ndesaulniers,
samitolvanen, nathan, ojeda, kees, alexei.starovoitov, mhiramat,
jmill
Saves a CALL to an out-of-line thunk for the common case of 1
argument.
Suggested-by: Scott Constable <scott.d.constable@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
arch/x86/include/asm/ibt.h | 4 ++
arch/x86/kernel/alternative.c | 61 +++++++++++++++++++++++++++++++++++-------
2 files changed, 56 insertions(+), 9 deletions(-)
--- a/arch/x86/include/asm/ibt.h
+++ b/arch/x86/include/asm/ibt.h
@@ -70,6 +70,10 @@ static inline bool __is_endbr(u32 val)
if (val == gen_endbr_poison())
return true;
+ /* See cfi_fineibt_bhi_preamble() */
+ if (IS_ENABLED(CONFIG_FINEIBT_BHI) && val == 0x001f0ff5)
+ return true;
+
val &= ~0x01000000U; /* ENDBR32 -> ENDBR64 */
return val == gen_endbr();
}
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -1299,6 +1299,55 @@ static int cfi_rand_preamble(s32 *start,
return 0;
}
+static void cfi_fineibt_bhi_preamble(void *addr, int arity)
+{
+ bool warn = IS_ENABLED(CONFIG_CFI_PERMISSIVE) || cfi_warn;
+
+ if (!arity)
+ return;
+
+ if (!warn && arity == 1) {
+ /*
+ * Crazy scheme to allow arity-1 inline:
+ *
+ * __cfi_foo:
+ * 0: f3 0f 1e fa endbr64
+ * 4: 41 81 <ea> 78 56 34 12 sub 0x12345678, %r10d
+ * b: 49 0f 45 fa cmovne %r10, %rdi
+ * f: 75 f5 jne __cfi_foo+6
+ * 11: 0f 1f 00 nopl (%rax)
+ *
+ * Code that direct calls to foo()+0, decodes the tail end as:
+ *
+ * foo:
+ * 0: f5 cmc
+ * 1: 0f 1f 00 nopl (%rax)
+ *
+ * which clobbers CF, but does not affect anything ABI
+ * wise.
+ *
+ * Notably, this scheme is incompatible with permissive CFI
+ * because the cmov is unconditional and RDI will have been
+ * clobbered.
+ */
+ const u8 magic[9] = {
+ 0x49, 0x0f, 0x45, 0xfa,
+ 0x75, 0xf5,
+ BYTES_NOP3,
+ };
+
+ text_poke_early(addr + fineibt_preamble_bhi, magic, 9);
+
+ return;
+ }
+
+ text_poke_early(addr + fineibt_preamble_bhi,
+ text_gen_insn(CALL_INSN_OPCODE,
+ addr + fineibt_preamble_bhi,
+ __bhi_args[arity]),
+ CALL_INSN_SIZE);
+}
+
static int cfi_rewrite_preamble(s32 *start, s32 *end)
{
s32 *s;
@@ -1329,14 +1378,8 @@ static int cfi_rewrite_preamble(s32 *sta
"kCFI preamble has wrong register at: %pS %*ph\n",
addr, 5, addr);
- if (!cfi_bhi || !arity)
- continue;
-
- text_poke_early(addr + fineibt_preamble_bhi,
- text_gen_insn(CALL_INSN_OPCODE,
- addr + fineibt_preamble_bhi,
- __bhi_args[arity]),
- CALL_INSN_SIZE);
+ if (cfi_bhi)
+ cfi_fineibt_bhi_preamble(addr, arity);
}
return 0;
@@ -1349,7 +1392,7 @@ static void cfi_rewrite_endbr(s32 *start
for (s = start; s < end; s++) {
void *addr = (void *)s + *s;
- if (!is_endbr(addr + 16))
+ if (!exact_endbr(addr + 16))
continue;
poison_endbr(addr + 16);
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v3 10/10] x86/ibt: Optimize fineibt-bhi arity 1 case
2025-02-19 16:21 ` [PATCH v3 10/10] x86/ibt: Optimize fineibt-bhi arity 1 case Peter Zijlstra
@ 2025-02-19 18:21 ` Kees Cook
0 siblings, 0 replies; 38+ messages in thread
From: Kees Cook @ 2025-02-19 18:21 UTC (permalink / raw)
To: Peter Zijlstra
Cc: x86, linux-kernel, alyssa.milburn, scott.d.constable, joao,
andrew.cooper3, jpoimboe, jose.marchesi, hjl.tools, ndesaulniers,
samitolvanen, nathan, ojeda, alexei.starovoitov, mhiramat, jmill
On Wed, Feb 19, 2025 at 05:21:17PM +0100, Peter Zijlstra wrote:
> Saves a CALL to an out-of-line thunk for the common case of 1
> argument.
>
> Suggested-by: Scott Constable <scott.d.constable@intel.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
> arch/x86/include/asm/ibt.h | 4 ++
> arch/x86/kernel/alternative.c | 61 +++++++++++++++++++++++++++++++++++-------
> 2 files changed, 56 insertions(+), 9 deletions(-)
>
> --- a/arch/x86/include/asm/ibt.h
> +++ b/arch/x86/include/asm/ibt.h
> @@ -70,6 +70,10 @@ static inline bool __is_endbr(u32 val)
> if (val == gen_endbr_poison())
> return true;
>
> + /* See cfi_fineibt_bhi_preamble() */
> + if (IS_ENABLED(CONFIG_FINEIBT_BHI) && val == 0x001f0ff5)
> + return true;
Does this magic value need some better documentation?
Reviewed-by: Kees Cook <kees@kernel.org>
--
Kees Cook
* Re: [PATCH v3 00/10] x86/ibt: FineIBT-BHI
2025-02-19 16:21 [PATCH v3 00/10] x86/ibt: FineIBT-BHI Peter Zijlstra
` (9 preceding siblings ...)
2025-02-19 16:21 ` [PATCH v3 10/10] x86/ibt: Optimize fineibt-bhi arity 1 case Peter Zijlstra
@ 2025-02-19 17:36 ` Kees Cook
2025-02-20 11:27 ` Peter Zijlstra
11 siblings, 0 replies; 38+ messages in thread
From: Kees Cook @ 2025-02-19 17:36 UTC (permalink / raw)
To: Peter Zijlstra
Cc: x86, linux-kernel, alyssa.milburn, scott.d.constable, joao,
andrew.cooper3, jpoimboe, jose.marchesi, hjl.tools, ndesaulniers,
samitolvanen, nathan, ojeda, alexei.starovoitov, mhiramat, jmill
On Wed, Feb 19, 2025 at 05:21:07PM +0100, Peter Zijlstra wrote:
> Also note that LKDTM's CFI_FORWARD_PROTO test will do a double splat for
> paranoid in warn/permissive mode, since both the caller and callee hash check
> will fail.
Hah! I will declare that as Working As Intended, though I guess it is a
bit noisy. But given permissive is mainly a debugging feature, I think
this is fine.
--
Kees Cook
* Re: [PATCH v3 00/10] x86/ibt: FineIBT-BHI
2025-02-19 16:21 [PATCH v3 00/10] x86/ibt: FineIBT-BHI Peter Zijlstra
` (10 preceding siblings ...)
2025-02-19 17:36 ` [PATCH v3 00/10] x86/ibt: FineIBT-BHI Kees Cook
@ 2025-02-20 11:27 ` Peter Zijlstra
11 siblings, 0 replies; 38+ messages in thread
From: Peter Zijlstra @ 2025-02-20 11:27 UTC (permalink / raw)
To: x86
Cc: linux-kernel, alyssa.milburn, scott.d.constable, joao,
andrew.cooper3, jpoimboe, jose.marchesi, hjl.tools, ndesaulniers,
samitolvanen, nathan, ojeda, kees, alexei.starovoitov, mhiramat,
jmill
On Wed, Feb 19, 2025 at 05:21:07PM +0100, Peter Zijlstra wrote:
> Hi all!
>
> Having landed much of the previous series in tip/x86/core, I was hoping for an
> easy time landing the final two patches.. alas.
>
> This whole FineIBT SYSCALL pivot thing showed up, which got me to develop the
> paranoid FineIBT variant. And for testing I added a cfi=warn knob, and then
> I migrated bhi to an option, etc.
>
> Then just as I was to post this stuff, Scott out-nerds me with a whole new
> instruction sequence. Which got me to rework the entire pile once again, and
> it is now another 10 patches :/
>
> Anyway, be warned, Scott loves overlapping instructions.
>
> This is tested with:
>
> cfi=fineibt,warn
> cfi=fineibt,warn,paranoid
> cfi=fineibt,warn,bhi
> cfi=fineibt,warn,paranoid,bhi
> cfi=fineibt,paranoid,bhi
>
> Also note that LKDTM's CFI_FORWARD_PROTO test will do a double splat for
> paranoid in warn/permissive mode, since both the caller and callee hash check
> will fail.
>
> Also available at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git x86/fineibt-bhi2
I've updated this tree with the latest version of the patches.
Notably, Kees, I've not taken your Reviewed-by tag for patches that saw
significant rework -- even when in response to your own feedback :)
(ud_type propagation is now gone)
I'll repost in a few days, to give people a chance to catch up.