* [PATCH v2 00/13] objtool: Detect and warn about indirect calls in __nocfi functions
@ 2025-04-30 11:07 Peter Zijlstra
2025-04-30 11:07 ` [PATCH v2 01/13] x86/kvm/emulate: Implement test_cc() in C Peter Zijlstra
` (14 more replies)
0 siblings, 15 replies; 52+ messages in thread
From: Peter Zijlstra @ 2025-04-30 11:07 UTC (permalink / raw)
To: x86
Cc: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, hpa,
seanjc, pbonzini, ardb, kees, Arnd Bergmann, gregkh, jpoimboe,
peterz, linux-hyperv, linux-kernel, kvm, linux-efi, samitolvanen,
ojeda
Hi!
On kCFI (CONFIG_CFI_CLANG=y) builds, all indirect calls should have the CFI
check on (with very few exceptions). Not having the CFI checks undermines the
protection provided by CFI and makes these sites prime candidates for people
wanting to steal your cookies.
Specifically, the ABI changes such that doing indirect calls without the CFI
magic to a CFI-adorned function is not compatible (it happens to work for some
setups, but very much does not for FineIBT).
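For illustration, a rough (from memory, and compiler-version dependent) sketch
of what a kCFI call site does; none of this is taken from the series and the
hash value is made up:

	movl	$0x12345678, %r10d	# expected type hash (made up)
	cmpl	%r10d, -4(%r11)		# hash stored just before the target
	je	1f
	ud2				# CFI violation
1:	call	*%r11

FineIBT rewrites these sites at boot into a different caller/callee handshake,
which is why a bare 'call *%r11' that happens to limp along on plain kCFI
comes apart completely there.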
Rust people tripped over this the other day, since their 'core' happened to
have some no_sanitize(kcfi) bits in it, which promptly exploded when run with
FineIBT on.
Since this is very much not a supported model -- on purpose -- have objtool
detect and warn about such constructs.
This effort [1] found all existing [2] non-cfi indirect calls in the kernel.
Notably the KVM fastop emulation stuff -- which I've completely rewritten for
this version -- where the generated code doesn't look horrific, but is
slightly more verbose. I'm running on the assumption that instruction
emulation is not super performance critical these days, what with zero
VM-exit VMs etc.
KVM has another; the VMX interrupt injection stuff calls the IDT handler
directly. Is there an alternative? Can we keep a table of Linux functions
slightly higher up the call stack (asm_\cfunc ?) and add CFI to those?
HyperV hypercall page stuff, which I've previously suggested should use direct
calls, and which I've now converted (after getting properly annoyed with that
code).
Also available at:
git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git x86/core
Changes since v1:
- complete rewrite of the fastop stuff
- HyperV tweaks (Michael)
- objtool changes (Josh)
[1] https://lkml.kernel.org/r/20250410154556.GB9003@noisy.programming.kicks-ass.net
[2] https://lkml.kernel.org/r/20250410194334.GA3248459@google.com
* [PATCH v2 01/13] x86/kvm/emulate: Implement test_cc() in C
2025-04-30 11:07 [PATCH v2 00/13] objtool: Detect and warn about indirect calls in __nocfi functions Peter Zijlstra
@ 2025-04-30 11:07 ` Peter Zijlstra
2025-04-30 11:07 ` [PATCH v2 02/13] x86/kvm/emulate: Introduce COP1 Peter Zijlstra
` (13 subsequent siblings)
14 siblings, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2025-04-30 11:07 UTC (permalink / raw)
To: x86
Cc: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, hpa,
seanjc, pbonzini, ardb, kees, Arnd Bergmann, gregkh, jpoimboe,
peterz, linux-hyperv, linux-kernel, kvm, linux-efi, samitolvanen,
ojeda
The current test_cc() uses the fastop infrastructure to test flags using
SETcc instructions. However, int3_emulate_jcc() already fully
implements the flags->CC mapping; use that instead.
Removes a pile of gnarly asm.
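For reference, the cc argument is the low nibble of the SETcc/Jcc opcode;
a couple of illustrative (not from the patch) uses of the new helper:

	bool zf = __emulate_cc(regs->flags, 0x4);	/* SETZ condition   */
	bool nz = __emulate_cc(regs->flags, 0x5);	/* odd cc inverts   */
	bool le = __emulate_cc(regs->flags, 0xe);	/* (SF^OF) || ZF    */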
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
arch/x86/include/asm/text-patching.h | 20 +++++++++++++-------
arch/x86/kvm/emulate.c | 34 ++--------------------------------
2 files changed, 15 insertions(+), 39 deletions(-)
--- a/arch/x86/include/asm/text-patching.h
+++ b/arch/x86/include/asm/text-patching.h
@@ -177,9 +177,9 @@ void int3_emulate_ret(struct pt_regs *re
}
static __always_inline
-void int3_emulate_jcc(struct pt_regs *regs, u8 cc, unsigned long ip, unsigned long disp)
+bool __emulate_cc(unsigned long flags, u8 cc)
{
- static const unsigned long jcc_mask[6] = {
+ static const unsigned long cc_mask[6] = {
[0] = X86_EFLAGS_OF,
[1] = X86_EFLAGS_CF,
[2] = X86_EFLAGS_ZF,
@@ -192,15 +192,21 @@ void int3_emulate_jcc(struct pt_regs *re
bool match;
if (cc < 0xc) {
- match = regs->flags & jcc_mask[cc >> 1];
+ match = flags & cc_mask[cc >> 1];
} else {
- match = ((regs->flags & X86_EFLAGS_SF) >> X86_EFLAGS_SF_BIT) ^
- ((regs->flags & X86_EFLAGS_OF) >> X86_EFLAGS_OF_BIT);
+ match = ((flags & X86_EFLAGS_SF) >> X86_EFLAGS_SF_BIT) ^
+ ((flags & X86_EFLAGS_OF) >> X86_EFLAGS_OF_BIT);
if (cc >= 0xe)
- match = match || (regs->flags & X86_EFLAGS_ZF);
+ match = match || (flags & X86_EFLAGS_ZF);
}
- if ((match && !invert) || (!match && invert))
+ return (match && !invert) || (!match && invert);
+}
+
+static __always_inline
+void int3_emulate_jcc(struct pt_regs *regs, u8 cc, unsigned long ip, unsigned long disp)
+{
+ if (__emulate_cc(regs->flags, cc))
ip += disp;
int3_emulate_jmp(regs, ip);
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -26,6 +26,7 @@
#include <asm/debugreg.h>
#include <asm/nospec-branch.h>
#include <asm/ibt.h>
+#include <asm/text-patching.h>
#include "x86.h"
#include "tss.h"
@@ -416,31 +417,6 @@ static int fastop(struct x86_emulate_ctx
ON64(FOP3E(op##q, rax, rdx, cl)) \
FOP_END
-/* Special case for SETcc - 1 instruction per cc */
-#define FOP_SETCC(op) \
- FOP_FUNC(op) \
- #op " %al \n\t" \
- FOP_RET(op)
-
-FOP_START(setcc)
-FOP_SETCC(seto)
-FOP_SETCC(setno)
-FOP_SETCC(setc)
-FOP_SETCC(setnc)
-FOP_SETCC(setz)
-FOP_SETCC(setnz)
-FOP_SETCC(setbe)
-FOP_SETCC(setnbe)
-FOP_SETCC(sets)
-FOP_SETCC(setns)
-FOP_SETCC(setp)
-FOP_SETCC(setnp)
-FOP_SETCC(setl)
-FOP_SETCC(setnl)
-FOP_SETCC(setle)
-FOP_SETCC(setnle)
-FOP_END;
-
FOP_START(salc)
FOP_FUNC(salc)
"pushf; sbb %al, %al; popf \n\t"
@@ -1068,13 +1044,7 @@ static int em_bsr_c(struct x86_emulate_c
static __always_inline u8 test_cc(unsigned int condition, unsigned long flags)
{
- u8 rc;
- void (*fop)(void) = (void *)em_setcc + FASTOP_SIZE * (condition & 0xf);
-
- flags = (flags & EFLAGS_MASK) | X86_EFLAGS_IF;
- asm("push %[flags]; popf; " CALL_NOSPEC
- : "=a"(rc), ASM_CALL_CONSTRAINT : [thunk_target]"r"(fop), [flags]"r"(flags));
- return rc;
+ return __emulate_cc(flags, condition & 0xf);
}
static void fetch_register_operand(struct operand *op)
* [PATCH v2 02/13] x86/kvm/emulate: Introduce COP1
2025-04-30 11:07 [PATCH v2 00/13] objtool: Detect and warn about indirect calls in __nocfi functions Peter Zijlstra
2025-04-30 11:07 ` [PATCH v2 01/13] x86/kvm/emulate: Implement test_cc() in C Peter Zijlstra
@ 2025-04-30 11:07 ` Peter Zijlstra
2025-04-30 16:19 ` Josh Poimboeuf
2025-04-30 11:07 ` [PATCH v2 03/13] x86/kvm/emulate: Introduce COP2 Peter Zijlstra
` (12 subsequent siblings)
14 siblings, 1 reply; 52+ messages in thread
From: Peter Zijlstra @ 2025-04-30 11:07 UTC (permalink / raw)
To: x86
Cc: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, hpa,
seanjc, pbonzini, ardb, kees, Arnd Bergmann, gregkh, jpoimboe,
peterz, linux-hyperv, linux-kernel, kvm, linux-efi, samitolvanen,
ojeda
Replace fastops with C-ops. There are a bunch of problems with the
current fastop infrastructure, almost all related to its special
calling convention, which bypasses the normal C ABI.
There are two immediate problems with this at present:
- it relies on RET preserving EFLAGS, whereas the C ABI does not;
- it circumvents compiler-based control-flow-integrity checking,
because it's all asm magic.
The first is a problem for some mitigations where the
x86_indirect_return_thunk needs to include non-trivial work that
clobbers EFLAGS (eg. the Skylake call depth tracking thing).
The second is a problem because it presents a 'naked' indirect call on
kCFI builds, making it a prime target for control flow hijacking.
Additionally, given that a large chunk of virtual machine performance
relies on absolutely avoiding vmexit these days, this emulation stuff
just isn't that critical for performance anymore.
As such, replace the fastop calls with a normal C function using the
'execute' member.
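As a reading aid, COP1(not) below hand-expands to roughly the following
(the 2/4/8 byte cases are analogous):

static int em_not(struct x86_emulate_ctxt *ctxt)
{
	unsigned long flags = (ctxt->eflags & EFLAGS_MASK) | X86_EFLAGS_IF;
	int bytes = 1, ok = 1;

	if (!(ctxt->d & ByteOp))
		bytes = ctxt->dst.bytes;

	switch (bytes) {
	case 1:
		asm("push %[flags]; popf \n\t"
		    "10: notb %%al \n\t"
		    "pushf; pop %[flags] \n\t"
		    "11: \n\t"
		    : "+a" (ctxt->dst.val), "+d" (ctxt->src.val),
		      [flags] "+D" (flags), "+S" (ok)
		    : "c" (ctxt->src2.val));
		break;
	/* case 2: notw %ax, case 4: notl %eax, ON64() case 8: notq %rax */
	}

	ctxt->eflags = (ctxt->eflags & ~EFLAGS_MASK) | (flags & EFLAGS_MASK);
	return !ok ? emulate_de(ctxt) : X86EMUL_CONTINUE;
}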
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
arch/x86/kvm/emulate.c | 69 ++++++++++++++++++++++++++++++++++++++++---------
1 file changed, 57 insertions(+), 12 deletions(-)
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -267,11 +267,56 @@ static void invalidate_registers(struct
X86_EFLAGS_PF|X86_EFLAGS_CF)
#ifdef CONFIG_X86_64
-#define ON64(x) x
+#define ON64(x...) x
#else
#define ON64(x)
#endif
+#define COP_START(op) \
+static int em_##op(struct x86_emulate_ctxt *ctxt) \
+{ \
+ unsigned long flags = (ctxt->eflags & EFLAGS_MASK) | X86_EFLAGS_IF; \
+ int bytes = 1, ok = 1; \
+ if (!(ctxt->d & ByteOp)) \
+ bytes = ctxt->dst.bytes; \
+ switch (bytes) {
+
+#define COP_ASM(str) \
+ asm("push %[flags]; popf \n\t" \
+ "10: " str \
+ "pushf; pop %[flags] \n\t" \
+ "11: \n\t" \
+ : "+a" (ctxt->dst.val), \
+ "+d" (ctxt->src.val), \
+ [flags] "+D" (flags), \
+ "+S" (ok) \
+ : "c" (ctxt->src2.val))
+
+#define COP_ASM1(op, dst) \
+ COP_ASM(#op " %%" #dst " \n\t")
+
+#define COP_ASM1_EX(op, dst) \
+ COP_ASM(#op " %%" #dst " \n\t" \
+ _ASM_EXTABLE_TYPE_REG(10b, 11f, EX_TYPE_ZERO_REG, %%esi))
+
+#define COP_ASM2(op, dst, src) \
+ COP_ASM(#op " %" #src ", %" #dst " \n\t")
+
+#define COP_END \
+ } \
+ ctxt->eflags = (ctxt->eflags & ~EFLAGS_MASK) | (flags & EFLAGS_MASK); \
+ return !ok ? emulate_de(ctxt) : X86EMUL_CONTINUE; \
+}
+
+/* 1-operand, using "a" (dst) */
+#define COP1(op) \
+ COP_START(op) \
+ case 1: COP_ASM1(op##b, al); break; \
+ case 2: COP_ASM1(op##w, ax); break; \
+ case 4: COP_ASM1(op##l, eax); break; \
+ ON64(case 8: COP_ASM1(op##q, rax); break;) \
+ COP_END
+
/*
* fastop functions have a special calling convention:
*
@@ -1002,10 +1047,10 @@ FASTOP3WCL(shrd);
FASTOP2W(imul);
-FASTOP1(not);
-FASTOP1(neg);
-FASTOP1(inc);
-FASTOP1(dec);
+COP1(not);
+COP1(neg);
+COP1(inc);
+COP1(dec);
FASTOP2CL(rol);
FASTOP2CL(ror);
@@ -4021,8 +4066,8 @@ static const struct opcode group2[] = {
static const struct opcode group3[] = {
F(DstMem | SrcImm | NoWrite, em_test),
F(DstMem | SrcImm | NoWrite, em_test),
- F(DstMem | SrcNone | Lock, em_not),
- F(DstMem | SrcNone | Lock, em_neg),
+ I(DstMem | SrcNone | Lock, em_not),
+ I(DstMem | SrcNone | Lock, em_neg),
F(DstXacc | Src2Mem, em_mul_ex),
F(DstXacc | Src2Mem, em_imul_ex),
F(DstXacc | Src2Mem, em_div_ex),
@@ -4030,14 +4075,14 @@ static const struct opcode group3[] = {
};
static const struct opcode group4[] = {
- F(ByteOp | DstMem | SrcNone | Lock, em_inc),
- F(ByteOp | DstMem | SrcNone | Lock, em_dec),
+ I(ByteOp | DstMem | SrcNone | Lock, em_inc),
+ I(ByteOp | DstMem | SrcNone | Lock, em_dec),
N, N, N, N, N, N,
};
static const struct opcode group5[] = {
- F(DstMem | SrcNone | Lock, em_inc),
- F(DstMem | SrcNone | Lock, em_dec),
+ I(DstMem | SrcNone | Lock, em_inc),
+ I(DstMem | SrcNone | Lock, em_dec),
I(SrcMem | NearBranch | IsBranch, em_call_near_abs),
I(SrcMemFAddr | ImplicitOps | IsBranch, em_call_far),
I(SrcMem | NearBranch | IsBranch, em_jmp_abs),
@@ -4237,7 +4282,7 @@ static const struct opcode opcode_table[
/* 0x38 - 0x3F */
F6ALU(NoWrite, em_cmp), N, N,
/* 0x40 - 0x4F */
- X8(F(DstReg, em_inc)), X8(F(DstReg, em_dec)),
+ X8(I(DstReg, em_inc)), X8(I(DstReg, em_dec)),
/* 0x50 - 0x57 */
X8(I(SrcReg | Stack, em_push)),
/* 0x58 - 0x5F */
* [PATCH v2 03/13] x86/kvm/emulate: Introduce COP2
2025-04-30 11:07 [PATCH v2 00/13] objtool: Detect and warn about indirect calls in __nocfi functions Peter Zijlstra
2025-04-30 11:07 ` [PATCH v2 01/13] x86/kvm/emulate: Implement test_cc() in C Peter Zijlstra
2025-04-30 11:07 ` [PATCH v2 02/13] x86/kvm/emulate: Introduce COP1 Peter Zijlstra
@ 2025-04-30 11:07 ` Peter Zijlstra
2025-04-30 11:07 ` [PATCH v2 04/13] x86/kvm/emulate: Introduce COP2R Peter Zijlstra
` (11 subsequent siblings)
14 siblings, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2025-04-30 11:07 UTC (permalink / raw)
To: x86
Cc: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, hpa,
seanjc, pbonzini, ardb, kees, Arnd Bergmann, gregkh, jpoimboe,
peterz, linux-hyperv, linux-kernel, kvm, linux-efi, samitolvanen,
ojeda
Replace the FASTOP2 instructions.
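For example, COP2(add) generates an em_add() whose 4-byte case
hand-expands (via COP_ASM2) to roughly:

	case 4:
		asm("push %[flags]; popf \n\t"
		    "10: addl %%edx, %%eax \n\t"
		    "pushf; pop %[flags] \n\t"
		    "11: \n\t"
		    : "+a" (ctxt->dst.val), "+d" (ctxt->src.val),
		      [flags] "+D" (flags), "+S" (ok)
		    : "c" (ctxt->src2.val));
		break;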
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
arch/x86/kvm/emulate.c | 87 +++++++++++++++++++++++++++----------------------
1 file changed, 48 insertions(+), 39 deletions(-)
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -300,7 +300,7 @@ static int em_##op(struct x86_emulate_ct
_ASM_EXTABLE_TYPE_REG(10b, 11f, EX_TYPE_ZERO_REG, %%esi))
#define COP_ASM2(op, dst, src) \
- COP_ASM(#op " %" #src ", %" #dst " \n\t")
+ COP_ASM(#op " %%" #src ", %%" #dst " \n\t")
#define COP_END \
} \
@@ -317,6 +317,15 @@ static int em_##op(struct x86_emulate_ct
ON64(case 8: COP_ASM1(op##q, rax); break;) \
COP_END
+/* 2-operand, using "a" (dst), "d" (src) */
+#define COP2(op) \
+ COP_START(op) \
+ case 1: COP_ASM2(op##b, al, dl); break; \
+ case 2: COP_ASM2(op##w, ax, dx); break; \
+ case 4: COP_ASM2(op##l, eax, edx); break; \
+ ON64(case 8: COP_ASM2(op##q, rax, rdx); break;) \
+ COP_END
+
/*
* fastop functions have a special calling convention:
*
@@ -1027,15 +1036,16 @@ static int read_descriptor(struct x86_em
return rc;
}
-FASTOP2(add);
-FASTOP2(or);
-FASTOP2(adc);
-FASTOP2(sbb);
-FASTOP2(and);
-FASTOP2(sub);
-FASTOP2(xor);
-FASTOP2(cmp);
-FASTOP2(test);
+COP2(add);
+COP2(or);
+COP2(adc);
+COP2(sbb);
+COP2(and);
+COP2(sub);
+COP2(xor);
+COP2(cmp);
+COP2(test);
+COP2(xadd);
FASTOP1SRC2(mul, mul_ex);
FASTOP1SRC2(imul, imul_ex);
@@ -1067,7 +1077,6 @@ FASTOP2W(bts);
FASTOP2W(btr);
FASTOP2W(btc);
-FASTOP2(xadd);
FASTOP2R(cmp, cmp_r);
@@ -2304,7 +2313,7 @@ static int em_cmpxchg(struct x86_emulate
ctxt->dst.val = reg_read(ctxt, VCPU_REGS_RAX);
ctxt->src.orig_val = ctxt->src.val;
ctxt->src.val = ctxt->dst.orig_val;
- fastop(ctxt, em_cmp);
+ em_cmp(ctxt);
if (ctxt->eflags & X86_EFLAGS_ZF) {
/* Success: write back to memory; no update of EAX */
@@ -3069,7 +3078,7 @@ static int em_das(struct x86_emulate_ctx
ctxt->src.type = OP_IMM;
ctxt->src.val = 0;
ctxt->src.bytes = 1;
- fastop(ctxt, em_or);
+ em_or(ctxt);
ctxt->eflags &= ~(X86_EFLAGS_AF | X86_EFLAGS_CF);
if (cf)
ctxt->eflags |= X86_EFLAGS_CF;
@@ -3095,7 +3104,7 @@ static int em_aam(struct x86_emulate_ctx
ctxt->src.type = OP_IMM;
ctxt->src.val = 0;
ctxt->src.bytes = 1;
- fastop(ctxt, em_or);
+ em_or(ctxt);
return X86EMUL_CONTINUE;
}
@@ -3113,7 +3122,7 @@ static int em_aad(struct x86_emulate_ctx
ctxt->src.type = OP_IMM;
ctxt->src.val = 0;
ctxt->src.bytes = 1;
- fastop(ctxt, em_or);
+ em_or(ctxt);
return X86EMUL_CONTINUE;
}
@@ -3998,9 +4007,9 @@ static int check_perm_out(struct x86_emu
#define I2bvIP(_f, _e, _i, _p) \
IIP((_f) | ByteOp, _e, _i, _p), IIP(_f, _e, _i, _p)
-#define F6ALU(_f, _e) F2bv((_f) | DstMem | SrcReg | ModRM, _e), \
- F2bv(((_f) | DstReg | SrcMem | ModRM) & ~Lock, _e), \
- F2bv(((_f) & ~Lock) | DstAcc | SrcImm, _e)
+#define I6ALU(_f, _e) I2bv((_f) | DstMem | SrcReg | ModRM, _e), \
+ I2bv(((_f) | DstReg | SrcMem | ModRM) & ~Lock, _e), \
+ I2bv(((_f) & ~Lock) | DstAcc | SrcImm, _e)
static const struct opcode group7_rm0[] = {
N,
@@ -4038,14 +4047,14 @@ static const struct opcode group7_rm7[]
};
static const struct opcode group1[] = {
- F(Lock, em_add),
- F(Lock | PageTable, em_or),
- F(Lock, em_adc),
- F(Lock, em_sbb),
- F(Lock | PageTable, em_and),
- F(Lock, em_sub),
- F(Lock, em_xor),
- F(NoWrite, em_cmp),
+ I(Lock, em_add),
+ I(Lock | PageTable, em_or),
+ I(Lock, em_adc),
+ I(Lock, em_sbb),
+ I(Lock | PageTable, em_and),
+ I(Lock, em_sub),
+ I(Lock, em_xor),
+ I(NoWrite, em_cmp),
};
static const struct opcode group1A[] = {
@@ -4064,8 +4073,8 @@ static const struct opcode group2[] = {
};
static const struct opcode group3[] = {
- F(DstMem | SrcImm | NoWrite, em_test),
- F(DstMem | SrcImm | NoWrite, em_test),
+ I(DstMem | SrcImm | NoWrite, em_test),
+ I(DstMem | SrcImm | NoWrite, em_test),
I(DstMem | SrcNone | Lock, em_not),
I(DstMem | SrcNone | Lock, em_neg),
F(DstXacc | Src2Mem, em_mul_ex),
@@ -4258,29 +4267,29 @@ static const struct instr_dual instr_dua
static const struct opcode opcode_table[256] = {
/* 0x00 - 0x07 */
- F6ALU(Lock, em_add),
+ I6ALU(Lock, em_add),
I(ImplicitOps | Stack | No64 | Src2ES, em_push_sreg),
I(ImplicitOps | Stack | No64 | Src2ES, em_pop_sreg),
/* 0x08 - 0x0F */
- F6ALU(Lock | PageTable, em_or),
+ I6ALU(Lock | PageTable, em_or),
I(ImplicitOps | Stack | No64 | Src2CS, em_push_sreg),
N,
/* 0x10 - 0x17 */
- F6ALU(Lock, em_adc),
+ I6ALU(Lock, em_adc),
I(ImplicitOps | Stack | No64 | Src2SS, em_push_sreg),
I(ImplicitOps | Stack | No64 | Src2SS, em_pop_sreg),
/* 0x18 - 0x1F */
- F6ALU(Lock, em_sbb),
+ I6ALU(Lock, em_sbb),
I(ImplicitOps | Stack | No64 | Src2DS, em_push_sreg),
I(ImplicitOps | Stack | No64 | Src2DS, em_pop_sreg),
/* 0x20 - 0x27 */
- F6ALU(Lock | PageTable, em_and), N, N,
+ I6ALU(Lock | PageTable, em_and), N, N,
/* 0x28 - 0x2F */
- F6ALU(Lock, em_sub), N, I(ByteOp | DstAcc | No64, em_das),
+ I6ALU(Lock, em_sub), N, I(ByteOp | DstAcc | No64, em_das),
/* 0x30 - 0x37 */
- F6ALU(Lock, em_xor), N, N,
+ I6ALU(Lock, em_xor), N, N,
/* 0x38 - 0x3F */
- F6ALU(NoWrite, em_cmp), N, N,
+ I6ALU(NoWrite, em_cmp), N, N,
/* 0x40 - 0x4F */
X8(I(DstReg, em_inc)), X8(I(DstReg, em_dec)),
/* 0x50 - 0x57 */
@@ -4306,7 +4315,7 @@ static const struct opcode opcode_table[
G(DstMem | SrcImm, group1),
G(ByteOp | DstMem | SrcImm | No64, group1),
G(DstMem | SrcImmByte, group1),
- F2bv(DstMem | SrcReg | ModRM | NoWrite, em_test),
+ I2bv(DstMem | SrcReg | ModRM | NoWrite, em_test),
I2bv(DstMem | SrcReg | ModRM | Lock | PageTable, em_xchg),
/* 0x88 - 0x8F */
I2bv(DstMem | SrcReg | ModRM | Mov | PageTable, em_mov),
@@ -4329,7 +4338,7 @@ static const struct opcode opcode_table[
I2bv(SrcSI | DstDI | Mov | String | TwoMemOp, em_mov),
F2bv(SrcSI | DstDI | String | NoWrite | TwoMemOp, em_cmp_r),
/* 0xA8 - 0xAF */
- F2bv(DstAcc | SrcImm | NoWrite, em_test),
+ I2bv(DstAcc | SrcImm | NoWrite, em_test),
I2bv(SrcAcc | DstDI | Mov | String, em_mov),
I2bv(SrcSI | DstAcc | Mov | String, em_mov),
F2bv(SrcAcc | DstDI | String | NoWrite, em_cmp_r),
@@ -4467,7 +4476,7 @@ static const struct opcode twobyte_table
I(DstReg | SrcMem | ModRM, em_bsr_c),
D(DstReg | SrcMem8 | ModRM | Mov), D(DstReg | SrcMem16 | ModRM | Mov),
/* 0xC0 - 0xC7 */
- F2bv(DstMem | SrcReg | ModRM | SrcWrite | Lock, em_xadd),
+ I2bv(DstMem | SrcReg | ModRM | SrcWrite | Lock, em_xadd),
N, ID(0, &instr_dual_0f_c3),
N, N, N, GD(0, &group9),
/* 0xC8 - 0xCF */
* [PATCH v2 04/13] x86/kvm/emulate: Introduce COP2R
2025-04-30 11:07 [PATCH v2 00/13] objtool: Detect and warn about indirect calls in __nocfi functions Peter Zijlstra
` (2 preceding siblings ...)
2025-04-30 11:07 ` [PATCH v2 03/13] x86/kvm/emulate: Introduce COP2 Peter Zijlstra
@ 2025-04-30 11:07 ` Peter Zijlstra
2025-04-30 11:07 ` [PATCH v2 05/13] x86/kvm/emulate: Introduce COP2W Peter Zijlstra
` (10 subsequent siblings)
14 siblings, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2025-04-30 11:07 UTC (permalink / raw)
To: x86
Cc: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, hpa,
seanjc, pbonzini, ardb, kees, Arnd Bergmann, gregkh, jpoimboe,
peterz, linux-hyperv, linux-kernel, kvm, linux-efi, samitolvanen,
ojeda
Replace the FASTOP2R instruction.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
arch/x86/kvm/emulate.c | 16 ++++++++++++----
1 file changed, 12 insertions(+), 4 deletions(-)
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -326,6 +326,15 @@ static int em_##op(struct x86_emulate_ct
ON64(case 8: COP_ASM2(op##q, rax, rdx); break;) \
COP_END
+/* 2-operand, reversed */
+#define COP2R(op, name) \
+ COP_START(name) \
+ case 1: COP_ASM2(op##b, dl, al); break; \
+ case 2: COP_ASM2(op##w, dx, ax); break; \
+ case 4: COP_ASM2(op##l, edx, eax); break; \
+ ON64(case 8: COP_ASM2(op##q, rdx, rax); break;) \
+ COP_END
+
/*
* fastop functions have a special calling convention:
*
@@ -1077,8 +1086,7 @@ FASTOP2W(bts);
FASTOP2W(btr);
FASTOP2W(btc);
-
-FASTOP2R(cmp, cmp_r);
+COP2R(cmp, cmp_r);
static int em_bsf_c(struct x86_emulate_ctxt *ctxt)
{
@@ -4336,12 +4344,12 @@ static const struct opcode opcode_table[
I2bv(DstAcc | SrcMem | Mov | MemAbs, em_mov),
I2bv(DstMem | SrcAcc | Mov | MemAbs | PageTable, em_mov),
I2bv(SrcSI | DstDI | Mov | String | TwoMemOp, em_mov),
- F2bv(SrcSI | DstDI | String | NoWrite | TwoMemOp, em_cmp_r),
+ I2bv(SrcSI | DstDI | String | NoWrite | TwoMemOp, em_cmp_r),
/* 0xA8 - 0xAF */
I2bv(DstAcc | SrcImm | NoWrite, em_test),
I2bv(SrcAcc | DstDI | Mov | String, em_mov),
I2bv(SrcSI | DstAcc | Mov | String, em_mov),
- F2bv(SrcAcc | DstDI | String | NoWrite, em_cmp_r),
+ I2bv(SrcAcc | DstDI | String | NoWrite, em_cmp_r),
/* 0xB0 - 0xB7 */
X8(I(ByteOp | DstReg | SrcImm | Mov, em_mov)),
/* 0xB8 - 0xBF */
* [PATCH v2 05/13] x86/kvm/emulate: Introduce COP2W
2025-04-30 11:07 [PATCH v2 00/13] objtool: Detect and warn about indirect calls in __nocfi functions Peter Zijlstra
` (3 preceding siblings ...)
2025-04-30 11:07 ` [PATCH v2 04/13] x86/kvm/emulate: Introduce COP2R Peter Zijlstra
@ 2025-04-30 11:07 ` Peter Zijlstra
2025-04-30 11:07 ` [PATCH v2 06/13] x86/kvm/emulate: Introduce COP2CL Peter Zijlstra
` (9 subsequent siblings)
14 siblings, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2025-04-30 11:07 UTC (permalink / raw)
To: x86
Cc: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, hpa,
seanjc, pbonzini, ardb, kees, Arnd Bergmann, gregkh, jpoimboe,
peterz, linux-hyperv, linux-kernel, kvm, linux-efi, samitolvanen,
ojeda
Replace the FASTOP2W instructions.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
arch/x86/kvm/emulate.c | 47 ++++++++++++++++++++++++++++-------------------
1 file changed, 28 insertions(+), 19 deletions(-)
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -335,6 +335,15 @@ static int em_##op(struct x86_emulate_ct
ON64(case 8: COP_ASM2(op##q, rdx, rax); break;) \
COP_END
+/* 2-operand, word only (no byte op) */
+#define COP2W(op) \
+ COP_START(op) \
+ case 1: break; \
+ case 2: COP_ASM2(op##w, ax, dx); break; \
+ case 4: COP_ASM2(op##l, eax, edx); break; \
+ ON64(case 8: COP_ASM2(op##q, rax, rdx); break;) \
+ COP_END
+
/*
* fastop functions have a special calling convention:
*
@@ -1064,7 +1073,7 @@ FASTOP1SRC2EX(idiv, idiv_ex);
FASTOP3WCL(shld);
FASTOP3WCL(shrd);
-FASTOP2W(imul);
+COP2W(imul);
COP1(not);
COP1(neg);
@@ -1079,12 +1088,12 @@ FASTOP2CL(shl);
FASTOP2CL(shr);
FASTOP2CL(sar);
-FASTOP2W(bsf);
-FASTOP2W(bsr);
-FASTOP2W(bt);
-FASTOP2W(bts);
-FASTOP2W(btr);
-FASTOP2W(btc);
+COP2W(bsf);
+COP2W(bsr);
+COP2W(bt);
+COP2W(bts);
+COP2W(btr);
+COP2W(btc);
COP2R(cmp, cmp_r);
@@ -1093,7 +1102,7 @@ static int em_bsf_c(struct x86_emulate_c
/* If src is zero, do not writeback, but update flags */
if (ctxt->src.val == 0)
ctxt->dst.type = OP_NONE;
- return fastop(ctxt, em_bsf);
+ return em_bsf(ctxt);
}
static int em_bsr_c(struct x86_emulate_ctxt *ctxt)
@@ -1101,7 +1110,7 @@ static int em_bsr_c(struct x86_emulate_c
/* If src is zero, do not writeback, but update flags */
if (ctxt->src.val == 0)
ctxt->dst.type = OP_NONE;
- return fastop(ctxt, em_bsr);
+ return em_bsr(ctxt);
}
static __always_inline u8 test_cc(unsigned int condition, unsigned long flags)
@@ -3221,7 +3230,7 @@ static int em_xchg(struct x86_emulate_ct
static int em_imul_3op(struct x86_emulate_ctxt *ctxt)
{
ctxt->dst.val = ctxt->src2.val;
- return fastop(ctxt, em_imul);
+ return em_imul(ctxt);
}
static int em_cwd(struct x86_emulate_ctxt *ctxt)
@@ -4135,10 +4144,10 @@ static const struct group_dual group7 =
static const struct opcode group8[] = {
N, N, N, N,
- F(DstMem | SrcImmByte | NoWrite, em_bt),
- F(DstMem | SrcImmByte | Lock | PageTable, em_bts),
- F(DstMem | SrcImmByte | Lock, em_btr),
- F(DstMem | SrcImmByte | Lock | PageTable, em_btc),
+ I(DstMem | SrcImmByte | NoWrite, em_bt),
+ I(DstMem | SrcImmByte | Lock | PageTable, em_bts),
+ I(DstMem | SrcImmByte | Lock, em_btr),
+ I(DstMem | SrcImmByte | Lock | PageTable, em_btc),
};
/*
@@ -4459,27 +4468,27 @@ static const struct opcode twobyte_table
/* 0xA0 - 0xA7 */
I(Stack | Src2FS, em_push_sreg), I(Stack | Src2FS, em_pop_sreg),
II(ImplicitOps, em_cpuid, cpuid),
- F(DstMem | SrcReg | ModRM | BitOp | NoWrite, em_bt),
+ I(DstMem | SrcReg | ModRM | BitOp | NoWrite, em_bt),
F(DstMem | SrcReg | Src2ImmByte | ModRM, em_shld),
F(DstMem | SrcReg | Src2CL | ModRM, em_shld), N, N,
/* 0xA8 - 0xAF */
I(Stack | Src2GS, em_push_sreg), I(Stack | Src2GS, em_pop_sreg),
II(EmulateOnUD | ImplicitOps, em_rsm, rsm),
- F(DstMem | SrcReg | ModRM | BitOp | Lock | PageTable, em_bts),
+ I(DstMem | SrcReg | ModRM | BitOp | Lock | PageTable, em_bts),
F(DstMem | SrcReg | Src2ImmByte | ModRM, em_shrd),
F(DstMem | SrcReg | Src2CL | ModRM, em_shrd),
- GD(0, &group15), F(DstReg | SrcMem | ModRM, em_imul),
+ GD(0, &group15), I(DstReg | SrcMem | ModRM, em_imul),
/* 0xB0 - 0xB7 */
I2bv(DstMem | SrcReg | ModRM | Lock | PageTable | SrcWrite, em_cmpxchg),
I(DstReg | SrcMemFAddr | ModRM | Src2SS, em_lseg),
- F(DstMem | SrcReg | ModRM | BitOp | Lock, em_btr),
+ I(DstMem | SrcReg | ModRM | BitOp | Lock, em_btr),
I(DstReg | SrcMemFAddr | ModRM | Src2FS, em_lseg),
I(DstReg | SrcMemFAddr | ModRM | Src2GS, em_lseg),
D(DstReg | SrcMem8 | ModRM | Mov), D(DstReg | SrcMem16 | ModRM | Mov),
/* 0xB8 - 0xBF */
N, N,
G(BitOp, group8),
- F(DstMem | SrcReg | ModRM | BitOp | Lock | PageTable, em_btc),
+ I(DstMem | SrcReg | ModRM | BitOp | Lock | PageTable, em_btc),
I(DstReg | SrcMem | ModRM, em_bsf_c),
I(DstReg | SrcMem | ModRM, em_bsr_c),
D(DstReg | SrcMem8 | ModRM | Mov), D(DstReg | SrcMem16 | ModRM | Mov),
* [PATCH v2 06/13] x86/kvm/emulate: Introduce COP2CL
2025-04-30 11:07 [PATCH v2 00/13] objtool: Detect and warn about indirect calls in __nocfi functions Peter Zijlstra
` (4 preceding siblings ...)
2025-04-30 11:07 ` [PATCH v2 05/13] x86/kvm/emulate: Introduce COP2W Peter Zijlstra
@ 2025-04-30 11:07 ` Peter Zijlstra
2025-04-30 11:07 ` [PATCH v2 07/13] x86/kvm/emulate: Introduce COP1SRC2 Peter Zijlstra
` (8 subsequent siblings)
14 siblings, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2025-04-30 11:07 UTC (permalink / raw)
To: x86
Cc: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, hpa,
seanjc, pbonzini, ardb, kees, Arnd Bergmann, gregkh, jpoimboe,
peterz, linux-hyperv, linux-kernel, kvm, linux-efi, samitolvanen,
ojeda
Replace the FASTOP2CL instructions.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
arch/x86/kvm/emulate.c | 39 ++++++++++++++++++++++++---------------
1 file changed, 24 insertions(+), 15 deletions(-)
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -344,6 +344,15 @@ static int em_##op(struct x86_emulate_ct
ON64(case 8: COP_ASM2(op##q, rax, rdx); break;) \
COP_END
+/* 2-operand, using "a" (dst) and CL (src2) */
+#define COP2CL(op) \
+ COP_START(op) \
+ case 1: COP_ASM2(op##b, al, cl); break; \
+ case 2: COP_ASM2(op##w, ax, cl); break; \
+ case 4: COP_ASM2(op##l, eax, cl); break; \
+ ON64(case 8: COP_ASM2(op##q, rax, cl); break;) \
+ COP_END
+
/*
* fastop functions have a special calling convention:
*
@@ -1080,13 +1089,13 @@ COP1(neg);
COP1(inc);
COP1(dec);
-FASTOP2CL(rol);
-FASTOP2CL(ror);
-FASTOP2CL(rcl);
-FASTOP2CL(rcr);
-FASTOP2CL(shl);
-FASTOP2CL(shr);
-FASTOP2CL(sar);
+COP2CL(rol);
+COP2CL(ror);
+COP2CL(rcl);
+COP2CL(rcr);
+COP2CL(shl);
+COP2CL(shr);
+COP2CL(sar);
COP2W(bsf);
COP2W(bsr);
@@ -4079,14 +4088,14 @@ static const struct opcode group1A[] = {
};
static const struct opcode group2[] = {
- F(DstMem | ModRM, em_rol),
- F(DstMem | ModRM, em_ror),
- F(DstMem | ModRM, em_rcl),
- F(DstMem | ModRM, em_rcr),
- F(DstMem | ModRM, em_shl),
- F(DstMem | ModRM, em_shr),
- F(DstMem | ModRM, em_shl),
- F(DstMem | ModRM, em_sar),
+ I(DstMem | ModRM, em_rol),
+ I(DstMem | ModRM, em_ror),
+ I(DstMem | ModRM, em_rcl),
+ I(DstMem | ModRM, em_rcr),
+ I(DstMem | ModRM, em_shl),
+ I(DstMem | ModRM, em_shr),
+ I(DstMem | ModRM, em_shl),
+ I(DstMem | ModRM, em_sar),
};
static const struct opcode group3[] = {
* [PATCH v2 07/13] x86/kvm/emulate: Introduce COP1SRC2
2025-04-30 11:07 [PATCH v2 00/13] objtool: Detect and warn about indirect calls in __nocfi functions Peter Zijlstra
` (5 preceding siblings ...)
2025-04-30 11:07 ` [PATCH v2 06/13] x86/kvm/emulate: Introduce COP2CL Peter Zijlstra
@ 2025-04-30 11:07 ` Peter Zijlstra
2025-04-30 11:07 ` [PATCH v2 08/13] x86/kvm/emulate: Introduce COP3WCL Peter Zijlstra
` (7 subsequent siblings)
14 siblings, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2025-04-30 11:07 UTC (permalink / raw)
To: x86
Cc: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, hpa,
seanjc, pbonzini, ardb, kees, Arnd Bergmann, gregkh, jpoimboe,
peterz, linux-hyperv, linux-kernel, kvm, linux-efi, samitolvanen,
ojeda
Replace the FASTOP1SRC2*() instructions.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
arch/x86/kvm/emulate.c | 34 ++++++++++++++++++++++++++--------
1 file changed, 26 insertions(+), 8 deletions(-)
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -317,6 +317,24 @@ static int em_##op(struct x86_emulate_ct
ON64(case 8: COP_ASM1(op##q, rax); break;) \
COP_END
+/* 1-operand, using "c" (src2) */
+#define COP1SRC2(op, name) \
+ COP_START(name) \
+ case 1: COP_ASM1(op##b, cl); break; \
+ case 2: COP_ASM1(op##w, cx); break; \
+ case 4: COP_ASM1(op##l, ecx); break; \
+ ON64(case 8: COP_ASM1(op##q, rcx); break;) \
+ COP_END
+
+/* 1-operand, using "c" (src2) with exception */
+#define COP1SRC2EX(op, name) \
+ COP_START(name) \
+ case 1: COP_ASM1_EX(op##b, cl); break; \
+ case 2: COP_ASM1_EX(op##w, cx); break; \
+ case 4: COP_ASM1_EX(op##l, ecx); break; \
+ ON64(case 8: COP_ASM1_EX(op##q, rcx); break;) \
+ COP_END
+
/* 2-operand, using "a" (dst), "d" (src) */
#define COP2(op) \
COP_START(op) \
@@ -1074,10 +1092,10 @@ COP2(cmp);
COP2(test);
COP2(xadd);
-FASTOP1SRC2(mul, mul_ex);
-FASTOP1SRC2(imul, imul_ex);
-FASTOP1SRC2EX(div, div_ex);
-FASTOP1SRC2EX(idiv, idiv_ex);
+COP1SRC2(mul, mul_ex);
+COP1SRC2(imul, imul_ex);
+COP1SRC2EX(div, div_ex);
+COP1SRC2EX(idiv, idiv_ex);
FASTOP3WCL(shld);
FASTOP3WCL(shrd);
@@ -4103,10 +4121,10 @@ static const struct opcode group3[] = {
I(DstMem | SrcImm | NoWrite, em_test),
I(DstMem | SrcNone | Lock, em_not),
I(DstMem | SrcNone | Lock, em_neg),
- F(DstXacc | Src2Mem, em_mul_ex),
- F(DstXacc | Src2Mem, em_imul_ex),
- F(DstXacc | Src2Mem, em_div_ex),
- F(DstXacc | Src2Mem, em_idiv_ex),
+ I(DstXacc | Src2Mem, em_mul_ex),
+ I(DstXacc | Src2Mem, em_imul_ex),
+ I(DstXacc | Src2Mem, em_div_ex),
+ I(DstXacc | Src2Mem, em_idiv_ex),
};
static const struct opcode group4[] = {
* [PATCH v2 08/13] x86/kvm/emulate: Introduce COP3WCL
2025-04-30 11:07 [PATCH v2 00/13] objtool: Detect and warn about indirect calls in __nocfi functions Peter Zijlstra
` (6 preceding siblings ...)
2025-04-30 11:07 ` [PATCH v2 07/13] x86/kvm/emulate: Introduce COP1SRC2 Peter Zijlstra
@ 2025-04-30 11:07 ` Peter Zijlstra
2025-04-30 11:07 ` [PATCH v2 09/13] x86/kvm/emulate: Convert em_salc() to C Peter Zijlstra
` (6 subsequent siblings)
14 siblings, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2025-04-30 11:07 UTC (permalink / raw)
To: x86
Cc: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, hpa,
seanjc, pbonzini, ardb, kees, Arnd Bergmann, gregkh, jpoimboe,
peterz, linux-hyperv, linux-kernel, kvm, linux-efi, samitolvanen,
ojeda
Replace the FASTOP3WCL instructions.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
arch/x86/kvm/emulate.c | 25 +++++++++++++++++++------
1 file changed, 19 insertions(+), 6 deletions(-)
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -302,6 +302,9 @@ static int em_##op(struct x86_emulate_ct
#define COP_ASM2(op, dst, src) \
COP_ASM(#op " %%" #src ", %%" #dst " \n\t")
+#define COP_ASM3(op, dst, src, src2) \
+ COP_ASM(#op " %%" #src2 ", %%" #src ", %%" #dst " \n\t")
+
#define COP_END \
} \
ctxt->eflags = (ctxt->eflags & ~EFLAGS_MASK) | (flags & EFLAGS_MASK); \
@@ -371,6 +374,16 @@ static int em_##op(struct x86_emulate_ct
ON64(case 8: COP_ASM2(op##q, rax, cl); break;) \
COP_END
+/* 3-operand, using "a" (dst), "d" (src) and CL (src2) */
+#define COP3WCL(op) \
+ COP_START(op) \
+ case 1: break; \
+ case 2: COP_ASM3(op##w, ax, dx, cl); break; \
+ case 4: COP_ASM3(op##l, eax, edx, cl); break; \
+ ON64(case 8: COP_ASM3(op##q, rax, rdx, cl); break;) \
+ COP_END
+
+
/*
* fastop functions have a special calling convention:
*
@@ -1097,8 +1110,8 @@ COP1SRC2(imul, imul_ex);
COP1SRC2EX(div, div_ex);
COP1SRC2EX(idiv, idiv_ex);
-FASTOP3WCL(shld);
-FASTOP3WCL(shrd);
+COP3WCL(shld);
+COP3WCL(shrd);
COP2W(imul);
@@ -4496,14 +4509,14 @@ static const struct opcode twobyte_table
I(Stack | Src2FS, em_push_sreg), I(Stack | Src2FS, em_pop_sreg),
II(ImplicitOps, em_cpuid, cpuid),
I(DstMem | SrcReg | ModRM | BitOp | NoWrite, em_bt),
- F(DstMem | SrcReg | Src2ImmByte | ModRM, em_shld),
- F(DstMem | SrcReg | Src2CL | ModRM, em_shld), N, N,
+ I(DstMem | SrcReg | Src2ImmByte | ModRM, em_shld),
+ I(DstMem | SrcReg | Src2CL | ModRM, em_shld), N, N,
/* 0xA8 - 0xAF */
I(Stack | Src2GS, em_push_sreg), I(Stack | Src2GS, em_pop_sreg),
II(EmulateOnUD | ImplicitOps, em_rsm, rsm),
I(DstMem | SrcReg | ModRM | BitOp | Lock | PageTable, em_bts),
- F(DstMem | SrcReg | Src2ImmByte | ModRM, em_shrd),
- F(DstMem | SrcReg | Src2CL | ModRM, em_shrd),
+ I(DstMem | SrcReg | Src2ImmByte | ModRM, em_shrd),
+ I(DstMem | SrcReg | Src2CL | ModRM, em_shrd),
GD(0, &group15), I(DstReg | SrcMem | ModRM, em_imul),
/* 0xB0 - 0xB7 */
I2bv(DstMem | SrcReg | ModRM | Lock | PageTable | SrcWrite, em_cmpxchg),
* [PATCH v2 09/13] x86/kvm/emulate: Convert em_salc() to C
2025-04-30 11:07 [PATCH v2 00/13] objtool: Detect and warn about indirect calls in __nocfi functions Peter Zijlstra
` (7 preceding siblings ...)
2025-04-30 11:07 ` [PATCH v2 08/13] x86/kvm/emulate: Introduce COP3WCL Peter Zijlstra
@ 2025-04-30 11:07 ` Peter Zijlstra
2025-04-30 11:07 ` [PATCH v2 10/13] x86/kvm/emulate: Remove fastops Peter Zijlstra
` (5 subsequent siblings)
14 siblings, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2025-04-30 11:07 UTC (permalink / raw)
To: x86
Cc: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, hpa,
seanjc, pbonzini, ardb, kees, Arnd Bergmann, gregkh, jpoimboe,
peterz, linux-hyperv, linux-kernel, kvm, linux-efi, samitolvanen,
ojeda
Implement the SALC (Set AL if Carry) instruction in C.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
arch/x86/kvm/emulate.c | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -529,11 +529,14 @@ static int fastop(struct x86_emulate_ctx
ON64(FOP3E(op##q, rax, rdx, cl)) \
FOP_END
-FOP_START(salc)
-FOP_FUNC(salc)
-"pushf; sbb %al, %al; popf \n\t"
-FOP_RET(salc)
-FOP_END;
+static int em_salc(struct x86_emulate_ctxt *ctxt)
+{
+ /*
+ * Set AL 0xFF if CF is set, or 0x00 when clear.
+ */
+ ctxt->dst.val = 0xFF * !!(ctxt->eflags & X86_EFLAGS_CF);
+ return X86EMUL_CONTINUE;
+}
/*
* XXX: inoutclob user must know where the argument is being expanded.
@@ -4423,7 +4426,7 @@ static const struct opcode opcode_table[
G(Src2CL | ByteOp, group2), G(Src2CL, group2),
I(DstAcc | SrcImmUByte | No64, em_aam),
I(DstAcc | SrcImmUByte | No64, em_aad),
- F(DstAcc | ByteOp | No64, em_salc),
+ I(DstAcc | ByteOp | No64, em_salc),
I(DstAcc | SrcXLat | ByteOp, em_mov),
/* 0xD8 - 0xDF */
N, E(0, &escape_d9), N, E(0, &escape_db), N, E(0, &escape_dd), N, N,
* [PATCH v2 10/13] x86/kvm/emulate: Remove fastops
2025-04-30 11:07 [PATCH v2 00/13] objtool: Detect and warn about indirect calls in __nocfi functions Peter Zijlstra
` (8 preceding siblings ...)
2025-04-30 11:07 ` [PATCH v2 09/13] x86/kvm/emulate: Convert em_salc() to C Peter Zijlstra
@ 2025-04-30 11:07 ` Peter Zijlstra
2025-04-30 11:07 ` [PATCH v2 11/13] x86,hyperv: Clean up hv_do_hypercall() Peter Zijlstra
` (4 subsequent siblings)
14 siblings, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2025-04-30 11:07 UTC (permalink / raw)
To: x86
Cc: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, hpa,
seanjc, pbonzini, ardb, kees, Arnd Bergmann, gregkh, jpoimboe,
peterz, linux-hyperv, linux-kernel, kvm, linux-efi, samitolvanen,
ojeda
No more FASTOPs, remove the remains.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
arch/x86/kvm/emulate.c | 172 -------------------------------------------------
1 file changed, 1 insertion(+), 171 deletions(-)
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -167,7 +167,6 @@
#define Unaligned ((u64)2 << 41) /* Explicitly unaligned (e.g. MOVDQU) */
#define Avx ((u64)3 << 41) /* Advanced Vector Extensions */
#define Aligned16 ((u64)4 << 41) /* Aligned to 16 byte boundary (e.g. FXSAVE) */
-#define Fastop ((u64)1 << 44) /* Use opcode::u.fastop */
#define NoWrite ((u64)1 << 45) /* No writeback */
#define SrcWrite ((u64)1 << 46) /* Write back src operand */
#define NoMod ((u64)1 << 47) /* Mod field is ignored */
@@ -203,7 +202,6 @@ struct opcode {
const struct escape *esc;
const struct instr_dual *idual;
const struct mode_dual *mdual;
- void (*fastop)(struct fastop *fake);
} u;
int (*check_perm)(struct x86_emulate_ctxt *ctxt);
};
@@ -383,152 +381,6 @@ static int em_##op(struct x86_emulate_ct
ON64(case 8: COP_ASM3(op##q, rax, rdx, cl); break;) \
COP_END
-
-/*
- * fastop functions have a special calling convention:
- *
- * dst: rax (in/out)
- * src: rdx (in/out)
- * src2: rcx (in)
- * flags: rflags (in/out)
- * ex: rsi (in:fastop pointer, out:zero if exception)
- *
- * Moreover, they are all exactly FASTOP_SIZE bytes long, so functions for
- * different operand sizes can be reached by calculation, rather than a jump
- * table (which would be bigger than the code).
- *
- * The 16 byte alignment, considering 5 bytes for the RET thunk, 3 for ENDBR
- * and 1 for the straight line speculation INT3, leaves 7 bytes for the
- * body of the function. Currently none is larger than 4.
- */
-static int fastop(struct x86_emulate_ctxt *ctxt, fastop_t fop);
-
-#define FASTOP_SIZE 16
-
-#define __FOP_FUNC(name) \
- ".align " __stringify(FASTOP_SIZE) " \n\t" \
- ".type " name ", @function \n\t" \
- name ":\n\t" \
- ASM_ENDBR \
- IBT_NOSEAL(name)
-
-#define FOP_FUNC(name) \
- __FOP_FUNC(#name)
-
-#define __FOP_RET(name) \
- "11: " ASM_RET \
- ".size " name ", .-" name "\n\t"
-
-#define FOP_RET(name) \
- __FOP_RET(#name)
-
-#define __FOP_START(op, align) \
- extern void em_##op(struct fastop *fake); \
- asm(".pushsection .text, \"ax\" \n\t" \
- ".global em_" #op " \n\t" \
- ".align " __stringify(align) " \n\t" \
- "em_" #op ":\n\t"
-
-#define FOP_START(op) __FOP_START(op, FASTOP_SIZE)
-
-#define FOP_END \
- ".popsection")
-
-#define __FOPNOP(name) \
- __FOP_FUNC(name) \
- __FOP_RET(name)
-
-#define FOPNOP() \
- __FOPNOP(__stringify(__UNIQUE_ID(nop)))
-
-#define FOP1E(op, dst) \
- __FOP_FUNC(#op "_" #dst) \
- "10: " #op " %" #dst " \n\t" \
- __FOP_RET(#op "_" #dst)
-
-#define FOP1EEX(op, dst) \
- FOP1E(op, dst) _ASM_EXTABLE_TYPE_REG(10b, 11b, EX_TYPE_ZERO_REG, %%esi)
-
-#define FASTOP1(op) \
- FOP_START(op) \
- FOP1E(op##b, al) \
- FOP1E(op##w, ax) \
- FOP1E(op##l, eax) \
- ON64(FOP1E(op##q, rax)) \
- FOP_END
-
-/* 1-operand, using src2 (for MUL/DIV r/m) */
-#define FASTOP1SRC2(op, name) \
- FOP_START(name) \
- FOP1E(op, cl) \
- FOP1E(op, cx) \
- FOP1E(op, ecx) \
- ON64(FOP1E(op, rcx)) \
- FOP_END
-
-/* 1-operand, using src2 (for MUL/DIV r/m), with exceptions */
-#define FASTOP1SRC2EX(op, name) \
- FOP_START(name) \
- FOP1EEX(op, cl) \
- FOP1EEX(op, cx) \
- FOP1EEX(op, ecx) \
- ON64(FOP1EEX(op, rcx)) \
- FOP_END
-
-#define FOP2E(op, dst, src) \
- __FOP_FUNC(#op "_" #dst "_" #src) \
- #op " %" #src ", %" #dst " \n\t" \
- __FOP_RET(#op "_" #dst "_" #src)
-
-#define FASTOP2(op) \
- FOP_START(op) \
- FOP2E(op##b, al, dl) \
- FOP2E(op##w, ax, dx) \
- FOP2E(op##l, eax, edx) \
- ON64(FOP2E(op##q, rax, rdx)) \
- FOP_END
-
-/* 2 operand, word only */
-#define FASTOP2W(op) \
- FOP_START(op) \
- FOPNOP() \
- FOP2E(op##w, ax, dx) \
- FOP2E(op##l, eax, edx) \
- ON64(FOP2E(op##q, rax, rdx)) \
- FOP_END
-
-/* 2 operand, src is CL */
-#define FASTOP2CL(op) \
- FOP_START(op) \
- FOP2E(op##b, al, cl) \
- FOP2E(op##w, ax, cl) \
- FOP2E(op##l, eax, cl) \
- ON64(FOP2E(op##q, rax, cl)) \
- FOP_END
-
-/* 2 operand, src and dest are reversed */
-#define FASTOP2R(op, name) \
- FOP_START(name) \
- FOP2E(op##b, dl, al) \
- FOP2E(op##w, dx, ax) \
- FOP2E(op##l, edx, eax) \
- ON64(FOP2E(op##q, rdx, rax)) \
- FOP_END
-
-#define FOP3E(op, dst, src, src2) \
- __FOP_FUNC(#op "_" #dst "_" #src "_" #src2) \
- #op " %" #src2 ", %" #src ", %" #dst " \n\t"\
- __FOP_RET(#op "_" #dst "_" #src "_" #src2)
-
-/* 3-operand, word-only, src2=cl */
-#define FASTOP3WCL(op) \
- FOP_START(op) \
- FOPNOP() \
- FOP3E(op##w, ax, dx, cl) \
- FOP3E(op##l, eax, edx, cl) \
- ON64(FOP3E(op##q, rax, rdx, cl)) \
- FOP_END
-
static int em_salc(struct x86_emulate_ctxt *ctxt)
{
/*
@@ -4052,7 +3904,6 @@ static int check_perm_out(struct x86_emu
#define MD(_f, _m) { .flags = ((_f) | ModeDual), .u.mdual = (_m) }
#define E(_f, _e) { .flags = ((_f) | Escape | ModRM), .u.esc = (_e) }
#define I(_f, _e) { .flags = (_f), .u.execute = (_e) }
-#define F(_f, _e) { .flags = (_f) | Fastop, .u.fastop = (_e) }
#define II(_f, _e, _i) \
{ .flags = (_f)|Intercept, .u.execute = (_e), .intercept = x86_intercept_##_i }
#define IIP(_f, _e, _i, _p) \
@@ -5158,24 +5009,6 @@ static void fetch_possible_mmx_operand(s
kvm_read_mmx_reg(op->addr.mm, &op->mm_val);
}
-static int fastop(struct x86_emulate_ctxt *ctxt, fastop_t fop)
-{
- ulong flags = (ctxt->eflags & EFLAGS_MASK) | X86_EFLAGS_IF;
-
- if (!(ctxt->d & ByteOp))
- fop += __ffs(ctxt->dst.bytes) * FASTOP_SIZE;
-
- asm("push %[flags]; popf; " CALL_NOSPEC " ; pushf; pop %[flags]\n"
- : "+a"(ctxt->dst.val), "+d"(ctxt->src.val), [flags]"+D"(flags),
- [thunk_target]"+S"(fop), ASM_CALL_CONSTRAINT
- : "c"(ctxt->src2.val));
-
- ctxt->eflags = (ctxt->eflags & ~EFLAGS_MASK) | (flags & EFLAGS_MASK);
- if (!fop) /* exception is returned in fop variable */
- return emulate_de(ctxt);
- return X86EMUL_CONTINUE;
-}
-
void init_decode_cache(struct x86_emulate_ctxt *ctxt)
{
/* Clear fields that are set conditionally but read without a guard. */
@@ -5340,10 +5173,7 @@ int x86_emulate_insn(struct x86_emulate_
ctxt->eflags &= ~X86_EFLAGS_RF;
if (ctxt->execute) {
- if (ctxt->d & Fastop)
- rc = fastop(ctxt, ctxt->fop);
- else
- rc = ctxt->execute(ctxt);
+ rc = ctxt->execute(ctxt);
if (rc != X86EMUL_CONTINUE)
goto done;
goto writeback;
* [PATCH v2 11/13] x86,hyperv: Clean up hv_do_hypercall()
2025-04-30 11:07 [PATCH v2 00/13] objtool: Detect and warn about indirect calls in __nocfi functions Peter Zijlstra
` (9 preceding siblings ...)
2025-04-30 11:07 ` [PATCH v2 10/13] x86/kvm/emulate: Remove fastops Peter Zijlstra
@ 2025-04-30 11:07 ` Peter Zijlstra
2025-05-01 2:36 ` Michael Kelley
2025-04-30 11:07 ` [PATCH v2 12/13] x86_64,hyperv: Use direct call to hypercall-page Peter Zijlstra
` (3 subsequent siblings)
14 siblings, 1 reply; 52+ messages in thread
From: Peter Zijlstra @ 2025-04-30 11:07 UTC (permalink / raw)
To: x86
Cc: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, hpa,
seanjc, pbonzini, ardb, kees, Arnd Bergmann, gregkh, jpoimboe,
peterz, linux-hyperv, linux-kernel, kvm, linux-efi, samitolvanen,
ojeda
What used to be a simple few instructions has turned into a giant mess
(for x86_64). Not only does it use static_branch wrong, it mixes it
with dynamic branches for no apparent reason.
Notably it uses static_branch through an out-of-line function call,
which completely defeats the purpose, since instead of a simple
JMP/NOP site, you get a CALL+RET+TEST+Jcc sequence in return, which is
absolutely idiotic.
Add to that a dynamic test of hyperv_paravisor_present, something
which is set once and never changed.
Replace all this idiocy with a single direct function call to the
right hypercall variant.
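In sketch form, with the isolation-type checks simplified to plain names
(the real conditions are in the diff below):

	DEFINE_STATIC_CALL(hv_hypercall, hv_std_hypercall);

	/* boot, once: */
	if (snp && !ms_hyperv.paravisor_present)
		static_call_update(hv_hypercall, hv_snp_hypercall);
	else if (tdx && !ms_hyperv.paravisor_present)
		static_call_update(hv_hypercall, hv_tdx_hypercall);

	/* every hypercall thereafter is a single direct call: */
	return static_call_mod(hv_hypercall)(control, input_address,
					     output_address);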
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
arch/x86/hyperv/hv_init.c | 20 +++++
arch/x86/hyperv/ivm.c | 15 ++++
arch/x86/include/asm/mshyperv.h | 137 +++++++++++-----------------------------
arch/x86/kernel/cpu/mshyperv.c | 19 +++--
4 files changed, 89 insertions(+), 102 deletions(-)
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -35,7 +35,27 @@
#include <linux/highmem.h>
void *hv_hypercall_pg;
+
+#ifdef CONFIG_X86_64
+u64 hv_std_hypercall(u64 control, u64 param1, u64 param2)
+{
+ u64 hv_status;
+
+ if (!hv_hypercall_pg)
+ return U64_MAX;
+
+ register u64 __r8 asm("r8") = param2;
+ asm volatile (CALL_NOSPEC
+ : "=a" (hv_status), ASM_CALL_CONSTRAINT,
+ "+c" (control), "+d" (param1), "+r" (__r8)
+ : THUNK_TARGET(hv_hypercall_pg)
+ : "cc", "memory", "r9", "r10", "r11");
+
+ return hv_status;
+}
+#else
EXPORT_SYMBOL_GPL(hv_hypercall_pg);
+#endif
union hv_ghcb * __percpu *hv_ghcb_pg;
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -376,9 +376,23 @@ int hv_snp_boot_ap(u32 cpu, unsigned lon
return ret;
}
+u64 hv_snp_hypercall(u64 control, u64 param1, u64 param2)
+{
+ u64 hv_status;
+
+ register u64 __r8 asm("r8") = param2;
+ asm volatile("vmmcall"
+ : "=a" (hv_status), ASM_CALL_CONSTRAINT,
+ "+c" (control), "+d" (param1), "+r" (__r8)
+ : : "cc", "memory", "r9", "r10", "r11");
+
+ return hv_status;
+}
+
#else
static inline void hv_ghcb_msr_write(u64 msr, u64 value) {}
static inline void hv_ghcb_msr_read(u64 msr, u64 *value) {}
+u64 hv_snp_hypercall(u64 control, u64 param1, u64 param2) { return U64_MAX; }
#endif /* CONFIG_AMD_MEM_ENCRYPT */
#ifdef CONFIG_INTEL_TDX_GUEST
@@ -428,6 +442,7 @@ u64 hv_tdx_hypercall(u64 control, u64 pa
#else
static inline void hv_tdx_msr_write(u64 msr, u64 value) {}
static inline void hv_tdx_msr_read(u64 msr, u64 *value) {}
+u64 hv_tdx_hypercall(u64 control, u64 param1, u64 param2) { return U64_MAX; }
#endif /* CONFIG_INTEL_TDX_GUEST */
#if defined(CONFIG_AMD_MEM_ENCRYPT) || defined(CONFIG_INTEL_TDX_GUEST)
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -6,6 +6,7 @@
#include <linux/nmi.h>
#include <linux/msi.h>
#include <linux/io.h>
+#include <linux/static_call.h>
#include <asm/nospec-branch.h>
#include <asm/paravirt.h>
#include <hyperv/hvhdk.h>
@@ -38,16 +39,21 @@ static inline unsigned char hv_get_nmi_r
return 0;
}
-#if IS_ENABLED(CONFIG_HYPERV)
-extern bool hyperv_paravisor_present;
+extern u64 hv_tdx_hypercall(u64 control, u64 param1, u64 param2);
+extern u64 hv_snp_hypercall(u64 control, u64 param1, u64 param2);
+extern u64 hv_std_hypercall(u64 control, u64 param1, u64 param2);
+#if IS_ENABLED(CONFIG_HYPERV)
extern void *hv_hypercall_pg;
extern union hv_ghcb * __percpu *hv_ghcb_pg;
bool hv_isolation_type_snp(void);
bool hv_isolation_type_tdx(void);
-u64 hv_tdx_hypercall(u64 control, u64 param1, u64 param2);
+
+#ifdef CONFIG_X86_64
+DECLARE_STATIC_CALL(hv_hypercall, hv_std_hypercall);
+#endif
/*
* DEFAULT INIT GPAT and SEGMENT LIMIT value in struct VMSA
@@ -64,37 +70,15 @@ static inline u64 hv_do_hypercall(u64 co
{
u64 input_address = input ? virt_to_phys(input) : 0;
u64 output_address = output ? virt_to_phys(output) : 0;
- u64 hv_status;
#ifdef CONFIG_X86_64
- if (hv_isolation_type_tdx() && !hyperv_paravisor_present)
- return hv_tdx_hypercall(control, input_address, output_address);
-
- if (hv_isolation_type_snp() && !hyperv_paravisor_present) {
- __asm__ __volatile__("mov %[output_address], %%r8\n"
- "vmmcall"
- : "=a" (hv_status), ASM_CALL_CONSTRAINT,
- "+c" (control), "+d" (input_address)
- : [output_address] "r" (output_address)
- : "cc", "memory", "r8", "r9", "r10", "r11");
- return hv_status;
- }
-
- if (!hv_hypercall_pg)
- return U64_MAX;
-
- __asm__ __volatile__("mov %[output_address], %%r8\n"
- CALL_NOSPEC
- : "=a" (hv_status), ASM_CALL_CONSTRAINT,
- "+c" (control), "+d" (input_address)
- : [output_address] "r" (output_address),
- THUNK_TARGET(hv_hypercall_pg)
- : "cc", "memory", "r8", "r9", "r10", "r11");
+ return static_call_mod(hv_hypercall)(control, input_address, output_address);
#else
u32 input_address_hi = upper_32_bits(input_address);
u32 input_address_lo = lower_32_bits(input_address);
u32 output_address_hi = upper_32_bits(output_address);
u32 output_address_lo = lower_32_bits(output_address);
+ u64 hv_status;
if (!hv_hypercall_pg)
return U64_MAX;
@@ -107,8 +91,8 @@ static inline u64 hv_do_hypercall(u64 co
"D"(output_address_hi), "S"(output_address_lo),
THUNK_TARGET(hv_hypercall_pg)
: "cc", "memory");
-#endif /* !x86_64 */
return hv_status;
+#endif /* !x86_64 */
}
/* Hypercall to the L0 hypervisor */
@@ -120,41 +104,23 @@ static inline u64 hv_do_nested_hypercall
/* Fast hypercall with 8 bytes of input and no output */
static inline u64 _hv_do_fast_hypercall8(u64 control, u64 input1)
{
- u64 hv_status;
-
#ifdef CONFIG_X86_64
- if (hv_isolation_type_tdx() && !hyperv_paravisor_present)
- return hv_tdx_hypercall(control, input1, 0);
-
- if (hv_isolation_type_snp() && !hyperv_paravisor_present) {
- __asm__ __volatile__(
- "vmmcall"
- : "=a" (hv_status), ASM_CALL_CONSTRAINT,
- "+c" (control), "+d" (input1)
- :: "cc", "r8", "r9", "r10", "r11");
- } else {
- __asm__ __volatile__(CALL_NOSPEC
- : "=a" (hv_status), ASM_CALL_CONSTRAINT,
- "+c" (control), "+d" (input1)
- : THUNK_TARGET(hv_hypercall_pg)
- : "cc", "r8", "r9", "r10", "r11");
- }
+ return static_call_mod(hv_hypercall)(control, input1, 0);
#else
- {
- u32 input1_hi = upper_32_bits(input1);
- u32 input1_lo = lower_32_bits(input1);
-
- __asm__ __volatile__ (CALL_NOSPEC
- : "=A"(hv_status),
- "+c"(input1_lo),
- ASM_CALL_CONSTRAINT
- : "A" (control),
- "b" (input1_hi),
- THUNK_TARGET(hv_hypercall_pg)
- : "cc", "edi", "esi");
- }
-#endif
+ u32 input1_hi = upper_32_bits(input1);
+ u32 input1_lo = lower_32_bits(input1);
+ u64 hv_status;
+
+ __asm__ __volatile__ (CALL_NOSPEC
+ : "=A"(hv_status),
+ "+c"(input1_lo),
+ ASM_CALL_CONSTRAINT
+ : "A" (control),
+ "b" (input1_hi),
+ THUNK_TARGET(hv_hypercall_pg)
+ : "cc", "edi", "esi");
return hv_status;
+#endif
}
static inline u64 hv_do_fast_hypercall8(u16 code, u64 input1)
@@ -174,45 +140,24 @@ static inline u64 hv_do_fast_nested_hype
/* Fast hypercall with 16 bytes of input */
static inline u64 _hv_do_fast_hypercall16(u64 control, u64 input1, u64 input2)
{
- u64 hv_status;
-
#ifdef CONFIG_X86_64
- if (hv_isolation_type_tdx() && !hyperv_paravisor_present)
- return hv_tdx_hypercall(control, input1, input2);
-
- if (hv_isolation_type_snp() && !hyperv_paravisor_present) {
- __asm__ __volatile__("mov %[input2], %%r8\n"
- "vmmcall"
- : "=a" (hv_status), ASM_CALL_CONSTRAINT,
- "+c" (control), "+d" (input1)
- : [input2] "r" (input2)
- : "cc", "r8", "r9", "r10", "r11");
- } else {
- __asm__ __volatile__("mov %[input2], %%r8\n"
- CALL_NOSPEC
- : "=a" (hv_status), ASM_CALL_CONSTRAINT,
- "+c" (control), "+d" (input1)
- : [input2] "r" (input2),
- THUNK_TARGET(hv_hypercall_pg)
- : "cc", "r8", "r9", "r10", "r11");
- }
+ return static_call_mod(hv_hypercall)(control, input1, input2);
#else
- {
- u32 input1_hi = upper_32_bits(input1);
- u32 input1_lo = lower_32_bits(input1);
- u32 input2_hi = upper_32_bits(input2);
- u32 input2_lo = lower_32_bits(input2);
-
- __asm__ __volatile__ (CALL_NOSPEC
- : "=A"(hv_status),
- "+c"(input1_lo), ASM_CALL_CONSTRAINT
- : "A" (control), "b" (input1_hi),
- "D"(input2_hi), "S"(input2_lo),
- THUNK_TARGET(hv_hypercall_pg)
- : "cc");
- }
-#endif
+ u32 input1_hi = upper_32_bits(input1);
+ u32 input1_lo = lower_32_bits(input1);
+ u32 input2_hi = upper_32_bits(input2);
+ u32 input2_lo = lower_32_bits(input2);
+ u64 hv_status;
+
+ __asm__ __volatile__ (CALL_NOSPEC
+ : "=A"(hv_status),
+ "+c"(input1_lo), ASM_CALL_CONSTRAINT
+ : "A" (control), "b" (input1_hi),
+ "D"(input2_hi), "S"(input2_lo),
+ THUNK_TARGET(hv_hypercall_pg)
+ : "cc");
return hv_status;
+#endif
}
static inline u64 hv_do_fast_hypercall16(u16 code, u64 input1, u64 input2)
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -37,10 +37,6 @@
bool hv_nested;
struct ms_hyperv_info ms_hyperv;
-/* Used in modules via hv_do_hypercall(): see arch/x86/include/asm/mshyperv.h */
-bool hyperv_paravisor_present __ro_after_init;
-EXPORT_SYMBOL_GPL(hyperv_paravisor_present);
-
#if IS_ENABLED(CONFIG_HYPERV)
static inline unsigned int hv_get_nested_msr(unsigned int reg)
{
@@ -287,8 +283,18 @@ static void __init x86_setup_ops_for_tsc
old_restore_sched_clock_state = x86_platform.restore_sched_clock_state;
x86_platform.restore_sched_clock_state = hv_restore_sched_clock_state;
}
+
+#ifdef CONFIG_X86_64
+DEFINE_STATIC_CALL(hv_hypercall, hv_std_hypercall);
+EXPORT_STATIC_CALL_TRAMP_GPL(hv_hypercall);
+#define hypercall_update(hc) static_call_update(hv_hypercall, hc)
+#endif
#endif /* CONFIG_HYPERV */
+#ifndef hypercall_update
+#define hypercall_update(hc) (void)hc
+#endif
+
static uint32_t __init ms_hyperv_platform(void)
{
u32 eax;
@@ -483,14 +489,14 @@ static void __init ms_hyperv_init_platfo
ms_hyperv.shared_gpa_boundary =
BIT_ULL(ms_hyperv.shared_gpa_boundary_bits);
- hyperv_paravisor_present = !!ms_hyperv.paravisor_present;
-
pr_info("Hyper-V: Isolation Config: Group A 0x%x, Group B 0x%x\n",
ms_hyperv.isolation_config_a, ms_hyperv.isolation_config_b);
if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP) {
static_branch_enable(&isolation_type_snp);
+ if (!ms_hyperv.paravisor_present)
+ hypercall_update(hv_snp_hypercall);
} else if (hv_get_isolation_type() == HV_ISOLATION_TYPE_TDX) {
static_branch_enable(&isolation_type_tdx);
@@ -498,6 +504,7 @@ static void __init ms_hyperv_init_platfo
ms_hyperv.hints &= ~HV_X64_APIC_ACCESS_RECOMMENDED;
if (!ms_hyperv.paravisor_present) {
+ hypercall_update(hv_tdx_hypercall);
/*
* Mark the Hyper-V TSC page feature as disabled
* in a TDX VM without paravisor so that the
* [PATCH v2 12/13] x86_64,hyperv: Use direct call to hypercall-page
2025-04-30 11:07 [PATCH v2 00/13] objtool: Detect and warn about indirect calls in __nocfi functions Peter Zijlstra
` (10 preceding siblings ...)
2025-04-30 11:07 ` [PATCH v2 11/13] x86,hyperv: Clean up hv_do_hypercall() Peter Zijlstra
@ 2025-04-30 11:07 ` Peter Zijlstra
2025-05-01 2:36 ` Michael Kelley
2025-04-30 11:07 ` [PATCH v2 13/13] objtool: Validate kCFI calls Peter Zijlstra
` (2 subsequent siblings)
14 siblings, 1 reply; 52+ messages in thread
From: Peter Zijlstra @ 2025-04-30 11:07 UTC (permalink / raw)
To: x86
Cc: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, hpa,
seanjc, pbonzini, ardb, kees, Arnd Bergmann, gregkh, jpoimboe,
peterz, linux-hyperv, linux-kernel, kvm, linux-efi, samitolvanen,
ojeda
Instead of using an indirect call to the hypercall page, use a direct
call. This avoids all CFI problems, including the one where
the hypercall page doesn't have IBT on.
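The core of the trick, lifted from the diff below: allocate the hypercall
page in the module VA range so it is within rel32 direct-call reach of
kernel text, then retarget a static_call trampoline at the page itself
(the '...' elides the remaining, unchanged arguments):

	hv_hypercall_pg = __vmalloc_node_range(PAGE_SIZE, 1, MODULES_VADDR,
					       MODULES_END, ...);
	static_call_update(__hv_hypercall, (hv_hypercall_f)hv_hypercall_pg);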
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
arch/x86/hyperv/hv_init.c | 60 +++++++++++++++++++++++-----------------------
1 file changed, 30 insertions(+), 30 deletions(-)
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -37,23 +37,41 @@
void *hv_hypercall_pg;
#ifdef CONFIG_X86_64
+static u64 __hv_hyperfail(u64 control, u64 param1, u64 param2)
+{
+ return U64_MAX;
+}
+
+DEFINE_STATIC_CALL(__hv_hypercall, __hv_hyperfail);
+
u64 hv_std_hypercall(u64 control, u64 param1, u64 param2)
{
u64 hv_status;
- if (!hv_hypercall_pg)
- return U64_MAX;
-
register u64 __r8 asm("r8") = param2;
- asm volatile (CALL_NOSPEC
+ asm volatile ("call " STATIC_CALL_TRAMP_STR(__hv_hypercall)
: "=a" (hv_status), ASM_CALL_CONSTRAINT,
"+c" (control), "+d" (param1), "+r" (__r8)
- : THUNK_TARGET(hv_hypercall_pg)
- : "cc", "memory", "r9", "r10", "r11");
+ : : "cc", "memory", "r9", "r10", "r11");
return hv_status;
}
+
+typedef u64 (*hv_hypercall_f)(u64 control, u64 param1, u64 param2);
+
+static inline void hv_set_hypercall_pg(void *ptr)
+{
+ hv_hypercall_pg = ptr;
+
+ if (!ptr)
+ ptr = &__hv_hyperfail;
+ static_call_update(__hv_hypercall, (hv_hypercall_f)ptr);
+}
#else
+static inline void hv_set_hypercall_pg(void *ptr)
+{
+ hv_hypercall_pg = ptr;
+}
EXPORT_SYMBOL_GPL(hv_hypercall_pg);
#endif
@@ -348,7 +366,7 @@ static int hv_suspend(void)
* pointer is restored on resume.
*/
hv_hypercall_pg_saved = hv_hypercall_pg;
- hv_hypercall_pg = NULL;
+ hv_set_hypercall_pg(NULL);
/* Disable the hypercall page in the hypervisor */
rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
@@ -374,7 +392,7 @@ static void hv_resume(void)
vmalloc_to_pfn(hv_hypercall_pg_saved);
wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
- hv_hypercall_pg = hv_hypercall_pg_saved;
+ hv_set_hypercall_pg(hv_hypercall_pg_saved);
hv_hypercall_pg_saved = NULL;
/*
@@ -528,8 +546,8 @@ void __init hyperv_init(void)
if (hv_isolation_type_tdx() && !ms_hyperv.paravisor_present)
goto skip_hypercall_pg_init;
- hv_hypercall_pg = __vmalloc_node_range(PAGE_SIZE, 1, VMALLOC_START,
- VMALLOC_END, GFP_KERNEL, PAGE_KERNEL_ROX,
+ hv_hypercall_pg = __vmalloc_node_range(PAGE_SIZE, 1, MODULES_VADDR,
+ MODULES_END, GFP_KERNEL, PAGE_KERNEL_ROX,
VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,
__builtin_return_address(0));
if (hv_hypercall_pg == NULL)
@@ -567,27 +585,9 @@ void __init hyperv_init(void)
wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
}
-skip_hypercall_pg_init:
- /*
- * Some versions of Hyper-V that provide IBT in guest VMs have a bug
- * in that there's no ENDBR64 instruction at the entry to the
- * hypercall page. Because hypercalls are invoked via an indirect call
- * to the hypercall page, all hypercall attempts fail when IBT is
- * enabled, and Linux panics. For such buggy versions, disable IBT.
- *
- * Fixed versions of Hyper-V always provide ENDBR64 on the hypercall
- * page, so if future Linux kernel versions enable IBT for 32-bit
- * builds, additional hypercall page hackery will be required here
- * to provide an ENDBR32.
- */
-#ifdef CONFIG_X86_KERNEL_IBT
- if (cpu_feature_enabled(X86_FEATURE_IBT) &&
- *(u32 *)hv_hypercall_pg != gen_endbr()) {
- setup_clear_cpu_cap(X86_FEATURE_IBT);
- pr_warn("Disabling IBT because of Hyper-V bug\n");
- }
-#endif
+ hv_set_hypercall_pg(hv_hypercall_pg);
+skip_hypercall_pg_init:
/*
* hyperv_init() is called before LAPIC is initialized: see
* apic_intr_mode_init() -> x86_platform.apic_post_init() and
^ permalink raw reply [flat|nested] 52+ messages in thread
* [PATCH v2 13/13] objtool: Validate kCFI calls
2025-04-30 11:07 [PATCH v2 00/13] objtool: Detect and warn about indirect calls in __nocfi functions Peter Zijlstra
` (11 preceding siblings ...)
2025-04-30 11:07 ` [PATCH v2 12/13] x86_64,hyperv: Use direct call to hypercall-page Peter Zijlstra
@ 2025-04-30 11:07 ` Peter Zijlstra
2025-04-30 15:59 ` Josh Poimboeuf
2025-04-30 14:24 ` [PATCH v2 00/13] objtool: Detect and warn about indirect calls in __nocfi functions H. Peter Anvin
2025-05-01 18:33 ` Paolo Bonzini
14 siblings, 1 reply; 52+ messages in thread
From: Peter Zijlstra @ 2025-04-30 11:07 UTC (permalink / raw)
To: x86
Cc: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, hpa,
seanjc, pbonzini, ardb, kees, Arnd Bergmann, gregkh, jpoimboe,
peterz, linux-hyperv, linux-kernel, kvm, linux-efi, samitolvanen,
ojeda
Validate that all indirect calls adhere to kCFI rules. Notably, doing a
nocfi indirect call to a cfi function is broken.
Apparently some Rust 'core' code violates this and explodes when run
with FineIBT.
All the ANNOTATE_NOCFI_SYM sites are prime targets for attackers.
- runtime EFI is especially heinous because it also needs to disable
IBT. Basically, calling unknown code without CFI protection at
runtime is a massive security issue.
- Kexec image handover; if you can exploit this, you get to keep it :-)
- KVM, for the interrupt injection calling IDT gates directly.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
arch/x86/kernel/machine_kexec_64.c | 4 +++
arch/x86/kvm/vmx/vmenter.S | 5 ++++
arch/x86/platform/efi/efi_stub_64.S | 4 +++
drivers/misc/lkdtm/perms.c | 5 ++++
include/linux/objtool.h | 10 ++++++++
include/linux/objtool_types.h | 1
tools/include/linux/objtool_types.h | 1
tools/objtool/check.c | 41 ++++++++++++++++++++++++++++++++++++
tools/objtool/include/objtool/elf.h | 1
9 files changed, 72 insertions(+)
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -421,6 +421,10 @@ void __nocfi machine_kexec(struct kimage
__ftrace_enabled_restore(save_ftrace_enabled);
}
+/*
+ * Handover to the next kernel, no CFI concern.
+ */
+ANNOTATE_NOCFI_SYM(machine_kexec);
/* arch-dependent functionality related to kexec file-based syscall */
--- a/arch/x86/kvm/vmx/vmenter.S
+++ b/arch/x86/kvm/vmx/vmenter.S
@@ -363,5 +363,10 @@ SYM_FUNC_END(vmread_error_trampoline)
.section .text, "ax"
SYM_FUNC_START(vmx_do_interrupt_irqoff)
+ /*
+ * Calling an IDT gate directly; annotate away the CFI concern for now.
+ * Should be fixed if possible.
+ */
+ ANNOTATE_NOCFI_SYM
VMX_DO_EVENT_IRQOFF CALL_NOSPEC _ASM_ARG1
SYM_FUNC_END(vmx_do_interrupt_irqoff)
--- a/arch/x86/platform/efi/efi_stub_64.S
+++ b/arch/x86/platform/efi/efi_stub_64.S
@@ -11,6 +11,10 @@
#include <asm/nospec-branch.h>
SYM_FUNC_START(__efi_call)
+ /*
+ * The EFI code doesn't have any CFI, annotate away the CFI violation.
+ */
+ ANNOTATE_NOCFI_SYM
pushq %rbp
movq %rsp, %rbp
and $~0xf, %rsp
--- a/drivers/misc/lkdtm/perms.c
+++ b/drivers/misc/lkdtm/perms.c
@@ -9,6 +9,7 @@
#include <linux/vmalloc.h>
#include <linux/mman.h>
#include <linux/uaccess.h>
+#include <linux/objtool.h>
#include <asm/cacheflush.h>
#include <asm/sections.h>
@@ -86,6 +87,10 @@ static noinline __nocfi void execute_loc
func();
pr_err("FAIL: func returned\n");
}
+/*
+ * Explicitly doing the wrong thing for testing.
+ */
+ANNOTATE_NOCFI_SYM(execute_location);
static void execute_user_location(void *dst)
{
--- a/include/linux/objtool.h
+++ b/include/linux/objtool.h
@@ -184,6 +184,15 @@
* WARN using UD2.
*/
#define ANNOTATE_REACHABLE(label) __ASM_ANNOTATE(label, ANNOTYPE_REACHABLE)
+/*
+ * This should not be used; it annotates away CFI violations. There are a few
+ * valid use cases like kexec handover to the next kernel image, and there is
+ * no security concern there.
+ *
+ * There are also a few real issues annotated away, like EFI because we can't
+ * control the EFI code.
+ */
+#define ANNOTATE_NOCFI_SYM(sym) asm(__ASM_ANNOTATE(sym, ANNOTYPE_NOCFI))
#else
#define ANNOTATE_NOENDBR ANNOTATE type=ANNOTYPE_NOENDBR
@@ -194,6 +203,7 @@
#define ANNOTATE_INTRA_FUNCTION_CALL ANNOTATE type=ANNOTYPE_INTRA_FUNCTION_CALL
#define ANNOTATE_UNRET_BEGIN ANNOTATE type=ANNOTYPE_UNRET_BEGIN
#define ANNOTATE_REACHABLE ANNOTATE type=ANNOTYPE_REACHABLE
+#define ANNOTATE_NOCFI_SYM ANNOTATE type=ANNOTYPE_NOCFI
#endif
#if defined(CONFIG_NOINSTR_VALIDATION) && \
--- a/include/linux/objtool_types.h
+++ b/include/linux/objtool_types.h
@@ -65,5 +65,6 @@ struct unwind_hint {
#define ANNOTYPE_IGNORE_ALTS 6
#define ANNOTYPE_INTRA_FUNCTION_CALL 7
#define ANNOTYPE_REACHABLE 8
+#define ANNOTYPE_NOCFI 9
#endif /* _LINUX_OBJTOOL_TYPES_H */
--- a/tools/include/linux/objtool_types.h
+++ b/tools/include/linux/objtool_types.h
@@ -65,5 +65,6 @@ struct unwind_hint {
#define ANNOTYPE_IGNORE_ALTS 6
#define ANNOTYPE_INTRA_FUNCTION_CALL 7
#define ANNOTYPE_REACHABLE 8
+#define ANNOTYPE_NOCFI 9
#endif /* _LINUX_OBJTOOL_TYPES_H */
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -2388,6 +2388,8 @@ static int __annotate_ifc(struct objtool
static int __annotate_late(struct objtool_file *file, int type, struct instruction *insn)
{
+ struct symbol *sym;
+
switch (type) {
case ANNOTYPE_NOENDBR:
/* early */
@@ -2429,6 +2431,15 @@ static int __annotate_late(struct objtoo
insn->dead_end = false;
break;
+ case ANNOTYPE_NOCFI:
+ sym = insn->sym;
+ if (!sym) {
+ ERROR_INSN(insn, "dodgy NOCFI annotation");
+ break;
+ }
+ insn->sym->nocfi = 1;
+ break;
+
default:
ERROR_INSN(insn, "Unknown annotation type: %d", type);
return -1;
@@ -3998,6 +4009,36 @@ static int validate_retpoline(struct obj
warnings++;
}
+ if (!opts.cfi)
+ return warnings;
+
+ /*
+ * kCFI call sites look like:
+ *
+ * movl $(-0x12345678), %r10d
+ * addl -4(%r11), %r10d
+ * jz 1f
+ * ud2
+ * 1: cs call __x86_indirect_thunk_r11
+ *
+ * Verify all indirect calls are kCFI adorned by checking for the
+ * UD2. Notably, doing __nocfi calls to regular (cfi) functions is
+ * broken.
+ */
+ list_for_each_entry(insn, &file->retpoline_call_list, call_node) {
+ struct symbol *sym = insn->sym;
+
+ if (sym && sym->type == STT_FUNC && !sym->nocfi) {
+ struct instruction *prev =
+ prev_insn_same_sym(file, insn);
+
+ if (!prev || prev->type != INSN_BUG) {
+ WARN_INSN(insn, "no-cfi indirect call!");
+ warnings++;
+ }
+ }
+ }
+
return warnings;
}
--- a/tools/objtool/include/objtool/elf.h
+++ b/tools/objtool/include/objtool/elf.h
@@ -70,6 +70,7 @@ struct symbol {
u8 local_label : 1;
u8 frame_pointer : 1;
u8 ignore : 1;
+ u8 nocfi : 1;
struct list_head pv_target;
struct reloc *relocs;
};
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH v2 00/13] objtool: Detect and warn about indirect calls in __nocfi functions
2025-04-30 11:07 [PATCH v2 00/13] objtool: Detect and warn about indirect calls in __nocfi functions Peter Zijlstra
` (12 preceding siblings ...)
2025-04-30 11:07 ` [PATCH v2 13/13] objtool: Validate kCFI calls Peter Zijlstra
@ 2025-04-30 14:24 ` H. Peter Anvin
2025-04-30 19:06 ` Peter Zijlstra
2025-05-01 18:33 ` Paolo Bonzini
14 siblings, 1 reply; 52+ messages in thread
From: H. Peter Anvin @ 2025-04-30 14:24 UTC (permalink / raw)
To: Peter Zijlstra, x86
Cc: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen,
seanjc, pbonzini, ardb, kees, Arnd Bergmann, gregkh, jpoimboe,
peterz, linux-hyperv, linux-kernel, kvm, linux-efi, samitolvanen,
ojeda, xin
On April 30, 2025 4:07:34 AM PDT, Peter Zijlstra <peterz@infradead.org> wrote:
>Hi!
>
>On kCFI (CONFIG_CFI_CLANG=y) builds all indirect calls should have the CFI
>check on (with very few exceptions). Not having the CFI checks undermines the
>protection provided by CFI and will make these sites candidates for people
>wanting to steal your cookies.
>
>Specifically the ABI changes are so that doing indirect calls without the CFI
>magic, to a CFI adorned function is not compatible (although it happens to work
>for some setups, it very much does not for FineIBT).
>
>Rust people tripped over this the other day, since their 'core' happened to
>have some no_sanitize(kcfi) bits in, which promptly exploded when ran with
>FineIBT on.
>
>Since this is very much not a supported model -- on purpose, have objtool
>detect and warn about such constructs.
>
>This effort [1] found all existing [2] non-cfi indirect calls in the kernel.
>
>Notably the KVM fastop emulation stuff -- which I've completely rewritten for
>this version -- the generated code doesn't look horrific, but is slightly more
>verbose. I'm running on the assumption that instruction emulation is not super
>performance critical these days of zero VM-exit VMs etc.
>
>KVM has another; the VMX interrupt injection stuff calls the IDT handler
>directly. Is there an alternative? Can we keep a table of Linux functions
>slighly higher up the call stack (asm_\cfunc ?) and add CFI to those?
>
>HyperV hypercall page stuff, which I've previously suggested use direct calls,
>and which I've now converted (after getting properly annoyed with that code).
>
>Also available at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git x86/core
>
>Changes since v1:
>
> - complete rewrite of the fastop stuff
> - HyperV tweaks (Michael)
> - objtool changes (Josh)
>
>
>[1] https://lkml.kernel.org/r/20250410154556.GB9003@noisy.programming.kicks-ass.net
>[2] https://lkml.kernel.org/r/20250410194334.GA3248459@google.com
>
We do have a table of handlers higher up in the stack in the form of the dispatch tables for FRED. They don't in general even need the assembly entry stubs, either.
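For illustration, dispatch via such a table looks roughly like this (a sketch modeled on fred_extint() in arch/x86/entry/entry_fred.c; the function name here and the omitted sanity checks are mine, not the actual code):

	typedef void (*idtentry_t)(struct pt_regs *regs);
	static idtentry_t sysvec_table[NR_SYSTEM_VECTORS];

	static void dispatch_extint(struct pt_regs *regs, unsigned int vector)
	{
		unsigned int index = array_index_nospec(vector - FIRST_SYSTEM_VECTOR,
							NR_SYSTEM_VECTORS);

		if (vector >= FIRST_SYSTEM_VECTOR)
			sysvec_table[index](regs);	/* indirect call from C, so kCFI checked */
		else
			common_interrupt(regs, vector);
	}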
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH v2 13/13] objtool: Validate kCFI calls
2025-04-30 11:07 ` [PATCH v2 13/13] objtool: Validate kCFI calls Peter Zijlstra
@ 2025-04-30 15:59 ` Josh Poimboeuf
2025-04-30 19:03 ` Peter Zijlstra
0 siblings, 1 reply; 52+ messages in thread
From: Josh Poimboeuf @ 2025-04-30 15:59 UTC (permalink / raw)
To: Peter Zijlstra
Cc: x86, kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen,
hpa, seanjc, pbonzini, ardb, kees, Arnd Bergmann, gregkh,
linux-hyperv, linux-kernel, kvm, linux-efi, samitolvanen, ojeda
On Wed, Apr 30, 2025 at 01:07:47PM +0200, Peter Zijlstra wrote:
> + case ANNOTYPE_NOCFI:
> + sym = insn->sym;
> + if (!sym) {
> + ERROR_INSN(insn, "dodgy NOCFI annotation");
> + break;
return -1;
> + /*
> + * kCFI call sites look like:
> + *
> + * movl $(-0x12345678), %r10d
> + * addl -4(%r11), %r10d
> + * jz 1f
> + * ud2
> + * 1: cs call __x86_indirect_thunk_r11
> + *
> + * Verify all indirect calls are kCFI adorned by checking for the
> + * UD2. Notably, doing __nocfi calls to regular (cfi) functions is
> + * broken.
> + */
> + list_for_each_entry(insn, &file->retpoline_call_list, call_node) {
> + struct symbol *sym = insn->sym;
> +
> + if (sym && sym->type == STT_FUNC && !sym->nocfi) {
> + struct instruction *prev =
> + prev_insn_same_sym(file, insn);
> +
> + if (!prev || prev->type != INSN_BUG) {
> + WARN_INSN(insn, "no-cfi indirect call!");
> + warnings++;
Do we not care about indirect calls from !STT_FUNC?
--
Josh
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH v2 02/13] x86/kvm/emulate: Introduce COP1
2025-04-30 11:07 ` [PATCH v2 02/13] x86/kvm/emulate: Introduce COP1 Peter Zijlstra
@ 2025-04-30 16:19 ` Josh Poimboeuf
2025-04-30 19:05 ` Peter Zijlstra
0 siblings, 1 reply; 52+ messages in thread
From: Josh Poimboeuf @ 2025-04-30 16:19 UTC (permalink / raw)
To: Peter Zijlstra
Cc: x86, kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen,
hpa, seanjc, pbonzini, ardb, kees, Arnd Bergmann, gregkh,
linux-hyperv, linux-kernel, kvm, linux-efi, samitolvanen, ojeda
On Wed, Apr 30, 2025 at 01:07:36PM +0200, Peter Zijlstra wrote:
> +++ b/arch/x86/kvm/emulate.c
> @@ -267,11 +267,56 @@ static void invalidate_registers(struct
> X86_EFLAGS_PF|X86_EFLAGS_CF)
>
> #ifdef CONFIG_X86_64
> -#define ON64(x) x
> +#define ON64(x...) x
> #else
> #define ON64(x)
Doesn't the 32-bit version need to be
#define ON64(x...)
since it now accepts multiple "args"?
> -FASTOP1(not);
> -FASTOP1(neg);
> -FASTOP1(inc);
> -FASTOP1(dec);
> +COP1(not);
> +COP1(neg);
> +COP1(inc);
> +COP1(dec);
I assume COP stands for "C op", but that will never be obvious.
s/COP/EMULATE/?
--
Josh
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH v2 13/13] objtool: Validate kCFI calls
2025-04-30 15:59 ` Josh Poimboeuf
@ 2025-04-30 19:03 ` Peter Zijlstra
2025-05-01 15:56 ` Peter Zijlstra
0 siblings, 1 reply; 52+ messages in thread
From: Peter Zijlstra @ 2025-04-30 19:03 UTC (permalink / raw)
To: Josh Poimboeuf
Cc: x86, kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen,
hpa, seanjc, pbonzini, ardb, kees, Arnd Bergmann, gregkh,
linux-hyperv, linux-kernel, kvm, linux-efi, samitolvanen, ojeda
On Wed, Apr 30, 2025 at 08:59:53AM -0700, Josh Poimboeuf wrote:
> On Wed, Apr 30, 2025 at 01:07:47PM +0200, Peter Zijlstra wrote:
> > + case ANNOTYPE_NOCFI:
> > + sym = insn->sym;
> > + if (!sym) {
> > + ERROR_INSN(insn, "dodgy NOCFI annotation");
> > + break;
>
> return -1;
Oh right.
> > + /*
> > + * kCFI call sites look like:
> > + *
> > + * movl $(-0x12345678), %r10d
> > + * addl -4(%r11), %r10d
> > + * jz 1f
> > + * ud2
> > + * 1: cs call __x86_indirect_thunk_r11
> > + *
> > + * Verify all indirect calls are kCFI adorned by checking for the
> > + * UD2. Notably, doing __nocfi calls to regular (cfi) functions is
> > + * broken.
> > + */
> > + list_for_each_entry(insn, &file->retpoline_call_list, call_node) {
> > + struct symbol *sym = insn->sym;
> > +
> > + if (sym && sym->type == STT_FUNC && !sym->nocfi) {
> > + struct instruction *prev =
> > + prev_insn_same_sym(file, insn);
> > +
> > + if (!prev || prev->type != INSN_BUG) {
> > + WARN_INSN(insn, "no-cfi indirect call!");
> > + warnings++;
>
> Do we not care about indirect calls from !STT_FUNC?
Let me try, see what happens.
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH v2 02/13] x86/kvm/emulate: Introduce COP1
2025-04-30 16:19 ` Josh Poimboeuf
@ 2025-04-30 19:05 ` Peter Zijlstra
0 siblings, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2025-04-30 19:05 UTC (permalink / raw)
To: Josh Poimboeuf
Cc: x86, kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen,
hpa, seanjc, pbonzini, ardb, kees, Arnd Bergmann, gregkh,
linux-hyperv, linux-kernel, kvm, linux-efi, samitolvanen, ojeda
On Wed, Apr 30, 2025 at 09:19:38AM -0700, Josh Poimboeuf wrote:
> On Wed, Apr 30, 2025 at 01:07:36PM +0200, Peter Zijlstra wrote:
> > +++ b/arch/x86/kvm/emulate.c
> > @@ -267,11 +267,56 @@ static void invalidate_registers(struct
> > X86_EFLAGS_PF|X86_EFLAGS_CF)
> >
> > #ifdef CONFIG_X86_64
> > -#define ON64(x) x
> > +#define ON64(x...) x
> > #else
> > #define ON64(x)
>
> Doesn't the 32-bit version need to be
>
> #define ON64(x...)
>
> since it now accepts multiple "args"?
Right, so far the robot hasn't complained, but yeah, consistency would
demand this :-)
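I.e. something like this (sketch, untested):

	#ifdef CONFIG_X86_64
	#define ON64(x...) x		/* emit on 64-bit */
	#else
	#define ON64(x...)		/* also accept, and discard, multiple "args" */
	#endif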
> > -FASTOP1(not);
> > -FASTOP1(neg);
> > -FASTOP1(inc);
> > -FASTOP1(dec);
> > +COP1(not);
> > +COP1(neg);
> > +COP1(inc);
> > +COP1(dec);
>
> I assume COP stands for "C op", but that will never be obvious.
Aww :-)
Right before sending I wondered if EM_ASM_*() would be a better
namespace.
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH v2 00/13] objtool: Detect and warn about indirect calls in __nocfi functions
2025-04-30 14:24 ` [PATCH v2 00/13] objtool: Detect and warn about indirect calls in __nocfi functions H. Peter Anvin
@ 2025-04-30 19:06 ` Peter Zijlstra
2025-05-01 10:30 ` Peter Zijlstra
0 siblings, 1 reply; 52+ messages in thread
From: Peter Zijlstra @ 2025-04-30 19:06 UTC (permalink / raw)
To: H. Peter Anvin
Cc: x86, kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen,
seanjc, pbonzini, ardb, kees, Arnd Bergmann, gregkh, jpoimboe,
linux-hyperv, linux-kernel, kvm, linux-efi, samitolvanen, ojeda,
xin
On Wed, Apr 30, 2025 at 07:24:15AM -0700, H. Peter Anvin wrote:
> >KVM has another; the VMX interrupt injection stuff calls the IDT handler
> >directly. Is there an alternative? Can we keep a table of Linux functions
> >slighly higher up the call stack (asm_\cfunc ?) and add CFI to those?
> We do have a table of handlers higher up in the stack in the form of
> the dispatch tables for FRED. They don't in general even need the
> assembly entry stubs, either.
Oh, right. I'll go have a look at those.
^ permalink raw reply [flat|nested] 52+ messages in thread
* RE: [PATCH v2 11/13] x86,hyperv: Clean up hv_do_hypercall()
2025-04-30 11:07 ` [PATCH v2 11/13] x86,hyperv: Clean up hv_do_hypercall() Peter Zijlstra
@ 2025-05-01 2:36 ` Michael Kelley
0 siblings, 0 replies; 52+ messages in thread
From: Michael Kelley @ 2025-05-01 2:36 UTC (permalink / raw)
To: Peter Zijlstra, x86@kernel.org
Cc: kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org,
decui@microsoft.com, tglx@linutronix.de, mingo@redhat.com,
bp@alien8.de, dave.hansen@linux.intel.com, hpa@zytor.com,
seanjc@google.com, pbonzini@redhat.com, ardb@kernel.org,
kees@kernel.org, Arnd Bergmann, gregkh@linuxfoundation.org,
jpoimboe@kernel.org, linux-hyperv@vger.kernel.org,
linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
linux-efi@vger.kernel.org, samitolvanen@google.com,
ojeda@kernel.org
From: Peter Zijlstra <peterz@infradead.org> Sent: Wednesday, April 30, 2025 4:08 AM
>
> What used to be a simple few instructions has turned into a giant mess
> (for x86_64). Not only does it use static_branch wrong, it mixes it
> with dynamic branches for no apparent reason.
>
> Notably it uses static_branch through an out-of-line function call,
> which completely defeats the purpose, since instead of a simple
> JMP/NOP site, you get a CALL+RET+TEST+Jcc sequence in return, which is
> absolutely idiotic.
>
> Add to that a dynamic test of hyperv_paravisor_present, something
> which is set once and never changed.
>
> Replace all this idiocy with a single direct function call to the
> right hypercall variant.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
I've done these tests on Hyper-V VMs with this patch series. My focus
is the Hyper-V changes in Patches 11 and 12, not the other changes.
* Normal VM boot and basic smoke test
* TDX and SEV-SNP VMs boot and basic smoke test. These VMs have
a paravisor
* Normal VM taking a panic and running the kdump kernel
* Normal VM suspending for hibernation, then resuming from
hibernation
* Verified that IBT is enabled in a normal VM. It's not offered in a TDX
VM on Hyper-V when a paravisor is used. I don't know about the case
without a paravisor.
* Building a 64-bit kernel with and without CONFIG_AMD_MEM_ENCRYPT
and CONFIG_INTEL_TDX_GUEST.
* Building a 32-bit kernel (but I did not try to run it)
TDX and SEV-SNP VMs without a paravisor are not tested, so updating
the static call and the new direct call path have not been tested for
TDX and SNP hypercalls. I don't have a hardware environment where I
can test without a paravisor.
Tested-by: Michael Kelley <mhklinux@outlook.com>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
> ---
> arch/x86/hyperv/hv_init.c | 20 +++++
> arch/x86/hyperv/ivm.c | 15 ++++
> arch/x86/include/asm/mshyperv.h | 137 +++++++++++-----------------------------
> arch/x86/kernel/cpu/mshyperv.c | 19 +++--
> 4 files changed, 89 insertions(+), 102 deletions(-)
>
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -35,7 +35,27 @@
> #include <linux/highmem.h>
>
> void *hv_hypercall_pg;
> +
> +#ifdef CONFIG_X86_64
> +u64 hv_std_hypercall(u64 control, u64 param1, u64 param2)
> +{
> + u64 hv_status;
> +
> + if (!hv_hypercall_pg)
> + return U64_MAX;
> +
> + register u64 __r8 asm("r8") = param2;
> + asm volatile (CALL_NOSPEC
> + : "=a" (hv_status), ASM_CALL_CONSTRAINT,
> + "+c" (control), "+d" (param1), "+r" (__r8)
> + : THUNK_TARGET(hv_hypercall_pg)
> + : "cc", "memory", "r9", "r10", "r11");
> +
> + return hv_status;
> +}
> +#else
> EXPORT_SYMBOL_GPL(hv_hypercall_pg);
> +#endif
>
> union hv_ghcb * __percpu *hv_ghcb_pg;
>
> --- a/arch/x86/hyperv/ivm.c
> +++ b/arch/x86/hyperv/ivm.c
> @@ -376,9 +376,23 @@ int hv_snp_boot_ap(u32 cpu, unsigned lon
> return ret;
> }
>
> +u64 hv_snp_hypercall(u64 control, u64 param1, u64 param2)
> +{
> + u64 hv_status;
> +
> + register u64 __r8 asm("r8") = param2;
> + asm volatile("vmmcall"
> + : "=a" (hv_status), ASM_CALL_CONSTRAINT,
> + "+c" (control), "+d" (param1), "+r" (__r8)
> + : : "cc", "memory", "r9", "r10", "r11");
> +
> + return hv_status;
> +}
> +
> #else
> static inline void hv_ghcb_msr_write(u64 msr, u64 value) {}
> static inline void hv_ghcb_msr_read(u64 msr, u64 *value) {}
> +u64 hv_snp_hypercall(u64 control, u64 param1, u64 param2) { return U64_MAX; }
> #endif /* CONFIG_AMD_MEM_ENCRYPT */
>
> #ifdef CONFIG_INTEL_TDX_GUEST
> @@ -428,6 +442,7 @@ u64 hv_tdx_hypercall(u64 control, u64 pa
> #else
> static inline void hv_tdx_msr_write(u64 msr, u64 value) {}
> static inline void hv_tdx_msr_read(u64 msr, u64 *value) {}
> +u64 hv_tdx_hypercall(u64 control, u64 param1, u64 param2) { return U64_MAX; }
> #endif /* CONFIG_INTEL_TDX_GUEST */
>
> #if defined(CONFIG_AMD_MEM_ENCRYPT) || defined(CONFIG_INTEL_TDX_GUEST)
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -6,6 +6,7 @@
> #include <linux/nmi.h>
> #include <linux/msi.h>
> #include <linux/io.h>
> +#include <linux/static_call.h>
> #include <asm/nospec-branch.h>
> #include <asm/paravirt.h>
> #include <hyperv/hvhdk.h>
> @@ -38,16 +39,21 @@ static inline unsigned char hv_get_nmi_r
> return 0;
> }
>
> -#if IS_ENABLED(CONFIG_HYPERV)
> -extern bool hyperv_paravisor_present;
> +extern u64 hv_tdx_hypercall(u64 control, u64 param1, u64 param2);
> +extern u64 hv_snp_hypercall(u64 control, u64 param1, u64 param2);
> +extern u64 hv_std_hypercall(u64 control, u64 param1, u64 param2);
>
> +#if IS_ENABLED(CONFIG_HYPERV)
> extern void *hv_hypercall_pg;
>
> extern union hv_ghcb * __percpu *hv_ghcb_pg;
>
> bool hv_isolation_type_snp(void);
> bool hv_isolation_type_tdx(void);
> -u64 hv_tdx_hypercall(u64 control, u64 param1, u64 param2);
> +
> +#ifdef CONFIG_X86_64
> +DECLARE_STATIC_CALL(hv_hypercall, hv_std_hypercall);
> +#endif
>
> /*
> * DEFAULT INIT GPAT and SEGMENT LIMIT value in struct VMSA
> @@ -64,37 +70,15 @@ static inline u64 hv_do_hypercall(u64 co
> {
> u64 input_address = input ? virt_to_phys(input) : 0;
> u64 output_address = output ? virt_to_phys(output) : 0;
> - u64 hv_status;
>
> #ifdef CONFIG_X86_64
> - if (hv_isolation_type_tdx() && !hyperv_paravisor_present)
> - return hv_tdx_hypercall(control, input_address, output_address);
> -
> - if (hv_isolation_type_snp() && !hyperv_paravisor_present) {
> - __asm__ __volatile__("mov %[output_address], %%r8\n"
> - "vmmcall"
> - : "=a" (hv_status), ASM_CALL_CONSTRAINT,
> - "+c" (control), "+d" (input_address)
> - : [output_address] "r" (output_address)
> - : "cc", "memory", "r8", "r9", "r10", "r11");
> - return hv_status;
> - }
> -
> - if (!hv_hypercall_pg)
> - return U64_MAX;
> -
> - __asm__ __volatile__("mov %[output_address], %%r8\n"
> - CALL_NOSPEC
> - : "=a" (hv_status), ASM_CALL_CONSTRAINT,
> - "+c" (control), "+d" (input_address)
> - : [output_address] "r" (output_address),
> - THUNK_TARGET(hv_hypercall_pg)
> - : "cc", "memory", "r8", "r9", "r10", "r11");
> + return static_call_mod(hv_hypercall)(control, input_address, output_address);
> #else
> u32 input_address_hi = upper_32_bits(input_address);
> u32 input_address_lo = lower_32_bits(input_address);
> u32 output_address_hi = upper_32_bits(output_address);
> u32 output_address_lo = lower_32_bits(output_address);
> + u64 hv_status;
>
> if (!hv_hypercall_pg)
> return U64_MAX;
> @@ -107,8 +91,8 @@ static inline u64 hv_do_hypercall(u64 co
> "D"(output_address_hi), "S"(output_address_lo),
> THUNK_TARGET(hv_hypercall_pg)
> : "cc", "memory");
> -#endif /* !x86_64 */
> return hv_status;
> +#endif /* !x86_64 */
> }
>
> /* Hypercall to the L0 hypervisor */
> @@ -120,41 +104,23 @@ static inline u64 hv_do_nested_hypercall
> /* Fast hypercall with 8 bytes of input and no output */
> static inline u64 _hv_do_fast_hypercall8(u64 control, u64 input1)
> {
> - u64 hv_status;
> -
> #ifdef CONFIG_X86_64
> - if (hv_isolation_type_tdx() && !hyperv_paravisor_present)
> - return hv_tdx_hypercall(control, input1, 0);
> -
> - if (hv_isolation_type_snp() && !hyperv_paravisor_present) {
> - __asm__ __volatile__(
> - "vmmcall"
> - : "=a" (hv_status), ASM_CALL_CONSTRAINT,
> - "+c" (control), "+d" (input1)
> - :: "cc", "r8", "r9", "r10", "r11");
> - } else {
> - __asm__ __volatile__(CALL_NOSPEC
> - : "=a" (hv_status), ASM_CALL_CONSTRAINT,
> - "+c" (control), "+d" (input1)
> - : THUNK_TARGET(hv_hypercall_pg)
> - : "cc", "r8", "r9", "r10", "r11");
> - }
> + return static_call_mod(hv_hypercall)(control, input1, 0);
> #else
> - {
> - u32 input1_hi = upper_32_bits(input1);
> - u32 input1_lo = lower_32_bits(input1);
> -
> - __asm__ __volatile__ (CALL_NOSPEC
> - : "=A"(hv_status),
> - "+c"(input1_lo),
> - ASM_CALL_CONSTRAINT
> - : "A" (control),
> - "b" (input1_hi),
> - THUNK_TARGET(hv_hypercall_pg)
> - : "cc", "edi", "esi");
> - }
> -#endif
> + u32 input1_hi = upper_32_bits(input1);
> + u32 input1_lo = lower_32_bits(input1);
> + u64 hv_status;
> +
> + __asm__ __volatile__ (CALL_NOSPEC
> + : "=A"(hv_status),
> + "+c"(input1_lo),
> + ASM_CALL_CONSTRAINT
> + : "A" (control),
> + "b" (input1_hi),
> + THUNK_TARGET(hv_hypercall_pg)
> + : "cc", "edi", "esi");
> return hv_status;
> +#endif
> }
>
> static inline u64 hv_do_fast_hypercall8(u16 code, u64 input1)
> @@ -174,45 +140,24 @@ static inline u64 hv_do_fast_nested_hype
> /* Fast hypercall with 16 bytes of input */
> static inline u64 _hv_do_fast_hypercall16(u64 control, u64 input1, u64 input2)
> {
> - u64 hv_status;
> -
> #ifdef CONFIG_X86_64
> - if (hv_isolation_type_tdx() && !hyperv_paravisor_present)
> - return hv_tdx_hypercall(control, input1, input2);
> -
> - if (hv_isolation_type_snp() && !hyperv_paravisor_present) {
> - __asm__ __volatile__("mov %[input2], %%r8\n"
> - "vmmcall"
> - : "=a" (hv_status), ASM_CALL_CONSTRAINT,
> - "+c" (control), "+d" (input1)
> - : [input2] "r" (input2)
> - : "cc", "r8", "r9", "r10", "r11");
> - } else {
> - __asm__ __volatile__("mov %[input2], %%r8\n"
> - CALL_NOSPEC
> - : "=a" (hv_status), ASM_CALL_CONSTRAINT,
> - "+c" (control), "+d" (input1)
> - : [input2] "r" (input2),
> - THUNK_TARGET(hv_hypercall_pg)
> - : "cc", "r8", "r9", "r10", "r11");
> - }
> + return static_call_mod(hv_hypercall)(control, input1, input2);
> #else
> - {
> - u32 input1_hi = upper_32_bits(input1);
> - u32 input1_lo = lower_32_bits(input1);
> - u32 input2_hi = upper_32_bits(input2);
> - u32 input2_lo = lower_32_bits(input2);
> -
> - __asm__ __volatile__ (CALL_NOSPEC
> - : "=A"(hv_status),
> - "+c"(input1_lo), ASM_CALL_CONSTRAINT
> - : "A" (control), "b" (input1_hi),
> - "D"(input2_hi), "S"(input2_lo),
> - THUNK_TARGET(hv_hypercall_pg)
> - : "cc");
> - }
> -#endif
> + u32 input1_hi = upper_32_bits(input1);
> + u32 input1_lo = lower_32_bits(input1);
> + u32 input2_hi = upper_32_bits(input2);
> + u32 input2_lo = lower_32_bits(input2);
> + u64 hv_status;
> +
> + __asm__ __volatile__ (CALL_NOSPEC
> + : "=A"(hv_status),
> + "+c"(input1_lo), ASM_CALL_CONSTRAINT
> + : "A" (control), "b" (input1_hi),
> + "D"(input2_hi), "S"(input2_lo),
> + THUNK_TARGET(hv_hypercall_pg)
> + : "cc");
> return hv_status;
> +#endif
> }
>
> static inline u64 hv_do_fast_hypercall16(u16 code, u64 input1, u64 input2)
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -37,10 +37,6 @@
> bool hv_nested;
> struct ms_hyperv_info ms_hyperv;
>
> -/* Used in modules via hv_do_hypercall(): see arch/x86/include/asm/mshyperv.h */
> -bool hyperv_paravisor_present __ro_after_init;
> -EXPORT_SYMBOL_GPL(hyperv_paravisor_present);
> -
> #if IS_ENABLED(CONFIG_HYPERV)
> static inline unsigned int hv_get_nested_msr(unsigned int reg)
> {
> @@ -287,8 +283,18 @@ static void __init x86_setup_ops_for_tsc
> old_restore_sched_clock_state = x86_platform.restore_sched_clock_state;
> x86_platform.restore_sched_clock_state = hv_restore_sched_clock_state;
> }
> +
> +#ifdef CONFIG_X86_64
> +DEFINE_STATIC_CALL(hv_hypercall, hv_std_hypercall);
> +EXPORT_STATIC_CALL_TRAMP_GPL(hv_hypercall);
> +#define hypercall_update(hc) static_call_update(hv_hypercall, hc)
> +#endif
> #endif /* CONFIG_HYPERV */
>
> +#ifndef hypercall_update
> +#define hypercall_update(hc) (void)hc
> +#endif
> +
> static uint32_t __init ms_hyperv_platform(void)
> {
> u32 eax;
> @@ -483,14 +489,14 @@ static void __init ms_hyperv_init_platfo
> ms_hyperv.shared_gpa_boundary =
> BIT_ULL(ms_hyperv.shared_gpa_boundary_bits);
>
> - hyperv_paravisor_present = !!ms_hyperv.paravisor_present;
> -
> pr_info("Hyper-V: Isolation Config: Group A 0x%x, Group B 0x%x\n",
> ms_hyperv.isolation_config_a, ms_hyperv.isolation_config_b);
>
>
> if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP) {
> static_branch_enable(&isolation_type_snp);
> + if (!ms_hyperv.paravisor_present)
> + hypercall_update(hv_snp_hypercall);
> } else if (hv_get_isolation_type() == HV_ISOLATION_TYPE_TDX) {
> static_branch_enable(&isolation_type_tdx);
>
> @@ -498,6 +504,7 @@ static void __init ms_hyperv_init_platfo
> ms_hyperv.hints &= ~HV_X64_APIC_ACCESS_RECOMMENDED;
>
> if (!ms_hyperv.paravisor_present) {
> + hypercall_update(hv_tdx_hypercall);
> /*
> * Mark the Hyper-V TSC page feature as disabled
> * in a TDX VM without paravisor so that the
>
>
^ permalink raw reply [flat|nested] 52+ messages in thread
* RE: [PATCH v2 12/13] x86_64,hyperv: Use direct call to hypercall-page
2025-04-30 11:07 ` [PATCH v2 12/13] x86_64,hyperv: Use direct call to hypercall-page Peter Zijlstra
@ 2025-05-01 2:36 ` Michael Kelley
2025-05-01 8:59 ` Peter Zijlstra
0 siblings, 1 reply; 52+ messages in thread
From: Michael Kelley @ 2025-05-01 2:36 UTC (permalink / raw)
To: Peter Zijlstra, x86@kernel.org
Cc: kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org,
decui@microsoft.com, tglx@linutronix.de, mingo@redhat.com,
bp@alien8.de, dave.hansen@linux.intel.com, hpa@zytor.com,
seanjc@google.com, pbonzini@redhat.com, ardb@kernel.org,
kees@kernel.org, Arnd Bergmann, gregkh@linuxfoundation.org,
jpoimboe@kernel.org, linux-hyperv@vger.kernel.org,
linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
linux-efi@vger.kernel.org, samitolvanen@google.com,
ojeda@kernel.org
From: Peter Zijlstra <peterz@infradead.org> Sent: Wednesday, April 30, 2025 4:08 AM
>
> Instead of using an indirect call to the hypercall page, use a direct
> call instead. This avoids all CFI problems, including the one where
> the hypercall page doesn't have IBT on.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
> arch/x86/hyperv/hv_init.c | 60 +++++++++++++++++++++++-----------------------
> 1 file changed, 30 insertions(+), 30 deletions(-)
>
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -37,23 +37,41 @@
> void *hv_hypercall_pg;
>
> #ifdef CONFIG_X86_64
> +static u64 __hv_hyperfail(u64 control, u64 param1, u64 param2)
> +{
> + return U64_MAX;
> +}
> +
> +DEFINE_STATIC_CALL(__hv_hypercall, __hv_hyperfail);
> +
> u64 hv_std_hypercall(u64 control, u64 param1, u64 param2)
> {
> u64 hv_status;
>
> - if (!hv_hypercall_pg)
> - return U64_MAX;
> -
> register u64 __r8 asm("r8") = param2;
> - asm volatile (CALL_NOSPEC
> + asm volatile ("call " STATIC_CALL_TRAMP_STR(__hv_hypercall)
> : "=a" (hv_status), ASM_CALL_CONSTRAINT,
> "+c" (control), "+d" (param1), "+r" (__r8)
> - : THUNK_TARGET(hv_hypercall_pg)
> - : "cc", "memory", "r9", "r10", "r11");
> + : : "cc", "memory", "r9", "r10", "r11");
>
> return hv_status;
> }
> +
> +typedef u64 (*hv_hypercall_f)(u64 control, u64 param1, u64 param2);
> +
> +static inline void hv_set_hypercall_pg(void *ptr)
> +{
> + hv_hypercall_pg = ptr;
> +
> + if (!ptr)
> + ptr = &__hv_hyperfail;
> + static_call_update(__hv_hypercall, (hv_hypercall_f)ptr);
> +}
> #else
> +static inline void hv_set_hypercall_pg(void *ptr)
> +{
> + hv_hypercall_pg = ptr;
> +}
> EXPORT_SYMBOL_GPL(hv_hypercall_pg);
> #endif
>
> @@ -348,7 +366,7 @@ static int hv_suspend(void)
> * pointer is restored on resume.
> */
> hv_hypercall_pg_saved = hv_hypercall_pg;
> - hv_hypercall_pg = NULL;
> + hv_set_hypercall_pg(NULL);
>
> /* Disable the hypercall page in the hypervisor */
> rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
> @@ -374,7 +392,7 @@ static void hv_resume(void)
> vmalloc_to_pfn(hv_hypercall_pg_saved);
> wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
>
> - hv_hypercall_pg = hv_hypercall_pg_saved;
> + hv_set_hypercall_pg(hv_hypercall_pg_saved);
> hv_hypercall_pg_saved = NULL;
>
> /*
> @@ -528,8 +546,8 @@ void __init hyperv_init(void)
> if (hv_isolation_type_tdx() && !ms_hyperv.paravisor_present)
> goto skip_hypercall_pg_init;
>
> - hv_hypercall_pg = __vmalloc_node_range(PAGE_SIZE, 1, VMALLOC_START,
> - VMALLOC_END, GFP_KERNEL, PAGE_KERNEL_ROX,
> + hv_hypercall_pg = __vmalloc_node_range(PAGE_SIZE, 1, MODULES_VADDR,
> + MODULES_END, GFP_KERNEL, PAGE_KERNEL_ROX,
Curiosity question (which I forgot to ask about in v1): Is this change so that the
hypercall page kernel address is "close enough" for the direct call to work from
built-in code and from module code? Or is there some other reason?
> VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,
> __builtin_return_address(0));
> if (hv_hypercall_pg == NULL)
> @@ -567,27 +585,9 @@ void __init hyperv_init(void)
> wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
> }
>
> -skip_hypercall_pg_init:
> - /*
> - * Some versions of Hyper-V that provide IBT in guest VMs have a bug
> - * in that there's no ENDBR64 instruction at the entry to the
> - * hypercall page. Because hypercalls are invoked via an indirect call
> - * to the hypercall page, all hypercall attempts fail when IBT is
> - * enabled, and Linux panics. For such buggy versions, disable IBT.
> - *
> - * Fixed versions of Hyper-V always provide ENDBR64 on the hypercall
> - * page, so if future Linux kernel versions enable IBT for 32-bit
> - * builds, additional hypercall page hackery will be required here
> - * to provide an ENDBR32.
> - */
> -#ifdef CONFIG_X86_KERNEL_IBT
> - if (cpu_feature_enabled(X86_FEATURE_IBT) &&
> - *(u32 *)hv_hypercall_pg != gen_endbr()) {
> - setup_clear_cpu_cap(X86_FEATURE_IBT);
> - pr_warn("Disabling IBT because of Hyper-V bug\n");
> - }
> -#endif
Nit: With this IBT code removed, the #include <asm/ibt.h> at the top
of this source code file should be removed.
> + hv_set_hypercall_pg(hv_hypercall_pg);
>
> +skip_hypercall_pg_init:
> /*
> * hyperv_init() is called before LAPIC is initialized: see
> * apic_intr_mode_init() -> x86_platform.apic_post_init() and
>
>
The nit notwithstanding,
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH v2 12/13] x86_64,hyperv: Use direct call to hypercall-page
2025-05-01 2:36 ` Michael Kelley
@ 2025-05-01 8:59 ` Peter Zijlstra
0 siblings, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2025-05-01 8:59 UTC (permalink / raw)
To: Michael Kelley
Cc: x86@kernel.org, kys@microsoft.com, haiyangz@microsoft.com,
wei.liu@kernel.org, decui@microsoft.com, tglx@linutronix.de,
mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
hpa@zytor.com, seanjc@google.com, pbonzini@redhat.com,
ardb@kernel.org, kees@kernel.org, Arnd Bergmann,
gregkh@linuxfoundation.org, jpoimboe@kernel.org,
linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
kvm@vger.kernel.org, linux-efi@vger.kernel.org,
samitolvanen@google.com, ojeda@kernel.org
On Thu, May 01, 2025 at 02:36:26AM +0000, Michael Kelley wrote:
> From: Peter Zijlstra <peterz@infradead.org> Sent: Wednesday, April 30, 2025 4:08 AM
> > @@ -528,8 +546,8 @@ void __init hyperv_init(void)
> > if (hv_isolation_type_tdx() && !ms_hyperv.paravisor_present)
> > goto skip_hypercall_pg_init;
> >
> > - hv_hypercall_pg = __vmalloc_node_range(PAGE_SIZE, 1, VMALLOC_START,
> > - VMALLOC_END, GFP_KERNEL, PAGE_KERNEL_ROX,
> > + hv_hypercall_pg = __vmalloc_node_range(PAGE_SIZE, 1, MODULES_VADDR,
> > + MODULES_END, GFP_KERNEL, PAGE_KERNEL_ROX,
>
> Curiosity question (which I forgot ask about in v1): Is this change so that the
> hypercall page kernel address is "close enough" for the direct call to work from
> built-in code and from module code? Or is there some other reason?
No, you nailed it. Because we want to do a direct CALL, the hypercall
page must be in the disp32 range relative to the call site. The module
address space ensures this.
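(A direct CALL is the 5-byte E8 encoding with a signed 32-bit
displacement:

	e8 xx xx xx xx		# call rel32; target = next RIP + rel32

so the target must be within 2G of the call site; vmalloc space gives no
such guarantee, the module area does.)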
> > VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,
> > __builtin_return_address(0));
> > if (hv_hypercall_pg == NULL)
> > @@ -567,27 +585,9 @@ void __init hyperv_init(void)
> > wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
> > }
> >
> > -skip_hypercall_pg_init:
> > - /*
> > - * Some versions of Hyper-V that provide IBT in guest VMs have a bug
> > - * in that there's no ENDBR64 instruction at the entry to the
> > - * hypercall page. Because hypercalls are invoked via an indirect call
> > - * to the hypercall page, all hypercall attempts fail when IBT is
> > - * enabled, and Linux panics. For such buggy versions, disable IBT.
> > - *
> > - * Fixed versions of Hyper-V always provide ENDBR64 on the hypercall
> > - * page, so if future Linux kernel versions enable IBT for 32-bit
> > - * builds, additional hypercall page hackery will be required here
> > - * to provide an ENDBR32.
> > - */
> > -#ifdef CONFIG_X86_KERNEL_IBT
> > - if (cpu_feature_enabled(X86_FEATURE_IBT) &&
> > - *(u32 *)hv_hypercall_pg != gen_endbr()) {
> > - setup_clear_cpu_cap(X86_FEATURE_IBT);
> > - pr_warn("Disabling IBT because of Hyper-V bug\n");
> > - }
> > -#endif
>
> Nit: With this IBT code removed, the #include <asm/ibt.h> at the top
> of this source code file should be removed.
Indeed so.
>
> > + hv_set_hypercall_pg(hv_hypercall_pg);
> >
> > +skip_hypercall_pg_init:
> > /*
> > * hyperv_init() is called before LAPIC is initialized: see
> > * apic_intr_mode_init() -> x86_platform.apic_post_init() and
> >
> >
>
> The nit notwithstanding,
>
> Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Thanks!
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH v2 00/13] objtool: Detect and warn about indirect calls in __nocfi functions
2025-04-30 19:06 ` Peter Zijlstra
@ 2025-05-01 10:30 ` Peter Zijlstra
2025-05-01 15:38 ` Peter Zijlstra
0 siblings, 1 reply; 52+ messages in thread
From: Peter Zijlstra @ 2025-05-01 10:30 UTC (permalink / raw)
To: H. Peter Anvin
Cc: x86, kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen,
seanjc, pbonzini, ardb, kees, Arnd Bergmann, gregkh, jpoimboe,
linux-hyperv, linux-kernel, kvm, linux-efi, samitolvanen, ojeda,
xin
On Wed, Apr 30, 2025 at 09:06:00PM +0200, Peter Zijlstra wrote:
> On Wed, Apr 30, 2025 at 07:24:15AM -0700, H. Peter Anvin wrote:
>
> > >KVM has another; the VMX interrupt injection stuff calls the IDT handler
> > >directly. Is there an alternative? Can we keep a table of Linux functions
> > >slighly higher up the call stack (asm_\cfunc ?) and add CFI to those?
>
> > We do have a table of handlers higher up in the stack in the form of
> > the dispatch tables for FRED. They don't in general even need the
> > assembly entry stubs, either.
>
> Oh, right. I'll go have a look at those.
Right, so perhaps the easiest way around this is to set up the FRED entry
tables unconditionally, have VMX mandate CONFIG_FRED and then have it
always use the FRED entry points.
Let me see how ugly that gets.
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH v2 00/13] objtool: Detect and warn about indirect calls in __nocfi functions
2025-05-01 10:30 ` Peter Zijlstra
@ 2025-05-01 15:38 ` Peter Zijlstra
2025-05-01 18:30 ` Sean Christopherson
0 siblings, 1 reply; 52+ messages in thread
From: Peter Zijlstra @ 2025-05-01 15:38 UTC (permalink / raw)
To: H. Peter Anvin
Cc: x86, kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen,
seanjc, pbonzini, ardb, kees, Arnd Bergmann, gregkh, jpoimboe,
linux-hyperv, linux-kernel, kvm, linux-efi, samitolvanen, ojeda,
xin
On Thu, May 01, 2025 at 12:30:38PM +0200, Peter Zijlstra wrote:
> On Wed, Apr 30, 2025 at 09:06:00PM +0200, Peter Zijlstra wrote:
> > On Wed, Apr 30, 2025 at 07:24:15AM -0700, H. Peter Anvin wrote:
> >
> > > >KVM has another; the VMX interrupt injection stuff calls the IDT handler
> > > >directly. Is there an alternative? Can we keep a table of Linux functions
> > > >slighly higher up the call stack (asm_\cfunc ?) and add CFI to those?
> >
> > > We do have a table of handlers higher up in the stack in the form of
> > > the dispatch tables for FRED. They don't in general even need the
> > > assembly entry stubs, either.
> >
> > Oh, right. I'll go have a look at those.
>
> Right, so perhaps the easiest way around this is to setup the FRED entry
> tables unconditionally, have VMX mandate CONFIG_FRED and then have it
> always use the FRED entry points.
>
> Let me see how ugly that gets.
Something like so... except this is broken. It's reporting spurious
interrupts on vector 0x00, so something is buggered passing that vector
along.
I'll stare more at it more another time.
---
diff --git a/arch/x86/entry/entry_fred.c b/arch/x86/entry/entry_fred.c
index f004a4dc74c2..d2322e3723f5 100644
--- a/arch/x86/entry/entry_fred.c
+++ b/arch/x86/entry/entry_fred.c
@@ -156,9 +156,8 @@ void __init fred_complete_exception_setup(void)
fred_setup_done = true;
}
-static noinstr void fred_extint(struct pt_regs *regs)
+noinstr void fred_extint(struct pt_regs *regs, unsigned int vector)
{
- unsigned int vector = regs->fred_ss.vector;
unsigned int index = array_index_nospec(vector - FIRST_SYSTEM_VECTOR,
NR_SYSTEM_VECTORS);
@@ -230,7 +229,7 @@ __visible noinstr void fred_entry_from_user(struct pt_regs *regs)
switch (regs->fred_ss.type) {
case EVENT_TYPE_EXTINT:
- return fred_extint(regs);
+ return fred_extint(regs, regs->fred_ss.vector);
case EVENT_TYPE_NMI:
if (likely(regs->fred_ss.vector == X86_TRAP_NMI))
return fred_exc_nmi(regs);
@@ -262,7 +261,7 @@ __visible noinstr void fred_entry_from_kernel(struct pt_regs *regs)
switch (regs->fred_ss.type) {
case EVENT_TYPE_EXTINT:
- return fred_extint(regs);
+ return fred_extint(regs, regs->fred_ss.vector);
case EVENT_TYPE_NMI:
if (likely(regs->fred_ss.vector == X86_TRAP_NMI))
return fred_exc_nmi(regs);
@@ -286,7 +285,7 @@ __visible noinstr void __fred_entry_from_kvm(struct pt_regs *regs)
{
switch (regs->fred_ss.type) {
case EVENT_TYPE_EXTINT:
- return fred_extint(regs);
+ return fred_extint(regs, regs->fred_ss.vector);
case EVENT_TYPE_NMI:
return fred_exc_nmi(regs);
default:
diff --git a/arch/x86/include/asm/fred.h b/arch/x86/include/asm/fred.h
index 2a29e5216881..e6ec5e1a0f54 100644
--- a/arch/x86/include/asm/fred.h
+++ b/arch/x86/include/asm/fred.h
@@ -66,6 +66,8 @@ void asm_fred_entrypoint_user(void);
void asm_fred_entrypoint_kernel(void);
void asm_fred_entry_from_kvm(struct fred_ss);
+void fred_extint(struct pt_regs *regs, unsigned int vector);
+
__visible void fred_entry_from_user(struct pt_regs *regs);
__visible void fred_entry_from_kernel(struct pt_regs *regs);
__visible void __fred_entry_from_kvm(struct pt_regs *regs);
diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index a4ec27c67988..1de8c8b6d4c6 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -468,9 +468,9 @@ static inline void fred_install_sysvec(unsigned int vector, const idtentry_t fun
#endif
#define sysvec_install(vector, function) { \
- if (cpu_feature_enabled(X86_FEATURE_FRED)) \
+ if (IS_ENABLED(CONFIG_X86_FRED)) \
fred_install_sysvec(vector, function); \
- else \
+ if (!cpu_feature_enabled(X86_FEATURE_FRED)) \
idt_install_sysvec(vector, asm_##function); \
}
@@ -647,6 +647,9 @@ DECLARE_IDTENTRY_RAW(X86_TRAP_MC, xenpv_exc_machine_check);
* to avoid more ifdeffery.
*/
DECLARE_IDTENTRY(X86_TRAP_NMI, exc_nmi_kvm_vmx);
+#ifdef CONFIG_X86_FRED
+DECLARE_IDTENTRY_ERRORCODE(X86_TRAP_NMI, exc_fred_kvm_vmx);
+#endif
#endif
DECLARE_IDTENTRY_NMI(X86_TRAP_NMI, exc_nmi);
diff --git a/arch/x86/kernel/irqinit.c b/arch/x86/kernel/irqinit.c
index f79c5edc0b89..654f8e835417 100644
--- a/arch/x86/kernel/irqinit.c
+++ b/arch/x86/kernel/irqinit.c
@@ -97,9 +97,9 @@ void __init native_init_IRQ(void)
/* Execute any quirks before the call gates are initialised: */
x86_init.irqs.pre_vector_init();
- if (cpu_feature_enabled(X86_FEATURE_FRED))
+ if (IS_ENABLED(CONFIG_X86_FRED))
fred_complete_exception_setup();
- else
+ if (!cpu_feature_enabled(X86_FEATURE_FRED))
idt_setup_apic_and_irq_gates();
lapic_assign_system_vectors();
diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index 9a95d00f1423..4690787bdd13 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -613,8 +613,17 @@ DEFINE_IDTENTRY_RAW(exc_nmi_kvm_vmx)
{
exc_nmi(regs);
}
+#ifdef CONFIG_X86_FRED
+DEFINE_IDTENTRY_RAW_ERRORCODE(exc_fred_kvm_vmx)
+{
+ fred_extint(regs, error_code);
+}
+#endif
#if IS_MODULE(CONFIG_KVM_INTEL)
EXPORT_SYMBOL_GPL(asm_exc_nmi_kvm_vmx);
+#ifdef CONFIG_X86_FRED
+EXPORT_SYMBOL_GPL(asm_exc_fred_kvm_vmx);
+#endif
#endif
#endif
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index fe8ea8c097de..9db38ae3450e 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -95,6 +95,7 @@ config KVM_SW_PROTECTED_VM
config KVM_INTEL
tristate "KVM for Intel (and compatible) processors support"
depends on KVM && IA32_FEAT_CTL
+ select X86_FRED if X86_64
help
Provides support for KVM on processors equipped with Intel's VT
extensions, a.k.a. Virtual Machine Extensions (VMX).
diff --git a/arch/x86/kvm/vmx/vmenter.S b/arch/x86/kvm/vmx/vmenter.S
index 090393bf829d..3aaacbbedf5c 100644
--- a/arch/x86/kvm/vmx/vmenter.S
+++ b/arch/x86/kvm/vmx/vmenter.S
@@ -31,7 +31,7 @@
#define VCPU_R15 __VCPU_REGS_R15 * WORD_SIZE
#endif
-.macro VMX_DO_EVENT_IRQOFF call_insn call_target
+.macro VMX_DO_EVENT_IRQOFF call_insn call_target has_error=0
/*
* Unconditionally create a stack frame, getting the correct RSP on the
* stack (for x86-64) would take two instructions anyways, and RBP can
@@ -40,6 +40,8 @@
push %_ASM_BP
mov %_ASM_SP, %_ASM_BP
+ UNWIND_HINT_SAVE
+
#ifdef CONFIG_X86_64
/*
* Align RSP to a 16-byte boundary (to emulate CPU behavior) before
@@ -51,7 +53,17 @@
#endif
pushf
push $__KERNEL_CS
+
+ .if \has_error
+ lea 1f(%rip), %rax
+ push %rax
+ UNWIND_HINT_IRET_REGS
+ push %_ASM_ARG1
+ .endif
+
\call_insn \call_target
+1:
+ UNWIND_HINT_RESTORE
/*
* "Restore" RSP from RBP, even though IRET has already unwound RSP to
@@ -363,10 +375,9 @@ SYM_FUNC_END(vmread_error_trampoline)
.section .text, "ax"
SYM_FUNC_START(vmx_do_interrupt_irqoff)
- /*
- * Calling an IDT gate directly; annotate away the CFI concern for now.
- * Should be fixed if possible.
- */
- ANNOTATE_NOCFI_SYM
+#ifdef CONFIG_X86_FRED
+ VMX_DO_EVENT_IRQOFF jmp asm_exc_fred_kvm_vmx has_error=1
+#else
VMX_DO_EVENT_IRQOFF CALL_NOSPEC _ASM_ARG1
+#endif
SYM_FUNC_END(vmx_do_interrupt_irqoff)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 5c5766467a61..7ccd2f66d55d 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7034,6 +7034,7 @@ void vmx_apicv_pre_state_restore(struct kvm_vcpu *vcpu)
}
void vmx_do_interrupt_irqoff(unsigned long entry);
+
void vmx_do_nmi_irqoff(void);
static void handle_nm_fault_irqoff(struct kvm_vcpu *vcpu)
@@ -7080,6 +7081,8 @@ static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu,
kvm_before_interrupt(vcpu, KVM_HANDLING_IRQ);
if (cpu_feature_enabled(X86_FEATURE_FRED))
fred_entry_from_kvm(EVENT_TYPE_EXTINT, vector);
+ else if (IS_ENABLED(CONFIG_FRED))
+ vmx_do_interrupt_irqoff(vector);
else
vmx_do_interrupt_irqoff(gate_offset((gate_desc *)host_idt_base + vector));
kvm_after_interrupt(vcpu);
^ permalink raw reply related [flat|nested] 52+ messages in thread
* Re: [PATCH v2 13/13] objtool: Validate kCFI calls
2025-04-30 19:03 ` Peter Zijlstra
@ 2025-05-01 15:56 ` Peter Zijlstra
0 siblings, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2025-05-01 15:56 UTC (permalink / raw)
To: Josh Poimboeuf
Cc: x86, kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen,
hpa, seanjc, pbonzini, ardb, kees, Arnd Bergmann, gregkh,
linux-hyperv, linux-kernel, kvm, linux-efi, samitolvanen, ojeda
On Wed, Apr 30, 2025 at 09:03:29PM +0200, Peter Zijlstra wrote:
> > > + list_for_each_entry(insn, &file->retpoline_call_list, call_node) {
> > > + struct symbol *sym = insn->sym;
> > > +
> > > + if (sym && sym->type == STT_FUNC && !sym->nocfi) {
> > > + struct instruction *prev =
> > > + prev_insn_same_sym(file, insn);
> > > +
> > > + if (!prev || prev->type != INSN_BUG) {
> > > + WARN_INSN(insn, "no-cfi indirect call!");
> > > + warnings++;
> >
> > Do we not care about indirect calls from !STT_FUNC?
I extended to also cover STT_NOTYPE, no additional warns.
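Something like this (sketch, not the final diff):

	-		if (sym && sym->type == STT_FUNC && !sym->nocfi) {
	+		if (sym && (sym->type == STT_FUNC || sym->type == STT_NOTYPE) &&
	+		    !sym->nocfi) {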
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH v2 00/13] objtool: Detect and warn about indirect calls in __nocfi functions
2025-05-01 15:38 ` Peter Zijlstra
@ 2025-05-01 18:30 ` Sean Christopherson
2025-05-01 18:42 ` H. Peter Anvin
` (3 more replies)
0 siblings, 4 replies; 52+ messages in thread
From: Sean Christopherson @ 2025-05-01 18:30 UTC (permalink / raw)
To: Peter Zijlstra
Cc: H. Peter Anvin, x86, kys, haiyangz, wei.liu, decui, tglx, mingo,
bp, dave.hansen, pbonzini, ardb, kees, Arnd Bergmann, gregkh,
jpoimboe, linux-hyperv, linux-kernel, kvm, linux-efi,
samitolvanen, ojeda, xin
On Thu, May 01, 2025, Peter Zijlstra wrote:
> On Thu, May 01, 2025 at 12:30:38PM +0200, Peter Zijlstra wrote:
> > On Wed, Apr 30, 2025 at 09:06:00PM +0200, Peter Zijlstra wrote:
> > > On Wed, Apr 30, 2025 at 07:24:15AM -0700, H. Peter Anvin wrote:
> > >
> > > > >KVM has another; the VMX interrupt injection stuff calls the IDT handler
> > > > >directly. Is there an alternative? Can we keep a table of Linux functions
> > > > >slighly higher up the call stack (asm_\cfunc ?) and add CFI to those?
> > >
> > > > We do have a table of handlers higher up in the stack in the form of
> > > > the dispatch tables for FRED. They don't in general even need the
> > > > assembly entry stubs, either.
> > >
> > > Oh, right. I'll go have a look at those.
> >
> > Right, so perhaps the easiest way around this is to setup the FRED entry
> > tables unconditionally, have VMX mandate CONFIG_FRED and then have it
> > always use the FRED entry points.
> >
> > Let me see how ugly that gets.
>
> Something like so... except this is broken. Its reporting spurious
> interrupts on vector 0x00, so something is buggered passing that vector
> along.
Uh, aren't you making this way more complex than it needs to be? IIUC, KVM never
uses the FRED hardware entry points, i.e. the FRED entry tables don't need to be
in place because they'll never be used. The only bit of code KVM needs is the
__fred_entry_from_kvm() glue.
Lightly tested, but this combo works for IRQs and NMIs on non-FRED hardware.
--
From 664468143109ab7c525c0babeba62195fa4c657e Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc@google.com>
Date: Thu, 1 May 2025 11:20:29 -0700
Subject: [PATCH 1/2] x86/fred: Play nice with invoking
asm_fred_entry_from_kvm() on non-FRED hardware
Modify asm_fred_entry_from_kvm() to allow it to be invoked by KVM even
when FRED isn't fully enabled, e.g. when running with CONFIG_X86_FRED=y
on non-FRED hardware. This will allow forcing KVM to always use the FRED
entry points for 64-bit kernels, which in turn will eliminate a rather
gross non-CFI indirect call that KVM uses to trampoline IRQs by doing IDT
lookups.
When FRED isn't enabled, simply skip ERETS and restore RBP and RSP from
the stack frame prior to doing a "regular" RET back to KVM (in quotes
because of all the RET mitigation horrors).
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/entry/entry_64_fred.S | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/arch/x86/entry/entry_64_fred.S b/arch/x86/entry/entry_64_fred.S
index 29c5c32c16c3..7aff2f0a285f 100644
--- a/arch/x86/entry/entry_64_fred.S
+++ b/arch/x86/entry/entry_64_fred.S
@@ -116,7 +116,8 @@ SYM_FUNC_START(asm_fred_entry_from_kvm)
movq %rsp, %rdi /* %rdi -> pt_regs */
call __fred_entry_from_kvm /* Call the C entry point */
POP_REGS
- ERETS
+
+ ALTERNATIVE "", __stringify(ERETS), X86_FEATURE_FRED
1:
/*
* Objtool doesn't understand what ERETS does, this hint tells it that
@@ -124,7 +125,7 @@ SYM_FUNC_START(asm_fred_entry_from_kvm)
* isn't strictly needed, but it's the simplest form.
*/
UNWIND_HINT_RESTORE
- pop %rbp
+ leave
RET
SYM_FUNC_END(asm_fred_entry_from_kvm)
base-commit: 45eb29140e68ffe8e93a5471006858a018480a45
--
2.49.0.906.g1f30a19c02-goog
From c50fb5a8a46058bbcfdcac0a100c2aa0f7f68f1c Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc@google.com>
Date: Thu, 1 May 2025 11:10:39 -0700
Subject: [PATCH 2/2] x86/fred: KVM: VMX: Always use FRED for IRQ+NMI when
CONFIG_X86_FRED=y
Now that FRED provides C-code entry points for handling IRQ and NMI exits,
use the FRED infrastructure for forwarding all such events even if FRED
isn't supported in hardware. Avoiding the non-FRED assembly trampolines
into the IDT handlers for IRQs eliminates the associated non-CFI indirect
call (KVM performs a CALL by doing a lookup on the IDT using the IRQ
vector).
Force FRED for 64-bit kernels if KVM_INTEL is enabled, as the benefits of
eliminating the IRQ trampoline usage far outweighs the code overhead for
FRED.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/Kconfig | 1 +
arch/x86/kvm/vmx/vmx.c | 4 ++--
2 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 2eeffcec5382..712a2ff28ce4 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -95,6 +95,7 @@ config KVM_SW_PROTECTED_VM
config KVM_INTEL
tristate "KVM for Intel (and compatible) processors support"
depends on KVM && IA32_FEAT_CTL
+ select X86_FRED if X86_64
select KVM_GENERIC_PRIVATE_MEM if INTEL_TDX_HOST
select KVM_GENERIC_MEMORY_ATTRIBUTES if INTEL_TDX_HOST
help
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index ef2d7208dd20..2ea89985107d 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6995,7 +6995,7 @@ static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu,
return;
kvm_before_interrupt(vcpu, KVM_HANDLING_IRQ);
- if (cpu_feature_enabled(X86_FEATURE_FRED))
+ if (IS_ENABLED(CONFIG_X86_FRED))
fred_entry_from_kvm(EVENT_TYPE_EXTINT, vector);
else
vmx_do_interrupt_irqoff(gate_offset((gate_desc *)host_idt_base + vector));
@@ -7268,7 +7268,7 @@ noinstr void vmx_handle_nmi(struct kvm_vcpu *vcpu)
return;
kvm_before_interrupt(vcpu, KVM_HANDLING_NMI);
- if (cpu_feature_enabled(X86_FEATURE_FRED))
+ if (IS_ENABLED(CONFIG_X86_FRED))
fred_entry_from_kvm(EVENT_TYPE_NMI, NMI_VECTOR);
else
vmx_do_nmi_irqoff();
--
2.49.0.906.g1f30a19c02-goog
^ permalink raw reply related [flat|nested] 52+ messages in thread
* Re: [PATCH v2 00/13] objtool: Detect and warn about indirect calls in __nocfi functions
2025-04-30 11:07 [PATCH v2 00/13] objtool: Detect and warn about indirect calls in __nocfi functions Peter Zijlstra
` (13 preceding siblings ...)
2025-04-30 14:24 ` [PATCH v2 00/13] objtool: Detect and warn about indirect calls in __nocfi functions H. Peter Anvin
@ 2025-05-01 18:33 ` Paolo Bonzini
2025-05-02 11:08 ` Peter Zijlstra
14 siblings, 1 reply; 52+ messages in thread
From: Paolo Bonzini @ 2025-05-01 18:33 UTC (permalink / raw)
To: Peter Zijlstra
Cc: x86, kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen,
hpa, seanjc, ardb, kees, Arnd Bergmann, gregkh, jpoimboe,
linux-hyperv, linux-kernel, kvm, linux-efi, samitolvanen, ojeda
On Wed, Apr 30, 2025 at 1:26 PM Peter Zijlstra <peterz@infradead.org> wrote:
> Notably the KVM fastop emulation stuff -- which I've completely rewritten for
> this version -- the generated code doesn't look horrific, but is slightly more
> verbose. I'm running on the assumption that instruction emulation is not super
> performance critical these days of zero VM-exit VMs etc.
It's definitely going to be slower, but I guess it's okay these days.
It's really only somewhat hot with really old processors
(pre-Westmere) and only when running big real mode code.
Paolo
> KVM has another; the VMX interrupt injection stuff calls the IDT handler
> directly. Is there an alternative? Can we keep a table of Linux functions
> slighly higher up the call stack (asm_\cfunc ?) and add CFI to those?
>
> HyperV hypercall page stuff, which I've previously suggested use direct calls,
> and which I've now converted (after getting properly annoyed with that code).
>
> Also available at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git x86/core
>
> Changes since v1:
>
> - complete rewrite of the fastop stuff
> - HyperV tweaks (Michael)
> - objtool changes (Josh)
>
>
> [1] https://lkml.kernel.org/r/20250410154556.GB9003@noisy.programming.kicks-ass.net
> [2] https://lkml.kernel.org/r/20250410194334.GA3248459@google.com
>
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH v2 00/13] objtool: Detect and warn about indirect calls in __nocfi functions
2025-05-01 18:30 ` Sean Christopherson
@ 2025-05-01 18:42 ` H. Peter Anvin
2025-05-01 18:59 ` Sean Christopherson
2025-05-02 5:46 ` Xin Li
` (2 subsequent siblings)
3 siblings, 1 reply; 52+ messages in thread
From: H. Peter Anvin @ 2025-05-01 18:42 UTC (permalink / raw)
To: Sean Christopherson, Peter Zijlstra
Cc: x86, kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen,
pbonzini, ardb, kees, Arnd Bergmann, gregkh, jpoimboe,
linux-hyperv, linux-kernel, kvm, linux-efi, samitolvanen, ojeda,
xin
On May 1, 2025 11:30:18 AM PDT, Sean Christopherson <seanjc@google.com> wrote:
>On Thu, May 01, 2025, Peter Zijlstra wrote:
>> On Thu, May 01, 2025 at 12:30:38PM +0200, Peter Zijlstra wrote:
>> > On Wed, Apr 30, 2025 at 09:06:00PM +0200, Peter Zijlstra wrote:
>> > > On Wed, Apr 30, 2025 at 07:24:15AM -0700, H. Peter Anvin wrote:
>> > >
>> > > > >KVM has another; the VMX interrupt injection stuff calls the IDT handler
>> > > > >directly. Is there an alternative? Can we keep a table of Linux functions
>> > > > >slighly higher up the call stack (asm_\cfunc ?) and add CFI to those?
>> > >
>> > > > We do have a table of handlers higher up in the stack in the form of
>> > > > the dispatch tables for FRED. They don't in general even need the
>> > > > assembly entry stubs, either.
>> > >
>> > > Oh, right. I'll go have a look at those.
>> >
>> > Right, so perhaps the easiest way around this is to setup the FRED entry
>> > tables unconditionally, have VMX mandate CONFIG_FRED and then have it
>> > always use the FRED entry points.
>> >
>> > Let me see how ugly that gets.
>>
>> Something like so... except this is broken. It's reporting spurious
>> interrupts on vector 0x00, so something is buggered passing that vector
>> along.
>
>Uh, aren't you making this way more complex than it needs to be? IIUC, KVM never
>uses the FRED hardware entry points, i.e. the FRED entry tables don't need to be
>in place because they'll never be used. The only bits of code KVM needs is the
>__fred_entry_from_kvm() glue.
>
>Lightly tested, but this combo works for IRQs and NMIs on non-FRED hardware.
>
>--
>From 664468143109ab7c525c0babeba62195fa4c657e Mon Sep 17 00:00:00 2001
>From: Sean Christopherson <seanjc@google.com>
>Date: Thu, 1 May 2025 11:20:29 -0700
>Subject: [PATCH 1/2] x86/fred: Play nice with invoking
> asm_fred_entry_from_kvm() on non-FRED hardware
>
>Modify asm_fred_entry_from_kvm() to allow it to be invoked by KVM even
>when FRED isn't fully enabled, e.g. when running with CONFIG_X86_FRED=y
>on non-FRED hardware. This will allow forcing KVM to always use the FRED
>entry points for 64-bit kernels, which in turn will eliminate a rather
>gross non-CFI indirect call that KVM uses to trampoline IRQs by doing IDT
>lookups.
>
>When FRED isn't enabled, simply skip ERETS and restore RBP and RSP from
>the stack frame prior to doing a "regular" RET back to KVM (in quotes
>because of all the RET mitigation horrors).
>
>Signed-off-by: Sean Christopherson <seanjc@google.com>
>---
> arch/x86/entry/entry_64_fred.S | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
>diff --git a/arch/x86/entry/entry_64_fred.S b/arch/x86/entry/entry_64_fred.S
>index 29c5c32c16c3..7aff2f0a285f 100644
>--- a/arch/x86/entry/entry_64_fred.S
>+++ b/arch/x86/entry/entry_64_fred.S
>@@ -116,7 +116,8 @@ SYM_FUNC_START(asm_fred_entry_from_kvm)
> movq %rsp, %rdi /* %rdi -> pt_regs */
> call __fred_entry_from_kvm /* Call the C entry point */
> POP_REGS
>- ERETS
>+
>+ ALTERNATIVE "", __stringify(ERETS), X86_FEATURE_FRED
> 1:
> /*
> * Objtool doesn't understand what ERETS does, this hint tells it that
>@@ -124,7 +125,7 @@ SYM_FUNC_START(asm_fred_entry_from_kvm)
> * isn't strictly needed, but it's the simplest form.
> */
> UNWIND_HINT_RESTORE
>- pop %rbp
>+ leave
> RET
>
> SYM_FUNC_END(asm_fred_entry_from_kvm)
>
>base-commit: 45eb29140e68ffe8e93a5471006858a018480a45
Ok maybe I'm being dense, but what is left other than simply calling __fred_entry_from_kvm() as a normal C function?
I'm on the go so there might be something in the code I'm missing, but on the surface...?
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH v2 00/13] objtool: Detect and warn about indirect calls in __nocfi functions
2025-05-01 18:42 ` H. Peter Anvin
@ 2025-05-01 18:59 ` Sean Christopherson
2025-05-02 6:12 ` Xin Li
0 siblings, 1 reply; 52+ messages in thread
From: Sean Christopherson @ 2025-05-01 18:59 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Peter Zijlstra, x86, kys, haiyangz, wei.liu, decui, tglx, mingo,
bp, dave.hansen, pbonzini, ardb, kees, Arnd Bergmann, gregkh,
jpoimboe, linux-hyperv, linux-kernel, kvm, linux-efi,
samitolvanen, ojeda, xin
On Thu, May 01, 2025, H. Peter Anvin wrote:
> On May 1, 2025 11:30:18 AM PDT, Sean Christopherson <seanjc@google.com> wrote:
> >On Thu, May 01, 2025, Peter Zijlstra wrote:
> >> On Thu, May 01, 2025 at 12:30:38PM +0200, Peter Zijlstra wrote:
> >> > On Wed, Apr 30, 2025 at 09:06:00PM +0200, Peter Zijlstra wrote:
> >> > > On Wed, Apr 30, 2025 at 07:24:15AM -0700, H. Peter Anvin wrote:
> >> > >
> >> > > > >KVM has another; the VMX interrupt injection stuff calls the IDT handler
> >> > > > >directly. Is there an alternative? Can we keep a table of Linux functions
> >> > > > >slighly higher up the call stack (asm_\cfunc ?) and add CFI to those?
> >> > >
> >> > > > We do have a table of handlers higher up in the stack in the form of
> >> > > > the dispatch tables for FRED. They don't in general even need the
> >> > > > assembly entry stubs, either.
> >> > >
> >> > > Oh, right. I'll go have a look at those.
> >> >
> >> > Right, so perhaps the easiest way around this is to setup the FRED entry
> >> > tables unconditionally, have VMX mandate CONFIG_FRED and then have it
> >> > always use the FRED entry points.
> >> >
> >> > Let me see how ugly that gets.
> >>
> >> Something like so... except this is broken. It's reporting spurious
> >> interrupts on vector 0x00, so something is buggered passing that vector
> >> along.
> >
> >Uh, aren't you making this way more complex than it needs to be? IIUC, KVM never
> >uses the FRED hardware entry points, i.e. the FRED entry tables don't need to be
> >in place because they'll never be used. The only bits of code KVM needs is the
> >__fred_entry_from_kvm() glue.
> >
> >Lightly tested, but this combo works for IRQs and NMIs on non-FRED hardware.
Hrm, and now I see that fred_extint() relies on fred_install_sysvec(), which makes
me quite curious as to why IRQs didn't go sideways. Oh, because sysvec_table[]
is statically defined at compile time except for PV crud.
So yeah, I think my patches are correct, they just need a small bit of
prep work to support dynamic setup of sysvec_table.
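To wit, a rough sketch of the dispatch in question (names approximate
the kernel's FRED code, not verbatim):

	static idtentry_t sysvec_table[NR_SYSTEM_VECTORS] __ro_after_init;

	void fred_install_sysvec(unsigned int vector, const idtentry_t function)
	{
		sysvec_table[vector - FIRST_SYSTEM_VECTOR] = function;
	}

	static noinstr void fred_extint(struct pt_regs *regs)
	{
		unsigned int v = regs->fred_ss.vector;

		if (v >= FIRST_SYSTEM_VECTOR)
			sysvec_table[v - FIRST_SYSTEM_VECTOR](regs);
		else
			common_interrupt(regs, v);
	}

which is why the table needs to be fully populated (PV bits included)
even though the hardware entry points are never used.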
> >--
> >From 664468143109ab7c525c0babeba62195fa4c657e Mon Sep 17 00:00:00 2001
> >From: Sean Christopherson <seanjc@google.com>
> >Date: Thu, 1 May 2025 11:20:29 -0700
> >Subject: [PATCH 1/2] x86/fred: Play nice with invoking
> > asm_fred_entry_from_kvm() on non-FRED hardware
> >
> >Modify asm_fred_entry_from_kvm() to allow it to be invoked by KVM even
> >when FRED isn't fully enabled, e.g. when running with CONFIG_X86_FRED=y
> >on non-FRED hardware. This will allow forcing KVM to always use the FRED
> >entry points for 64-bit kernels, which in turn will eliminate a rather
> >gross non-CFI indirect call that KVM uses to trampoline IRQs by doing IDT
> >lookups.
> >
> >When FRED isn't enabled, simply skip ERETS and restore RBP and RSP from
> >the stack frame prior to doing a "regular" RET back to KVM (in quotes
> >because of all the RET mitigation horrors).
> >
> >Signed-off-by: Sean Christopherson <seanjc@google.com>
> >---
> > arch/x86/entry/entry_64_fred.S | 5 +++--
> > 1 file changed, 3 insertions(+), 2 deletions(-)
> >
> >diff --git a/arch/x86/entry/entry_64_fred.S b/arch/x86/entry/entry_64_fred.S
> >index 29c5c32c16c3..7aff2f0a285f 100644
> >--- a/arch/x86/entry/entry_64_fred.S
> >+++ b/arch/x86/entry/entry_64_fred.S
> >@@ -116,7 +116,8 @@ SYM_FUNC_START(asm_fred_entry_from_kvm)
> > movq %rsp, %rdi /* %rdi -> pt_regs */
> > call __fred_entry_from_kvm /* Call the C entry point */
> > POP_REGS
> >- ERETS
> >+
> >+ ALTERNATIVE "", __stringify(ERETS), X86_FEATURE_FRED
> > 1:
> > /*
> > * Objtool doesn't understand what ERETS does, this hint tells it that
> >@@ -124,7 +125,7 @@ SYM_FUNC_START(asm_fred_entry_from_kvm)
> > * isn't strictly needed, but it's the simplest form.
> > */
> > UNWIND_HINT_RESTORE
> >- pop %rbp
> >+ leave
> > RET
> >
> > SYM_FUNC_END(asm_fred_entry_from_kvm)
> >
> >base-commit: 45eb29140e68ffe8e93a5471006858a018480a45
>
> Ok maybe I'm being dense, but what is left other than simply calling
> __fred_entry_from_kvm() as a normal C function?
>
> I'm on the go so there might be something in the code I'm missing, but on the
> surface...?
I'm sure it's doable, though I'd be more than a little nervous about diverging
from what FRED=y does, e.g. in case code somewhere expects the stack to look
exactly like a real FRED event.
And since we'd still need the assembly to support FRED=y, I don't see any point
in adding more code when it's trivially easy to have asm_fred_entry_from_kvm()
skip ERETS.
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH v2 00/13] objtool: Detect and warn about indirect calls in __nocfi functions
2025-05-01 18:30 ` Sean Christopherson
2025-05-01 18:42 ` H. Peter Anvin
@ 2025-05-02 5:46 ` Xin Li
2025-05-02 5:48 ` Xin Li
2025-05-02 8:40 ` Peter Zijlstra
3 siblings, 0 replies; 52+ messages in thread
From: Xin Li @ 2025-05-02 5:46 UTC (permalink / raw)
To: Sean Christopherson, Peter Zijlstra
Cc: H. Peter Anvin, x86, kys, haiyangz, wei.liu, decui, tglx, mingo,
bp, dave.hansen, pbonzini, ardb, kees, Arnd Bergmann, gregkh,
jpoimboe, linux-hyperv, linux-kernel, kvm, linux-efi,
samitolvanen, ojeda
>> Something like so... except this is broken. It's reporting spurious
>> interrupts on vector 0x00, so something is buggered passing that vector
>> along.
I'm a bit late to the party :)
Peter kind of got what I had in the FRED patch set v8 or earlier:
https://lore.kernel.org/lkml/20230410081438.1750-34-xin3.li@intel.com/
> Uh, aren't you making this way more complex than it needs to be? IIUC, KVM never
> uses the FRED hardware entry points, i.e. the FRED entry tables don't need to be
> in place because they'll never be used. The only bits of code KVM needs is the
> __fred_entry_from_kvm() glue.
+1
>
> Lightly tested, but this combo works for IRQs and NMIs on non-FRED hardware.
>
> --
> From 664468143109ab7c525c0babeba62195fa4c657e Mon Sep 17 00:00:00 2001
> From: Sean Christopherson <seanjc@google.com>
> Date: Thu, 1 May 2025 11:20:29 -0700
> Subject: [PATCH 1/2] x86/fred: Play nice with invoking
> asm_fred_entry_from_kvm() on non-FRED hardware
>
> Modify asm_fred_entry_from_kvm() to allow it to be invoked by KVM even
> when FRED isn't fully enabled, e.g. when running with CONFIG_X86_FRED=y
> on non-FRED hardware. This will allow forcing KVM to always use the FRED
> entry points for 64-bit kernels, which in turn will eliminate a rather
> gross non-CFI indirect call that KVM uses to trampoline IRQs by doing IDT
> lookups.
>
> When FRED isn't enabled, simply skip ERETS and restore RBP and RSP from
> the stack frame prior to doing a "regular" RET back to KVM (in quotes
> because of all the RET mitigation horrors).
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> arch/x86/entry/entry_64_fred.S | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/entry/entry_64_fred.S b/arch/x86/entry/entry_64_fred.S
> index 29c5c32c16c3..7aff2f0a285f 100644
> --- a/arch/x86/entry/entry_64_fred.S
> +++ b/arch/x86/entry/entry_64_fred.S
> @@ -116,7 +116,8 @@ SYM_FUNC_START(asm_fred_entry_from_kvm)
> movq %rsp, %rdi /* %rdi -> pt_regs */
> call __fred_entry_from_kvm /* Call the C entry point */
> POP_REGS
> - ERETS
> +
> + ALTERNATIVE "", __stringify(ERETS), X86_FEATURE_FRED
Neat!
I once had a plan to do this with "sub $0x8,%rsp; iret;" for the
non-FRED case. But obviously doing nothing here is best.
> 1:
> /*
> * Objtool doesn't understand what ERETS does, this hint tells it that
> @@ -124,7 +125,7 @@ SYM_FUNC_START(asm_fred_entry_from_kvm)
> * isn't strictly needed, but it's the simplest form.
> */
> UNWIND_HINT_RESTORE
> - pop %rbp
> + leave
This is a smart change.
When !X86_FEATURE_FRED, the FRED stack frame set up for ERETS is
implicitly discarded by LEAVE. Maybe add a comment to explain that
LEAVE works for both cases?
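(LEAVE is architecturally just

	mov	%rbp, %rsp
	pop	%rbp

With FRED, ERETS-to-self already restored RSP from the frame's RSP
slot, which the assembly filled from %rbp, so the mov is a no-op;
without FRED it discards the never-consumed frame. Either way the pop
then restores the RBP saved at function entry.)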
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH v2 00/13] objtool: Detect and warn about indirect calls in __nocfi functions
2025-05-01 18:30 ` Sean Christopherson
2025-05-01 18:42 ` H. Peter Anvin
2025-05-02 5:46 ` Xin Li
@ 2025-05-02 5:48 ` Xin Li
2025-05-02 19:43 ` H. Peter Anvin
2025-05-02 8:40 ` Peter Zijlstra
3 siblings, 1 reply; 52+ messages in thread
From: Xin Li @ 2025-05-02 5:48 UTC (permalink / raw)
To: Sean Christopherson, Peter Zijlstra
Cc: H. Peter Anvin, x86, kys, haiyangz, wei.liu, decui, tglx, mingo,
bp, dave.hansen, pbonzini, ardb, kees, Arnd Bergmann, gregkh,
jpoimboe, linux-hyperv, linux-kernel, kvm, linux-efi,
samitolvanen, ojeda
On 5/1/2025 11:30 AM, Sean Christopherson wrote:
> From c50fb5a8a46058bbcfdcac0a100c2aa0f7f68f1c Mon Sep 17 00:00:00 2001
> From: Sean Christopherson<seanjc@google.com>
> Date: Thu, 1 May 2025 11:10:39 -0700
> Subject: [PATCH 2/2] x86/fred: KVM: VMX: Always use FRED for IRQ+NMI when
> CONFIG_X86_FRED=y
>
> Now that FRED provides C-code entry points for handling IRQ and NMI exits,
> use the FRED infrastructure for forwarding all such events even if FRED
> isn't supported in hardware. Avoiding the non-FRED assembly trampolines
> into the IDT handlers for IRQs eliminates the associated non-CFI indirect
> call (KVM performs a CALL by doing a lookup on the IDT using the IRQ
> vector).
>
> Force FRED for 64-bit kernels if KVM_INTEL is enabled, as the benefits of
> eliminating the IRQ trampoline usage far outweigh the code overhead for
> FRED.
>
> Suggested-by: Peter Zijlstra<peterz@infradead.org>
> Signed-off-by: Sean Christopherson<seanjc@google.com>
> ---
> arch/x86/kvm/Kconfig | 1 +
> arch/x86/kvm/vmx/vmx.c | 4 ++--
> 2 files changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
> index 2eeffcec5382..712a2ff28ce4 100644
> --- a/arch/x86/kvm/Kconfig
> +++ b/arch/x86/kvm/Kconfig
> @@ -95,6 +95,7 @@ config KVM_SW_PROTECTED_VM
> config KVM_INTEL
> tristate "KVM for Intel (and compatible) processors support"
> depends on KVM && IA32_FEAT_CTL
> + select X86_FRED if X86_64
I LOVE this change, but I'm not sure everyone will be happy with it.
> select KVM_GENERIC_PRIVATE_MEM if INTEL_TDX_HOST
> select KVM_GENERIC_MEMORY_ATTRIBUTES if INTEL_TDX_HOST
> help
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index ef2d7208dd20..2ea89985107d 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -6995,7 +6995,7 @@ static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu,
> return;
>
> kvm_before_interrupt(vcpu, KVM_HANDLING_IRQ);
> - if (cpu_feature_enabled(X86_FEATURE_FRED))
> + if (IS_ENABLED(CONFIG_X86_FRED))
"if (IS_ENABLED(CONFIG_X86_64))"?
> fred_entry_from_kvm(EVENT_TYPE_EXTINT, vector);
> else
> vmx_do_interrupt_irqoff(gate_offset((gate_desc *)host_idt_base + vector));
> @@ -7268,7 +7268,7 @@ noinstr void vmx_handle_nmi(struct kvm_vcpu *vcpu)
> return;
>
> kvm_before_interrupt(vcpu, KVM_HANDLING_NMI);
> - if (cpu_feature_enabled(X86_FEATURE_FRED))
> + if (IS_ENABLED(CONFIG_X86_FRED))
Ditto.
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH v2 00/13] objtool: Detect and warn about indirect calls in __nocfi functions
2025-05-01 18:59 ` Sean Christopherson
@ 2025-05-02 6:12 ` Xin Li
0 siblings, 0 replies; 52+ messages in thread
From: Xin Li @ 2025-05-02 6:12 UTC (permalink / raw)
To: Sean Christopherson, H. Peter Anvin
Cc: Peter Zijlstra, x86, kys, haiyangz, wei.liu, decui, tglx, mingo,
bp, dave.hansen, pbonzini, ardb, kees, Arnd Bergmann, gregkh,
jpoimboe, linux-hyperv, linux-kernel, kvm, linux-efi,
samitolvanen, ojeda
On 5/1/2025 11:59 AM, Sean Christopherson wrote:
>> Ok maybe I'm being dense, but what is left other than simply calling
>> __fred_entry_from_kvm() as a normal C function?
>>
>> I'm on the go so there might be something in the code I'm missing, but on the
>> surface...?
> I'm sure it's doable, though I'd be more than a little nervous about diverging
> from what FRED=y does, e.g. in case code somewhere expects the stack to look
> exactly like a real FRED event.
__fred_entry_from_kvm() accepts a pt_regs structure pointer, with the
event type and vector in the FRED stack frame. They are set up in the
assembly.
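Roughly, the frame the assembly builds looks like this (low address to
high; a sketch based on the entry_64_fred.S comments, not verbatim):

	error code	/* 0 for IRQ/NMI */
	return RIP	/* the label just past ERETS */
	CS + aux info
	RFLAGS
	RSP
	SS + event info	/* event type and vector live here */
	event data	/* 0 for IRQ/NMI */
	reserved

so __fred_entry_from_kvm() can pull the type and vector out of the
pt_regs overlay exactly as it would for a real FRED event.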
> And since we'd still need the assembly to support FRED=y, I don't see any point
> in adding more code when it's trivially easy to have asm_fred_entry_from_kvm()
> skip ERETS.
Yeah, your change seems minimal to me.
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH v2 00/13] objtool: Detect and warn about indirect calls in __nocfi functions
2025-05-01 18:30 ` Sean Christopherson
` (2 preceding siblings ...)
2025-05-02 5:48 ` Xin Li
@ 2025-05-02 8:40 ` Peter Zijlstra
2025-05-02 19:53 ` Sean Christopherson
3 siblings, 1 reply; 52+ messages in thread
From: Peter Zijlstra @ 2025-05-02 8:40 UTC (permalink / raw)
To: Sean Christopherson
Cc: H. Peter Anvin, x86, kys, haiyangz, wei.liu, decui, tglx, mingo,
bp, dave.hansen, pbonzini, ardb, kees, Arnd Bergmann, gregkh,
jpoimboe, linux-hyperv, linux-kernel, kvm, linux-efi,
samitolvanen, ojeda, xin
On Thu, May 01, 2025 at 11:30:18AM -0700, Sean Christopherson wrote:
> Uh, aren't you making this way more complex than it needs to be?
Possibly :-)
> IIUC, KVM never
> uses the FRED hardware entry points, i.e. the FRED entry tables don't need to be
> in place because they'll never be used. The only bits of code KVM needs is the
> __fred_entry_from_kvm() glue.
But __fred_entry_from_kvm() calls into fred_extint(), which then
directly uses the fred sysvec_table[] for dispatch. How would we not
have to set up that table?
> Lightly tested, but this combo works for IRQs and NMIs on non-FRED hardware.
So the FRED NMI code is significantly different from the IDT NMI code
and I really didn't want to go mixing those.
If we get a nested NMI I don't think it'll behave well.
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH v2 00/13] objtool: Detect and warn about indirect calls in __nocfi functions
2025-05-01 18:33 ` Paolo Bonzini
@ 2025-05-02 11:08 ` Peter Zijlstra
0 siblings, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2025-05-02 11:08 UTC (permalink / raw)
To: Paolo Bonzini
Cc: x86, kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen,
hpa, seanjc, ardb, kees, Arnd Bergmann, gregkh, jpoimboe,
linux-hyperv, linux-kernel, kvm, linux-efi, samitolvanen, ojeda
On Thu, May 01, 2025 at 08:33:57PM +0200, Paolo Bonzini wrote:
> On Wed, Apr 30, 2025 at 1:26 PM Peter Zijlstra <peterz@infradead.org> wrote:
> > Notably the KVM fastop emulation stuff -- which I've completely rewritten for
> > this version -- the generated code doesn't look horrific, but is slightly more
> > verbose. I'm running on the assumption that instruction emulation is not super
> > performance critical these days of zero VM-exit VMs etc.
>
> It's definitely going to be slower, but I guess it's okay these days.
Yeah, it adds a bunch of branches at the very least. But that was indeed
the argument, we shouldn't care much these days.
> It's really only somewhat hot with really old processors
> (pre-Westmere) and only when running big real mode code.
Right, really old stuff :-)
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH v2 00/13] objtool: Detect and warn about indirect calls in __nocfi functions
2025-05-02 5:48 ` Xin Li
@ 2025-05-02 19:43 ` H. Peter Anvin
0 siblings, 0 replies; 52+ messages in thread
From: H. Peter Anvin @ 2025-05-02 19:43 UTC (permalink / raw)
To: Xin Li, Sean Christopherson, Peter Zijlstra
Cc: x86, kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen,
pbonzini, ardb, kees, Arnd Bergmann, gregkh, jpoimboe,
linux-hyperv, linux-kernel, kvm, linux-efi, samitolvanen, ojeda
On May 1, 2025 10:48:42 PM PDT, Xin Li <xin@zytor.com> wrote:
>On 5/1/2025 11:30 AM, Sean Christopherson wrote:
>> From c50fb5a8a46058bbcfdcac0a100c2aa0f7f68f1c Mon Sep 17 00:00:00 2001
>> From: Sean Christopherson<seanjc@google.com>
>> Date: Thu, 1 May 2025 11:10:39 -0700
>> Subject: [PATCH 2/2] x86/fred: KVM: VMX: Always use FRED for IRQ+NMI when
>> CONFIG_X86_FRED=y
>>
>> Now that FRED provides C-code entry points for handling IRQ and NMI exits,
>> use the FRED infrastructure for forwarding all such events even if FRED
>> isn't supported in hardware. Avoiding the non-FRED assembly trampolines
>> into the IDT handlers for IRQs eliminates the associated non-CFI indirect
>> call (KVM performs a CALL by doing a lookup on the IDT using the IRQ
>> vector).
>>
>> Force FRED for 64-bit kernels if KVM_INTEL is enabled, as the benefits of
>> eliminating the IRQ trampoline usage far outweigh the code overhead for
>> FRED.
>>
>> Suggested-by: Peter Zijlstra<peterz@infradead.org>
>> Signed-off-by: Sean Christopherson<seanjc@google.com>
>> ---
>> arch/x86/kvm/Kconfig | 1 +
>> arch/x86/kvm/vmx/vmx.c | 4 ++--
>> 2 files changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
>> index 2eeffcec5382..712a2ff28ce4 100644
>> --- a/arch/x86/kvm/Kconfig
>> +++ b/arch/x86/kvm/Kconfig
>> @@ -95,6 +95,7 @@ config KVM_SW_PROTECTED_VM
>> config KVM_INTEL
>> tristate "KVM for Intel (and compatible) processors support"
>> depends on KVM && IA32_FEAT_CTL
>> + select X86_FRED if X86_64
>
>I LOVE this change, but I'm not sure everyone will be happy with it.
>
>> select KVM_GENERIC_PRIVATE_MEM if INTEL_TDX_HOST
>> select KVM_GENERIC_MEMORY_ATTRIBUTES if INTEL_TDX_HOST
>> help
>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>> index ef2d7208dd20..2ea89985107d 100644
>> --- a/arch/x86/kvm/vmx/vmx.c
>> +++ b/arch/x86/kvm/vmx/vmx.c
>> @@ -6995,7 +6995,7 @@ static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu,
>> return;
>> kvm_before_interrupt(vcpu, KVM_HANDLING_IRQ);
>> - if (cpu_feature_enabled(X86_FEATURE_FRED))
>> + if (IS_ENABLED(CONFIG_X86_FRED))
>
>"if (IS_ENABLED(CONFIG_X86_64))"?
>
>> fred_entry_from_kvm(EVENT_TYPE_EXTINT, vector);
>> else
>> vmx_do_interrupt_irqoff(gate_offset((gate_desc *)host_idt_base + vector));
>> @@ -7268,7 +7268,7 @@ noinstr void vmx_handle_nmi(struct kvm_vcpu *vcpu)
>> return;
>> kvm_before_interrupt(vcpu, KVM_HANDLING_NMI);
>> - if (cpu_feature_enabled(X86_FEATURE_FRED))
>> + if (IS_ENABLED(CONFIG_X86_FRED))
>
>Ditto.
I don't think anyone will have a problem with compiling it in... it is such a small amount of code.
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH v2 00/13] objtool: Detect and warn about indirect calls in __nocfi functions
2025-05-02 8:40 ` Peter Zijlstra
@ 2025-05-02 19:53 ` Sean Christopherson
2025-05-03 9:50 ` Peter Zijlstra
0 siblings, 1 reply; 52+ messages in thread
From: Sean Christopherson @ 2025-05-02 19:53 UTC (permalink / raw)
To: Peter Zijlstra
Cc: H. Peter Anvin, x86, kys, haiyangz, wei.liu, decui, tglx, mingo,
bp, dave.hansen, pbonzini, ardb, kees, Arnd Bergmann, gregkh,
jpoimboe, linux-hyperv, linux-kernel, kvm, linux-efi,
samitolvanen, ojeda, xin
[-- Attachment #1: Type: text/plain, Size: 1711 bytes --]
On Fri, May 02, 2025, Peter Zijlstra wrote:
> On Thu, May 01, 2025 at 11:30:18AM -0700, Sean Christopherson wrote:
>
> > Uh, aren't you making this way more complex than it needs to be?
>
> Possibly :-)
>
> > IIUC, KVM never
> > uses the FRED hardware entry points, i.e. the FRED entry tables don't need to be
> > in place because they'll never be used. The only bits of code KVM needs is the
> > __fred_entry_from_kvm() glue.
>
> But __fred_entry_from_kvm() calls into fred_extint(), which then
> directly uses the fred sysvec_table[] for dispatch. How would we not
> have to set up that table?
I missed that the first time around. From my self-reply:
: Hrm, and now I see that fred_extint() relies on fred_install_sysvec(), which makes
: me quite curious as to why IRQs didn't go sideways. Oh, because sysvec_table[]
: is statically defined at compile time except for PV crud.
:
: So yeah, I think my patches are correct, they just need a small bit of
: prep work to support dynamic setup of sysvec_table.
> > Lightly tested, but this combo works for IRQs and NMIs on non-FRED hardware.
>
> So the FRED NMI code is significantly different from the IDT NMI code
> and I really didn't want to go mixing those.
>
> If we get a nested NMI I don't think it'll behave well.
Ah, because FRED hardware doesn't have the crazy NMI unblocking behavior, and
the FRED NMI entry code relies on that.
But I don't see why we need to care about NMIs from KVM, while they do bounce
through assembly to create a stack frame, the actual CALL is direct to
asm_exc_nmi_kvm_vmx(). If it's just the unwind hint that's needed, that
should be easy enough to add.
The attached patches handle the IRQ case and are passing my testing.
[-- Attachment #2: 0001-x86-fred-Install-system-vector-handlers-even-if-FRED.patch --]
[-- Type: text/x-diff, Size: 2455 bytes --]
From e605366a5ff6dfcfb45084828585e5fcfc5d3bcc Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc@google.com>
Date: Fri, 2 May 2025 07:24:01 -0700
Subject: [PATCH 1/3] x86/fred: Install system vector handlers even if FRED
isn't fully enabled
Install the system vector IRQ handlers for FRED even if FRED isn't fully
enabled in hardware. This will allow KVM to use the FRED IRQ path even
on non-FRED hardware, which in turn will eliminate a non-CFI indirect CALL
(KVM currently invokes the IRQ handler via an IDT lookup on the vector).
Not-yet-Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
[sean: extract from diff, drop stub, write changelog]
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/include/asm/idtentry.h | 9 ++-------
arch/x86/kernel/irqinit.c | 6 ++++--
2 files changed, 6 insertions(+), 9 deletions(-)
diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h
index a4ec27c67988..abd637e54e94 100644
--- a/arch/x86/include/asm/idtentry.h
+++ b/arch/x86/include/asm/idtentry.h
@@ -460,17 +460,12 @@ __visible noinstr void func(struct pt_regs *regs, \
#endif
void idt_install_sysvec(unsigned int n, const void *function);
-
-#ifdef CONFIG_X86_FRED
void fred_install_sysvec(unsigned int vector, const idtentry_t function);
-#else
-static inline void fred_install_sysvec(unsigned int vector, const idtentry_t function) { }
-#endif
#define sysvec_install(vector, function) { \
- if (cpu_feature_enabled(X86_FEATURE_FRED)) \
+ if (IS_ENABLED(CONFIG_X86_FRED)) \
fred_install_sysvec(vector, function); \
- else \
+ if (!cpu_feature_enabled(X86_FEATURE_FRED)) \
idt_install_sysvec(vector, asm_##function); \
}
diff --git a/arch/x86/kernel/irqinit.c b/arch/x86/kernel/irqinit.c
index f79c5edc0b89..6ab9eac64670 100644
--- a/arch/x86/kernel/irqinit.c
+++ b/arch/x86/kernel/irqinit.c
@@ -97,9 +97,11 @@ void __init native_init_IRQ(void)
/* Execute any quirks before the call gates are initialised: */
x86_init.irqs.pre_vector_init();
- if (cpu_feature_enabled(X86_FEATURE_FRED))
+ /* FRED's IRQ path may be used even if FRED isn't fully enabled. */
+ if (IS_ENABLED(CONFIG_X86_FRED))
fred_complete_exception_setup();
- else
+
+ if (!cpu_feature_enabled(X86_FEATURE_FRED))
idt_setup_apic_and_irq_gates();
lapic_assign_system_vectors();
base-commit: 45eb29140e68ffe8e93a5471006858a018480a45
--
2.49.0.906.g1f30a19c02-goog
[-- Attachment #3: 0002-x86-fred-Play-nice-with-invoking-asm_fred_entry_from.patch --]
[-- Type: text/x-diff, Size: 1722 bytes --]
From 12dc39eeb3d5ed1950a9bbaf4ac68c46943d0e9d Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc@google.com>
Date: Thu, 1 May 2025 11:20:29 -0700
Subject: [PATCH 2/3] x86/fred: Play nice with invoking
asm_fred_entry_from_kvm() on non-FRED hardware
Modify asm_fred_entry_from_kvm() to allow it to be invoked by KVM even
when FRED isn't fully enabled, e.g. when running with CONFIG_X86_FRED=y
on non-FRED hardware. This will allow forcing KVM to always use the FRED
entry points for 64-bit kernels, which in turn will eliminate a rather
gross non-CFI indirect call that KVM uses to trampoline IRQs by doing IDT
lookups.
When FRED isn't enabled, simply skip ERETS and restore RBP and RSP from
the stack frame prior to doing a "regular" RET back to KVM (in quotes
because of all the RET mitigation horrors).
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/entry/entry_64_fred.S | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/arch/x86/entry/entry_64_fred.S b/arch/x86/entry/entry_64_fred.S
index 29c5c32c16c3..7aff2f0a285f 100644
--- a/arch/x86/entry/entry_64_fred.S
+++ b/arch/x86/entry/entry_64_fred.S
@@ -116,7 +116,8 @@ SYM_FUNC_START(asm_fred_entry_from_kvm)
movq %rsp, %rdi /* %rdi -> pt_regs */
call __fred_entry_from_kvm /* Call the C entry point */
POP_REGS
- ERETS
+
+ ALTERNATIVE "", __stringify(ERETS), X86_FEATURE_FRED
1:
/*
* Objtool doesn't understand what ERETS does, this hint tells it that
@@ -124,7 +125,7 @@ SYM_FUNC_START(asm_fred_entry_from_kvm)
* isn't strictly needed, but it's the simplest form.
*/
UNWIND_HINT_RESTORE
- pop %rbp
+ leave
RET
SYM_FUNC_END(asm_fred_entry_from_kvm)
--
2.49.0.906.g1f30a19c02-goog
[-- Attachment #4: 0003-x86-fred-KVM-VMX-Always-use-FRED-for-IRQs-when-CONFI.patch --]
[-- Type: text/x-diff, Size: 2775 bytes --]
From 047137843838e48af8ec9cfd36e62e605b43cacd Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc@google.com>
Date: Thu, 1 May 2025 11:10:39 -0700
Subject: [PATCH 3/3] x86/fred: KVM: VMX: Always use FRED for IRQs when
CONFIG_X86_FRED=y
Now that FRED provides C-code entry points for handling IRQs, use the FRED
infrastructure for forwarding IRQs even if FRED isn't fully enabled, e.g. isn't
supported in hardware. Avoiding the non-FRED assembly trampolines into
the IDT handlers for IRQs eliminates the associated non-CFI indirect call
(KVM performs a CALL by doing a lookup on the IDT using the IRQ vector).
Keep NMIs on the legacy IDT path, as the FRED NMI entry code relies on
FRED's architectural behavior with respect to NMI blocking, i.e. doesn't
jump through the myriad hoops needed to deal with IRET "unexpectedly"
unmasking NMIs. KVM's NMI path already makes a direct CALL to C-code,
i.e. isn't problematic for CFI. KVM does make a short detour through
assembly code to build the stack frame, but the "FRED entry from KVM"
path does the same.
Force FRED for 64-bit kernels if KVM_INTEL is enabled, as the benefits of
eliminating the IRQ trampoline usage far outweigh the code overhead for
FRED.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/Kconfig | 1 +
arch/x86/kvm/vmx/vmx.c | 8 +++++++-
2 files changed, 8 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 2eeffcec5382..712a2ff28ce4 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -95,6 +95,7 @@ config KVM_SW_PROTECTED_VM
config KVM_INTEL
tristate "KVM for Intel (and compatible) processors support"
depends on KVM && IA32_FEAT_CTL
+ select X86_FRED if X86_64
select KVM_GENERIC_PRIVATE_MEM if INTEL_TDX_HOST
select KVM_GENERIC_MEMORY_ATTRIBUTES if INTEL_TDX_HOST
help
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index ef2d7208dd20..3139658de9ed 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6994,8 +6994,14 @@ static void handle_external_interrupt_irqoff(struct kvm_vcpu *vcpu,
"unexpected VM-Exit interrupt info: 0x%x", intr_info))
return;
+ /*
+ * Invoke the kernel's IRQ handler for the vector. Use the FRED path
+ * when it's available even if FRED isn't fully enabled, e.g. even if
+ * FRED isn't supported in hardware, in order to avoid the indirect
+ * CALL in the non-FRED path.
+ */
kvm_before_interrupt(vcpu, KVM_HANDLING_IRQ);
- if (cpu_feature_enabled(X86_FEATURE_FRED))
+ if (IS_ENABLED(CONFIG_X86_FRED))
fred_entry_from_kvm(EVENT_TYPE_EXTINT, vector);
else
vmx_do_interrupt_irqoff(gate_offset((gate_desc *)host_idt_base + vector));
--
2.49.0.906.g1f30a19c02-goog
^ permalink raw reply related [flat|nested] 52+ messages in thread
* Re: [PATCH v2 00/13] objtool: Detect and warn about indirect calls in __nocfi functions
2025-05-02 19:53 ` Sean Christopherson
@ 2025-05-03 9:50 ` Peter Zijlstra
2025-05-03 18:28 ` Josh Poimboeuf
0 siblings, 1 reply; 52+ messages in thread
From: Peter Zijlstra @ 2025-05-03 9:50 UTC (permalink / raw)
To: Sean Christopherson
Cc: H. Peter Anvin, x86, kys, haiyangz, wei.liu, decui, tglx, mingo,
bp, dave.hansen, pbonzini, ardb, kees, Arnd Bergmann, gregkh,
jpoimboe, linux-hyperv, linux-kernel, kvm, linux-efi,
samitolvanen, ojeda, xin
On Fri, May 02, 2025 at 12:53:36PM -0700, Sean Christopherson wrote:
> > So the FRED NMI code is significantly different from the IDT NMI code
> > and I really didn't want to go mixing those.
> >
> > If we get a nested NMI I don't think it'll behave well.
>
> Ah, because FRED hardware doesn't have the crazy NMI unblocking behavior, and
> the FRED NMI entry code relies on that.
Just so.
> But I don't see why we need to care about NMIs from KVM, while they do bounce
> through assembly to create a stack frame, the actual CALL is direct to
> asm_exc_nmi_kvm_vmx(). If it's just the unwind hint that's needed, that
> should be easy enough to add.
That part is all fine.
> arch/x86/entry/entry_64_fred.S | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/entry/entry_64_fred.S b/arch/x86/entry/entry_64_fred.S
> index 29c5c32c16c3..7aff2f0a285f 100644
> --- a/arch/x86/entry/entry_64_fred.S
> +++ b/arch/x86/entry/entry_64_fred.S
> @@ -116,7 +116,8 @@ SYM_FUNC_START(asm_fred_entry_from_kvm)
> movq %rsp, %rdi /* %rdi -> pt_regs */
> call __fred_entry_from_kvm /* Call the C entry point */
> POP_REGS
> - ERETS
> +
> + ALTERNATIVE "", __stringify(ERETS), X86_FEATURE_FRED
> 1:
> /*
> * Objtool doesn't understand what ERETS does, this hint tells it that
> @@ -124,7 +125,7 @@ SYM_FUNC_START(asm_fred_entry_from_kvm)
> * isn't strictly needed, but it's the simplest form.
> */
> UNWIND_HINT_RESTORE
> - pop %rbp
> + leave
> RET
So this, while clever, might be a problem with ORC unwinding. Because
now the stack is different depending on the alternative, and we can't
deal with that.
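To make that concrete: at the 1: label the two alternatives leave
different stacks behind, roughly:

	FRED=y:  ERETS consumed the 64-byte frame and restored RSP from it
	FRED=n:  nothing ran, RSP still points at the bottom of the frame

and ORC has exactly one entry per instruction address, so it cannot
describe both states at 1:.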
Anyway, I'll go have a poke on Monday (or Tuesday if Monday turns out to
be a bank holiday :-).
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH v2 00/13] objtool: Detect and warn about indirect calls in __nocfi functions
2025-05-03 9:50 ` Peter Zijlstra
@ 2025-05-03 18:28 ` Josh Poimboeuf
2025-05-06 7:31 ` Peter Zijlstra
0 siblings, 1 reply; 52+ messages in thread
From: Josh Poimboeuf @ 2025-05-03 18:28 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Sean Christopherson, H. Peter Anvin, x86, kys, haiyangz, wei.liu,
decui, tglx, mingo, bp, dave.hansen, pbonzini, ardb, kees,
Arnd Bergmann, gregkh, linux-hyperv, linux-kernel, kvm, linux-efi,
samitolvanen, ojeda, xin
On Sat, May 03, 2025 at 11:50:23AM +0200, Peter Zijlstra wrote:
> > +++ b/arch/x86/entry/entry_64_fred.S
> > @@ -116,7 +116,8 @@ SYM_FUNC_START(asm_fred_entry_from_kvm)
> > movq %rsp, %rdi /* %rdi -> pt_regs */
> > call __fred_entry_from_kvm /* Call the C entry point */
> > POP_REGS
> > - ERETS
> > +
> > + ALTERNATIVE "", __stringify(ERETS), X86_FEATURE_FRED
> > 1:
> > /*
> > * Objtool doesn't understand what ERETS does, this hint tells it that
> > @@ -124,7 +125,7 @@ SYM_FUNC_START(asm_fred_entry_from_kvm)
> > * isn't strictly needed, but it's the simplest form.
> > */
> > UNWIND_HINT_RESTORE
> > - pop %rbp
> > + leave
> > RET
>
> So this, while clever, might be a problem with ORC unwinding. Because
> now the stack is different depending on the alternative, and we can't
> deal with that.
>
> Anyway, I'll go have a poke on Monday (or Tuesday if Monday turns out to
> be a bank holiday :-).
Can we just adjust the stack in the alternative?
ALTERNATIVE "add $64 %rsp", __stringify(ERETS), X86_FEATURE_FRED
1:
UNWIND_HINT_RESTORE
pop %rbp
RET
--
Josh
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH v2 00/13] objtool: Detect and warn about indirect calls in __nocfi functions
2025-05-03 18:28 ` Josh Poimboeuf
@ 2025-05-06 7:31 ` Peter Zijlstra
2025-05-06 13:32 ` Peter Zijlstra
0 siblings, 1 reply; 52+ messages in thread
From: Peter Zijlstra @ 2025-05-06 7:31 UTC (permalink / raw)
To: Josh Poimboeuf
Cc: Sean Christopherson, H. Peter Anvin, x86, kys, haiyangz, wei.liu,
decui, tglx, mingo, bp, dave.hansen, pbonzini, ardb, kees,
Arnd Bergmann, gregkh, linux-hyperv, linux-kernel, kvm, linux-efi,
samitolvanen, ojeda, xin
On Sat, May 03, 2025 at 11:28:37AM -0700, Josh Poimboeuf wrote:
> On Sat, May 03, 2025 at 11:50:23AM +0200, Peter Zijlstra wrote:
> > > +++ b/arch/x86/entry/entry_64_fred.S
> > > @@ -116,7 +116,8 @@ SYM_FUNC_START(asm_fred_entry_from_kvm)
> > > movq %rsp, %rdi /* %rdi -> pt_regs */
> > > call __fred_entry_from_kvm /* Call the C entry point */
> > > POP_REGS
> > > - ERETS
> > > +
> > > + ALTERNATIVE "", __stringify(ERETS), X86_FEATURE_FRED
> > > 1:
> > > /*
> > > * Objtool doesn't understand what ERETS does, this hint tells it that
> > > @@ -124,7 +125,7 @@ SYM_FUNC_START(asm_fred_entry_from_kvm)
> > > * isn't strictly needed, but it's the simplest form.
> > > */
> > > UNWIND_HINT_RESTORE
> > > - pop %rbp
> > > + leave
> > > RET
> >
> > So this, while clever, might be a problem with ORC unwinding. Because
> > now the stack is different depending on the alternative, and we can't
> > deal with that.
> >
> > Anyway, I'll go have a poke on Monday (or Tuesday if Monday turns out to
> > be a bank holiday :-).
>
> Can we just adjust the stack in the alternative?
>
> ALTERNATIVE "add $64 %rsp", __stringify(ERETS), X86_FEATURE_FRED
Yes, that should work. But I wanted to have a poke at objtool, so it
will properly complain about the mistake first.
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH v2 00/13] objtool: Detect and warn about indirect calls in __nocfi functions
2025-05-06 7:31 ` Peter Zijlstra
@ 2025-05-06 13:32 ` Peter Zijlstra
2025-05-06 19:18 ` Josh Poimboeuf
0 siblings, 1 reply; 52+ messages in thread
From: Peter Zijlstra @ 2025-05-06 13:32 UTC (permalink / raw)
To: Josh Poimboeuf
Cc: Sean Christopherson, H. Peter Anvin, x86, kys, haiyangz, wei.liu,
decui, tglx, mingo, bp, dave.hansen, pbonzini, ardb, kees,
Arnd Bergmann, gregkh, linux-hyperv, linux-kernel, kvm, linux-efi,
samitolvanen, ojeda, xin
On Tue, May 06, 2025 at 09:31:00AM +0200, Peter Zijlstra wrote:
> On Sat, May 03, 2025 at 11:28:37AM -0700, Josh Poimboeuf wrote:
> > On Sat, May 03, 2025 at 11:50:23AM +0200, Peter Zijlstra wrote:
> > > > +++ b/arch/x86/entry/entry_64_fred.S
> > > > @@ -116,7 +116,8 @@ SYM_FUNC_START(asm_fred_entry_from_kvm)
> > > > movq %rsp, %rdi /* %rdi -> pt_regs */
> > > > call __fred_entry_from_kvm /* Call the C entry point */
> > > > POP_REGS
> > > > - ERETS
> > > > +
> > > > + ALTERNATIVE "", __stringify(ERETS), X86_FEATURE_FRED
> > > > 1:
> > > > /*
> > > > * Objtool doesn't understand what ERETS does, this hint tells it that
> > > > @@ -124,7 +125,7 @@ SYM_FUNC_START(asm_fred_entry_from_kvm)
> > > > * isn't strictly needed, but it's the simplest form.
> > > > */
> > > > UNWIND_HINT_RESTORE
> > > > - pop %rbp
> > > > + leave
> > > > RET
> > >
> > > So this, while clever, might be a problem with ORC unwinding. Because
> > > now the stack is different depending on the alternative, and we can't
> > > deal with that.
> > >
> > > Anyway, I'll go have a poke on Monday (or Tuesday if Monday turns out to
> > > be a bank holiday :-).
> >
> > Can we just adjust the stack in the alternative?
> >
> > ALTERNATIVE "add $64, %rsp", __stringify(ERETS), X86_FEATURE_FRED
>
> Yes, that should work.
Nope, it needs to be "mov %rbp, %rsp". Because that is the actual rsp
value after ERETS-to-self.
> But I wanted to have a poke at objtool, so it
> will properly complain about the mistake first.
So a metric ton of fail here :/
The biggest problem is the UNWIND_HINT_RESTORE right after the
alternative. This ensures that objtool thinks all paths through the
alternative end up with the same stack. And hence won't actually
complain.
Second being of course, that in order to get IRET and co correct, we'd
need far more of an emulator.
Also, it actually chokes on this variant, and I've not yet figured out
why. Whatever state should be created by that mov, the restore hint
should wipe it all. But still the ORC generation bails with unknown base
reg -1.
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH v2 00/13] objtool: Detect and warn about indirect calls in __nocfi functions
2025-05-06 13:32 ` Peter Zijlstra
@ 2025-05-06 19:18 ` Josh Poimboeuf
2025-05-28 7:44 ` Peter Zijlstra
0 siblings, 1 reply; 52+ messages in thread
From: Josh Poimboeuf @ 2025-05-06 19:18 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Sean Christopherson, H. Peter Anvin, x86, kys, haiyangz, wei.liu,
decui, tglx, mingo, bp, dave.hansen, pbonzini, ardb, kees,
Arnd Bergmann, gregkh, linux-hyperv, linux-kernel, kvm, linux-efi,
samitolvanen, ojeda, xin
On Tue, May 06, 2025 at 03:32:34PM +0200, Peter Zijlstra wrote:
> On Tue, May 06, 2025 at 09:31:00AM +0200, Peter Zijlstra wrote:
> > On Sat, May 03, 2025 at 11:28:37AM -0700, Josh Poimboeuf wrote:
> > > Can we just adjust the stack in the alternative?
> > >
> > > ALTERNATIVE "add $64, %rsp", __stringify(ERETS), X86_FEATURE_FRED
> >
> > Yes, that should work.
>
> Nope, it needs to be "mov %rbp, %rsp". Because that is the actual rsp
> value after ERETS-to-self.
>
> > But I wanted to have a poke at objtool, so it
> > will properly complain about the mistake first.
>
> So a metric ton of fail here :/
>
> The biggest problem is the UNWIND_HINT_RESTORE right after the
> alternative. This ensures that objtool thinks all paths through the
> alternative end up with the same stack. And hence won't actually
> complain.
Right, that's what the unwind hints are made for, it's up to the human
to get it right.
> Second being of course, that in order to get IRET and co correct, we'd
> need far more of an emulator.
At least finding RSP should be pretty easy, it's at a known location on
the stack. We already have an ORC type for doing that, but that would
again require an unwind hint, unless we make objtool smart enough to
know that. But then the ORC would be inconsistent between the two
alternative paths.
> Also, it actually chokes on this variant, and I've not yet figured out
> why. Whatever state should be created by that mov, the restore hint
> should wipe it all. But still the ORC generation bails with unknown base
> reg -1.
Weird, I'm not seeing that.
--
Josh
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH v2 00/13] objtool: Detect and warn about indirect calls in __nocfi functions
2025-05-06 19:18 ` Josh Poimboeuf
@ 2025-05-28 7:44 ` Peter Zijlstra
2025-05-28 16:30 ` Peter Zijlstra
0 siblings, 1 reply; 52+ messages in thread
From: Peter Zijlstra @ 2025-05-28 7:44 UTC (permalink / raw)
To: Josh Poimboeuf
Cc: Sean Christopherson, H. Peter Anvin, x86, kys, haiyangz, wei.liu,
decui, tglx, mingo, bp, dave.hansen, pbonzini, ardb, kees,
Arnd Bergmann, gregkh, linux-hyperv, linux-kernel, kvm, linux-efi,
samitolvanen, ojeda, xin
On Tue, May 06, 2025 at 12:18:49PM -0700, Josh Poimboeuf wrote:
> Weird, I'm not seeing that.
I Ate'nt Crazeh...
https://lore.kernel.org/all/202505280410.2qfTQCRt-lkp@intel.com/T/#u
I'll go poke at it, see if today is the day I can figure out WTF
happens.
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH v2 00/13] objtool: Detect and warn about indirect calls in __nocfi functions
2025-05-28 7:44 ` Peter Zijlstra
@ 2025-05-28 16:30 ` Peter Zijlstra
2025-05-28 16:35 ` Peter Zijlstra
0 siblings, 1 reply; 52+ messages in thread
From: Peter Zijlstra @ 2025-05-28 16:30 UTC (permalink / raw)
To: Josh Poimboeuf
Cc: Sean Christopherson, H. Peter Anvin, x86, kys, haiyangz, wei.liu,
decui, tglx, mingo, bp, dave.hansen, pbonzini, ardb, kees,
Arnd Bergmann, gregkh, linux-hyperv, linux-kernel, kvm, linux-efi,
samitolvanen, ojeda, xin
On Wed, May 28, 2025 at 09:44:52AM +0200, Peter Zijlstra wrote:
> On Tue, May 06, 2025 at 12:18:49PM -0700, Josh Poimboeuf wrote:
>
> > Weird, I'm not seeing that.
>
> I Ate'nt Crazeh...
>
> https://lore.kernel.org/all/202505280410.2qfTQCRt-lkp@intel.com/T/#u
>
> I'll go poke at it, see if today is the day I can figure out WTF
> happens.
It manages to trip the CFI_UNDEFINED case in op->dest.reg == cfa->base
in update_cfi_state().
I figured it ought to tickle the regular 'mov %rbp, %rsp' case above
there, but it doesn't, for some reason it has cfa.base == SP at this
point.
This happens... /me looks in scrollback ... at POP_REGS 'pop
%rbp'. ARGH!!
So the sequence of fail is:
push %rbp
mov %rsp, %rbp # cfa.base = BP
SAVE
...
push %rbp
...
pop %rbp # cfa.base = SP
...
mov %rbp, %rsp # UNDEF
nop # FAIL
RESTORE
Note that the MOV+NOP is the 4 bytes ERETS needs.
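(Byte math, assuming the usual encodings:

	f2 0f 01 ca	ERETS
	48 89 ec	mov %rbp, %rsp
	90		nop	/* alternative padding */
)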
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH v2 00/13] objtool: Detect and warn about indirect calls in __nocfi functions
2025-05-28 16:30 ` Peter Zijlstra
@ 2025-05-28 16:35 ` Peter Zijlstra
2025-05-29 9:30 ` Peter Zijlstra
0 siblings, 1 reply; 52+ messages in thread
From: Peter Zijlstra @ 2025-05-28 16:35 UTC (permalink / raw)
To: Josh Poimboeuf
Cc: Sean Christopherson, H. Peter Anvin, x86, kys, haiyangz, wei.liu,
decui, tglx, mingo, bp, dave.hansen, pbonzini, ardb, kees,
Arnd Bergmann, gregkh, linux-hyperv, linux-kernel, kvm, linux-efi,
samitolvanen, ojeda, xin
On Wed, May 28, 2025 at 06:30:35PM +0200, Peter Zijlstra wrote:
> On Wed, May 28, 2025 at 09:44:52AM +0200, Peter Zijlstra wrote:
> > On Tue, May 06, 2025 at 12:18:49PM -0700, Josh Poimboeuf wrote:
> >
> > > Weird, I'm not seeing that.
> >
> > I Ate'nt Crazeh...
> >
> > https://lore.kernel.org/all/202505280410.2qfTQCRt-lkp@intel.com/T/#u
> >
> > I'll go poke at it, see if today is the day I can figure out WTF
> > happens.
>
> It manages to trip the CFI_UNDEFINED case in op->dest.reg == cfa->base
> in update_cfi_state().
>
> I figured it ought to tickle the regular 'mov %rbp, %rsp' case above
> there, but it doesn't, for some reason it has cfa.base == SP at this
> point.
>
> This happens... /me looks in scrollback ... at POP_REGS 'pop
> %rbp'. ARGH!!
>
>
> So the sequence of fail is:
>
> push %rbp
> mov %rsp, %rbp # cfa.base = BP
>
> SAVE
> ...
> push %rbp
> ...
> pop %rbp # cfa.base = SP
This is the POP !drap and dest==base case.
> ...
> mov %rbp, %rsp # UNDEF
> nop # FAIL
> RESTORE
>
> Note that the MOV+NOP is the 4 bytes ERETS needs.
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH v2 00/13] objtool: Detect and warn about indirect calls in __nocfi functions
2025-05-28 16:35 ` Peter Zijlstra
@ 2025-05-29 9:30 ` Peter Zijlstra
2025-06-03 5:43 ` Josh Poimboeuf
0 siblings, 1 reply; 52+ messages in thread
From: Peter Zijlstra @ 2025-05-29 9:30 UTC (permalink / raw)
To: Josh Poimboeuf
Cc: Sean Christopherson, H. Peter Anvin, x86, kys, haiyangz, wei.liu,
decui, tglx, mingo, bp, dave.hansen, pbonzini, ardb, kees,
Arnd Bergmann, gregkh, linux-hyperv, linux-kernel, kvm, linux-efi,
samitolvanen, ojeda, xin
On Wed, May 28, 2025 at 06:35:58PM +0200, Peter Zijlstra wrote:
> On Wed, May 28, 2025 at 06:30:35PM +0200, Peter Zijlstra wrote:
> > On Wed, May 28, 2025 at 09:44:52AM +0200, Peter Zijlstra wrote:
> > > On Tue, May 06, 2025 at 12:18:49PM -0700, Josh Poimboeuf wrote:
> > >
> > > > Weird, I'm not seeing that.
> > >
> > > I Ate'nt Crazeh...
> > >
> > > https://lore.kernel.org/all/202505280410.2qfTQCRt-lkp@intel.com/T/#u
> > >
> > > I'll go poke at it, see if today is the day I can figure out WTF
> > > happens.
> >
> > It manages to trip the CFI_UNDEFINED case in op->dest.reg == cfa->base
> > in update_cfi_state().
> >
> > I figured it ought to tickle the regular 'mov %rbp, %rsp' case above
> > there, but it doesn't, for some reason it has cfa.base == SP at this
> > point.
> >
> > This happens... /me looks in scrollback ... at POP_REGS 'pop
> > %rbp'. ARGH!!
> >
More fun!
> > So the sequence of fail is:
> >
> > push %rbp
> > mov %rsp, %rbp # cfa.base = BP
> >
> > SAVE
sub $0x40,%rsp
and $0xffffffffffffffc0,%rsp
This hits the 'older GCC, drap with frame pointer' case in OP_SRC_AND.
Which means we then hard rely on the frame pointer to get things right.
However, per all the PUSH/POP_REGS nonsense, BP can get clobbered.
Specifically the code between the CALL and POP %rbp below is up in the
air. I don't think it can currently unwind properly there.
> > ...
> > push %rbp
> > ...
> > pop %rbp # cfa.base = SP
>
> This is the POP !drap and dest==base case.
>
> > ...
> > mov %rbp, %rsp # UNDEF
> > nop # FAIL
> > RESTORE
> >
> > Note that the MOV+NOP is the 4 bytes ERETS needs.
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH v2 00/13] objtool: Detect and warn about indirect calls in __nocfi functions
2025-05-29 9:30 ` Peter Zijlstra
@ 2025-06-03 5:43 ` Josh Poimboeuf
2025-06-03 16:29 ` Josh Poimboeuf
0 siblings, 1 reply; 52+ messages in thread
From: Josh Poimboeuf @ 2025-06-03 5:43 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Sean Christopherson, H. Peter Anvin, x86, kys, haiyangz, wei.liu,
decui, tglx, mingo, bp, dave.hansen, pbonzini, ardb, kees,
Arnd Bergmann, gregkh, linux-hyperv, linux-kernel, kvm, linux-efi,
samitolvanen, ojeda, xin
On Thu, May 29, 2025 at 11:30:17AM +0200, Peter Zijlstra wrote:
> > > So the sequence of fail is:
> > >
> > > push %rbp
> > > mov %rsp, %rbp # cfa.base = BP
> > >
> > > SAVE
>
> sub $0x40,%rsp
> and $0xffffffffffffffc0,%rsp
>
> This hits the 'older GCC, drap with frame pointer' case in OP_SRC_AND.
> Which means we then hard rely on the frame pointer to get things right.
>
> However, per all the PUSH/POP_REGS nonsense, BP can get clobbered.
> Specifically the code between the CALL and POP %rbp below is up in the
> air. I don't think it can currently unwind properly there.
RBP is callee saved, so there's no need to pop it or any of the other
callee-saved regs. If they were to change, that would break C ABI
pretty badly. Maybe add a skip_callee=1 arg to POP_REGS?
--
Josh
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH v2 00/13] objtool: Detect and warn about indirect calls in __nocfi functions
2025-06-03 5:43 ` Josh Poimboeuf
@ 2025-06-03 16:29 ` Josh Poimboeuf
2025-06-05 17:19 ` Josh Poimboeuf
0 siblings, 1 reply; 52+ messages in thread
From: Josh Poimboeuf @ 2025-06-03 16:29 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Sean Christopherson, H. Peter Anvin, x86, kys, haiyangz, wei.liu,
decui, tglx, mingo, bp, dave.hansen, pbonzini, ardb, kees,
Arnd Bergmann, gregkh, linux-hyperv, linux-kernel, kvm, linux-efi,
samitolvanen, ojeda, xin
On Mon, Jun 02, 2025 at 10:43:42PM -0700, Josh Poimboeuf wrote:
> On Thu, May 29, 2025 at 11:30:17AM +0200, Peter Zijlstra wrote:
> > > > So the sequence of fail is:
> > > >
> > > > push %rbp
> > > > mov %rsp, %rbp # cfa.base = BP
> > > >
> > > > SAVE
> >
> > sub $0x40,%rsp
> > and $0xffffffffffffffc0,%rsp
> >
> > This hits the 'older GCC, drap with frame pointer' case in OP_SRC_AND.
> > Which means we then hard rely on the frame pointer to get things right.
> >
> > However, per all the PUSH/POP_REGS nonsense, BP can get clobbered.
> > Specifically the code between the CALL and POP %rbp below is up in the
> > air. I don't think it can currently unwind properly there.
>
> RBP is callee saved, so there's no need to pop it or any of the other
> callee-saved regs. If they were to change, that would break C ABI
> pretty badly. Maybe add a skip_callee=1 arg to POP_REGS?
This compiles for me:
diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index d83236b96f22..414f8bcf07ec 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -99,7 +99,7 @@ For 32-bit we have the following conventions - kernel is built with
.endif
.endm
-.macro CLEAR_REGS clear_bp=1
+.macro CLEAR_REGS clear_callee=1
/*
* Sanitize registers of values that a speculation attack might
* otherwise want to exploit. The lower registers are likely clobbered
@@ -113,29 +113,31 @@ For 32-bit we have the following conventions - kernel is built with
xorl %r9d, %r9d /* nospec r9 */
xorl %r10d, %r10d /* nospec r10 */
xorl %r11d, %r11d /* nospec r11 */
+ .if \clear_callee
xorl %ebx, %ebx /* nospec rbx */
- .if \clear_bp
xorl %ebp, %ebp /* nospec rbp */
- .endif
xorl %r12d, %r12d /* nospec r12 */
xorl %r13d, %r13d /* nospec r13 */
xorl %r14d, %r14d /* nospec r14 */
xorl %r15d, %r15d /* nospec r15 */
+ .endif
.endm
-.macro PUSH_AND_CLEAR_REGS rdx=%rdx rcx=%rcx rax=%rax save_ret=0 clear_bp=1 unwind_hint=1
+.macro PUSH_AND_CLEAR_REGS rdx=%rdx rcx=%rcx rax=%rax save_ret=0 clear_callee=1 unwind_hint=1
PUSH_REGS rdx=\rdx, rcx=\rcx, rax=\rax, save_ret=\save_ret unwind_hint=\unwind_hint
- CLEAR_REGS clear_bp=\clear_bp
+ CLEAR_REGS clear_callee=\clear_callee
.endm
-.macro POP_REGS pop_rdi=1
+.macro POP_REGS pop_rdi=1 pop_callee=1
+.if \pop_callee
popq %r15
popq %r14
popq %r13
popq %r12
popq %rbp
popq %rbx
+.endif
popq %r11
popq %r10
popq %r9
diff --git a/arch/x86/entry/entry_64_fred.S b/arch/x86/entry/entry_64_fred.S
index 29c5c32c16c3..277f980c46fd 100644
--- a/arch/x86/entry/entry_64_fred.S
+++ b/arch/x86/entry/entry_64_fred.S
@@ -112,11 +112,12 @@ SYM_FUNC_START(asm_fred_entry_from_kvm)
push %rax /* Return RIP */
push $0 /* Error code, 0 for IRQ/NMI */
- PUSH_AND_CLEAR_REGS clear_bp=0 unwind_hint=0
+ PUSH_AND_CLEAR_REGS clear_callee=0 unwind_hint=0
movq %rsp, %rdi /* %rdi -> pt_regs */
call __fred_entry_from_kvm /* Call the C entry point */
- POP_REGS
- ERETS
+ POP_REGS pop_callee=0
+
+ ALTERNATIVE "mov %rbp, %rsp", __stringify(ERETS), X86_FEATURE_FRED
1:
/*
* Objtool doesn't understand what ERETS does, this hint tells it that
^ permalink raw reply related [flat|nested] 52+ messages in thread
* Re: [PATCH v2 00/13] objtool: Detect and warn about indirect calls in __nocfi functions
2025-06-03 16:29 ` Josh Poimboeuf
@ 2025-06-05 17:19 ` Josh Poimboeuf
2025-06-06 10:49 ` Peter Zijlstra
0 siblings, 1 reply; 52+ messages in thread
From: Josh Poimboeuf @ 2025-06-05 17:19 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Sean Christopherson, H. Peter Anvin, x86, kys, haiyangz, wei.liu,
decui, tglx, mingo, bp, dave.hansen, pbonzini, ardb, kees,
Arnd Bergmann, gregkh, linux-hyperv, linux-kernel, kvm, linux-efi,
samitolvanen, ojeda, xin
On Tue, Jun 03, 2025 at 09:29:45AM -0700, Josh Poimboeuf wrote:
> On Mon, Jun 02, 2025 at 10:43:42PM -0700, Josh Poimboeuf wrote:
> > On Thu, May 29, 2025 at 11:30:17AM +0200, Peter Zijlstra wrote:
> > > > > So the sequence of fail is:
> > > > >
> > > > > push %rbp
> > > > > mov %rsp, %rbp # cfa.base = BP
> > > > >
> > > > > SAVE
> > >
> > > sub $0x40,%rsp
> > > and $0xffffffffffffffc0,%rsp
> > >
> > > This hits the 'older GCC, drap with frame pointer' case in OP_SRC_AND.
> > > Which means we then hard rely on the frame pointer to get things right.
> > >
> > > However, per all the PUSH/POP_REGS nonsense, BP can get clobbered.
> > > Specifically the code between the CALL and POP %rbp below are up in the
> > > air. I don't think it can currently unwind properly there.
> >
> > RBP is callee saved, so there's no need to pop it or any of the other
> > callee-saved regs. If they were to change, that would break C ABI
> > pretty badly. Maybe add a skip_callee=1 arg to POP_REGS?
>
> This compiles for me:
That last patch had a pretty heinous bug: it didn't adjust the stack
accordingly when it skipped the callee-saved pops.
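(A sketch of the stack layout as I read it, to illustrate the failure
mode -- not from any of the patches. The callee-saved slots sit at the
lowest addresses of pt_regs, i.e. at the top of the stack:

	rsp -> r15 r14 r13 r12 rbp rbx			# 6 callee-saved slots, 48 bytes
	       r11 r10 r9 r8 rax rcx rdx rsi rdi	# popped by POP_REGS
	       orig_ax					# error code
	       RIP CS RFLAGS RSP SS			# return frame

Skipping the six callee-saved pops without also doing addq $48, %rsp
makes the remaining nine pops read from the wrong slots and leaves %rsp
48 bytes short of the return frame.)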
But actually there's no need to pop *any* regs there.
asm_fred_entry_from_kvm() uses the C function ABI, so changes to
callee-saved regs aren't allowed, and changes to caller-saved regs would
have no effect.
How about something like this?
diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index d83236b96f22..e680afbf65b6 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -99,7 +99,7 @@ For 32-bit we have the following conventions - kernel is built with
.endif
.endm
-.macro CLEAR_REGS clear_bp=1
+.macro CLEAR_REGS clear_callee=1
/*
* Sanitize registers of values that a speculation attack might
* otherwise want to exploit. The lower registers are likely clobbered
@@ -113,20 +113,19 @@ For 32-bit we have the following conventions - kernel is built with
xorl %r9d, %r9d /* nospec r9 */
xorl %r10d, %r10d /* nospec r10 */
xorl %r11d, %r11d /* nospec r11 */
+ .if \clear_callee
xorl %ebx, %ebx /* nospec rbx */
- .if \clear_bp
xorl %ebp, %ebp /* nospec rbp */
- .endif
xorl %r12d, %r12d /* nospec r12 */
xorl %r13d, %r13d /* nospec r13 */
xorl %r14d, %r14d /* nospec r14 */
xorl %r15d, %r15d /* nospec r15 */
-
+ .endif
.endm
-.macro PUSH_AND_CLEAR_REGS rdx=%rdx rcx=%rcx rax=%rax save_ret=0 clear_bp=1 unwind_hint=1
+.macro PUSH_AND_CLEAR_REGS rdx=%rdx rcx=%rcx rax=%rax save_ret=0 clear_callee=1 unwind_hint=1
PUSH_REGS rdx=\rdx, rcx=\rcx, rax=\rax, save_ret=\save_ret unwind_hint=\unwind_hint
- CLEAR_REGS clear_bp=\clear_bp
+ CLEAR_REGS clear_callee=\clear_callee
.endm
.macro POP_REGS pop_rdi=1
diff --git a/arch/x86/entry/entry_64_fred.S b/arch/x86/entry/entry_64_fred.S
index 29c5c32c16c3..5d1eef193b79 100644
--- a/arch/x86/entry/entry_64_fred.S
+++ b/arch/x86/entry/entry_64_fred.S
@@ -112,11 +112,12 @@ SYM_FUNC_START(asm_fred_entry_from_kvm)
push %rax /* Return RIP */
push $0 /* Error code, 0 for IRQ/NMI */
- PUSH_AND_CLEAR_REGS clear_bp=0 unwind_hint=0
+ PUSH_AND_CLEAR_REGS clear_callee=0 unwind_hint=0
movq %rsp, %rdi /* %rdi -> pt_regs */
call __fred_entry_from_kvm /* Call the C entry point */
- POP_REGS
- ERETS
+ addq $C_PTREGS_SIZE, %rsp
+
+ ALTERNATIVE "mov %rbp, %rsp", __stringify(ERETS), X86_FEATURE_FRED
1:
/*
* Objtool doesn't understand what ERETS does, this hint tells it that
diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index ad4ea6fb3b6c..d4f9bfdc24a7 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -94,6 +94,7 @@ static void __used common(void)
BLANK();
DEFINE(PTREGS_SIZE, sizeof(struct pt_regs));
+ OFFSET(C_PTREGS_SIZE, pt_regs, orig_ax);
/* TLB state for the entry code */
OFFSET(TLB_STATE_user_pcid_flush_mask, tlb_state, user_pcid_flush_mask);
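(Aside, my reading of why orig_ax is the right anchor -- not part of the
patch: orig_ax is the first pt_regs field after the 15 pushed GPRs, so
its offset is 15 * 8 == 120 bytes, exactly the amount by which
PUSH_AND_CLEAR_REGS grew the stack. A hypothetical assert documenting
that dependency:

	BUILD_BUG_ON(offsetof(struct pt_regs, orig_ax) != 15 * 8);
)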
^ permalink raw reply related [flat|nested] 52+ messages in thread
* Re: [PATCH v2 00/13] objtool: Detect and warn about indirect calls in __nocfi functions
2025-06-05 17:19 ` Josh Poimboeuf
@ 2025-06-06 10:49 ` Peter Zijlstra
2025-06-06 13:15 ` Sean Christopherson
0 siblings, 1 reply; 52+ messages in thread
From: Peter Zijlstra @ 2025-06-06 10:49 UTC (permalink / raw)
To: Josh Poimboeuf
Cc: Sean Christopherson, H. Peter Anvin, x86, kys, haiyangz, wei.liu,
decui, tglx, mingo, bp, dave.hansen, pbonzini, ardb, kees,
Arnd Bergmann, gregkh, linux-hyperv, linux-kernel, kvm, linux-efi,
samitolvanen, ojeda, xin
On Thu, Jun 05, 2025 at 10:19:41AM -0700, Josh Poimboeuf wrote:
> diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
> index d83236b96f22..e680afbf65b6 100644
> --- a/arch/x86/entry/calling.h
> +++ b/arch/x86/entry/calling.h
> @@ -99,7 +99,7 @@ For 32-bit we have the following conventions - kernel is built with
> .endif
> .endm
>
> -.macro CLEAR_REGS clear_bp=1
> +.macro CLEAR_REGS clear_callee=1
> /*
> * Sanitize registers of values that a speculation attack might
> * otherwise want to exploit. The lower registers are likely clobbered
> @@ -113,20 +113,19 @@ For 32-bit we have the following conventions - kernel is built with
> xorl %r9d, %r9d /* nospec r9 */
> xorl %r10d, %r10d /* nospec r10 */
> xorl %r11d, %r11d /* nospec r11 */
> + .if \clear_callee
> xorl %ebx, %ebx /* nospec rbx */
> - .if \clear_bp
> xorl %ebp, %ebp /* nospec rbp */
> - .endif
> xorl %r12d, %r12d /* nospec r12 */
> xorl %r13d, %r13d /* nospec r13 */
> xorl %r14d, %r14d /* nospec r14 */
> xorl %r15d, %r15d /* nospec r15 */
> -
> + .endif
> .endm
>
> -.macro PUSH_AND_CLEAR_REGS rdx=%rdx rcx=%rcx rax=%rax save_ret=0 clear_bp=1 unwind_hint=1
> +.macro PUSH_AND_CLEAR_REGS rdx=%rdx rcx=%rcx rax=%rax save_ret=0 clear_callee=1 unwind_hint=1
> PUSH_REGS rdx=\rdx, rcx=\rcx, rax=\rax, save_ret=\save_ret unwind_hint=\unwind_hint
> - CLEAR_REGS clear_bp=\clear_bp
> + CLEAR_REGS clear_callee=\clear_callee
> .endm
>
> .macro POP_REGS pop_rdi=1
Nice. So that leaves the callee-clobbered, extra caller-saved and return
registers cleared, and preserves the callee-saved regs.
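(For reference, the x86-64 SysV register classes this relies on -- a
summary, not from the patch:

	arguments:	rdi rsi rdx rcx r8 r9
	extra clobbered:	r10 r11
	return:		rax rdx
	callee-saved:	rbx rbp r12 r13 r14 r15
)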
> diff --git a/arch/x86/entry/entry_64_fred.S b/arch/x86/entry/entry_64_fred.S
> index 29c5c32c16c3..5d1eef193b79 100644
> --- a/arch/x86/entry/entry_64_fred.S
> +++ b/arch/x86/entry/entry_64_fred.S
> @@ -112,11 +112,12 @@ SYM_FUNC_START(asm_fred_entry_from_kvm)
> push %rax /* Return RIP */
> push $0 /* Error code, 0 for IRQ/NMI */
>
> - PUSH_AND_CLEAR_REGS clear_bp=0 unwind_hint=0
> + PUSH_AND_CLEAR_REGS clear_callee=0 unwind_hint=0
> movq %rsp, %rdi /* %rdi -> pt_regs */
> call __fred_entry_from_kvm /* Call the C entry point */
> - POP_REGS
> - ERETS
> + addq $C_PTREGS_SIZE, %rsp
> +
> + ALTERNATIVE "mov %rbp, %rsp", __stringify(ERETS), X86_FEATURE_FRED
So... I was wondering... do we actually ever need the ERETS? AFAICT this
will only ever 'inject' external interrupts, and those are not supposed
to change the exception frame, like ever. Only exceptions get to change
the exception frame, but those are explicitly excluded in fred_extint().
As such, it should always be correct to just do:
leave;
RET;
at this point, and call it a day, no? Just completely forget about all
this silliness with alternatives and funky stack state.
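(Spelling that out: leave is just the frame-teardown pair, so the
epilogue is equivalent to:

	mov %rbp, %rsp	# undo the "mov %rsp, %rbp" from the prologue
	pop %rbp	# restore the caller's frame pointer
	RET

which pairs with the push %rbp; mov %rsp, %rbp at function entry.)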
The only problem seems to be that if we do this, then
has_modified_stack_frame() has a fit because of the register state.
The first to complain is rbx: the push %rbx modifies the CFI state to
track where on the stack it's saved, and that's not what
initial_func_cfi has.
We can stomp on that with UNWIND_HINT_FUNC right before RET. It's all a
bit magical, but should work, right?
So keeping your CLEAR_REGS changes, I've ended up with the below:
---
diff --git a/arch/x86/entry/entry_64_fred.S b/arch/x86/entry/entry_64_fred.S
index 29c5c32c16c3..8c03d04ea69d 100644
--- a/arch/x86/entry/entry_64_fred.S
+++ b/arch/x86/entry/entry_64_fred.S
@@ -62,8 +62,6 @@ SYM_FUNC_START(asm_fred_entry_from_kvm)
push %rbp
mov %rsp, %rbp
- UNWIND_HINT_SAVE
-
/*
* Both IRQ and NMI from VMX can be handled on current task stack
* because there is no need to protect from reentrancy and the call
@@ -112,19 +110,35 @@ SYM_FUNC_START(asm_fred_entry_from_kvm)
push %rax /* Return RIP */
push $0 /* Error code, 0 for IRQ/NMI */
- PUSH_AND_CLEAR_REGS clear_bp=0 unwind_hint=0
+ PUSH_AND_CLEAR_REGS clear_callee=0 unwind_hint=0
movq %rsp, %rdi /* %rdi -> pt_regs */
+
+ /*
+ * At this point: {rdi, rsi, rdx, rcx, r8, r9}, {r10, r11}, {rax, rdx}
+ * are clobbered, which corresponds to: arguments, extra caller-saved
+ * and return. All registers a C function is allowed to clobber.
+ *
+ * Notably, the callee-saved registers: {rbx, r12, r13, r14, r15}
+ * are untouched, with the exception of rbp, which carries the stack
+ * frame and will be restored before exit.
+ *
+ * Further calling another C function will not alter this state.
+ */
call __fred_entry_from_kvm /* Call the C entry point */
- POP_REGS
- ERETS
-1:
+
+1: /*
+ * Therefore, all that remains to be done at this point is restore the
+ * stack and frame pointer register.
+ */
+ leave
/*
- * Objtool doesn't understand what ERETS does, this hint tells it that
- * yes, we'll reach here and with what stack state. A save/restore pair
- * isn't strictly needed, but it's the simplest form.
+ * Objtool gets confused by the cfi register state; this doesn't match
+ * initial_func_cfi because of PUSH_REGS, where it tracks where those
+ * registers are on the stack.
+ *
+ * Forcefully make it forget this before returning.
*/
- UNWIND_HINT_RESTORE
- pop %rbp
+ UNWIND_HINT_FUNC
RET
SYM_FUNC_END(asm_fred_entry_from_kvm)
^ permalink raw reply related [flat|nested] 52+ messages in thread
* Re: [PATCH v2 00/13] objtool: Detect and warn about indirect calls in __nocfi functions
2025-06-06 10:49 ` Peter Zijlstra
@ 2025-06-06 13:15 ` Sean Christopherson
2025-06-06 13:20 ` Peter Zijlstra
0 siblings, 1 reply; 52+ messages in thread
From: Sean Christopherson @ 2025-06-06 13:15 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Josh Poimboeuf, H. Peter Anvin, x86, kys, haiyangz, wei.liu,
decui, tglx, mingo, bp, dave.hansen, pbonzini, ardb, kees,
Arnd Bergmann, gregkh, linux-hyperv, linux-kernel, kvm, linux-efi,
samitolvanen, ojeda, xin
On Fri, Jun 06, 2025, Peter Zijlstra wrote:
> On Thu, Jun 05, 2025 at 10:19:41AM -0700, Josh Poimboeuf wrote:
> > diff --git a/arch/x86/entry/entry_64_fred.S b/arch/x86/entry/entry_64_fred.S
> > index 29c5c32c16c3..5d1eef193b79 100644
> > --- a/arch/x86/entry/entry_64_fred.S
> > +++ b/arch/x86/entry/entry_64_fred.S
> > @@ -112,11 +112,12 @@ SYM_FUNC_START(asm_fred_entry_from_kvm)
> > push %rax /* Return RIP */
> > push $0 /* Error code, 0 for IRQ/NMI */
> >
> > - PUSH_AND_CLEAR_REGS clear_bp=0 unwind_hint=0
> > + PUSH_AND_CLEAR_REGS clear_callee=0 unwind_hint=0
> > movq %rsp, %rdi /* %rdi -> pt_regs */
> > call __fred_entry_from_kvm /* Call the C entry point */
> > - POP_REGS
> > - ERETS
> > + addq $C_PTREGS_SIZE, %rsp
> > +
> > + ALTERNATIVE "mov %rbp, %rsp", __stringify(ERETS), X86_FEATURE_FRED
>
> So... I was wondering.. do we actually ever need the ERETS?
Yes, to unblock NMIs, because NMIs are blocked on VM-Exit due to NMI.
The !X86_FEATURE_FRED path can use RET (instead of IRET) because NMIs are routed
through vmx_do_nmi_irqoff(), as proposed in this version[*], after you pointed out
that the FRED entry doesn't have the legacy NMI madness.
[*] https://lore.kernel.org/all/aBUiwLV4ZY2HdRbz@google.com
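(Summarizing the split, as I understand the thread -- not a patch:

	FRED:	NMI-induced VM-Exit must return with ERETS, since only
		ERETS clears the CPU's NMI-blocked state; a plain RET
		leaves NMIs blocked.
	!FRED:	the NMI itself is delivered via vmx_do_nmi_irqoff(), so
		the common exit path can use RET.
)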
^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [PATCH v2 00/13] objtool: Detect and warn about indirect calls in __nocfi functions
2025-06-06 13:15 ` Sean Christopherson
@ 2025-06-06 13:20 ` Peter Zijlstra
0 siblings, 0 replies; 52+ messages in thread
From: Peter Zijlstra @ 2025-06-06 13:20 UTC (permalink / raw)
To: Sean Christopherson
Cc: Josh Poimboeuf, H. Peter Anvin, x86, kys, haiyangz, wei.liu,
decui, tglx, mingo, bp, dave.hansen, pbonzini, ardb, kees,
Arnd Bergmann, gregkh, linux-hyperv, linux-kernel, kvm, linux-efi,
samitolvanen, ojeda, xin
On Fri, Jun 06, 2025 at 06:15:19AM -0700, Sean Christopherson wrote:
> On Fri, Jun 06, 2025, Peter Zijlstra wrote:
> > On Thu, Jun 05, 2025 at 10:19:41AM -0700, Josh Poimboeuf wrote:
> > > diff --git a/arch/x86/entry/entry_64_fred.S b/arch/x86/entry/entry_64_fred.S
> > > index 29c5c32c16c3..5d1eef193b79 100644
> > > --- a/arch/x86/entry/entry_64_fred.S
> > > +++ b/arch/x86/entry/entry_64_fred.S
> > > @@ -112,11 +112,12 @@ SYM_FUNC_START(asm_fred_entry_from_kvm)
> > > push %rax /* Return RIP */
> > > push $0 /* Error code, 0 for IRQ/NMI */
> > >
> > > - PUSH_AND_CLEAR_REGS clear_bp=0 unwind_hint=0
> > > + PUSH_AND_CLEAR_REGS clear_callee=0 unwind_hint=0
> > > movq %rsp, %rdi /* %rdi -> pt_regs */
> > > call __fred_entry_from_kvm /* Call the C entry point */
> > > - POP_REGS
> > > - ERETS
> > > + addq $C_PTREGS_SIZE, %rsp
> > > +
> > > + ALTERNATIVE "mov %rbp, %rsp", __stringify(ERETS), X86_FEATURE_FRED
> >
> > So... I was wondering.. do we actually ever need the ERETS?
>
> Yes, to unblock NMIs, because NMIs are blocked on VM-Exit due to NMI.
Ah, bah, indeed! Shame.
^ permalink raw reply [flat|nested] 52+ messages in thread
end of thread [~2025-06-06 13:20 UTC]

Thread overview: 52+ messages
2025-04-30 11:07 [PATCH v2 00/13] objtool: Detect and warn about indirect calls in __nocfi functions Peter Zijlstra
2025-04-30 11:07 ` [PATCH v2 01/13] x86/kvm/emulate: Implement test_cc() in C Peter Zijlstra
2025-04-30 11:07 ` [PATCH v2 02/13] x86/kvm/emulate: Introduce COP1 Peter Zijlstra
2025-04-30 16:19 ` Josh Poimboeuf
2025-04-30 19:05 ` Peter Zijlstra
2025-04-30 11:07 ` [PATCH v2 03/13] x86/kvm/emulate: Introduce COP2 Peter Zijlstra
2025-04-30 11:07 ` [PATCH v2 04/13] x86/kvm/emulate: Introduce COP2R Peter Zijlstra
2025-04-30 11:07 ` [PATCH v2 05/13] x86/kvm/emulate: Introduce COP2W Peter Zijlstra
2025-04-30 11:07 ` [PATCH v2 06/13] x86/kvm/emulate: Introduce COP2CL Peter Zijlstra
2025-04-30 11:07 ` [PATCH v2 07/13] x86/kvm/emulate: Introduce COP1SRC2 Peter Zijlstra
2025-04-30 11:07 ` [PATCH v2 08/13] x86/kvm/emulate: Introduce COP3WCL Peter Zijlstra
2025-04-30 11:07 ` [PATCH v2 09/13] x86/kvm/emulate: Convert em_salc() to C Peter Zijlstra
2025-04-30 11:07 ` [PATCH v2 10/13] x86/kvm/emulate: Remove fastops Peter Zijlstra
2025-04-30 11:07 ` [PATCH v2 11/13] x86,hyperv: Clean up hv_do_hypercall() Peter Zijlstra
2025-05-01 2:36 ` Michael Kelley
2025-04-30 11:07 ` [PATCH v2 12/13] x86_64,hyperv: Use direct call to hypercall-page Peter Zijlstra
2025-05-01 2:36 ` Michael Kelley
2025-05-01 8:59 ` Peter Zijlstra
2025-04-30 11:07 ` [PATCH v2 13/13] objtool: Validate kCFI calls Peter Zijlstra
2025-04-30 15:59 ` Josh Poimboeuf
2025-04-30 19:03 ` Peter Zijlstra
2025-05-01 15:56 ` Peter Zijlstra
2025-04-30 14:24 ` [PATCH v2 00/13] objtool: Detect and warn about indirect calls in __nocfi functions H. Peter Anvin
2025-04-30 19:06 ` Peter Zijlstra
2025-05-01 10:30 ` Peter Zijlstra
2025-05-01 15:38 ` Peter Zijlstra
2025-05-01 18:30 ` Sean Christopherson
2025-05-01 18:42 ` H. Peter Anvin
2025-05-01 18:59 ` Sean Christopherson
2025-05-02 6:12 ` Xin Li
2025-05-02 5:46 ` Xin Li
2025-05-02 5:48 ` Xin Li
2025-05-02 19:43 ` H. Peter Anvin
2025-05-02 8:40 ` Peter Zijlstra
2025-05-02 19:53 ` Sean Christopherson
2025-05-03 9:50 ` Peter Zijlstra
2025-05-03 18:28 ` Josh Poimboeuf
2025-05-06 7:31 ` Peter Zijlstra
2025-05-06 13:32 ` Peter Zijlstra
2025-05-06 19:18 ` Josh Poimboeuf
2025-05-28 7:44 ` Peter Zijlstra
2025-05-28 16:30 ` Peter Zijlstra
2025-05-28 16:35 ` Peter Zijlstra
2025-05-29 9:30 ` Peter Zijlstra
2025-06-03 5:43 ` Josh Poimboeuf
2025-06-03 16:29 ` Josh Poimboeuf
2025-06-05 17:19 ` Josh Poimboeuf
2025-06-06 10:49 ` Peter Zijlstra
2025-06-06 13:15 ` Sean Christopherson
2025-06-06 13:20 ` Peter Zijlstra
2025-05-01 18:33 ` Paolo Bonzini
2025-05-02 11:08 ` Peter Zijlstra