[PATCH 0/6] objtool: Detect and warn about indirect calls in _

linux-efi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 0/6] objtool: Detect and warn about indirect calls in __nocfi functions
@ 2025-04-14 11:11 Peter Zijlstra
  2025-04-14 11:11 ` [PATCH 1/6] x86/nospec: JMP_NOSPEC Peter Zijlstra
                   ` (5 more replies)
  0 siblings, 6 replies; 32+ messages in thread
From: Peter Zijlstra @ 2025-04-14 11:11 UTC (permalink / raw)
  To: x86
  Cc: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, hpa,
	peterz, jpoimboe, pawan.kumar.gupta, seanjc, pbonzini, ardb, kees,
	Arnd Bergmann, gregkh, linux-hyperv, linux-kernel, kvm, linux-efi,
	samitolvanen, ojeda

Hi!

On kCFI (CONFIG_CFI_CLANG=y) builds all indirect calls should have the CFI
check on (with very few exceptions). Not having the CFI checks undermines the
protection provided by CFI and will make these sites candidates for people
wanting to steal your cookies.

Specifically the ABI changes are so that doing indirect calls without the CFI
magic, to a CFI adorned function is not compatible (although it happens to work
for some setups, it very much does not for FineIBT).

Rust people tripped over this the other day, since their 'core' happened to
have some no_sanitize(kcfi) bits in, which promptly exploded when ran with
FineIBT on.

Since this is very much not a supported model -- on purpose, have objtool
detect and warn about such constructs.

This effort [1] found all existins [2] non-cfi indirect calls in the kernel.

Notably the KVM fastop emulation stuff -- which reminded me I still had pending
patches there. Included here since they reduce the amount of fastop call sites,
and the final patch includes an annotation for that. Although ideally we should
look at means of doing fastops differently.

KVM has another; the interrupt injection stuff calls the IDT handler directly.
Is there an alternative? Can we keep a table of Linux functions slighly higher
up the call stack (asm_\cfunc ?) and add CFI to those?

HyperV hypercall page stuff, which I've previously suggested use direct calls,
and which I've now converted (after getting properly annoyed with that code).

[1] https://lkml.kernel.org/r/20250410154556.GB9003@noisy.programming.kicks-ass.net
[2] https://lkml.kernel.org/r/20250410194334.GA3248459@google.com

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 1/6] x86/nospec: JMP_NOSPEC
  2025-04-14 11:11 [PATCH 0/6] objtool: Detect and warn about indirect calls in __nocfi functions Peter Zijlstra
@ 2025-04-14 11:11 ` Peter Zijlstra
  2025-04-14 11:11 ` [PATCH 2/6] x86/kvm/emulate: Implement test_cc() in C Peter Zijlstra
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 32+ messages in thread
From: Peter Zijlstra @ 2025-04-14 11:11 UTC (permalink / raw)
  To: x86
  Cc: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, hpa,
	peterz, jpoimboe, pawan.kumar.gupta, seanjc, pbonzini, ardb, kees,
	Arnd Bergmann, gregkh, linux-hyperv, linux-kernel, kvm, linux-efi,
	samitolvanen, ojeda


Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/include/asm/nospec-branch.h |   24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -438,6 +438,9 @@ static inline void call_depth_return_thu
 #define CALL_NOSPEC	__CS_PREFIX("%V[thunk_target]")	\
 			"call __x86_indirect_thunk_%V[thunk_target]\n"
 
+#define JMP_NOSPEC	__CS_PREFIX("%V[thunk_target]")	\
+			"jmp __x86_indirect_thunk_%V[thunk_target]\n"
+
 # define THUNK_TARGET(addr) [thunk_target] "r" (addr)
 
 #else /* CONFIG_X86_32 */
@@ -468,10 +471,31 @@ static inline void call_depth_return_thu
 	"call *%[thunk_target]\n",				\
 	X86_FEATURE_RETPOLINE_LFENCE)
 
+# define JMP_NOSPEC						\
+	ALTERNATIVE_2(						\
+	ANNOTATE_RETPOLINE_SAFE					\
+	"jmp *%[thunk_target]\n",				\
+	"       jmp    901f;\n"					\
+	"       .align 16\n"					\
+	"901:	call   903f;\n"					\
+	"902:	pause;\n"					\
+	"       lfence;\n"					\
+	"       jmp    902b;\n"					\
+	"       .align 16\n"					\
+	"903:	lea    4(%%esp), %%esp;\n"			\
+	"       pushl  %[thunk_target];\n"			\
+	"       ret;\n",					\
+	X86_FEATURE_RETPOLINE,					\
+	"lfence;\n"						\
+	ANNOTATE_RETPOLINE_SAFE					\
+	"jmp *%[thunk_target]\n",				\
+	X86_FEATURE_RETPOLINE_LFENCE)
+
 # define THUNK_TARGET(addr) [thunk_target] "rm" (addr)
 #endif
 #else /* No retpoline for C / inline asm */
 # define CALL_NOSPEC "call *%[thunk_target]\n"
+# define JMP_NOSPEC "jmp *%[thunk_target]\n"
 # define THUNK_TARGET(addr) [thunk_target] "rm" (addr)
 #endif
 



^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 2/6] x86/kvm/emulate: Implement test_cc() in C
  2025-04-14 11:11 [PATCH 0/6] objtool: Detect and warn about indirect calls in __nocfi functions Peter Zijlstra
  2025-04-14 11:11 ` [PATCH 1/6] x86/nospec: JMP_NOSPEC Peter Zijlstra
@ 2025-04-14 11:11 ` Peter Zijlstra
  2025-04-14 11:11 ` [PATCH 3/6] x86/kvm/emulate: Avoid RET for fastops Peter Zijlstra
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 32+ messages in thread
From: Peter Zijlstra @ 2025-04-14 11:11 UTC (permalink / raw)
  To: x86
  Cc: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, hpa,
	peterz, jpoimboe, pawan.kumar.gupta, seanjc, pbonzini, ardb, kees,
	Arnd Bergmann, gregkh, linux-hyperv, linux-kernel, kvm, linux-efi,
	samitolvanen, ojeda

Current test_cc() uses the fastop infrastructure to test flags using
SETcc instructions. However, int3_emulate_jcc() already fully
implements the flags->CC mapping, use that.

Removes a pile of gnarly asm.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/include/asm/text-patching.h |   20 +++++++++++++-------
 arch/x86/kvm/emulate.c               |   34 ++--------------------------------
 2 files changed, 15 insertions(+), 39 deletions(-)

--- a/arch/x86/include/asm/text-patching.h
+++ b/arch/x86/include/asm/text-patching.h
@@ -177,9 +177,9 @@ void int3_emulate_ret(struct pt_regs *re
 }
 
 static __always_inline
-void int3_emulate_jcc(struct pt_regs *regs, u8 cc, unsigned long ip, unsigned long disp)
+bool __emulate_cc(unsigned long flags, u8 cc)
 {
-	static const unsigned long jcc_mask[6] = {
+	static const unsigned long cc_mask[6] = {
 		[0] = X86_EFLAGS_OF,
 		[1] = X86_EFLAGS_CF,
 		[2] = X86_EFLAGS_ZF,
@@ -192,15 +192,21 @@ void int3_emulate_jcc(struct pt_regs *re
 	bool match;
 
 	if (cc < 0xc) {
-		match = regs->flags & jcc_mask[cc >> 1];
+		match = flags & cc_mask[cc >> 1];
 	} else {
-		match = ((regs->flags & X86_EFLAGS_SF) >> X86_EFLAGS_SF_BIT) ^
-			((regs->flags & X86_EFLAGS_OF) >> X86_EFLAGS_OF_BIT);
+		match = ((flags & X86_EFLAGS_SF) >> X86_EFLAGS_SF_BIT) ^
+			((flags & X86_EFLAGS_OF) >> X86_EFLAGS_OF_BIT);
 		if (cc >= 0xe)
-			match = match || (regs->flags & X86_EFLAGS_ZF);
+			match = match || (flags & X86_EFLAGS_ZF);
 	}
 
-	if ((match && !invert) || (!match && invert))
+	return (match && !invert) || (!match && invert);
+}
+
+static __always_inline
+void int3_emulate_jcc(struct pt_regs *regs, u8 cc, unsigned long ip, unsigned long disp)
+{
+	if (__emulate_cc(regs->flags, cc))
 		ip += disp;
 
 	int3_emulate_jmp(regs, ip);
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -26,6 +26,7 @@
 #include <asm/debugreg.h>
 #include <asm/nospec-branch.h>
 #include <asm/ibt.h>
+#include <asm/text-patching.h>
 
 #include "x86.h"
 #include "tss.h"
@@ -416,31 +417,6 @@ static int fastop(struct x86_emulate_ctx
 	ON64(FOP3E(op##q, rax, rdx, cl)) \
 	FOP_END
 
-/* Special case for SETcc - 1 instruction per cc */
-#define FOP_SETCC(op) \
-	FOP_FUNC(op) \
-	#op " %al \n\t" \
-	FOP_RET(op)
-
-FOP_START(setcc)
-FOP_SETCC(seto)
-FOP_SETCC(setno)
-FOP_SETCC(setc)
-FOP_SETCC(setnc)
-FOP_SETCC(setz)
-FOP_SETCC(setnz)
-FOP_SETCC(setbe)
-FOP_SETCC(setnbe)
-FOP_SETCC(sets)
-FOP_SETCC(setns)
-FOP_SETCC(setp)
-FOP_SETCC(setnp)
-FOP_SETCC(setl)
-FOP_SETCC(setnl)
-FOP_SETCC(setle)
-FOP_SETCC(setnle)
-FOP_END;
-
 FOP_START(salc)
 FOP_FUNC(salc)
 "pushf; sbb %al, %al; popf \n\t"
@@ -1068,13 +1044,7 @@ static int em_bsr_c(struct x86_emulate_c
 
 static __always_inline u8 test_cc(unsigned int condition, unsigned long flags)
 {
-	u8 rc;
-	void (*fop)(void) = (void *)em_setcc + FASTOP_SIZE * (condition & 0xf);
-
-	flags = (flags & EFLAGS_MASK) | X86_EFLAGS_IF;
-	asm("push %[flags]; popf; " CALL_NOSPEC
-	    : "=a"(rc), ASM_CALL_CONSTRAINT : [thunk_target]"r"(fop), [flags]"r"(flags));
-	return rc;
+	return __emulate_cc(flags, condition & 0xf);
 }
 
 static void fetch_register_operand(struct operand *op)



^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 3/6] x86/kvm/emulate: Avoid RET for fastops
  2025-04-14 11:11 [PATCH 0/6] objtool: Detect and warn about indirect calls in __nocfi functions Peter Zijlstra
  2025-04-14 11:11 ` [PATCH 1/6] x86/nospec: JMP_NOSPEC Peter Zijlstra
  2025-04-14 11:11 ` [PATCH 2/6] x86/kvm/emulate: Implement test_cc() in C Peter Zijlstra
@ 2025-04-14 11:11 ` Peter Zijlstra
  2025-04-14 22:36   ` Josh Poimboeuf
  2025-04-14 11:11 ` [PATCH 4/6] x86,hyperv: Clean up hv_do_hypercall() Peter Zijlstra
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 32+ messages in thread
From: Peter Zijlstra @ 2025-04-14 11:11 UTC (permalink / raw)
  To: x86
  Cc: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, hpa,
	peterz, jpoimboe, pawan.kumar.gupta, seanjc, pbonzini, ardb, kees,
	Arnd Bergmann, gregkh, linux-hyperv, linux-kernel, kvm, linux-efi,
	samitolvanen, ojeda

Since there is only a single fastop() function, convert the FASTOP
stuff from CALL_NOSPEC+RET to JMP_NOSPEC+JMP, avoiding the return
thunks and all that jazz.

Specifically FASTOPs rely on the return thunk to preserve EFLAGS,
which not all of them can trivially do (call depth tracing suffers
here).

Objtool strenuously complains about things, therefore fix up the
various problems:

 - indirect call without a .rodata, fails to determine JUMP_TABLE,
   add an annotation for this.
 - fastop functions fall through, create an exception for this case
 - unreachable instruction after fastop_return, save/restore

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/kvm/emulate.c              |   20 +++++++++++++++-----
 include/linux/objtool_types.h       |    1 +
 tools/include/linux/objtool_types.h |    1 +
 tools/objtool/check.c               |   11 ++++++++++-
 4 files changed, 27 insertions(+), 6 deletions(-)

--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -285,8 +285,8 @@ static void invalidate_registers(struct
  * different operand sizes can be reached by calculation, rather than a jump
  * table (which would be bigger than the code).
  *
- * The 16 byte alignment, considering 5 bytes for the RET thunk, 3 for ENDBR
- * and 1 for the straight line speculation INT3, leaves 7 bytes for the
+ * The 16 byte alignment, considering 5 bytes for the JMP, 4 for ENDBR
+ * and 1 for the straight line speculation INT3, leaves 6 bytes for the
  * body of the function.  Currently none is larger than 4.
  */
 static int fastop(struct x86_emulate_ctxt *ctxt, fastop_t fop);
@@ -304,7 +304,7 @@ static int fastop(struct x86_emulate_ctx
 	__FOP_FUNC(#name)
 
 #define __FOP_RET(name) \
-	"11: " ASM_RET \
+	"11: jmp fastop_return; int3 \n\t" \
 	".size " name ", .-" name "\n\t"
 
 #define FOP_RET(name) \
@@ -5044,14 +5044,24 @@ static void fetch_possible_mmx_operand(s
 		kvm_read_mmx_reg(op->addr.mm, &op->mm_val);
 }
 
-static int fastop(struct x86_emulate_ctxt *ctxt, fastop_t fop)
+/*
+ * All the FASTOP magic above relies on there being *one* instance of this
+ * so it can JMP back, avoiding RET and it's various thunks.
+ */
+static noinline int fastop(struct x86_emulate_ctxt *ctxt, fastop_t fop)
 {
 	ulong flags = (ctxt->eflags & EFLAGS_MASK) | X86_EFLAGS_IF;
 
 	if (!(ctxt->d & ByteOp))
 		fop += __ffs(ctxt->dst.bytes) * FASTOP_SIZE;
 
-	asm("push %[flags]; popf; " CALL_NOSPEC " ; pushf; pop %[flags]\n"
+	asm("push %[flags]; popf \n\t"
+	    UNWIND_HINT(UNWIND_HINT_TYPE_SAVE, 0, 0, 0)
+	    ASM_ANNOTATE(ANNOTYPE_JUMP_TABLE)
+	    JMP_NOSPEC
+	    "fastop_return: \n\t"
+	    UNWIND_HINT(UNWIND_HINT_TYPE_RESTORE, 0, 0, 0)
+	    "pushf; pop %[flags]\n"
 	    : "+a"(ctxt->dst.val), "+d"(ctxt->src.val), [flags]"+D"(flags),
 	      [thunk_target]"+S"(fop), ASM_CALL_CONSTRAINT
 	    : "c"(ctxt->src2.val));
--- a/include/linux/objtool_types.h
+++ b/include/linux/objtool_types.h
@@ -65,5 +65,6 @@ struct unwind_hint {
 #define ANNOTYPE_IGNORE_ALTS		6
 #define ANNOTYPE_INTRA_FUNCTION_CALL	7
 #define ANNOTYPE_REACHABLE		8
+#define ANNOTYPE_JUMP_TABLE		9
 
 #endif /* _LINUX_OBJTOOL_TYPES_H */
--- a/tools/include/linux/objtool_types.h
+++ b/tools/include/linux/objtool_types.h
@@ -65,5 +65,6 @@ struct unwind_hint {
 #define ANNOTYPE_IGNORE_ALTS		6
 #define ANNOTYPE_INTRA_FUNCTION_CALL	7
 #define ANNOTYPE_REACHABLE		8
+#define ANNOTYPE_JUMP_TABLE		9
 
 #endif /* _LINUX_OBJTOOL_TYPES_H */
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -2428,6 +2428,14 @@ static int __annotate_late(struct objtoo
 		insn->dead_end = false;
 		break;
 
+	/*
+	 * Must be after add_jump_table(); for it doesn't set a sane
+	 * _jump_table value.
+	 */
+	case ANNOTYPE_JUMP_TABLE:
+		insn->_jump_table = (void *)1;
+		break;
+
 	default:
 		ERROR_INSN(insn, "Unknown annotation type: %d", type);
 		return -1;
@@ -3559,7 +3567,8 @@ static int validate_branch(struct objtoo
 		if (func && insn_func(insn) && func != insn_func(insn)->pfunc) {
 			/* Ignore KCFI type preambles, which always fall through */
 			if (!strncmp(func->name, "__cfi_", 6) ||
-			    !strncmp(func->name, "__pfx_", 6))
+			    !strncmp(func->name, "__pfx_", 6) ||
+			    !strcmp(insn_func(insn)->name, "fastop"))
 				return 0;
 
 			if (file->ignore_unreachables)



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/6] x86/kvm/emulate: Avoid RET for fastops
  2025-04-14 11:11 ` [PATCH 3/6] x86/kvm/emulate: Avoid RET for fastops Peter Zijlstra
@ 2025-04-14 22:36   ` Josh Poimboeuf
  2025-04-15  7:44     ` Peter Zijlstra
  0 siblings, 1 reply; 32+ messages in thread
From: Josh Poimboeuf @ 2025-04-14 22:36 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen,
	hpa, pawan.kumar.gupta, seanjc, pbonzini, ardb, kees,
	Arnd Bergmann, gregkh, linux-hyperv, linux-kernel, kvm, linux-efi,
	samitolvanen, ojeda

On Mon, Apr 14, 2025 at 01:11:43PM +0200, Peter Zijlstra wrote:
> Since there is only a single fastop() function, convert the FASTOP
> stuff from CALL_NOSPEC+RET to JMP_NOSPEC+JMP, avoiding the return
> thunks and all that jazz.
> 
> Specifically FASTOPs rely on the return thunk to preserve EFLAGS,
> which not all of them can trivially do (call depth tracing suffers
> here).
> 
> Objtool strenuously complains about things, therefore fix up the
> various problems:
> 
>  - indirect call without a .rodata, fails to determine JUMP_TABLE,
>    add an annotation for this.
>  - fastop functions fall through, create an exception for this case
>  - unreachable instruction after fastop_return, save/restore

I think this breaks unwinding.  Each of the individual fastops inherits
fastop()'s stack but the ORC doesn't reflect that.

Should they just be moved to a proper .S file?

-- 
Josh

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/6] x86/kvm/emulate: Avoid RET for fastops
  2025-04-14 22:36   ` Josh Poimboeuf
@ 2025-04-15  7:44     ` Peter Zijlstra
  2025-04-15 14:39       ` Josh Poimboeuf
  0 siblings, 1 reply; 32+ messages in thread
From: Peter Zijlstra @ 2025-04-15  7:44 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: x86, kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen,
	hpa, pawan.kumar.gupta, seanjc, pbonzini, ardb, kees,
	Arnd Bergmann, gregkh, linux-hyperv, linux-kernel, kvm, linux-efi,
	samitolvanen, ojeda

On Mon, Apr 14, 2025 at 03:36:50PM -0700, Josh Poimboeuf wrote:
> On Mon, Apr 14, 2025 at 01:11:43PM +0200, Peter Zijlstra wrote:
> > Since there is only a single fastop() function, convert the FASTOP
> > stuff from CALL_NOSPEC+RET to JMP_NOSPEC+JMP, avoiding the return
> > thunks and all that jazz.
> > 
> > Specifically FASTOPs rely on the return thunk to preserve EFLAGS,
> > which not all of them can trivially do (call depth tracing suffers
> > here).
> > 
> > Objtool strenuously complains about things, therefore fix up the
> > various problems:
> > 
> >  - indirect call without a .rodata, fails to determine JUMP_TABLE,
> >    add an annotation for this.
> >  - fastop functions fall through, create an exception for this case
> >  - unreachable instruction after fastop_return, save/restore
> 
> I think this breaks unwinding.  Each of the individual fastops inherits
> fastop()'s stack but the ORC doesn't reflect that.

I'm not sure I understand. There is only the one location, and we
simply save/restore the state around the one 'call'.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/6] x86/kvm/emulate: Avoid RET for fastops
  2025-04-15  7:44     ` Peter Zijlstra
@ 2025-04-15 14:39       ` Josh Poimboeuf
  2025-04-16  8:38         ` Peter Zijlstra
  0 siblings, 1 reply; 32+ messages in thread
From: Josh Poimboeuf @ 2025-04-15 14:39 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen,
	hpa, pawan.kumar.gupta, seanjc, pbonzini, ardb, kees,
	Arnd Bergmann, gregkh, linux-hyperv, linux-kernel, kvm, linux-efi,
	samitolvanen, ojeda

On Tue, Apr 15, 2025 at 09:44:21AM +0200, Peter Zijlstra wrote:
> On Mon, Apr 14, 2025 at 03:36:50PM -0700, Josh Poimboeuf wrote:
> > On Mon, Apr 14, 2025 at 01:11:43PM +0200, Peter Zijlstra wrote:
> > > Since there is only a single fastop() function, convert the FASTOP
> > > stuff from CALL_NOSPEC+RET to JMP_NOSPEC+JMP, avoiding the return
> > > thunks and all that jazz.
> > > 
> > > Specifically FASTOPs rely on the return thunk to preserve EFLAGS,
> > > which not all of them can trivially do (call depth tracing suffers
> > > here).
> > > 
> > > Objtool strenuously complains about things, therefore fix up the
> > > various problems:
> > > 
> > >  - indirect call without a .rodata, fails to determine JUMP_TABLE,
> > >    add an annotation for this.
> > >  - fastop functions fall through, create an exception for this case
> > >  - unreachable instruction after fastop_return, save/restore
> > 
> > I think this breaks unwinding.  Each of the individual fastops inherits
> > fastop()'s stack but the ORC doesn't reflect that.
> 
> I'm not sure I understand. There is only the one location, and we
> simply save/restore the state around the one 'call'.

The problem isn't fastop() but rather the tiny functions it "calls".
Each of those is marked STT_FUNC so it gets its own ORC data saying the
return address is at RSP+8.

Changing from CALL_NOSPEC+RET to JMP_NOSPEC+JMP means the return address
isn't pushed before the branch.  Thus they become part of fastop()
rather than separate functions.  RSP+8 is only correct if it happens to
have not pushed anything to the stack before the indirect JMP.

The addresses aren't stored in an .rodata jump table so objtool doesn't
know the control flow.  Even if we made them non-FUNC, objtool wouldn't
be able to transfer the stack state.

-- 
Josh

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/6] x86/kvm/emulate: Avoid RET for fastops
  2025-04-15 14:39       ` Josh Poimboeuf
@ 2025-04-16  8:38         ` Peter Zijlstra
  2025-04-26 10:01           ` Peter Zijlstra
  0 siblings, 1 reply; 32+ messages in thread
From: Peter Zijlstra @ 2025-04-16  8:38 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: x86, kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen,
	hpa, pawan.kumar.gupta, seanjc, pbonzini, ardb, kees,
	Arnd Bergmann, gregkh, linux-hyperv, linux-kernel, kvm, linux-efi,
	samitolvanen, ojeda

On Tue, Apr 15, 2025 at 07:39:41AM -0700, Josh Poimboeuf wrote:
> On Tue, Apr 15, 2025 at 09:44:21AM +0200, Peter Zijlstra wrote:
> > On Mon, Apr 14, 2025 at 03:36:50PM -0700, Josh Poimboeuf wrote:
> > > On Mon, Apr 14, 2025 at 01:11:43PM +0200, Peter Zijlstra wrote:
> > > > Since there is only a single fastop() function, convert the FASTOP
> > > > stuff from CALL_NOSPEC+RET to JMP_NOSPEC+JMP, avoiding the return
> > > > thunks and all that jazz.
> > > > 
> > > > Specifically FASTOPs rely on the return thunk to preserve EFLAGS,
> > > > which not all of them can trivially do (call depth tracing suffers
> > > > here).
> > > > 
> > > > Objtool strenuously complains about things, therefore fix up the
> > > > various problems:
> > > > 
> > > >  - indirect call without a .rodata, fails to determine JUMP_TABLE,
> > > >    add an annotation for this.
> > > >  - fastop functions fall through, create an exception for this case
> > > >  - unreachable instruction after fastop_return, save/restore
> > > 
> > > I think this breaks unwinding.  Each of the individual fastops inherits
> > > fastop()'s stack but the ORC doesn't reflect that.
> > 
> > I'm not sure I understand. There is only the one location, and we
> > simply save/restore the state around the one 'call'.
> 
> The problem isn't fastop() but rather the tiny functions it "calls".
> Each of those is marked STT_FUNC so it gets its own ORC data saying the
> return address is at RSP+8.
> 
> Changing from CALL_NOSPEC+RET to JMP_NOSPEC+JMP means the return address
> isn't pushed before the branch.  Thus they become part of fastop()
> rather than separate functions.  RSP+8 is only correct if it happens to
> have not pushed anything to the stack before the indirect JMP.

Yeah, I finally got there. I'll go cook up something else.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/6] x86/kvm/emulate: Avoid RET for fastops
  2025-04-16  8:38         ` Peter Zijlstra
@ 2025-04-26 10:01           ` Peter Zijlstra
  2025-04-28 17:13             ` Sean Christopherson
  0 siblings, 1 reply; 32+ messages in thread
From: Peter Zijlstra @ 2025-04-26 10:01 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: x86, kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen,
	hpa, pawan.kumar.gupta, seanjc, pbonzini, ardb, kees,
	Arnd Bergmann, gregkh, linux-hyperv, linux-kernel, kvm, linux-efi,
	samitolvanen, ojeda

On Wed, Apr 16, 2025 at 10:38:59AM +0200, Peter Zijlstra wrote:

> Yeah, I finally got there. I'll go cook up something else.

Sean, Paolo, can I once again ask how best to test this fastop crud?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/6] x86/kvm/emulate: Avoid RET for fastops
  2025-04-26 10:01           ` Peter Zijlstra
@ 2025-04-28 17:13             ` Sean Christopherson
  2025-04-29 10:09               ` Peter Zijlstra
  0 siblings, 1 reply; 32+ messages in thread
From: Sean Christopherson @ 2025-04-28 17:13 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Josh Poimboeuf, x86, kys, haiyangz, wei.liu, decui, tglx, mingo,
	bp, dave.hansen, hpa, pawan.kumar.gupta, pbonzini, ardb, kees,
	Arnd Bergmann, gregkh, linux-hyperv, linux-kernel, kvm, linux-efi,
	samitolvanen, ojeda

On Sat, Apr 26, 2025, Peter Zijlstra wrote:
> On Wed, Apr 16, 2025 at 10:38:59AM +0200, Peter Zijlstra wrote:
> 
> > Yeah, I finally got there. I'll go cook up something else.
> 
> Sean, Paolo, can I once again ask how best to test this fastop crud?

Apply the below, build KVM selftests, enable forced emulation in KVM, and then
run fastops_test.  It's well past time we had a selftest for this.  It won't
detect bugs that are specific to 32-bit kernels, e.g. b63f20a778c8 ("x86/retpoline:
Don't clobber RFLAGS during CALL_NOSPEC on i386"), since KVM selftests are 64-bit
only, but for what you're doing, it should suffice.

For 32-bit kernels, it requires a 32-bit QEMU and KVM-Unit-Tests (or maybe even
a full blown 32-bit guest image; I forget how much coverage KUT provides).
Regardless, I don't see any reason to put you through that pain, I can do that
sanity testing.

I'll post a proper patch for the new selftest after testing on AMD.  The test
relies on hardware providing deterministic behavior for undefined output (RFLAGS
and GPRs); I don't know if that holds true on AMD.

To enable forced emulation, set /sys/module/kvm/parameters/force_emulation_prefix
to '1' (for the purposes of this test, the value doesn't matter).  The param is
writable at runtime, so it doesn't matter if kvm.ko is built-in or a module. 

---
From: Sean Christopherson <seanjc@google.com>
Date: Mon, 28 Apr 2025 08:55:44 -0700
Subject: [PATCH] KVM: selftests: Add a test for x86's fastops emulation

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 tools/testing/selftests/kvm/Makefile.kvm      |   1 +
 .../testing/selftests/kvm/x86/fastops_test.c  | 165 ++++++++++++++++++
 2 files changed, 166 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/x86/fastops_test.c

diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm
index f62b0a5aba35..411c3d5eb5b1 100644
--- a/tools/testing/selftests/kvm/Makefile.kvm
+++ b/tools/testing/selftests/kvm/Makefile.kvm
@@ -66,6 +66,7 @@ TEST_GEN_PROGS_x86 += x86/cr4_cpuid_sync_test
 TEST_GEN_PROGS_x86 += x86/dirty_log_page_splitting_test
 TEST_GEN_PROGS_x86 += x86/feature_msrs_test
 TEST_GEN_PROGS_x86 += x86/exit_on_emulation_failure_test
+TEST_GEN_PROGS_x86 += x86/fastops_test
 TEST_GEN_PROGS_x86 += x86/fix_hypercall_test
 TEST_GEN_PROGS_x86 += x86/hwcr_msr_test
 TEST_GEN_PROGS_x86 += x86/hyperv_clock
diff --git a/tools/testing/selftests/kvm/x86/fastops_test.c b/tools/testing/selftests/kvm/x86/fastops_test.c
new file mode 100644
index 000000000000..c3799edb5d0c
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86/fastops_test.c
@@ -0,0 +1,165 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include "test_util.h"
+#include "kvm_util.h"
+#include "processor.h"
+
+/*
+ * Execute a fastop() instruction, with or without forced emulation.  BT bit 0
+ * to set RFLAGS.CF based on whether or not the input is even or odd, so that
+ * instructions like ADC and SBB are deterministic.
+ */
+#define guest_execute_fastop_1(FEP, insn, type_t, __val, __flags)			\
+do {											\
+	__asm__ __volatile__("bt $0, %[val]\n\t"					\
+			     FEP insn " %[val]\n\t"					\
+			     "pushfq\n\t"						\
+			     "pop %[flags]\n\t"						\
+			     : [val]"+r"(__val), [flags]"=r"(__flags)			\
+			     : : "cc", "memory");					\
+} while (0)
+
+#define guest_test_fastop_1(insn, type_t, __val)					\
+do {											\
+	type_t val = __val, ex_val = __val, input = __val;				\
+	uint64_t flags, ex_flags;							\
+											\
+	guest_execute_fastop_1("", insn, type_t, ex_val, ex_flags);			\
+	guest_execute_fastop_1(KVM_FEP, insn, type_t, val, flags);			\
+											\
+	__GUEST_ASSERT(val == ex_val,							\
+		       "Wanted 0x%lx for '%s 0x%lx', got 0x%lx",			\
+		       (uint64_t)ex_val, insn, (uint64_t)input, (uint64_t)val);		\
+	__GUEST_ASSERT(flags == ex_flags,						\
+			"Wanted flags 0x%lx for '%s 0x%lx', got 0x%lx",			\
+			ex_flags, insn, (uint64_t)input, flags);			\
+} while (0)
+
+#define guest_execute_fastop_2(FEP, insn, type_t, __input, __output, __flags)		\
+do {											\
+	__asm__ __volatile__("bt $0, %[output]\n\t"					\
+			     FEP insn " %[input], %[output]\n\t"			\
+			     "pushfq\n\t"						\
+			     "pop %[flags]\n\t"						\
+			     : [output]"+r"(__output), [flags]"=r"(__flags)		\
+			     : [input]"r"(__input) : "cc", "memory");			\
+} while (0)
+
+#define guest_test_fastop_2(insn, type_t, __val1, __val2)				\
+do {											\
+	type_t input = __val1, input2 = __val2, output = __val2, ex_output = __val2;	\
+	uint64_t flags, ex_flags;							\
+											\
+	guest_execute_fastop_2("", insn, type_t, input, ex_output, ex_flags);		\
+	guest_execute_fastop_2(KVM_FEP, insn, type_t, input, output, flags);		\
+											\
+	__GUEST_ASSERT(output == ex_output,						\
+		       "Wanted 0x%lx for '%s 0x%lx 0x%lx', got 0x%lx",			\
+		       (uint64_t)ex_output, insn, (uint64_t)input,			\
+		       (uint64_t)input2, (uint64_t)output);				\
+	__GUEST_ASSERT(flags == ex_flags,						\
+			"Wanted flags 0x%lx for '%s 0x%lx, 0x%lx', got 0x%lx",		\
+			ex_flags, insn, (uint64_t)input, (uint64_t)input2, flags);	\
+} while (0)
+
+#define guest_execute_fastop_cl(FEP, insn, type_t, __shift, __output, __flags)		\
+do {											\
+	__asm__ __volatile__("bt $0, %[output]\n\t"					\
+			     FEP insn " %%cl, %[output]\n\t"				\
+			     "pushfq\n\t"						\
+			     "pop %[flags]\n\t"						\
+			     : [output]"+r"(__output), [flags]"=r"(__flags)		\
+			     : "c"(__shift) : "cc", "memory");				\
+} while (0)
+
+#define guest_test_fastop_cl(insn, type_t, __val1, __val2)				\
+do {											\
+	type_t output = __val2, ex_output = __val2, input = __val2;			\
+	uint8_t shift = __val1;								\
+	uint64_t flags, ex_flags;							\
+											\
+	guest_execute_fastop_cl("", insn, type_t, shift, ex_output, ex_flags);		\
+	guest_execute_fastop_cl(KVM_FEP, insn, type_t, shift, output, flags);		\
+											\
+	__GUEST_ASSERT(output == ex_output,						\
+		       "Wanted 0x%lx for '%s 0x%x, 0x%lx', got 0x%lx",			\
+		       (uint64_t)ex_output, insn, shift, (uint64_t)input,		\
+		       (uint64_t)output);						\
+	__GUEST_ASSERT(flags == ex_flags,						\
+			"Wanted flags 0x%lx for '%s 0x%x, 0x%lx', got 0x%lx",		\
+			ex_flags, insn, shift, (uint64_t)input, flags);			\
+} while (0)
+
+static const uint64_t vals[] = {
+	0,
+	1,
+	2,
+	4,
+	7,
+	0x5555555555555555,
+	0xaaaaaaaaaaaaaaaa,
+	0xfefefefefefefefe,
+	0xffffffffffffffff,
+};
+
+#define guest_test_fastops(type_t, suffix)						\
+do {											\
+	int i, j;									\
+											\
+	for (i = 0; i < ARRAY_SIZE(vals); i++) {					\
+		guest_test_fastop_1("dec" suffix, type_t, vals[i]);			\
+		guest_test_fastop_1("inc" suffix, type_t, vals[i]);			\
+		guest_test_fastop_1("neg" suffix, type_t, vals[i]);			\
+		guest_test_fastop_1("not" suffix, type_t, vals[i]);			\
+											\
+		for (j = 0; j < ARRAY_SIZE(vals); j++) {				\
+			guest_test_fastop_2("add" suffix, type_t, vals[i], vals[j]);	\
+			guest_test_fastop_2("adc" suffix, type_t, vals[i], vals[j]);	\
+			guest_test_fastop_2("and" suffix, type_t, vals[i], vals[j]);	\
+			guest_test_fastop_2("bsf" suffix, type_t, vals[i], vals[j]);	\
+			guest_test_fastop_2("bsr" suffix, type_t, vals[i], vals[j]);	\
+			guest_test_fastop_2("bt" suffix, type_t, vals[i], vals[j]);	\
+			guest_test_fastop_2("btc" suffix, type_t, vals[i], vals[j]);	\
+			guest_test_fastop_2("btr" suffix, type_t, vals[i], vals[j]);	\
+			guest_test_fastop_2("bts" suffix, type_t, vals[i], vals[j]);	\
+			guest_test_fastop_2("cmp" suffix, type_t, vals[i], vals[j]);	\
+			guest_test_fastop_2("imul" suffix, type_t, vals[i], vals[j]);	\
+			guest_test_fastop_2("or" suffix, type_t, vals[i], vals[j]);	\
+			guest_test_fastop_2("sbb" suffix, type_t, vals[i], vals[j]);	\
+			guest_test_fastop_2("sub" suffix, type_t, vals[i], vals[j]);	\
+			guest_test_fastop_2("test" suffix, type_t, vals[i], vals[j]);	\
+			guest_test_fastop_2("xor" suffix, type_t, vals[i], vals[j]);	\
+											\
+			guest_test_fastop_cl("rol" suffix, type_t, vals[i], vals[j]);	\
+			guest_test_fastop_cl("ror" suffix, type_t, vals[i], vals[j]);	\
+			guest_test_fastop_cl("rcl" suffix, type_t, vals[i], vals[j]);	\
+			guest_test_fastop_cl("rcr" suffix, type_t, vals[i], vals[j]);	\
+			guest_test_fastop_cl("sar" suffix, type_t, vals[i], vals[j]);	\
+			guest_test_fastop_cl("shl" suffix, type_t, vals[i], vals[j]);	\
+			guest_test_fastop_cl("shr" suffix, type_t, vals[i], vals[j]);	\
+		}									\
+	}										\
+} while (0)
+
+static void guest_code(void)
+{
+	guest_test_fastops(uint16_t, "w");
+	guest_test_fastops(uint32_t, "l");
+	guest_test_fastops(uint64_t, "q");
+
+	GUEST_DONE();
+}
+
+int main(int argc, char *argv[])
+{
+	struct kvm_vcpu *vcpu;
+	struct kvm_vm *vm;
+
+	TEST_REQUIRE(is_forced_emulation_enabled);
+
+	vm = vm_create_with_one_vcpu(&vcpu, guest_code);
+
+	vcpu_run(vcpu);
+	TEST_ASSERT_EQ(get_ucall(vcpu, NULL), UCALL_DONE);
+
+	kvm_vm_free(vm);
+}

base-commit: 661b7ddb2d10258b53106d7c39c309806b00a99c
-- 

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/6] x86/kvm/emulate: Avoid RET for fastops
  2025-04-28 17:13             ` Sean Christopherson
@ 2025-04-29 10:09               ` Peter Zijlstra
  2025-04-29 14:05                 ` Sean Christopherson
  0 siblings, 1 reply; 32+ messages in thread
From: Peter Zijlstra @ 2025-04-29 10:09 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Josh Poimboeuf, x86, kys, haiyangz, wei.liu, decui, tglx, mingo,
	bp, dave.hansen, hpa, pawan.kumar.gupta, pbonzini, ardb, kees,
	Arnd Bergmann, gregkh, linux-hyperv, linux-kernel, kvm, linux-efi,
	samitolvanen, ojeda

On Mon, Apr 28, 2025 at 10:13:31AM -0700, Sean Christopherson wrote:
> On Sat, Apr 26, 2025, Peter Zijlstra wrote:
> > On Wed, Apr 16, 2025 at 10:38:59AM +0200, Peter Zijlstra wrote:
> > 
> > > Yeah, I finally got there. I'll go cook up something else.
> > 
> > Sean, Paolo, can I once again ask how best to test this fastop crud?
> 
> Apply the below, build KVM selftests, 

Patch applied, my own hackery applied, host kernel built and booted,
foce_emulation_prefix set, but now I'm stuck at this seemingly simple
step..

$ cd tools/testing/selftests/kvm/
$ make
... metric ton of fail ...

Clearly I'm doing something wrong :/

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/6] x86/kvm/emulate: Avoid RET for fastops
  2025-04-29 10:09               ` Peter Zijlstra
@ 2025-04-29 14:05                 ` Sean Christopherson
  2025-04-29 14:46                   ` Peter Zijlstra
  0 siblings, 1 reply; 32+ messages in thread
From: Sean Christopherson @ 2025-04-29 14:05 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Josh Poimboeuf, x86, kys, haiyangz, wei.liu, decui, tglx, mingo,
	bp, dave.hansen, hpa, pawan.kumar.gupta, pbonzini, ardb, kees,
	Arnd Bergmann, gregkh, linux-hyperv, linux-kernel, kvm, linux-efi,
	samitolvanen, ojeda

On Tue, Apr 29, 2025, Peter Zijlstra wrote:
> On Mon, Apr 28, 2025 at 10:13:31AM -0700, Sean Christopherson wrote:
> > On Sat, Apr 26, 2025, Peter Zijlstra wrote:
> > > On Wed, Apr 16, 2025 at 10:38:59AM +0200, Peter Zijlstra wrote:
> > > 
> > > > Yeah, I finally got there. I'll go cook up something else.
> > > 
> > > Sean, Paolo, can I once again ask how best to test this fastop crud?
> > 
> > Apply the below, build KVM selftests, 
> 
> Patch applied, my own hackery applied, host kernel built and booted,
> foce_emulation_prefix set, but now I'm stuck at this seemingly simple
> step..
> 
> $ cd tools/testing/selftests/kvm/
> $ make
> ... metric ton of fail ...
> 
> Clearly I'm doing something wrong :/

Did you install headers in the top level directory?  I.e. make headers_install.
The selftests build system was change a while back to require users to manually
install headers (I forget why, but it is indeed annoying).

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/6] x86/kvm/emulate: Avoid RET for fastops
  2025-04-29 14:05                 ` Sean Christopherson
@ 2025-04-29 14:46                   ` Peter Zijlstra
  2025-04-29 17:16                     ` Sean Christopherson
  0 siblings, 1 reply; 32+ messages in thread
From: Peter Zijlstra @ 2025-04-29 14:46 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Josh Poimboeuf, x86, kys, haiyangz, wei.liu, decui, tglx, mingo,
	bp, dave.hansen, hpa, pawan.kumar.gupta, pbonzini, ardb, kees,
	Arnd Bergmann, gregkh, linux-hyperv, linux-kernel, kvm, linux-efi,
	samitolvanen, ojeda, shuah

On Tue, Apr 29, 2025 at 07:05:35AM -0700, Sean Christopherson wrote:
> On Tue, Apr 29, 2025, Peter Zijlstra wrote:
> > On Mon, Apr 28, 2025 at 10:13:31AM -0700, Sean Christopherson wrote:
> > > On Sat, Apr 26, 2025, Peter Zijlstra wrote:
> > > > On Wed, Apr 16, 2025 at 10:38:59AM +0200, Peter Zijlstra wrote:
> > > > 
> > > > > Yeah, I finally got there. I'll go cook up something else.
> > > > 
> > > > Sean, Paolo, can I once again ask how best to test this fastop crud?
> > > 
> > > Apply the below, build KVM selftests, 
> > 
> > Patch applied, my own hackery applied, host kernel built and booted,
> > foce_emulation_prefix set, but now I'm stuck at this seemingly simple
> > step..
> > 
> > $ cd tools/testing/selftests/kvm/
> > $ make
> > ... metric ton of fail ...
> > 
> > Clearly I'm doing something wrong :/
> 
> Did you install headers in the top level directory?  I.e. make headers_install.

No, of course not :-) I don't use the top directory to build anything,
ever.

All my builds are into build directories, using make O=foo. This allows
me to do parallel builds for multiple architectures etc. Also, much
easier to wipe a complete build directory than it is to clean out the
top level dir.

> The selftests build system was change a while back to require users to manually
> install headers (I forget why, but it is indeed annoying).

Bah, I remember NAK-ing that. Clearly the selftest people don't want
selftests to be usable :-(

Anyway, mingo build me a copy of the fastop selftest, and aside from a
few stupid mistakes, I seem to now pass it \o/

I'll attempt a Changelog and update the hyper-v patches and then post
the lot.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/6] x86/kvm/emulate: Avoid RET for fastops
  2025-04-29 14:46                   ` Peter Zijlstra
@ 2025-04-29 17:16                     ` Sean Christopherson
  0 siblings, 0 replies; 32+ messages in thread
From: Sean Christopherson @ 2025-04-29 17:16 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Josh Poimboeuf, x86, kys, haiyangz, wei.liu, decui, tglx, mingo,
	bp, dave.hansen, hpa, pawan.kumar.gupta, pbonzini, ardb, kees,
	Arnd Bergmann, gregkh, linux-hyperv, linux-kernel, kvm, linux-efi,
	samitolvanen, ojeda, shuah

On Tue, Apr 29, 2025, Peter Zijlstra wrote:
> On Tue, Apr 29, 2025 at 07:05:35AM -0700, Sean Christopherson wrote:
> > On Tue, Apr 29, 2025, Peter Zijlstra wrote:
> > > On Mon, Apr 28, 2025 at 10:13:31AM -0700, Sean Christopherson wrote:
> > > > On Sat, Apr 26, 2025, Peter Zijlstra wrote:
> > > > > On Wed, Apr 16, 2025 at 10:38:59AM +0200, Peter Zijlstra wrote:
> > > > > 
> > > > > > Yeah, I finally got there. I'll go cook up something else.
> > > > > 
> > > > > Sean, Paolo, can I once again ask how best to test this fastop crud?
> > > > 
> > > > Apply the below, build KVM selftests, 
> > > 
> > > Patch applied, my own hackery applied, host kernel built and booted,
> > > foce_emulation_prefix set, but now I'm stuck at this seemingly simple
> > > step..
> > > 
> > > $ cd tools/testing/selftests/kvm/
> > > $ make
> > > ... metric ton of fail ...
> > > 
> > > Clearly I'm doing something wrong :/
> > 
> > Did you install headers in the top level directory?  I.e. make headers_install.
> 
> No, of course not :-) I don't use the top directory to build anything,
> ever.
> 
> All my builds are into build directories, using make O=foo. This allows
> me to do parallel builds for multiple architectures etc. Also, much
> easier to wipe a complete build directory than it is to clean out the
> top level dir.

FWIW, you can do the same with KVM selftests (and presumably others?), although
the syntax is kinda weird (no idea why lib.mk uses OUTPUT instead of O).

E.g. to build KVM selftests in $HOME/build/selftests/x86

  make O=$HOME/build/selftests/x86 headers_install
  make OUTPUT=$HOME/build/selftests/x86

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 4/6] x86,hyperv: Clean up hv_do_hypercall()
  2025-04-14 11:11 [PATCH 0/6] objtool: Detect and warn about indirect calls in __nocfi functions Peter Zijlstra
                   ` (2 preceding siblings ...)
  2025-04-14 11:11 ` [PATCH 3/6] x86/kvm/emulate: Avoid RET for fastops Peter Zijlstra
@ 2025-04-14 11:11 ` Peter Zijlstra
  2025-04-14 11:47   ` Peter Zijlstra
                     ` (2 more replies)
  2025-04-14 11:11 ` [PATCH 5/6] x86_64,hyperv: Use direct call to hypercall-page Peter Zijlstra
  2025-04-14 11:11 ` [PATCH 6/6] objtool: Validate kCFI calls Peter Zijlstra
  5 siblings, 3 replies; 32+ messages in thread
From: Peter Zijlstra @ 2025-04-14 11:11 UTC (permalink / raw)
  To: x86
  Cc: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, hpa,
	peterz, jpoimboe, pawan.kumar.gupta, seanjc, pbonzini, ardb, kees,
	Arnd Bergmann, gregkh, linux-hyperv, linux-kernel, kvm, linux-efi,
	samitolvanen, ojeda

What used to be a simple few instructions has turned into a giant mess
(for x86_64). Not only does it use static_branch wrong, it mixes it
with dynamic branches for no apparent reason.

Notably it uses static_branch through an out-of-line function call,
which completely defeats the purpose, since instead of a simple
JMP/NOP site, you get a CALL+RET+TEST+Jcc sequence in return, which is
absolutely idiotic.

Add to that a dynamic test of hyperv_paravisor_present, something
which is set once and never changed.

Replace all this idiocy with a single direct function call to the
right hypercall variant.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/hyperv/hv_init.c       |   21 ++++++
 arch/x86/hyperv/ivm.c           |   14 ++++
 arch/x86/include/asm/mshyperv.h |  137 +++++++++++-----------------------------
 arch/x86/kernel/cpu/mshyperv.c  |   18 +++--
 4 files changed, 88 insertions(+), 102 deletions(-)

--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -35,7 +35,28 @@
 #include <linux/highmem.h>
 
 void *hv_hypercall_pg;
+
+#ifdef CONFIG_X86_64
+u64 hv_pg_hypercall(u64 control, u64 param1, u64 param2)
+{
+	u64 hv_status;
+
+	if (!hv_hypercall_pg)
+		return U64_MAX;
+
+	register u64 __r8 asm("r8") = param2;
+	asm volatile (CALL_NOSPEC
+		      : "=a" (hv_status), ASM_CALL_CONSTRAINT,
+		        "+c" (control), "+d" (param1)
+		      : "r" (__r8),
+		        THUNK_TARGET(hv_hypercall_pg)
+		      : "cc", "memory", "r9", "r10", "r11");
+
+	return hv_status;
+}
+#else
 EXPORT_SYMBOL_GPL(hv_hypercall_pg);
+#endif
 
 union hv_ghcb * __percpu *hv_ghcb_pg;
 
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -376,6 +376,20 @@ int hv_snp_boot_ap(u32 cpu, unsigned lon
 	return ret;
 }
 
+u64 hv_snp_hypercall(u64 control, u64 param1, u64 param2)
+{
+	u64 hv_status;
+
+	register u64 __r8 asm("r8") = param2;
+	asm volatile("vmmcall"
+		     : "=a" (hv_status), ASM_CALL_CONSTRAINT,
+		       "+c" (control), "+d" (param1)
+		     : "r" (__r8)
+		     : "cc", "memory", "r9", "r10", "r11");
+
+	return hv_status;
+}
+
 #else
 static inline void hv_ghcb_msr_write(u64 msr, u64 value) {}
 static inline void hv_ghcb_msr_read(u64 msr, u64 *value) {}
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -6,6 +6,7 @@
 #include <linux/nmi.h>
 #include <linux/msi.h>
 #include <linux/io.h>
+#include <linux/static_call.h>
 #include <asm/nospec-branch.h>
 #include <asm/paravirt.h>
 #include <hyperv/hvhdk.h>
@@ -39,15 +40,20 @@ static inline unsigned char hv_get_nmi_r
 }
 
 #if IS_ENABLED(CONFIG_HYPERV)
-extern bool hyperv_paravisor_present;
-
 extern void *hv_hypercall_pg;
 
 extern union hv_ghcb * __percpu *hv_ghcb_pg;
 
 bool hv_isolation_type_snp(void);
 bool hv_isolation_type_tdx(void);
-u64 hv_tdx_hypercall(u64 control, u64 param1, u64 param2);
+
+#ifdef CONFIG_X86_64
+extern u64 hv_tdx_hypercall(u64 control, u64 param1, u64 param2);
+extern u64 hv_snp_hypercall(u64 control, u64 param1, u64 param2);
+extern u64 hv_pg_hypercall(u64 control, u64 param1, u64 param2);
+
+DECLARE_STATIC_CALL(hv_hypercall, hv_pg_hypercall);
+#endif
 
 /*
  * DEFAULT INIT GPAT and SEGMENT LIMIT value in struct VMSA
@@ -64,37 +70,15 @@ static inline u64 hv_do_hypercall(u64 co
 {
 	u64 input_address = input ? virt_to_phys(input) : 0;
 	u64 output_address = output ? virt_to_phys(output) : 0;
-	u64 hv_status;
 
 #ifdef CONFIG_X86_64
-	if (hv_isolation_type_tdx() && !hyperv_paravisor_present)
-		return hv_tdx_hypercall(control, input_address, output_address);
-
-	if (hv_isolation_type_snp() && !hyperv_paravisor_present) {
-		__asm__ __volatile__("mov %[output_address], %%r8\n"
-				     "vmmcall"
-				     : "=a" (hv_status), ASM_CALL_CONSTRAINT,
-				       "+c" (control), "+d" (input_address)
-				     : [output_address] "r" (output_address)
-				     : "cc", "memory", "r8", "r9", "r10", "r11");
-		return hv_status;
-	}
-
-	if (!hv_hypercall_pg)
-		return U64_MAX;
-
-	__asm__ __volatile__("mov %[output_address], %%r8\n"
-			     CALL_NOSPEC
-			     : "=a" (hv_status), ASM_CALL_CONSTRAINT,
-			       "+c" (control), "+d" (input_address)
-			     : [output_address] "r" (output_address),
-			       THUNK_TARGET(hv_hypercall_pg)
-			     : "cc", "memory", "r8", "r9", "r10", "r11");
+	return static_call_mod(hv_hypercall)(control, input_address, output_address);
 #else
 	u32 input_address_hi = upper_32_bits(input_address);
 	u32 input_address_lo = lower_32_bits(input_address);
 	u32 output_address_hi = upper_32_bits(output_address);
 	u32 output_address_lo = lower_32_bits(output_address);
+	u64 hv_status;
 
 	if (!hv_hypercall_pg)
 		return U64_MAX;
@@ -107,8 +91,8 @@ static inline u64 hv_do_hypercall(u64 co
 			       "D"(output_address_hi), "S"(output_address_lo),
 			       THUNK_TARGET(hv_hypercall_pg)
 			     : "cc", "memory");
-#endif /* !x86_64 */
 	return hv_status;
+#endif /* !x86_64 */
 }
 
 /* Hypercall to the L0 hypervisor */
@@ -120,41 +104,23 @@ static inline u64 hv_do_nested_hypercall
 /* Fast hypercall with 8 bytes of input and no output */
 static inline u64 _hv_do_fast_hypercall8(u64 control, u64 input1)
 {
-	u64 hv_status;
-
 #ifdef CONFIG_X86_64
-	if (hv_isolation_type_tdx() && !hyperv_paravisor_present)
-		return hv_tdx_hypercall(control, input1, 0);
-
-	if (hv_isolation_type_snp() && !hyperv_paravisor_present) {
-		__asm__ __volatile__(
-				"vmmcall"
-				: "=a" (hv_status), ASM_CALL_CONSTRAINT,
-				"+c" (control), "+d" (input1)
-				:: "cc", "r8", "r9", "r10", "r11");
-	} else {
-		__asm__ __volatile__(CALL_NOSPEC
-				     : "=a" (hv_status), ASM_CALL_CONSTRAINT,
-				       "+c" (control), "+d" (input1)
-				     : THUNK_TARGET(hv_hypercall_pg)
-				     : "cc", "r8", "r9", "r10", "r11");
-	}
+	return static_call_mod(hv_hypercall)(control, input1, 0);
 #else
-	{
-		u32 input1_hi = upper_32_bits(input1);
-		u32 input1_lo = lower_32_bits(input1);
-
-		__asm__ __volatile__ (CALL_NOSPEC
-				      : "=A"(hv_status),
-					"+c"(input1_lo),
-					ASM_CALL_CONSTRAINT
-				      :	"A" (control),
-					"b" (input1_hi),
-					THUNK_TARGET(hv_hypercall_pg)
-				      : "cc", "edi", "esi");
-	}
-#endif
+	u32 input1_hi = upper_32_bits(input1);
+	u32 input1_lo = lower_32_bits(input1);
+	u64 hv_status;
+
+	__asm__ __volatile__ (CALL_NOSPEC
+			      : "=A"(hv_status),
+			      "+c"(input1_lo),
+			      ASM_CALL_CONSTRAINT
+			      :	"A" (control),
+			      "b" (input1_hi),
+			      THUNK_TARGET(hv_hypercall_pg)
+			      : "cc", "edi", "esi");
 	return hv_status;
+#endif
 }
 
 static inline u64 hv_do_fast_hypercall8(u16 code, u64 input1)
@@ -174,45 +140,24 @@ static inline u64 hv_do_fast_nested_hype
 /* Fast hypercall with 16 bytes of input */
 static inline u64 _hv_do_fast_hypercall16(u64 control, u64 input1, u64 input2)
 {
-	u64 hv_status;
-
 #ifdef CONFIG_X86_64
-	if (hv_isolation_type_tdx() && !hyperv_paravisor_present)
-		return hv_tdx_hypercall(control, input1, input2);
-
-	if (hv_isolation_type_snp() && !hyperv_paravisor_present) {
-		__asm__ __volatile__("mov %[input2], %%r8\n"
-				     "vmmcall"
-				     : "=a" (hv_status), ASM_CALL_CONSTRAINT,
-				       "+c" (control), "+d" (input1)
-				     : [input2] "r" (input2)
-				     : "cc", "r8", "r9", "r10", "r11");
-	} else {
-		__asm__ __volatile__("mov %[input2], %%r8\n"
-				     CALL_NOSPEC
-				     : "=a" (hv_status), ASM_CALL_CONSTRAINT,
-				       "+c" (control), "+d" (input1)
-				     : [input2] "r" (input2),
-				       THUNK_TARGET(hv_hypercall_pg)
-				     : "cc", "r8", "r9", "r10", "r11");
-	}
+	return static_call_mod(hv_hypercall)(control, input1, input2);
 #else
-	{
-		u32 input1_hi = upper_32_bits(input1);
-		u32 input1_lo = lower_32_bits(input1);
-		u32 input2_hi = upper_32_bits(input2);
-		u32 input2_lo = lower_32_bits(input2);
-
-		__asm__ __volatile__ (CALL_NOSPEC
-				      : "=A"(hv_status),
-					"+c"(input1_lo), ASM_CALL_CONSTRAINT
-				      :	"A" (control), "b" (input1_hi),
-					"D"(input2_hi), "S"(input2_lo),
-					THUNK_TARGET(hv_hypercall_pg)
-				      : "cc");
-	}
-#endif
+	u32 input1_hi = upper_32_bits(input1);
+	u32 input1_lo = lower_32_bits(input1);
+	u32 input2_hi = upper_32_bits(input2);
+	u32 input2_lo = lower_32_bits(input2);
+	u64 hv_status;
+
+	__asm__ __volatile__ (CALL_NOSPEC
+			      : "=A"(hv_status),
+			      "+c"(input1_lo), ASM_CALL_CONSTRAINT
+			      :	"A" (control), "b" (input1_hi),
+			      "D"(input2_hi), "S"(input2_lo),
+			      THUNK_TARGET(hv_hypercall_pg)
+			      : "cc");
 	return hv_status;
+#endif
 }
 
 static inline u64 hv_do_fast_hypercall16(u16 code, u64 input1, u64 input2)
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -37,10 +37,6 @@
 bool hv_nested;
 struct ms_hyperv_info ms_hyperv;
 
-/* Used in modules via hv_do_hypercall(): see arch/x86/include/asm/mshyperv.h */
-bool hyperv_paravisor_present __ro_after_init;
-EXPORT_SYMBOL_GPL(hyperv_paravisor_present);
-
 #if IS_ENABLED(CONFIG_HYPERV)
 static inline unsigned int hv_get_nested_msr(unsigned int reg)
 {
@@ -287,6 +283,11 @@ static void __init x86_setup_ops_for_tsc
 	old_restore_sched_clock_state = x86_platform.restore_sched_clock_state;
 	x86_platform.restore_sched_clock_state = hv_restore_sched_clock_state;
 }
+
+#ifdef CONFIG_X86_64
+DEFINE_STATIC_CALL(hv_hypercall, hv_pg_hypercall);
+EXPORT_STATIC_CALL_TRAMP_GPL(hv_hypercall);
+#endif
 #endif /* CONFIG_HYPERV */
 
 static uint32_t  __init ms_hyperv_platform(void)
@@ -483,14 +484,16 @@ static void __init ms_hyperv_init_platfo
 			ms_hyperv.shared_gpa_boundary =
 				BIT_ULL(ms_hyperv.shared_gpa_boundary_bits);
 
-		hyperv_paravisor_present = !!ms_hyperv.paravisor_present;
-
 		pr_info("Hyper-V: Isolation Config: Group A 0x%x, Group B 0x%x\n",
 			ms_hyperv.isolation_config_a, ms_hyperv.isolation_config_b);
 
 
 		if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP) {
 			static_branch_enable(&isolation_type_snp);
+#if defined(CONFIG_AMD_MEM_ENCRYPT) && defined(CONFIG_HYPERV)
+			if (!ms_hyperv.paravisor_present)
+				static_call_update(hv_hypercall, hv_snp_hypercall);
+#endif
 		} else if (hv_get_isolation_type() == HV_ISOLATION_TYPE_TDX) {
 			static_branch_enable(&isolation_type_tdx);
 
@@ -498,6 +501,9 @@ static void __init ms_hyperv_init_platfo
 			ms_hyperv.hints &= ~HV_X64_APIC_ACCESS_RECOMMENDED;
 
 			if (!ms_hyperv.paravisor_present) {
+#if defined(CONFIG_INTEL_TDX_GUEST) && defined(CONFIG_HYPERV)
+				static_call_update(hv_hypercall, hv_tdx_hypercall);
+#endif
 				/*
 				 * Mark the Hyper-V TSC page feature as disabled
 				 * in a TDX VM without paravisor so that the



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 4/6] x86,hyperv: Clean up hv_do_hypercall()
  2025-04-14 11:11 ` [PATCH 4/6] x86,hyperv: Clean up hv_do_hypercall() Peter Zijlstra
@ 2025-04-14 11:47   ` Peter Zijlstra
  2025-04-14 14:06   ` Uros Bizjak
  2025-04-21 18:27   ` Michael Kelley
  2 siblings, 0 replies; 32+ messages in thread
From: Peter Zijlstra @ 2025-04-14 11:47 UTC (permalink / raw)
  To: x86
  Cc: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, hpa,
	jpoimboe, pawan.kumar.gupta, seanjc, pbonzini, ardb, kees,
	Arnd Bergmann, gregkh, linux-hyperv, linux-kernel, kvm, linux-efi,
	samitolvanen, ojeda

On Mon, Apr 14, 2025 at 01:11:44PM +0200, Peter Zijlstra wrote:

> +u64 hv_pg_hypercall(u64 control, u64 param1, u64 param2)
> +{
> +	u64 hv_status;
> +
> +	if (!hv_hypercall_pg)
> +		return U64_MAX;
> +
> +	register u64 __r8 asm("r8") = param2;
> +	asm volatile (CALL_NOSPEC
> +		      : "=a" (hv_status), ASM_CALL_CONSTRAINT,
> +		        "+c" (control), "+d" (param1)
> +		      : "r" (__r8),
> +		        THUNK_TARGET(hv_hypercall_pg)
> +		      : "cc", "memory", "r9", "r10", "r11");
> +
> +	return hv_status;
> +}

I've tried making this a tail-call, but even without the
ASM_CALL_CONSTRAINT on it confuses the compiler (objtool warns on
frame-pointer builds about non-matching stack).

I could of course have written the entire thing in asm, but meh.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 4/6] x86,hyperv: Clean up hv_do_hypercall()
  2025-04-14 11:11 ` [PATCH 4/6] x86,hyperv: Clean up hv_do_hypercall() Peter Zijlstra
  2025-04-14 11:47   ` Peter Zijlstra
@ 2025-04-14 14:06   ` Uros Bizjak
  2025-04-14 14:08     ` Peter Zijlstra
  2025-04-21 18:27   ` Michael Kelley
  2 siblings, 1 reply; 32+ messages in thread
From: Uros Bizjak @ 2025-04-14 14:06 UTC (permalink / raw)
  To: Peter Zijlstra, x86
  Cc: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, hpa,
	jpoimboe, pawan.kumar.gupta, seanjc, pbonzini, ardb, kees,
	Arnd Bergmann, gregkh, linux-hyperv, linux-kernel, kvm, linux-efi,
	samitolvanen, ojeda



On 14. 04. 25 13:11, Peter Zijlstra wrote:
> What used to be a simple few instructions has turned into a giant mess
> (for x86_64). Not only does it use static_branch wrong, it mixes it
> with dynamic branches for no apparent reason.
> 
> Notably it uses static_branch through an out-of-line function call,
> which completely defeats the purpose, since instead of a simple
> JMP/NOP site, you get a CALL+RET+TEST+Jcc sequence in return, which is
> absolutely idiotic.
> 
> Add to that a dynamic test of hyperv_paravisor_present, something
> which is set once and never changed.
> 
> Replace all this idiocy with a single direct function call to the
> right hypercall variant.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>   arch/x86/hyperv/hv_init.c       |   21 ++++++
>   arch/x86/hyperv/ivm.c           |   14 ++++
>   arch/x86/include/asm/mshyperv.h |  137 +++++++++++-----------------------------
>   arch/x86/kernel/cpu/mshyperv.c  |   18 +++--
>   4 files changed, 88 insertions(+), 102 deletions(-)
> 
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -35,7 +35,28 @@
>   #include <linux/highmem.h>
>   
>   void *hv_hypercall_pg;
> +
> +#ifdef CONFIG_X86_64
> +u64 hv_pg_hypercall(u64 control, u64 param1, u64 param2)
> +{
> +	u64 hv_status;
> +
> +	if (!hv_hypercall_pg)
> +		return U64_MAX;
> +
> +	register u64 __r8 asm("r8") = param2;
> +	asm volatile (CALL_NOSPEC
> +		      : "=a" (hv_status), ASM_CALL_CONSTRAINT,
> +		        "+c" (control), "+d" (param1)
> +		      : "r" (__r8),

r8 is call-clobbered register, so you should use "+r" (__r8) to properly 
clobber it:

		        "+c" (control), "+d" (param1), "+r" (__r8)
		      : THUNK_TARGET(hv_hypercall_pg)

> +		      : "cc", "memory", "r9", "r10", "r11");
> +
> +	return hv_status;
> +}
> +#else
>   EXPORT_SYMBOL_GPL(hv_hypercall_pg);
> +#endif
>   
>   union hv_ghcb * __percpu *hv_ghcb_pg;
>   
> --- a/arch/x86/hyperv/ivm.c
> +++ b/arch/x86/hyperv/ivm.c
> @@ -376,6 +376,20 @@ int hv_snp_boot_ap(u32 cpu, unsigned lon
>   	return ret;
>   }
>   
> +u64 hv_snp_hypercall(u64 control, u64 param1, u64 param2)
> +{
> +	u64 hv_status;
> +
> +	register u64 __r8 asm("r8") = param2;
> +	asm volatile("vmmcall"
> +		     : "=a" (hv_status), ASM_CALL_CONSTRAINT,
> +		       "+c" (control), "+d" (param1)
> +		     : "r" (__r8)

Also here:
		        "+c" (control), "+d" (param1), "+r" (__r8)
		      :

> +		     : "cc", "memory", "r9", "r10", "r11");
> +
> +	return hv_status;
> +}

Uros.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 4/6] x86,hyperv: Clean up hv_do_hypercall()
  2025-04-14 14:06   ` Uros Bizjak
@ 2025-04-14 14:08     ` Peter Zijlstra
  0 siblings, 0 replies; 32+ messages in thread
From: Peter Zijlstra @ 2025-04-14 14:08 UTC (permalink / raw)
  To: Uros Bizjak
  Cc: x86, kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen,
	hpa, jpoimboe, pawan.kumar.gupta, seanjc, pbonzini, ardb, kees,
	Arnd Bergmann, gregkh, linux-hyperv, linux-kernel, kvm, linux-efi,
	samitolvanen, ojeda

On Mon, Apr 14, 2025 at 04:06:41PM +0200, Uros Bizjak wrote:
> 
> 
> On 14. 04. 25 13:11, Peter Zijlstra wrote:
> > What used to be a simple few instructions has turned into a giant mess
> > (for x86_64). Not only does it use static_branch wrong, it mixes it
> > with dynamic branches for no apparent reason.
> > 
> > Notably it uses static_branch through an out-of-line function call,
> > which completely defeats the purpose, since instead of a simple
> > JMP/NOP site, you get a CALL+RET+TEST+Jcc sequence in return, which is
> > absolutely idiotic.
> > 
> > Add to that a dynamic test of hyperv_paravisor_present, something
> > which is set once and never changed.
> > 
> > Replace all this idiocy with a single direct function call to the
> > right hypercall variant.
> > 
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > ---
> >   arch/x86/hyperv/hv_init.c       |   21 ++++++
> >   arch/x86/hyperv/ivm.c           |   14 ++++
> >   arch/x86/include/asm/mshyperv.h |  137 +++++++++++-----------------------------
> >   arch/x86/kernel/cpu/mshyperv.c  |   18 +++--
> >   4 files changed, 88 insertions(+), 102 deletions(-)
> > 
> > --- a/arch/x86/hyperv/hv_init.c
> > +++ b/arch/x86/hyperv/hv_init.c
> > @@ -35,7 +35,28 @@
> >   #include <linux/highmem.h>
> >   void *hv_hypercall_pg;
> > +
> > +#ifdef CONFIG_X86_64
> > +u64 hv_pg_hypercall(u64 control, u64 param1, u64 param2)
> > +{
> > +	u64 hv_status;
> > +
> > +	if (!hv_hypercall_pg)
> > +		return U64_MAX;
> > +
> > +	register u64 __r8 asm("r8") = param2;
> > +	asm volatile (CALL_NOSPEC
> > +		      : "=a" (hv_status), ASM_CALL_CONSTRAINT,
> > +		        "+c" (control), "+d" (param1)
> > +		      : "r" (__r8),
> 
> r8 is call-clobbered register, so you should use "+r" (__r8) to properly
> clobber it:

Ah, okay. 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [PATCH 4/6] x86,hyperv: Clean up hv_do_hypercall()
  2025-04-14 11:11 ` [PATCH 4/6] x86,hyperv: Clean up hv_do_hypercall() Peter Zijlstra
  2025-04-14 11:47   ` Peter Zijlstra
  2025-04-14 14:06   ` Uros Bizjak
@ 2025-04-21 18:27   ` Michael Kelley
  2025-04-25 13:50     ` Peter Zijlstra
  2025-04-29 15:18     ` Peter Zijlstra
  2 siblings, 2 replies; 32+ messages in thread
From: Michael Kelley @ 2025-04-21 18:27 UTC (permalink / raw)
  To: Peter Zijlstra, x86@kernel.org
  Cc: kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org,
	decui@microsoft.com, tglx@linutronix.de, mingo@redhat.com,
	bp@alien8.de, dave.hansen@linux.intel.com, hpa@zytor.com,
	jpoimboe@kernel.org, pawan.kumar.gupta@linux.intel.com,
	seanjc@google.com, pbonzini@redhat.com, ardb@kernel.org,
	kees@kernel.org, Arnd Bergmann, gregkh@linuxfoundation.org,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	kvm@vger.kernel.org, linux-efi@vger.kernel.org,
	samitolvanen@google.com, ojeda@kernel.org

From: Peter Zijlstra <peterz@infradead.org> Sent: Monday, April 14, 2025 4:12 AM
> 
> What used to be a simple few instructions has turned into a giant mess
> (for x86_64). Not only does it use static_branch wrong, it mixes it
> with dynamic branches for no apparent reason.
> 
> Notably it uses static_branch through an out-of-line function call,
> which completely defeats the purpose, since instead of a simple
> JMP/NOP site, you get a CALL+RET+TEST+Jcc sequence in return, which is
> absolutely idiotic.
> 
> Add to that a dynamic test of hyperv_paravisor_present, something
> which is set once and never changed.
> 
> Replace all this idiocy with a single direct function call to the
> right hypercall variant.

This did indeed need cleaning after all the CoCo VM and paravisor
stuff got added. Thanks for doing it.

From looking at the code changes, I believe the 32-bit hypercall paths
are unchanged, as they weren't affected the CoCo VM and paravisor
additions. Perhaps explicitly state that intent in the commit message.

I've tested this patch set against linux-next-20250411 on normal Hyper-V
guests. Basic smoke tests pass, along with taking a panic, and
suspend/resume for guest hibernation. But getting into kdump after a
panic does not work. See comments in Patch 5 for the likely reason why.

I've also tested SNP and TDX VMs with a paravisor, and basic smoke
tests pass. But I'm testing in the Azure cloud, and I don't have access to an
environment where I can test without a paravisor. So my testing doesn't
cover the SNP and TDX specific static call paths. Maybe someone at
Microsoft can test that configuration.

> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  arch/x86/hyperv/hv_init.c       |   21 ++++++
>  arch/x86/hyperv/ivm.c           |   14 ++++
>  arch/x86/include/asm/mshyperv.h |  137 +++++++++++-----------------------------
>  arch/x86/kernel/cpu/mshyperv.c  |   18 +++--
>  4 files changed, 88 insertions(+), 102 deletions(-)
> 
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -35,7 +35,28 @@
>  #include <linux/highmem.h>
> 
>  void *hv_hypercall_pg;
> +
> +#ifdef CONFIG_X86_64
> +u64 hv_pg_hypercall(u64 control, u64 param1, u64 param2)

Could this get a different name so we don't have the confusion of
hv_hypercall_pg vs hv_pg_hypercall?  Some possibilities:

hv_std_hypercall
hv_basic_hypercall
hv_core_hypercall
hv_normal_hypercall
hv_simple_hypercall

> +{
> +	u64 hv_status;
> +
> +	if (!hv_hypercall_pg)
> +		return U64_MAX;
> +
> +	register u64 __r8 asm("r8") = param2;
> +	asm volatile (CALL_NOSPEC
> +		      : "=a" (hv_status), ASM_CALL_CONSTRAINT,
> +		        "+c" (control), "+d" (param1)
> +		      : "r" (__r8),
> +		        THUNK_TARGET(hv_hypercall_pg)
> +		      : "cc", "memory", "r9", "r10", "r11");
> +
> +	return hv_status;
> +}
> +#else
>  EXPORT_SYMBOL_GPL(hv_hypercall_pg);
> +#endif
> 
>  union hv_ghcb * __percpu *hv_ghcb_pg;
> 
> --- a/arch/x86/hyperv/ivm.c
> +++ b/arch/x86/hyperv/ivm.c
> @@ -376,6 +376,20 @@ int hv_snp_boot_ap(u32 cpu, unsigned lon
>  	return ret;
>  }
> 
> +u64 hv_snp_hypercall(u64 control, u64 param1, u64 param2)
> +{
> +	u64 hv_status;
> +
> +	register u64 __r8 asm("r8") = param2;
> +	asm volatile("vmmcall"
> +		     : "=a" (hv_status), ASM_CALL_CONSTRAINT,
> +		       "+c" (control), "+d" (param1)
> +		     : "r" (__r8)
> +		     : "cc", "memory", "r9", "r10", "r11");
> +
> +	return hv_status;
> +}
> +
>  #else
>  static inline void hv_ghcb_msr_write(u64 msr, u64 value) {}
>  static inline void hv_ghcb_msr_read(u64 msr, u64 *value) {}
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -6,6 +6,7 @@
>  #include <linux/nmi.h>
>  #include <linux/msi.h>
>  #include <linux/io.h>
> +#include <linux/static_call.h>
>  #include <asm/nospec-branch.h>
>  #include <asm/paravirt.h>
>  #include <hyperv/hvhdk.h>
> @@ -39,15 +40,20 @@ static inline unsigned char hv_get_nmi_r
>  }
> 
>  #if IS_ENABLED(CONFIG_HYPERV)
> -extern bool hyperv_paravisor_present;
> -
>  extern void *hv_hypercall_pg;
> 
>  extern union hv_ghcb * __percpu *hv_ghcb_pg;
> 
>  bool hv_isolation_type_snp(void);
>  bool hv_isolation_type_tdx(void);
> -u64 hv_tdx_hypercall(u64 control, u64 param1, u64 param2);
> +
> +#ifdef CONFIG_X86_64
> +extern u64 hv_tdx_hypercall(u64 control, u64 param1, u64 param2);
> +extern u64 hv_snp_hypercall(u64 control, u64 param1, u64 param2);
> +extern u64 hv_pg_hypercall(u64 control, u64 param1, u64 param2);
> +
> +DECLARE_STATIC_CALL(hv_hypercall, hv_pg_hypercall);
> +#endif
> 
>  /*
>   * DEFAULT INIT GPAT and SEGMENT LIMIT value in struct VMSA
> @@ -64,37 +70,15 @@ static inline u64 hv_do_hypercall(u64 co
>  {
>  	u64 input_address = input ? virt_to_phys(input) : 0;
>  	u64 output_address = output ? virt_to_phys(output) : 0;
> -	u64 hv_status;
> 
>  #ifdef CONFIG_X86_64
> -	if (hv_isolation_type_tdx() && !hyperv_paravisor_present)
> -		return hv_tdx_hypercall(control, input_address, output_address);
> -
> -	if (hv_isolation_type_snp() && !hyperv_paravisor_present) {
> -		__asm__ __volatile__("mov %[output_address], %%r8\n"
> -				     "vmmcall"
> -				     : "=a" (hv_status), ASM_CALL_CONSTRAINT,
> -				       "+c" (control), "+d" (input_address)
> -				     : [output_address] "r" (output_address)
> -				     : "cc", "memory", "r8", "r9", "r10", "r11");
> -		return hv_status;
> -	}
> -
> -	if (!hv_hypercall_pg)
> -		return U64_MAX;
> -
> -	__asm__ __volatile__("mov %[output_address], %%r8\n"
> -			     CALL_NOSPEC
> -			     : "=a" (hv_status), ASM_CALL_CONSTRAINT,
> -			       "+c" (control), "+d" (input_address)
> -			     : [output_address] "r" (output_address),
> -			       THUNK_TARGET(hv_hypercall_pg)
> -			     : "cc", "memory", "r8", "r9", "r10", "r11");
> +	return static_call_mod(hv_hypercall)(control, input_address, output_address);
>  #else
>  	u32 input_address_hi = upper_32_bits(input_address);
>  	u32 input_address_lo = lower_32_bits(input_address);
>  	u32 output_address_hi = upper_32_bits(output_address);
>  	u32 output_address_lo = lower_32_bits(output_address);
> +	u64 hv_status;
> 
>  	if (!hv_hypercall_pg)
>  		return U64_MAX;
> @@ -107,8 +91,8 @@ static inline u64 hv_do_hypercall(u64 co
>  			       "D"(output_address_hi), "S"(output_address_lo),
>  			       THUNK_TARGET(hv_hypercall_pg)
>  			     : "cc", "memory");
> -#endif /* !x86_64 */
>  	return hv_status;
> +#endif /* !x86_64 */
>  }
> 
>  /* Hypercall to the L0 hypervisor */
> @@ -120,41 +104,23 @@ static inline u64 hv_do_nested_hypercall
>  /* Fast hypercall with 8 bytes of input and no output */
>  static inline u64 _hv_do_fast_hypercall8(u64 control, u64 input1)
>  {
> -	u64 hv_status;
> -
>  #ifdef CONFIG_X86_64
> -	if (hv_isolation_type_tdx() && !hyperv_paravisor_present)
> -		return hv_tdx_hypercall(control, input1, 0);
> -
> -	if (hv_isolation_type_snp() && !hyperv_paravisor_present) {
> -		__asm__ __volatile__(
> -				"vmmcall"
> -				: "=a" (hv_status), ASM_CALL_CONSTRAINT,
> -				"+c" (control), "+d" (input1)
> -				:: "cc", "r8", "r9", "r10", "r11");
> -	} else {
> -		__asm__ __volatile__(CALL_NOSPEC
> -				     : "=a" (hv_status), ASM_CALL_CONSTRAINT,
> -				       "+c" (control), "+d" (input1)
> -				     : THUNK_TARGET(hv_hypercall_pg)
> -				     : "cc", "r8", "r9", "r10", "r11");
> -	}
> +	return static_call_mod(hv_hypercall)(control, input1, 0);
>  #else
> -	{
> -		u32 input1_hi = upper_32_bits(input1);
> -		u32 input1_lo = lower_32_bits(input1);
> -
> -		__asm__ __volatile__ (CALL_NOSPEC
> -				      : "=A"(hv_status),
> -					"+c"(input1_lo),
> -					ASM_CALL_CONSTRAINT
> -				      :	"A" (control),
> -					"b" (input1_hi),
> -					THUNK_TARGET(hv_hypercall_pg)
> -				      : "cc", "edi", "esi");
> -	}
> -#endif
> +	u32 input1_hi = upper_32_bits(input1);
> +	u32 input1_lo = lower_32_bits(input1);
> +	u64 hv_status;
> +
> +	__asm__ __volatile__ (CALL_NOSPEC
> +			      : "=A"(hv_status),
> +			      "+c"(input1_lo),
> +			      ASM_CALL_CONSTRAINT
> +			      :	"A" (control),
> +			      "b" (input1_hi),
> +			      THUNK_TARGET(hv_hypercall_pg)
> +			      : "cc", "edi", "esi");
>  	return hv_status;
> +#endif
>  }
> 
>  static inline u64 hv_do_fast_hypercall8(u16 code, u64 input1)
> @@ -174,45 +140,24 @@ static inline u64 hv_do_fast_nested_hype
>  /* Fast hypercall with 16 bytes of input */
>  static inline u64 _hv_do_fast_hypercall16(u64 control, u64 input1, u64 input2)
>  {
> -	u64 hv_status;
> -
>  #ifdef CONFIG_X86_64
> -	if (hv_isolation_type_tdx() && !hyperv_paravisor_present)
> -		return hv_tdx_hypercall(control, input1, input2);
> -
> -	if (hv_isolation_type_snp() && !hyperv_paravisor_present) {
> -		__asm__ __volatile__("mov %[input2], %%r8\n"
> -				     "vmmcall"
> -				     : "=a" (hv_status), ASM_CALL_CONSTRAINT,
> -				       "+c" (control), "+d" (input1)
> -				     : [input2] "r" (input2)
> -				     : "cc", "r8", "r9", "r10", "r11");
> -	} else {
> -		__asm__ __volatile__("mov %[input2], %%r8\n"
> -				     CALL_NOSPEC
> -				     : "=a" (hv_status), ASM_CALL_CONSTRAINT,
> -				       "+c" (control), "+d" (input1)
> -				     : [input2] "r" (input2),
> -				       THUNK_TARGET(hv_hypercall_pg)
> -				     : "cc", "r8", "r9", "r10", "r11");
> -	}
> +	return static_call_mod(hv_hypercall)(control, input1, input2);
>  #else
> -	{
> -		u32 input1_hi = upper_32_bits(input1);
> -		u32 input1_lo = lower_32_bits(input1);
> -		u32 input2_hi = upper_32_bits(input2);
> -		u32 input2_lo = lower_32_bits(input2);
> -
> -		__asm__ __volatile__ (CALL_NOSPEC
> -				      : "=A"(hv_status),
> -					"+c"(input1_lo), ASM_CALL_CONSTRAINT
> -				      :	"A" (control), "b" (input1_hi),
> -					"D"(input2_hi), "S"(input2_lo),
> -					THUNK_TARGET(hv_hypercall_pg)
> -				      : "cc");
> -	}
> -#endif
> +	u32 input1_hi = upper_32_bits(input1);
> +	u32 input1_lo = lower_32_bits(input1);
> +	u32 input2_hi = upper_32_bits(input2);
> +	u32 input2_lo = lower_32_bits(input2);
> +	u64 hv_status;
> +
> +	__asm__ __volatile__ (CALL_NOSPEC
> +			      : "=A"(hv_status),
> +			      "+c"(input1_lo), ASM_CALL_CONSTRAINT
> +			      :	"A" (control), "b" (input1_hi),
> +			      "D"(input2_hi), "S"(input2_lo),
> +			      THUNK_TARGET(hv_hypercall_pg)
> +			      : "cc");
>  	return hv_status;
> +#endif
>  }
> 
>  static inline u64 hv_do_fast_hypercall16(u16 code, u64 input1, u64 input2)
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -37,10 +37,6 @@
>  bool hv_nested;
>  struct ms_hyperv_info ms_hyperv;
> 
> -/* Used in modules via hv_do_hypercall(): see arch/x86/include/asm/mshyperv.h */
> -bool hyperv_paravisor_present __ro_after_init;
> -EXPORT_SYMBOL_GPL(hyperv_paravisor_present);
> -
>  #if IS_ENABLED(CONFIG_HYPERV)
>  static inline unsigned int hv_get_nested_msr(unsigned int reg)
>  {
> @@ -287,6 +283,11 @@ static void __init x86_setup_ops_for_tsc
>  	old_restore_sched_clock_state = x86_platform.restore_sched_clock_state;
>  	x86_platform.restore_sched_clock_state = hv_restore_sched_clock_state;
>  }
> +
> +#ifdef CONFIG_X86_64
> +DEFINE_STATIC_CALL(hv_hypercall, hv_pg_hypercall);
> +EXPORT_STATIC_CALL_TRAMP_GPL(hv_hypercall);
> +#endif
>  #endif /* CONFIG_HYPERV */
> 
>  static uint32_t  __init ms_hyperv_platform(void)
> @@ -483,14 +484,16 @@ static void __init ms_hyperv_init_platfo
>  			ms_hyperv.shared_gpa_boundary =
>  				BIT_ULL(ms_hyperv.shared_gpa_boundary_bits);
> 
> -		hyperv_paravisor_present = !!ms_hyperv.paravisor_present;
> -
>  		pr_info("Hyper-V: Isolation Config: Group A 0x%x, Group B 0x%x\n",
>  			ms_hyperv.isolation_config_a, ms_hyperv.isolation_config_b);
> 
> 
>  		if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP) {
>  			static_branch_enable(&isolation_type_snp);
> +#if defined(CONFIG_AMD_MEM_ENCRYPT) && defined(CONFIG_HYPERV)
> +			if (!ms_hyperv.paravisor_present)
> +				static_call_update(hv_hypercall, hv_snp_hypercall);
> +#endif

This #ifdef (and one below for TDX) are really ugly. They could be avoided by adding
stubs for hv_snp_hypercall() and hv_tdx_hypercall(), and making the hv_hypercall static
call exist even when !CONFIG_HYPERV (and for 32-bit builds). Or is there a reason to
not do that?

>  		} else if (hv_get_isolation_type() == HV_ISOLATION_TYPE_TDX) {
>  			static_branch_enable(&isolation_type_tdx);
> 
> @@ -498,6 +501,9 @@ static void __init ms_hyperv_init_platfo
>  			ms_hyperv.hints &= ~HV_X64_APIC_ACCESS_RECOMMENDED;
> 
>  			if (!ms_hyperv.paravisor_present) {
> +#if defined(CONFIG_INTEL_TDX_GUEST) && defined(CONFIG_HYPERV)
> +				static_call_update(hv_hypercall, hv_tdx_hypercall);
> +#endif
>  				/*
>  				 * Mark the Hyper-V TSC page feature as disabled
>  				 * in a TDX VM without paravisor so that the
> 
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 4/6] x86,hyperv: Clean up hv_do_hypercall()
  2025-04-21 18:27   ` Michael Kelley
@ 2025-04-25 13:50     ` Peter Zijlstra
  2025-04-29 15:18     ` Peter Zijlstra
  1 sibling, 0 replies; 32+ messages in thread
From: Peter Zijlstra @ 2025-04-25 13:50 UTC (permalink / raw)
  To: Michael Kelley
  Cc: x86@kernel.org, kys@microsoft.com, haiyangz@microsoft.com,
	wei.liu@kernel.org, decui@microsoft.com, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
	hpa@zytor.com, jpoimboe@kernel.org,
	pawan.kumar.gupta@linux.intel.com, seanjc@google.com,
	pbonzini@redhat.com, ardb@kernel.org, kees@kernel.org,
	Arnd Bergmann, gregkh@linuxfoundation.org,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	kvm@vger.kernel.org, linux-efi@vger.kernel.org,
	samitolvanen@google.com, ojeda@kernel.org

On Mon, Apr 21, 2025 at 06:27:57PM +0000, Michael Kelley wrote:
> From: Peter Zijlstra <peterz@infradead.org> Sent: Monday, April 14, 2025 4:12 AM
> > 
> > What used to be a simple few instructions has turned into a giant mess
> > (for x86_64). Not only does it use static_branch wrong, it mixes it
> > with dynamic branches for no apparent reason.
> > 
> > Notably it uses static_branch through an out-of-line function call,
> > which completely defeats the purpose, since instead of a simple
> > JMP/NOP site, you get a CALL+RET+TEST+Jcc sequence in return, which is
> > absolutely idiotic.
> > 
> > Add to that a dynamic test of hyperv_paravisor_present, something
> > which is set once and never changed.
> > 
> > Replace all this idiocy with a single direct function call to the
> > right hypercall variant.
> 
> This did indeed need cleaning after all the CoCo VM and paravisor
> stuff got added. Thanks for doing it.
> 
> From looking at the code changes, I believe the 32-bit hypercall paths
> are unchanged, as they weren't affected the CoCo VM and paravisor
> additions. Perhaps explicitly state that intent in the commit message.
> 
> I've tested this patch set against linux-next-20250411 on normal Hyper-V
> guests. Basic smoke tests pass, along with taking a panic, and
> suspend/resume for guest hibernation. But getting into kdump after a
> panic does not work. See comments in Patch 5 for the likely reason why.
> 
> I've also tested SNP and TDX VMs with a paravisor, and basic smoke
> tests pass. But I'm testing in the Azure cloud, and I don't have access to an
> environment where I can test without a paravisor. So my testing doesn't
> cover the SNP and TDX specific static call paths. Maybe someone at
> Microsoft can test that configuration.

Excellent, thanks!


> > +#ifdef CONFIG_X86_64
> > +u64 hv_pg_hypercall(u64 control, u64 param1, u64 param2)
> 
> Could this get a different name so we don't have the confusion of
> hv_hypercall_pg vs hv_pg_hypercall?  Some possibilities:
> 
> hv_std_hypercall
> hv_basic_hypercall
> hv_core_hypercall
> hv_normal_hypercall
> hv_simple_hypercall

Sure, I'll throw a dice an pick one ;-)


> > @@ -483,14 +484,16 @@ static void __init ms_hyperv_init_platfo
> >  			ms_hyperv.shared_gpa_boundary =
> >  				BIT_ULL(ms_hyperv.shared_gpa_boundary_bits);
> > 
> > -		hyperv_paravisor_present = !!ms_hyperv.paravisor_present;
> > -
> >  		pr_info("Hyper-V: Isolation Config: Group A 0x%x, Group B 0x%x\n",
> >  			ms_hyperv.isolation_config_a, ms_hyperv.isolation_config_b);
> > 
> > 
> >  		if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP) {
> >  			static_branch_enable(&isolation_type_snp);
> > +#if defined(CONFIG_AMD_MEM_ENCRYPT) && defined(CONFIG_HYPERV)
> > +			if (!ms_hyperv.paravisor_present)
> > +				static_call_update(hv_hypercall, hv_snp_hypercall);
> > +#endif
> 
> This #ifdef (and one below for TDX) are really ugly. They could be avoided by adding
> stubs for hv_snp_hypercall() and hv_tdx_hypercall(), and making the hv_hypercall static
> call exist even when !CONFIG_HYPERV (and for 32-bit builds). Or is there a reason to
> not do that?

I'll try and make it so.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 4/6] x86,hyperv: Clean up hv_do_hypercall()
  2025-04-21 18:27   ` Michael Kelley
  2025-04-25 13:50     ` Peter Zijlstra
@ 2025-04-29 15:18     ` Peter Zijlstra
  2025-04-29 20:36       ` Michael Kelley
  1 sibling, 1 reply; 32+ messages in thread
From: Peter Zijlstra @ 2025-04-29 15:18 UTC (permalink / raw)
  To: Michael Kelley
  Cc: x86@kernel.org, kys@microsoft.com, haiyangz@microsoft.com,
	wei.liu@kernel.org, decui@microsoft.com, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
	hpa@zytor.com, jpoimboe@kernel.org,
	pawan.kumar.gupta@linux.intel.com, seanjc@google.com,
	pbonzini@redhat.com, ardb@kernel.org, kees@kernel.org,
	Arnd Bergmann, gregkh@linuxfoundation.org,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	kvm@vger.kernel.org, linux-efi@vger.kernel.org,
	samitolvanen@google.com, ojeda@kernel.org

On Mon, Apr 21, 2025 at 06:27:57PM +0000, Michael Kelley wrote:

> > @@ -483,14 +484,16 @@ static void __init ms_hyperv_init_platfo
> >  			ms_hyperv.shared_gpa_boundary =
> >  				BIT_ULL(ms_hyperv.shared_gpa_boundary_bits);
> > 
> > -		hyperv_paravisor_present = !!ms_hyperv.paravisor_present;
> > -
> >  		pr_info("Hyper-V: Isolation Config: Group A 0x%x, Group B 0x%x\n",
> >  			ms_hyperv.isolation_config_a, ms_hyperv.isolation_config_b);
> > 
> > 
> >  		if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP) {
> >  			static_branch_enable(&isolation_type_snp);
> > +#if defined(CONFIG_AMD_MEM_ENCRYPT) && defined(CONFIG_HYPERV)
> > +			if (!ms_hyperv.paravisor_present)
> > +				static_call_update(hv_hypercall, hv_snp_hypercall);
> > +#endif
> 
> This #ifdef (and one below for TDX) are really ugly. They could be avoided by adding
> stubs for hv_snp_hypercall() and hv_tdx_hypercall(), and making the hv_hypercall static
> call exist even when !CONFIG_HYPERV (and for 32-bit builds). Or is there a reason to
> not do that?
> 
> >  		} else if (hv_get_isolation_type() == HV_ISOLATION_TYPE_TDX) {
> >  			static_branch_enable(&isolation_type_tdx);
> > 
> > @@ -498,6 +501,9 @@ static void __init ms_hyperv_init_platfo
> >  			ms_hyperv.hints &= ~HV_X64_APIC_ACCESS_RECOMMENDED;
> > 
> >  			if (!ms_hyperv.paravisor_present) {
> > +#if defined(CONFIG_INTEL_TDX_GUEST) && defined(CONFIG_HYPERV)
> > +				static_call_update(hv_hypercall, hv_tdx_hypercall);
> > +#endif
> >  				/*
> >  				 * Mark the Hyper-V TSC page feature as disabled
> >  				 * in a TDX VM without paravisor so that the
> > 
> > 

I've ended up with the below.. I thought it a waste to make all that
stuff available to 32bit and !HYPERV.


--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -392,6 +392,7 @@ u64 hv_snp_hypercall(u64 control, u64 pa
 #else
 static inline void hv_ghcb_msr_write(u64 msr, u64 value) {}
 static inline void hv_ghcb_msr_read(u64 msr, u64 *value) {}
+u64 hv_snp_hypercall(u64 control, u64 param1, u64 param2) {}
 #endif /* CONFIG_AMD_MEM_ENCRYPT */
 
 #ifdef CONFIG_INTEL_TDX_GUEST
@@ -441,6 +442,7 @@ u64 hv_tdx_hypercall(u64 control, u64 pa
 #else
 static inline void hv_tdx_msr_write(u64 msr, u64 value) {}
 static inline void hv_tdx_msr_read(u64 msr, u64 *value) {}
+u64 hv_tdx_hypercall(u64 control, u64 param1, u64 param2) {}
 #endif /* CONFIG_INTEL_TDX_GUEST */
 
 #if defined(CONFIG_AMD_MEM_ENCRYPT) || defined(CONFIG_INTEL_TDX_GUEST)
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -39,6 +39,10 @@ static inline unsigned char hv_get_nmi_r
 	return 0;
 }
 
+extern u64 hv_tdx_hypercall(u64 control, u64 param1, u64 param2);
+extern u64 hv_snp_hypercall(u64 control, u64 param1, u64 param2);
+extern u64 hv_std_hypercall(u64 control, u64 param1, u64 param2);
+
 #if IS_ENABLED(CONFIG_HYPERV)
 extern void *hv_hypercall_pg;
 
@@ -48,10 +52,6 @@ bool hv_isolation_type_snp(void);
 bool hv_isolation_type_tdx(void);
 
 #ifdef CONFIG_X86_64
-extern u64 hv_tdx_hypercall(u64 control, u64 param1, u64 param2);
-extern u64 hv_snp_hypercall(u64 control, u64 param1, u64 param2);
-extern u64 hv_std_hypercall(u64 control, u64 param1, u64 param2);
-
 DECLARE_STATIC_CALL(hv_hypercall, hv_std_hypercall);
 #endif
 
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -287,9 +287,14 @@ static void __init x86_setup_ops_for_tsc
 #ifdef CONFIG_X86_64
 DEFINE_STATIC_CALL(hv_hypercall, hv_std_hypercall);
 EXPORT_STATIC_CALL_TRAMP_GPL(hv_hypercall);
+#define hypercall_update(hc) static_call_update(hv_hypercall, hc)
 #endif
 #endif /* CONFIG_HYPERV */
 
+#ifndef hypercall_update
+#define hypercall_update(hc) (void)hc
+#endif
+
 static uint32_t  __init ms_hyperv_platform(void)
 {
 	u32 eax;
@@ -490,10 +495,8 @@ static void __init ms_hyperv_init_platfo
 
 		if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP) {
 			static_branch_enable(&isolation_type_snp);
-#if defined(CONFIG_AMD_MEM_ENCRYPT) && defined(CONFIG_HYPERV)
 			if (!ms_hyperv.paravisor_present)
-				static_call_update(hv_hypercall, hv_snp_hypercall);
-#endif
+				hypercall_update(hv_snp_hypercall);
 		} else if (hv_get_isolation_type() == HV_ISOLATION_TYPE_TDX) {
 			static_branch_enable(&isolation_type_tdx);
 
@@ -501,9 +504,7 @@ static void __init ms_hyperv_init_platfo
 			ms_hyperv.hints &= ~HV_X64_APIC_ACCESS_RECOMMENDED;
 
 			if (!ms_hyperv.paravisor_present) {
-#if defined(CONFIG_INTEL_TDX_GUEST) && defined(CONFIG_HYPERV)
-				static_call_update(hv_hypercall, hv_tdx_hypercall);
-#endif
+				hypercall_update(hv_tdx_hypercall);
 				/*
 				 * Mark the Hyper-V TSC page feature as disabled
 				 * in a TDX VM without paravisor so that the

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [PATCH 4/6] x86,hyperv: Clean up hv_do_hypercall()
  2025-04-29 15:18     ` Peter Zijlstra
@ 2025-04-29 20:36       ` Michael Kelley
  0 siblings, 0 replies; 32+ messages in thread
From: Michael Kelley @ 2025-04-29 20:36 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86@kernel.org, kys@microsoft.com, haiyangz@microsoft.com,
	wei.liu@kernel.org, decui@microsoft.com, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
	hpa@zytor.com, jpoimboe@kernel.org,
	pawan.kumar.gupta@linux.intel.com, seanjc@google.com,
	pbonzini@redhat.com, ardb@kernel.org, kees@kernel.org,
	Arnd Bergmann, gregkh@linuxfoundation.org,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	kvm@vger.kernel.org, linux-efi@vger.kernel.org,
	samitolvanen@google.com, ojeda@kernel.org

From: Peter Zijlstra <peterz@infradead.org> Sent: Tuesday, April 29, 2025 8:18 AM
> 
> On Mon, Apr 21, 2025 at 06:27:57PM +0000, Michael Kelley wrote:
> 
> > > @@ -483,14 +484,16 @@ static void __init ms_hyperv_init_platfo
> > >  			ms_hyperv.shared_gpa_boundary =
> > >  				BIT_ULL(ms_hyperv.shared_gpa_boundary_bits);
> > >
> > > -		hyperv_paravisor_present = !!ms_hyperv.paravisor_present;
> > > -
> > >  		pr_info("Hyper-V: Isolation Config: Group A 0x%x, Group B 0x%x\n",
> > >  			ms_hyperv.isolation_config_a, ms_hyperv.isolation_config_b);
> > >
> > >
> > >  		if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP) {
> > >  			static_branch_enable(&isolation_type_snp);
> > > +#if defined(CONFIG_AMD_MEM_ENCRYPT) && defined(CONFIG_HYPERV)
> > > +			if (!ms_hyperv.paravisor_present)
> > > +				static_call_update(hv_hypercall, hv_snp_hypercall);
> > > +#endif
> >
> > This #ifdef (and one below for TDX) are really ugly. They could be avoided by adding
> > stubs for hv_snp_hypercall() and hv_tdx_hypercall(), and making the hv_hypercall static
> > call exist even when !CONFIG_HYPERV (and for 32-bit builds). Or is there a reason to
> > not do that?
> >
> > >  		} else if (hv_get_isolation_type() == HV_ISOLATION_TYPE_TDX) {
> > >  			static_branch_enable(&isolation_type_tdx);
> > >
> > > @@ -498,6 +501,9 @@ static void __init ms_hyperv_init_platfo
> > >  			ms_hyperv.hints &= ~HV_X64_APIC_ACCESS_RECOMMENDED;
> > >
> > >  			if (!ms_hyperv.paravisor_present) {
> > > +#if defined(CONFIG_INTEL_TDX_GUEST) && defined(CONFIG_HYPERV)
> > > +				static_call_update(hv_hypercall, hv_tdx_hypercall);
> > > +#endif
> > >  				/*
> > >  				 * Mark the Hyper-V TSC page feature as disabled
> > >  				 * in a TDX VM without paravisor so that the
> > >
> > >
> 
> I've ended up with the below.. I thought it a waste to make all that
> stuff available to 32bit and !HYPERV.
> 
> 
> --- a/arch/x86/hyperv/ivm.c
> +++ b/arch/x86/hyperv/ivm.c
> @@ -392,6 +392,7 @@ u64 hv_snp_hypercall(u64 control, u64 pa
>  #else
>  static inline void hv_ghcb_msr_write(u64 msr, u64 value) {}
>  static inline void hv_ghcb_msr_read(u64 msr, u64 *value) {}
> +u64 hv_snp_hypercall(u64 control, u64 param1, u64 param2) {}
>  #endif /* CONFIG_AMD_MEM_ENCRYPT */
> 
>  #ifdef CONFIG_INTEL_TDX_GUEST
> @@ -441,6 +442,7 @@ u64 hv_tdx_hypercall(u64 control, u64 pa
>  #else
>  static inline void hv_tdx_msr_write(u64 msr, u64 value) {}
>  static inline void hv_tdx_msr_read(u64 msr, u64 *value) {}
> +u64 hv_tdx_hypercall(u64 control, u64 param1, u64 param2) {}
>  #endif /* CONFIG_INTEL_TDX_GUEST */
> 
>  #if defined(CONFIG_AMD_MEM_ENCRYPT) || defined(CONFIG_INTEL_TDX_GUEST)
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -39,6 +39,10 @@ static inline unsigned char hv_get_nmi_r
>  	return 0;
>  }
> 
> +extern u64 hv_tdx_hypercall(u64 control, u64 param1, u64 param2);
> +extern u64 hv_snp_hypercall(u64 control, u64 param1, u64 param2);
> +extern u64 hv_std_hypercall(u64 control, u64 param1, u64 param2);
> +
>  #if IS_ENABLED(CONFIG_HYPERV)
>  extern void *hv_hypercall_pg;
> 
> @@ -48,10 +52,6 @@ bool hv_isolation_type_snp(void);
>  bool hv_isolation_type_tdx(void);
> 
>  #ifdef CONFIG_X86_64
> -extern u64 hv_tdx_hypercall(u64 control, u64 param1, u64 param2);
> -extern u64 hv_snp_hypercall(u64 control, u64 param1, u64 param2);
> -extern u64 hv_std_hypercall(u64 control, u64 param1, u64 param2);
> -
>  DECLARE_STATIC_CALL(hv_hypercall, hv_std_hypercall);
>  #endif
> 
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -287,9 +287,14 @@ static void __init x86_setup_ops_for_tsc
>  #ifdef CONFIG_X86_64
>  DEFINE_STATIC_CALL(hv_hypercall, hv_std_hypercall);
>  EXPORT_STATIC_CALL_TRAMP_GPL(hv_hypercall);
> +#define hypercall_update(hc) static_call_update(hv_hypercall, hc)
>  #endif
>  #endif /* CONFIG_HYPERV */
> 
> +#ifndef hypercall_update
> +#define hypercall_update(hc) (void)hc
> +#endif
> +
>  static uint32_t  __init ms_hyperv_platform(void)
>  {
>  	u32 eax;
> @@ -490,10 +495,8 @@ static void __init ms_hyperv_init_platfo
> 
>  		if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP) {
>  			static_branch_enable(&isolation_type_snp);
> -#if defined(CONFIG_AMD_MEM_ENCRYPT) && defined(CONFIG_HYPERV)
>  			if (!ms_hyperv.paravisor_present)
> -				static_call_update(hv_hypercall, hv_snp_hypercall);
> -#endif
> +				hypercall_update(hv_snp_hypercall);
>  		} else if (hv_get_isolation_type() == HV_ISOLATION_TYPE_TDX) {
>  			static_branch_enable(&isolation_type_tdx);
> 
> @@ -501,9 +504,7 @@ static void __init ms_hyperv_init_platfo
>  			ms_hyperv.hints &= ~HV_X64_APIC_ACCESS_RECOMMENDED;
> 
>  			if (!ms_hyperv.paravisor_present) {
> -#if defined(CONFIG_INTEL_TDX_GUEST) && defined(CONFIG_HYPERV)
> -				static_call_update(hv_hypercall, hv_tdx_hypercall);
> -#endif
> +				hypercall_update(hv_tdx_hypercall);
>  				/*
>  				 * Mark the Hyper-V TSC page feature as disabled
>  				 * in a TDX VM without paravisor so that the

Yes, that's a reasonable improvement that I can live with. This source code file
is certainly not a model for avoiding ugly #ifdef's, but your new approach avoids
adding to the problem quite so egregiously.

Michael

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 5/6] x86_64,hyperv: Use direct call to hypercall-page
  2025-04-14 11:11 [PATCH 0/6] objtool: Detect and warn about indirect calls in __nocfi functions Peter Zijlstra
                   ` (3 preceding siblings ...)
  2025-04-14 11:11 ` [PATCH 4/6] x86,hyperv: Clean up hv_do_hypercall() Peter Zijlstra
@ 2025-04-14 11:11 ` Peter Zijlstra
  2025-04-21 18:28   ` Michael Kelley
  2025-04-14 11:11 ` [PATCH 6/6] objtool: Validate kCFI calls Peter Zijlstra
  5 siblings, 1 reply; 32+ messages in thread
From: Peter Zijlstra @ 2025-04-14 11:11 UTC (permalink / raw)
  To: x86
  Cc: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, hpa,
	peterz, jpoimboe, pawan.kumar.gupta, seanjc, pbonzini, ardb, kees,
	Arnd Bergmann, gregkh, linux-hyperv, linux-kernel, kvm, linux-efi,
	samitolvanen, ojeda

Instead of using an indirect call to the hypercall page, use a direct
call instead. This avoids all CFI problems, including the one where
the hypercall page doesn't have IBT on.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/hyperv/hv_init.c |   62 +++++++++++++++++++++++-----------------------
 1 file changed, 31 insertions(+), 31 deletions(-)

--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -37,24 +37,42 @@
 void *hv_hypercall_pg;
 
 #ifdef CONFIG_X86_64
+static u64 __hv_hyperfail(u64 control, u64 param1, u64 param2)
+{
+	return U64_MAX;
+}
+
+DEFINE_STATIC_CALL(__hv_hypercall, __hv_hyperfail);
+
 u64 hv_pg_hypercall(u64 control, u64 param1, u64 param2)
 {
 	u64 hv_status;
 
-	if (!hv_hypercall_pg)
-		return U64_MAX;
-
 	register u64 __r8 asm("r8") = param2;
-	asm volatile (CALL_NOSPEC
+	asm volatile ("call " STATIC_CALL_TRAMP_STR(__hv_hypercall)
 		      : "=a" (hv_status), ASM_CALL_CONSTRAINT,
 		        "+c" (control), "+d" (param1)
-		      : "r" (__r8),
-		        THUNK_TARGET(hv_hypercall_pg)
+		      : "r" (__r8)
 		      : "cc", "memory", "r9", "r10", "r11");
 
 	return hv_status;
 }
+
+typedef u64 (*hv_hypercall_f)(u64 control, u64 param1, u64 param2);
+
+static inline void hv_set_hypercall_pg(void *ptr)
+{
+	hv_hypercall_pg = ptr;
+
+	if (!ptr)
+		ptr = &__hv_hyperfail;
+	static_call_update(__hv_hypercall, (hv_hypercall_f)ptr);
+}
 #else
+static inline void hv_set_hypercall_pg(void *ptr)
+{
+	hv_hypercall_pg = ptr;
+}
 EXPORT_SYMBOL_GPL(hv_hypercall_pg);
 #endif
 
@@ -349,7 +367,7 @@ static int hv_suspend(void)
 	 * pointer is restored on resume.
 	 */
 	hv_hypercall_pg_saved = hv_hypercall_pg;
-	hv_hypercall_pg = NULL;
+	hv_set_hypercall_pg(NULL);
 
 	/* Disable the hypercall page in the hypervisor */
 	rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
@@ -375,7 +393,7 @@ static void hv_resume(void)
 		vmalloc_to_pfn(hv_hypercall_pg_saved);
 	wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
 
-	hv_hypercall_pg = hv_hypercall_pg_saved;
+	hv_set_hypercall_pg(hv_hypercall_pg_saved);
 	hv_hypercall_pg_saved = NULL;
 
 	/*
@@ -529,8 +547,8 @@ void __init hyperv_init(void)
 	if (hv_isolation_type_tdx() && !ms_hyperv.paravisor_present)
 		goto skip_hypercall_pg_init;
 
-	hv_hypercall_pg = __vmalloc_node_range(PAGE_SIZE, 1, VMALLOC_START,
-			VMALLOC_END, GFP_KERNEL, PAGE_KERNEL_ROX,
+	hv_hypercall_pg = __vmalloc_node_range(PAGE_SIZE, 1, MODULES_VADDR,
+			MODULES_END, GFP_KERNEL, PAGE_KERNEL_ROX,
 			VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,
 			__builtin_return_address(0));
 	if (hv_hypercall_pg == NULL)
@@ -568,27 +586,9 @@ void __init hyperv_init(void)
 		wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
 	}
 
-skip_hypercall_pg_init:
-	/*
-	 * Some versions of Hyper-V that provide IBT in guest VMs have a bug
-	 * in that there's no ENDBR64 instruction at the entry to the
-	 * hypercall page. Because hypercalls are invoked via an indirect call
-	 * to the hypercall page, all hypercall attempts fail when IBT is
-	 * enabled, and Linux panics. For such buggy versions, disable IBT.
-	 *
-	 * Fixed versions of Hyper-V always provide ENDBR64 on the hypercall
-	 * page, so if future Linux kernel versions enable IBT for 32-bit
-	 * builds, additional hypercall page hackery will be required here
-	 * to provide an ENDBR32.
-	 */
-#ifdef CONFIG_X86_KERNEL_IBT
-	if (cpu_feature_enabled(X86_FEATURE_IBT) &&
-	    *(u32 *)hv_hypercall_pg != gen_endbr()) {
-		setup_clear_cpu_cap(X86_FEATURE_IBT);
-		pr_warn("Disabling IBT because of Hyper-V bug\n");
-	}
-#endif
+	hv_set_hypercall_pg(hv_hypercall_pg);
 
+skip_hypercall_pg_init:
 	/*
 	 * hyperv_init() is called before LAPIC is initialized: see
 	 * apic_intr_mode_init() -> x86_platform.apic_post_init() and
@@ -658,7 +658,7 @@ void hyperv_cleanup(void)
 	 * let hypercall operations fail safely rather than
 	 * panic the kernel for using invalid hypercall page
 	 */
-	hv_hypercall_pg = NULL;
+	hv_set_hypercall_pg(NULL);
 
 	/* Reset the hypercall page */
 	hypercall_msr.as_uint64 = hv_get_msr(HV_X64_MSR_HYPERCALL);



^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [PATCH 5/6] x86_64,hyperv: Use direct call to hypercall-page
  2025-04-14 11:11 ` [PATCH 5/6] x86_64,hyperv: Use direct call to hypercall-page Peter Zijlstra
@ 2025-04-21 18:28   ` Michael Kelley
  2025-04-25 14:03     ` Peter Zijlstra
  0 siblings, 1 reply; 32+ messages in thread
From: Michael Kelley @ 2025-04-21 18:28 UTC (permalink / raw)
  To: Peter Zijlstra, x86@kernel.org
  Cc: kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org,
	decui@microsoft.com, tglx@linutronix.de, mingo@redhat.com,
	bp@alien8.de, dave.hansen@linux.intel.com, hpa@zytor.com,
	jpoimboe@kernel.org, pawan.kumar.gupta@linux.intel.com,
	seanjc@google.com, pbonzini@redhat.com, ardb@kernel.org,
	kees@kernel.org, Arnd Bergmann, gregkh@linuxfoundation.org,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	kvm@vger.kernel.org, linux-efi@vger.kernel.org,
	samitolvanen@google.com, ojeda@kernel.org

From: Peter Zijlstra <peterz@infradead.org> Sent: Monday, April 14, 2025 4:12 AM
> 
> Instead of using an indirect call to the hypercall page, use a direct
> call instead. This avoids all CFI problems, including the one where
> the hypercall page doesn't have IBT on.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  arch/x86/hyperv/hv_init.c |   62 +++++++++++++++++++++++-----------------------
>  1 file changed, 31 insertions(+), 31 deletions(-)
> 
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -37,24 +37,42 @@
>  void *hv_hypercall_pg;
> 
>  #ifdef CONFIG_X86_64
> +static u64 __hv_hyperfail(u64 control, u64 param1, u64 param2)
> +{
> +	return U64_MAX;
> +}
> +
> +DEFINE_STATIC_CALL(__hv_hypercall, __hv_hyperfail);
> +
>  u64 hv_pg_hypercall(u64 control, u64 param1, u64 param2)
>  {
>  	u64 hv_status;
> 
> -	if (!hv_hypercall_pg)
> -		return U64_MAX;
> -
>  	register u64 __r8 asm("r8") = param2;
> -	asm volatile (CALL_NOSPEC
> +	asm volatile ("call " STATIC_CALL_TRAMP_STR(__hv_hypercall)
>  		      : "=a" (hv_status), ASM_CALL_CONSTRAINT,
>  		        "+c" (control), "+d" (param1)
> -		      : "r" (__r8),
> -		        THUNK_TARGET(hv_hypercall_pg)
> +		      : "r" (__r8)
>  		      : "cc", "memory", "r9", "r10", "r11");
> 
>  	return hv_status;
>  }
> +
> +typedef u64 (*hv_hypercall_f)(u64 control, u64 param1, u64 param2);
> +
> +static inline void hv_set_hypercall_pg(void *ptr)
> +{
> +	hv_hypercall_pg = ptr;
> +
> +	if (!ptr)
> +		ptr = &__hv_hyperfail;
> +	static_call_update(__hv_hypercall, (hv_hypercall_f)ptr);
> +}
>  #else
> +static inline void hv_set_hypercall_pg(void *ptr)
> +{
> +	hv_hypercall_pg = ptr;
> +}
>  EXPORT_SYMBOL_GPL(hv_hypercall_pg);
>  #endif
> 
> @@ -349,7 +367,7 @@ static int hv_suspend(void)
>  	 * pointer is restored on resume.
>  	 */
>  	hv_hypercall_pg_saved = hv_hypercall_pg;
> -	hv_hypercall_pg = NULL;
> +	hv_set_hypercall_pg(NULL);
> 
>  	/* Disable the hypercall page in the hypervisor */
>  	rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
> @@ -375,7 +393,7 @@ static void hv_resume(void)
>  		vmalloc_to_pfn(hv_hypercall_pg_saved);
>  	wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
> 
> -	hv_hypercall_pg = hv_hypercall_pg_saved;
> +	hv_set_hypercall_pg(hv_hypercall_pg_saved);
>  	hv_hypercall_pg_saved = NULL;
> 
>  	/*
> @@ -529,8 +547,8 @@ void __init hyperv_init(void)
>  	if (hv_isolation_type_tdx() && !ms_hyperv.paravisor_present)
>  		goto skip_hypercall_pg_init;
> 
> -	hv_hypercall_pg = __vmalloc_node_range(PAGE_SIZE, 1, VMALLOC_START,
> -			VMALLOC_END, GFP_KERNEL, PAGE_KERNEL_ROX,
> +	hv_hypercall_pg = __vmalloc_node_range(PAGE_SIZE, 1, MODULES_VADDR,
> +			MODULES_END, GFP_KERNEL, PAGE_KERNEL_ROX,
>  			VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,
>  			__builtin_return_address(0));
>  	if (hv_hypercall_pg == NULL)
> @@ -568,27 +586,9 @@ void __init hyperv_init(void)
>  		wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
>  	}
> 
> -skip_hypercall_pg_init:
> -	/*
> -	 * Some versions of Hyper-V that provide IBT in guest VMs have a bug
> -	 * in that there's no ENDBR64 instruction at the entry to the
> -	 * hypercall page. Because hypercalls are invoked via an indirect call
> -	 * to the hypercall page, all hypercall attempts fail when IBT is
> -	 * enabled, and Linux panics. For such buggy versions, disable IBT.
> -	 *
> -	 * Fixed versions of Hyper-V always provide ENDBR64 on the hypercall
> -	 * page, so if future Linux kernel versions enable IBT for 32-bit
> -	 * builds, additional hypercall page hackery will be required here
> -	 * to provide an ENDBR32.
> -	 */
> -#ifdef CONFIG_X86_KERNEL_IBT
> -	if (cpu_feature_enabled(X86_FEATURE_IBT) &&
> -	    *(u32 *)hv_hypercall_pg != gen_endbr()) {
> -		setup_clear_cpu_cap(X86_FEATURE_IBT);
> -		pr_warn("Disabling IBT because of Hyper-V bug\n");
> -	}
> -#endif

With this patch set, it's nice to see IBT working in a Hyper-V guest!
I had previously tested IBT with some hackery to the hypercall page
to add the missing ENDBR64, and didn't see any problems. Same
after these changes -- no complaints from IBT.

> +	hv_set_hypercall_pg(hv_hypercall_pg);
> 
> +skip_hypercall_pg_init:
>  	/*
>  	 * hyperv_init() is called before LAPIC is initialized: see
>  	 * apic_intr_mode_init() -> x86_platform.apic_post_init() and
> @@ -658,7 +658,7 @@ void hyperv_cleanup(void)
>  	 * let hypercall operations fail safely rather than
>  	 * panic the kernel for using invalid hypercall page
>  	 */
> -	hv_hypercall_pg = NULL;
> +	hv_set_hypercall_pg(NULL);

This causes a hang getting into the kdump kernel after a panic.
hyperv_cleanup() is called after native_machine_crash_shutdown()
has done crash_smp_send_stop() on all the other CPUs. I don't know
the details of how static_call_update() works, but it's easy to imagine that
it wouldn't work when the kernel is in such a state.

The original code setting hv_hypercall_pg to NULL is just tidiness.
Other CPUs are stopped and can't be making hypercalls, and this CPU
shouldn't be making hypercalls either, so setting it to NULL more
cleanly catches some erroneous hypercall (vs. accessing the hypercall
page after Hyper-V has been told to reset it).

If I remove this change and let hv_hypercall_pg be set to NULL without
doing static_call_update(), then we correctly get to the kdump kernel
and everything works.

Michael

> 
>  	/* Reset the hypercall page */
>  	hypercall_msr.as_uint64 = hv_get_msr(HV_X64_MSR_HYPERCALL);
> 
> 


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 5/6] x86_64,hyperv: Use direct call to hypercall-page
  2025-04-21 18:28   ` Michael Kelley
@ 2025-04-25 14:03     ` Peter Zijlstra
  2025-04-25 14:32       ` Michael Kelley
  0 siblings, 1 reply; 32+ messages in thread
From: Peter Zijlstra @ 2025-04-25 14:03 UTC (permalink / raw)
  To: Michael Kelley
  Cc: x86@kernel.org, kys@microsoft.com, haiyangz@microsoft.com,
	wei.liu@kernel.org, decui@microsoft.com, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
	hpa@zytor.com, jpoimboe@kernel.org,
	pawan.kumar.gupta@linux.intel.com, seanjc@google.com,
	pbonzini@redhat.com, ardb@kernel.org, kees@kernel.org,
	Arnd Bergmann, gregkh@linuxfoundation.org,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	kvm@vger.kernel.org, linux-efi@vger.kernel.org,
	samitolvanen@google.com, ojeda@kernel.org

On Mon, Apr 21, 2025 at 06:28:42PM +0000, Michael Kelley wrote:

> >  #ifdef CONFIG_X86_64
> > +static u64 __hv_hyperfail(u64 control, u64 param1, u64 param2)
> > +{
> > +	return U64_MAX;
> > +}
> > +
> > +DEFINE_STATIC_CALL(__hv_hypercall, __hv_hyperfail);
> > +
> >  u64 hv_pg_hypercall(u64 control, u64 param1, u64 param2)
> >  {
> >  	u64 hv_status;
> > 
> > +	asm volatile ("call " STATIC_CALL_TRAMP_STR(__hv_hypercall)
> >  		      : "=a" (hv_status), ASM_CALL_CONSTRAINT,
> >  		        "+c" (control), "+d" (param1)
> > +		      : "r" (__r8)
> >  		      : "cc", "memory", "r9", "r10", "r11");
> > 
> >  	return hv_status;
> >  }
> > +
> > +typedef u64 (*hv_hypercall_f)(u64 control, u64 param1, u64 param2);
> > +
> > +static inline void hv_set_hypercall_pg(void *ptr)
> > +{
> > +	hv_hypercall_pg = ptr;
> > +
> > +	if (!ptr)
> > +		ptr = &__hv_hyperfail;
> > +	static_call_update(__hv_hypercall, (hv_hypercall_f)ptr);
> > +}

^ kept for reference, as I try and explain how static_call() works
below.

> > -skip_hypercall_pg_init:
> > -	/*
> > -	 * Some versions of Hyper-V that provide IBT in guest VMs have a bug
> > -	 * in that there's no ENDBR64 instruction at the entry to the
> > -	 * hypercall page. Because hypercalls are invoked via an indirect call
> > -	 * to the hypercall page, all hypercall attempts fail when IBT is
> > -	 * enabled, and Linux panics. For such buggy versions, disable IBT.
> > -	 *
> > -	 * Fixed versions of Hyper-V always provide ENDBR64 on the hypercall
> > -	 * page, so if future Linux kernel versions enable IBT for 32-bit
> > -	 * builds, additional hypercall page hackery will be required here
> > -	 * to provide an ENDBR32.
> > -	 */
> > -#ifdef CONFIG_X86_KERNEL_IBT
> > -	if (cpu_feature_enabled(X86_FEATURE_IBT) &&
> > -	    *(u32 *)hv_hypercall_pg != gen_endbr()) {
> > -		setup_clear_cpu_cap(X86_FEATURE_IBT);
> > -		pr_warn("Disabling IBT because of Hyper-V bug\n");
> > -	}
> > -#endif
> 
> With this patch set, it's nice to see IBT working in a Hyper-V guest!
> I had previously tested IBT with some hackery to the hypercall page
> to add the missing ENDBR64, and didn't see any problems. Same
> after these changes -- no complaints from IBT.

No indirect calls left, no IBT complaints ;-)

> > +	hv_set_hypercall_pg(hv_hypercall_pg);
> > 
> > +skip_hypercall_pg_init:
> >  	/*
> >  	 * hyperv_init() is called before LAPIC is initialized: see
> >  	 * apic_intr_mode_init() -> x86_platform.apic_post_init() and
> > @@ -658,7 +658,7 @@ void hyperv_cleanup(void)
> >  	 * let hypercall operations fail safely rather than
> >  	 * panic the kernel for using invalid hypercall page
> >  	 */
> > -	hv_hypercall_pg = NULL;
> > +	hv_set_hypercall_pg(NULL);
> 
> This causes a hang getting into the kdump kernel after a panic.
> hyperv_cleanup() is called after native_machine_crash_shutdown()
> has done crash_smp_send_stop() on all the other CPUs. I don't know
> the details of how static_call_update() works, 

Right, so let me try and explain this :-)

So we get the compiler to emit direct calls (CALL/JMP) to symbols
prefixed with "__SCT__", in this case from asm, but more usually by
means of the static_call() macro mess.

Meanwhile DEFINE_STATIC_CALL() ensures such a symbol actually exists.
This symbol is a little trampoline that redirects to the actual
target function given to DEFINE_STATIC_CALL() -- __hv_hyperfail() in the
above case.

Then objtool runs through the resulting object file and stores the
location of every call to these __STC__ prefixed symbols in a custom
section.

This enables static_call init (boot time) to go through the section and
rewrite all the trampoline calls to direct calls to the target.
Subsequent static_call_update() calls will again rewrite the direct call
to point elsewhere.

So very much how static_branch() does a NOP/JMP rewrite to toggle
branches, static_call() rewrites (direct) call targets.

> but it's easy to imagine that
> it wouldn't work when the kernel is in such a state.
> 
> The original code setting hv_hypercall_pg to NULL is just tidiness.
> Other CPUs are stopped and can't be making hypercalls, and this CPU
> shouldn't be making hypercalls either, so setting it to NULL more
> cleanly catches some erroneous hypercall (vs. accessing the hypercall
> page after Hyper-V has been told to reset it).

So if you look at (retained above) hv_set_hypercall_pg(), when given
NULL, the call target is set to __hv_hyperfail(), which does an
unconditional U64_MAX return.

Combined with the fact that the thing *should* not be doing hypercalls
anymore at this point, something is iffy.

I can easily remove it, but it *should* be equivalent to before, where
it dynamicall checked for hv_hypercall_pg being NULL.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [PATCH 5/6] x86_64,hyperv: Use direct call to hypercall-page
  2025-04-25 14:03     ` Peter Zijlstra
@ 2025-04-25 14:32       ` Michael Kelley
  2025-04-27  3:58         ` Michael Kelley
  0 siblings, 1 reply; 32+ messages in thread
From: Michael Kelley @ 2025-04-25 14:32 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86@kernel.org, kys@microsoft.com, haiyangz@microsoft.com,
	wei.liu@kernel.org, decui@microsoft.com, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
	hpa@zytor.com, jpoimboe@kernel.org,
	pawan.kumar.gupta@linux.intel.com, seanjc@google.com,
	pbonzini@redhat.com, ardb@kernel.org, kees@kernel.org,
	Arnd Bergmann, gregkh@linuxfoundation.org,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	kvm@vger.kernel.org, linux-efi@vger.kernel.org,
	samitolvanen@google.com, ojeda@kernel.org

From: Peter Zijlstra <peterz@infradead.org> Sent: Friday, April 25, 2025 7:04 AM
> 
> On Mon, Apr 21, 2025 at 06:28:42PM +0000, Michael Kelley wrote:
> 
> > >  #ifdef CONFIG_X86_64
> > > +static u64 __hv_hyperfail(u64 control, u64 param1, u64 param2)
> > > +{
> > > +	return U64_MAX;
> > > +}
> > > +
> > > +DEFINE_STATIC_CALL(__hv_hypercall, __hv_hyperfail);
> > > +
> > >  u64 hv_pg_hypercall(u64 control, u64 param1, u64 param2)
> > >  {
> > >  	u64 hv_status;
> > >
> > > +	asm volatile ("call " STATIC_CALL_TRAMP_STR(__hv_hypercall)
> > >  		      : "=a" (hv_status), ASM_CALL_CONSTRAINT,
> > >  		        "+c" (control), "+d" (param1)
> > > +		      : "r" (__r8)
> > >  		      : "cc", "memory", "r9", "r10", "r11");
> > >
> > >  	return hv_status;
> > >  }
> > > +
> > > +typedef u64 (*hv_hypercall_f)(u64 control, u64 param1, u64 param2);
> > > +
> > > +static inline void hv_set_hypercall_pg(void *ptr)
> > > +{
> > > +	hv_hypercall_pg = ptr;
> > > +
> > > +	if (!ptr)
> > > +		ptr = &__hv_hyperfail;
> > > +	static_call_update(__hv_hypercall, (hv_hypercall_f)ptr);
> > > +}
> 
> ^ kept for reference, as I try and explain how static_call() works
> below.
> 
> > > -skip_hypercall_pg_init:
> > > -	/*
> > > -	 * Some versions of Hyper-V that provide IBT in guest VMs have a bug
> > > -	 * in that there's no ENDBR64 instruction at the entry to the
> > > -	 * hypercall page. Because hypercalls are invoked via an indirect call
> > > -	 * to the hypercall page, all hypercall attempts fail when IBT is
> > > -	 * enabled, and Linux panics. For such buggy versions, disable IBT.
> > > -	 *
> > > -	 * Fixed versions of Hyper-V always provide ENDBR64 on the hypercall
> > > -	 * page, so if future Linux kernel versions enable IBT for 32-bit
> > > -	 * builds, additional hypercall page hackery will be required here
> > > -	 * to provide an ENDBR32.
> > > -	 */
> > > -#ifdef CONFIG_X86_KERNEL_IBT
> > > -	if (cpu_feature_enabled(X86_FEATURE_IBT) &&
> > > -	    *(u32 *)hv_hypercall_pg != gen_endbr()) {
> > > -		setup_clear_cpu_cap(X86_FEATURE_IBT);
> > > -		pr_warn("Disabling IBT because of Hyper-V bug\n");
> > > -	}
> > > -#endif
> >
> > With this patch set, it's nice to see IBT working in a Hyper-V guest!
> > I had previously tested IBT with some hackery to the hypercall page
> > to add the missing ENDBR64, and didn't see any problems. Same
> > after these changes -- no complaints from IBT.
> 
> No indirect calls left, no IBT complaints ;-)
> 
> > > +	hv_set_hypercall_pg(hv_hypercall_pg);
> > >
> > > +skip_hypercall_pg_init:
> > >  	/*
> > >  	 * hyperv_init() is called before LAPIC is initialized: see
> > >  	 * apic_intr_mode_init() -> x86_platform.apic_post_init() and
> > > @@ -658,7 +658,7 @@ void hyperv_cleanup(void)
> > >  	 * let hypercall operations fail safely rather than
> > >  	 * panic the kernel for using invalid hypercall page
> > >  	 */
> > > -	hv_hypercall_pg = NULL;
> > > +	hv_set_hypercall_pg(NULL);
> >
> > This causes a hang getting into the kdump kernel after a panic.
> > hyperv_cleanup() is called after native_machine_crash_shutdown()
> > has done crash_smp_send_stop() on all the other CPUs. I don't know
> > the details of how static_call_update() works,
> 
> Right, so let me try and explain this :-)
> 
> So we get the compiler to emit direct calls (CALL/JMP) to symbols
> prefixed with "__SCT__", in this case from asm, but more usually by
> means of the static_call() macro mess.
> 
> Meanwhile DEFINE_STATIC_CALL() ensures such a symbol actually exists.
> This symbol is a little trampoline that redirects to the actual
> target function given to DEFINE_STATIC_CALL() -- __hv_hyperfail() in the
> above case.
> 
> Then objtool runs through the resulting object file and stores the
> location of every call to these __STC__ prefixed symbols in a custom
> section.
> 
> This enables static_call init (boot time) to go through the section and
> rewrite all the trampoline calls to direct calls to the target.
> Subsequent static_call_update() calls will again rewrite the direct call
> to point elsewhere.
> 
> So very much how static_branch() does a NOP/JMP rewrite to toggle
> branches, static_call() rewrites (direct) call targets.
> 
> > but it's easy to imagine that
> > it wouldn't work when the kernel is in such a state.
> >
> > The original code setting hv_hypercall_pg to NULL is just tidiness.
> > Other CPUs are stopped and can't be making hypercalls, and this CPU
> > shouldn't be making hypercalls either, so setting it to NULL more
> > cleanly catches some erroneous hypercall (vs. accessing the hypercall
> > page after Hyper-V has been told to reset it).
> 
> So if you look at (retained above) hv_set_hypercall_pg(), when given
> NULL, the call target is set to __hv_hyperfail(), which does an
> unconditional U64_MAX return.
> 
> Combined with the fact that the thing *should* not be doing hypercalls
> anymore at this point, something is iffy.
> 
> I can easily remove it, but it *should* be equivalent to before, where
> it dynamicall checked for hv_hypercall_pg being NULL.

I agree that setting the call target to __hv_hyperfail() should be good.
But my theory is that static_call_update() is hanging when trying to
do the rewrite, because of the state of the other CPUs. I don't think
control is ever returning from static_call_update() when invoked
through hyperv_cleanup(). Wouldn't static_call_update() need to park
the other CPUs temporarily and/or flush instruction caches to make
everything consistent?

But that's just my theory. I'll run a few more experiments to confirm
if control ever returns from static_call_update() in this case.

Michael

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [PATCH 5/6] x86_64,hyperv: Use direct call to hypercall-page
  2025-04-25 14:32       ` Michael Kelley
@ 2025-04-27  3:58         ` Michael Kelley
  2025-04-29 15:19           ` Peter Zijlstra
  0 siblings, 1 reply; 32+ messages in thread
From: Michael Kelley @ 2025-04-27  3:58 UTC (permalink / raw)
  To: Michael Kelley, Peter Zijlstra
  Cc: x86@kernel.org, kys@microsoft.com, haiyangz@microsoft.com,
	wei.liu@kernel.org, decui@microsoft.com, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
	hpa@zytor.com, jpoimboe@kernel.org,
	pawan.kumar.gupta@linux.intel.com, seanjc@google.com,
	pbonzini@redhat.com, ardb@kernel.org, kees@kernel.org,
	Arnd Bergmann, gregkh@linuxfoundation.org,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	kvm@vger.kernel.org, linux-efi@vger.kernel.org,
	samitolvanen@google.com, ojeda@kernel.org

From: Michael Kelley <mhklinux@outlook.com> Sent: Friday, April 25, 2025 7:32 AM
> 
> From: Peter Zijlstra <peterz@infradead.org> Sent: Friday, April 25, 2025 7:04 AM
> >
> > On Mon, Apr 21, 2025 at 06:28:42PM +0000, Michael Kelley wrote:
> >
> > > >  #ifdef CONFIG_X86_64
> > > > +static u64 __hv_hyperfail(u64 control, u64 param1, u64 param2)
> > > > +{
> > > > +	return U64_MAX;
> > > > +}
> > > > +
> > > > +DEFINE_STATIC_CALL(__hv_hypercall, __hv_hyperfail);
> > > > +
> > > >  u64 hv_pg_hypercall(u64 control, u64 param1, u64 param2)
> > > >  {
> > > >  	u64 hv_status;
> > > >
> > > > +	asm volatile ("call " STATIC_CALL_TRAMP_STR(__hv_hypercall)
> > > >  		      : "=a" (hv_status), ASM_CALL_CONSTRAINT,
> > > >  		        "+c" (control), "+d" (param1)
> > > > +		      : "r" (__r8)
> > > >  		      : "cc", "memory", "r9", "r10", "r11");
> > > >
> > > >  	return hv_status;
> > > >  }
> > > > +
> > > > +typedef u64 (*hv_hypercall_f)(u64 control, u64 param1, u64 param2);
> > > > +
> > > > +static inline void hv_set_hypercall_pg(void *ptr)
> > > > +{
> > > > +	hv_hypercall_pg = ptr;
> > > > +
> > > > +	if (!ptr)
> > > > +		ptr = &__hv_hyperfail;
> > > > +	static_call_update(__hv_hypercall, (hv_hypercall_f)ptr);
> > > > +}
> >
> > ^ kept for reference, as I try and explain how static_call() works
> > below.
> >
> > > > -skip_hypercall_pg_init:
> > > > -	/*
> > > > -	 * Some versions of Hyper-V that provide IBT in guest VMs have a bug
> > > > -	 * in that there's no ENDBR64 instruction at the entry to the
> > > > -	 * hypercall page. Because hypercalls are invoked via an indirect call
> > > > -	 * to the hypercall page, all hypercall attempts fail when IBT is
> > > > -	 * enabled, and Linux panics. For such buggy versions, disable IBT.
> > > > -	 *
> > > > -	 * Fixed versions of Hyper-V always provide ENDBR64 on the hypercall
> > > > -	 * page, so if future Linux kernel versions enable IBT for 32-bit
> > > > -	 * builds, additional hypercall page hackery will be required here
> > > > -	 * to provide an ENDBR32.
> > > > -	 */
> > > > -#ifdef CONFIG_X86_KERNEL_IBT
> > > > -	if (cpu_feature_enabled(X86_FEATURE_IBT) &&
> > > > -	    *(u32 *)hv_hypercall_pg != gen_endbr()) {
> > > > -		setup_clear_cpu_cap(X86_FEATURE_IBT);
> > > > -		pr_warn("Disabling IBT because of Hyper-V bug\n");
> > > > -	}
> > > > -#endif
> > >
> > > With this patch set, it's nice to see IBT working in a Hyper-V guest!
> > > I had previously tested IBT with some hackery to the hypercall page
> > > to add the missing ENDBR64, and didn't see any problems. Same
> > > after these changes -- no complaints from IBT.
> >
> > No indirect calls left, no IBT complaints ;-)
> >
> > > > +	hv_set_hypercall_pg(hv_hypercall_pg);
> > > >
> > > > +skip_hypercall_pg_init:
> > > >  	/*
> > > >  	 * hyperv_init() is called before LAPIC is initialized: see
> > > >  	 * apic_intr_mode_init() -> x86_platform.apic_post_init() and
> > > > @@ -658,7 +658,7 @@ void hyperv_cleanup(void)
> > > >  	 * let hypercall operations fail safely rather than
> > > >  	 * panic the kernel for using invalid hypercall page
> > > >  	 */
> > > > -	hv_hypercall_pg = NULL;
> > > > +	hv_set_hypercall_pg(NULL);
> > >
> > > This causes a hang getting into the kdump kernel after a panic.
> > > hyperv_cleanup() is called after native_machine_crash_shutdown()
> > > has done crash_smp_send_stop() on all the other CPUs. I don't know
> > > the details of how static_call_update() works,
> >
> > Right, so let me try and explain this :-)
> >
> > So we get the compiler to emit direct calls (CALL/JMP) to symbols
> > prefixed with "__SCT__", in this case from asm, but more usually by
> > means of the static_call() macro mess.
> >
> > Meanwhile DEFINE_STATIC_CALL() ensures such a symbol actually exists.
> > This symbol is a little trampoline that redirects to the actual
> > target function given to DEFINE_STATIC_CALL() -- __hv_hyperfail() in the
> > above case.
> >
> > Then objtool runs through the resulting object file and stores the
> > location of every call to these __STC__ prefixed symbols in a custom
> > section.
> >
> > This enables static_call init (boot time) to go through the section and
> > rewrite all the trampoline calls to direct calls to the target.
> > Subsequent static_call_update() calls will again rewrite the direct call
> > to point elsewhere.
> >
> > So very much how static_branch() does a NOP/JMP rewrite to toggle
> > branches, static_call() rewrites (direct) call targets.
> >
> > > but it's easy to imagine that
> > > it wouldn't work when the kernel is in such a state.
> > >
> > > The original code setting hv_hypercall_pg to NULL is just tidiness.
> > > Other CPUs are stopped and can't be making hypercalls, and this CPU
> > > shouldn't be making hypercalls either, so setting it to NULL more
> > > cleanly catches some erroneous hypercall (vs. accessing the hypercall
> > > page after Hyper-V has been told to reset it).
> >
> > So if you look at (retained above) hv_set_hypercall_pg(), when given
> > NULL, the call target is set to __hv_hyperfail(), which does an
> > unconditional U64_MAX return.
> >
> > Combined with the fact that the thing *should* not be doing hypercalls
> > anymore at this point, something is iffy.
> >
> > I can easily remove it, but it *should* be equivalent to before, where
> > it dynamicall checked for hv_hypercall_pg being NULL.
> 
> I agree that setting the call target to __hv_hyperfail() should be good.
> But my theory is that static_call_update() is hanging when trying to
> do the rewrite, because of the state of the other CPUs. I don't think
> control is ever returning from static_call_update() when invoked
> through hyperv_cleanup(). Wouldn't static_call_update() need to park
> the other CPUs temporarily and/or flush instruction caches to make
> everything consistent?
> 
> But that's just my theory. I'll run a few more experiments to confirm
> if control ever returns from static_call_update() in this case.
> 

Indeed, control never returns from static_call_update(). Prior to
hyperv_cleanup() running, crash_smp_send_stop() has been called to
stop all the other CPUs, and it does not update cpu_online_mask to
reflect the other CPUs being stopped.

static_call_update() runs this call sequence:

arch_static_call_transform()
__static_call_transform()
smp_text_poke_single()
smp_text_poke_batch_finish()
smp_text_poke_sync_each_cpu()

smp_text_poke_sync_each_cpu() sends an IPI to each CPU in
cpu_online_mask, and of course the other CPUs never respond, so
it waits forever.

Michael

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 5/6] x86_64,hyperv: Use direct call to hypercall-page
  2025-04-27  3:58         ` Michael Kelley
@ 2025-04-29 15:19           ` Peter Zijlstra
  0 siblings, 0 replies; 32+ messages in thread
From: Peter Zijlstra @ 2025-04-29 15:19 UTC (permalink / raw)
  To: Michael Kelley
  Cc: x86@kernel.org, kys@microsoft.com, haiyangz@microsoft.com,
	wei.liu@kernel.org, decui@microsoft.com, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
	hpa@zytor.com, jpoimboe@kernel.org,
	pawan.kumar.gupta@linux.intel.com, seanjc@google.com,
	pbonzini@redhat.com, ardb@kernel.org, kees@kernel.org,
	Arnd Bergmann, gregkh@linuxfoundation.org,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	kvm@vger.kernel.org, linux-efi@vger.kernel.org,
	samitolvanen@google.com, ojeda@kernel.org

On Sun, Apr 27, 2025 at 03:58:54AM +0000, Michael Kelley wrote:

> Indeed, control never returns from static_call_update(). Prior to
> hyperv_cleanup() running, crash_smp_send_stop() has been called to
> stop all the other CPUs, and it does not update cpu_online_mask to
> reflect the other CPUs being stopped.
> 
> static_call_update() runs this call sequence:
> 
> arch_static_call_transform()
> __static_call_transform()
> smp_text_poke_single()
> smp_text_poke_batch_finish()
> smp_text_poke_sync_each_cpu()
> 
> smp_text_poke_sync_each_cpu() sends an IPI to each CPU in
> cpu_online_mask, and of course the other CPUs never respond, so
> it waits forever.

Fair enough. I'll just remove this final call. Thanks!

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 6/6] objtool: Validate kCFI calls
  2025-04-14 11:11 [PATCH 0/6] objtool: Detect and warn about indirect calls in __nocfi functions Peter Zijlstra
                   ` (4 preceding siblings ...)
  2025-04-14 11:11 ` [PATCH 5/6] x86_64,hyperv: Use direct call to hypercall-page Peter Zijlstra
@ 2025-04-14 11:11 ` Peter Zijlstra
  2025-04-14 23:43   ` Josh Poimboeuf
  5 siblings, 1 reply; 32+ messages in thread
From: Peter Zijlstra @ 2025-04-14 11:11 UTC (permalink / raw)
  To: x86
  Cc: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, hpa,
	peterz, jpoimboe, pawan.kumar.gupta, seanjc, pbonzini, ardb, kees,
	Arnd Bergmann, gregkh, linux-hyperv, linux-kernel, kvm, linux-efi,
	samitolvanen, ojeda

Validate that all indirect calls adhere to kCFI rules. Notably doing
nocfi indirect call to a cfi function is broken.

Apparently some Rust 'core' code violates this and explodes when ran
with FineIBT.

All the ANNOTATE_NOCFI sites are prime targets for attackers.

 - runtime EFI is especially henous because it also needs to disable
   IBT. Basically calling unknown code without CFI protection at
   runtime is a massice security issue.

 - Kexec image handover; if you can exploit this, you get to keep it :-)

 - KVM, once for the interrupt injection calling IDT gates directly.

 - KVM, once for the FASTOP emulation stuff.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/kernel/machine_kexec_64.c  |    1 
 arch/x86/kvm/emulate.c              |    4 +++
 arch/x86/kvm/vmx/vmenter.S          |    4 +++
 arch/x86/platform/efi/efi_stub_64.S |    4 +++
 drivers/misc/lkdtm/perms.c          |    2 +
 include/linux/objtool.h             |    3 ++
 include/linux/objtool_types.h       |    1 
 tools/include/linux/objtool_types.h |    1 
 tools/objtool/check.c               |   41 ++++++++++++++++++++++++++++++++++++
 tools/objtool/include/objtool/elf.h |    1 
 10 files changed, 62 insertions(+)

--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -442,6 +442,7 @@ void __nocfi machine_kexec(struct kimage
 
 	__ftrace_enabled_restore(save_ftrace_enabled);
 }
+ANNOTATE_NOCFI_SYM(machine_kexec);
 
 /* arch-dependent functionality related to kexec file-based syscall */
 
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -5071,6 +5071,10 @@ static noinline int fastop(struct x86_em
 		return emulate_de(ctxt);
 	return X86EMUL_CONTINUE;
 }
+/*
+ * The ASM stubs don't have CFI on.
+ */
+ANNOTATE_NOCFI_SYM(fastop);
 
 void init_decode_cache(struct x86_emulate_ctxt *ctxt)
 {
--- a/arch/x86/kvm/vmx/vmenter.S
+++ b/arch/x86/kvm/vmx/vmenter.S
@@ -363,5 +363,9 @@ SYM_FUNC_END(vmread_error_trampoline)
 .section .text, "ax"
 
 SYM_FUNC_START(vmx_do_interrupt_irqoff)
+	/*
+	 * Calling an IDT gate directly.
+	 */
+	ANNOTATE_NOCFI
 	VMX_DO_EVENT_IRQOFF CALL_NOSPEC _ASM_ARG1
 SYM_FUNC_END(vmx_do_interrupt_irqoff)
--- a/arch/x86/platform/efi/efi_stub_64.S
+++ b/arch/x86/platform/efi/efi_stub_64.S
@@ -11,6 +11,10 @@
 #include <asm/nospec-branch.h>
 
 SYM_FUNC_START(__efi_call)
+	/*
+	 * The EFI code doesn't have any CFI :-(
+	 */
+	ANNOTATE_NOCFI
 	pushq %rbp
 	movq %rsp, %rbp
 	and $~0xf, %rsp
--- a/drivers/misc/lkdtm/perms.c
+++ b/drivers/misc/lkdtm/perms.c
@@ -9,6 +9,7 @@
 #include <linux/vmalloc.h>
 #include <linux/mman.h>
 #include <linux/uaccess.h>
+#include <linux/objtool.h>
 #include <asm/cacheflush.h>
 #include <asm/sections.h>
 
@@ -86,6 +87,7 @@ static noinline __nocfi void execute_loc
 	func();
 	pr_err("FAIL: func returned\n");
 }
+ANNOTATE_NOCFI_SYM(execute_location);
 
 static void execute_user_location(void *dst)
 {
--- a/include/linux/objtool.h
+++ b/include/linux/objtool.h
@@ -185,6 +185,8 @@
  */
 #define ANNOTATE_REACHABLE(label)	__ASM_ANNOTATE(label, ANNOTYPE_REACHABLE)
 
+#define ANNOTATE_NOCFI_SYM(sym)		asm(__ASM_ANNOTATE(sym, ANNOTYPE_NOCFI))
+
 #else
 #define ANNOTATE_NOENDBR		ANNOTATE type=ANNOTYPE_NOENDBR
 #define ANNOTATE_RETPOLINE_SAFE		ANNOTATE type=ANNOTYPE_RETPOLINE_SAFE
@@ -194,6 +196,7 @@
 #define ANNOTATE_INTRA_FUNCTION_CALL	ANNOTATE type=ANNOTYPE_INTRA_FUNCTION_CALL
 #define ANNOTATE_UNRET_BEGIN		ANNOTATE type=ANNOTYPE_UNRET_BEGIN
 #define ANNOTATE_REACHABLE		ANNOTATE type=ANNOTYPE_REACHABLE
+#define ANNOTATE_NOCFI			ANNOTATE type=ANNOTYPE_NOCFI
 #endif
 
 #if defined(CONFIG_NOINSTR_VALIDATION) && \
--- a/include/linux/objtool_types.h
+++ b/include/linux/objtool_types.h
@@ -66,5 +66,6 @@ struct unwind_hint {
 #define ANNOTYPE_INTRA_FUNCTION_CALL	7
 #define ANNOTYPE_REACHABLE		8
 #define ANNOTYPE_JUMP_TABLE		9
+#define ANNOTYPE_NOCFI			10
 
 #endif /* _LINUX_OBJTOOL_TYPES_H */
--- a/tools/include/linux/objtool_types.h
+++ b/tools/include/linux/objtool_types.h
@@ -66,5 +66,6 @@ struct unwind_hint {
 #define ANNOTYPE_INTRA_FUNCTION_CALL	7
 #define ANNOTYPE_REACHABLE		8
 #define ANNOTYPE_JUMP_TABLE		9
+#define ANNOTYPE_NOCFI			10
 
 #endif /* _LINUX_OBJTOOL_TYPES_H */
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -2387,6 +2387,8 @@ static int __annotate_ifc(struct objtool
 
 static int __annotate_late(struct objtool_file *file, int type, struct instruction *insn)
 {
+	struct symbol *sym;
+
 	switch (type) {
 	case ANNOTYPE_NOENDBR:
 		/* early */
@@ -2436,6 +2438,15 @@ static int __annotate_late(struct objtoo
 		insn->_jump_table = (void *)1;
 		break;
 
+	case ANNOTYPE_NOCFI:
+		sym = insn->sym;
+		if (!sym) {
+			WARN_INSN(insn, "dodgy NOCFI annotation");
+			break;
+		}
+		insn->sym->nocfi = 1;
+		break;
+
 	default:
 		ERROR_INSN(insn, "Unknown annotation type: %d", type);
 		return -1;
@@ -4006,6 +4017,36 @@ static int validate_retpoline(struct obj
 		warnings++;
 	}
 
+	if (!opts.cfi)
+		return warnings;
+
+	/*
+	 * kCFI call sites look like:
+	 *
+	 *     movl $(-0x12345678), %r10d
+	 *     addl -4(%r11), %r10d
+	 *     jz 1f
+	 *     ud2
+	 *  1: cs call __x86_indirect_thunk_r11
+	 *
+	 * Verify all indirect calls are kCFI adorned by checking for the
+	 * UD2. Notably, doing __nocfi calls to regular (cfi) functions is
+	 * broken.
+	 */
+	list_for_each_entry(insn, &file->retpoline_call_list, call_node) {
+		struct symbol *sym = insn->sym;
+
+		if (sym && sym->type == STT_FUNC && !sym->nocfi) {
+			struct instruction *prev =
+				prev_insn_same_sym(file, insn);
+
+			if (!prev || prev->type != INSN_BUG) {
+				WARN_INSN(insn, "no-cfi indirect call!");
+				warnings++;
+			}
+		}
+	}
+
 	return warnings;
 }
 
--- a/tools/objtool/include/objtool/elf.h
+++ b/tools/objtool/include/objtool/elf.h
@@ -70,6 +70,7 @@ struct symbol {
 	u8 local_label       : 1;
 	u8 frame_pointer     : 1;
 	u8 ignore	     : 1;
+	u8 nocfi             : 1;
 	struct list_head pv_target;
 	struct reloc *relocs;
 };



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 6/6] objtool: Validate kCFI calls
  2025-04-14 11:11 ` [PATCH 6/6] objtool: Validate kCFI calls Peter Zijlstra
@ 2025-04-14 23:43   ` Josh Poimboeuf
  2025-04-29 16:10     ` Peter Zijlstra
  2025-04-29 16:18     ` Peter Zijlstra
  0 siblings, 2 replies; 32+ messages in thread
From: Josh Poimboeuf @ 2025-04-14 23:43 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen,
	hpa, pawan.kumar.gupta, seanjc, pbonzini, ardb, kees,
	Arnd Bergmann, gregkh, linux-hyperv, linux-kernel, kvm, linux-efi,
	samitolvanen, ojeda

On Mon, Apr 14, 2025 at 01:11:46PM +0200, Peter Zijlstra wrote:
>  SYM_FUNC_START(__efi_call)
> +	/*
> +	 * The EFI code doesn't have any CFI :-(
> +	 */
> +	ANNOTATE_NOCFI
>  	pushq %rbp
>  	movq %rsp, %rbp
>  	and $~0xf, %rsp

This looks like an insn annotation but is actually a func annotation.
ANNOTATE_NOCFI_SYM() would be a lot clearer, for all the asm code.

Though maybe it's better for ANNOTATE_NOCFI to annotate the call site
itself (for asm) while ANNOTATE_NOCFI_SYM annotates the entire function
(in C).  So either there would be two separate annotypes or the
annotation would get interpreted based on whether it's at the beginning
of the function.

> +++ b/include/linux/objtool.h
> @@ -185,6 +185,8 @@
>   */
>  #define ANNOTATE_REACHABLE(label)	__ASM_ANNOTATE(label, ANNOTYPE_REACHABLE)
>  
> +#define ANNOTATE_NOCFI_SYM(sym)		asm(__ASM_ANNOTATE(sym, ANNOTYPE_NOCFI))

This needs a comment like the others.

> @@ -2436,6 +2438,15 @@ static int __annotate_late(struct objtoo
>  		insn->_jump_table = (void *)1;
>  		break;
>  
> +	case ANNOTYPE_NOCFI:
> +		sym = insn->sym;
> +		if (!sym) {
> +			WARN_INSN(insn, "dodgy NOCFI annotation");
> +			break;

Return an error?

> @@ -4006,6 +4017,36 @@ static int validate_retpoline(struct obj
> +	/*
> +	 * kCFI call sites look like:
> +	 *
> +	 *     movl $(-0x12345678), %r10d
> +	 *     addl -4(%r11), %r10d
> +	 *     jz 1f
> +	 *     ud2
> +	 *  1: cs call __x86_indirect_thunk_r11
> +	 *
> +	 * Verify all indirect calls are kCFI adorned by checking for the
> +	 * UD2. Notably, doing __nocfi calls to regular (cfi) functions is
> +	 * broken.

This "__nocfi calls" is confusing me.  IIUC, there are two completely
different meanings for "nocfi":

  - __nocfi: disable the kcfi function entry stuff
  
  - ANNOTATE_NOCFI_SYM: Regardless of whether the function is __nocfi,
    allow it to have non-CFI indirect call sites.

Can we call this ANNOTATE_NOCFI_SAFE or something?

-- 
Josh

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 6/6] objtool: Validate kCFI calls
  2025-04-14 23:43   ` Josh Poimboeuf
@ 2025-04-29 16:10     ` Peter Zijlstra
  2025-04-29 16:18     ` Peter Zijlstra
  1 sibling, 0 replies; 32+ messages in thread
From: Peter Zijlstra @ 2025-04-29 16:10 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: x86, kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen,
	hpa, pawan.kumar.gupta, seanjc, pbonzini, ardb, kees,
	Arnd Bergmann, gregkh, linux-hyperv, linux-kernel, kvm, linux-efi,
	samitolvanen, ojeda

On Mon, Apr 14, 2025 at 04:43:26PM -0700, Josh Poimboeuf wrote:

> > +	 * Verify all indirect calls are kCFI adorned by checking for the
> > +	 * UD2. Notably, doing __nocfi calls to regular (cfi) functions is
> > +	 * broken.
> 
> This "__nocfi calls" is confusing me.  IIUC, there are two completely
> different meanings for "nocfi":
> 
>   - __nocfi: disable the kcfi function entry stuff

Ah, no. __nocfi is a bit of a mess, this is both the function entry
thing, but also very much the caller verification stuff for indirect
calls done inside this function.

This leads to lovely stuff like:

void (*foo)(void);

static __always_inline __nocfi void nocfi_caller(void)
{
	foo();
}

void bar(void)
{
	nocfi_caller();
	foo();
}

This actually compiles and has bar() have two distinctly different
indirect calls to foo, while bar itself has a __cfi preamble.


Anyway, let me have a poke at the annotation.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 6/6] objtool: Validate kCFI calls
  2025-04-14 23:43   ` Josh Poimboeuf
  2025-04-29 16:10     ` Peter Zijlstra
@ 2025-04-29 16:18     ` Peter Zijlstra
  1 sibling, 0 replies; 32+ messages in thread
From: Peter Zijlstra @ 2025-04-29 16:18 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: x86, kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen,
	hpa, pawan.kumar.gupta, seanjc, pbonzini, ardb, kees,
	Arnd Bergmann, gregkh, linux-hyperv, linux-kernel, kvm, linux-efi,
	samitolvanen, ojeda

On Mon, Apr 14, 2025 at 04:43:26PM -0700, Josh Poimboeuf wrote:
> Can we call this ANNOTATE_NOCFI_SAFE or something?

I'm hesitant to do so, because some of these sites really are not safe.
EFI and VMX interrupt crud really are a security issue.

EFI really is unfixable but not less brokene, because the EFI code is
out of our control, the VMX thing might be fixable, not sure I
understand KVM well enough.

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2025-04-29 20:36 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-14 11:11 [PATCH 0/6] objtool: Detect and warn about indirect calls in __nocfi functions Peter Zijlstra
2025-04-14 11:11 ` [PATCH 1/6] x86/nospec: JMP_NOSPEC Peter Zijlstra
2025-04-14 11:11 ` [PATCH 2/6] x86/kvm/emulate: Implement test_cc() in C Peter Zijlstra
2025-04-14 11:11 ` [PATCH 3/6] x86/kvm/emulate: Avoid RET for fastops Peter Zijlstra
2025-04-14 22:36   ` Josh Poimboeuf
2025-04-15  7:44     ` Peter Zijlstra
2025-04-15 14:39       ` Josh Poimboeuf
2025-04-16  8:38         ` Peter Zijlstra
2025-04-26 10:01           ` Peter Zijlstra
2025-04-28 17:13             ` Sean Christopherson
2025-04-29 10:09               ` Peter Zijlstra
2025-04-29 14:05                 ` Sean Christopherson
2025-04-29 14:46                   ` Peter Zijlstra
2025-04-29 17:16                     ` Sean Christopherson
2025-04-14 11:11 ` [PATCH 4/6] x86,hyperv: Clean up hv_do_hypercall() Peter Zijlstra
2025-04-14 11:47   ` Peter Zijlstra
2025-04-14 14:06   ` Uros Bizjak
2025-04-14 14:08     ` Peter Zijlstra
2025-04-21 18:27   ` Michael Kelley
2025-04-25 13:50     ` Peter Zijlstra
2025-04-29 15:18     ` Peter Zijlstra
2025-04-29 20:36       ` Michael Kelley
2025-04-14 11:11 ` [PATCH 5/6] x86_64,hyperv: Use direct call to hypercall-page Peter Zijlstra
2025-04-21 18:28   ` Michael Kelley
2025-04-25 14:03     ` Peter Zijlstra
2025-04-25 14:32       ` Michael Kelley
2025-04-27  3:58         ` Michael Kelley
2025-04-29 15:19           ` Peter Zijlstra
2025-04-14 11:11 ` [PATCH 6/6] objtool: Validate kCFI calls Peter Zijlstra
2025-04-14 23:43   ` Josh Poimboeuf
2025-04-29 16:10     ` Peter Zijlstra
2025-04-29 16:18     ` Peter Zijlstra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).