* [PATCH 0/6] objtool: Detect and warn about indirect calls in __nocfi functions
@ 2025-04-14 11:11 Peter Zijlstra
2025-04-14 11:11 ` [PATCH 1/6] x86/nospec: JMP_NOSPEC Peter Zijlstra
` (5 more replies)
0 siblings, 6 replies; 32+ messages in thread
From: Peter Zijlstra @ 2025-04-14 11:11 UTC (permalink / raw)
To: x86
Cc: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, hpa,
peterz, jpoimboe, pawan.kumar.gupta, seanjc, pbonzini, ardb, kees,
Arnd Bergmann, gregkh, linux-hyperv, linux-kernel, kvm, linux-efi,
samitolvanen, ojeda
Hi!
On kCFI (CONFIG_CFI_CLANG=y) builds all indirect calls should have the CFI
check on them (with very few exceptions). Not having the CFI checks undermines
the protection provided by CFI and will make these sites candidates for people
wanting to steal your cookies.
Specifically, the ABI changes are such that doing indirect calls without the
CFI magic to a CFI-adorned function is not compatible (although it happens to
work for some setups, it very much does not for FineIBT).
Rust people tripped over this the other day, since their 'core' happened to
have some no_sanitize(kcfi) bits in it, which promptly exploded when run with
FineIBT on.
Since this is very much not a supported model -- on purpose -- have objtool
detect and warn about such constructs.
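To make the failure mode concrete, a minimal sketch (mine, not from this
series; 'hypothetical_cb' and 'bad_caller' are made-up names) of the kind of
construct objtool will now warn about:

	static void (*hypothetical_cb)(void);

	static void __nocfi bad_caller(void)
	{
		/*
		 * __nocfi suppresses the kCFI check for this call site;
		 * under FineIBT the target still expects the caller-side
		 * magic, so this call is broken by construction.
		 */
		hypothetical_cb();
	}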
This effort [1] found all existing [2] non-cfi indirect calls in the kernel.
Notably the KVM fastop emulation stuff -- which reminded me I still had pending
patches there. Included here since they reduce the number of fastop call sites,
and the final patch includes an annotation for that. Although ideally we should
look at means of doing fastops differently.
KVM has another; the interrupt injection stuff calls the IDT handler directly.
Is there an alternative? Can we keep a table of Linux functions slightly higher
up the call stack (asm_\cfunc ?) and add CFI to those?
HyperV hypercall page stuff, which I've previously suggested should use direct calls,
and which I've now converted (after getting properly annoyed with that code).
[1] https://lkml.kernel.org/r/20250410154556.GB9003@noisy.programming.kicks-ass.net
[2] https://lkml.kernel.org/r/20250410194334.GA3248459@google.com
^ permalink raw reply [flat|nested] 32+ messages in thread* [PATCH 1/6] x86/nospec: JMP_NOSPEC 2025-04-14 11:11 [PATCH 0/6] objtool: Detect and warn about indirect calls in __nocfi functions Peter Zijlstra @ 2025-04-14 11:11 ` Peter Zijlstra 2025-04-14 11:11 ` [PATCH 2/6] x86/kvm/emulate: Implement test_cc() in C Peter Zijlstra ` (4 subsequent siblings) 5 siblings, 0 replies; 32+ messages in thread From: Peter Zijlstra @ 2025-04-14 11:11 UTC (permalink / raw) To: x86 Cc: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, hpa, peterz, jpoimboe, pawan.kumar.gupta, seanjc, pbonzini, ardb, kees, Arnd Bergmann, gregkh, linux-hyperv, linux-kernel, kvm, linux-efi, samitolvanen, ojeda Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> --- arch/x86/include/asm/nospec-branch.h | 24 ++++++++++++++++++++++++ 1 file changed, 24 insertions(+) --- a/arch/x86/include/asm/nospec-branch.h +++ b/arch/x86/include/asm/nospec-branch.h @@ -438,6 +438,9 @@ static inline void call_depth_return_thu #define CALL_NOSPEC __CS_PREFIX("%V[thunk_target]") \ "call __x86_indirect_thunk_%V[thunk_target]\n" +#define JMP_NOSPEC __CS_PREFIX("%V[thunk_target]") \ + "jmp __x86_indirect_thunk_%V[thunk_target]\n" + # define THUNK_TARGET(addr) [thunk_target] "r" (addr) #else /* CONFIG_X86_32 */ @@ -468,10 +471,31 @@ static inline void call_depth_return_thu "call *%[thunk_target]\n", \ X86_FEATURE_RETPOLINE_LFENCE) +# define JMP_NOSPEC \ + ALTERNATIVE_2( \ + ANNOTATE_RETPOLINE_SAFE \ + "jmp *%[thunk_target]\n", \ + " jmp 901f;\n" \ + " .align 16\n" \ + "901: call 903f;\n" \ + "902: pause;\n" \ + " lfence;\n" \ + " jmp 902b;\n" \ + " .align 16\n" \ + "903: lea 4(%%esp), %%esp;\n" \ + " pushl %[thunk_target];\n" \ + " ret;\n", \ + X86_FEATURE_RETPOLINE, \ + "lfence;\n" \ + ANNOTATE_RETPOLINE_SAFE \ + "jmp *%[thunk_target]\n", \ + X86_FEATURE_RETPOLINE_LFENCE) + # define THUNK_TARGET(addr) [thunk_target] "rm" (addr) #endif #else /* No retpoline for C / inline asm */ # define CALL_NOSPEC "call *%[thunk_target]\n" +# define JMP_NOSPEC "jmp *%[thunk_target]\n" # define THUNK_TARGET(addr) [thunk_target] "rm" (addr) #endif ^ permalink raw reply [flat|nested] 32+ messages in thread
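For reference, JMP_NOSPEC above is the tail-call counterpart of the existing CALL_NOSPEC: the same retpoline/lfence alternatives, but emitting an indirect JMP, so no return address gets pushed. Condensed from patch 3 (unwind hints, objtool annotations and some constraints omitted), a use site looks like:

	asm("push %[flags]; popf \n\t"
	    JMP_NOSPEC			/* jmp *fop, via the retpoline thunk */
	    "fastop_return: \n\t"	/* every fastop JMPs back here */
	    "pushf; pop %[flags]\n"
	    : "+a" (ctxt->dst.val), [flags] "+D" (flags),
	      [thunk_target] "+S" (fop), ASM_CALL_CONSTRAINT);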
* [PATCH 2/6] x86/kvm/emulate: Implement test_cc() in C 2025-04-14 11:11 [PATCH 0/6] objtool: Detect and warn about indirect calls in __nocfi functions Peter Zijlstra 2025-04-14 11:11 ` [PATCH 1/6] x86/nospec: JMP_NOSPEC Peter Zijlstra @ 2025-04-14 11:11 ` Peter Zijlstra 2025-04-14 11:11 ` [PATCH 3/6] x86/kvm/emulate: Avoid RET for fastops Peter Zijlstra ` (3 subsequent siblings) 5 siblings, 0 replies; 32+ messages in thread From: Peter Zijlstra @ 2025-04-14 11:11 UTC (permalink / raw) To: x86 Cc: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, hpa, peterz, jpoimboe, pawan.kumar.gupta, seanjc, pbonzini, ardb, kees, Arnd Bergmann, gregkh, linux-hyperv, linux-kernel, kvm, linux-efi, samitolvanen, ojeda Current test_cc() uses the fastop infrastructure to test flags using SETcc instructions. However, int3_emulate_jcc() already fully implements the flags->CC mapping, use that. Removes a pile of gnarly asm. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> --- arch/x86/include/asm/text-patching.h | 20 +++++++++++++------- arch/x86/kvm/emulate.c | 34 ++-------------------------------- 2 files changed, 15 insertions(+), 39 deletions(-) --- a/arch/x86/include/asm/text-patching.h +++ b/arch/x86/include/asm/text-patching.h @@ -177,9 +177,9 @@ void int3_emulate_ret(struct pt_regs *re } static __always_inline -void int3_emulate_jcc(struct pt_regs *regs, u8 cc, unsigned long ip, unsigned long disp) +bool __emulate_cc(unsigned long flags, u8 cc) { - static const unsigned long jcc_mask[6] = { + static const unsigned long cc_mask[6] = { [0] = X86_EFLAGS_OF, [1] = X86_EFLAGS_CF, [2] = X86_EFLAGS_ZF, @@ -192,15 +192,21 @@ void int3_emulate_jcc(struct pt_regs *re bool match; if (cc < 0xc) { - match = regs->flags & jcc_mask[cc >> 1]; + match = flags & cc_mask[cc >> 1]; } else { - match = ((regs->flags & X86_EFLAGS_SF) >> X86_EFLAGS_SF_BIT) ^ - ((regs->flags & X86_EFLAGS_OF) >> X86_EFLAGS_OF_BIT); + match = ((flags & X86_EFLAGS_SF) >> X86_EFLAGS_SF_BIT) ^ + ((flags & X86_EFLAGS_OF) >> X86_EFLAGS_OF_BIT); if (cc >= 0xe) - match = match || (regs->flags & X86_EFLAGS_ZF); + match = match || (flags & X86_EFLAGS_ZF); } - if ((match && !invert) || (!match && invert)) + return (match && !invert) || (!match && invert); +} + +static __always_inline +void int3_emulate_jcc(struct pt_regs *regs, u8 cc, unsigned long ip, unsigned long disp) +{ + if (__emulate_cc(regs->flags, cc)) ip += disp; int3_emulate_jmp(regs, ip); --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -26,6 +26,7 @@ #include <asm/debugreg.h> #include <asm/nospec-branch.h> #include <asm/ibt.h> +#include <asm/text-patching.h> #include "x86.h" #include "tss.h" @@ -416,31 +417,6 @@ static int fastop(struct x86_emulate_ctx ON64(FOP3E(op##q, rax, rdx, cl)) \ FOP_END -/* Special case for SETcc - 1 instruction per cc */ -#define FOP_SETCC(op) \ - FOP_FUNC(op) \ - #op " %al \n\t" \ - FOP_RET(op) - -FOP_START(setcc) -FOP_SETCC(seto) -FOP_SETCC(setno) -FOP_SETCC(setc) -FOP_SETCC(setnc) -FOP_SETCC(setz) -FOP_SETCC(setnz) -FOP_SETCC(setbe) -FOP_SETCC(setnbe) -FOP_SETCC(sets) -FOP_SETCC(setns) -FOP_SETCC(setp) -FOP_SETCC(setnp) -FOP_SETCC(setl) -FOP_SETCC(setnl) -FOP_SETCC(setle) -FOP_SETCC(setnle) -FOP_END; - FOP_START(salc) FOP_FUNC(salc) "pushf; sbb %al, %al; popf \n\t" @@ -1068,13 +1044,7 @@ static int em_bsr_c(struct x86_emulate_c static __always_inline u8 test_cc(unsigned int condition, unsigned long flags) { - u8 rc; - void (*fop)(void) = (void *)em_setcc + FASTOP_SIZE * (condition & 0xf); - - flags = (flags & EFLAGS_MASK) | 
X86_EFLAGS_IF; - asm("push %[flags]; popf; " CALL_NOSPEC - : "=a"(rc), ASM_CALL_CONSTRAINT : [thunk_target]"r"(fop), [flags]"r"(flags)); - return rc; + return __emulate_cc(flags, condition & 0xf); } static void fetch_register_operand(struct operand *op) ^ permalink raw reply [flat|nested] 32+ messages in thread
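For reference, the 4-bit condition number that __emulate_cc() takes is the same encoding the x86 Jcc/SETcc opcodes use: bits 3:1 select the predicate (OF, CF, ZF, CF|ZF, SF, PF, with 0xc-0xf computing SF^OF and (SF^OF)|ZF), and bit 0 inverts the result. A few concrete cases, matching the cc_mask[] table above:

	__emulate_cc(X86_EFLAGS_ZF, 0x4);	/* SETE/JE: ZF set -> true */
	__emulate_cc(X86_EFLAGS_ZF, 0x5);	/* SETNE/JNE: bit 0 inverts -> false */
	__emulate_cc(X86_EFLAGS_SF, 0xc);	/* SETL/JL: SF != OF -> true */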
* [PATCH 3/6] x86/kvm/emulate: Avoid RET for fastops 2025-04-14 11:11 [PATCH 0/6] objtool: Detect and warn about indirect calls in __nocfi functions Peter Zijlstra 2025-04-14 11:11 ` [PATCH 1/6] x86/nospec: JMP_NOSPEC Peter Zijlstra 2025-04-14 11:11 ` [PATCH 2/6] x86/kvm/emulate: Implement test_cc() in C Peter Zijlstra @ 2025-04-14 11:11 ` Peter Zijlstra 2025-04-14 22:36 ` Josh Poimboeuf 2025-04-14 11:11 ` [PATCH 4/6] x86,hyperv: Clean up hv_do_hypercall() Peter Zijlstra ` (2 subsequent siblings) 5 siblings, 1 reply; 32+ messages in thread From: Peter Zijlstra @ 2025-04-14 11:11 UTC (permalink / raw) To: x86 Cc: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, hpa, peterz, jpoimboe, pawan.kumar.gupta, seanjc, pbonzini, ardb, kees, Arnd Bergmann, gregkh, linux-hyperv, linux-kernel, kvm, linux-efi, samitolvanen, ojeda Since there is only a single fastop() function, convert the FASTOP stuff from CALL_NOSPEC+RET to JMP_NOSPEC+JMP, avoiding the return thunks and all that jazz. Specifically, FASTOPs rely on the return thunk to preserve EFLAGS, which not all of them can trivially do (call depth tracking suffers here). Objtool strenuously complains about things, therefore fix up the various problems: - indirect call without a .rodata, fails to determine JUMP_TABLE, add an annotation for this. - fastop functions fall through, create an exception for this case - unreachable instruction after fastop_return, save/restore Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> --- arch/x86/kvm/emulate.c | 20 +++++++++++++++----- include/linux/objtool_types.h | 1 + tools/include/linux/objtool_types.h | 1 + tools/objtool/check.c | 11 ++++++++++- 4 files changed, 27 insertions(+), 6 deletions(-) --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -285,8 +285,8 @@ static void invalidate_registers(struct * different operand sizes can be reached by calculation, rather than a jump * table (which would be bigger than the code). * - * The 16 byte alignment, considering 5 bytes for the RET thunk, 3 for ENDBR - * and 1 for the straight line speculation INT3, leaves 7 bytes for the + * The 16 byte alignment, considering 5 bytes for the JMP, 4 for ENDBR + * and 1 for the straight line speculation INT3, leaves 6 bytes for the * body of the function. Currently none is larger than 4. */ static int fastop(struct x86_emulate_ctxt *ctxt, fastop_t fop); @@ -304,7 +304,7 @@ static int fastop(struct x86_emulate_ctx __FOP_FUNC(#name) #define __FOP_RET(name) \ - "11: " ASM_RET \ + "11: jmp fastop_return; int3 \n\t" \ ".size " name ", .-" name "\n\t" #define FOP_RET(name) \ @@ -5044,14 +5044,24 @@ static void fetch_possible_mmx_operand(s kvm_read_mmx_reg(op->addr.mm, &op->mm_val); } -static int fastop(struct x86_emulate_ctxt *ctxt, fastop_t fop) +/* + * All the FASTOP magic above relies on there being *one* instance of this + * so it can JMP back, avoiding RET and its various thunks. 
+ */ +static noinline int fastop(struct x86_emulate_ctxt *ctxt, fastop_t fop) { ulong flags = (ctxt->eflags & EFLAGS_MASK) | X86_EFLAGS_IF; if (!(ctxt->d & ByteOp)) fop += __ffs(ctxt->dst.bytes) * FASTOP_SIZE; - asm("push %[flags]; popf; " CALL_NOSPEC " ; pushf; pop %[flags]\n" + asm("push %[flags]; popf \n\t" + UNWIND_HINT(UNWIND_HINT_TYPE_SAVE, 0, 0, 0) + ASM_ANNOTATE(ANNOTYPE_JUMP_TABLE) + JMP_NOSPEC + "fastop_return: \n\t" + UNWIND_HINT(UNWIND_HINT_TYPE_RESTORE, 0, 0, 0) + "pushf; pop %[flags]\n" : "+a"(ctxt->dst.val), "+d"(ctxt->src.val), [flags]"+D"(flags), [thunk_target]"+S"(fop), ASM_CALL_CONSTRAINT : "c"(ctxt->src2.val)); --- a/include/linux/objtool_types.h +++ b/include/linux/objtool_types.h @@ -65,5 +65,6 @@ struct unwind_hint { #define ANNOTYPE_IGNORE_ALTS 6 #define ANNOTYPE_INTRA_FUNCTION_CALL 7 #define ANNOTYPE_REACHABLE 8 +#define ANNOTYPE_JUMP_TABLE 9 #endif /* _LINUX_OBJTOOL_TYPES_H */ --- a/tools/include/linux/objtool_types.h +++ b/tools/include/linux/objtool_types.h @@ -65,5 +65,6 @@ struct unwind_hint { #define ANNOTYPE_IGNORE_ALTS 6 #define ANNOTYPE_INTRA_FUNCTION_CALL 7 #define ANNOTYPE_REACHABLE 8 +#define ANNOTYPE_JUMP_TABLE 9 #endif /* _LINUX_OBJTOOL_TYPES_H */ --- a/tools/objtool/check.c +++ b/tools/objtool/check.c @@ -2428,6 +2428,14 @@ static int __annotate_late(struct objtoo insn->dead_end = false; break; + /* + * Must be after add_jump_table(); for it doesn't set a sane + * _jump_table value. + */ + case ANNOTYPE_JUMP_TABLE: + insn->_jump_table = (void *)1; + break; + default: ERROR_INSN(insn, "Unknown annotation type: %d", type); return -1; @@ -3559,7 +3567,8 @@ static int validate_branch(struct objtoo if (func && insn_func(insn) && func != insn_func(insn)->pfunc) { /* Ignore KCFI type preambles, which always fall through */ if (!strncmp(func->name, "__cfi_", 6) || - !strncmp(func->name, "__pfx_", 6)) + !strncmp(func->name, "__pfx_", 6) || + !strcmp(insn_func(insn)->name, "fastop")) return 0; if (file->ignore_unreachables) ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 3/6] x86/kvm/emulate: Avoid RET for fastops 2025-04-14 11:11 ` [PATCH 3/6] x86/kvm/emulate: Avoid RET for fastops Peter Zijlstra @ 2025-04-14 22:36 ` Josh Poimboeuf 2025-04-15 7:44 ` Peter Zijlstra 0 siblings, 1 reply; 32+ messages in thread From: Josh Poimboeuf @ 2025-04-14 22:36 UTC (permalink / raw) To: Peter Zijlstra Cc: x86, kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, hpa, pawan.kumar.gupta, seanjc, pbonzini, ardb, kees, Arnd Bergmann, gregkh, linux-hyperv, linux-kernel, kvm, linux-efi, samitolvanen, ojeda On Mon, Apr 14, 2025 at 01:11:43PM +0200, Peter Zijlstra wrote: > Since there is only a single fastop() function, convert the FASTOP > stuff from CALL_NOSPEC+RET to JMP_NOSPEC+JMP, avoiding the return > thunks and all that jazz. > > Specifically FASTOPs rely on the return thunk to preserve EFLAGS, > which not all of them can trivially do (call depth tracing suffers > here). > > Objtool strenuously complains about things, therefore fix up the > various problems: > > - indirect call without a .rodata, fails to determine JUMP_TABLE, > add an annotation for this. > - fastop functions fall through, create an exception for this case > - unreachable instruction after fastop_return, save/restore I think this breaks unwinding. Each of the individual fastops inherits fastop()'s stack but the ORC doesn't reflect that. Should they just be moved to a proper .S file? -- Josh ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 3/6] x86/kvm/emulate: Avoid RET for fastops 2025-04-14 22:36 ` Josh Poimboeuf @ 2025-04-15 7:44 ` Peter Zijlstra 2025-04-15 14:39 ` Josh Poimboeuf 0 siblings, 1 reply; 32+ messages in thread From: Peter Zijlstra @ 2025-04-15 7:44 UTC (permalink / raw) To: Josh Poimboeuf Cc: x86, kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, hpa, pawan.kumar.gupta, seanjc, pbonzini, ardb, kees, Arnd Bergmann, gregkh, linux-hyperv, linux-kernel, kvm, linux-efi, samitolvanen, ojeda On Mon, Apr 14, 2025 at 03:36:50PM -0700, Josh Poimboeuf wrote: > On Mon, Apr 14, 2025 at 01:11:43PM +0200, Peter Zijlstra wrote: > > Since there is only a single fastop() function, convert the FASTOP > > stuff from CALL_NOSPEC+RET to JMP_NOSPEC+JMP, avoiding the return > > thunks and all that jazz. > > > > Specifically FASTOPs rely on the return thunk to preserve EFLAGS, > > which not all of them can trivially do (call depth tracing suffers > > here). > > > > Objtool strenuously complains about things, therefore fix up the > > various problems: > > > > - indirect call without a .rodata, fails to determine JUMP_TABLE, > > add an annotation for this. > > - fastop functions fall through, create an exception for this case > > - unreachable instruction after fastop_return, save/restore > > I think this breaks unwinding. Each of the individual fastops inherits > fastop()'s stack but the ORC doesn't reflect that. I'm not sure I understand. There is only the one location, and we simply save/restore the state around the one 'call'. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 3/6] x86/kvm/emulate: Avoid RET for fastops 2025-04-15 7:44 ` Peter Zijlstra @ 2025-04-15 14:39 ` Josh Poimboeuf 2025-04-16 8:38 ` Peter Zijlstra 0 siblings, 1 reply; 32+ messages in thread From: Josh Poimboeuf @ 2025-04-15 14:39 UTC (permalink / raw) To: Peter Zijlstra Cc: x86, kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, hpa, pawan.kumar.gupta, seanjc, pbonzini, ardb, kees, Arnd Bergmann, gregkh, linux-hyperv, linux-kernel, kvm, linux-efi, samitolvanen, ojeda On Tue, Apr 15, 2025 at 09:44:21AM +0200, Peter Zijlstra wrote: > On Mon, Apr 14, 2025 at 03:36:50PM -0700, Josh Poimboeuf wrote: > > On Mon, Apr 14, 2025 at 01:11:43PM +0200, Peter Zijlstra wrote: > > > Since there is only a single fastop() function, convert the FASTOP > > > stuff from CALL_NOSPEC+RET to JMP_NOSPEC+JMP, avoiding the return > > > thunks and all that jazz. > > > > > > Specifically FASTOPs rely on the return thunk to preserve EFLAGS, > > > which not all of them can trivially do (call depth tracing suffers > > > here). > > > > > > Objtool strenuously complains about things, therefore fix up the > > > various problems: > > > > > > - indirect call without a .rodata, fails to determine JUMP_TABLE, > > > add an annotation for this. > > > - fastop functions fall through, create an exception for this case > > > - unreachable instruction after fastop_return, save/restore > > > > I think this breaks unwinding. Each of the individual fastops inherits > > fastop()'s stack but the ORC doesn't reflect that. > > I'm not sure I understand. There is only the one location, and we > simply save/restore the state around the one 'call'. The problem isn't fastop() but rather the tiny functions it "calls". Each of those is marked STT_FUNC so it gets its own ORC data saying the return address is at RSP+8. Changing from CALL_NOSPEC+RET to JMP_NOSPEC+JMP means the return address isn't pushed before the branch. Thus they become part of fastop() rather than separate functions. RSP+8 is only correct if it happens to have not pushed anything to the stack before the indirect JMP. The addresses aren't stored in an .rodata jump table so objtool doesn't know the control flow. Even if we made them non-FUNC, objtool wouldn't be able to transfer the stack state. -- Josh ^ permalink raw reply [flat|nested] 32+ messages in thread
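To spell out the stack state Josh describes (an illustrative sketch; the exact symbol names differ):

	/*
	 * fastop:  push %[flags]; popf	# net RSP change: zero
	 *          jmp  *%rsi		# JMP: no return address pushed
	 *
	 * addq:    addq %rdx, %rax	# own STT_FUNC/ORC entry, which
	 *          jmp  fastop_return	# assumes a freshly CALLed frame
	 *				# that was never created
	 */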
* Re: [PATCH 3/6] x86/kvm/emulate: Avoid RET for fastops 2025-04-15 14:39 ` Josh Poimboeuf @ 2025-04-16 8:38 ` Peter Zijlstra 2025-04-26 10:01 ` Peter Zijlstra 0 siblings, 1 reply; 32+ messages in thread From: Peter Zijlstra @ 2025-04-16 8:38 UTC (permalink / raw) To: Josh Poimboeuf Cc: x86, kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, hpa, pawan.kumar.gupta, seanjc, pbonzini, ardb, kees, Arnd Bergmann, gregkh, linux-hyperv, linux-kernel, kvm, linux-efi, samitolvanen, ojeda On Tue, Apr 15, 2025 at 07:39:41AM -0700, Josh Poimboeuf wrote: > On Tue, Apr 15, 2025 at 09:44:21AM +0200, Peter Zijlstra wrote: > > On Mon, Apr 14, 2025 at 03:36:50PM -0700, Josh Poimboeuf wrote: > > > On Mon, Apr 14, 2025 at 01:11:43PM +0200, Peter Zijlstra wrote: > > > > Since there is only a single fastop() function, convert the FASTOP > > > > stuff from CALL_NOSPEC+RET to JMP_NOSPEC+JMP, avoiding the return > > > > thunks and all that jazz. > > > > > > > > Specifically FASTOPs rely on the return thunk to preserve EFLAGS, > > > > which not all of them can trivially do (call depth tracing suffers > > > > here). > > > > > > > > Objtool strenuously complains about things, therefore fix up the > > > > various problems: > > > > > > > > - indirect call without a .rodata, fails to determine JUMP_TABLE, > > > > add an annotation for this. > > > > - fastop functions fall through, create an exception for this case > > > > - unreachable instruction after fastop_return, save/restore > > > > > > I think this breaks unwinding. Each of the individual fastops inherits > > > fastop()'s stack but the ORC doesn't reflect that. > > > > I'm not sure I understand. There is only the one location, and we > > simply save/restore the state around the one 'call'. > > The problem isn't fastop() but rather the tiny functions it "calls". > Each of those is marked STT_FUNC so it gets its own ORC data saying the > return address is at RSP+8. > > Changing from CALL_NOSPEC+RET to JMP_NOSPEC+JMP means the return address > isn't pushed before the branch. Thus they become part of fastop() > rather than separate functions. RSP+8 is only correct if it happens to > have not pushed anything to the stack before the indirect JMP. Yeah, I finally got there. I'll go cook up something else. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 3/6] x86/kvm/emulate: Avoid RET for fastops 2025-04-16 8:38 ` Peter Zijlstra @ 2025-04-26 10:01 ` Peter Zijlstra 2025-04-28 17:13 ` Sean Christopherson 0 siblings, 1 reply; 32+ messages in thread From: Peter Zijlstra @ 2025-04-26 10:01 UTC (permalink / raw) To: Josh Poimboeuf Cc: x86, kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, hpa, pawan.kumar.gupta, seanjc, pbonzini, ardb, kees, Arnd Bergmann, gregkh, linux-hyperv, linux-kernel, kvm, linux-efi, samitolvanen, ojeda On Wed, Apr 16, 2025 at 10:38:59AM +0200, Peter Zijlstra wrote: > Yeah, I finally got there. I'll go cook up something else. Sean, Paolo, can I once again ask how best to test this fastop crud? ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 3/6] x86/kvm/emulate: Avoid RET for fastops 2025-04-26 10:01 ` Peter Zijlstra @ 2025-04-28 17:13 ` Sean Christopherson 2025-04-29 10:09 ` Peter Zijlstra 0 siblings, 1 reply; 32+ messages in thread From: Sean Christopherson @ 2025-04-28 17:13 UTC (permalink / raw) To: Peter Zijlstra Cc: Josh Poimboeuf, x86, kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, hpa, pawan.kumar.gupta, pbonzini, ardb, kees, Arnd Bergmann, gregkh, linux-hyperv, linux-kernel, kvm, linux-efi, samitolvanen, ojeda On Sat, Apr 26, 2025, Peter Zijlstra wrote: > On Wed, Apr 16, 2025 at 10:38:59AM +0200, Peter Zijlstra wrote: > > > Yeah, I finally got there. I'll go cook up something else. > > Sean, Paolo, can I once again ask how best to test this fastop crud? Apply the below, build KVM selftests, enable forced emulation in KVM, and then run fastops_test. It's well past time we had a selftest for this. It won't detect bugs that are specific to 32-bit kernels, e.g. b63f20a778c8 ("x86/retpoline: Don't clobber RFLAGS during CALL_NOSPEC on i386"), since KVM selftests are 64-bit only, but for what you're doing, it should suffice. For 32-bit kernels, it requires a 32-bit QEMU and KVM-Unit-Tests (or maybe even a full blown 32-bit guest image; I forget how much coverage KUT provides). Regardless, I don't see any reason to put you through that pain, I can do that sanity testing. I'll post a proper patch for the new selftest after testing on AMD. The test relies on hardware providing deterministic behavior for undefined output (RFLAGS and GPRs); I don't know if that holds true on AMD. To enable forced emulation, set /sys/module/kvm/parameters/force_emulation_prefix to '1' (for the purposes of this test, the value doesn't matter). The param is writable at runtime, so it doesn't matter if kvm.ko is built-in or a module. --- From: Sean Christopherson <seanjc@google.com> Date: Mon, 28 Apr 2025 08:55:44 -0700 Subject: [PATCH] KVM: selftests: Add a test for x86's fastops emulation Signed-off-by: Sean Christopherson <seanjc@google.com> --- tools/testing/selftests/kvm/Makefile.kvm | 1 + .../testing/selftests/kvm/x86/fastops_test.c | 165 ++++++++++++++++++ 2 files changed, 166 insertions(+) create mode 100644 tools/testing/selftests/kvm/x86/fastops_test.c diff --git a/tools/testing/selftests/kvm/Makefile.kvm b/tools/testing/selftests/kvm/Makefile.kvm index f62b0a5aba35..411c3d5eb5b1 100644 --- a/tools/testing/selftests/kvm/Makefile.kvm +++ b/tools/testing/selftests/kvm/Makefile.kvm @@ -66,6 +66,7 @@ TEST_GEN_PROGS_x86 += x86/cr4_cpuid_sync_test TEST_GEN_PROGS_x86 += x86/dirty_log_page_splitting_test TEST_GEN_PROGS_x86 += x86/feature_msrs_test TEST_GEN_PROGS_x86 += x86/exit_on_emulation_failure_test +TEST_GEN_PROGS_x86 += x86/fastops_test TEST_GEN_PROGS_x86 += x86/fix_hypercall_test TEST_GEN_PROGS_x86 += x86/hwcr_msr_test TEST_GEN_PROGS_x86 += x86/hyperv_clock diff --git a/tools/testing/selftests/kvm/x86/fastops_test.c b/tools/testing/selftests/kvm/x86/fastops_test.c new file mode 100644 index 000000000000..c3799edb5d0c --- /dev/null +++ b/tools/testing/selftests/kvm/x86/fastops_test.c @@ -0,0 +1,165 @@ +// SPDX-License-Identifier: GPL-2.0-only +#include "test_util.h" +#include "kvm_util.h" +#include "processor.h" + +/* + * Execute a fastop() instruction, with or without forced emulation. BT bit 0 + * to set RFLAGS.CF based on whether or not the input is even or odd, so that + * instructions like ADC and SBB are deterministic. 
+ */ +#define guest_execute_fastop_1(FEP, insn, type_t, __val, __flags) \ +do { \ + __asm__ __volatile__("bt $0, %[val]\n\t" \ + FEP insn " %[val]\n\t" \ + "pushfq\n\t" \ + "pop %[flags]\n\t" \ + : [val]"+r"(__val), [flags]"=r"(__flags) \ + : : "cc", "memory"); \ +} while (0) + +#define guest_test_fastop_1(insn, type_t, __val) \ +do { \ + type_t val = __val, ex_val = __val, input = __val; \ + uint64_t flags, ex_flags; \ + \ + guest_execute_fastop_1("", insn, type_t, ex_val, ex_flags); \ + guest_execute_fastop_1(KVM_FEP, insn, type_t, val, flags); \ + \ + __GUEST_ASSERT(val == ex_val, \ + "Wanted 0x%lx for '%s 0x%lx', got 0x%lx", \ + (uint64_t)ex_val, insn, (uint64_t)input, (uint64_t)val); \ + __GUEST_ASSERT(flags == ex_flags, \ + "Wanted flags 0x%lx for '%s 0x%lx', got 0x%lx", \ + ex_flags, insn, (uint64_t)input, flags); \ +} while (0) + +#define guest_execute_fastop_2(FEP, insn, type_t, __input, __output, __flags) \ +do { \ + __asm__ __volatile__("bt $0, %[output]\n\t" \ + FEP insn " %[input], %[output]\n\t" \ + "pushfq\n\t" \ + "pop %[flags]\n\t" \ + : [output]"+r"(__output), [flags]"=r"(__flags) \ + : [input]"r"(__input) : "cc", "memory"); \ +} while (0) + +#define guest_test_fastop_2(insn, type_t, __val1, __val2) \ +do { \ + type_t input = __val1, input2 = __val2, output = __val2, ex_output = __val2; \ + uint64_t flags, ex_flags; \ + \ + guest_execute_fastop_2("", insn, type_t, input, ex_output, ex_flags); \ + guest_execute_fastop_2(KVM_FEP, insn, type_t, input, output, flags); \ + \ + __GUEST_ASSERT(output == ex_output, \ + "Wanted 0x%lx for '%s 0x%lx 0x%lx', got 0x%lx", \ + (uint64_t)ex_output, insn, (uint64_t)input, \ + (uint64_t)input2, (uint64_t)output); \ + __GUEST_ASSERT(flags == ex_flags, \ + "Wanted flags 0x%lx for '%s 0x%lx, 0x%lx', got 0x%lx", \ + ex_flags, insn, (uint64_t)input, (uint64_t)input2, flags); \ +} while (0) + +#define guest_execute_fastop_cl(FEP, insn, type_t, __shift, __output, __flags) \ +do { \ + __asm__ __volatile__("bt $0, %[output]\n\t" \ + FEP insn " %%cl, %[output]\n\t" \ + "pushfq\n\t" \ + "pop %[flags]\n\t" \ + : [output]"+r"(__output), [flags]"=r"(__flags) \ + : "c"(__shift) : "cc", "memory"); \ +} while (0) + +#define guest_test_fastop_cl(insn, type_t, __val1, __val2) \ +do { \ + type_t output = __val2, ex_output = __val2, input = __val2; \ + uint8_t shift = __val1; \ + uint64_t flags, ex_flags; \ + \ + guest_execute_fastop_cl("", insn, type_t, shift, ex_output, ex_flags); \ + guest_execute_fastop_cl(KVM_FEP, insn, type_t, shift, output, flags); \ + \ + __GUEST_ASSERT(output == ex_output, \ + "Wanted 0x%lx for '%s 0x%x, 0x%lx', got 0x%lx", \ + (uint64_t)ex_output, insn, shift, (uint64_t)input, \ + (uint64_t)output); \ + __GUEST_ASSERT(flags == ex_flags, \ + "Wanted flags 0x%lx for '%s 0x%x, 0x%lx', got 0x%lx", \ + ex_flags, insn, shift, (uint64_t)input, flags); \ +} while (0) + +static const uint64_t vals[] = { + 0, + 1, + 2, + 4, + 7, + 0x5555555555555555, + 0xaaaaaaaaaaaaaaaa, + 0xfefefefefefefefe, + 0xffffffffffffffff, +}; + +#define guest_test_fastops(type_t, suffix) \ +do { \ + int i, j; \ + \ + for (i = 0; i < ARRAY_SIZE(vals); i++) { \ + guest_test_fastop_1("dec" suffix, type_t, vals[i]); \ + guest_test_fastop_1("inc" suffix, type_t, vals[i]); \ + guest_test_fastop_1("neg" suffix, type_t, vals[i]); \ + guest_test_fastop_1("not" suffix, type_t, vals[i]); \ + \ + for (j = 0; j < ARRAY_SIZE(vals); j++) { \ + guest_test_fastop_2("add" suffix, type_t, vals[i], vals[j]); \ + guest_test_fastop_2("adc" suffix, type_t, vals[i], vals[j]); \ + 
guest_test_fastop_2("and" suffix, type_t, vals[i], vals[j]); \ + guest_test_fastop_2("bsf" suffix, type_t, vals[i], vals[j]); \ + guest_test_fastop_2("bsr" suffix, type_t, vals[i], vals[j]); \ + guest_test_fastop_2("bt" suffix, type_t, vals[i], vals[j]); \ + guest_test_fastop_2("btc" suffix, type_t, vals[i], vals[j]); \ + guest_test_fastop_2("btr" suffix, type_t, vals[i], vals[j]); \ + guest_test_fastop_2("bts" suffix, type_t, vals[i], vals[j]); \ + guest_test_fastop_2("cmp" suffix, type_t, vals[i], vals[j]); \ + guest_test_fastop_2("imul" suffix, type_t, vals[i], vals[j]); \ + guest_test_fastop_2("or" suffix, type_t, vals[i], vals[j]); \ + guest_test_fastop_2("sbb" suffix, type_t, vals[i], vals[j]); \ + guest_test_fastop_2("sub" suffix, type_t, vals[i], vals[j]); \ + guest_test_fastop_2("test" suffix, type_t, vals[i], vals[j]); \ + guest_test_fastop_2("xor" suffix, type_t, vals[i], vals[j]); \ + \ + guest_test_fastop_cl("rol" suffix, type_t, vals[i], vals[j]); \ + guest_test_fastop_cl("ror" suffix, type_t, vals[i], vals[j]); \ + guest_test_fastop_cl("rcl" suffix, type_t, vals[i], vals[j]); \ + guest_test_fastop_cl("rcr" suffix, type_t, vals[i], vals[j]); \ + guest_test_fastop_cl("sar" suffix, type_t, vals[i], vals[j]); \ + guest_test_fastop_cl("shl" suffix, type_t, vals[i], vals[j]); \ + guest_test_fastop_cl("shr" suffix, type_t, vals[i], vals[j]); \ + } \ + } \ +} while (0) + +static void guest_code(void) +{ + guest_test_fastops(uint16_t, "w"); + guest_test_fastops(uint32_t, "l"); + guest_test_fastops(uint64_t, "q"); + + GUEST_DONE(); +} + +int main(int argc, char *argv[]) +{ + struct kvm_vcpu *vcpu; + struct kvm_vm *vm; + + TEST_REQUIRE(is_forced_emulation_enabled); + + vm = vm_create_with_one_vcpu(&vcpu, guest_code); + + vcpu_run(vcpu); + TEST_ASSERT_EQ(get_ucall(vcpu, NULL), UCALL_DONE); + + kvm_vm_free(vm); +} base-commit: 661b7ddb2d10258b53106d7c39c309806b00a99c -- ^ permalink raw reply related [flat|nested] 32+ messages in thread
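Background on the mechanism, for anyone unfamiliar with it: KVM_FEP is a magic prefix -- ud2 followed by the bytes 'k', 'v', 'm' -- that the guest places in front of an instruction. The resulting #UD exits to KVM, which recognizes the signature, skips past it, and feeds the tagged instruction through the x86 emulator instead of executing it natively; that is what lets the test compare native and emulated results of the same fastop. The definition (quoted from memory; the authoritative copy lives in the selftests' processor.h):

	#define KVM_FEP "ud2; .byte 'k', 'v', 'm';"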
* Re: [PATCH 3/6] x86/kvm/emulate: Avoid RET for fastops 2025-04-28 17:13 ` Sean Christopherson @ 2025-04-29 10:09 ` Peter Zijlstra 2025-04-29 14:05 ` Sean Christopherson 0 siblings, 1 reply; 32+ messages in thread From: Peter Zijlstra @ 2025-04-29 10:09 UTC (permalink / raw) To: Sean Christopherson Cc: Josh Poimboeuf, x86, kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, hpa, pawan.kumar.gupta, pbonzini, ardb, kees, Arnd Bergmann, gregkh, linux-hyperv, linux-kernel, kvm, linux-efi, samitolvanen, ojeda On Mon, Apr 28, 2025 at 10:13:31AM -0700, Sean Christopherson wrote: > On Sat, Apr 26, 2025, Peter Zijlstra wrote: > > On Wed, Apr 16, 2025 at 10:38:59AM +0200, Peter Zijlstra wrote: > > > > > Yeah, I finally got there. I'll go cook up something else. > > > > Sean, Paolo, can I once again ask how best to test this fastop crud? > > Apply the below, build KVM selftests, Patch applied, my own hackery applied, host kernel built and booted, force_emulation_prefix set, but now I'm stuck at this seemingly simple step.. $ cd tools/testing/selftests/kvm/ $ make ... metric ton of fail ... Clearly I'm doing something wrong :/ ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 3/6] x86/kvm/emulate: Avoid RET for fastops 2025-04-29 10:09 ` Peter Zijlstra @ 2025-04-29 14:05 ` Sean Christopherson 2025-04-29 14:46 ` Peter Zijlstra 0 siblings, 1 reply; 32+ messages in thread From: Sean Christopherson @ 2025-04-29 14:05 UTC (permalink / raw) To: Peter Zijlstra Cc: Josh Poimboeuf, x86, kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, hpa, pawan.kumar.gupta, pbonzini, ardb, kees, Arnd Bergmann, gregkh, linux-hyperv, linux-kernel, kvm, linux-efi, samitolvanen, ojeda On Tue, Apr 29, 2025, Peter Zijlstra wrote: > On Mon, Apr 28, 2025 at 10:13:31AM -0700, Sean Christopherson wrote: > > On Sat, Apr 26, 2025, Peter Zijlstra wrote: > > > On Wed, Apr 16, 2025 at 10:38:59AM +0200, Peter Zijlstra wrote: > > > > > > > Yeah, I finally got there. I'll go cook up something else. > > > > > > Sean, Paolo, can I once again ask how best to test this fastop crud? > > > > Apply the below, build KVM selftests, > > Patch applied, my own hackery applied, host kernel built and booted, > force_emulation_prefix set, but now I'm stuck at this seemingly simple > step.. > > $ cd tools/testing/selftests/kvm/ > $ make > ... metric ton of fail ... > > Clearly I'm doing something wrong :/ Did you install headers in the top level directory? I.e. make headers_install. The selftests build system was changed a while back to require users to manually install headers (I forget why, but it is indeed annoying). ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 3/6] x86/kvm/emulate: Avoid RET for fastops 2025-04-29 14:05 ` Sean Christopherson @ 2025-04-29 14:46 ` Peter Zijlstra 2025-04-29 17:16 ` Sean Christopherson 0 siblings, 1 reply; 32+ messages in thread From: Peter Zijlstra @ 2025-04-29 14:46 UTC (permalink / raw) To: Sean Christopherson Cc: Josh Poimboeuf, x86, kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, hpa, pawan.kumar.gupta, pbonzini, ardb, kees, Arnd Bergmann, gregkh, linux-hyperv, linux-kernel, kvm, linux-efi, samitolvanen, ojeda, shuah On Tue, Apr 29, 2025 at 07:05:35AM -0700, Sean Christopherson wrote: > On Tue, Apr 29, 2025, Peter Zijlstra wrote: > > On Mon, Apr 28, 2025 at 10:13:31AM -0700, Sean Christopherson wrote: > > > On Sat, Apr 26, 2025, Peter Zijlstra wrote: > > > > On Wed, Apr 16, 2025 at 10:38:59AM +0200, Peter Zijlstra wrote: > > > > > > > > > Yeah, I finally got there. I'll go cook up something else. > > > > > > > > Sean, Paolo, can I once again ask how best to test this fastop crud? > > > > > > Apply the below, build KVM selftests, > > > > Patch applied, my own hackery applied, host kernel built and booted, > > force_emulation_prefix set, but now I'm stuck at this seemingly simple > > step.. > > > > $ cd tools/testing/selftests/kvm/ > > $ make > > ... metric ton of fail ... > > > > Clearly I'm doing something wrong :/ > > Did you install headers in the top level directory? I.e. make headers_install. No, of course not :-) I don't use the top directory to build anything, ever. All my builds are into build directories, using make O=foo. This allows me to do parallel builds for multiple architectures etc. Also, much easier to wipe a complete build directory than it is to clean out the top level dir. > The selftests build system was changed a while back to require users to manually > install headers (I forget why, but it is indeed annoying). Bah, I remember NAK-ing that. Clearly the selftest people don't want selftests to be usable :-( Anyway, mingo built me a copy of the fastop selftest, and aside from a few stupid mistakes, I seem to now pass it \o/ I'll attempt a Changelog and update the hyper-v patches and then post the lot. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 3/6] x86/kvm/emulate: Avoid RET for fastops 2025-04-29 14:46 ` Peter Zijlstra @ 2025-04-29 17:16 ` Sean Christopherson 0 siblings, 0 replies; 32+ messages in thread From: Sean Christopherson @ 2025-04-29 17:16 UTC (permalink / raw) To: Peter Zijlstra Cc: Josh Poimboeuf, x86, kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, hpa, pawan.kumar.gupta, pbonzini, ardb, kees, Arnd Bergmann, gregkh, linux-hyperv, linux-kernel, kvm, linux-efi, samitolvanen, ojeda, shuah On Tue, Apr 29, 2025, Peter Zijlstra wrote: > On Tue, Apr 29, 2025 at 07:05:35AM -0700, Sean Christopherson wrote: > > On Tue, Apr 29, 2025, Peter Zijlstra wrote: > > > On Mon, Apr 28, 2025 at 10:13:31AM -0700, Sean Christopherson wrote: > > > > On Sat, Apr 26, 2025, Peter Zijlstra wrote: > > > > > On Wed, Apr 16, 2025 at 10:38:59AM +0200, Peter Zijlstra wrote: > > > > > > > > > > > Yeah, I finally got there. I'll go cook up something else. > > > > > > > > > > Sean, Paolo, can I once again ask how best to test this fastop crud? > > > > > > > > Apply the below, build KVM selftests, > > > > > > Patch applied, my own hackery applied, host kernel built and booted, > > > force_emulation_prefix set, but now I'm stuck at this seemingly simple > > > step.. > > > > > > $ cd tools/testing/selftests/kvm/ > > > $ make > > > ... metric ton of fail ... > > > > > > Clearly I'm doing something wrong :/ > > > > Did you install headers in the top level directory? I.e. make headers_install. > > No, of course not :-) I don't use the top directory to build anything, > ever. > > All my builds are into build directories, using make O=foo. This allows > me to do parallel builds for multiple architectures etc. Also, much > easier to wipe a complete build directory than it is to clean out the > top level dir. FWIW, you can do the same with KVM selftests (and presumably others?), although the syntax is kinda weird (no idea why lib.mk uses OUTPUT instead of O). E.g. to build KVM selftests in $HOME/build/selftests/x86 make O=$HOME/build/selftests/x86 headers_install make OUTPUT=$HOME/build/selftests/x86 ^ permalink raw reply [flat|nested] 32+ messages in thread
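Summarizing the build/run steps from this sub-thread (roughly; in-tree build shown, see above for the O=/OUTPUT= variants):

	$ make headers_install
	$ make -C tools/testing/selftests/kvm
	$ echo 1 | sudo tee /sys/module/kvm/parameters/force_emulation_prefix
	$ tools/testing/selftests/kvm/x86/fastops_test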
* [PATCH 4/6] x86,hyperv: Clean up hv_do_hypercall() 2025-04-14 11:11 [PATCH 0/6] objtool: Detect and warn about indirect calls in __nocfi functions Peter Zijlstra ` (2 preceding siblings ...) 2025-04-14 11:11 ` [PATCH 3/6] x86/kvm/emulate: Avoid RET for fastops Peter Zijlstra @ 2025-04-14 11:11 ` Peter Zijlstra 2025-04-14 11:47 ` Peter Zijlstra ` (2 more replies) 2025-04-14 11:11 ` [PATCH 5/6] x86_64,hyperv: Use direct call to hypercall-page Peter Zijlstra 2025-04-14 11:11 ` [PATCH 6/6] objtool: Validate kCFI calls Peter Zijlstra 5 siblings, 3 replies; 32+ messages in thread From: Peter Zijlstra @ 2025-04-14 11:11 UTC (permalink / raw) To: x86 Cc: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, hpa, peterz, jpoimboe, pawan.kumar.gupta, seanjc, pbonzini, ardb, kees, Arnd Bergmann, gregkh, linux-hyperv, linux-kernel, kvm, linux-efi, samitolvanen, ojeda What used to be a simple few instructions has turned into a giant mess (for x86_64). Not only does it use static_branch wrong, it mixes it with dynamic branches for no apparent reason. Notably it uses static_branch through an out-of-line function call, which completely defeats the purpose, since instead of a simple JMP/NOP site, you get a CALL+RET+TEST+Jcc sequence in return, which is absolutely idiotic. Add to that a dynamic test of hyperv_paravisor_present, something which is set once and never changed. Replace all this idiocy with a single direct function call to the right hypercall variant. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> --- arch/x86/hyperv/hv_init.c | 21 ++++++ arch/x86/hyperv/ivm.c | 14 ++++ arch/x86/include/asm/mshyperv.h | 137 +++++++++++----------------------------- arch/x86/kernel/cpu/mshyperv.c | 18 +++-- 4 files changed, 88 insertions(+), 102 deletions(-) --- a/arch/x86/hyperv/hv_init.c +++ b/arch/x86/hyperv/hv_init.c @@ -35,7 +35,28 @@ #include <linux/highmem.h> void *hv_hypercall_pg; + +#ifdef CONFIG_X86_64 +u64 hv_pg_hypercall(u64 control, u64 param1, u64 param2) +{ + u64 hv_status; + + if (!hv_hypercall_pg) + return U64_MAX; + + register u64 __r8 asm("r8") = param2; + asm volatile (CALL_NOSPEC + : "=a" (hv_status), ASM_CALL_CONSTRAINT, + "+c" (control), "+d" (param1) + : "r" (__r8), + THUNK_TARGET(hv_hypercall_pg) + : "cc", "memory", "r9", "r10", "r11"); + + return hv_status; +} +#else EXPORT_SYMBOL_GPL(hv_hypercall_pg); +#endif union hv_ghcb * __percpu *hv_ghcb_pg; --- a/arch/x86/hyperv/ivm.c +++ b/arch/x86/hyperv/ivm.c @@ -376,6 +376,20 @@ int hv_snp_boot_ap(u32 cpu, unsigned lon return ret; } +u64 hv_snp_hypercall(u64 control, u64 param1, u64 param2) +{ + u64 hv_status; + + register u64 __r8 asm("r8") = param2; + asm volatile("vmmcall" + : "=a" (hv_status), ASM_CALL_CONSTRAINT, + "+c" (control), "+d" (param1) + : "r" (__r8) + : "cc", "memory", "r9", "r10", "r11"); + + return hv_status; +} + #else static inline void hv_ghcb_msr_write(u64 msr, u64 value) {} static inline void hv_ghcb_msr_read(u64 msr, u64 *value) {} --- a/arch/x86/include/asm/mshyperv.h +++ b/arch/x86/include/asm/mshyperv.h @@ -6,6 +6,7 @@ #include <linux/nmi.h> #include <linux/msi.h> #include <linux/io.h> +#include <linux/static_call.h> #include <asm/nospec-branch.h> #include <asm/paravirt.h> #include <hyperv/hvhdk.h> @@ -39,15 +40,20 @@ static inline unsigned char hv_get_nmi_r } #if IS_ENABLED(CONFIG_HYPERV) -extern bool hyperv_paravisor_present; - extern void *hv_hypercall_pg; extern union hv_ghcb * __percpu *hv_ghcb_pg; bool hv_isolation_type_snp(void); bool hv_isolation_type_tdx(void); -u64 hv_tdx_hypercall(u64 
control, u64 param1, u64 param2); + +#ifdef CONFIG_X86_64 +extern u64 hv_tdx_hypercall(u64 control, u64 param1, u64 param2); +extern u64 hv_snp_hypercall(u64 control, u64 param1, u64 param2); +extern u64 hv_pg_hypercall(u64 control, u64 param1, u64 param2); + +DECLARE_STATIC_CALL(hv_hypercall, hv_pg_hypercall); +#endif /* * DEFAULT INIT GPAT and SEGMENT LIMIT value in struct VMSA @@ -64,37 +70,15 @@ static inline u64 hv_do_hypercall(u64 co { u64 input_address = input ? virt_to_phys(input) : 0; u64 output_address = output ? virt_to_phys(output) : 0; - u64 hv_status; #ifdef CONFIG_X86_64 - if (hv_isolation_type_tdx() && !hyperv_paravisor_present) - return hv_tdx_hypercall(control, input_address, output_address); - - if (hv_isolation_type_snp() && !hyperv_paravisor_present) { - __asm__ __volatile__("mov %[output_address], %%r8\n" - "vmmcall" - : "=a" (hv_status), ASM_CALL_CONSTRAINT, - "+c" (control), "+d" (input_address) - : [output_address] "r" (output_address) - : "cc", "memory", "r8", "r9", "r10", "r11"); - return hv_status; - } - - if (!hv_hypercall_pg) - return U64_MAX; - - __asm__ __volatile__("mov %[output_address], %%r8\n" - CALL_NOSPEC - : "=a" (hv_status), ASM_CALL_CONSTRAINT, - "+c" (control), "+d" (input_address) - : [output_address] "r" (output_address), - THUNK_TARGET(hv_hypercall_pg) - : "cc", "memory", "r8", "r9", "r10", "r11"); + return static_call_mod(hv_hypercall)(control, input_address, output_address); #else u32 input_address_hi = upper_32_bits(input_address); u32 input_address_lo = lower_32_bits(input_address); u32 output_address_hi = upper_32_bits(output_address); u32 output_address_lo = lower_32_bits(output_address); + u64 hv_status; if (!hv_hypercall_pg) return U64_MAX; @@ -107,8 +91,8 @@ static inline u64 hv_do_hypercall(u64 co "D"(output_address_hi), "S"(output_address_lo), THUNK_TARGET(hv_hypercall_pg) : "cc", "memory"); -#endif /* !x86_64 */ return hv_status; +#endif /* !x86_64 */ } /* Hypercall to the L0 hypervisor */ @@ -120,41 +104,23 @@ static inline u64 hv_do_nested_hypercall /* Fast hypercall with 8 bytes of input and no output */ static inline u64 _hv_do_fast_hypercall8(u64 control, u64 input1) { - u64 hv_status; - #ifdef CONFIG_X86_64 - if (hv_isolation_type_tdx() && !hyperv_paravisor_present) - return hv_tdx_hypercall(control, input1, 0); - - if (hv_isolation_type_snp() && !hyperv_paravisor_present) { - __asm__ __volatile__( - "vmmcall" - : "=a" (hv_status), ASM_CALL_CONSTRAINT, - "+c" (control), "+d" (input1) - :: "cc", "r8", "r9", "r10", "r11"); - } else { - __asm__ __volatile__(CALL_NOSPEC - : "=a" (hv_status), ASM_CALL_CONSTRAINT, - "+c" (control), "+d" (input1) - : THUNK_TARGET(hv_hypercall_pg) - : "cc", "r8", "r9", "r10", "r11"); - } + return static_call_mod(hv_hypercall)(control, input1, 0); #else - { - u32 input1_hi = upper_32_bits(input1); - u32 input1_lo = lower_32_bits(input1); - - __asm__ __volatile__ (CALL_NOSPEC - : "=A"(hv_status), - "+c"(input1_lo), - ASM_CALL_CONSTRAINT - : "A" (control), - "b" (input1_hi), - THUNK_TARGET(hv_hypercall_pg) - : "cc", "edi", "esi"); - } -#endif + u32 input1_hi = upper_32_bits(input1); + u32 input1_lo = lower_32_bits(input1); + u64 hv_status; + + __asm__ __volatile__ (CALL_NOSPEC + : "=A"(hv_status), + "+c"(input1_lo), + ASM_CALL_CONSTRAINT + : "A" (control), + "b" (input1_hi), + THUNK_TARGET(hv_hypercall_pg) + : "cc", "edi", "esi"); return hv_status; +#endif } static inline u64 hv_do_fast_hypercall8(u16 code, u64 input1) @@ -174,45 +140,24 @@ static inline u64 hv_do_fast_nested_hype /* Fast hypercall with 
16 bytes of input */ static inline u64 _hv_do_fast_hypercall16(u64 control, u64 input1, u64 input2) { - u64 hv_status; - #ifdef CONFIG_X86_64 - if (hv_isolation_type_tdx() && !hyperv_paravisor_present) - return hv_tdx_hypercall(control, input1, input2); - - if (hv_isolation_type_snp() && !hyperv_paravisor_present) { - __asm__ __volatile__("mov %[input2], %%r8\n" - "vmmcall" - : "=a" (hv_status), ASM_CALL_CONSTRAINT, - "+c" (control), "+d" (input1) - : [input2] "r" (input2) - : "cc", "r8", "r9", "r10", "r11"); - } else { - __asm__ __volatile__("mov %[input2], %%r8\n" - CALL_NOSPEC - : "=a" (hv_status), ASM_CALL_CONSTRAINT, - "+c" (control), "+d" (input1) - : [input2] "r" (input2), - THUNK_TARGET(hv_hypercall_pg) - : "cc", "r8", "r9", "r10", "r11"); - } + return static_call_mod(hv_hypercall)(control, input1, input2); #else - { - u32 input1_hi = upper_32_bits(input1); - u32 input1_lo = lower_32_bits(input1); - u32 input2_hi = upper_32_bits(input2); - u32 input2_lo = lower_32_bits(input2); - - __asm__ __volatile__ (CALL_NOSPEC - : "=A"(hv_status), - "+c"(input1_lo), ASM_CALL_CONSTRAINT - : "A" (control), "b" (input1_hi), - "D"(input2_hi), "S"(input2_lo), - THUNK_TARGET(hv_hypercall_pg) - : "cc"); - } -#endif + u32 input1_hi = upper_32_bits(input1); + u32 input1_lo = lower_32_bits(input1); + u32 input2_hi = upper_32_bits(input2); + u32 input2_lo = lower_32_bits(input2); + u64 hv_status; + + __asm__ __volatile__ (CALL_NOSPEC + : "=A"(hv_status), + "+c"(input1_lo), ASM_CALL_CONSTRAINT + : "A" (control), "b" (input1_hi), + "D"(input2_hi), "S"(input2_lo), + THUNK_TARGET(hv_hypercall_pg) + : "cc"); return hv_status; +#endif } static inline u64 hv_do_fast_hypercall16(u16 code, u64 input1, u64 input2) --- a/arch/x86/kernel/cpu/mshyperv.c +++ b/arch/x86/kernel/cpu/mshyperv.c @@ -37,10 +37,6 @@ bool hv_nested; struct ms_hyperv_info ms_hyperv; -/* Used in modules via hv_do_hypercall(): see arch/x86/include/asm/mshyperv.h */ -bool hyperv_paravisor_present __ro_after_init; -EXPORT_SYMBOL_GPL(hyperv_paravisor_present); - #if IS_ENABLED(CONFIG_HYPERV) static inline unsigned int hv_get_nested_msr(unsigned int reg) { @@ -287,6 +283,11 @@ static void __init x86_setup_ops_for_tsc old_restore_sched_clock_state = x86_platform.restore_sched_clock_state; x86_platform.restore_sched_clock_state = hv_restore_sched_clock_state; } + +#ifdef CONFIG_X86_64 +DEFINE_STATIC_CALL(hv_hypercall, hv_pg_hypercall); +EXPORT_STATIC_CALL_TRAMP_GPL(hv_hypercall); +#endif #endif /* CONFIG_HYPERV */ static uint32_t __init ms_hyperv_platform(void) @@ -483,14 +484,16 @@ static void __init ms_hyperv_init_platfo ms_hyperv.shared_gpa_boundary = BIT_ULL(ms_hyperv.shared_gpa_boundary_bits); - hyperv_paravisor_present = !!ms_hyperv.paravisor_present; - pr_info("Hyper-V: Isolation Config: Group A 0x%x, Group B 0x%x\n", ms_hyperv.isolation_config_a, ms_hyperv.isolation_config_b); if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP) { static_branch_enable(&isolation_type_snp); +#if defined(CONFIG_AMD_MEM_ENCRYPT) && defined(CONFIG_HYPERV) + if (!ms_hyperv.paravisor_present) + static_call_update(hv_hypercall, hv_snp_hypercall); +#endif } else if (hv_get_isolation_type() == HV_ISOLATION_TYPE_TDX) { static_branch_enable(&isolation_type_tdx); @@ -498,6 +501,9 @@ static void __init ms_hyperv_init_platfo ms_hyperv.hints &= ~HV_X64_APIC_ACCESS_RECOMMENDED; if (!ms_hyperv.paravisor_present) { +#if defined(CONFIG_INTEL_TDX_GUEST) && defined(CONFIG_HYPERV) + static_call_update(hv_hypercall, hv_tdx_hypercall); +#endif /* * Mark the Hyper-V TSC page feature 
as disabled * in a TDX VM without paravisor so that the ^ permalink raw reply [flat|nested] 32+ messages in thread
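Condensed, the static_call pattern the patch establishes (names as in the patch; ifdefs and the 32-bit paths omitted):

	/* mshyperv.h */
	DECLARE_STATIC_CALL(hv_hypercall, hv_pg_hypercall);

	/* mshyperv.c */
	DEFINE_STATIC_CALL(hv_hypercall, hv_pg_hypercall);
	EXPORT_STATIC_CALL_TRAMP_GPL(hv_hypercall);	/* modules get the trampoline only */

	/* Boot: select the variant once, per the detected isolation type. */
	static_call_update(hv_hypercall, hv_snp_hypercall);	/* or hv_tdx_hypercall */

	/* Hot path: patched into a single direct call. */
	hv_status = static_call_mod(hv_hypercall)(control, input_address, output_address);

Exporting only the trampoline means modules can make the call but cannot retarget it with static_call_update().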
* Re: [PATCH 4/6] x86,hyperv: Clean up hv_do_hypercall() 2025-04-14 11:11 ` [PATCH 4/6] x86,hyperv: Clean up hv_do_hypercall() Peter Zijlstra @ 2025-04-14 11:47 ` Peter Zijlstra 2025-04-14 14:06 ` Uros Bizjak 2025-04-21 18:27 ` Michael Kelley 2 siblings, 0 replies; 32+ messages in thread From: Peter Zijlstra @ 2025-04-14 11:47 UTC (permalink / raw) To: x86 Cc: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, hpa, jpoimboe, pawan.kumar.gupta, seanjc, pbonzini, ardb, kees, Arnd Bergmann, gregkh, linux-hyperv, linux-kernel, kvm, linux-efi, samitolvanen, ojeda On Mon, Apr 14, 2025 at 01:11:44PM +0200, Peter Zijlstra wrote: > +u64 hv_pg_hypercall(u64 control, u64 param1, u64 param2) > +{ > + u64 hv_status; > + > + if (!hv_hypercall_pg) > + return U64_MAX; > + > + register u64 __r8 asm("r8") = param2; > + asm volatile (CALL_NOSPEC > + : "=a" (hv_status), ASM_CALL_CONSTRAINT, > + "+c" (control), "+d" (param1) > + : "r" (__r8), > + THUNK_TARGET(hv_hypercall_pg) > + : "cc", "memory", "r9", "r10", "r11"); > + > + return hv_status; > +} I've tried making this a tail-call, but even without the ASM_CALL_CONSTRAINT on, it confuses the compiler (objtool warns on frame-pointer builds about non-matching stack). I could of course have written the entire thing in asm, but meh. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 4/6] x86,hyperv: Clean up hv_do_hypercall() 2025-04-14 11:11 ` [PATCH 4/6] x86,hyperv: Clean up hv_do_hypercall() Peter Zijlstra 2025-04-14 11:47 ` Peter Zijlstra @ 2025-04-14 14:06 ` Uros Bizjak 2025-04-14 14:08 ` Peter Zijlstra 2025-04-21 18:27 ` Michael Kelley 2 siblings, 1 reply; 32+ messages in thread From: Uros Bizjak @ 2025-04-14 14:06 UTC (permalink / raw) To: Peter Zijlstra, x86 Cc: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, hpa, jpoimboe, pawan.kumar.gupta, seanjc, pbonzini, ardb, kees, Arnd Bergmann, gregkh, linux-hyperv, linux-kernel, kvm, linux-efi, samitolvanen, ojeda On 14. 04. 25 13:11, Peter Zijlstra wrote: > What used to be a simple few instructions has turned into a giant mess > (for x86_64). Not only does it use static_branch wrong, it mixes it > with dynamic branches for no apparent reason. > > Notably it uses static_branch through an out-of-line function call, > which completely defeats the purpose, since instead of a simple > JMP/NOP site, you get a CALL+RET+TEST+Jcc sequence in return, which is > absolutely idiotic. > > Add to that a dynamic test of hyperv_paravisor_present, something > which is set once and never changed. > > Replace all this idiocy with a single direct function call to the > right hypercall variant. > > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> > --- > arch/x86/hyperv/hv_init.c | 21 ++++++ > arch/x86/hyperv/ivm.c | 14 ++++ > arch/x86/include/asm/mshyperv.h | 137 +++++++++++----------------------------- > arch/x86/kernel/cpu/mshyperv.c | 18 +++-- > 4 files changed, 88 insertions(+), 102 deletions(-) > > --- a/arch/x86/hyperv/hv_init.c > +++ b/arch/x86/hyperv/hv_init.c > @@ -35,7 +35,28 @@ > #include <linux/highmem.h> > > void *hv_hypercall_pg; > + > +#ifdef CONFIG_X86_64 > +u64 hv_pg_hypercall(u64 control, u64 param1, u64 param2) > +{ > + u64 hv_status; > + > + if (!hv_hypercall_pg) > + return U64_MAX; > + > + register u64 __r8 asm("r8") = param2; > + asm volatile (CALL_NOSPEC > + : "=a" (hv_status), ASM_CALL_CONSTRAINT, > + "+c" (control), "+d" (param1) > + : "r" (__r8), r8 is a call-clobbered register, so you should use "+r" (__r8) to properly clobber it: "+c" (control), "+d" (param1), "+r" (__r8) : THUNK_TARGET(hv_hypercall_pg) > + : "cc", "memory", "r9", "r10", "r11"); > + > + return hv_status; > +} > +#else > EXPORT_SYMBOL_GPL(hv_hypercall_pg); > +#endif > > union hv_ghcb * __percpu *hv_ghcb_pg; > > --- a/arch/x86/hyperv/ivm.c > +++ b/arch/x86/hyperv/ivm.c > @@ -376,6 +376,20 @@ int hv_snp_boot_ap(u32 cpu, unsigned lon > return ret; > } > > +u64 hv_snp_hypercall(u64 control, u64 param1, u64 param2) > +{ > + u64 hv_status; > + > + register u64 __r8 asm("r8") = param2; > + asm volatile("vmmcall" > + : "=a" (hv_status), ASM_CALL_CONSTRAINT, > + "+c" (control), "+d" (param1) > + : "r" (__r8) Also here: "+c" (control), "+d" (param1), "+r" (__r8) : > + : "cc", "memory", "r9", "r10", "r11"); > + > + return hv_status; > +} Uros. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 4/6] x86,hyperv: Clean up hv_do_hypercall() 2025-04-14 14:06 ` Uros Bizjak @ 2025-04-14 14:08 ` Peter Zijlstra 0 siblings, 0 replies; 32+ messages in thread From: Peter Zijlstra @ 2025-04-14 14:08 UTC (permalink / raw) To: Uros Bizjak Cc: x86, kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, hpa, jpoimboe, pawan.kumar.gupta, seanjc, pbonzini, ardb, kees, Arnd Bergmann, gregkh, linux-hyperv, linux-kernel, kvm, linux-efi, samitolvanen, ojeda On Mon, Apr 14, 2025 at 04:06:41PM +0200, Uros Bizjak wrote: > > > On 14. 04. 25 13:11, Peter Zijlstra wrote: > > What used to be a simple few instructions has turned into a giant mess > > (for x86_64). Not only does it use static_branch wrong, it mixes it > > with dynamic branches for no apparent reason. > > > > Notably it uses static_branch through an out-of-line function call, > > which completely defeats the purpose, since instead of a simple > > JMP/NOP site, you get a CALL+RET+TEST+Jcc sequence in return, which is > > absolutely idiotic. > > > > Add to that a dynamic test of hyperv_paravisor_present, something > > which is set once and never changed. > > > > Replace all this idiocy with a single direct function call to the > > right hypercall variant. > > > > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> > > --- > > arch/x86/hyperv/hv_init.c | 21 ++++++ > > arch/x86/hyperv/ivm.c | 14 ++++ > > arch/x86/include/asm/mshyperv.h | 137 +++++++++++----------------------------- > > arch/x86/kernel/cpu/mshyperv.c | 18 +++-- > > 4 files changed, 88 insertions(+), 102 deletions(-) > > > > --- a/arch/x86/hyperv/hv_init.c > > +++ b/arch/x86/hyperv/hv_init.c > > @@ -35,7 +35,28 @@ > > #include <linux/highmem.h> > > void *hv_hypercall_pg; > > + > > +#ifdef CONFIG_X86_64 > > +u64 hv_pg_hypercall(u64 control, u64 param1, u64 param2) > > +{ > > + u64 hv_status; > > + > > + if (!hv_hypercall_pg) > > + return U64_MAX; > > + > > + register u64 __r8 asm("r8") = param2; > > + asm volatile (CALL_NOSPEC > > + : "=a" (hv_status), ASM_CALL_CONSTRAINT, > > + "+c" (control), "+d" (param1) > > + : "r" (__r8), > > r8 is call-clobbered register, so you should use "+r" (__r8) to properly > clobber it: Ah, okay. ^ permalink raw reply [flat|nested] 32+ messages in thread
* RE: [PATCH 4/6] x86,hyperv: Clean up hv_do_hypercall() 2025-04-14 11:11 ` [PATCH 4/6] x86,hyperv: Clean up hv_do_hypercall() Peter Zijlstra 2025-04-14 11:47 ` Peter Zijlstra 2025-04-14 14:06 ` Uros Bizjak @ 2025-04-21 18:27 ` Michael Kelley 2025-04-25 13:50 ` Peter Zijlstra 2025-04-29 15:18 ` Peter Zijlstra 2 siblings, 2 replies; 32+ messages in thread From: Michael Kelley @ 2025-04-21 18:27 UTC (permalink / raw) To: Peter Zijlstra, x86@kernel.org Cc: kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org, decui@microsoft.com, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, hpa@zytor.com, jpoimboe@kernel.org, pawan.kumar.gupta@linux.intel.com, seanjc@google.com, pbonzini@redhat.com, ardb@kernel.org, kees@kernel.org, Arnd Bergmann, gregkh@linuxfoundation.org, linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, linux-efi@vger.kernel.org, samitolvanen@google.com, ojeda@kernel.org From: Peter Zijlstra <peterz@infradead.org> Sent: Monday, April 14, 2025 4:12 AM > > What used to be a simple few instructions has turned into a giant mess > (for x86_64). Not only does it use static_branch wrong, it mixes it > with dynamic branches for no apparent reason. > > Notably it uses static_branch through an out-of-line function call, > which completely defeats the purpose, since instead of a simple > JMP/NOP site, you get a CALL+RET+TEST+Jcc sequence in return, which is > absolutely idiotic. > > Add to that a dynamic test of hyperv_paravisor_present, something > which is set once and never changed. > > Replace all this idiocy with a single direct function call to the > right hypercall variant. This did indeed need cleaning after all the CoCo VM and paravisor stuff got added. Thanks for doing it. From looking at the code changes, I believe the 32-bit hypercall paths are unchanged, as they weren't affected by the CoCo VM and paravisor additions. Perhaps explicitly state that intent in the commit message. I've tested this patch set against linux-next-20250411 on normal Hyper-V guests. Basic smoke tests pass, along with taking a panic, and suspend/resume for guest hibernation. But getting into kdump after a panic does not work. See comments in Patch 5 for the likely reason why. I've also tested SNP and TDX VMs with a paravisor, and basic smoke tests pass. But I'm testing in the Azure cloud, and I don't have access to an environment where I can test without a paravisor. So my testing doesn't cover the SNP and TDX specific static call paths. Maybe someone at Microsoft can test that configuration. > > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> > --- > arch/x86/hyperv/hv_init.c | 21 ++++++ > arch/x86/hyperv/ivm.c | 14 ++++ > arch/x86/include/asm/mshyperv.h | 137 +++++++++++----------------------------- > arch/x86/kernel/cpu/mshyperv.c | 18 +++-- > 4 files changed, 88 insertions(+), 102 deletions(-) > > --- a/arch/x86/hyperv/hv_init.c > +++ b/arch/x86/hyperv/hv_init.c > @@ -35,7 +35,28 @@ > #include <linux/highmem.h> > > void *hv_hypercall_pg; > + > +#ifdef CONFIG_X86_64 > +u64 hv_pg_hypercall(u64 control, u64 param1, u64 param2) Could this get a different name so we don't have the confusion of hv_hypercall_pg vs hv_pg_hypercall? 
Some possibilities: hv_std_hypercall hv_basic_hypercall hv_core_hypercall hv_normal_hypercall hv_simple_hypercall > +{ > + u64 hv_status; > + > + if (!hv_hypercall_pg) > + return U64_MAX; > + > + register u64 __r8 asm("r8") = param2; > + asm volatile (CALL_NOSPEC > + : "=a" (hv_status), ASM_CALL_CONSTRAINT, > + "+c" (control), "+d" (param1) > + : "r" (__r8), > + THUNK_TARGET(hv_hypercall_pg) > + : "cc", "memory", "r9", "r10", "r11"); > + > + return hv_status; > +} > +#else > EXPORT_SYMBOL_GPL(hv_hypercall_pg); > +#endif > > union hv_ghcb * __percpu *hv_ghcb_pg; > > --- a/arch/x86/hyperv/ivm.c > +++ b/arch/x86/hyperv/ivm.c > @@ -376,6 +376,20 @@ int hv_snp_boot_ap(u32 cpu, unsigned lon > return ret; > } > > +u64 hv_snp_hypercall(u64 control, u64 param1, u64 param2) > +{ > + u64 hv_status; > + > + register u64 __r8 asm("r8") = param2; > + asm volatile("vmmcall" > + : "=a" (hv_status), ASM_CALL_CONSTRAINT, > + "+c" (control), "+d" (param1) > + : "r" (__r8) > + : "cc", "memory", "r9", "r10", "r11"); > + > + return hv_status; > +} > + > #else > static inline void hv_ghcb_msr_write(u64 msr, u64 value) {} > static inline void hv_ghcb_msr_read(u64 msr, u64 *value) {} > --- a/arch/x86/include/asm/mshyperv.h > +++ b/arch/x86/include/asm/mshyperv.h > @@ -6,6 +6,7 @@ > #include <linux/nmi.h> > #include <linux/msi.h> > #include <linux/io.h> > +#include <linux/static_call.h> > #include <asm/nospec-branch.h> > #include <asm/paravirt.h> > #include <hyperv/hvhdk.h> > @@ -39,15 +40,20 @@ static inline unsigned char hv_get_nmi_r > } > > #if IS_ENABLED(CONFIG_HYPERV) > -extern bool hyperv_paravisor_present; > - > extern void *hv_hypercall_pg; > > extern union hv_ghcb * __percpu *hv_ghcb_pg; > > bool hv_isolation_type_snp(void); > bool hv_isolation_type_tdx(void); > -u64 hv_tdx_hypercall(u64 control, u64 param1, u64 param2); > + > +#ifdef CONFIG_X86_64 > +extern u64 hv_tdx_hypercall(u64 control, u64 param1, u64 param2); > +extern u64 hv_snp_hypercall(u64 control, u64 param1, u64 param2); > +extern u64 hv_pg_hypercall(u64 control, u64 param1, u64 param2); > + > +DECLARE_STATIC_CALL(hv_hypercall, hv_pg_hypercall); > +#endif > > /* > * DEFAULT INIT GPAT and SEGMENT LIMIT value in struct VMSA > @@ -64,37 +70,15 @@ static inline u64 hv_do_hypercall(u64 co > { > u64 input_address = input ? virt_to_phys(input) : 0; > u64 output_address = output ? 
virt_to_phys(output) : 0; > - u64 hv_status; > > #ifdef CONFIG_X86_64 > - if (hv_isolation_type_tdx() && !hyperv_paravisor_present) > - return hv_tdx_hypercall(control, input_address, output_address); > - > - if (hv_isolation_type_snp() && !hyperv_paravisor_present) { > - __asm__ __volatile__("mov %[output_address], %%r8\n" > - "vmmcall" > - : "=a" (hv_status), ASM_CALL_CONSTRAINT, > - "+c" (control), "+d" (input_address) > - : [output_address] "r" (output_address) > - : "cc", "memory", "r8", "r9", "r10", "r11"); > - return hv_status; > - } > - > - if (!hv_hypercall_pg) > - return U64_MAX; > - > - __asm__ __volatile__("mov %[output_address], %%r8\n" > - CALL_NOSPEC > - : "=a" (hv_status), ASM_CALL_CONSTRAINT, > - "+c" (control), "+d" (input_address) > - : [output_address] "r" (output_address), > - THUNK_TARGET(hv_hypercall_pg) > - : "cc", "memory", "r8", "r9", "r10", "r11"); > + return static_call_mod(hv_hypercall)(control, input_address, output_address); > #else > u32 input_address_hi = upper_32_bits(input_address); > u32 input_address_lo = lower_32_bits(input_address); > u32 output_address_hi = upper_32_bits(output_address); > u32 output_address_lo = lower_32_bits(output_address); > + u64 hv_status; > > if (!hv_hypercall_pg) > return U64_MAX; > @@ -107,8 +91,8 @@ static inline u64 hv_do_hypercall(u64 co > "D"(output_address_hi), "S"(output_address_lo), > THUNK_TARGET(hv_hypercall_pg) > : "cc", "memory"); > -#endif /* !x86_64 */ > return hv_status; > +#endif /* !x86_64 */ > } > > /* Hypercall to the L0 hypervisor */ > @@ -120,41 +104,23 @@ static inline u64 hv_do_nested_hypercall > /* Fast hypercall with 8 bytes of input and no output */ > static inline u64 _hv_do_fast_hypercall8(u64 control, u64 input1) > { > - u64 hv_status; > - > #ifdef CONFIG_X86_64 > - if (hv_isolation_type_tdx() && !hyperv_paravisor_present) > - return hv_tdx_hypercall(control, input1, 0); > - > - if (hv_isolation_type_snp() && !hyperv_paravisor_present) { > - __asm__ __volatile__( > - "vmmcall" > - : "=a" (hv_status), ASM_CALL_CONSTRAINT, > - "+c" (control), "+d" (input1) > - :: "cc", "r8", "r9", "r10", "r11"); > - } else { > - __asm__ __volatile__(CALL_NOSPEC > - : "=a" (hv_status), ASM_CALL_CONSTRAINT, > - "+c" (control), "+d" (input1) > - : THUNK_TARGET(hv_hypercall_pg) > - : "cc", "r8", "r9", "r10", "r11"); > - } > + return static_call_mod(hv_hypercall)(control, input1, 0); > #else > - { > - u32 input1_hi = upper_32_bits(input1); > - u32 input1_lo = lower_32_bits(input1); > - > - __asm__ __volatile__ (CALL_NOSPEC > - : "=A"(hv_status), > - "+c"(input1_lo), > - ASM_CALL_CONSTRAINT > - : "A" (control), > - "b" (input1_hi), > - THUNK_TARGET(hv_hypercall_pg) > - : "cc", "edi", "esi"); > - } > -#endif > + u32 input1_hi = upper_32_bits(input1); > + u32 input1_lo = lower_32_bits(input1); > + u64 hv_status; > + > + __asm__ __volatile__ (CALL_NOSPEC > + : "=A"(hv_status), > + "+c"(input1_lo), > + ASM_CALL_CONSTRAINT > + : "A" (control), > + "b" (input1_hi), > + THUNK_TARGET(hv_hypercall_pg) > + : "cc", "edi", "esi"); > return hv_status; > +#endif > } > > static inline u64 hv_do_fast_hypercall8(u16 code, u64 input1) > @@ -174,45 +140,24 @@ static inline u64 hv_do_fast_nested_hype > /* Fast hypercall with 16 bytes of input */ > static inline u64 _hv_do_fast_hypercall16(u64 control, u64 input1, u64 input2) > { > - u64 hv_status; > - > #ifdef CONFIG_X86_64 > - if (hv_isolation_type_tdx() && !hyperv_paravisor_present) > - return hv_tdx_hypercall(control, input1, input2); > - > - if (hv_isolation_type_snp() && 
!hyperv_paravisor_present) { > - __asm__ __volatile__("mov %[input2], %%r8\n" > - "vmmcall" > - : "=a" (hv_status), ASM_CALL_CONSTRAINT, > - "+c" (control), "+d" (input1) > - : [input2] "r" (input2) > - : "cc", "r8", "r9", "r10", "r11"); > - } else { > - __asm__ __volatile__("mov %[input2], %%r8\n" > - CALL_NOSPEC > - : "=a" (hv_status), ASM_CALL_CONSTRAINT, > - "+c" (control), "+d" (input1) > - : [input2] "r" (input2), > - THUNK_TARGET(hv_hypercall_pg) > - : "cc", "r8", "r9", "r10", "r11"); > - } > + return static_call_mod(hv_hypercall)(control, input1, input2); > #else > - { > - u32 input1_hi = upper_32_bits(input1); > - u32 input1_lo = lower_32_bits(input1); > - u32 input2_hi = upper_32_bits(input2); > - u32 input2_lo = lower_32_bits(input2); > - > - __asm__ __volatile__ (CALL_NOSPEC > - : "=A"(hv_status), > - "+c"(input1_lo), ASM_CALL_CONSTRAINT > - : "A" (control), "b" (input1_hi), > - "D"(input2_hi), "S"(input2_lo), > - THUNK_TARGET(hv_hypercall_pg) > - : "cc"); > - } > -#endif > + u32 input1_hi = upper_32_bits(input1); > + u32 input1_lo = lower_32_bits(input1); > + u32 input2_hi = upper_32_bits(input2); > + u32 input2_lo = lower_32_bits(input2); > + u64 hv_status; > + > + __asm__ __volatile__ (CALL_NOSPEC > + : "=A"(hv_status), > + "+c"(input1_lo), ASM_CALL_CONSTRAINT > + : "A" (control), "b" (input1_hi), > + "D"(input2_hi), "S"(input2_lo), > + THUNK_TARGET(hv_hypercall_pg) > + : "cc"); > return hv_status; > +#endif > } > > static inline u64 hv_do_fast_hypercall16(u16 code, u64 input1, u64 input2) > --- a/arch/x86/kernel/cpu/mshyperv.c > +++ b/arch/x86/kernel/cpu/mshyperv.c > @@ -37,10 +37,6 @@ > bool hv_nested; > struct ms_hyperv_info ms_hyperv; > > -/* Used in modules via hv_do_hypercall(): see arch/x86/include/asm/mshyperv.h */ > -bool hyperv_paravisor_present __ro_after_init; > -EXPORT_SYMBOL_GPL(hyperv_paravisor_present); > - > #if IS_ENABLED(CONFIG_HYPERV) > static inline unsigned int hv_get_nested_msr(unsigned int reg) > { > @@ -287,6 +283,11 @@ static void __init x86_setup_ops_for_tsc > old_restore_sched_clock_state = x86_platform.restore_sched_clock_state; > x86_platform.restore_sched_clock_state = hv_restore_sched_clock_state; > } > + > +#ifdef CONFIG_X86_64 > +DEFINE_STATIC_CALL(hv_hypercall, hv_pg_hypercall); > +EXPORT_STATIC_CALL_TRAMP_GPL(hv_hypercall); > +#endif > #endif /* CONFIG_HYPERV */ > > static uint32_t __init ms_hyperv_platform(void) > @@ -483,14 +484,16 @@ static void __init ms_hyperv_init_platfo > ms_hyperv.shared_gpa_boundary = > BIT_ULL(ms_hyperv.shared_gpa_boundary_bits); > > - hyperv_paravisor_present = !!ms_hyperv.paravisor_present; > - > pr_info("Hyper-V: Isolation Config: Group A 0x%x, Group B 0x%x\n", > ms_hyperv.isolation_config_a, ms_hyperv.isolation_config_b); > > > if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP) { > static_branch_enable(&isolation_type_snp); > +#if defined(CONFIG_AMD_MEM_ENCRYPT) && defined(CONFIG_HYPERV) > + if (!ms_hyperv.paravisor_present) > + static_call_update(hv_hypercall, hv_snp_hypercall); > +#endif This #ifdef (and one below for TDX) are really ugly. They could be avoided by adding stubs for hv_snp_hypercall() and hv_tdx_hypercall(), and making the hv_hypercall static call exist even when !CONFIG_HYPERV (and for 32-bit builds). Or is there a reason to not do that? 
> } else if (hv_get_isolation_type() == HV_ISOLATION_TYPE_TDX) { > static_branch_enable(&isolation_type_tdx); > > @@ -498,6 +501,9 @@ static void __init ms_hyperv_init_platfo > ms_hyperv.hints &= ~HV_X64_APIC_ACCESS_RECOMMENDED; > > if (!ms_hyperv.paravisor_present) { > +#if defined(CONFIG_INTEL_TDX_GUEST) && defined(CONFIG_HYPERV) > + static_call_update(hv_hypercall, hv_tdx_hypercall); > +#endif > /* > * Mark the Hyper-V TSC page feature as disabled > * in a TDX VM without paravisor so that the > > ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 4/6] x86,hyperv: Clean up hv_do_hypercall() 2025-04-21 18:27 ` Michael Kelley @ 2025-04-25 13:50 ` Peter Zijlstra 0 siblings, 0 replies; 32+ messages in thread From: Peter Zijlstra @ 2025-04-25 13:50 UTC (permalink / raw) To: Michael Kelley Cc: x86@kernel.org, kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org, decui@microsoft.com, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, hpa@zytor.com, jpoimboe@kernel.org, pawan.kumar.gupta@linux.intel.com, seanjc@google.com, pbonzini@redhat.com, ardb@kernel.org, kees@kernel.org, Arnd Bergmann, gregkh@linuxfoundation.org, linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, linux-efi@vger.kernel.org, samitolvanen@google.com, ojeda@kernel.org On Mon, Apr 21, 2025 at 06:27:57PM +0000, Michael Kelley wrote: > From: Peter Zijlstra <peterz@infradead.org> Sent: Monday, April 14, 2025 4:12 AM > > > > What used to be a simple few instructions has turned into a giant mess > > (for x86_64). Not only does it use static_branch wrong, it mixes it > > with dynamic branches for no apparent reason. > > > > Notably it uses static_branch through an out-of-line function call, > > which completely defeats the purpose, since instead of a simple > > JMP/NOP site, you get a CALL+RET+TEST+Jcc sequence in return, which is > > absolutely idiotic. > > > > Add to that a dynamic test of hyperv_paravisor_present, something > > which is set once and never changed. > > > > Replace all this idiocy with a single direct function call to the > > right hypercall variant. > > This did indeed need cleaning after all the CoCo VM and paravisor > stuff got added. Thanks for doing it. > > From looking at the code changes, I believe the 32-bit hypercall paths > are unchanged, as they weren't affected by the CoCo VM and paravisor > additions. Perhaps explicitly state that intent in the commit message. > > I've tested this patch set against linux-next-20250411 on normal Hyper-V > guests. Basic smoke tests pass, along with taking a panic, and > suspend/resume for guest hibernation. But getting into kdump after a > panic does not work. See comments in Patch 5 for the likely reason why. > > I've also tested SNP and TDX VMs with a paravisor, and basic smoke > tests pass. But I'm testing in the Azure cloud, and I don't have access to an > environment where I can test without a paravisor. So my testing doesn't > cover the SNP and TDX specific static call paths. Maybe someone at > Microsoft can test that configuration. Excellent, thanks! > > +#ifdef CONFIG_X86_64 > > +u64 hv_pg_hypercall(u64 control, u64 param1, u64 param2) > > Could this get a different name so we don't have the confusion of > hv_hypercall_pg vs hv_pg_hypercall? 
Some possibilities: > > hv_std_hypercall > hv_basic_hypercall > hv_core_hypercall > hv_normal_hypercall > hv_simple_hypercall Sure, I'll throw a dice and pick one ;-) > > @@ -483,14 +484,16 @@ static void __init ms_hyperv_init_platfo > > ms_hyperv.shared_gpa_boundary = > > BIT_ULL(ms_hyperv.shared_gpa_boundary_bits); > > > > - hyperv_paravisor_present = !!ms_hyperv.paravisor_present; > > - > > pr_info("Hyper-V: Isolation Config: Group A 0x%x, Group B 0x%x\n", > > ms_hyperv.isolation_config_a, ms_hyperv.isolation_config_b); > > > > > > if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP) { > > static_branch_enable(&isolation_type_snp); > > +#if defined(CONFIG_AMD_MEM_ENCRYPT) && defined(CONFIG_HYPERV) > > + if (!ms_hyperv.paravisor_present) > > + static_call_update(hv_hypercall, hv_snp_hypercall); > > +#endif > > This #ifdef (and one below for TDX) are really ugly. They could be avoided by adding > stubs for hv_snp_hypercall() and hv_tdx_hypercall(), and making the hv_hypercall static > call exist even when !CONFIG_HYPERV (and for 32-bit builds). Or is there a reason to > not do that? I'll try and make it so. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 4/6] x86,hyperv: Clean up hv_do_hypercall() 2025-04-21 18:27 ` Michael Kelley 2025-04-25 13:50 ` Peter Zijlstra @ 2025-04-29 15:18 ` Peter Zijlstra 2025-04-29 20:36 ` Michael Kelley 1 sibling, 1 reply; 32+ messages in thread From: Peter Zijlstra @ 2025-04-29 15:18 UTC (permalink / raw) To: Michael Kelley Cc: x86@kernel.org, kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org, decui@microsoft.com, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, hpa@zytor.com, jpoimboe@kernel.org, pawan.kumar.gupta@linux.intel.com, seanjc@google.com, pbonzini@redhat.com, ardb@kernel.org, kees@kernel.org, Arnd Bergmann, gregkh@linuxfoundation.org, linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, linux-efi@vger.kernel.org, samitolvanen@google.com, ojeda@kernel.org On Mon, Apr 21, 2025 at 06:27:57PM +0000, Michael Kelley wrote: > > @@ -483,14 +484,16 @@ static void __init ms_hyperv_init_platfo > > ms_hyperv.shared_gpa_boundary = > > BIT_ULL(ms_hyperv.shared_gpa_boundary_bits); > > > > - hyperv_paravisor_present = !!ms_hyperv.paravisor_present; > > - > > pr_info("Hyper-V: Isolation Config: Group A 0x%x, Group B 0x%x\n", > > ms_hyperv.isolation_config_a, ms_hyperv.isolation_config_b); > > > > > > if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP) { > > static_branch_enable(&isolation_type_snp); > > +#if defined(CONFIG_AMD_MEM_ENCRYPT) && defined(CONFIG_HYPERV) > > + if (!ms_hyperv.paravisor_present) > > + static_call_update(hv_hypercall, hv_snp_hypercall); > > +#endif > > This #ifdef (and one below for TDX) are really ugly. They could be avoided by adding > stubs for hv_snp_hypercall() and hv_tdx_hypercall(), and making the hv_hypercall static > call exist even when !CONFIG_HYPERV (and for 32-bit builds). Or is there a reason to > not do that? > > > } else if (hv_get_isolation_type() == HV_ISOLATION_TYPE_TDX) { > > static_branch_enable(&isolation_type_tdx); > > > > @@ -498,6 +501,9 @@ static void __init ms_hyperv_init_platfo > > ms_hyperv.hints &= ~HV_X64_APIC_ACCESS_RECOMMENDED; > > > > if (!ms_hyperv.paravisor_present) { > > +#if defined(CONFIG_INTEL_TDX_GUEST) && defined(CONFIG_HYPERV) > > + static_call_update(hv_hypercall, hv_tdx_hypercall); > > +#endif > > /* > > * Mark the Hyper-V TSC page feature as disabled > > * in a TDX VM without paravisor so that the > > > > I've ended up with the below.. I thought it a waste to make all that stuff available to 32bit and !HYPERV. 
--- a/arch/x86/hyperv/ivm.c +++ b/arch/x86/hyperv/ivm.c @@ -392,6 +392,7 @@ u64 hv_snp_hypercall(u64 control, u64 pa #else static inline void hv_ghcb_msr_write(u64 msr, u64 value) {} static inline void hv_ghcb_msr_read(u64 msr, u64 *value) {} +u64 hv_snp_hypercall(u64 control, u64 param1, u64 param2) { return U64_MAX; } #endif /* CONFIG_AMD_MEM_ENCRYPT */ #ifdef CONFIG_INTEL_TDX_GUEST @@ -441,6 +442,7 @@ u64 hv_tdx_hypercall(u64 control, u64 pa #else static inline void hv_tdx_msr_write(u64 msr, u64 value) {} static inline void hv_tdx_msr_read(u64 msr, u64 *value) {} +u64 hv_tdx_hypercall(u64 control, u64 param1, u64 param2) { return U64_MAX; } #endif /* CONFIG_INTEL_TDX_GUEST */ #if defined(CONFIG_AMD_MEM_ENCRYPT) || defined(CONFIG_INTEL_TDX_GUEST) --- a/arch/x86/include/asm/mshyperv.h +++ b/arch/x86/include/asm/mshyperv.h @@ -39,6 +39,10 @@ static inline unsigned char hv_get_nmi_r return 0; } +extern u64 hv_tdx_hypercall(u64 control, u64 param1, u64 param2); +extern u64 hv_snp_hypercall(u64 control, u64 param1, u64 param2); +extern u64 hv_std_hypercall(u64 control, u64 param1, u64 param2); + #if IS_ENABLED(CONFIG_HYPERV) extern void *hv_hypercall_pg; @@ -48,10 +52,6 @@ bool hv_isolation_type_snp(void); bool hv_isolation_type_tdx(void); #ifdef CONFIG_X86_64 -extern u64 hv_tdx_hypercall(u64 control, u64 param1, u64 param2); -extern u64 hv_snp_hypercall(u64 control, u64 param1, u64 param2); -extern u64 hv_std_hypercall(u64 control, u64 param1, u64 param2); - DECLARE_STATIC_CALL(hv_hypercall, hv_std_hypercall); #endif --- a/arch/x86/kernel/cpu/mshyperv.c +++ b/arch/x86/kernel/cpu/mshyperv.c @@ -287,9 +287,14 @@ static void __init x86_setup_ops_for_tsc #ifdef CONFIG_X86_64 DEFINE_STATIC_CALL(hv_hypercall, hv_std_hypercall); EXPORT_STATIC_CALL_TRAMP_GPL(hv_hypercall); +#define hypercall_update(hc) static_call_update(hv_hypercall, hc) #endif #endif /* CONFIG_HYPERV */ +#ifndef hypercall_update +#define hypercall_update(hc) (void)hc +#endif + static uint32_t __init ms_hyperv_platform(void) { u32 eax; @@ -490,10 +495,8 @@ static void __init ms_hyperv_init_platfo if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP) { static_branch_enable(&isolation_type_snp); -#if defined(CONFIG_AMD_MEM_ENCRYPT) && defined(CONFIG_HYPERV) if (!ms_hyperv.paravisor_present) - static_call_update(hv_hypercall, hv_snp_hypercall); -#endif + hypercall_update(hv_snp_hypercall); } else if (hv_get_isolation_type() == HV_ISOLATION_TYPE_TDX) { static_branch_enable(&isolation_type_tdx); @@ -501,9 +504,7 @@ static void __init ms_hyperv_init_platfo ms_hyperv.hints &= ~HV_X64_APIC_ACCESS_RECOMMENDED; if (!ms_hyperv.paravisor_present) { -#if defined(CONFIG_INTEL_TDX_GUEST) && defined(CONFIG_HYPERV) - static_call_update(hv_hypercall, hv_tdx_hypercall); -#endif + hypercall_update(hv_tdx_hypercall); /* * Mark the Hyper-V TSC page feature as disabled * in a TDX VM without paravisor so that the ^ permalink raw reply [flat|nested] 32+ messages in thread
* RE: [PATCH 4/6] x86,hyperv: Clean up hv_do_hypercall() 2025-04-29 15:18 ` Peter Zijlstra @ 2025-04-29 20:36 ` Michael Kelley 0 siblings, 0 replies; 32+ messages in thread From: Michael Kelley @ 2025-04-29 20:36 UTC (permalink / raw) To: Peter Zijlstra Cc: x86@kernel.org, kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org, decui@microsoft.com, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, hpa@zytor.com, jpoimboe@kernel.org, pawan.kumar.gupta@linux.intel.com, seanjc@google.com, pbonzini@redhat.com, ardb@kernel.org, kees@kernel.org, Arnd Bergmann, gregkh@linuxfoundation.org, linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, linux-efi@vger.kernel.org, samitolvanen@google.com, ojeda@kernel.org From: Peter Zijlstra <peterz@infradead.org> Sent: Tuesday, April 29, 2025 8:18 AM > > On Mon, Apr 21, 2025 at 06:27:57PM +0000, Michael Kelley wrote: > > > > @@ -483,14 +484,16 @@ static void __init ms_hyperv_init_platfo > > > ms_hyperv.shared_gpa_boundary = > > > BIT_ULL(ms_hyperv.shared_gpa_boundary_bits); > > > > > > - hyperv_paravisor_present = !!ms_hyperv.paravisor_present; > > > - > > > pr_info("Hyper-V: Isolation Config: Group A 0x%x, Group B 0x%x\n", > > > ms_hyperv.isolation_config_a, ms_hyperv.isolation_config_b); > > > > > > > > > if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP) { > > > static_branch_enable(&isolation_type_snp); > > > +#if defined(CONFIG_AMD_MEM_ENCRYPT) && defined(CONFIG_HYPERV) > > > + if (!ms_hyperv.paravisor_present) > > > + static_call_update(hv_hypercall, hv_snp_hypercall); > > > +#endif > > > > This #ifdef (and one below for TDX) are really ugly. They could be avoided by adding > > stubs for hv_snp_hypercall() and hv_tdx_hypercall(), and making the hv_hypercall static > > call exist even when !CONFIG_HYPERV (and for 32-bit builds). Or is there a reason to > > not do that? > > > > > } else if (hv_get_isolation_type() == HV_ISOLATION_TYPE_TDX) { > > > static_branch_enable(&isolation_type_tdx); > > > > > > @@ -498,6 +501,9 @@ static void __init ms_hyperv_init_platfo > > > ms_hyperv.hints &= ~HV_X64_APIC_ACCESS_RECOMMENDED; > > > > > > if (!ms_hyperv.paravisor_present) { > > > +#if defined(CONFIG_INTEL_TDX_GUEST) && defined(CONFIG_HYPERV) > > > + static_call_update(hv_hypercall, hv_tdx_hypercall); > > > +#endif > > > /* > > > * Mark the Hyper-V TSC page feature as disabled > > > * in a TDX VM without paravisor so that the > > > > > > > > I've ended up with the below.. I thought it a waste to make all that > stuff available to 32bit and !HYPERV. 
> > > --- a/arch/x86/hyperv/ivm.c > +++ b/arch/x86/hyperv/ivm.c > @@ -392,6 +392,7 @@ u64 hv_snp_hypercall(u64 control, u64 pa > #else > static inline void hv_ghcb_msr_write(u64 msr, u64 value) {} > static inline void hv_ghcb_msr_read(u64 msr, u64 *value) {} > +u64 hv_snp_hypercall(u64 control, u64 param1, u64 param2) { return U64_MAX; } > #endif /* CONFIG_AMD_MEM_ENCRYPT */ > > #ifdef CONFIG_INTEL_TDX_GUEST > @@ -441,6 +442,7 @@ u64 hv_tdx_hypercall(u64 control, u64 pa > #else > static inline void hv_tdx_msr_write(u64 msr, u64 value) {} > static inline void hv_tdx_msr_read(u64 msr, u64 *value) {} > +u64 hv_tdx_hypercall(u64 control, u64 param1, u64 param2) { return U64_MAX; } > #endif /* CONFIG_INTEL_TDX_GUEST */ > > #if defined(CONFIG_AMD_MEM_ENCRYPT) || defined(CONFIG_INTEL_TDX_GUEST) > --- a/arch/x86/include/asm/mshyperv.h > +++ b/arch/x86/include/asm/mshyperv.h > @@ -39,6 +39,10 @@ static inline unsigned char hv_get_nmi_r > return 0; > } > > +extern u64 hv_tdx_hypercall(u64 control, u64 param1, u64 param2); > +extern u64 hv_snp_hypercall(u64 control, u64 param1, u64 param2); > +extern u64 hv_std_hypercall(u64 control, u64 param1, u64 param2); > + > #if IS_ENABLED(CONFIG_HYPERV) > extern void *hv_hypercall_pg; > > @@ -48,10 +52,6 @@ bool hv_isolation_type_snp(void); > bool hv_isolation_type_tdx(void); > > #ifdef CONFIG_X86_64 > -extern u64 hv_tdx_hypercall(u64 control, u64 param1, u64 param2); > -extern u64 hv_snp_hypercall(u64 control, u64 param1, u64 param2); > -extern u64 hv_std_hypercall(u64 control, u64 param1, u64 param2); > - > DECLARE_STATIC_CALL(hv_hypercall, hv_std_hypercall); > #endif > > --- a/arch/x86/kernel/cpu/mshyperv.c > +++ b/arch/x86/kernel/cpu/mshyperv.c > @@ -287,9 +287,14 @@ static void __init x86_setup_ops_for_tsc > #ifdef CONFIG_X86_64 > DEFINE_STATIC_CALL(hv_hypercall, hv_std_hypercall); > EXPORT_STATIC_CALL_TRAMP_GPL(hv_hypercall); > +#define hypercall_update(hc) static_call_update(hv_hypercall, hc) > #endif > #endif /* CONFIG_HYPERV */ > > +#ifndef hypercall_update > +#define hypercall_update(hc) (void)hc > +#endif > + > static uint32_t __init ms_hyperv_platform(void) > { > u32 eax; > @@ -490,10 +495,8 @@ static void __init ms_hyperv_init_platfo > > if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP) { > static_branch_enable(&isolation_type_snp); > -#if defined(CONFIG_AMD_MEM_ENCRYPT) && defined(CONFIG_HYPERV) > if (!ms_hyperv.paravisor_present) > - static_call_update(hv_hypercall, hv_snp_hypercall); > -#endif > + hypercall_update(hv_snp_hypercall); > } else if (hv_get_isolation_type() == HV_ISOLATION_TYPE_TDX) { > static_branch_enable(&isolation_type_tdx); > > @@ -501,9 +504,7 @@ static void __init ms_hyperv_init_platfo > ms_hyperv.hints &= ~HV_X64_APIC_ACCESS_RECOMMENDED; > > if (!ms_hyperv.paravisor_present) { > -#if defined(CONFIG_INTEL_TDX_GUEST) && defined(CONFIG_HYPERV) > - static_call_update(hv_hypercall, hv_tdx_hypercall); > -#endif > + hypercall_update(hv_tdx_hypercall); > /* > * Mark the Hyper-V TSC page feature as disabled > * in a TDX VM without paravisor so that the Yes, that's a reasonable improvement that I can live with. This source code file is certainly not a model for avoiding ugly #ifdef's, but your new approach avoids adding to the problem quite so egregiously. Michael ^ permalink raw reply [flat|nested] 32+ messages in thread
* [PATCH 5/6] x86_64,hyperv: Use direct call to hypercall-page 2025-04-14 11:11 [PATCH 0/6] objtool: Detect and warn about indirect calls in __nocfi functions Peter Zijlstra ` (3 preceding siblings ...) 2025-04-14 11:11 ` [PATCH 4/6] x86,hyperv: Clean up hv_do_hypercall() Peter Zijlstra @ 2025-04-14 11:11 ` Peter Zijlstra 2025-04-21 18:28 ` Michael Kelley 2025-04-14 11:11 ` [PATCH 6/6] objtool: Validate kCFI calls Peter Zijlstra 5 siblings, 1 reply; 32+ messages in thread From: Peter Zijlstra @ 2025-04-14 11:11 UTC (permalink / raw) To: x86 Cc: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, hpa, peterz, jpoimboe, pawan.kumar.gupta, seanjc, pbonzini, ardb, kees, Arnd Bergmann, gregkh, linux-hyperv, linux-kernel, kvm, linux-efi, samitolvanen, ojeda Instead of using an indirect call to the hypercall page, use a direct call instead. This avoids all CFI problems, including the one where the hypercall page doesn't have IBT on. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> --- arch/x86/hyperv/hv_init.c | 62 +++++++++++++++++++++++----------------------- 1 file changed, 31 insertions(+), 31 deletions(-) --- a/arch/x86/hyperv/hv_init.c +++ b/arch/x86/hyperv/hv_init.c @@ -37,24 +37,42 @@ void *hv_hypercall_pg; #ifdef CONFIG_X86_64 +static u64 __hv_hyperfail(u64 control, u64 param1, u64 param2) +{ + return U64_MAX; +} + +DEFINE_STATIC_CALL(__hv_hypercall, __hv_hyperfail); + u64 hv_pg_hypercall(u64 control, u64 param1, u64 param2) { u64 hv_status; - if (!hv_hypercall_pg) - return U64_MAX; - register u64 __r8 asm("r8") = param2; - asm volatile (CALL_NOSPEC + asm volatile ("call " STATIC_CALL_TRAMP_STR(__hv_hypercall) : "=a" (hv_status), ASM_CALL_CONSTRAINT, "+c" (control), "+d" (param1) - : "r" (__r8), - THUNK_TARGET(hv_hypercall_pg) + : "r" (__r8) : "cc", "memory", "r9", "r10", "r11"); return hv_status; } + +typedef u64 (*hv_hypercall_f)(u64 control, u64 param1, u64 param2); + +static inline void hv_set_hypercall_pg(void *ptr) +{ + hv_hypercall_pg = ptr; + + if (!ptr) + ptr = &__hv_hyperfail; + static_call_update(__hv_hypercall, (hv_hypercall_f)ptr); +} #else +static inline void hv_set_hypercall_pg(void *ptr) +{ + hv_hypercall_pg = ptr; +} EXPORT_SYMBOL_GPL(hv_hypercall_pg); #endif @@ -349,7 +367,7 @@ static int hv_suspend(void) * pointer is restored on resume. */ hv_hypercall_pg_saved = hv_hypercall_pg; - hv_hypercall_pg = NULL; + hv_set_hypercall_pg(NULL); /* Disable the hypercall page in the hypervisor */ rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64); @@ -375,7 +393,7 @@ static void hv_resume(void) vmalloc_to_pfn(hv_hypercall_pg_saved); wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64); - hv_hypercall_pg = hv_hypercall_pg_saved; + hv_set_hypercall_pg(hv_hypercall_pg_saved); hv_hypercall_pg_saved = NULL; /* @@ -529,8 +547,8 @@ void __init hyperv_init(void) if (hv_isolation_type_tdx() && !ms_hyperv.paravisor_present) goto skip_hypercall_pg_init; - hv_hypercall_pg = __vmalloc_node_range(PAGE_SIZE, 1, VMALLOC_START, - VMALLOC_END, GFP_KERNEL, PAGE_KERNEL_ROX, + hv_hypercall_pg = __vmalloc_node_range(PAGE_SIZE, 1, MODULES_VADDR, + MODULES_END, GFP_KERNEL, PAGE_KERNEL_ROX, VM_FLUSH_RESET_PERMS, NUMA_NO_NODE, __builtin_return_address(0)); if (hv_hypercall_pg == NULL) @@ -568,27 +586,9 @@ void __init hyperv_init(void) wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64); } -skip_hypercall_pg_init: - /* - * Some versions of Hyper-V that provide IBT in guest VMs have a bug - * in that there's no ENDBR64 instruction at the entry to the - * hypercall page. 
Because hypercalls are invoked via an indirect call - * to the hypercall page, all hypercall attempts fail when IBT is - * enabled, and Linux panics. For such buggy versions, disable IBT. - * - * Fixed versions of Hyper-V always provide ENDBR64 on the hypercall - * page, so if future Linux kernel versions enable IBT for 32-bit - * builds, additional hypercall page hackery will be required here - * to provide an ENDBR32. - */ -#ifdef CONFIG_X86_KERNEL_IBT - if (cpu_feature_enabled(X86_FEATURE_IBT) && - *(u32 *)hv_hypercall_pg != gen_endbr()) { - setup_clear_cpu_cap(X86_FEATURE_IBT); - pr_warn("Disabling IBT because of Hyper-V bug\n"); - } -#endif + hv_set_hypercall_pg(hv_hypercall_pg); +skip_hypercall_pg_init: /* * hyperv_init() is called before LAPIC is initialized: see * apic_intr_mode_init() -> x86_platform.apic_post_init() and @@ -658,7 +658,7 @@ void hyperv_cleanup(void) * let hypercall operations fail safely rather than * panic the kernel for using invalid hypercall page */ - hv_hypercall_pg = NULL; + hv_set_hypercall_pg(NULL); /* Reset the hypercall page */ hypercall_msr.as_uint64 = hv_get_msr(HV_X64_MSR_HYPERCALL); ^ permalink raw reply [flat|nested] 32+ messages in thread
* RE: [PATCH 5/6] x86_64,hyperv: Use direct call to hypercall-page 2025-04-14 11:11 ` [PATCH 5/6] x86_64,hyperv: Use direct call to hypercall-page Peter Zijlstra @ 2025-04-21 18:28 ` Michael Kelley 2025-04-25 14:03 ` Peter Zijlstra 0 siblings, 1 reply; 32+ messages in thread From: Michael Kelley @ 2025-04-21 18:28 UTC (permalink / raw) To: Peter Zijlstra, x86@kernel.org Cc: kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org, decui@microsoft.com, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, hpa@zytor.com, jpoimboe@kernel.org, pawan.kumar.gupta@linux.intel.com, seanjc@google.com, pbonzini@redhat.com, ardb@kernel.org, kees@kernel.org, Arnd Bergmann, gregkh@linuxfoundation.org, linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, linux-efi@vger.kernel.org, samitolvanen@google.com, ojeda@kernel.org From: Peter Zijlstra <peterz@infradead.org> Sent: Monday, April 14, 2025 4:12 AM > > Instead of using an indirect call to the hypercall page, use a direct > call instead. This avoids all CFI problems, including the one where > the hypercall page doesn't have IBT on. > > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> > --- > arch/x86/hyperv/hv_init.c | 62 +++++++++++++++++++++++----------------------- > 1 file changed, 31 insertions(+), 31 deletions(-) > > --- a/arch/x86/hyperv/hv_init.c > +++ b/arch/x86/hyperv/hv_init.c > @@ -37,24 +37,42 @@ > void *hv_hypercall_pg; > > #ifdef CONFIG_X86_64 > +static u64 __hv_hyperfail(u64 control, u64 param1, u64 param2) > +{ > + return U64_MAX; > +} > + > +DEFINE_STATIC_CALL(__hv_hypercall, __hv_hyperfail); > + > u64 hv_pg_hypercall(u64 control, u64 param1, u64 param2) > { > u64 hv_status; > > - if (!hv_hypercall_pg) > - return U64_MAX; > - > register u64 __r8 asm("r8") = param2; > - asm volatile (CALL_NOSPEC > + asm volatile ("call " STATIC_CALL_TRAMP_STR(__hv_hypercall) > : "=a" (hv_status), ASM_CALL_CONSTRAINT, > "+c" (control), "+d" (param1) > - : "r" (__r8), > - THUNK_TARGET(hv_hypercall_pg) > + : "r" (__r8) > : "cc", "memory", "r9", "r10", "r11"); > > return hv_status; > } > + > +typedef u64 (*hv_hypercall_f)(u64 control, u64 param1, u64 param2); > + > +static inline void hv_set_hypercall_pg(void *ptr) > +{ > + hv_hypercall_pg = ptr; > + > + if (!ptr) > + ptr = &__hv_hyperfail; > + static_call_update(__hv_hypercall, (hv_hypercall_f)ptr); > +} > #else > +static inline void hv_set_hypercall_pg(void *ptr) > +{ > + hv_hypercall_pg = ptr; > +} > EXPORT_SYMBOL_GPL(hv_hypercall_pg); > #endif > > @@ -349,7 +367,7 @@ static int hv_suspend(void) > * pointer is restored on resume. 
> */ > hv_hypercall_pg_saved = hv_hypercall_pg; > - hv_hypercall_pg = NULL; > + hv_set_hypercall_pg(NULL); > > /* Disable the hypercall page in the hypervisor */ > rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64); > @@ -375,7 +393,7 @@ static void hv_resume(void) > vmalloc_to_pfn(hv_hypercall_pg_saved); > wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64); > > - hv_hypercall_pg = hv_hypercall_pg_saved; > + hv_set_hypercall_pg(hv_hypercall_pg_saved); > hv_hypercall_pg_saved = NULL; > > /* > @@ -529,8 +547,8 @@ void __init hyperv_init(void) > if (hv_isolation_type_tdx() && !ms_hyperv.paravisor_present) > goto skip_hypercall_pg_init; > > - hv_hypercall_pg = __vmalloc_node_range(PAGE_SIZE, 1, VMALLOC_START, > - VMALLOC_END, GFP_KERNEL, PAGE_KERNEL_ROX, > + hv_hypercall_pg = __vmalloc_node_range(PAGE_SIZE, 1, MODULES_VADDR, > + MODULES_END, GFP_KERNEL, PAGE_KERNEL_ROX, > VM_FLUSH_RESET_PERMS, NUMA_NO_NODE, > __builtin_return_address(0)); > if (hv_hypercall_pg == NULL) > @@ -568,27 +586,9 @@ void __init hyperv_init(void) > wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64); > } > > -skip_hypercall_pg_init: > - /* > - * Some versions of Hyper-V that provide IBT in guest VMs have a bug > - * in that there's no ENDBR64 instruction at the entry to the > - * hypercall page. Because hypercalls are invoked via an indirect call > - * to the hypercall page, all hypercall attempts fail when IBT is > - * enabled, and Linux panics. For such buggy versions, disable IBT. > - * > - * Fixed versions of Hyper-V always provide ENDBR64 on the hypercall > - * page, so if future Linux kernel versions enable IBT for 32-bit > - * builds, additional hypercall page hackery will be required here > - * to provide an ENDBR32. > - */ > -#ifdef CONFIG_X86_KERNEL_IBT > - if (cpu_feature_enabled(X86_FEATURE_IBT) && > - *(u32 *)hv_hypercall_pg != gen_endbr()) { > - setup_clear_cpu_cap(X86_FEATURE_IBT); > - pr_warn("Disabling IBT because of Hyper-V bug\n"); > - } > -#endif With this patch set, it's nice to see IBT working in a Hyper-V guest! I had previously tested IBT with some hackery to the hypercall page to add the missing ENDBR64, and didn't see any problems. Same after these changes -- no complaints from IBT. > + hv_set_hypercall_pg(hv_hypercall_pg); > > +skip_hypercall_pg_init: > /* > * hyperv_init() is called before LAPIC is initialized: see > * apic_intr_mode_init() -> x86_platform.apic_post_init() and > @@ -658,7 +658,7 @@ void hyperv_cleanup(void) > * let hypercall operations fail safely rather than > * panic the kernel for using invalid hypercall page > */ > - hv_hypercall_pg = NULL; > + hv_set_hypercall_pg(NULL); This causes a hang getting into the kdump kernel after a panic. hyperv_cleanup() is called after native_machine_crash_shutdown() has done crash_smp_send_stop() on all the other CPUs. I don't know the details of how static_call_update() works, but it's easy to imagine that it wouldn't work when the kernel is in such a state. The original code setting hv_hypercall_pg to NULL is just tidiness. Other CPUs are stopped and can't be making hypercalls, and this CPU shouldn't be making hypercalls either, so setting it to NULL more cleanly catches some erroneous hypercall (vs. accessing the hypercall page after Hyper-V has been told to reset it). If I remove this change and let hv_hypercall_pg be set to NULL without doing static_call_update(), then we correctly get to the kdump kernel and everything works. 
Michael > > /* Reset the hypercall page */ > hypercall_msr.as_uint64 = hv_get_msr(HV_X64_MSR_HYPERCALL); > > ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 5/6] x86_64,hyperv: Use direct call to hypercall-page 2025-04-21 18:28 ` Michael Kelley @ 2025-04-25 14:03 ` Peter Zijlstra 2025-04-25 14:32 ` Michael Kelley 0 siblings, 1 reply; 32+ messages in thread From: Peter Zijlstra @ 2025-04-25 14:03 UTC (permalink / raw) To: Michael Kelley Cc: x86@kernel.org, kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org, decui@microsoft.com, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, hpa@zytor.com, jpoimboe@kernel.org, pawan.kumar.gupta@linux.intel.com, seanjc@google.com, pbonzini@redhat.com, ardb@kernel.org, kees@kernel.org, Arnd Bergmann, gregkh@linuxfoundation.org, linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, linux-efi@vger.kernel.org, samitolvanen@google.com, ojeda@kernel.org On Mon, Apr 21, 2025 at 06:28:42PM +0000, Michael Kelley wrote: > > #ifdef CONFIG_X86_64 > > +static u64 __hv_hyperfail(u64 control, u64 param1, u64 param2) > > +{ > > + return U64_MAX; > > +} > > + > > +DEFINE_STATIC_CALL(__hv_hypercall, __hv_hyperfail); > > + > > u64 hv_pg_hypercall(u64 control, u64 param1, u64 param2) > > { > > u64 hv_status; > > > > + asm volatile ("call " STATIC_CALL_TRAMP_STR(__hv_hypercall) > > : "=a" (hv_status), ASM_CALL_CONSTRAINT, > > "+c" (control), "+d" (param1) > > + : "r" (__r8) > > : "cc", "memory", "r9", "r10", "r11"); > > > > return hv_status; > > } > > + > > +typedef u64 (*hv_hypercall_f)(u64 control, u64 param1, u64 param2); > > + > > +static inline void hv_set_hypercall_pg(void *ptr) > > +{ > > + hv_hypercall_pg = ptr; > > + > > + if (!ptr) > > + ptr = &__hv_hyperfail; > > + static_call_update(__hv_hypercall, (hv_hypercall_f)ptr); > > +} ^ kept for reference, as I try and explain how static_call() works below. > > -skip_hypercall_pg_init: > > - /* > > - * Some versions of Hyper-V that provide IBT in guest VMs have a bug > > - * in that there's no ENDBR64 instruction at the entry to the > > - * hypercall page. Because hypercalls are invoked via an indirect call > > - * to the hypercall page, all hypercall attempts fail when IBT is > > - * enabled, and Linux panics. For such buggy versions, disable IBT. > > - * > > - * Fixed versions of Hyper-V always provide ENDBR64 on the hypercall > > - * page, so if future Linux kernel versions enable IBT for 32-bit > > - * builds, additional hypercall page hackery will be required here > > - * to provide an ENDBR32. > > - */ > > -#ifdef CONFIG_X86_KERNEL_IBT > > - if (cpu_feature_enabled(X86_FEATURE_IBT) && > > - *(u32 *)hv_hypercall_pg != gen_endbr()) { > > - setup_clear_cpu_cap(X86_FEATURE_IBT); > > - pr_warn("Disabling IBT because of Hyper-V bug\n"); > > - } > > -#endif > > With this patch set, it's nice to see IBT working in a Hyper-V guest! > I had previously tested IBT with some hackery to the hypercall page > to add the missing ENDBR64, and didn't see any problems. Same > after these changes -- no complaints from IBT. No indirect calls left, no IBT complaints ;-) > > + hv_set_hypercall_pg(hv_hypercall_pg); > > > > +skip_hypercall_pg_init: > > /* > > * hyperv_init() is called before LAPIC is initialized: see > > * apic_intr_mode_init() -> x86_platform.apic_post_init() and > > @@ -658,7 +658,7 @@ void hyperv_cleanup(void) > > * let hypercall operations fail safely rather than > > * panic the kernel for using invalid hypercall page > > */ > > - hv_hypercall_pg = NULL; > > + hv_set_hypercall_pg(NULL); > > This causes a hang getting into the kdump kernel after a panic. 
> hyperv_cleanup() is called after native_machine_crash_shutdown() > has done crash_smp_send_stop() on all the other CPUs. I don't know > the details of how static_call_update() works, Right, so let me try and explain this :-) So we get the compiler to emit direct calls (CALL/JMP) to symbols prefixed with "__SCT__", in this case from asm, but more usually by means of the static_call() macro mess. Meanwhile DEFINE_STATIC_CALL() ensures such a symbol actually exists. This symbol is a little trampoline that redirects to the actual target function given to DEFINE_STATIC_CALL() -- __hv_hyperfail() in the above case. Then objtool runs through the resulting object file and stores the location of every call to these __SCT__ prefixed symbols in a custom section. This enables static_call init (boot time) to go through the section and rewrite all the trampoline calls to direct calls to the target. Subsequent static_call_update() calls will again rewrite the direct call to point elsewhere. So very much like how static_branch() does a NOP/JMP rewrite to toggle branches, static_call() rewrites (direct) call targets. > but it's easy to imagine that > it wouldn't work when the kernel is in such a state. > > The original code setting hv_hypercall_pg to NULL is just tidiness. > Other CPUs are stopped and can't be making hypercalls, and this CPU > shouldn't be making hypercalls either, so setting it to NULL more > cleanly catches some erroneous hypercall (vs. accessing the hypercall > page after Hyper-V has been told to reset it). So if you look at (retained above) hv_set_hypercall_pg(), when given NULL, the call target is set to __hv_hyperfail(), which does an unconditional U64_MAX return. Combined with the fact that the thing *should* not be doing hypercalls anymore at this point, something is iffy. I can easily remove it, but it *should* be equivalent to before, where it dynamically checked for hv_hypercall_pg being NULL. ^ permalink raw reply [flat|nested] 32+ messages in thread
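For readers who want the C-level view of the machinery Peter describes, here is a toy example with made-up names (a sketch of the static_call API usage, not the Hyper-V code):

	#include <linux/static_call.h>

	static int slow_op(int x) { return x + 1; }
	static int fast_op(int x) { return x * 2; }

	/* Emits the __SCT__my_op trampoline, initially targeting slow_op. */
	DEFINE_STATIC_CALL(my_op, slow_op);

	int compute(int x)
	{
		/* Compiles to a direct CALL to the trampoline; objtool
		 * records the site so it can later be patched in place. */
		return static_call(my_op)(x);
	}

	void speed_up(void)
	{
		/* Rewrites the recorded call sites (or just the trampoline,
		 * depending on arch support) to target fast_op. */
		static_call_update(my_op, fast_op);
	}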
* RE: [PATCH 5/6] x86_64,hyperv: Use direct call to hypercall-page 2025-04-25 14:03 ` Peter Zijlstra @ 2025-04-25 14:32 ` Michael Kelley 2025-04-27 3:58 ` Michael Kelley 0 siblings, 1 reply; 32+ messages in thread From: Michael Kelley @ 2025-04-25 14:32 UTC (permalink / raw) To: Peter Zijlstra Cc: x86@kernel.org, kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org, decui@microsoft.com, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, hpa@zytor.com, jpoimboe@kernel.org, pawan.kumar.gupta@linux.intel.com, seanjc@google.com, pbonzini@redhat.com, ardb@kernel.org, kees@kernel.org, Arnd Bergmann, gregkh@linuxfoundation.org, linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, linux-efi@vger.kernel.org, samitolvanen@google.com, ojeda@kernel.org From: Peter Zijlstra <peterz@infradead.org> Sent: Friday, April 25, 2025 7:04 AM > > On Mon, Apr 21, 2025 at 06:28:42PM +0000, Michael Kelley wrote: > > > > #ifdef CONFIG_X86_64 > > > +static u64 __hv_hyperfail(u64 control, u64 param1, u64 param2) > > > +{ > > > + return U64_MAX; > > > +} > > > + > > > +DEFINE_STATIC_CALL(__hv_hypercall, __hv_hyperfail); > > > + > > > u64 hv_pg_hypercall(u64 control, u64 param1, u64 param2) > > > { > > > u64 hv_status; > > > > > > + asm volatile ("call " STATIC_CALL_TRAMP_STR(__hv_hypercall) > > > : "=a" (hv_status), ASM_CALL_CONSTRAINT, > > > "+c" (control), "+d" (param1) > > > + : "r" (__r8) > > > : "cc", "memory", "r9", "r10", "r11"); > > > > > > return hv_status; > > > } > > > + > > > +typedef u64 (*hv_hypercall_f)(u64 control, u64 param1, u64 param2); > > > + > > > +static inline void hv_set_hypercall_pg(void *ptr) > > > +{ > > > + hv_hypercall_pg = ptr; > > > + > > > + if (!ptr) > > > + ptr = &__hv_hyperfail; > > > + static_call_update(__hv_hypercall, (hv_hypercall_f)ptr); > > > +} > > ^ kept for reference, as I try and explain how static_call() works > below. > > > > -skip_hypercall_pg_init: > > > - /* > > > - * Some versions of Hyper-V that provide IBT in guest VMs have a bug > > > - * in that there's no ENDBR64 instruction at the entry to the > > > - * hypercall page. Because hypercalls are invoked via an indirect call > > > - * to the hypercall page, all hypercall attempts fail when IBT is > > > - * enabled, and Linux panics. For such buggy versions, disable IBT. > > > - * > > > - * Fixed versions of Hyper-V always provide ENDBR64 on the hypercall > > > - * page, so if future Linux kernel versions enable IBT for 32-bit > > > - * builds, additional hypercall page hackery will be required here > > > - * to provide an ENDBR32. > > > - */ > > > -#ifdef CONFIG_X86_KERNEL_IBT > > > - if (cpu_feature_enabled(X86_FEATURE_IBT) && > > > - *(u32 *)hv_hypercall_pg != gen_endbr()) { > > > - setup_clear_cpu_cap(X86_FEATURE_IBT); > > > - pr_warn("Disabling IBT because of Hyper-V bug\n"); > > > - } > > > -#endif > > > > With this patch set, it's nice to see IBT working in a Hyper-V guest! > > I had previously tested IBT with some hackery to the hypercall page > > to add the missing ENDBR64, and didn't see any problems. Same > > after these changes -- no complaints from IBT. 
> > No indirect calls left, no IBT complaints ;-) > > > > + hv_set_hypercall_pg(hv_hypercall_pg); > > > > > > +skip_hypercall_pg_init: > > > /* > > > * hyperv_init() is called before LAPIC is initialized: see > > > * apic_intr_mode_init() -> x86_platform.apic_post_init() and > > > @@ -658,7 +658,7 @@ void hyperv_cleanup(void) > > > * let hypercall operations fail safely rather than > > > * panic the kernel for using invalid hypercall page > > > */ > > > - hv_hypercall_pg = NULL; > > > + hv_set_hypercall_pg(NULL); > > > > This causes a hang getting into the kdump kernel after a panic. > > > hyperv_cleanup() is called after native_machine_crash_shutdown() > > > has done crash_smp_send_stop() on all the other CPUs. I don't know > > > the details of how static_call_update() works, > > > > Right, so let me try and explain this :-) > > > > So we get the compiler to emit direct calls (CALL/JMP) to symbols > > prefixed with "__SCT__", in this case from asm, but more usually by > > means of the static_call() macro mess. > > > > Meanwhile DEFINE_STATIC_CALL() ensures such a symbol actually exists. > > This symbol is a little trampoline that redirects to the actual > > target function given to DEFINE_STATIC_CALL() -- __hv_hyperfail() in the > > above case. > > > > Then objtool runs through the resulting object file and stores the > > location of every call to these __SCT__ prefixed symbols in a custom > > section. > > > > This enables static_call init (boot time) to go through the section and > > rewrite all the trampoline calls to direct calls to the target. > > Subsequent static_call_update() calls will again rewrite the direct call > > to point elsewhere. > > > > So very much like how static_branch() does a NOP/JMP rewrite to toggle > > branches, static_call() rewrites (direct) call targets. > > > > > but it's easy to imagine that > > > it wouldn't work when the kernel is in such a state. > > > > > > The original code setting hv_hypercall_pg to NULL is just tidiness. > > > Other CPUs are stopped and can't be making hypercalls, and this CPU > > > shouldn't be making hypercalls either, so setting it to NULL more > > > cleanly catches some erroneous hypercall (vs. accessing the hypercall > > > page after Hyper-V has been told to reset it). > > > > So if you look at (retained above) hv_set_hypercall_pg(), when given > > NULL, the call target is set to __hv_hyperfail(), which does an > > unconditional U64_MAX return. > > > > Combined with the fact that the thing *should* not be doing hypercalls > > anymore at this point, something is iffy. > > > > I can easily remove it, but it *should* be equivalent to before, where > > it dynamically checked for hv_hypercall_pg being NULL. I agree that setting the call target to __hv_hyperfail() should be good. But my theory is that static_call_update() is hanging when trying to do the rewrite, because of the state of the other CPUs. I don't think control is ever returning from static_call_update() when invoked through hyperv_cleanup(). Wouldn't static_call_update() need to park the other CPUs temporarily and/or flush instruction caches to make everything consistent? But that's just my theory. I'll run a few more experiments to confirm if control ever returns from static_call_update() in this case. Michael ^ permalink raw reply [flat|nested] 32+ messages in thread
* RE: [PATCH 5/6] x86_64,hyperv: Use direct call to hypercall-page 2025-04-25 14:32 ` Michael Kelley @ 2025-04-27 3:58 ` Michael Kelley 2025-04-29 15:19 ` Peter Zijlstra 0 siblings, 1 reply; 32+ messages in thread From: Michael Kelley @ 2025-04-27 3:58 UTC (permalink / raw) To: Michael Kelley, Peter Zijlstra Cc: x86@kernel.org, kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org, decui@microsoft.com, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, hpa@zytor.com, jpoimboe@kernel.org, pawan.kumar.gupta@linux.intel.com, seanjc@google.com, pbonzini@redhat.com, ardb@kernel.org, kees@kernel.org, Arnd Bergmann, gregkh@linuxfoundation.org, linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, linux-efi@vger.kernel.org, samitolvanen@google.com, ojeda@kernel.org From: Michael Kelley <mhklinux@outlook.com> Sent: Friday, April 25, 2025 7:32 AM > > From: Peter Zijlstra <peterz@infradead.org> Sent: Friday, April 25, 2025 7:04 AM > > > > On Mon, Apr 21, 2025 at 06:28:42PM +0000, Michael Kelley wrote: > > > > > > #ifdef CONFIG_X86_64 > > > > +static u64 __hv_hyperfail(u64 control, u64 param1, u64 param2) > > > > +{ > > > > + return U64_MAX; > > > > +} > > > > + > > > > +DEFINE_STATIC_CALL(__hv_hypercall, __hv_hyperfail); > > > > + > > > > u64 hv_pg_hypercall(u64 control, u64 param1, u64 param2) > > > > { > > > > u64 hv_status; > > > > > > > > + asm volatile ("call " STATIC_CALL_TRAMP_STR(__hv_hypercall) > > > > : "=a" (hv_status), ASM_CALL_CONSTRAINT, > > > > "+c" (control), "+d" (param1) > > > > + : "r" (__r8) > > > > : "cc", "memory", "r9", "r10", "r11"); > > > > > > > > return hv_status; > > > > } > > > > + > > > > +typedef u64 (*hv_hypercall_f)(u64 control, u64 param1, u64 param2); > > > > + > > > > +static inline void hv_set_hypercall_pg(void *ptr) > > > > +{ > > > > + hv_hypercall_pg = ptr; > > > > + > > > > + if (!ptr) > > > > + ptr = &__hv_hyperfail; > > > > + static_call_update(__hv_hypercall, (hv_hypercall_f)ptr); > > > > +} > > > > ^ kept for reference, as I try and explain how static_call() works > > below. > > > > > > -skip_hypercall_pg_init: > > > > - /* > > > > - * Some versions of Hyper-V that provide IBT in guest VMs have a bug > > > > - * in that there's no ENDBR64 instruction at the entry to the > > > > - * hypercall page. Because hypercalls are invoked via an indirect call > > > > - * to the hypercall page, all hypercall attempts fail when IBT is > > > > - * enabled, and Linux panics. For such buggy versions, disable IBT. > > > > - * > > > > - * Fixed versions of Hyper-V always provide ENDBR64 on the hypercall > > > > - * page, so if future Linux kernel versions enable IBT for 32-bit > > > > - * builds, additional hypercall page hackery will be required here > > > > - * to provide an ENDBR32. > > > > - */ > > > > -#ifdef CONFIG_X86_KERNEL_IBT > > > > - if (cpu_feature_enabled(X86_FEATURE_IBT) && > > > > - *(u32 *)hv_hypercall_pg != gen_endbr()) { > > > > - setup_clear_cpu_cap(X86_FEATURE_IBT); > > > > - pr_warn("Disabling IBT because of Hyper-V bug\n"); > > > > - } > > > > -#endif > > > > > > With this patch set, it's nice to see IBT working in a Hyper-V guest! > > > I had previously tested IBT with some hackery to the hypercall page > > > to add the missing ENDBR64, and didn't see any problems. Same > > > after these changes -- no complaints from IBT. 
> > > I can easily remove it, but it *should* be equivalent to before, where > > > it dynamically checked for hv_hypercall_pg being NULL. > > I agree that setting the call target to __hv_hyperfail() should be good. > But my theory is that static_call_update() is hanging when trying to > do the rewrite, because of the state of the other CPUs. I don't think > control is ever returning from static_call_update() when invoked > through hyperv_cleanup(). Wouldn't static_call_update() need to park > the other CPUs temporarily and/or flush instruction caches to make > everything consistent? > > But that's just my theory. I'll run a few more experiments to confirm > if control ever returns from static_call_update() in this case. Indeed, control never returns from static_call_update(). 
Prior to hyperv_cleanup() running, crash_smp_send_stop() has been called to stop all the other CPUs, and it does not update cpu_online_mask to reflect the other CPUs being stopped.

static_call_update() runs this call sequence:

arch_static_call_transform()
__static_call_transform()
smp_text_poke_single()
smp_text_poke_batch_finish()
smp_text_poke_sync_each_cpu()

smp_text_poke_sync_each_cpu() sends an IPI to each CPU in cpu_online_mask, and of course the other CPUs never respond, so it waits forever.

Michael

^ permalink raw reply [flat|nested] 32+ messages in thread
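The shape of that final step explains the hang; pared down to its essence, an illustrative reconstruction using the older text_poke_sync() spelling of the same helper (names here are for illustration, not copied from the tree Michael is quoting):

	static void do_sync_core(void *info)
	{
		sync_core();	/* serialize this CPU's instruction stream */
	}

	static void text_poke_sync(void)
	{
		/* IPI every CPU in cpu_online_mask and wait for each one to
		 * run do_sync_core().  A CPU parked by crash_smp_send_stop()
		 * is still in the mask but never services the IPI, so the
		 * synchronous wait below never completes. */
		on_each_cpu(do_sync_core, NULL, 1);
	}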
* Re: [PATCH 5/6] x86_64,hyperv: Use direct call to hypercall-page 2025-04-27 3:58 ` Michael Kelley @ 2025-04-29 15:19 ` Peter Zijlstra 0 siblings, 0 replies; 32+ messages in thread From: Peter Zijlstra @ 2025-04-29 15:19 UTC (permalink / raw) To: Michael Kelley Cc: x86@kernel.org, kys@microsoft.com, haiyangz@microsoft.com, wei.liu@kernel.org, decui@microsoft.com, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, hpa@zytor.com, jpoimboe@kernel.org, pawan.kumar.gupta@linux.intel.com, seanjc@google.com, pbonzini@redhat.com, ardb@kernel.org, kees@kernel.org, Arnd Bergmann, gregkh@linuxfoundation.org, linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, linux-efi@vger.kernel.org, samitolvanen@google.com, ojeda@kernel.org On Sun, Apr 27, 2025 at 03:58:54AM +0000, Michael Kelley wrote: > Indeed, control never returns from static_call_update(). Prior to > hyperv_cleanup() running, crash_smp_send_stop() has been called to > stop all the other CPUs, and it does not update cpu_online_mask to > reflect the other CPUs being stopped. > > static_call_update() runs this call sequence: > > arch_static_call_transform() > __static_call_transform() > smp_text_poke_single() > smp_text_poke_batch_finish() > smp_text_poke_sync_each_cpu() > > smp_text_poke_sync_each_cpu() sends an IPI to each CPU in > cpu_online_mask, and of course the other CPUs never respond, so > it waits forever. Fair enough. I'll just remove this final call. Thanks! ^ permalink raw reply [flat|nested] 32+ messages in thread
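Concretely, the fix is to go back to a plain pointer store on this path; a sketch of the expected shape (hypothetical helper name, not the posted code):

	/* Crash/kexec path: skip static_call_update() entirely, since its
	 * text-poke IPIs can never be acknowledged once the other CPUs are
	 * parked.  Nothing may make hypercalls past this point anyway;
	 * clearing the pointer just catches stragglers. */
	static void hv_disable_hypercalls_for_crash(void)
	{
		hv_hypercall_pg = NULL;
	}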
* [PATCH 6/6] objtool: Validate kCFI calls 2025-04-14 11:11 [PATCH 0/6] objtool: Detect and warn about indirect calls in __nocfi functions Peter Zijlstra ` (4 preceding siblings ...) 2025-04-14 11:11 ` [PATCH 5/6] x86_64,hyperv: Use direct call to hypercall-page Peter Zijlstra @ 2025-04-14 11:11 ` Peter Zijlstra 2025-04-14 23:43 ` Josh Poimboeuf 5 siblings, 1 reply; 32+ messages in thread From: Peter Zijlstra @ 2025-04-14 11:11 UTC (permalink / raw) To: x86 Cc: kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen, hpa, peterz, jpoimboe, pawan.kumar.gupta, seanjc, pbonzini, ardb, kees, Arnd Bergmann, gregkh, linux-hyperv, linux-kernel, kvm, linux-efi, samitolvanen, ojeda Validate that all indirect calls adhere to kCFI rules. Notably, doing a nocfi indirect call to a cfi function is broken. Apparently some Rust 'core' code violates this and explodes when run with FineIBT. All the ANNOTATE_NOCFI sites are prime targets for attackers. - runtime EFI is especially heinous because it also needs to disable IBT. Basically calling unknown code without CFI protection at runtime is a massive security issue. - Kexec image handover; if you can exploit this, you get to keep it :-) - KVM, once for the interrupt injection calling IDT gates directly. - KVM, once for the FASTOP emulation stuff. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> --- arch/x86/kernel/machine_kexec_64.c | 1 arch/x86/kvm/emulate.c | 4 +++ arch/x86/kvm/vmx/vmenter.S | 4 +++ arch/x86/platform/efi/efi_stub_64.S | 4 +++ drivers/misc/lkdtm/perms.c | 2 + include/linux/objtool.h | 3 ++ include/linux/objtool_types.h | 1 tools/include/linux/objtool_types.h | 1 tools/objtool/check.c | 41 ++++++++++++++++++++++++++++++++++++ tools/objtool/include/objtool/elf.h | 1 10 files changed, 62 insertions(+) --- a/arch/x86/kernel/machine_kexec_64.c +++ b/arch/x86/kernel/machine_kexec_64.c @@ -442,6 +442,7 @@ void __nocfi machine_kexec(struct kimage __ftrace_enabled_restore(save_ftrace_enabled); } +ANNOTATE_NOCFI_SYM(machine_kexec); /* arch-dependent functionality related to kexec file-based syscall */ --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -5071,6 +5071,10 @@ static noinline int fastop(struct x86_em return emulate_de(ctxt); return X86EMUL_CONTINUE; } +/* + * The ASM stubs don't have CFI on. + */ +ANNOTATE_NOCFI_SYM(fastop); void init_decode_cache(struct x86_emulate_ctxt *ctxt) { --- a/arch/x86/kvm/vmx/vmenter.S +++ b/arch/x86/kvm/vmx/vmenter.S @@ -363,5 +363,9 @@ SYM_FUNC_END(vmread_error_trampoline) .section .text, "ax" SYM_FUNC_START(vmx_do_interrupt_irqoff) + /* + * Calling an IDT gate directly. 
+ */ + ANNOTATE_NOCFI VMX_DO_EVENT_IRQOFF CALL_NOSPEC _ASM_ARG1 SYM_FUNC_END(vmx_do_interrupt_irqoff) --- a/arch/x86/platform/efi/efi_stub_64.S +++ b/arch/x86/platform/efi/efi_stub_64.S @@ -11,6 +11,10 @@ #include <asm/nospec-branch.h> SYM_FUNC_START(__efi_call) + /* + * The EFI code doesn't have any CFI :-( + */ + ANNOTATE_NOCFI pushq %rbp movq %rsp, %rbp and $~0xf, %rsp --- a/drivers/misc/lkdtm/perms.c +++ b/drivers/misc/lkdtm/perms.c @@ -9,6 +9,7 @@ #include <linux/vmalloc.h> #include <linux/mman.h> #include <linux/uaccess.h> +#include <linux/objtool.h> #include <asm/cacheflush.h> #include <asm/sections.h> @@ -86,6 +87,7 @@ static noinline __nocfi void execute_loc func(); pr_err("FAIL: func returned\n"); } +ANNOTATE_NOCFI_SYM(execute_location); static void execute_user_location(void *dst) { --- a/include/linux/objtool.h +++ b/include/linux/objtool.h @@ -185,6 +185,8 @@ */ #define ANNOTATE_REACHABLE(label) __ASM_ANNOTATE(label, ANNOTYPE_REACHABLE) +#define ANNOTATE_NOCFI_SYM(sym) asm(__ASM_ANNOTATE(sym, ANNOTYPE_NOCFI)) + #else #define ANNOTATE_NOENDBR ANNOTATE type=ANNOTYPE_NOENDBR #define ANNOTATE_RETPOLINE_SAFE ANNOTATE type=ANNOTYPE_RETPOLINE_SAFE @@ -194,6 +196,7 @@ #define ANNOTATE_INTRA_FUNCTION_CALL ANNOTATE type=ANNOTYPE_INTRA_FUNCTION_CALL #define ANNOTATE_UNRET_BEGIN ANNOTATE type=ANNOTYPE_UNRET_BEGIN #define ANNOTATE_REACHABLE ANNOTATE type=ANNOTYPE_REACHABLE +#define ANNOTATE_NOCFI ANNOTATE type=ANNOTYPE_NOCFI #endif #if defined(CONFIG_NOINSTR_VALIDATION) && \ --- a/include/linux/objtool_types.h +++ b/include/linux/objtool_types.h @@ -66,5 +66,6 @@ struct unwind_hint { #define ANNOTYPE_INTRA_FUNCTION_CALL 7 #define ANNOTYPE_REACHABLE 8 #define ANNOTYPE_JUMP_TABLE 9 +#define ANNOTYPE_NOCFI 10 #endif /* _LINUX_OBJTOOL_TYPES_H */ --- a/tools/include/linux/objtool_types.h +++ b/tools/include/linux/objtool_types.h @@ -66,5 +66,6 @@ struct unwind_hint { #define ANNOTYPE_INTRA_FUNCTION_CALL 7 #define ANNOTYPE_REACHABLE 8 #define ANNOTYPE_JUMP_TABLE 9 +#define ANNOTYPE_NOCFI 10 #endif /* _LINUX_OBJTOOL_TYPES_H */ --- a/tools/objtool/check.c +++ b/tools/objtool/check.c @@ -2387,6 +2387,8 @@ static int __annotate_ifc(struct objtool static int __annotate_late(struct objtool_file *file, int type, struct instruction *insn) { + struct symbol *sym; + switch (type) { case ANNOTYPE_NOENDBR: /* early */ @@ -2436,6 +2438,15 @@ static int __annotate_late(struct objtoo insn->_jump_table = (void *)1; break; + case ANNOTYPE_NOCFI: + sym = insn->sym; + if (!sym) { + WARN_INSN(insn, "dodgy NOCFI annotation"); + break; + } + insn->sym->nocfi = 1; + break; + default: ERROR_INSN(insn, "Unknown annotation type: %d", type); return -1; @@ -4006,6 +4017,36 @@ static int validate_retpoline(struct obj warnings++; } + if (!opts.cfi) + return warnings; + + /* + * kCFI call sites look like: + * + * movl $(-0x12345678), %r10d + * addl -4(%r11), %r10d + * jz 1f + * ud2 + * 1: cs call __x86_indirect_thunk_r11 + * + * Verify all indirect calls are kCFI adorned by checking for the + * UD2. Notably, doing __nocfi calls to regular (cfi) functions is + * broken. 
+ */ + list_for_each_entry(insn, &file->retpoline_call_list, call_node) { + struct symbol *sym = insn->sym; + + if (sym && sym->type == STT_FUNC && !sym->nocfi) { + struct instruction *prev = + prev_insn_same_sym(file, insn); + + if (!prev || prev->type != INSN_BUG) { + WARN_INSN(insn, "no-cfi indirect call!"); + warnings++; + } + } + } + return warnings; } --- a/tools/objtool/include/objtool/elf.h +++ b/tools/objtool/include/objtool/elf.h @@ -70,6 +70,7 @@ struct symbol { u8 local_label : 1; u8 frame_pointer : 1; u8 ignore : 1; + u8 nocfi : 1; struct list_head pv_target; struct reloc *relocs; }; ^ permalink raw reply [flat|nested] 32+ messages in thread
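For reference, a minimal sketch of how ANNOTATE_NOCFI_SYM() is used from C,
modeled on the lkdtm hunk above; the function and pointer names here are
hypothetical:

	#include <linux/objtool.h>

	static void (*mystery_code)(void);

	/*
	 * __nocfi removes the kCFI check sequence from the indirect call
	 * below; without the annotation the new objtool pass would emit
	 * "no-cfi indirect call!" for it.
	 */
	static noinline __nocfi void call_mystery_code(void)
	{
		mystery_code();
	}
	ANNOTATE_NOCFI_SYM(call_mystery_code);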
* Re: [PATCH 6/6] objtool: Validate kCFI calls
  2025-04-14 11:11 ` [PATCH 6/6] objtool: Validate kCFI calls Peter Zijlstra
@ 2025-04-14 23:43   ` Josh Poimboeuf
  2025-04-29 16:10     ` Peter Zijlstra
  2025-04-29 16:18     ` Peter Zijlstra
  0 siblings, 2 replies; 32+ messages in thread
From: Josh Poimboeuf @ 2025-04-14 23:43 UTC (permalink / raw)
To: Peter Zijlstra
Cc: x86, kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen,
	hpa, pawan.kumar.gupta, seanjc, pbonzini, ardb, kees,
	Arnd Bergmann, gregkh, linux-hyperv, linux-kernel, kvm, linux-efi,
	samitolvanen, ojeda

On Mon, Apr 14, 2025 at 01:11:46PM +0200, Peter Zijlstra wrote:
>  SYM_FUNC_START(__efi_call)
> +	/*
> +	 * The EFI code doesn't have any CFI :-(
> +	 */
> +	ANNOTATE_NOCFI
>  	pushq %rbp
>  	movq %rsp, %rbp
>  	and $~0xf, %rsp

This looks like an insn annotation but is actually a func annotation.
ANNOTATE_NOCFI_SYM() would be a lot clearer, for all the asm code.

Though maybe it's better for ANNOTATE_NOCFI to annotate the call site
itself (for asm) while ANNOTATE_NOCFI_SYM annotates the entire function
(in C). So either there would be two separate annotypes or the
annotation would get interpreted based on whether it's at the beginning
of the function.

> +++ b/include/linux/objtool.h
> @@ -185,6 +185,8 @@
>   */
>  #define ANNOTATE_REACHABLE(label)	__ASM_ANNOTATE(label, ANNOTYPE_REACHABLE)
>
> +#define ANNOTATE_NOCFI_SYM(sym)		asm(__ASM_ANNOTATE(sym, ANNOTYPE_NOCFI))

This needs a comment like the others.

> @@ -2436,6 +2438,15 @@ static int __annotate_late(struct objtoo
>  		insn->_jump_table = (void *)1;
>  		break;
>
> +	case ANNOTYPE_NOCFI:
> +		sym = insn->sym;
> +		if (!sym) {
> +			WARN_INSN(insn, "dodgy NOCFI annotation");
> +			break;

Return an error?

> @@ -4006,6 +4017,36 @@ static int validate_retpoline(struct obj
> +	/*
> +	 * kCFI call sites look like:
> +	 *
> +	 *	movl $(-0x12345678), %r10d
> +	 *	addl -4(%r11), %r10d
> +	 *	jz   1f
> +	 *	ud2
> +	 * 1:	cs call __x86_indirect_thunk_r11
> +	 *
> +	 * Verify all indirect calls are kCFI adorned by checking for the
> +	 * UD2. Notably, doing __nocfi calls to regular (cfi) functions is
> +	 * broken.

This "__nocfi calls" is confusing me. IIUC, there are two completely
different meanings for "nocfi":

  - __nocfi: disable the kcfi function entry stuff

  - ANNOTATE_NOCFI_SYM: Regardless of whether the function is __nocfi,
    allow it to have non-CFI indirect call sites.

Can we call this ANNOTATE_NOCFI_SAFE or something?

-- 
Josh

^ permalink raw reply	[flat|nested] 32+ messages in thread
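For concreteness, one way the two-annotype option above could look; the macro
names and type numbers here are hypothetical, not what the patch implements:

	/* objtool_types.h: split the single type in two */
	#define ANNOTYPE_NOCFI_INSN	10	/* this one call site (asm) */
	#define ANNOTYPE_NOCFI_SYM	11	/* the whole function (C)   */

	/* objtool.h: the asm side marks a single call site ... */
	#define ANNOTATE_NOCFI		ANNOTATE type=ANNOTYPE_NOCFI_INSN
	/* ... while the C side marks the entire symbol */
	#define ANNOTATE_NOCFI_SYM(sym)	asm(__ASM_ANNOTATE(sym, ANNOTYPE_NOCFI_SYM))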
* Re: [PATCH 6/6] objtool: Validate kCFI calls
  2025-04-14 23:43   ` Josh Poimboeuf
@ 2025-04-29 16:10     ` Peter Zijlstra
  0 siblings, 0 replies; 32+ messages in thread
From: Peter Zijlstra @ 2025-04-29 16:10 UTC (permalink / raw)
To: Josh Poimboeuf
Cc: x86, kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen,
	hpa, pawan.kumar.gupta, seanjc, pbonzini, ardb, kees,
	Arnd Bergmann, gregkh, linux-hyperv, linux-kernel, kvm, linux-efi,
	samitolvanen, ojeda

On Mon, Apr 14, 2025 at 04:43:26PM -0700, Josh Poimboeuf wrote:
> > + * Verify all indirect calls are kCFI adorned by checking for the
> > + * UD2. Notably, doing __nocfi calls to regular (cfi) functions is
> > + * broken.
>
> This "__nocfi calls" is confusing me. IIUC, there are two completely
> different meanings for "nocfi":
>
>   - __nocfi: disable the kcfi function entry stuff

Ah, no. __nocfi is a bit of a mess; it covers both the function entry
thing and, very much, the caller verification on indirect calls done
inside the function.

This leads to lovely stuff like:

  void (*foo)(void);

  static __always_inline __nocfi void nocfi_caller(void)
  {
	foo();	/* inlined into bar() without the kCFI check */
  }

  void bar(void)
  {
	nocfi_caller();
	foo();	/* this call does get the kCFI check */
  }

This actually compiles, and leaves bar() with two distinctly different
indirect calls to foo, while bar() itself still has a __cfi preamble.

Anyway, let me have a poke at the annotation.

^ permalink raw reply	[flat|nested] 32+ messages in thread
* Re: [PATCH 6/6] objtool: Validate kCFI calls
  2025-04-14 23:43   ` Josh Poimboeuf
  2025-04-29 16:10     ` Peter Zijlstra
@ 2025-04-29 16:18     ` Peter Zijlstra
  1 sibling, 0 replies; 32+ messages in thread
From: Peter Zijlstra @ 2025-04-29 16:18 UTC (permalink / raw)
To: Josh Poimboeuf
Cc: x86, kys, haiyangz, wei.liu, decui, tglx, mingo, bp, dave.hansen,
	hpa, pawan.kumar.gupta, seanjc, pbonzini, ardb, kees,
	Arnd Bergmann, gregkh, linux-hyperv, linux-kernel, kvm, linux-efi,
	samitolvanen, ojeda

On Mon, Apr 14, 2025 at 04:43:26PM -0700, Josh Poimboeuf wrote:
> Can we call this ANNOTATE_NOCFI_SAFE or something?

I'm hesitant to do so, because some of these sites really are not safe.
EFI and the VMX interrupt crud really are a security issue. EFI is
unfixable but no less broken for it, since the EFI code is out of our
control; the VMX thing might be fixable -- I'm not sure I understand
KVM well enough.

^ permalink raw reply	[flat|nested] 32+ messages in thread