From: Dave Hansen <dave.hansen@intel.com>
To: "Xin Li (Intel)" <xin@zytor.com>,
linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
linux-doc@vger.kernel.org
Cc: pbonzini@redhat.com, seanjc@google.com, corbet@lwn.net,
tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com,
luto@kernel.org, peterz@infradead.org, andrew.cooper3@citrix.com,
chao.gao@intel.com, hch@infradead.org
Subject: Re: [PATCH v6 04/20] x86/cea: Export an API to get per CPU exception stacks for KVM to use
Date: Wed, 27 Aug 2025 10:33:15 -0700
Message-ID: <720bc7ac-7e81-4ad9-8cc5-29ac540be283@intel.com>
In-Reply-To: <20250821223630.984383-5-xin@zytor.com>
On 8/21/25 15:36, Xin Li (Intel) wrote:
> FRED introduced new fields in the host-state area of the VMCS for
> stack levels 1->3 (HOST_IA32_FRED_RSP[123]), each respectively
> corresponding to per CPU exception stacks for #DB, NMI and #DF.
> KVM must populate these each time a vCPU is loaded onto a CPU.
>
> Convert the __this_cpu_ist_{top,bottom}_va() macros into real
> functions and export __this_cpu_ist_top_va().
>
> Suggested-by: Christoph Hellwig <hch@infradead.org>
> Suggested-by: Dave Hansen <dave.hansen@intel.com>
Nit: I wouldn't use Suggested-by unless the person basically asked for
the *entire* patch. Christoph and I were asking for specific bits of
this, but neither of us asked for this patch as a whole.
> diff --git a/arch/x86/coco/sev/sev-nmi.c b/arch/x86/coco/sev/sev-nmi.c
> index d8dfaddfb367..73e34ad7a1a9 100644
> --- a/arch/x86/coco/sev/sev-nmi.c
> +++ b/arch/x86/coco/sev/sev-nmi.c
> @@ -30,7 +30,7 @@ static __always_inline bool on_vc_stack(struct pt_regs *regs)
> if (ip_within_syscall_gap(regs))
> return false;
>
> - return ((sp >= __this_cpu_ist_bottom_va(VC)) && (sp < __this_cpu_ist_top_va(VC)));
> + return ((sp >= __this_cpu_ist_bottom_va(ESTACK_VC)) && (sp < __this_cpu_ist_top_va(ESTACK_VC)));
> }
This rename is one of those things that had me scratching my head for a
minute. It wasn't obvious at _all_ why the VC=>ESTACK_VC "rename" is
necessary.
This should have been mentioned in the changelog.
Better yet would have been to do the rename in a separate patch, because
a big chunk of this patch is just rename noise.
> /*
> @@ -82,7 +82,7 @@ void noinstr __sev_es_ist_exit(void)
> /* Read IST entry */
> ist = __this_cpu_read(cpu_tss_rw.x86_tss.ist[IST_INDEX_VC]);
>
> - if (WARN_ON(ist == __this_cpu_ist_top_va(VC)))
> + if (WARN_ON(ist == __this_cpu_ist_top_va(ESTACK_VC)))
> return;
>
> /* Read back old IST entry and write it to the TSS */
> diff --git a/arch/x86/coco/sev/vc-handle.c b/arch/x86/coco/sev/vc-handle.c
> index c3b4acbde0d8..88b6bc518a5a 100644
> --- a/arch/x86/coco/sev/vc-handle.c
> +++ b/arch/x86/coco/sev/vc-handle.c
> @@ -859,7 +859,7 @@ static enum es_result vc_handle_exitcode(struct es_em_ctxt *ctxt,
>
> static __always_inline bool is_vc2_stack(unsigned long sp)
> {
> - return (sp >= __this_cpu_ist_bottom_va(VC2) && sp < __this_cpu_ist_top_va(VC2));
> + return (sp >= __this_cpu_ist_bottom_va(ESTACK_VC2) && sp < __this_cpu_ist_top_va(ESTACK_VC2));
> }
>
> static __always_inline bool vc_from_invalid_context(struct pt_regs *regs)
> diff --git a/arch/x86/include/asm/cpu_entry_area.h b/arch/x86/include/asm/cpu_entry_area.h
> index 462fc34f1317..8e17f0ca74e6 100644
> --- a/arch/x86/include/asm/cpu_entry_area.h
> +++ b/arch/x86/include/asm/cpu_entry_area.h
> @@ -46,7 +46,7 @@ struct cea_exception_stacks {
> * The exception stack ordering in [cea_]exception_stacks
> */
> enum exception_stack_ordering {
> - ESTACK_DF,
> + ESTACK_DF = 0,
> ESTACK_NMI,
> ESTACK_DB,
> ESTACK_MCE,
Is this really required? The first enumerator is always 0 in C. Is this
just trying to ensure that ESTACKS_MEMBERS() defines a number of stacks
matching N_EXCEPTION_STACKS?
If that's the case, shouldn't this be represented with a BUILD_BUG_ON()?
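Something along these lines (completely untested, reusing the existing
CEA_ESTACK_OFFS() macro from this header) would enforce the assumptions
at compile time instead of just documenting them:

	BUILD_BUG_ON(ESTACK_DF != 0);
	BUILD_BUG_ON(CEA_ESTACK_OFFS(NMI) - CEA_ESTACK_OFFS(DF) !=
		     EXCEPTION_STKSZ + PAGE_SIZE);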
> @@ -58,18 +58,15 @@ enum exception_stack_ordering {
> #define CEA_ESTACK_SIZE(st) \
> sizeof(((struct cea_exception_stacks *)0)->st## _stack)
>
> -#define CEA_ESTACK_BOT(ceastp, st) \
> - ((unsigned long)&(ceastp)->st## _stack)
> -
> -#define CEA_ESTACK_TOP(ceastp, st) \
> - (CEA_ESTACK_BOT(ceastp, st) + CEA_ESTACK_SIZE(st))
> -
> #define CEA_ESTACK_OFFS(st) \
> offsetof(struct cea_exception_stacks, st## _stack)
>
> #define CEA_ESTACK_PAGES \
> (sizeof(struct cea_exception_stacks) / PAGE_SIZE)
>
> +extern unsigned long __this_cpu_ist_top_va(enum exception_stack_ordering stack);
> +extern unsigned long __this_cpu_ist_bottom_va(enum exception_stack_ordering stack);
> +
> #endif
>
> #ifdef CONFIG_X86_32
> @@ -144,10 +141,4 @@ static __always_inline struct entry_stack *cpu_entry_stack(int cpu)
> return &get_cpu_entry_area(cpu)->entry_stack_page.stack;
> }
>
> -#define __this_cpu_ist_top_va(name) \
> - CEA_ESTACK_TOP(__this_cpu_read(cea_exception_stacks), name)
> -
> -#define __this_cpu_ist_bottom_va(name) \
> - CEA_ESTACK_BOT(__this_cpu_read(cea_exception_stacks), name)
> -
> #endif
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index 34a054181c4d..cb14919f92da 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -2307,12 +2307,12 @@ static inline void setup_getcpu(int cpu)
> static inline void tss_setup_ist(struct tss_struct *tss)
> {
> /* Set up the per-CPU TSS IST stacks */
> - tss->x86_tss.ist[IST_INDEX_DF] = __this_cpu_ist_top_va(DF);
> - tss->x86_tss.ist[IST_INDEX_NMI] = __this_cpu_ist_top_va(NMI);
> - tss->x86_tss.ist[IST_INDEX_DB] = __this_cpu_ist_top_va(DB);
> - tss->x86_tss.ist[IST_INDEX_MCE] = __this_cpu_ist_top_va(MCE);
> + tss->x86_tss.ist[IST_INDEX_DF] = __this_cpu_ist_top_va(ESTACK_DF);
> + tss->x86_tss.ist[IST_INDEX_NMI] = __this_cpu_ist_top_va(ESTACK_NMI);
> + tss->x86_tss.ist[IST_INDEX_DB] = __this_cpu_ist_top_va(ESTACK_DB);
> + tss->x86_tss.ist[IST_INDEX_MCE] = __this_cpu_ist_top_va(ESTACK_MCE);
If you respin this, please vertically align these.
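i.e. something like:

	tss->x86_tss.ist[IST_INDEX_DF]  = __this_cpu_ist_top_va(ESTACK_DF);
	tss->x86_tss.ist[IST_INDEX_NMI] = __this_cpu_ist_top_va(ESTACK_NMI);
	tss->x86_tss.ist[IST_INDEX_DB]  = __this_cpu_ist_top_va(ESTACK_DB);
	tss->x86_tss.ist[IST_INDEX_MCE] = __this_cpu_ist_top_va(ESTACK_MCE);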
> +/*
> + * FRED introduced new fields in the host-state area of the VMCS for
> + * stack levels 1->3 (HOST_IA32_FRED_RSP[123]), each respectively
> + * corresponding to per CPU stacks for #DB, NMI and #DF. KVM must
> + * populate these each time a vCPU is loaded onto a CPU.
> + *
> + * Called from entry code, so must be noinstr.
> + */
> +noinstr unsigned long __this_cpu_ist_top_va(enum exception_stack_ordering stack)
> +{
> + unsigned long base = (unsigned long)&(__this_cpu_read(cea_exception_stacks)->DF_stack);
> + return base + EXCEPTION_STKSZ + stack * (EXCEPTION_STKSZ + PAGE_SIZE);
> +}
> +EXPORT_SYMBOL(__this_cpu_ist_top_va);
> +
> +noinstr unsigned long __this_cpu_ist_bottom_va(enum exception_stack_ordering stack)
> +{
> + unsigned long base = (unsigned long)&(__this_cpu_read(cea_exception_stacks)->DF_stack);
> + return base + stack * (EXCEPTION_STKSZ + PAGE_SIZE);
> +}
These are basically treating 'struct cea_exception_stacks' like an
array. There's no type safety or anything here; it's just an open-coded
array access.
Also, starting from ->DF_stack is a bit goofy looking. It's neither
obvious nor enforced that DF_stack is stack #0, i.e. at the beginning of
the structure.
Shouldn't we be _trying_ to make this look like:

	struct cea_exception_stacks *s;

	s = __this_cpu_read(cea_exception_stacks);
	return &s[stack_nr].stack;

? Where 'cea_exception_stacks' is an actual array:

	struct cea_exception_stacks[N_EXCEPTION_STACKS];

which might need to be embedded in a larger structure to get the
'IST_top_guard' without wastefully allocating space for an extra full
stack.
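Something like this, for instance (hypothetical type and field names,
completely untested, just to show the shape):

	/*
	 * One guard page sits below each stack, so one array entry
	 * covers a guard+stack pair; the trailing IST_top_guard comes
	 * from the enclosing structure.
	 */
	struct cea_estack {
		char	guard[PAGE_SIZE];
		char	stack[EXCEPTION_STKSZ];
	};

	struct cea_exception_stacks {
		struct cea_estack	estacks[N_EXCEPTION_STACKS];
		char			IST_top_guard[PAGE_SIZE];
	};

	noinstr unsigned long __this_cpu_ist_bottom_va(enum exception_stack_ordering stack)
	{
		struct cea_exception_stacks *s = __this_cpu_read(cea_exception_stacks);

		/* Typed array indexing instead of open-coded pointer math: */
		return (unsigned long)s->estacks[stack].stack;
	}

__this_cpu_ist_top_va() then just becomes the same thing plus
EXCEPTION_STKSZ, and the compiler enforces that 'stack' indexes an
actual stack instead of relying on the enum happening to match the
struct layout.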
> static DEFINE_PER_CPU_READ_MOSTLY(unsigned long, _cea_offset);
>
> static __always_inline unsigned int cea_offset(unsigned int cpu)
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index 998bd807fc7b..1804eb86cc14 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -671,7 +671,7 @@ page_fault_oops(struct pt_regs *regs, unsigned long error_code,
> * and then double-fault, though, because we're likely to
> * break the console driver and lose most of the stack dump.
> */
> - call_on_stack(__this_cpu_ist_top_va(DF) - sizeof(void*),
> + call_on_stack(__this_cpu_ist_top_va(ESTACK_DF) - sizeof(void*),
> handle_stack_overflow,
> ASM_CALL_ARG3,
> , [arg1] "r" (regs), [arg2] "r" (address), [arg3] "r" (&info));