* [PATCH v4 00/24] x86: strict separation of startup code
From: Ard Biesheuvel @ 2025-07-09 8:08 UTC
To: linux-kernel
Cc: linux-efi, x86, Ard Biesheuvel, Borislav Petkov, Ingo Molnar,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky, Josh Poimboeuf,
Peter Zijlstra
From: Ard Biesheuvel <ardb@kernel.org>
!!! Boot tested on non-SEV guest ONLY !!!
This RFT series implements a strict separation between startup code and
ordinary code, where startup code is built in a way that tolerates being
invoked from the initial 1:1 mapping of memory.
The existing approach of emitting this code into .head.text and checking
for absolute relocations in that section is not 100% safe, and produces
diagnostics that are sometimes difficult to interpret. [0]
Instead, rely on symbol prefixes, similar to how this is implemented for
the EFI stub and for the startup code in the arm64 port. This ensures
that startup code can only call other startup code, unless a special
symbol alias is emitted that exposes a non-startup routine to the
startup code.
This is somewhat intrusive, as there are many data objects that are
referenced both by startup code and by ordinary code, and an alias needs
to be emitted for each of those. If startup code references anything
that has not been made available to it explicitly, a build time link
error will occur.
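As an illustration of the mechanism (a minimal sketch - the "__pi_"
prefix and the alias macro name are placeholders here, not necessarily
the ones used in this series):

  /* Startup objects get their symbols prefixed at build time, e.g.: */
  /*   $(OBJCOPY) --prefix-symbols=__pi_ <startup object>            */

  /* vmlinux.lds.S: explicitly expose an ordinary symbol by alias */
  #define PI_ALIAS(sym)   PROVIDE(__pi_##sym = sym);

  PI_ALIAS(sev_status)    /* data object shared with startup code */

Any startup reference to a symbol without such an alias resolves to an
undefined __pi_-prefixed symbol, so a violation of the separation shows
up at link time rather than as a subtle runtime problem.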
This ultimately allows the .head.text section to be dropped entirely, as
it no longer has a special significance. Instead, code that only
executes at boot is emitted into .init.text as it should.
The majority of the changes are in the early SEV code. The main issue
is that it uses GHCB pages and SVSM calling areas in code that may run
from either the 1:1 mapping or the kernel virtual mapping, which is
problematic because it relies on __pa() to perform VA to PA
translations, and those are ambiguous in this context. Also, __pa()
pulls in non-trivial instrumented code when CONFIG_DEBUG_VIRTUAL=y, so
it is better to avoid VA to PA translations altogether in the startup
code.
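To illustrate why (simplified sketch; the real x86_64 __pa() goes
through __phys_addr(), but the principle is the same):

  /* __pa() assumes its argument is a kernel virtual address */
  #define __pa(x)   ((unsigned long)(x) - PAGE_OFFSET)

  /* Startup code may execute from the 1:1 mapping, where VA == PA,
   * so the subtraction above yields a bogus physical address. */
  unsigned long pa = __pa(ptr);   /* wrong for a 1:1 mapped ptr */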
Changes since RFT/v3:
- Rebase onto tip/master
- Incorporate Borislav's feedback on v3
- Switch to objtool to check for absolute references in startup code
- Remap inittext R-X when running on EFI implementations that require
strict R-X/RW- separation
- Include a kbuild fix to incorporate arch/x86/boot/startup/ in the
right manner
- For now, omit the LA57 changes that remove the problematic early
5-level paging checks. We can revisit this once there is agreement on
the approach.
Changes since RFT/v2:
- Rebase onto tip/x86/boot and drop the patches from the previous
revision that have been applied in the meantime.
- Omit the pgtable_l5_enabled() changes for now, and just expose PIC
aliases for the variables in question - this can be sorted later.
- Don't use the boot SVSM calling area in snp_kexec_finish(), but pass
down the correct per-CPU one to the early page state API.
- Rename arch/x86/coco/sev/sev-noinstr.o to arch/x86/coco/sev/noinstr.o
- Further reduce the amount of SEV code that needs to be constructed in
a special way.
Changes since RFC/v1:
- Include a major disentanglement/refactor of the SEV-SNP startup code,
so that only code that really needs to run from the 1:1 mapping is
included in the startup/ code
- Incorporate some early notes from Ingo
Build tested defconfig and allmodconfig
!!! Boot tested on non-SEV guest ONLY !!!
Again, I will need to lean on Tom to determine whether this breaks
SEV-SNP guest boot. As I mentioned before, I am still waiting for
SEV-SNP capable hardware to be delivered.
Cc: Borislav Petkov <bp@alien8.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Dionna Amalie Glaze <dionnaglaze@google.com>
Cc: Kevin Loughlin <kevinloughlin@google.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Josh Poimboeuf <jpoimboe@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
[0] https://lore.kernel.org/all/CAHk-=wj7k9nvJn6cpa3-5Ciwn2RGyE605BMkjWE4MqnvC9E92A@mail.gmail.com/
Ard Biesheuvel (24):
x86/sev: Separate MSR and GHCB based snp_cpuid() via a callback
x86/sev: Use MSR protocol for remapping SVSM calling area
x86/sev: Use MSR protocol only for early SVSM PVALIDATE call
x86/sev: Run RMPADJUST on SVSM calling area page to test VMPL
x86/sev: Move GHCB page based HV communication out of startup code
x86/sev: Avoid global variable to store virtual address of SVSM area
x86/sev: Move MSR save/restore out of early page state change helper
x86/sev: Share implementation of MSR-based page state change
x86/sev: Pass SVSM calling area down to early page state change API
x86/sev: Use boot SVSM CA for all startup and init code
x86/boot: Drop redundant RMPADJUST in SEV SVSM presence check
x86/sev: Unify SEV-SNP hypervisor feature check
x86/boot: Provide PIC aliases for 5-level paging related constants
x86/sev: Provide PIC aliases for SEV related data objects
x86/sev: Move __sev_[get|put]_ghcb() into separate noinstr object
x86/sev: Export startup routines for later use
objtool: Add action to check for absence of absolute relocations
x86/boot: Check startup code for absence of absolute relocations
x86/boot: Revert "Reject absolute references in .head.text"
x86/kbuild: Incorporate boot/startup/ via Kbuild makefile
x86/boot: Create a confined code area for startup code
efistub/x86: Remap inittext read-execute when needed
x86/boot: Move startup code out of __head section
x86/boot: Get rid of the .head.text section
arch/x86/Kbuild | 2 +
arch/x86/Makefile | 1 -
arch/x86/boot/compressed/Makefile | 2 +-
arch/x86/boot/compressed/misc.c | 2 +
arch/x86/boot/compressed/sev-handle-vc.c | 3 +
arch/x86/boot/compressed/sev.c | 132 ++-------
arch/x86/boot/startup/Makefile | 22 ++
arch/x86/boot/startup/exports.h | 14 +
arch/x86/boot/startup/gdt_idt.c | 4 +-
arch/x86/boot/startup/map_kernel.c | 4 +-
arch/x86/boot/startup/sev-shared.c | 313 ++++++--------------
arch/x86/boot/startup/sev-startup.c | 190 ++----------
arch/x86/boot/startup/sme.c | 29 +-
arch/x86/coco/sev/Makefile | 6 +-
arch/x86/coco/sev/core.c | 180 ++++++++---
arch/x86/coco/sev/{sev-nmi.c => noinstr.c} | 74 +++++
arch/x86/coco/sev/vc-handle.c | 2 +
arch/x86/coco/sev/vc-shared.c | 143 ++++++++-
arch/x86/include/asm/boot.h | 2 +
arch/x86/include/asm/init.h | 6 -
arch/x86/include/asm/setup.h | 1 +
arch/x86/include/asm/sev-internal.h | 29 +-
arch/x86/include/asm/sev.h | 19 +-
arch/x86/kernel/head64.c | 5 +-
arch/x86/kernel/head_32.S | 2 +-
arch/x86/kernel/head_64.S | 10 +-
arch/x86/kernel/vmlinux.lds.S | 9 +-
arch/x86/mm/mem_encrypt_amd.c | 6 -
arch/x86/mm/mem_encrypt_boot.S | 6 +-
arch/x86/platform/pvh/head.S | 2 +-
arch/x86/tools/relocs.c | 8 +-
drivers/firmware/efi/libstub/x86-stub.c | 4 +-
tools/objtool/builtin-check.c | 2 +
tools/objtool/check.c | 39 ++-
tools/objtool/include/objtool/builtin.h | 1 +
35 files changed, 632 insertions(+), 642 deletions(-)
create mode 100644 arch/x86/boot/startup/exports.h
rename arch/x86/coco/sev/{sev-nmi.c => noinstr.c} (61%)
--
2.50.0.727.gbf7dc18ff4-goog
* [PATCH v4 01/24] x86/sev: Separate MSR and GHCB based snp_cpuid() via a callback
From: Ard Biesheuvel @ 2025-07-09 8:08 UTC
To: linux-kernel
Cc: linux-efi, x86, Ard Biesheuvel, Borislav Petkov, Ingo Molnar,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky, Josh Poimboeuf,
Peter Zijlstra
From: Ard Biesheuvel <ardb@kernel.org>
There are two distinct callers of snp_cpuid(): one where the MSR
protocol is always used, and one where the GHCB page based interface is
always used.
The snp_cpuid() logic does not care about the distinction, which only
matters at a lower level. But the fact that it supports both interfaces
means that the GHCB page based logic is pulled into the early startup
code, where VA to PA conversions are problematic, given that it runs
from the 1:1 mapping of memory.
So keep snp_cpuid() itself in the startup code, but factor out the
hypervisor calls via a callback, so that the GHCB page handling can be
moved out.
Code refactoring only - no functional change intended.
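To make the refactoring concrete, the call sites change roughly as
follows (condensed from the diff below):

  /* before: a NULL ghcb argument selects the MSR protocol internally */
  ret = snp_cpuid(NULL, NULL, &leaf);
  ret = snp_cpuid(ghcb, ctxt, &leaf);

  /* after: the caller passes the hypervisor call implementation */
  ret = snp_cpuid(snp_cpuid_msr_prot, NULL, &leaf);
  ret = snp_cpuid(snp_cpuid_ghcb_prot, &ctx, &leaf); /* ctx wraps ghcb/ctxt */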
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/boot/startup/sev-shared.c | 64 ++++----------------
arch/x86/coco/sev/vc-shared.c | 49 ++++++++++++++-
arch/x86/include/asm/sev.h | 3 +-
3 files changed, 63 insertions(+), 53 deletions(-)
diff --git a/arch/x86/boot/startup/sev-shared.c b/arch/x86/boot/startup/sev-shared.c
index 7a706db87b93..992abfa50508 100644
--- a/arch/x86/boot/startup/sev-shared.c
+++ b/arch/x86/boot/startup/sev-shared.c
@@ -319,7 +319,7 @@ static int __sev_cpuid_hv(u32 fn, int reg_idx, u32 *reg)
return 0;
}
-static int __sev_cpuid_hv_msr(struct cpuid_leaf *leaf)
+static int __sev_cpuid_msr_prot(struct cpuid_leaf *leaf)
{
int ret;
@@ -342,44 +342,7 @@ static int __sev_cpuid_hv_msr(struct cpuid_leaf *leaf)
return ret;
}
-static int __sev_cpuid_hv_ghcb(struct ghcb *ghcb, struct es_em_ctxt *ctxt, struct cpuid_leaf *leaf)
-{
- u32 cr4 = native_read_cr4();
- int ret;
-
- ghcb_set_rax(ghcb, leaf->fn);
- ghcb_set_rcx(ghcb, leaf->subfn);
-
- if (cr4 & X86_CR4_OSXSAVE)
- /* Safe to read xcr0 */
- ghcb_set_xcr0(ghcb, xgetbv(XCR_XFEATURE_ENABLED_MASK));
- else
- /* xgetbv will cause #UD - use reset value for xcr0 */
- ghcb_set_xcr0(ghcb, 1);
-
- ret = sev_es_ghcb_hv_call(ghcb, ctxt, SVM_EXIT_CPUID, 0, 0);
- if (ret != ES_OK)
- return ret;
-
- if (!(ghcb_rax_is_valid(ghcb) &&
- ghcb_rbx_is_valid(ghcb) &&
- ghcb_rcx_is_valid(ghcb) &&
- ghcb_rdx_is_valid(ghcb)))
- return ES_VMM_ERROR;
- leaf->eax = ghcb->save.rax;
- leaf->ebx = ghcb->save.rbx;
- leaf->ecx = ghcb->save.rcx;
- leaf->edx = ghcb->save.rdx;
-
- return ES_OK;
-}
-
-static int sev_cpuid_hv(struct ghcb *ghcb, struct es_em_ctxt *ctxt, struct cpuid_leaf *leaf)
-{
- return ghcb ? __sev_cpuid_hv_ghcb(ghcb, ctxt, leaf)
- : __sev_cpuid_hv_msr(leaf);
-}
/*
* This may be called early while still running on the initial identity
@@ -484,21 +447,20 @@ snp_cpuid_get_validated_func(struct cpuid_leaf *leaf)
return false;
}
-static void snp_cpuid_hv(struct ghcb *ghcb, struct es_em_ctxt *ctxt, struct cpuid_leaf *leaf)
+static void snp_cpuid_msr_prot(void *ctx, struct cpuid_leaf *leaf)
{
- if (sev_cpuid_hv(ghcb, ctxt, leaf))
+ if (__sev_cpuid_msr_prot(leaf))
sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_CPUID_HV);
}
-static int __head
-snp_cpuid_postprocess(struct ghcb *ghcb, struct es_em_ctxt *ctxt,
- struct cpuid_leaf *leaf)
+static int __head snp_cpuid_postprocess(void (*cpuid)(void *ctx, struct cpuid_leaf *),
+ void *ctx, struct cpuid_leaf *leaf)
{
struct cpuid_leaf leaf_hv = *leaf;
switch (leaf->fn) {
case 0x1:
- snp_cpuid_hv(ghcb, ctxt, &leaf_hv);
+ cpuid(ctx, &leaf_hv);
/* initial APIC ID */
leaf->ebx = (leaf_hv.ebx & GENMASK(31, 24)) | (leaf->ebx & GENMASK(23, 0));
@@ -517,7 +479,7 @@ snp_cpuid_postprocess(struct ghcb *ghcb, struct es_em_ctxt *ctxt,
break;
case 0xB:
leaf_hv.subfn = 0;
- snp_cpuid_hv(ghcb, ctxt, &leaf_hv);
+ cpuid(ctx, &leaf_hv);
/* extended APIC ID */
leaf->edx = leaf_hv.edx;
@@ -565,7 +527,7 @@ snp_cpuid_postprocess(struct ghcb *ghcb, struct es_em_ctxt *ctxt,
}
break;
case 0x8000001E:
- snp_cpuid_hv(ghcb, ctxt, &leaf_hv);
+ cpuid(ctx, &leaf_hv);
/* extended APIC ID */
leaf->eax = leaf_hv.eax;
@@ -586,8 +548,8 @@ snp_cpuid_postprocess(struct ghcb *ghcb, struct es_em_ctxt *ctxt,
* Returns -EOPNOTSUPP if feature not enabled. Any other non-zero return value
* should be treated as fatal by caller.
*/
-int __head
-snp_cpuid(struct ghcb *ghcb, struct es_em_ctxt *ctxt, struct cpuid_leaf *leaf)
+int __head snp_cpuid(void (*cpuid)(void *ctx, struct cpuid_leaf *), void *ctx,
+ struct cpuid_leaf *leaf)
{
const struct snp_cpuid_table *cpuid_table = snp_cpuid_get_table();
@@ -621,7 +583,7 @@ snp_cpuid(struct ghcb *ghcb, struct es_em_ctxt *ctxt, struct cpuid_leaf *leaf)
return 0;
}
- return snp_cpuid_postprocess(ghcb, ctxt, leaf);
+ return snp_cpuid_postprocess(cpuid, ctx, leaf);
}
/*
@@ -648,14 +610,14 @@ void __head do_vc_no_ghcb(struct pt_regs *regs, unsigned long exit_code)
leaf.fn = fn;
leaf.subfn = subfn;
- ret = snp_cpuid(NULL, NULL, &leaf);
+ ret = snp_cpuid(snp_cpuid_msr_prot, NULL, &leaf);
if (!ret)
goto cpuid_done;
if (ret != -EOPNOTSUPP)
goto fail;
- if (__sev_cpuid_hv_msr(&leaf))
+ if (__sev_cpuid_msr_prot(&leaf))
goto fail;
cpuid_done:
diff --git a/arch/x86/coco/sev/vc-shared.c b/arch/x86/coco/sev/vc-shared.c
index 2c0ab0fdc060..776cb90be530 100644
--- a/arch/x86/coco/sev/vc-shared.c
+++ b/arch/x86/coco/sev/vc-shared.c
@@ -409,15 +409,62 @@ static enum es_result vc_handle_ioio(struct ghcb *ghcb, struct es_em_ctxt *ctxt)
return ret;
}
+static int __sev_cpuid_ghcb_prot(struct ghcb *ghcb, struct es_em_ctxt *ctxt, struct cpuid_leaf *leaf)
+{
+ u32 cr4 = native_read_cr4();
+ int ret;
+
+ ghcb_set_rax(ghcb, leaf->fn);
+ ghcb_set_rcx(ghcb, leaf->subfn);
+
+ if (cr4 & X86_CR4_OSXSAVE)
+ /* Safe to read xcr0 */
+ ghcb_set_xcr0(ghcb, xgetbv(XCR_XFEATURE_ENABLED_MASK));
+ else
+ /* xgetbv will cause #UD - use reset value for xcr0 */
+ ghcb_set_xcr0(ghcb, 1);
+
+ ret = sev_es_ghcb_hv_call(ghcb, ctxt, SVM_EXIT_CPUID, 0, 0);
+ if (ret != ES_OK)
+ return ret;
+
+ if (!(ghcb_rax_is_valid(ghcb) &&
+ ghcb_rbx_is_valid(ghcb) &&
+ ghcb_rcx_is_valid(ghcb) &&
+ ghcb_rdx_is_valid(ghcb)))
+ return ES_VMM_ERROR;
+
+ leaf->eax = ghcb->save.rax;
+ leaf->ebx = ghcb->save.rbx;
+ leaf->ecx = ghcb->save.rcx;
+ leaf->edx = ghcb->save.rdx;
+
+ return ES_OK;
+}
+
+struct cpuid_ctx {
+ struct ghcb *ghcb;
+ struct es_em_ctxt *ctxt;
+};
+
+static void snp_cpuid_ghcb_prot(void *p, struct cpuid_leaf *leaf)
+{
+ struct cpuid_ctx *ctx = p;
+
+ if (__sev_cpuid_ghcb_prot(ctx->ghcb, ctx->ctxt, leaf))
+ sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_CPUID_HV);
+}
+
static int vc_handle_cpuid_snp(struct ghcb *ghcb, struct es_em_ctxt *ctxt)
{
+ struct cpuid_ctx ctx = { ghcb, ctxt };
struct pt_regs *regs = ctxt->regs;
struct cpuid_leaf leaf;
int ret;
leaf.fn = regs->ax;
leaf.subfn = regs->cx;
- ret = snp_cpuid(ghcb, ctxt, &leaf);
+ ret = snp_cpuid(snp_cpuid_ghcb_prot, &ctx, &leaf);
if (!ret) {
regs->ax = leaf.eax;
regs->bx = leaf.ebx;
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 89075ff19afa..2cabf617de3c 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -552,7 +552,8 @@ struct cpuid_leaf {
u32 edx;
};
-int snp_cpuid(struct ghcb *ghcb, struct es_em_ctxt *ctxt, struct cpuid_leaf *leaf);
+int snp_cpuid(void (*cpuid_hv)(void *ctx, struct cpuid_leaf *),
+ void *ctx, struct cpuid_leaf *leaf);
void __noreturn sev_es_terminate(unsigned int set, unsigned int reason);
enum es_result sev_es_ghcb_hv_call(struct ghcb *ghcb,
--
2.50.0.727.gbf7dc18ff4-goog
* [PATCH v4 02/24] x86/sev: Use MSR protocol for remapping SVSM calling area
From: Ard Biesheuvel @ 2025-07-09 8:08 UTC
To: linux-kernel
Cc: linux-efi, x86, Ard Biesheuvel, Borislav Petkov, Ingo Molnar,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky, Josh Poimboeuf,
Peter Zijlstra
From: Ard Biesheuvel <ardb@kernel.org>
As the preceding code comment already indicates, remapping the SVSM
calling area occurs long before the GHCB page is configured, and so
calling svsm_perform_call_protocol() is guaranteed to result in a call
to svsm_perform_msr_protocol().
So just call the latter directly. This allows most of the GHCB based API
infrastructure to be moved out of the startup code in a subsequent
patch.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
---
arch/x86/boot/startup/sev-startup.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/arch/x86/boot/startup/sev-startup.c b/arch/x86/boot/startup/sev-startup.c
index 0b7e3b950183..08b8c24fb469 100644
--- a/arch/x86/boot/startup/sev-startup.c
+++ b/arch/x86/boot/startup/sev-startup.c
@@ -325,7 +325,9 @@ static __head void svsm_setup(struct cc_blob_sev_info *cc_info)
call.caa = svsm_get_caa();
call.rax = SVSM_CORE_CALL(SVSM_CORE_REMAP_CA);
call.rcx = pa;
- ret = svsm_perform_call_protocol(&call);
+ do {
+ ret = svsm_perform_msr_protocol(&call);
+ } while (ret == -EAGAIN);
if (ret)
sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_SVSM_CA_REMAP_FAIL);
--
2.50.0.727.gbf7dc18ff4-goog
* [PATCH v4 03/24] x86/sev: Use MSR protocol only for early SVSM PVALIDATE call
From: Ard Biesheuvel @ 2025-07-09 8:08 UTC
To: linux-kernel
Cc: linux-efi, x86, Ard Biesheuvel, Borislav Petkov, Ingo Molnar,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky, Josh Poimboeuf,
Peter Zijlstra
From: Ard Biesheuvel <ardb@kernel.org>
The early page state change API performs an SVSM call to PVALIDATE each
page when running under an SVSM, and this involves either a GHCB page
based call or a call based on the MSR protocol.
The GHCB page based variant involves VA to PA translation of the GHCB
address, and this is best avoided in the startup code, where virtual
addresses are ambiguous (1:1 or kernel virtual).
As this is the last remaining occurrence of svsm_perform_call_protocol()
in the startup code, switch to the MSR protocol exclusively in this
particular case, so that the GHCB based plumbing can be moved out of the
startup code entirely in a subsequent patch.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/boot/compressed/sev.c | 20 --------------------
arch/x86/boot/startup/sev-shared.c | 4 +++-
2 files changed, 3 insertions(+), 21 deletions(-)
diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index fd1b67dfea22..b71c1ab6a282 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -50,31 +50,11 @@ u64 svsm_get_caa_pa(void)
return boot_svsm_caa_pa;
}
-int svsm_perform_call_protocol(struct svsm_call *call);
-
u8 snp_vmpl;
/* Include code for early handlers */
#include "../../boot/startup/sev-shared.c"
-int svsm_perform_call_protocol(struct svsm_call *call)
-{
- struct ghcb *ghcb;
- int ret;
-
- if (boot_ghcb)
- ghcb = boot_ghcb;
- else
- ghcb = NULL;
-
- do {
- ret = ghcb ? svsm_perform_ghcb_protocol(ghcb, call)
- : svsm_perform_msr_protocol(call);
- } while (ret == -EAGAIN);
-
- return ret;
-}
-
static bool sev_snp_enabled(void)
{
return sev_status & MSR_AMD64_SEV_SNP_ENABLED;
diff --git a/arch/x86/boot/startup/sev-shared.c b/arch/x86/boot/startup/sev-shared.c
index 992abfa50508..1bb4b522dfaa 100644
--- a/arch/x86/boot/startup/sev-shared.c
+++ b/arch/x86/boot/startup/sev-shared.c
@@ -753,7 +753,9 @@ static void __head svsm_pval_4k_page(unsigned long paddr, bool validate)
call.rax = SVSM_CORE_CALL(SVSM_CORE_PVALIDATE);
call.rcx = pc_pa;
- ret = svsm_perform_call_protocol(&call);
+ do {
+ ret = svsm_perform_msr_protocol(&call);
+ } while (ret == -EAGAIN);
if (ret)
sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PVALIDATE);
--
2.50.0.727.gbf7dc18ff4-goog
* [PATCH v4 04/24] x86/sev: Run RMPADJUST on SVSM calling area page to test VMPL
From: Ard Biesheuvel @ 2025-07-09 8:08 UTC
To: linux-kernel
Cc: linux-efi, x86, Ard Biesheuvel, Borislav Petkov, Ingo Molnar,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky, Josh Poimboeuf,
Peter Zijlstra
From: Ard Biesheuvel <ardb@kernel.org>
Determining the VMPL at which the kernel runs involves performing an
RMPADJUST operation on an arbitrary page of memory and observing
whether it succeeds.
The use of boot_ghcb_page in the core kernel in this case is completely
arbitrary, but results in the need to provide a PIC alias for it. So use
boot_svsm_ca_page instead, which already needs this alias for other
reasons.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/boot/compressed/sev.c | 2 +-
arch/x86/boot/startup/sev-shared.c | 5 +++--
arch/x86/boot/startup/sev-startup.c | 2 +-
3 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index b71c1ab6a282..3628e9bddc6a 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -327,7 +327,7 @@ static bool early_snp_init(struct boot_params *bp)
* running at VMPL0. The CA will be used to communicate with the
* SVSM and request its services.
*/
- svsm_setup_ca(cc_info);
+ svsm_setup_ca(cc_info, rip_rel_ptr(&boot_ghcb_page));
/*
* Pass run-time kernel a pointer to CC info via boot_params so EFI
diff --git a/arch/x86/boot/startup/sev-shared.c b/arch/x86/boot/startup/sev-shared.c
index 1bb4b522dfaa..db89a55623f8 100644
--- a/arch/x86/boot/startup/sev-shared.c
+++ b/arch/x86/boot/startup/sev-shared.c
@@ -780,7 +780,8 @@ static void __head pvalidate_4k_page(unsigned long vaddr, unsigned long paddr,
* Maintain the GPA of the SVSM Calling Area (CA) in order to utilize the SVSM
* services needed when not running in VMPL0.
*/
-static bool __head svsm_setup_ca(const struct cc_blob_sev_info *cc_info)
+static bool __head svsm_setup_ca(const struct cc_blob_sev_info *cc_info,
+ void *page)
{
struct snp_secrets_page *secrets_page;
struct snp_cpuid_table *cpuid_table;
@@ -803,7 +804,7 @@ static bool __head svsm_setup_ca(const struct cc_blob_sev_info *cc_info)
* routine is running identity mapped when called, both by the decompressor
* code and the early kernel code.
*/
- if (!rmpadjust((unsigned long)rip_rel_ptr(&boot_ghcb_page), RMP_PG_SIZE_4K, 1))
+ if (!rmpadjust((unsigned long)page, RMP_PG_SIZE_4K, 1))
return false;
/*
diff --git a/arch/x86/boot/startup/sev-startup.c b/arch/x86/boot/startup/sev-startup.c
index 08b8c24fb469..0994986be908 100644
--- a/arch/x86/boot/startup/sev-startup.c
+++ b/arch/x86/boot/startup/sev-startup.c
@@ -303,7 +303,7 @@ static __head void svsm_setup(struct cc_blob_sev_info *cc_info)
* running at VMPL0. The CA will be used to communicate with the
* SVSM to perform the SVSM services.
*/
- if (!svsm_setup_ca(cc_info))
+ if (!svsm_setup_ca(cc_info, rip_rel_ptr(&boot_svsm_ca_page)))
return;
/*
--
2.50.0.727.gbf7dc18ff4-goog
* [PATCH v4 05/24] x86/sev: Move GHCB page based HV communication out of startup code
From: Ard Biesheuvel @ 2025-07-09 8:08 UTC
To: linux-kernel
Cc: linux-efi, x86, Ard Biesheuvel, Borislav Petkov, Ingo Molnar,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky, Josh Poimboeuf,
Peter Zijlstra
From: Ard Biesheuvel <ardb@kernel.org>
Both the decompressor and the core kernel implement an early #VC
handler, which only deals with CPUID instructions, and a full-featured
one, which can handle any #VC exception.
The former communicates with the hypervisor using the MSR based
protocol, whereas the latter uses a shared GHCB page, which is
configured a bit later during the boot, when the kernel runs from its
ordinary virtual mapping, rather than the 1:1 mapping that the startup
code uses.
Accessing this shared GHCB page from the core kernel's startup code is
problematic, because it involves converting the GHCB address provided by
the caller to a physical address. In the startup code, virtual to
physical address translations are problematic, given that the virtual
address might be a 1:1 mapped address, and such translations should
therefore be avoided.
This means that exposing startup code dealing with the GHCB to callers
that execute from the ordinary kernel virtual mapping should be avoided
too. So move all GHCB page based communication out of the startup code,
now that all communication occurring before the kernel virtual mapping
is up relies on the MSR protocol only.
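For reference, the two communication paths differ roughly as follows
(condensed from the code being moved; not a complete sequence):

  /* MSR protocol: one request/response word through the GHCB MSR */
  sev_es_wr_ghcb_msr(GHCB_MSR_PSC_REQ_GFN(paddr >> PAGE_SHIFT, op));
  VMGEXIT();
  val = sev_es_rd_ghcb_msr();            /* response code */

  /* GHCB page protocol: fill the shared page, point the MSR at it */
  ghcb_set_sw_exit_code(ghcb, exit_code);
  sev_es_wr_ghcb_msr(__pa(ghcb));        /* VA to PA translation! */
  VMGEXIT();

The __pa(ghcb) in the second variant is exactly the translation that
cannot be carried out reliably from the startup context.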
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/boot/compressed/sev-handle-vc.c | 3 +
arch/x86/boot/startup/sev-shared.c | 143 +-------------------
arch/x86/boot/startup/sev-startup.c | 42 ------
arch/x86/coco/sev/core.c | 76 +++++++++++
arch/x86/coco/sev/vc-handle.c | 2 +
arch/x86/coco/sev/vc-shared.c | 94 +++++++++++++
arch/x86/include/asm/sev-internal.h | 7 +-
arch/x86/include/asm/sev.h | 11 +-
8 files changed, 190 insertions(+), 188 deletions(-)
diff --git a/arch/x86/boot/compressed/sev-handle-vc.c b/arch/x86/boot/compressed/sev-handle-vc.c
index 89dd02de2a0f..7530ad8b768b 100644
--- a/arch/x86/boot/compressed/sev-handle-vc.c
+++ b/arch/x86/boot/compressed/sev-handle-vc.c
@@ -1,6 +1,7 @@
// SPDX-License-Identifier: GPL-2.0
#include "misc.h"
+#include "error.h"
#include "sev.h"
#include <linux/kernel.h>
@@ -14,6 +15,8 @@
#include <asm/fpu/xcr.h>
#define __BOOT_COMPRESSED
+#undef __init
+#define __init
/* Basic instruction decoding support needed */
#include "../../lib/inat.c"
diff --git a/arch/x86/boot/startup/sev-shared.c b/arch/x86/boot/startup/sev-shared.c
index db89a55623f8..7d2415d690c9 100644
--- a/arch/x86/boot/startup/sev-shared.c
+++ b/arch/x86/boot/startup/sev-shared.c
@@ -13,12 +13,9 @@
#ifndef __BOOT_COMPRESSED
#define error(v) pr_err(v)
-#define has_cpuflag(f) boot_cpu_has(f)
#else
#undef WARN
#define WARN(condition, format...) (!!(condition))
-#undef vc_forward_exception
-#define vc_forward_exception(c) panic("SNP: Hypervisor requested exception\n")
#endif
/*
@@ -39,7 +36,7 @@ u64 boot_svsm_caa_pa __ro_after_init;
*
* GHCB protocol version negotiated with the hypervisor.
*/
-static u16 ghcb_version __ro_after_init;
+u16 ghcb_version __ro_after_init;
/* Copy of the SNP firmware's CPUID page. */
static struct snp_cpuid_table cpuid_table_copy __ro_after_init;
@@ -54,16 +51,6 @@ static u32 cpuid_std_range_max __ro_after_init;
static u32 cpuid_hyp_range_max __ro_after_init;
static u32 cpuid_ext_range_max __ro_after_init;
-bool __init sev_es_check_cpu_features(void)
-{
- if (!has_cpuflag(X86_FEATURE_RDRAND)) {
- error("RDRAND instruction not supported - no trusted source of randomness available\n");
- return false;
- }
-
- return true;
-}
-
void __head __noreturn
sev_es_terminate(unsigned int set, unsigned int reason)
{
@@ -100,72 +87,7 @@ u64 get_hv_features(void)
return GHCB_MSR_HV_FT_RESP_VAL(val);
}
-void snp_register_ghcb_early(unsigned long paddr)
-{
- unsigned long pfn = paddr >> PAGE_SHIFT;
- u64 val;
-
- sev_es_wr_ghcb_msr(GHCB_MSR_REG_GPA_REQ_VAL(pfn));
- VMGEXIT();
-
- val = sev_es_rd_ghcb_msr();
-
- /* If the response GPA is not ours then abort the guest */
- if ((GHCB_RESP_CODE(val) != GHCB_MSR_REG_GPA_RESP) ||
- (GHCB_MSR_REG_GPA_RESP_VAL(val) != pfn))
- sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_REGISTER);
-}
-
-bool sev_es_negotiate_protocol(void)
-{
- u64 val;
-
- /* Do the GHCB protocol version negotiation */
- sev_es_wr_ghcb_msr(GHCB_MSR_SEV_INFO_REQ);
- VMGEXIT();
- val = sev_es_rd_ghcb_msr();
-
- if (GHCB_MSR_INFO(val) != GHCB_MSR_SEV_INFO_RESP)
- return false;
-
- if (GHCB_MSR_PROTO_MAX(val) < GHCB_PROTOCOL_MIN ||
- GHCB_MSR_PROTO_MIN(val) > GHCB_PROTOCOL_MAX)
- return false;
-
- ghcb_version = min_t(size_t, GHCB_MSR_PROTO_MAX(val), GHCB_PROTOCOL_MAX);
-
- return true;
-}
-
-static enum es_result verify_exception_info(struct ghcb *ghcb, struct es_em_ctxt *ctxt)
-{
- u32 ret;
-
- ret = ghcb->save.sw_exit_info_1 & GENMASK_ULL(31, 0);
- if (!ret)
- return ES_OK;
-
- if (ret == 1) {
- u64 info = ghcb->save.sw_exit_info_2;
- unsigned long v = info & SVM_EVTINJ_VEC_MASK;
-
- /* Check if exception information from hypervisor is sane. */
- if ((info & SVM_EVTINJ_VALID) &&
- ((v == X86_TRAP_GP) || (v == X86_TRAP_UD)) &&
- ((info & SVM_EVTINJ_TYPE_MASK) == SVM_EVTINJ_TYPE_EXEPT)) {
- ctxt->fi.vector = v;
-
- if (info & SVM_EVTINJ_VALID_ERR)
- ctxt->fi.error_code = info >> 32;
-
- return ES_EXCEPTION;
- }
- }
-
- return ES_VMM_ERROR;
-}
-
-static inline int svsm_process_result_codes(struct svsm_call *call)
+int svsm_process_result_codes(struct svsm_call *call)
{
switch (call->rax_out) {
case SVSM_SUCCESS:
@@ -193,7 +115,7 @@ static inline int svsm_process_result_codes(struct svsm_call *call)
* - RAX specifies the SVSM protocol/callid as input and the return code
* as output.
*/
-static __always_inline void svsm_issue_call(struct svsm_call *call, u8 *pending)
+void svsm_issue_call(struct svsm_call *call, u8 *pending)
{
register unsigned long rax asm("rax") = call->rax;
register unsigned long rcx asm("rcx") = call->rcx;
@@ -216,7 +138,7 @@ static __always_inline void svsm_issue_call(struct svsm_call *call, u8 *pending)
call->r9_out = r9;
}
-static int svsm_perform_msr_protocol(struct svsm_call *call)
+int svsm_perform_msr_protocol(struct svsm_call *call)
{
u8 pending = 0;
u64 val, resp;
@@ -247,63 +169,6 @@ static int svsm_perform_msr_protocol(struct svsm_call *call)
return svsm_process_result_codes(call);
}
-static int svsm_perform_ghcb_protocol(struct ghcb *ghcb, struct svsm_call *call)
-{
- struct es_em_ctxt ctxt;
- u8 pending = 0;
-
- vc_ghcb_invalidate(ghcb);
-
- /*
- * Fill in protocol and format specifiers. This can be called very early
- * in the boot, so use rip-relative references as needed.
- */
- ghcb->protocol_version = ghcb_version;
- ghcb->ghcb_usage = GHCB_DEFAULT_USAGE;
-
- ghcb_set_sw_exit_code(ghcb, SVM_VMGEXIT_SNP_RUN_VMPL);
- ghcb_set_sw_exit_info_1(ghcb, 0);
- ghcb_set_sw_exit_info_2(ghcb, 0);
-
- sev_es_wr_ghcb_msr(__pa(ghcb));
-
- svsm_issue_call(call, &pending);
-
- if (pending)
- return -EINVAL;
-
- switch (verify_exception_info(ghcb, &ctxt)) {
- case ES_OK:
- break;
- case ES_EXCEPTION:
- vc_forward_exception(&ctxt);
- fallthrough;
- default:
- return -EINVAL;
- }
-
- return svsm_process_result_codes(call);
-}
-
-enum es_result sev_es_ghcb_hv_call(struct ghcb *ghcb,
- struct es_em_ctxt *ctxt,
- u64 exit_code, u64 exit_info_1,
- u64 exit_info_2)
-{
- /* Fill in protocol and format specifiers */
- ghcb->protocol_version = ghcb_version;
- ghcb->ghcb_usage = GHCB_DEFAULT_USAGE;
-
- ghcb_set_sw_exit_code(ghcb, exit_code);
- ghcb_set_sw_exit_info_1(ghcb, exit_info_1);
- ghcb_set_sw_exit_info_2(ghcb, exit_info_2);
-
- sev_es_wr_ghcb_msr(__pa(ghcb));
- VMGEXIT();
-
- return verify_exception_info(ghcb, ctxt);
-}
-
static int __sev_cpuid_hv(u32 fn, int reg_idx, u32 *reg)
{
u64 val;
diff --git a/arch/x86/boot/startup/sev-startup.c b/arch/x86/boot/startup/sev-startup.c
index 0994986be908..050e071504af 100644
--- a/arch/x86/boot/startup/sev-startup.c
+++ b/arch/x86/boot/startup/sev-startup.c
@@ -41,15 +41,6 @@
#include <asm/cpuid/api.h>
#include <asm/cmdline.h>
-/* For early boot hypervisor communication in SEV-ES enabled guests */
-struct ghcb boot_ghcb_page __bss_decrypted __aligned(PAGE_SIZE);
-
-/*
- * Needs to be in the .data section because we need it NULL before bss is
- * cleared
- */
-struct ghcb *boot_ghcb __section(".data");
-
/* Bitmap of SEV features supported by the hypervisor */
u64 sev_hv_features __ro_after_init;
@@ -139,39 +130,6 @@ noinstr void __sev_put_ghcb(struct ghcb_state *state)
}
}
-int svsm_perform_call_protocol(struct svsm_call *call)
-{
- struct ghcb_state state;
- unsigned long flags;
- struct ghcb *ghcb;
- int ret;
-
- /*
- * This can be called very early in the boot, use native functions in
- * order to avoid paravirt issues.
- */
- flags = native_local_irq_save();
-
- if (sev_cfg.ghcbs_initialized)
- ghcb = __sev_get_ghcb(&state);
- else if (boot_ghcb)
- ghcb = boot_ghcb;
- else
- ghcb = NULL;
-
- do {
- ret = ghcb ? svsm_perform_ghcb_protocol(ghcb, call)
- : svsm_perform_msr_protocol(call);
- } while (ret == -EAGAIN);
-
- if (sev_cfg.ghcbs_initialized)
- __sev_put_ghcb(&state);
-
- native_local_irq_restore(flags);
-
- return ret;
-}
-
void __head
early_set_pages_state(unsigned long vaddr, unsigned long paddr,
unsigned long npages, enum psc_op op)
diff --git a/arch/x86/coco/sev/core.c b/arch/x86/coco/sev/core.c
index fc59ce78c477..15be9e52848d 100644
--- a/arch/x86/coco/sev/core.c
+++ b/arch/x86/coco/sev/core.c
@@ -101,6 +101,15 @@ DEFINE_PER_CPU(struct sev_es_save_area *, sev_vmsa);
u8 snp_vmpl __ro_after_init;
EXPORT_SYMBOL_GPL(snp_vmpl);
+/* For early boot hypervisor communication in SEV-ES enabled guests */
+static struct ghcb boot_ghcb_page __bss_decrypted __aligned(PAGE_SIZE);
+
+/*
+ * Needs to be in the .data section because we need it NULL before bss is
+ * cleared
+ */
+struct ghcb *boot_ghcb __section(".data");
+
static u64 __init get_snp_jump_table_addr(void)
{
struct snp_secrets_page *secrets;
@@ -154,6 +163,73 @@ static u64 __init get_jump_table_addr(void)
return ret;
}
+static int svsm_perform_ghcb_protocol(struct ghcb *ghcb, struct svsm_call *call)
+{
+ struct es_em_ctxt ctxt;
+ u8 pending = 0;
+
+ vc_ghcb_invalidate(ghcb);
+
+ /*
+ * Fill in protocol and format specifiers. This can be called very early
+ * in the boot, so use rip-relative references as needed.
+ */
+ ghcb->protocol_version = ghcb_version;
+ ghcb->ghcb_usage = GHCB_DEFAULT_USAGE;
+
+ ghcb_set_sw_exit_code(ghcb, SVM_VMGEXIT_SNP_RUN_VMPL);
+ ghcb_set_sw_exit_info_1(ghcb, 0);
+ ghcb_set_sw_exit_info_2(ghcb, 0);
+
+ sev_es_wr_ghcb_msr(__pa(ghcb));
+
+ svsm_issue_call(call, &pending);
+
+ if (pending)
+ return -EINVAL;
+
+ switch (verify_exception_info(ghcb, &ctxt)) {
+ case ES_OK:
+ break;
+ case ES_EXCEPTION:
+ vc_forward_exception(&ctxt);
+ fallthrough;
+ default:
+ return -EINVAL;
+ }
+
+ return svsm_process_result_codes(call);
+}
+
+static int svsm_perform_call_protocol(struct svsm_call *call)
+{
+ struct ghcb_state state;
+ unsigned long flags;
+ struct ghcb *ghcb;
+ int ret;
+
+ flags = native_local_irq_save();
+
+ if (sev_cfg.ghcbs_initialized)
+ ghcb = __sev_get_ghcb(&state);
+ else if (boot_ghcb)
+ ghcb = boot_ghcb;
+ else
+ ghcb = NULL;
+
+ do {
+ ret = ghcb ? svsm_perform_ghcb_protocol(ghcb, call)
+ : svsm_perform_msr_protocol(call);
+ } while (ret == -EAGAIN);
+
+ if (sev_cfg.ghcbs_initialized)
+ __sev_put_ghcb(&state);
+
+ native_local_irq_restore(flags);
+
+ return ret;
+}
+
static inline void __pval_terminate(u64 pfn, bool action, unsigned int page_size,
int ret, u64 svsm_ret)
{
diff --git a/arch/x86/coco/sev/vc-handle.c b/arch/x86/coco/sev/vc-handle.c
index faf1fce89ed4..9a5e16f70e83 100644
--- a/arch/x86/coco/sev/vc-handle.c
+++ b/arch/x86/coco/sev/vc-handle.c
@@ -351,6 +351,8 @@ static enum es_result vc_read_mem(struct es_em_ctxt *ctxt,
}
#define sev_printk(fmt, ...) printk(fmt, ##__VA_ARGS__)
+#define error(v)
+#define has_cpuflag(f) boot_cpu_has(f)
#include "vc-shared.c"
diff --git a/arch/x86/coco/sev/vc-shared.c b/arch/x86/coco/sev/vc-shared.c
index 776cb90be530..3d44474f46e7 100644
--- a/arch/x86/coco/sev/vc-shared.c
+++ b/arch/x86/coco/sev/vc-shared.c
@@ -409,6 +409,53 @@ static enum es_result vc_handle_ioio(struct ghcb *ghcb, struct es_em_ctxt *ctxt)
return ret;
}
+enum es_result verify_exception_info(struct ghcb *ghcb, struct es_em_ctxt *ctxt)
+{
+ u32 ret;
+
+ ret = ghcb->save.sw_exit_info_1 & GENMASK_ULL(31, 0);
+ if (!ret)
+ return ES_OK;
+
+ if (ret == 1) {
+ u64 info = ghcb->save.sw_exit_info_2;
+ unsigned long v = info & SVM_EVTINJ_VEC_MASK;
+
+ /* Check if exception information from hypervisor is sane. */
+ if ((info & SVM_EVTINJ_VALID) &&
+ ((v == X86_TRAP_GP) || (v == X86_TRAP_UD)) &&
+ ((info & SVM_EVTINJ_TYPE_MASK) == SVM_EVTINJ_TYPE_EXEPT)) {
+ ctxt->fi.vector = v;
+
+ if (info & SVM_EVTINJ_VALID_ERR)
+ ctxt->fi.error_code = info >> 32;
+
+ return ES_EXCEPTION;
+ }
+ }
+
+ return ES_VMM_ERROR;
+}
+
+enum es_result sev_es_ghcb_hv_call(struct ghcb *ghcb,
+ struct es_em_ctxt *ctxt,
+ u64 exit_code, u64 exit_info_1,
+ u64 exit_info_2)
+{
+ /* Fill in protocol and format specifiers */
+ ghcb->protocol_version = ghcb_version;
+ ghcb->ghcb_usage = GHCB_DEFAULT_USAGE;
+
+ ghcb_set_sw_exit_code(ghcb, exit_code);
+ ghcb_set_sw_exit_info_1(ghcb, exit_info_1);
+ ghcb_set_sw_exit_info_2(ghcb, exit_info_2);
+
+ sev_es_wr_ghcb_msr(__pa(ghcb));
+ VMGEXIT();
+
+ return verify_exception_info(ghcb, ctxt);
+}
+
static int __sev_cpuid_ghcb_prot(struct ghcb *ghcb, struct es_em_ctxt *ctxt, struct cpuid_leaf *leaf)
{
u32 cr4 = native_read_cr4();
@@ -549,3 +596,50 @@ static enum es_result vc_handle_rdtsc(struct ghcb *ghcb,
return ES_OK;
}
+
+void snp_register_ghcb_early(unsigned long paddr)
+{
+ unsigned long pfn = paddr >> PAGE_SHIFT;
+ u64 val;
+
+ sev_es_wr_ghcb_msr(GHCB_MSR_REG_GPA_REQ_VAL(pfn));
+ VMGEXIT();
+
+ val = sev_es_rd_ghcb_msr();
+
+ /* If the response GPA is not ours then abort the guest */
+ if ((GHCB_RESP_CODE(val) != GHCB_MSR_REG_GPA_RESP) ||
+ (GHCB_MSR_REG_GPA_RESP_VAL(val) != pfn))
+ sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_REGISTER);
+}
+
+bool __init sev_es_check_cpu_features(void)
+{
+ if (!has_cpuflag(X86_FEATURE_RDRAND)) {
+ error("RDRAND instruction not supported - no trusted source of randomness available\n");
+ return false;
+ }
+
+ return true;
+}
+
+bool sev_es_negotiate_protocol(void)
+{
+ u64 val;
+
+ /* Do the GHCB protocol version negotiation */
+ sev_es_wr_ghcb_msr(GHCB_MSR_SEV_INFO_REQ);
+ VMGEXIT();
+ val = sev_es_rd_ghcb_msr();
+
+ if (GHCB_MSR_INFO(val) != GHCB_MSR_SEV_INFO_RESP)
+ return false;
+
+ if (GHCB_MSR_PROTO_MAX(val) < GHCB_PROTOCOL_MIN ||
+ GHCB_MSR_PROTO_MIN(val) > GHCB_PROTOCOL_MAX)
+ return false;
+
+ ghcb_version = min_t(size_t, GHCB_MSR_PROTO_MAX(val), GHCB_PROTOCOL_MAX);
+
+ return true;
+}
diff --git a/arch/x86/include/asm/sev-internal.h b/arch/x86/include/asm/sev-internal.h
index 3dfd306d1c9e..6199b35a82e4 100644
--- a/arch/x86/include/asm/sev-internal.h
+++ b/arch/x86/include/asm/sev-internal.h
@@ -2,7 +2,6 @@
#define DR7_RESET_VALUE 0x400
-extern struct ghcb boot_ghcb_page;
extern u64 sev_hv_features;
extern u64 sev_secrets_pa;
@@ -80,7 +79,8 @@ static __always_inline u64 svsm_get_caa_pa(void)
return boot_svsm_caa_pa;
}
-int svsm_perform_call_protocol(struct svsm_call *call);
+enum es_result verify_exception_info(struct ghcb *ghcb, struct es_em_ctxt *ctxt);
+void vc_forward_exception(struct es_em_ctxt *ctxt);
static inline u64 sev_es_rd_ghcb_msr(void)
{
@@ -97,9 +97,6 @@ static __always_inline void sev_es_wr_ghcb_msr(u64 val)
native_wrmsr(MSR_AMD64_SEV_ES_GHCB, low, high);
}
-void snp_register_ghcb_early(unsigned long paddr);
-bool sev_es_negotiate_protocol(void);
-bool sev_es_check_cpu_features(void);
u64 get_hv_features(void);
const struct snp_cpuid_table *snp_cpuid_get_table(void);
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 2cabf617de3c..135e91a17d04 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -503,6 +503,7 @@ static inline int pvalidate(unsigned long vaddr, bool rmp_psize, bool validate)
}
void setup_ghcb(void);
+void snp_register_ghcb_early(unsigned long paddr);
void early_snp_set_memory_private(unsigned long vaddr, unsigned long paddr,
unsigned long npages);
void early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr,
@@ -540,8 +541,6 @@ static __always_inline void vc_ghcb_invalidate(struct ghcb *ghcb)
__builtin_memset(ghcb->save.valid_bitmap, 0, sizeof(ghcb->save.valid_bitmap));
}
-void vc_forward_exception(struct es_em_ctxt *ctxt);
-
/* I/O parameters for CPUID-related helpers */
struct cpuid_leaf {
u32 fn;
@@ -552,15 +551,23 @@ struct cpuid_leaf {
u32 edx;
};
+int svsm_perform_msr_protocol(struct svsm_call *call);
int snp_cpuid(void (*cpuid_hv)(void *ctx, struct cpuid_leaf *),
void *ctx, struct cpuid_leaf *leaf);
+void svsm_issue_call(struct svsm_call *call, u8 *pending);
+int svsm_process_result_codes(struct svsm_call *call);
+
void __noreturn sev_es_terminate(unsigned int set, unsigned int reason);
enum es_result sev_es_ghcb_hv_call(struct ghcb *ghcb,
struct es_em_ctxt *ctxt,
u64 exit_code, u64 exit_info_1,
u64 exit_info_2);
+bool sev_es_negotiate_protocol(void);
+bool sev_es_check_cpu_features(void);
+
+extern u16 ghcb_version;
extern struct ghcb *boot_ghcb;
#else /* !CONFIG_AMD_MEM_ENCRYPT */
--
2.50.0.727.gbf7dc18ff4-goog
* [PATCH v4 06/24] x86/sev: Avoid global variable to store virtual address of SVSM area
From: Ard Biesheuvel @ 2025-07-09 8:08 UTC
To: linux-kernel
Cc: linux-efi, x86, Ard Biesheuvel, Borislav Petkov, Ingo Molnar,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky, Josh Poimboeuf,
Peter Zijlstra
From: Ard Biesheuvel <ardb@kernel.org>
The boot-time SVSM calling area is used both by the startup code
running from the 1:1 mapping, and potentially later on by code running
from the ordinary kernel mapping.

This SVSM calling area is statically allocated, so its physical address
doesn't change. However, its virtual address depends on the calling
context (1:1 mapping or kernel virtual mapping), and even though the
variable that holds the virtual address of this calling area gets
updated from the 1:1 address to the kernel address during boot, it is
hard to reason about why this is guaranteed to be safe.
So instead, take the RIP-relative address of the boottime SVSM calling
area whenever its virtual address is required, and only use a global
variable for the physical address.
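A RIP-relative reference is unambiguous because it is resolved relative
to wherever the code is currently executing, so it yields the 1:1
address early on and the kernel virtual address later. The helper is
essentially:

  static __always_inline void *rip_rel_ptr(void *p)
  {
          asm("leaq %c1(%%rip), %0" : "=r"(p) : "i"(p));
          return p;
  }

  /* correct in both contexts: */
  struct svsm_ca *caa = rip_rel_ptr(&boot_svsm_ca_page);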
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/boot/compressed/sev.c | 5 ++---
arch/x86/boot/startup/sev-shared.c | 6 ------
arch/x86/boot/startup/sev-startup.c | 4 ++--
arch/x86/coco/sev/core.c | 9 ---------
arch/x86/include/asm/sev-internal.h | 3 +--
arch/x86/include/asm/sev.h | 2 --
arch/x86/mm/mem_encrypt_amd.c | 6 ------
7 files changed, 5 insertions(+), 30 deletions(-)
diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index 3628e9bddc6a..6c0f91d38595 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -37,12 +37,12 @@ struct ghcb *boot_ghcb;
#define __BOOT_COMPRESSED
-extern struct svsm_ca *boot_svsm_caa;
extern u64 boot_svsm_caa_pa;
struct svsm_ca *svsm_get_caa(void)
{
- return boot_svsm_caa;
+ /* The decompressor is mapped 1:1 so VA == PA */
+ return (struct svsm_ca *)boot_svsm_caa_pa;
}
u64 svsm_get_caa_pa(void)
@@ -530,7 +530,6 @@ bool early_is_sevsnp_guest(void)
/* Obtain the address of the calling area to use */
boot_rdmsr(MSR_SVSM_CAA, &m);
- boot_svsm_caa = (void *)m.q;
boot_svsm_caa_pa = m.q;
/*
diff --git a/arch/x86/boot/startup/sev-shared.c b/arch/x86/boot/startup/sev-shared.c
index 7d2415d690c9..7ca59038269f 100644
--- a/arch/x86/boot/startup/sev-shared.c
+++ b/arch/x86/boot/startup/sev-shared.c
@@ -26,7 +26,6 @@
* early boot, both with identity mapped virtual addresses and proper kernel
* virtual addresses.
*/
-struct svsm_ca *boot_svsm_caa __ro_after_init;
u64 boot_svsm_caa_pa __ro_after_init;
/*
@@ -697,11 +696,6 @@ static bool __head svsm_setup_ca(const struct cc_blob_sev_info *cc_info,
if (caa & (PAGE_SIZE - 1))
sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_SVSM_CAA);
- /*
- * The CA is identity mapped when this routine is called, both by the
- * decompressor code and the early kernel code.
- */
- boot_svsm_caa = (struct svsm_ca *)caa;
boot_svsm_caa_pa = caa;
/* Advertise the SVSM presence via CPUID. */
diff --git a/arch/x86/boot/startup/sev-startup.c b/arch/x86/boot/startup/sev-startup.c
index 050e071504af..8edf1ba78a48 100644
--- a/arch/x86/boot/startup/sev-startup.c
+++ b/arch/x86/boot/startup/sev-startup.c
@@ -252,6 +252,7 @@ static __head struct cc_blob_sev_info *find_cc_blob(struct boot_params *bp)
static __head void svsm_setup(struct cc_blob_sev_info *cc_info)
{
+ struct snp_secrets_page *secrets = (void *)cc_info->secrets_phys;
struct svsm_call call = {};
int ret;
u64 pa;
@@ -280,7 +281,7 @@ static __head void svsm_setup(struct cc_blob_sev_info *cc_info)
* RAX = 0 (Protocol=0, CallID=0)
* RCX = New CA GPA
*/
- call.caa = svsm_get_caa();
+ call.caa = (struct svsm_ca *)secrets->svsm_caa;
call.rax = SVSM_CORE_CALL(SVSM_CORE_REMAP_CA);
call.rcx = pa;
do {
@@ -289,7 +290,6 @@ static __head void svsm_setup(struct cc_blob_sev_info *cc_info)
if (ret)
sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_SVSM_CA_REMAP_FAIL);
- boot_svsm_caa = (struct svsm_ca *)pa;
boot_svsm_caa_pa = pa;
}
diff --git a/arch/x86/coco/sev/core.c b/arch/x86/coco/sev/core.c
index 15be9e52848d..bea67d017bf0 100644
--- a/arch/x86/coco/sev/core.c
+++ b/arch/x86/coco/sev/core.c
@@ -1643,15 +1643,6 @@ void sev_show_status(void)
pr_cont("\n");
}
-void __init snp_update_svsm_ca(void)
-{
- if (!snp_vmpl)
- return;
-
- /* Update the CAA to a proper kernel address */
- boot_svsm_caa = &boot_svsm_ca_page;
-}
-
#ifdef CONFIG_SYSFS
static ssize_t vmpl_show(struct kobject *kobj,
struct kobj_attribute *attr, char *buf)
diff --git a/arch/x86/include/asm/sev-internal.h b/arch/x86/include/asm/sev-internal.h
index 6199b35a82e4..ffe4755962fe 100644
--- a/arch/x86/include/asm/sev-internal.h
+++ b/arch/x86/include/asm/sev-internal.h
@@ -60,7 +60,6 @@ void early_set_pages_state(unsigned long vaddr, unsigned long paddr,
DECLARE_PER_CPU(struct svsm_ca *, svsm_caa);
DECLARE_PER_CPU(u64, svsm_caa_pa);
-extern struct svsm_ca *boot_svsm_caa;
extern u64 boot_svsm_caa_pa;
static __always_inline struct svsm_ca *svsm_get_caa(void)
@@ -68,7 +67,7 @@ static __always_inline struct svsm_ca *svsm_get_caa(void)
if (sev_cfg.use_cas)
return this_cpu_read(svsm_caa);
else
- return boot_svsm_caa;
+ return rip_rel_ptr(&boot_svsm_ca_page);
}
static __always_inline u64 svsm_get_caa_pa(void)
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 135e91a17d04..f3acbfcdca9a 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -519,7 +519,6 @@ void snp_accept_memory(phys_addr_t start, phys_addr_t end);
u64 snp_get_unsupported_features(u64 status);
u64 sev_get_status(void);
void sev_show_status(void);
-void snp_update_svsm_ca(void);
int prepare_pte_enc(struct pte_enc_desc *d);
void set_pte_enc_mask(pte_t *kpte, unsigned long pfn, pgprot_t new_prot);
void snp_kexec_finish(void);
@@ -600,7 +599,6 @@ static inline void snp_accept_memory(phys_addr_t start, phys_addr_t end) { }
static inline u64 snp_get_unsupported_features(u64 status) { return 0; }
static inline u64 sev_get_status(void) { return 0; }
static inline void sev_show_status(void) { }
-static inline void snp_update_svsm_ca(void) { }
static inline int prepare_pte_enc(struct pte_enc_desc *d) { return 0; }
static inline void set_pte_enc_mask(pte_t *kpte, unsigned long pfn, pgprot_t new_prot) { }
static inline void snp_kexec_finish(void) { }
diff --git a/arch/x86/mm/mem_encrypt_amd.c b/arch/x86/mm/mem_encrypt_amd.c
index faf3a13fb6ba..2f8c32173972 100644
--- a/arch/x86/mm/mem_encrypt_amd.c
+++ b/arch/x86/mm/mem_encrypt_amd.c
@@ -536,12 +536,6 @@ void __init sme_early_init(void)
x86_init.resources.dmi_setup = snp_dmi_setup;
}
- /*
- * Switch the SVSM CA mapping (if active) from identity mapped to
- * kernel mapped.
- */
- snp_update_svsm_ca();
-
if (sev_status & MSR_AMD64_SNP_SECURE_TSC)
setup_force_cpu_cap(X86_FEATURE_TSC_RELIABLE);
}
--
2.50.0.727.gbf7dc18ff4-goog
* [PATCH v4 07/24] x86/sev: Move MSR save/restore out of early page state change helper
From: Ard Biesheuvel @ 2025-07-09 8:08 UTC
To: linux-kernel
Cc: linux-efi, x86, Ard Biesheuvel, Borislav Petkov, Ingo Molnar,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky, Josh Poimboeuf,
Peter Zijlstra
From: Ard Biesheuvel <ardb@kernel.org>
The function __page_state_change() in the decompressor is very similar
to the loop in early_set_pages_state(), and they can share this code
once the MSR save/restore is moved out.
This also avoids unnecessarily preserving and restoring the MSR for
each individual page in a longer sequence.
This simplifies subsequent changes, where the APIs used by
__page_state_change() are modified for better separation between startup
code and ordinary code.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/boot/compressed/sev.c | 17 +++++++++++------
1 file changed, 11 insertions(+), 6 deletions(-)
diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index 6c0f91d38595..f00f68175f14 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -71,9 +71,6 @@ static void __page_state_change(unsigned long paddr, enum psc_op op)
if (op == SNP_PAGE_STATE_SHARED)
pvalidate_4k_page(paddr, paddr, false);
- /* Save the current GHCB MSR value */
- msr = sev_es_rd_ghcb_msr();
-
/* Issue VMGEXIT to change the page state in RMP table. */
sev_es_wr_ghcb_msr(GHCB_MSR_PSC_REQ_GFN(paddr >> PAGE_SHIFT, op));
VMGEXIT();
@@ -83,9 +80,6 @@ static void __page_state_change(unsigned long paddr, enum psc_op op)
if ((GHCB_RESP_CODE(val) != GHCB_MSR_PSC_RESP) || GHCB_MSR_PSC_RESP_VAL(val))
sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PSC);
- /* Restore the GHCB MSR value */
- sev_es_wr_ghcb_msr(msr);
-
/*
* Now that page state is changed in the RMP table, validate it so that it is
* consistent with the RMP entry.
@@ -96,18 +90,26 @@ static void __page_state_change(unsigned long paddr, enum psc_op op)
void snp_set_page_private(unsigned long paddr)
{
+ u64 msr;
+
if (!sev_snp_enabled())
return;
+ msr = sev_es_rd_ghcb_msr();
__page_state_change(paddr, SNP_PAGE_STATE_PRIVATE);
+ sev_es_wr_ghcb_msr(msr);
}
void snp_set_page_shared(unsigned long paddr)
{
+ u64 msr;
+
if (!sev_snp_enabled())
return;
+ msr = sev_es_rd_ghcb_msr();
__page_state_change(paddr, SNP_PAGE_STATE_SHARED);
+ sev_es_wr_ghcb_msr(msr);
}
bool early_setup_ghcb(void)
@@ -132,8 +134,11 @@ bool early_setup_ghcb(void)
void snp_accept_memory(phys_addr_t start, phys_addr_t end)
{
+ u64 msr = sev_es_rd_ghcb_msr();
+
for (phys_addr_t pa = start; pa < end; pa += PAGE_SIZE)
__page_state_change(pa, SNP_PAGE_STATE_PRIVATE);
+ sev_es_wr_ghcb_msr(msr);
}
void sev_es_shutdown_ghcb(void)
--
2.50.0.727.gbf7dc18ff4-goog
* [PATCH v4 08/24] x86/sev: Share implementation of MSR-based page state change
From: Ard Biesheuvel @ 2025-07-09 8:08 UTC
To: linux-kernel
Cc: linux-efi, x86, Ard Biesheuvel, Borislav Petkov, Ingo Molnar,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky, Josh Poimboeuf,
Peter Zijlstra
From: Ard Biesheuvel <ardb@kernel.org>
Both the decompressor and the SEV startup code implement the exact same
sequence for invoking the MSR based communication protocol to effectuate
a page state change.
Before tweaking the internal APIs used in both versions, merge them and
share them so those tweaks are only needed in a single place.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/boot/compressed/sev.c | 34 ++------------------
arch/x86/boot/startup/sev-shared.c | 29 +++++++++++++++++
arch/x86/boot/startup/sev-startup.c | 29 +----------------
3 files changed, 33 insertions(+), 59 deletions(-)
diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index f00f68175f14..6d3ed7ed03a4 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -60,34 +60,6 @@ static bool sev_snp_enabled(void)
return sev_status & MSR_AMD64_SEV_SNP_ENABLED;
}
-static void __page_state_change(unsigned long paddr, enum psc_op op)
-{
- u64 val, msr;
-
- /*
- * If private -> shared then invalidate the page before requesting the
- * state change in the RMP table.
- */
- if (op == SNP_PAGE_STATE_SHARED)
- pvalidate_4k_page(paddr, paddr, false);
-
- /* Issue VMGEXIT to change the page state in RMP table. */
- sev_es_wr_ghcb_msr(GHCB_MSR_PSC_REQ_GFN(paddr >> PAGE_SHIFT, op));
- VMGEXIT();
-
- /* Read the response of the VMGEXIT. */
- val = sev_es_rd_ghcb_msr();
- if ((GHCB_RESP_CODE(val) != GHCB_MSR_PSC_RESP) || GHCB_MSR_PSC_RESP_VAL(val))
- sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PSC);
-
- /*
- * Now that page state is changed in the RMP table, validate it so that it is
- * consistent with the RMP entry.
- */
- if (op == SNP_PAGE_STATE_PRIVATE)
- pvalidate_4k_page(paddr, paddr, true);
-}
-
void snp_set_page_private(unsigned long paddr)
{
u64 msr;
@@ -96,7 +68,7 @@ void snp_set_page_private(unsigned long paddr)
return;
msr = sev_es_rd_ghcb_msr();
- __page_state_change(paddr, SNP_PAGE_STATE_PRIVATE);
+ __page_state_change(paddr, paddr, SNP_PAGE_STATE_PRIVATE);
sev_es_wr_ghcb_msr(msr);
}
@@ -108,7 +80,7 @@ void snp_set_page_shared(unsigned long paddr)
return;
msr = sev_es_rd_ghcb_msr();
- __page_state_change(paddr, SNP_PAGE_STATE_SHARED);
+ __page_state_change(paddr, paddr, SNP_PAGE_STATE_SHARED);
sev_es_wr_ghcb_msr(msr);
}
@@ -137,7 +109,7 @@ void snp_accept_memory(phys_addr_t start, phys_addr_t end)
u64 msr = sev_es_rd_ghcb_msr();
for (phys_addr_t pa = start; pa < end; pa += PAGE_SIZE)
- __page_state_change(pa, SNP_PAGE_STATE_PRIVATE);
+ __page_state_change(pa, pa, SNP_PAGE_STATE_PRIVATE);
sev_es_wr_ghcb_msr(msr);
}
diff --git a/arch/x86/boot/startup/sev-shared.c b/arch/x86/boot/startup/sev-shared.c
index 7ca59038269f..f553268d31d7 100644
--- a/arch/x86/boot/startup/sev-shared.c
+++ b/arch/x86/boot/startup/sev-shared.c
@@ -640,6 +640,35 @@ static void __head pvalidate_4k_page(unsigned long vaddr, unsigned long paddr,
}
}
+static void __head __page_state_change(unsigned long vaddr, unsigned long paddr,
+ enum psc_op op)
+{
+ u64 val;
+
+ /*
+ * If private -> shared then invalidate the page before requesting the
+ * state change in the RMP table.
+ */
+ if (op == SNP_PAGE_STATE_SHARED)
+ pvalidate_4k_page(vaddr, paddr, false);
+
+ /* Issue VMGEXIT to change the page state in RMP table. */
+ sev_es_wr_ghcb_msr(GHCB_MSR_PSC_REQ_GFN(paddr >> PAGE_SHIFT, op));
+ VMGEXIT();
+
+ /* Read the response of the VMGEXIT. */
+ val = sev_es_rd_ghcb_msr();
+ if ((GHCB_RESP_CODE(val) != GHCB_MSR_PSC_RESP) || GHCB_MSR_PSC_RESP_VAL(val))
+ sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PSC);
+
+ /*
+ * Now that page state is changed in the RMP table, validate it so that it is
+ * consistent with the RMP entry.
+ */
+ if (op == SNP_PAGE_STATE_PRIVATE)
+ pvalidate_4k_page(vaddr, paddr, true);
+}
+
/*
* Maintain the GPA of the SVSM Calling Area (CA) in order to utilize the SVSM
* services needed when not running in VMPL0.
diff --git a/arch/x86/boot/startup/sev-startup.c b/arch/x86/boot/startup/sev-startup.c
index 8edf1ba78a48..2ffd8bf09357 100644
--- a/arch/x86/boot/startup/sev-startup.c
+++ b/arch/x86/boot/startup/sev-startup.c
@@ -135,7 +135,6 @@ early_set_pages_state(unsigned long vaddr, unsigned long paddr,
unsigned long npages, enum psc_op op)
{
unsigned long paddr_end;
- u64 val;
vaddr = vaddr & PAGE_MASK;
@@ -143,37 +142,11 @@ early_set_pages_state(unsigned long vaddr, unsigned long paddr,
paddr_end = paddr + (npages << PAGE_SHIFT);
while (paddr < paddr_end) {
- /* Page validation must be rescinded before changing to shared */
- if (op == SNP_PAGE_STATE_SHARED)
- pvalidate_4k_page(vaddr, paddr, false);
-
- /*
- * Use the MSR protocol because this function can be called before
- * the GHCB is established.
- */
- sev_es_wr_ghcb_msr(GHCB_MSR_PSC_REQ_GFN(paddr >> PAGE_SHIFT, op));
- VMGEXIT();
-
- val = sev_es_rd_ghcb_msr();
-
- if (GHCB_RESP_CODE(val) != GHCB_MSR_PSC_RESP)
- goto e_term;
-
- if (GHCB_MSR_PSC_RESP_VAL(val))
- goto e_term;
-
- /* Page validation must be performed after changing to private */
- if (op == SNP_PAGE_STATE_PRIVATE)
- pvalidate_4k_page(vaddr, paddr, true);
+ __page_state_change(vaddr, paddr, op);
vaddr += PAGE_SIZE;
paddr += PAGE_SIZE;
}
-
- return;
-
-e_term:
- sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PSC);
}
void __head early_snp_set_memory_private(unsigned long vaddr, unsigned long paddr,
--
2.50.0.727.gbf7dc18ff4-goog
* [PATCH v4 09/24] x86/sev: Pass SVSM calling area down to early page state change API
2025-07-09 8:08 [PATCH v4 00/24] x86: strict separation of startup code Ard Biesheuvel
` (7 preceding siblings ...)
2025-07-09 8:08 ` [PATCH v4 08/24] x86/sev: Share implementation of MSR-based page state change Ard Biesheuvel
@ 2025-07-09 8:08 ` Ard Biesheuvel
2025-07-09 8:08 ` [PATCH v4 10/24] x86/sev: Use boot SVSM CA for all startup and init code Ard Biesheuvel
` (16 subsequent siblings)
25 siblings, 0 replies; 43+ messages in thread
From: Ard Biesheuvel @ 2025-07-09 8:08 UTC (permalink / raw)
To: linux-kernel
Cc: linux-efi, x86, Ard Biesheuvel, Borislav Petkov, Ingo Molnar,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky, Josh Poimboeuf,
Peter Zijlstra
From: Ard Biesheuvel <ardb@kernel.org>
The early page state change API is mostly used very early, when only the
boot time SVSM calling area is in use. However, this API is also called
by the kexec finishing code, which runs very late, and potentially from a
different CPU (which uses a different calling area).
To avoid pulling the per-CPU SVSM calling area pointers and related SEV
state into the startup code, refactor the page state change API so the
SVSM calling area virtual and physical addresses can be provided by the
caller.
No functional change intended.
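To illustrate, the two flavors of callers end up looking roughly like
this (condensed from the call sites in the diff below and in the next
patch of the series):
  /* late callers, e.g. the kexec path, pass the per-CPU calling area */
  early_set_pages_state(vaddr, paddr, npages, op,
                        svsm_get_caa(), svsm_get_caa_pa());
  /* startup code can pass the boot calling area explicitly instead */
  early_set_pages_state(vaddr, paddr, npages, op,
                        rip_rel_ptr(&boot_svsm_ca_page), boot_svsm_caa_pa);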
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/boot/compressed/sev.c | 12 +++++++++---
arch/x86/boot/startup/sev-shared.c | 17 +++++++++--------
arch/x86/boot/startup/sev-startup.c | 11 +++++++----
arch/x86/coco/sev/core.c | 3 ++-
arch/x86/include/asm/sev-internal.h | 3 ++-
5 files changed, 29 insertions(+), 17 deletions(-)
diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index 6d3ed7ed03a4..bc23b3dabf1b 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -68,7 +68,9 @@ void snp_set_page_private(unsigned long paddr)
return;
msr = sev_es_rd_ghcb_msr();
- __page_state_change(paddr, paddr, SNP_PAGE_STATE_PRIVATE);
+ __page_state_change(paddr, paddr, SNP_PAGE_STATE_PRIVATE,
+ (struct svsm_ca *)boot_svsm_caa_pa,
+ boot_svsm_caa_pa);
sev_es_wr_ghcb_msr(msr);
}
@@ -80,7 +82,9 @@ void snp_set_page_shared(unsigned long paddr)
return;
msr = sev_es_rd_ghcb_msr();
- __page_state_change(paddr, paddr, SNP_PAGE_STATE_SHARED);
+ __page_state_change(paddr, paddr, SNP_PAGE_STATE_SHARED,
+ (struct svsm_ca *)boot_svsm_caa_pa,
+ boot_svsm_caa_pa);
sev_es_wr_ghcb_msr(msr);
}
@@ -109,7 +113,9 @@ void snp_accept_memory(phys_addr_t start, phys_addr_t end)
u64 msr = sev_es_rd_ghcb_msr();
for (phys_addr_t pa = start; pa < end; pa += PAGE_SIZE)
- __page_state_change(pa, pa, SNP_PAGE_STATE_PRIVATE);
+ __page_state_change(pa, pa, SNP_PAGE_STATE_PRIVATE,
+ (struct svsm_ca *)boot_svsm_caa_pa,
+ boot_svsm_caa_pa);
sev_es_wr_ghcb_msr(msr);
}
diff --git a/arch/x86/boot/startup/sev-shared.c b/arch/x86/boot/startup/sev-shared.c
index f553268d31d7..1f2c4feeafce 100644
--- a/arch/x86/boot/startup/sev-shared.c
+++ b/arch/x86/boot/startup/sev-shared.c
@@ -587,7 +587,8 @@ static void __head setup_cpuid_table(const struct cc_blob_sev_info *cc_info)
}
}
-static void __head svsm_pval_4k_page(unsigned long paddr, bool validate)
+static void __head svsm_pval_4k_page(unsigned long paddr, bool validate,
+ struct svsm_ca *caa, u64 caa_pa)
{
struct svsm_pvalidate_call *pc;
struct svsm_call call = {};
@@ -601,10 +602,10 @@ static void __head svsm_pval_4k_page(unsigned long paddr, bool validate)
*/
flags = native_local_irq_save();
- call.caa = svsm_get_caa();
+ call.caa = caa;
pc = (struct svsm_pvalidate_call *)call.caa->svsm_buffer;
- pc_pa = svsm_get_caa_pa() + offsetof(struct svsm_ca, svsm_buffer);
+ pc_pa = caa_pa + offsetof(struct svsm_ca, svsm_buffer);
pc->num_entries = 1;
pc->cur_index = 0;
@@ -627,12 +628,12 @@ static void __head svsm_pval_4k_page(unsigned long paddr, bool validate)
}
static void __head pvalidate_4k_page(unsigned long vaddr, unsigned long paddr,
- bool validate)
+ bool validate, struct svsm_ca *caa, u64 caa_pa)
{
int ret;
if (snp_vmpl) {
- svsm_pval_4k_page(paddr, validate);
+ svsm_pval_4k_page(paddr, validate, caa, caa_pa);
} else {
ret = pvalidate(vaddr, RMP_PG_SIZE_4K, validate);
if (ret)
@@ -641,7 +642,7 @@ static void __head pvalidate_4k_page(unsigned long vaddr, unsigned long paddr,
}
static void __head __page_state_change(unsigned long vaddr, unsigned long paddr,
- enum psc_op op)
+ enum psc_op op, struct svsm_ca *caa, u64 caa_pa)
{
u64 val;
@@ -650,7 +651,7 @@ static void __head __page_state_change(unsigned long vaddr, unsigned long paddr,
* state change in the RMP table.
*/
if (op == SNP_PAGE_STATE_SHARED)
- pvalidate_4k_page(vaddr, paddr, false);
+ pvalidate_4k_page(vaddr, paddr, false, caa, caa_pa);
/* Issue VMGEXIT to change the page state in RMP table. */
sev_es_wr_ghcb_msr(GHCB_MSR_PSC_REQ_GFN(paddr >> PAGE_SHIFT, op));
@@ -666,7 +667,7 @@ static void __head __page_state_change(unsigned long vaddr, unsigned long paddr,
* consistent with the RMP entry.
*/
if (op == SNP_PAGE_STATE_PRIVATE)
- pvalidate_4k_page(vaddr, paddr, true);
+ pvalidate_4k_page(vaddr, paddr, true, caa, caa_pa);
}
/*
diff --git a/arch/x86/boot/startup/sev-startup.c b/arch/x86/boot/startup/sev-startup.c
index 2ffd8bf09357..e253783f6aa5 100644
--- a/arch/x86/boot/startup/sev-startup.c
+++ b/arch/x86/boot/startup/sev-startup.c
@@ -132,7 +132,8 @@ noinstr void __sev_put_ghcb(struct ghcb_state *state)
void __head
early_set_pages_state(unsigned long vaddr, unsigned long paddr,
- unsigned long npages, enum psc_op op)
+ unsigned long npages, enum psc_op op,
+ struct svsm_ca *caa, u64 caa_pa)
{
unsigned long paddr_end;
@@ -142,7 +143,7 @@ early_set_pages_state(unsigned long vaddr, unsigned long paddr,
paddr_end = paddr + (npages << PAGE_SHIFT);
while (paddr < paddr_end) {
- __page_state_change(vaddr, paddr, op);
+ __page_state_change(vaddr, paddr, op, caa, caa_pa);
vaddr += PAGE_SIZE;
paddr += PAGE_SIZE;
@@ -165,7 +166,8 @@ void __head early_snp_set_memory_private(unsigned long vaddr, unsigned long padd
* Ask the hypervisor to mark the memory pages as private in the RMP
* table.
*/
- early_set_pages_state(vaddr, paddr, npages, SNP_PAGE_STATE_PRIVATE);
+ early_set_pages_state(vaddr, paddr, npages, SNP_PAGE_STATE_PRIVATE,
+ svsm_get_caa(), svsm_get_caa_pa());
}
void __head early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr,
@@ -181,7 +183,8 @@ void __head early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr
return;
/* Ask hypervisor to mark the memory pages shared in the RMP table. */
- early_set_pages_state(vaddr, paddr, npages, SNP_PAGE_STATE_SHARED);
+ early_set_pages_state(vaddr, paddr, npages, SNP_PAGE_STATE_SHARED,
+ svsm_get_caa(), svsm_get_caa_pa());
}
/*
diff --git a/arch/x86/coco/sev/core.c b/arch/x86/coco/sev/core.c
index bea67d017bf0..7a86a2fe494d 100644
--- a/arch/x86/coco/sev/core.c
+++ b/arch/x86/coco/sev/core.c
@@ -585,7 +585,8 @@ static void set_pages_state(unsigned long vaddr, unsigned long npages, int op)
/* Use the MSR protocol when a GHCB is not available. */
if (!boot_ghcb)
- return early_set_pages_state(vaddr, __pa(vaddr), npages, op);
+ return early_set_pages_state(vaddr, __pa(vaddr), npages, op,
+ svsm_get_caa(), svsm_get_caa_pa());
vaddr = vaddr & PAGE_MASK;
vaddr_end = vaddr + (npages << PAGE_SHIFT);
diff --git a/arch/x86/include/asm/sev-internal.h b/arch/x86/include/asm/sev-internal.h
index ffe4755962fe..3b72d8217827 100644
--- a/arch/x86/include/asm/sev-internal.h
+++ b/arch/x86/include/asm/sev-internal.h
@@ -55,7 +55,8 @@ DECLARE_PER_CPU(struct sev_es_runtime_data*, runtime_data);
DECLARE_PER_CPU(struct sev_es_save_area *, sev_vmsa);
void early_set_pages_state(unsigned long vaddr, unsigned long paddr,
- unsigned long npages, enum psc_op op);
+ unsigned long npages, enum psc_op op,
+ struct svsm_ca *ca, u64 caa_pa);
DECLARE_PER_CPU(struct svsm_ca *, svsm_caa);
DECLARE_PER_CPU(u64, svsm_caa_pa);
--
2.50.0.727.gbf7dc18ff4-goog
* [PATCH v4 10/24] x86/sev: Use boot SVSM CA for all startup and init code
2025-07-09 8:08 [PATCH v4 00/24] x86: strict separation of startup code Ard Biesheuvel
` (8 preceding siblings ...)
2025-07-09 8:08 ` [PATCH v4 09/24] x86/sev: Pass SVSM calling area down to early page state change API Ard Biesheuvel
@ 2025-07-09 8:08 ` Ard Biesheuvel
2025-07-09 8:08 ` [PATCH v4 11/24] x86/boot: Drop redundant RMPADJUST in SEV SVSM presence check Ard Biesheuvel
` (15 subsequent siblings)
25 siblings, 0 replies; 43+ messages in thread
From: Ard Biesheuvel @ 2025-07-09 8:08 UTC (permalink / raw)
To: linux-kernel
Cc: linux-efi, x86, Ard Biesheuvel, Borislav Petkov, Ingo Molnar,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky, Josh Poimboeuf,
Peter Zijlstra
From: Ard Biesheuvel <ardb@kernel.org>
To avoid having to reason about whether or not to use the per-CPU SVSM
calling area when running startup and init code on the boot CPU, reuse
the boot SVSM calling area as the per-CPU area for CPU #0.
This removes the need to make the per-CPU variables and associated state
in sev_cfg accessible to the startup code once it is confined.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/boot/compressed/sev.c | 13 ------
arch/x86/boot/startup/sev-startup.c | 7 +--
arch/x86/coco/sev/core.c | 47 +++++++++-----------
arch/x86/include/asm/sev-internal.h | 16 -------
4 files changed, 24 insertions(+), 59 deletions(-)
diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index bc23b3dabf1b..e67413bc78fd 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -37,19 +37,6 @@ struct ghcb *boot_ghcb;
#define __BOOT_COMPRESSED
-extern u64 boot_svsm_caa_pa;
-
-struct svsm_ca *svsm_get_caa(void)
-{
- /* The decompressor is mapped 1:1 so VA == PA */
- return (struct svsm_ca *)boot_svsm_caa_pa;
-}
-
-u64 svsm_get_caa_pa(void)
-{
- return boot_svsm_caa_pa;
-}
-
u8 snp_vmpl;
/* Include code for early handlers */
diff --git a/arch/x86/boot/startup/sev-startup.c b/arch/x86/boot/startup/sev-startup.c
index e253783f6aa5..cba4fb4cace0 100644
--- a/arch/x86/boot/startup/sev-startup.c
+++ b/arch/x86/boot/startup/sev-startup.c
@@ -50,9 +50,6 @@ u64 sev_secrets_pa __ro_after_init;
/* For early boot SVSM communication */
struct svsm_ca boot_svsm_ca_page __aligned(PAGE_SIZE);
-DEFINE_PER_CPU(struct svsm_ca *, svsm_caa);
-DEFINE_PER_CPU(u64, svsm_caa_pa);
-
/*
* Nothing shall interrupt this code path while holding the per-CPU
* GHCB. The backup GHCB is only for NMIs interrupting this path.
@@ -167,7 +164,7 @@ void __head early_snp_set_memory_private(unsigned long vaddr, unsigned long padd
* table.
*/
early_set_pages_state(vaddr, paddr, npages, SNP_PAGE_STATE_PRIVATE,
- svsm_get_caa(), svsm_get_caa_pa());
+ rip_rel_ptr(&boot_svsm_ca_page), boot_svsm_caa_pa);
}
void __head early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr,
@@ -184,7 +181,7 @@ void __head early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr
/* Ask hypervisor to mark the memory pages shared in the RMP table. */
early_set_pages_state(vaddr, paddr, npages, SNP_PAGE_STATE_SHARED,
- svsm_get_caa(), svsm_get_caa_pa());
+ rip_rel_ptr(&boot_svsm_ca_page), boot_svsm_caa_pa);
}
/*
diff --git a/arch/x86/coco/sev/core.c b/arch/x86/coco/sev/core.c
index 7a86a2fe494d..4fe0928bc0ad 100644
--- a/arch/x86/coco/sev/core.c
+++ b/arch/x86/coco/sev/core.c
@@ -46,6 +46,25 @@
#include <asm/cmdline.h>
#include <asm/msr.h>
+DEFINE_PER_CPU(struct svsm_ca *, svsm_caa);
+DEFINE_PER_CPU(u64, svsm_caa_pa);
+
+static inline struct svsm_ca *svsm_get_caa(void)
+{
+ if (sev_cfg.use_cas)
+ return this_cpu_read(svsm_caa);
+ else
+ return rip_rel_ptr(&boot_svsm_ca_page);
+}
+
+static inline u64 svsm_get_caa_pa(void)
+{
+ if (sev_cfg.use_cas)
+ return this_cpu_read(svsm_caa_pa);
+ else
+ return boot_svsm_caa_pa;
+}
+
/* AP INIT values as documented in the APM2 section "Processor Initialization State" */
#define AP_INIT_CS_LIMIT 0xffff
#define AP_INIT_DS_LIMIT 0xffff
@@ -1287,7 +1306,8 @@ static void __init alloc_runtime_data(int cpu)
struct svsm_ca *caa;
/* Allocate the SVSM CA page if an SVSM is present */
- caa = memblock_alloc_or_panic(sizeof(*caa), PAGE_SIZE);
+ caa = cpu ? memblock_alloc_or_panic(sizeof(*caa), PAGE_SIZE)
+ : &boot_svsm_ca_page;
per_cpu(svsm_caa, cpu) = caa;
per_cpu(svsm_caa_pa, cpu) = __pa(caa);
@@ -1341,32 +1361,9 @@ void __init sev_es_init_vc_handling(void)
init_ghcb(cpu);
}
- /* If running under an SVSM, switch to the per-cpu CA */
- if (snp_vmpl) {
- struct svsm_call call = {};
- unsigned long flags;
- int ret;
-
- local_irq_save(flags);
-
- /*
- * SVSM_CORE_REMAP_CA call:
- * RAX = 0 (Protocol=0, CallID=0)
- * RCX = New CA GPA
- */
- call.caa = svsm_get_caa();
- call.rax = SVSM_CORE_CALL(SVSM_CORE_REMAP_CA);
- call.rcx = this_cpu_read(svsm_caa_pa);
- ret = svsm_perform_call_protocol(&call);
- if (ret)
- panic("Can't remap the SVSM CA, ret=%d, rax_out=0x%llx\n",
- ret, call.rax_out);
-
+ if (snp_vmpl)
sev_cfg.use_cas = true;
- local_irq_restore(flags);
- }
-
sev_es_setup_play_dead();
/* Secondary CPUs use the runtime #VC handler */
diff --git a/arch/x86/include/asm/sev-internal.h b/arch/x86/include/asm/sev-internal.h
index 3b72d8217827..bdfe008120f3 100644
--- a/arch/x86/include/asm/sev-internal.h
+++ b/arch/x86/include/asm/sev-internal.h
@@ -63,22 +63,6 @@ DECLARE_PER_CPU(u64, svsm_caa_pa);
extern u64 boot_svsm_caa_pa;
-static __always_inline struct svsm_ca *svsm_get_caa(void)
-{
- if (sev_cfg.use_cas)
- return this_cpu_read(svsm_caa);
- else
- return rip_rel_ptr(&boot_svsm_ca_page);
-}
-
-static __always_inline u64 svsm_get_caa_pa(void)
-{
- if (sev_cfg.use_cas)
- return this_cpu_read(svsm_caa_pa);
- else
- return boot_svsm_caa_pa;
-}
-
enum es_result verify_exception_info(struct ghcb *ghcb, struct es_em_ctxt *ctxt);
void vc_forward_exception(struct es_em_ctxt *ctxt);
--
2.50.0.727.gbf7dc18ff4-goog
* [PATCH v4 11/24] x86/boot: Drop redundant RMPADJUST in SEV SVSM presence check
2025-07-09 8:08 [PATCH v4 00/24] x86: strict separation of startup code Ard Biesheuvel
` (9 preceding siblings ...)
2025-07-09 8:08 ` [PATCH v4 10/24] x86/sev: Use boot SVSM CA for all startup and init code Ard Biesheuvel
@ 2025-07-09 8:08 ` Ard Biesheuvel
2025-07-09 8:08 ` [PATCH v4 12/24] x86/sev: Unify SEV-SNP hypervisor feature check Ard Biesheuvel
` (14 subsequent siblings)
25 siblings, 0 replies; 43+ messages in thread
From: Ard Biesheuvel @ 2025-07-09 8:08 UTC (permalink / raw)
To: linux-kernel
Cc: linux-efi, x86, Ard Biesheuvel, Borislav Petkov, Ingo Molnar,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky, Josh Poimboeuf,
Peter Zijlstra
From: Ard Biesheuvel <ardb@kernel.org>
snp_vmpl will be assigned a non-zero value when executing at a VMPL
other than 0, and this is inferred from a call to RMPADJUST, which only
succeeds when running at VMPL0.
This means that testing snp_vmpl is sufficient, and there is no need to
perform the same check again.
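For reference, a condensed sketch of how snp_vmpl acquires a non-zero
value during SVSM detection (simplified, not the verbatim kernel code):
  /*
   * RMPADJUST changing the VMPL1 permissions of a page only succeeds at
   * VMPL0, so a nonzero return implies an SVSM is running above the
   * guest, and the secrets page records the VMPL the guest runs at.
   */
  if (rmpadjust((unsigned long)&boot_ghcb_page, RMP_PG_SIZE_4K, 1))
          snp_vmpl = secrets_page->svsm_guest_vmpl;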
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/boot/compressed/sev.c | 20 +++-----------------
1 file changed, 3 insertions(+), 17 deletions(-)
diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index e67413bc78fd..5fd51f51e55c 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -403,30 +403,16 @@ void sev_enable(struct boot_params *bp)
*/
if (sev_status & MSR_AMD64_SEV_SNP_ENABLED) {
u64 hv_features;
- int ret;
hv_features = get_hv_features();
if (!(hv_features & GHCB_HV_FT_SNP))
sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SNP_UNSUPPORTED);
/*
- * Enforce running at VMPL0 or with an SVSM.
- *
- * Use RMPADJUST (see the rmpadjust() function for a description of
- * what the instruction does) to update the VMPL1 permissions of a
- * page. If the guest is running at VMPL0, this will succeed. If the
- * guest is running at any other VMPL, this will fail. Linux SNP guests
- * only ever run at a single VMPL level so permission mask changes of a
- * lesser-privileged VMPL are a don't-care.
+ * Running at VMPL0 is required unless an SVSM is present and
+ * the hypervisor supports the required SVSM GHCB events.
*/
- ret = rmpadjust((unsigned long)&boot_ghcb_page, RMP_PG_SIZE_4K, 1);
-
- /*
- * Running at VMPL0 is not required if an SVSM is present and the hypervisor
- * supports the required SVSM GHCB events.
- */
- if (ret &&
- !(snp_vmpl && (hv_features & GHCB_HV_FT_SNP_MULTI_VMPL)))
+ if (snp_vmpl > 0 && !(hv_features & GHCB_HV_FT_SNP_MULTI_VMPL))
sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_NOT_VMPL0);
}
--
2.50.0.727.gbf7dc18ff4-goog
* [PATCH v4 12/24] x86/sev: Unify SEV-SNP hypervisor feature check
2025-07-09 8:08 [PATCH v4 00/24] x86: strict separation of startup code Ard Biesheuvel
` (10 preceding siblings ...)
2025-07-09 8:08 ` [PATCH v4 11/24] x86/boot: Drop redundant RMPADJUST in SEV SVSM presence check Ard Biesheuvel
@ 2025-07-09 8:08 ` Ard Biesheuvel
2025-07-10 4:21 ` Nikunj A Dadhania
2025-07-09 8:08 ` [PATCH v4 13/24] x86/boot: Provide PIC aliases for 5-level paging related constants Ard Biesheuvel
` (13 subsequent siblings)
25 siblings, 1 reply; 43+ messages in thread
From: Ard Biesheuvel @ 2025-07-09 8:08 UTC (permalink / raw)
To: linux-kernel
Cc: linux-efi, x86, Ard Biesheuvel, Borislav Petkov, Ingo Molnar,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky, Josh Poimboeuf,
Peter Zijlstra
From: Ard Biesheuvel <ardb@kernel.org>
The decompressor and the core kernel both check the feature mask exposed
by the hypervisor, but test it in slightly different ways. This disparity
seems unintentional, and is simply a result of the fact that the
decompressor and the core kernel have evolved differently over time when
it comes to setting up the SEV-SNP execution context.
So move the HV feature check into a helper function and call that
instead. For the core kernel, move the check to an earlier boot stage,
right after the point where it is established that the guest is
executing in SEV-SNP mode.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/boot/compressed/sev.c | 19 +------------
arch/x86/boot/startup/sev-shared.c | 30 +++++++++++++++++---
arch/x86/boot/startup/sme.c | 2 ++
arch/x86/coco/sev/core.c | 11 -------
arch/x86/include/asm/sev-internal.h | 2 +-
arch/x86/include/asm/sev.h | 2 ++
6 files changed, 32 insertions(+), 34 deletions(-)
diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index 5fd51f51e55c..4bd7b45562ed 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -397,24 +397,7 @@ void sev_enable(struct boot_params *bp)
sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SEV_ES_PROT_UNSUPPORTED);
}
- /*
- * SNP is supported in v2 of the GHCB spec which mandates support for HV
- * features.
- */
- if (sev_status & MSR_AMD64_SEV_SNP_ENABLED) {
- u64 hv_features;
-
- hv_features = get_hv_features();
- if (!(hv_features & GHCB_HV_FT_SNP))
- sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SNP_UNSUPPORTED);
-
- /*
- * Running at VMPL0 is required unless an SVSM is present and
- * the hypervisor supports the required SVSM GHCB events.
- */
- if (snp_vmpl > 0 && !(hv_features & GHCB_HV_FT_SNP_MULTI_VMPL))
- sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_NOT_VMPL0);
- }
+ snp_check_hv_features();
if (snp && !(sev_status & MSR_AMD64_SEV_SNP_ENABLED))
error("SEV-SNP supported indicated by CC blob, but not SEV status MSR.");
diff --git a/arch/x86/boot/startup/sev-shared.c b/arch/x86/boot/startup/sev-shared.c
index 1f2c4feeafce..a1d27a418421 100644
--- a/arch/x86/boot/startup/sev-shared.c
+++ b/arch/x86/boot/startup/sev-shared.c
@@ -66,10 +66,7 @@ sev_es_terminate(unsigned int set, unsigned int reason)
asm volatile("hlt\n" : : : "memory");
}
-/*
- * The hypervisor features are available from GHCB version 2 onward.
- */
-u64 get_hv_features(void)
+static u64 __head get_hv_features(void)
{
u64 val;
@@ -86,6 +83,31 @@ u64 get_hv_features(void)
return GHCB_MSR_HV_FT_RESP_VAL(val);
}
+u64 __head snp_check_hv_features(void)
+{
+ /*
+ * SNP is supported in v2 of the GHCB spec which mandates support for HV
+ * features.
+ */
+ if (sev_status & MSR_AMD64_SEV_SNP_ENABLED) {
+ u64 hv_features;
+
+ hv_features = get_hv_features();
+ if (!(hv_features & GHCB_HV_FT_SNP))
+ sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SNP_UNSUPPORTED);
+
+ /*
+ * Running at VMPL0 is required unless an SVSM is present and
+ * the hypervisor supports the required SVSM GHCB events.
+ */
+ if (snp_vmpl > 0 && !(hv_features & GHCB_HV_FT_SNP_MULTI_VMPL))
+ sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_NOT_VMPL0);
+
+ return hv_features;
+ }
+ return 0;
+}
+
int svsm_process_result_codes(struct svsm_call *call)
{
switch (call->rax_out) {
diff --git a/arch/x86/boot/startup/sme.c b/arch/x86/boot/startup/sme.c
index 70ea1748c0a7..529090e20d2a 100644
--- a/arch/x86/boot/startup/sme.c
+++ b/arch/x86/boot/startup/sme.c
@@ -533,6 +533,8 @@ void __head sme_enable(struct boot_params *bp)
if (snp_en ^ !!(msr & MSR_AMD64_SEV_SNP_ENABLED))
snp_abort();
+ sev_hv_features = snp_check_hv_features();
+
/* Check if memory encryption is enabled */
if (feature_mask == AMD_SME_BIT) {
if (!(bp->hdr.xloadflags & XLF_MEM_ENCRYPTION))
diff --git a/arch/x86/coco/sev/core.c b/arch/x86/coco/sev/core.c
index 4fe0928bc0ad..f73dea313f55 100644
--- a/arch/x86/coco/sev/core.c
+++ b/arch/x86/coco/sev/core.c
@@ -1344,17 +1344,6 @@ void __init sev_es_init_vc_handling(void)
if (!sev_es_check_cpu_features())
panic("SEV-ES CPU Features missing");
- /*
- * SNP is supported in v2 of the GHCB spec which mandates support for HV
- * features.
- */
- if (cc_platform_has(CC_ATTR_GUEST_SEV_SNP)) {
- sev_hv_features = get_hv_features();
-
- if (!(sev_hv_features & GHCB_HV_FT_SNP))
- sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SNP_UNSUPPORTED);
- }
-
/* Initialize per-cpu GHCB pages */
for_each_possible_cpu(cpu) {
alloc_runtime_data(cpu);
diff --git a/arch/x86/include/asm/sev-internal.h b/arch/x86/include/asm/sev-internal.h
index bdfe008120f3..f0bfa46f3407 100644
--- a/arch/x86/include/asm/sev-internal.h
+++ b/arch/x86/include/asm/sev-internal.h
@@ -81,6 +81,6 @@ static __always_inline void sev_es_wr_ghcb_msr(u64 val)
native_wrmsr(MSR_AMD64_SEV_ES_GHCB, low, high);
}
-u64 get_hv_features(void);
+u64 snp_check_hv_features(void);
const struct snp_cpuid_table *snp_cpuid_get_table(void);
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index f3acbfcdca9a..f20860187fe9 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -438,6 +438,7 @@ struct svsm_call {
#ifdef CONFIG_AMD_MEM_ENCRYPT
extern u8 snp_vmpl;
+extern u64 sev_hv_features;
extern void __sev_es_ist_enter(struct pt_regs *regs);
extern void __sev_es_ist_exit(void);
@@ -513,6 +514,7 @@ void snp_set_memory_private(unsigned long vaddr, unsigned long npages);
void snp_set_wakeup_secondary_cpu(void);
bool snp_init(struct boot_params *bp);
void __noreturn snp_abort(void);
+u64 snp_check_hv_features(void);
void snp_dmi_setup(void);
int snp_issue_svsm_attest_req(u64 call_id, struct svsm_call *call, struct svsm_attest_call *input);
void snp_accept_memory(phys_addr_t start, phys_addr_t end);
--
2.50.0.727.gbf7dc18ff4-goog
* [PATCH v4 13/24] x86/boot: Provide PIC aliases for 5-level paging related constants
2025-07-09 8:08 [PATCH v4 00/24] x86: strict separation of startup code Ard Biesheuvel
` (11 preceding siblings ...)
2025-07-09 8:08 ` [PATCH v4 12/24] x86/sev: Unify SEV-SNP hypervisor feature check Ard Biesheuvel
@ 2025-07-09 8:08 ` Ard Biesheuvel
2025-07-09 8:08 ` [PATCH v4 14/24] x86/sev: Provide PIC aliases for SEV related data objects Ard Biesheuvel
` (12 subsequent siblings)
25 siblings, 0 replies; 43+ messages in thread
From: Ard Biesheuvel @ 2025-07-09 8:08 UTC (permalink / raw)
To: linux-kernel
Cc: linux-efi, x86, Ard Biesheuvel, Borislav Petkov, Ingo Molnar,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky, Josh Poimboeuf,
Peter Zijlstra
From: Ard Biesheuvel <ardb@kernel.org>
Provide PIC aliases for the global variables related to 5-level paging,
so that the startup code can access them in order to populate the kernel
page tables.
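As a mental model, SYM_PIC_ALIAS() gives an object a second,
__pi_-prefixed name that the symbol-prefixed startup code can link
against; conceptually something like the sketch below (the real macro
lives in the x86 linkage headers and may differ in detail):
  /* conceptual sketch only - not the actual kernel definition */
  #define SYM_PIC_ALIAS_SKETCH(sym) \
          asm(".globl __pi_" #sym "; .set __pi_" #sym ", " #sym)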
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/kernel/head64.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 533fcf5636fc..1bc40d0785ee 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -52,10 +52,13 @@ SYM_PIC_ALIAS(next_early_pgt);
pmdval_t early_pmd_flags = __PAGE_KERNEL_LARGE & ~(_PAGE_GLOBAL | _PAGE_NX);
unsigned int __pgtable_l5_enabled __ro_after_init;
+SYM_PIC_ALIAS(__pgtable_l5_enabled);
unsigned int pgdir_shift __ro_after_init = 39;
EXPORT_SYMBOL(pgdir_shift);
+SYM_PIC_ALIAS(pgdir_shift);
unsigned int ptrs_per_p4d __ro_after_init = 1;
EXPORT_SYMBOL(ptrs_per_p4d);
+SYM_PIC_ALIAS(ptrs_per_p4d);
unsigned long page_offset_base __ro_after_init = __PAGE_OFFSET_BASE_L4;
EXPORT_SYMBOL(page_offset_base);
--
2.50.0.727.gbf7dc18ff4-goog
* [PATCH v4 14/24] x86/sev: Provide PIC aliases for SEV related data objects
2025-07-09 8:08 [PATCH v4 00/24] x86: strict separation of startup code Ard Biesheuvel
` (12 preceding siblings ...)
2025-07-09 8:08 ` [PATCH v4 13/24] x86/boot: Provide PIC aliases for 5-level paging related constants Ard Biesheuvel
@ 2025-07-09 8:08 ` Ard Biesheuvel
2025-07-09 8:08 ` [PATCH v4 15/24] x86/sev: Move __sev_[get|put]_ghcb() into separate noinstr object Ard Biesheuvel
` (11 subsequent siblings)
25 siblings, 0 replies; 43+ messages in thread
From: Ard Biesheuvel @ 2025-07-09 8:08 UTC (permalink / raw)
To: linux-kernel
Cc: linux-efi, x86, Ard Biesheuvel, Borislav Petkov, Ingo Molnar,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky, Josh Poimboeuf,
Peter Zijlstra
From: Ard Biesheuvel <ardb@kernel.org>
Provide PIC aliases for data objects that are shared between the SEV
startup code and the SEV code that executes later. This is needed so
that the confined startup code is permitted to access them.
This requires some of these variables to be moved into a source file
that is not part of the startup code, as the PIC alias is already
implied, and exporting variables in the opposite direction is not
supported.
Move ghcb_version as well, and emit a PIC alias for it too, as it is
still referenced by the GHCB protocol negotiation code that is shared
with the startup code.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/boot/compressed/sev.c | 3 ++
arch/x86/boot/startup/sev-shared.c | 19 -----------
arch/x86/boot/startup/sev-startup.c | 9 ------
arch/x86/coco/sev/core.c | 34 ++++++++++++++++++++
4 files changed, 37 insertions(+), 28 deletions(-)
diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index 4bd7b45562ed..5e0fa165e10b 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -38,6 +38,9 @@ struct ghcb *boot_ghcb;
#define __BOOT_COMPRESSED
u8 snp_vmpl;
+u16 ghcb_version;
+
+u64 boot_svsm_caa_pa;
/* Include code for early handlers */
#include "../../boot/startup/sev-shared.c"
diff --git a/arch/x86/boot/startup/sev-shared.c b/arch/x86/boot/startup/sev-shared.c
index a1d27a418421..042eef7b3a2a 100644
--- a/arch/x86/boot/startup/sev-shared.c
+++ b/arch/x86/boot/startup/sev-shared.c
@@ -18,25 +18,6 @@
#define WARN(condition, format...) (!!(condition))
#endif
-/*
- * SVSM related information:
- * During boot, the page tables are set up as identity mapped and later
- * changed to use kernel virtual addresses. Maintain separate virtual and
- * physical addresses for the CAA to allow SVSM functions to be used during
- * early boot, both with identity mapped virtual addresses and proper kernel
- * virtual addresses.
- */
-u64 boot_svsm_caa_pa __ro_after_init;
-
-/*
- * Since feature negotiation related variables are set early in the boot
- * process they must reside in the .data section so as not to be zeroed
- * out when the .bss section is later cleared.
- *
- * GHCB protocol version negotiated with the hypervisor.
- */
-u16 ghcb_version __ro_after_init;
-
/* Copy of the SNP firmware's CPUID page. */
static struct snp_cpuid_table cpuid_table_copy __ro_after_init;
diff --git a/arch/x86/boot/startup/sev-startup.c b/arch/x86/boot/startup/sev-startup.c
index cba4fb4cace0..0eb31bf81899 100644
--- a/arch/x86/boot/startup/sev-startup.c
+++ b/arch/x86/boot/startup/sev-startup.c
@@ -41,15 +41,6 @@
#include <asm/cpuid/api.h>
#include <asm/cmdline.h>
-/* Bitmap of SEV features supported by the hypervisor */
-u64 sev_hv_features __ro_after_init;
-
-/* Secrets page physical address from the CC blob */
-u64 sev_secrets_pa __ro_after_init;
-
-/* For early boot SVSM communication */
-struct svsm_ca boot_svsm_ca_page __aligned(PAGE_SIZE);
-
/*
* Nothing shall interrupt this code path while holding the per-CPU
* GHCB. The backup GHCB is only for NMIs interrupting this path.
diff --git a/arch/x86/coco/sev/core.c b/arch/x86/coco/sev/core.c
index f73dea313f55..71a76fde3102 100644
--- a/arch/x86/coco/sev/core.c
+++ b/arch/x86/coco/sev/core.c
@@ -46,6 +46,29 @@
#include <asm/cmdline.h>
#include <asm/msr.h>
+/* Bitmap of SEV features supported by the hypervisor */
+u64 sev_hv_features __ro_after_init;
+SYM_PIC_ALIAS(sev_hv_features);
+
+/* Secrets page physical address from the CC blob */
+u64 sev_secrets_pa __ro_after_init;
+SYM_PIC_ALIAS(sev_secrets_pa);
+
+/* For early boot SVSM communication */
+struct svsm_ca boot_svsm_ca_page __aligned(PAGE_SIZE);
+SYM_PIC_ALIAS(boot_svsm_ca_page);
+
+/*
+ * SVSM related information:
+ * During boot, the page tables are set up as identity mapped and later
+ * changed to use kernel virtual addresses. Maintain separate virtual and
+ * physical addresses for the CAA to allow SVSM functions to be used during
+ * early boot, both with identity mapped virtual addresses and proper kernel
+ * virtual addresses.
+ */
+u64 boot_svsm_caa_pa __ro_after_init;
+SYM_PIC_ALIAS(boot_svsm_caa_pa);
+
DEFINE_PER_CPU(struct svsm_ca *, svsm_caa);
DEFINE_PER_CPU(u64, svsm_caa_pa);
@@ -119,6 +142,17 @@ DEFINE_PER_CPU(struct sev_es_save_area *, sev_vmsa);
*/
u8 snp_vmpl __ro_after_init;
EXPORT_SYMBOL_GPL(snp_vmpl);
+SYM_PIC_ALIAS(snp_vmpl);
+
+/*
+ * Since feature negotiation related variables are set early in the boot
+ * process they must reside in the .data section so as not to be zeroed
+ * out when the .bss section is later cleared.
+ *
+ * GHCB protocol version negotiated with the hypervisor.
+ */
+u16 ghcb_version __ro_after_init;
+SYM_PIC_ALIAS(ghcb_version);
/* For early boot hypervisor communication in SEV-ES enabled guests */
static struct ghcb boot_ghcb_page __bss_decrypted __aligned(PAGE_SIZE);
--
2.50.0.727.gbf7dc18ff4-goog
* [PATCH v4 15/24] x86/sev: Move __sev_[get|put]_ghcb() into separate noinstr object
2025-07-09 8:08 [PATCH v4 00/24] x86: strict separation of startup code Ard Biesheuvel
` (13 preceding siblings ...)
2025-07-09 8:08 ` [PATCH v4 14/24] x86/sev: Provide PIC aliases for SEV related data objects Ard Biesheuvel
@ 2025-07-09 8:08 ` Ard Biesheuvel
2025-07-09 8:08 ` [PATCH v4 16/24] x86/sev: Export startup routines for later use Ard Biesheuvel
` (10 subsequent siblings)
25 siblings, 0 replies; 43+ messages in thread
From: Ard Biesheuvel @ 2025-07-09 8:08 UTC (permalink / raw)
To: linux-kernel
Cc: linux-efi, x86, Ard Biesheuvel, Borislav Petkov, Ingo Molnar,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky, Josh Poimboeuf,
Peter Zijlstra
From: Ard Biesheuvel <ardb@kernel.org>
Rename sev-nmi.c to noinstr.c, and move the get/put GHCB routines into
it too. These routines are also annotated as 'noinstr' and suffer from
the same problem as the NMI code, i.e., GCC may ignore the
__no_sanitize_address__ function attribute implied by 'noinstr' and
insert KASAN instrumentation anyway.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/boot/startup/sev-startup.c | 74 --------------------
arch/x86/coco/sev/Makefile | 6 +-
arch/x86/coco/sev/{sev-nmi.c => noinstr.c} | 74 ++++++++++++++++++++
3 files changed, 77 insertions(+), 77 deletions(-)
diff --git a/arch/x86/boot/startup/sev-startup.c b/arch/x86/boot/startup/sev-startup.c
index 0eb31bf81899..0fd80e63b639 100644
--- a/arch/x86/boot/startup/sev-startup.c
+++ b/arch/x86/boot/startup/sev-startup.c
@@ -41,83 +41,9 @@
#include <asm/cpuid/api.h>
#include <asm/cmdline.h>
-/*
- * Nothing shall interrupt this code path while holding the per-CPU
- * GHCB. The backup GHCB is only for NMIs interrupting this path.
- *
- * Callers must disable local interrupts around it.
- */
-noinstr struct ghcb *__sev_get_ghcb(struct ghcb_state *state)
-{
- struct sev_es_runtime_data *data;
- struct ghcb *ghcb;
-
- WARN_ON(!irqs_disabled());
-
- data = this_cpu_read(runtime_data);
- ghcb = &data->ghcb_page;
-
- if (unlikely(data->ghcb_active)) {
- /* GHCB is already in use - save its contents */
-
- if (unlikely(data->backup_ghcb_active)) {
- /*
- * Backup-GHCB is also already in use. There is no way
- * to continue here so just kill the machine. To make
- * panic() work, mark GHCBs inactive so that messages
- * can be printed out.
- */
- data->ghcb_active = false;
- data->backup_ghcb_active = false;
-
- instrumentation_begin();
- panic("Unable to handle #VC exception! GHCB and Backup GHCB are already in use");
- instrumentation_end();
- }
-
- /* Mark backup_ghcb active before writing to it */
- data->backup_ghcb_active = true;
-
- state->ghcb = &data->backup_ghcb;
-
- /* Backup GHCB content */
- *state->ghcb = *ghcb;
- } else {
- state->ghcb = NULL;
- data->ghcb_active = true;
- }
-
- return ghcb;
-}
-
/* Include code shared with pre-decompression boot stage */
#include "sev-shared.c"
-noinstr void __sev_put_ghcb(struct ghcb_state *state)
-{
- struct sev_es_runtime_data *data;
- struct ghcb *ghcb;
-
- WARN_ON(!irqs_disabled());
-
- data = this_cpu_read(runtime_data);
- ghcb = &data->ghcb_page;
-
- if (state->ghcb) {
- /* Restore GHCB from Backup */
- *ghcb = *state->ghcb;
- data->backup_ghcb_active = false;
- state->ghcb = NULL;
- } else {
- /*
- * Invalidate the GHCB so a VMGEXIT instruction issued
- * from userspace won't appear to be valid.
- */
- vc_ghcb_invalidate(ghcb);
- data->ghcb_active = false;
- }
-}
-
void __head
early_set_pages_state(unsigned long vaddr, unsigned long paddr,
unsigned long npages, enum psc_op op,
diff --git a/arch/x86/coco/sev/Makefile b/arch/x86/coco/sev/Makefile
index db3255b979bd..53e964a22759 100644
--- a/arch/x86/coco/sev/Makefile
+++ b/arch/x86/coco/sev/Makefile
@@ -1,9 +1,9 @@
# SPDX-License-Identifier: GPL-2.0
-obj-y += core.o sev-nmi.o vc-handle.o
+obj-y += core.o noinstr.o vc-handle.o
# Clang 14 and older may fail to respect __no_sanitize_undefined when inlining
-UBSAN_SANITIZE_sev-nmi.o := n
+UBSAN_SANITIZE_noinstr.o := n
# GCC may fail to respect __no_sanitize_address when inlining
-KASAN_SANITIZE_sev-nmi.o := n
+KASAN_SANITIZE_noinstr.o := n
diff --git a/arch/x86/coco/sev/sev-nmi.c b/arch/x86/coco/sev/noinstr.c
similarity index 61%
rename from arch/x86/coco/sev/sev-nmi.c
rename to arch/x86/coco/sev/noinstr.c
index d8dfaddfb367..b527eafb6312 100644
--- a/arch/x86/coco/sev/sev-nmi.c
+++ b/arch/x86/coco/sev/noinstr.c
@@ -106,3 +106,77 @@ void noinstr __sev_es_nmi_complete(void)
__sev_put_ghcb(&state);
}
+
+/*
+ * Nothing shall interrupt this code path while holding the per-CPU
+ * GHCB. The backup GHCB is only for NMIs interrupting this path.
+ *
+ * Callers must disable local interrupts around it.
+ */
+noinstr struct ghcb *__sev_get_ghcb(struct ghcb_state *state)
+{
+ struct sev_es_runtime_data *data;
+ struct ghcb *ghcb;
+
+ WARN_ON(!irqs_disabled());
+
+ data = this_cpu_read(runtime_data);
+ ghcb = &data->ghcb_page;
+
+ if (unlikely(data->ghcb_active)) {
+ /* GHCB is already in use - save its contents */
+
+ if (unlikely(data->backup_ghcb_active)) {
+ /*
+ * Backup-GHCB is also already in use. There is no way
+ * to continue here so just kill the machine. To make
+ * panic() work, mark GHCBs inactive so that messages
+ * can be printed out.
+ */
+ data->ghcb_active = false;
+ data->backup_ghcb_active = false;
+
+ instrumentation_begin();
+ panic("Unable to handle #VC exception! GHCB and Backup GHCB are already in use");
+ instrumentation_end();
+ }
+
+ /* Mark backup_ghcb active before writing to it */
+ data->backup_ghcb_active = true;
+
+ state->ghcb = &data->backup_ghcb;
+
+ /* Backup GHCB content */
+ *state->ghcb = *ghcb;
+ } else {
+ state->ghcb = NULL;
+ data->ghcb_active = true;
+ }
+
+ return ghcb;
+}
+
+noinstr void __sev_put_ghcb(struct ghcb_state *state)
+{
+ struct sev_es_runtime_data *data;
+ struct ghcb *ghcb;
+
+ WARN_ON(!irqs_disabled());
+
+ data = this_cpu_read(runtime_data);
+ ghcb = &data->ghcb_page;
+
+ if (state->ghcb) {
+ /* Restore GHCB from Backup */
+ *ghcb = *state->ghcb;
+ data->backup_ghcb_active = false;
+ state->ghcb = NULL;
+ } else {
+ /*
+ * Invalidate the GHCB so a VMGEXIT instruction issued
+ * from userspace won't appear to be valid.
+ */
+ vc_ghcb_invalidate(ghcb);
+ data->ghcb_active = false;
+ }
+}
--
2.50.0.727.gbf7dc18ff4-goog
* [PATCH v4 16/24] x86/sev: Export startup routines for later use
2025-07-09 8:08 [PATCH v4 00/24] x86: strict separation of startup code Ard Biesheuvel
` (14 preceding siblings ...)
2025-07-09 8:08 ` [PATCH v4 15/24] x86/sev: Move __sev_[get|put]_ghcb() into separate noinstr object Ard Biesheuvel
@ 2025-07-09 8:08 ` Ard Biesheuvel
2025-07-09 8:08 ` [PATCH v4 17/24] objtool: Add action to check for absence of absolute relocations Ard Biesheuvel
` (9 subsequent siblings)
25 siblings, 0 replies; 43+ messages in thread
From: Ard Biesheuvel @ 2025-07-09 8:08 UTC (permalink / raw)
To: linux-kernel
Cc: linux-efi, x86, Ard Biesheuvel, Borislav Petkov, Ingo Molnar,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky, Josh Poimboeuf,
Peter Zijlstra
From: Ard Biesheuvel <ardb@kernel.org>
Create aliases that expose routines in the startup code to other code in
the core kernel, so that they can also be called later.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/boot/startup/exports.h | 14 ++++++++++++++
arch/x86/kernel/vmlinux.lds.S | 2 ++
2 files changed, 16 insertions(+)
diff --git a/arch/x86/boot/startup/exports.h b/arch/x86/boot/startup/exports.h
new file mode 100644
index 000000000000..01d2363dc445
--- /dev/null
+++ b/arch/x86/boot/startup/exports.h
@@ -0,0 +1,14 @@
+
+/*
+ * The symbols below are functions that are implemented by the startup code,
+ * but called at runtime by the SEV code residing in the core kernel.
+ */
+PROVIDE(early_set_pages_state = __pi_early_set_pages_state);
+PROVIDE(early_snp_set_memory_private = __pi_early_snp_set_memory_private);
+PROVIDE(early_snp_set_memory_shared = __pi_early_snp_set_memory_shared);
+PROVIDE(get_hv_features = __pi_get_hv_features);
+PROVIDE(sev_es_terminate = __pi_sev_es_terminate);
+PROVIDE(snp_cpuid = __pi_snp_cpuid);
+PROVIDE(snp_cpuid_get_table = __pi_snp_cpuid_get_table);
+PROVIDE(svsm_issue_call = __pi_svsm_issue_call);
+PROVIDE(svsm_process_result_codes = __pi_svsm_process_result_codes);
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 4fa0be732af1..5d5e3a95e1f9 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -535,3 +535,5 @@ xen_elfnote_entry_value =
xen_elfnote_phys32_entry_value =
ABSOLUTE(xen_elfnote_phys32_entry) + ABSOLUTE(pvh_start_xen - LOAD_OFFSET);
#endif
+
+#include "../boot/startup/exports.h"
--
2.50.0.727.gbf7dc18ff4-goog
* [PATCH v4 17/24] objtool: Add action to check for absence of absolute relocations
2025-07-09 8:08 [PATCH v4 00/24] x86: strict separation of startup code Ard Biesheuvel
` (15 preceding siblings ...)
2025-07-09 8:08 ` [PATCH v4 16/24] x86/sev: Export startup routines for later use Ard Biesheuvel
@ 2025-07-09 8:08 ` Ard Biesheuvel
2025-07-09 8:08 ` [PATCH v4 18/24] x86/boot: Check startup code " Ard Biesheuvel
` (8 subsequent siblings)
25 siblings, 0 replies; 43+ messages in thread
From: Ard Biesheuvel @ 2025-07-09 8:08 UTC (permalink / raw)
To: linux-kernel
Cc: linux-efi, x86, Ard Biesheuvel, Borislav Petkov, Ingo Molnar,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky, Josh Poimboeuf,
Peter Zijlstra
From: Ard Biesheuvel <ardb@kernel.org>
The x86 startup code must not use absolute references to code or data,
as it executes before the kernel virtual mapping is up.
Add an action to objtool to check all allocatable sections (with the
exception of __patchable_function_entries, which uses absolute
references for nebulous reasons) and raise an error if any absolute
references are found.
Note that debug sections typically contain lots of absolute references
too, but those are not allocatable so they will be ignored.
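When the check fires, the report names the offending section and offset,
so the output looks roughly like this (illustrative, using the warning
format added below):
  sme.o: warning: objtool: section .data has absolute relocation at offset 0x120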
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
tools/objtool/builtin-check.c | 2 ++
tools/objtool/check.c | 36 ++++++++++++++++++++
tools/objtool/include/objtool/builtin.h | 1 +
3 files changed, 39 insertions(+)
diff --git a/tools/objtool/builtin-check.c b/tools/objtool/builtin-check.c
index 80239843e9f0..0f6b197cfcb0 100644
--- a/tools/objtool/builtin-check.c
+++ b/tools/objtool/builtin-check.c
@@ -87,6 +87,7 @@ static const struct option check_options[] = {
OPT_BOOLEAN('t', "static-call", &opts.static_call, "annotate static calls"),
OPT_BOOLEAN('u', "uaccess", &opts.uaccess, "validate uaccess rules for SMAP"),
OPT_BOOLEAN(0 , "cfi", &opts.cfi, "annotate kernel control flow integrity (kCFI) function preambles"),
+ OPT_BOOLEAN(0 , "noabs", &opts.noabs, "reject absolute references in allocatable sections"),
OPT_CALLBACK_OPTARG(0, "dump", NULL, NULL, "orc", "dump metadata", parse_dump),
OPT_GROUP("Options:"),
@@ -162,6 +163,7 @@ static bool opts_valid(void)
opts.hack_noinstr ||
opts.ibt ||
opts.mcount ||
+ opts.noabs ||
opts.noinstr ||
opts.orc ||
opts.retpoline ||
diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index d967ac001498..5d1d38404892 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -4643,6 +4643,39 @@ static void disas_warned_funcs(struct objtool_file *file)
disas_funcs(funcs);
}
+static int check_abs_references(struct objtool_file *file)
+{
+ struct section *sec;
+ struct reloc *reloc;
+ int ret = 0;
+
+ for_each_sec(file, sec) {
+ /* absolute references in non-loadable sections are fine */
+ if (!(sec->sh.sh_flags & SHF_ALLOC))
+ continue;
+
+ /* section must have an associated .rela section */
+ if (!sec->rsec)
+ continue;
+
+ /*
+ * Special case for compiler generated metadata that is not
+ * consumed until after boot.
+ */
+ if (!strcmp(sec->name, "__patchable_function_entries"))
+ continue;
+
+ for_each_reloc(sec->rsec, reloc) {
+ if (reloc_type(reloc) == R_ABS64) {
+ WARN("section %s has absolute relocation at offset 0x%lx",
+ sec->name, reloc_offset(reloc));
+ ret++;
+ }
+ }
+ }
+ return ret;
+}
+
struct insn_chunk {
void *addr;
struct insn_chunk *next;
@@ -4776,6 +4809,9 @@ int check(struct objtool_file *file)
goto out;
}
+ if (opts.noabs)
+ warnings += check_abs_references(file);
+
if (opts.orc && nr_insns) {
ret = orc_create(file);
if (ret)
diff --git a/tools/objtool/include/objtool/builtin.h b/tools/objtool/include/objtool/builtin.h
index 6b08666fa69d..ab22673862e1 100644
--- a/tools/objtool/include/objtool/builtin.h
+++ b/tools/objtool/include/objtool/builtin.h
@@ -26,6 +26,7 @@ struct opts {
bool uaccess;
int prefix;
bool cfi;
+ bool noabs;
/* options: */
bool backtrace;
--
2.50.0.727.gbf7dc18ff4-goog
* [PATCH v4 18/24] x86/boot: Check startup code for absence of absolute relocations
2025-07-09 8:08 [PATCH v4 00/24] x86: strict separation of startup code Ard Biesheuvel
` (16 preceding siblings ...)
2025-07-09 8:08 ` [PATCH v4 17/24] objtool: Add action to check for absence of absolute relocations Ard Biesheuvel
@ 2025-07-09 8:08 ` Ard Biesheuvel
2025-07-09 8:09 ` [PATCH v4 19/24] x86/boot: Revert "Reject absolute references in .head.text" Ard Biesheuvel
` (7 subsequent siblings)
25 siblings, 0 replies; 43+ messages in thread
From: Ard Biesheuvel @ 2025-07-09 8:08 UTC (permalink / raw)
To: linux-kernel
Cc: linux-efi, x86, Ard Biesheuvel, Borislav Petkov, Ingo Molnar,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky, Josh Poimboeuf,
Peter Zijlstra
From: Ard Biesheuvel <ardb@kernel.org>
Invoke objtool on each startup code object individually to check for the
absence of absolute relocations. This is needed because this code will
be invoked from the 1:1 mapping of memory before those absolute virtual
addresses (which are derived from the kernel virtual base address
provided to the linker and possibly shifted at boot) are mapped.
Only objects built under arch/x86/boot/startup/ have this restriction,
and once they have been incorporated into vmlinux.o, this distinction is
difficult to make. So force the invocation of objtool for each object
file individually, even if objtool is deferred to vmlinux.o for the rest
of the build. When objtool is deferred, pass only --noabs and nothing
else; otherwise, append it to the existing objtool command line.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/boot/startup/Makefile | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/arch/x86/boot/startup/Makefile b/arch/x86/boot/startup/Makefile
index b514f7e81332..32737f4ab5a8 100644
--- a/arch/x86/boot/startup/Makefile
+++ b/arch/x86/boot/startup/Makefile
@@ -19,6 +19,7 @@ KCOV_INSTRUMENT := n
obj-$(CONFIG_X86_64) += gdt_idt.o map_kernel.o
obj-$(CONFIG_AMD_MEM_ENCRYPT) += sme.o sev-startup.o
+pi-objs := $(patsubst %.o,$(obj)/%.o,$(obj-y))
lib-$(CONFIG_X86_64) += la57toggle.o
lib-$(CONFIG_EFI_MIXED) += efi-mixed.o
@@ -28,3 +29,10 @@ lib-$(CONFIG_EFI_MIXED) += efi-mixed.o
# to be linked into the decompressor or the EFI stub but not vmlinux
#
$(patsubst %.o,$(obj)/%.o,$(lib-y)): OBJECT_FILES_NON_STANDARD := y
+
+#
+# Invoke objtool for each object individually to check for absolute
+# relocations, even if other objtool actions are being deferred.
+#
+$(pi-objs): objtool-enabled = 1
+$(pi-objs): objtool-args = $(if $(delay-objtool),,$(objtool-args-y)) --noabs
--
2.50.0.727.gbf7dc18ff4-goog
* [PATCH v4 19/24] x86/boot: Revert "Reject absolute references in .head.text"
2025-07-09 8:08 [PATCH v4 00/24] x86: strict separation of startup code Ard Biesheuvel
` (17 preceding siblings ...)
2025-07-09 8:08 ` [PATCH v4 18/24] x86/boot: Check startup code " Ard Biesheuvel
@ 2025-07-09 8:09 ` Ard Biesheuvel
2025-07-09 8:09 ` [PATCH v4 20/24] x86/kbuild: Incorporate boot/startup/ via Kbuild makefile Ard Biesheuvel
` (6 subsequent siblings)
25 siblings, 0 replies; 43+ messages in thread
From: Ard Biesheuvel @ 2025-07-09 8:09 UTC (permalink / raw)
To: linux-kernel
Cc: linux-efi, x86, Ard Biesheuvel, Borislav Petkov, Ingo Molnar,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky, Josh Poimboeuf,
Peter Zijlstra
From: Ard Biesheuvel <ardb@kernel.org>
This reverts commit
faf0ed487415 ("x86/boot: Reject absolute references in .head.text")
The startup code is checked directly for the absence of absolute symbol
references, so checking the .head.text section in the relocs tool is no
longer needed.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/tools/relocs.c | 8 +-------
1 file changed, 1 insertion(+), 7 deletions(-)
diff --git a/arch/x86/tools/relocs.c b/arch/x86/tools/relocs.c
index 5778bc498415..e5a2b9a912d1 100644
--- a/arch/x86/tools/relocs.c
+++ b/arch/x86/tools/relocs.c
@@ -740,10 +740,10 @@ static void walk_relocs(int (*process)(struct section *sec, Elf_Rel *rel,
static int do_reloc64(struct section *sec, Elf_Rel *rel, ElfW(Sym) *sym,
const char *symname)
{
- int headtext = !strcmp(sec_name(sec->shdr.sh_info), ".head.text");
unsigned r_type = ELF64_R_TYPE(rel->r_info);
ElfW(Addr) offset = rel->r_offset;
int shn_abs = (sym->st_shndx == SHN_ABS) && !is_reloc(S_REL, symname);
+
if (sym->st_shndx == SHN_UNDEF)
return 0;
@@ -783,12 +783,6 @@ static int do_reloc64(struct section *sec, Elf_Rel *rel, ElfW(Sym) *sym,
break;
}
- if (headtext) {
- die("Absolute reference to symbol '%s' not permitted in .head.text\n",
- symname);
- break;
- }
-
/*
* Relocation offsets for 64 bit kernels are output
* as 32 bits and sign extended back to 64 bits when
--
2.50.0.727.gbf7dc18ff4-goog
* [PATCH v4 20/24] x86/kbuild: Incorporate boot/startup/ via Kbuild makefile
2025-07-09 8:08 [PATCH v4 00/24] x86: strict separation of startup code Ard Biesheuvel
` (18 preceding siblings ...)
2025-07-09 8:09 ` [PATCH v4 19/24] x86/boot: Revert "Reject absolute references in .head.text" Ard Biesheuvel
@ 2025-07-09 8:09 ` Ard Biesheuvel
2025-07-09 8:09 ` [PATCH v4 21/24] x86/boot: Create a confined code area for startup code Ard Biesheuvel
` (5 subsequent siblings)
25 siblings, 0 replies; 43+ messages in thread
From: Ard Biesheuvel @ 2025-07-09 8:09 UTC (permalink / raw)
To: linux-kernel
Cc: linux-efi, x86, Ard Biesheuvel, Borislav Petkov, Ingo Molnar,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky, Josh Poimboeuf,
Peter Zijlstra
From: Ard Biesheuvel <ardb@kernel.org>
Using core-y is not the correct way to get kbuild to descend into
arch/x86/boot/startup. For instance, building an individual object does
not work as expected when the pattern rule is local to the Makefile:
$ make arch/x86/boot/startup/map_kernel.pi.o
GEN Makefile
CALL /home/ardb/linux/scripts/checksyscalls.sh
DESCEND objtool
INSTALL libsubcmd_headers
make[3]: *** No rule to make target 'arch/x86/boot/startup/map_kernel.pi.o'. Stop.
make[2]: *** [/home/ardb/linux/scripts/Makefile.build:461: arch/x86] Error 2
make[1]: *** [/home/ardb/linux/Makefile:2011: .] Error 2
make: *** [/home/ardb/linux/Makefile:248: __sub-make] Error 2
So use obj-y from arch/x86/Kbuild instead, which makes things work as
expected.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/Kbuild | 2 ++
arch/x86/Makefile | 1 -
2 files changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/x86/Kbuild b/arch/x86/Kbuild
index f7fb3d88c57b..36b985d0e7bf 100644
--- a/arch/x86/Kbuild
+++ b/arch/x86/Kbuild
@@ -3,6 +3,8 @@
# Branch profiling isn't noinstr-safe. Disable it for arch/x86/*
subdir-ccflags-$(CONFIG_TRACE_BRANCH_PROFILING) += -DDISABLE_BRANCH_PROFILING
+obj-y += boot/startup/
+
obj-$(CONFIG_ARCH_HAS_CC_PLATFORM) += coco/
obj-y += entry/
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 1913d342969b..9b76e77ff7f7 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -286,7 +286,6 @@ archprepare: $(cpufeaturemasks.hdr)
###
# Kernel objects
-core-y += arch/x86/boot/startup/
libs-y += arch/x86/lib/
# drivers-y are linked after core-y
--
2.50.0.727.gbf7dc18ff4-goog
* [PATCH v4 21/24] x86/boot: Create a confined code area for startup code
2025-07-09 8:08 [PATCH v4 00/24] x86: strict separation of startup code Ard Biesheuvel
` (19 preceding siblings ...)
2025-07-09 8:09 ` [PATCH v4 20/24] x86/kbuild: Incorporate boot/startup/ via Kbuild makefile Ard Biesheuvel
@ 2025-07-09 8:09 ` Ard Biesheuvel
2025-07-09 8:09 ` [PATCH v4 22/24] efistub/x86: Remap inittext read-execute when needed Ard Biesheuvel
` (4 subsequent siblings)
25 siblings, 0 replies; 43+ messages in thread
From: Ard Biesheuvel @ 2025-07-09 8:09 UTC (permalink / raw)
To: linux-kernel
Cc: linux-efi, x86, Ard Biesheuvel, Borislav Petkov, Ingo Molnar,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky, Josh Poimboeuf,
Peter Zijlstra
From: Ard Biesheuvel <ardb@kernel.org>
In order to be able to have tight control over which code may execute
from the early 1:1 mapping of memory, but still link vmlinux as a single
executable, prefix all symbols in the startup code with __pi_, and invoke
the startup code from the outside using the __pi_ prefix.
Use objtool to check that no absolute symbol references are present in
the startup code, as these cannot be used from code running from the 1:1
mapping.
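A useful consequence is that a reference from startup code to a symbol
that has not been exposed to it via an alias is caught at link time; the
resulting error looks roughly like this (illustrative, with a made-up
symbol name):
  ld: arch/x86/boot/startup/sev-startup.pi.o: in function `__pi_early_set_pages_state':
  sev-startup.c: undefined reference to `__pi_some_unexported_var'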
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/boot/startup/Makefile | 14 ++++++++++++++
arch/x86/boot/startup/sev-shared.c | 4 +---
arch/x86/boot/startup/sme.c | 1 -
arch/x86/coco/sev/core.c | 2 +-
arch/x86/include/asm/setup.h | 1 +
arch/x86/include/asm/sev.h | 1 +
arch/x86/kernel/head64.c | 2 +-
arch/x86/kernel/head_64.S | 8 ++++----
arch/x86/mm/mem_encrypt_boot.S | 6 +++---
tools/objtool/check.c | 3 ++-
10 files changed, 28 insertions(+), 14 deletions(-)
diff --git a/arch/x86/boot/startup/Makefile b/arch/x86/boot/startup/Makefile
index 32737f4ab5a8..e8fdf020b422 100644
--- a/arch/x86/boot/startup/Makefile
+++ b/arch/x86/boot/startup/Makefile
@@ -4,6 +4,7 @@ KBUILD_AFLAGS += -D__DISABLE_EXPORTS
KBUILD_CFLAGS += -D__DISABLE_EXPORTS -mcmodel=small -fPIC \
-Os -DDISABLE_BRANCH_PROFILING \
$(DISABLE_STACKLEAK_PLUGIN) \
+ $(DISABLE_LATENT_ENTROPY_PLUGIN) \
-fno-stack-protector -D__NO_FORTIFY \
-fno-jump-tables \
-include $(srctree)/include/linux/hidden.h
@@ -36,3 +37,16 @@ $(patsubst %.o,$(obj)/%.o,$(lib-y)): OBJECT_FILES_NON_STANDARD := y
#
$(pi-objs): objtool-enabled = 1
$(pi-objs): objtool-args = $(if $(delay-objtool),,$(objtool-args-y)) --noabs
+
+#
+# Confine the startup code by prefixing all symbols with __pi_ (for position
+# independent). This ensures that startup code can only call other startup
+# code, or code that has explicitly been made accessible to it via a symbol
+# alias.
+#
+$(obj)/%.pi.o: OBJCOPYFLAGS := --prefix-symbols=__pi_
+$(obj)/%.pi.o: $(obj)/%.o FORCE
+ $(call if_changed,objcopy)
+
+targets += $(obj-y)
+obj-y := $(patsubst %.o,%.pi.o,$(obj-y))
diff --git a/arch/x86/boot/startup/sev-shared.c b/arch/x86/boot/startup/sev-shared.c
index 042eef7b3a2a..223eafa56d9d 100644
--- a/arch/x86/boot/startup/sev-shared.c
+++ b/arch/x86/boot/startup/sev-shared.c
@@ -11,9 +11,7 @@
#include <asm/setup_data.h>
-#ifndef __BOOT_COMPRESSED
-#define error(v) pr_err(v)
-#else
+#ifdef __BOOT_COMPRESSED
#undef WARN
#define WARN(condition, format...) (!!(condition))
#endif
diff --git a/arch/x86/boot/startup/sme.c b/arch/x86/boot/startup/sme.c
index 529090e20d2a..56dde8c53139 100644
--- a/arch/x86/boot/startup/sme.c
+++ b/arch/x86/boot/startup/sme.c
@@ -569,7 +569,6 @@ void __head sme_enable(struct boot_params *bp)
#ifdef CONFIG_MITIGATION_PAGE_TABLE_ISOLATION
/* Local version for startup code, which never operates on user page tables */
-__weak
pgd_t __pti_set_user_pgtbl(pgd_t *pgdp, pgd_t pgd)
{
return pgd;
diff --git a/arch/x86/coco/sev/core.c b/arch/x86/coco/sev/core.c
index 71a76fde3102..c5d70dd835c7 100644
--- a/arch/x86/coco/sev/core.c
+++ b/arch/x86/coco/sev/core.c
@@ -272,7 +272,7 @@ static int svsm_perform_call_protocol(struct svsm_call *call)
do {
ret = ghcb ? svsm_perform_ghcb_protocol(ghcb, call)
- : svsm_perform_msr_protocol(call);
+ : __pi_svsm_perform_msr_protocol(call);
} while (ret == -EAGAIN);
if (sev_cfg.ghcbs_initialized)
diff --git a/arch/x86/include/asm/setup.h b/arch/x86/include/asm/setup.h
index 692af46603a1..914eb32581c7 100644
--- a/arch/x86/include/asm/setup.h
+++ b/arch/x86/include/asm/setup.h
@@ -53,6 +53,7 @@ extern void i386_reserve_resources(void);
extern unsigned long __startup_64(unsigned long p2v_offset, struct boot_params *bp);
extern void startup_64_setup_gdt_idt(void);
extern void startup_64_load_idt(void *vc_handler);
+extern void __pi_startup_64_load_idt(void *vc_handler);
extern void early_setup_idt(void);
extern void __init do_early_exception(struct pt_regs *regs, int trapnr);
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index f20860187fe9..e50736d15e02 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -553,6 +553,7 @@ struct cpuid_leaf {
};
int svsm_perform_msr_protocol(struct svsm_call *call);
+int __pi_svsm_perform_msr_protocol(struct svsm_call *call);
int snp_cpuid(void (*cpuid_hv)(void *ctx, struct cpuid_leaf *),
void *ctx, struct cpuid_leaf *leaf);
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 1bc40d0785ee..fd28b53dbac5 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -319,5 +319,5 @@ void early_setup_idt(void)
handler = vc_boot_ghcb;
}
- startup_64_load_idt(handler);
+ __pi_startup_64_load_idt(handler);
}
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 3e9b3a3bd039..d219963ecb60 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -71,7 +71,7 @@ SYM_CODE_START_NOALIGN(startup_64)
xorl %edx, %edx
wrmsr
- call startup_64_setup_gdt_idt
+ call __pi_startup_64_setup_gdt_idt
/* Now switch to __KERNEL_CS so IRET works reliably */
pushq $__KERNEL_CS
@@ -91,7 +91,7 @@ SYM_CODE_START_NOALIGN(startup_64)
* subsequent code. Pass the boot_params pointer as the first argument.
*/
movq %r15, %rdi
- call sme_enable
+ call __pi_sme_enable
#endif
/* Sanitize CPU configuration */
@@ -111,7 +111,7 @@ SYM_CODE_START_NOALIGN(startup_64)
* programmed into CR3.
*/
movq %r15, %rsi
- call __startup_64
+ call __pi___startup_64
/* Form the CR3 value being sure to include the CR3 modifier */
leaq early_top_pgt(%rip), %rcx
@@ -562,7 +562,7 @@ SYM_CODE_START_NOALIGN(vc_no_ghcb)
/* Call C handler */
movq %rsp, %rdi
movq ORIG_RAX(%rsp), %rsi
- call do_vc_no_ghcb
+ call __pi_do_vc_no_ghcb
/* Unwind pt_regs */
POP_REGS
diff --git a/arch/x86/mm/mem_encrypt_boot.S b/arch/x86/mm/mem_encrypt_boot.S
index f8a33b25ae86..edbf9c998848 100644
--- a/arch/x86/mm/mem_encrypt_boot.S
+++ b/arch/x86/mm/mem_encrypt_boot.S
@@ -16,7 +16,7 @@
.text
.code64
-SYM_FUNC_START(sme_encrypt_execute)
+SYM_FUNC_START(__pi_sme_encrypt_execute)
/*
* Entry parameters:
@@ -69,9 +69,9 @@ SYM_FUNC_START(sme_encrypt_execute)
ANNOTATE_UNRET_SAFE
ret
int3
-SYM_FUNC_END(sme_encrypt_execute)
+SYM_FUNC_END(__pi_sme_encrypt_execute)
-SYM_FUNC_START(__enc_copy)
+SYM_FUNC_START_LOCAL(__enc_copy)
ANNOTATE_NOENDBR
/*
* Routine used to encrypt memory in place.
diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index 5d1d38404892..f43bd598d928 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -3563,7 +3563,8 @@ static int validate_branch(struct objtool_file *file, struct symbol *func,
if (func && insn_func(insn) && func != insn_func(insn)->pfunc) {
/* Ignore KCFI type preambles, which always fall through */
if (!strncmp(func->name, "__cfi_", 6) ||
- !strncmp(func->name, "__pfx_", 6))
+ !strncmp(func->name, "__pfx_", 6) ||
+ !strncmp(func->name, "__pi___pfx_", 11))
return 0;
if (file->ignore_unreachables)
--
2.50.0.727.gbf7dc18ff4-goog
* [PATCH v4 22/24] efistub/x86: Remap inittext read-execute when needed
2025-07-09 8:08 [PATCH v4 00/24] x86: strict separation of startup code Ard Biesheuvel
` (20 preceding siblings ...)
2025-07-09 8:09 ` [PATCH v4 21/24] x86/boot: Create a confined code area for startup code Ard Biesheuvel
@ 2025-07-09 8:09 ` Ard Biesheuvel
2025-07-09 8:09 ` [PATCH v4 23/24] x86/boot: Move startup code out of __head section Ard Biesheuvel
` (3 subsequent siblings)
25 siblings, 0 replies; 43+ messages in thread
From: Ard Biesheuvel @ 2025-07-09 8:09 UTC (permalink / raw)
To: linux-kernel
Cc: linux-efi, x86, Ard Biesheuvel, Borislav Petkov, Ingo Molnar,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky, Josh Poimboeuf,
Peter Zijlstra
From: Ard Biesheuvel <ardb@kernel.org>
Recent x86 EFI systems are stricter when it comes to mapping boot
images, and require that mappings are either read-write or read-execute.
Now that the boot code is being cleaned up and refactored, most of it is
being moved into .init.text (where it arguably belongs), but that
implies that, when booting on such strict EFI firmware, we need to take
care to map .init.text (and the .altinstr_aux section that follows it)
read-execute as well.
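Note that efi_status_t uses 0 (EFI_SUCCESS) for success, so the GNU
a ?: b expression used below returns the first failing status and only
attempts the second remap when the first one succeeded, i.e., roughly:

  status = efi_adjust_memory_range_protection(addr, kernel_text_size);
  if (status != EFI_SUCCESS)
          return status;
  return efi_adjust_memory_range_protection(addr + kernel_inittext_offset,
                                            kernel_inittext_size);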
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/boot/compressed/Makefile | 2 +-
arch/x86/boot/compressed/misc.c | 2 ++
arch/x86/include/asm/boot.h | 2 ++
arch/x86/kernel/vmlinux.lds.S | 2 ++
drivers/firmware/efi/libstub/x86-stub.c | 4 +++-
5 files changed, 10 insertions(+), 2 deletions(-)
diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index 3a38fdcdb9bd..74657589264d 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -73,7 +73,7 @@ LDFLAGS_vmlinux += -T
hostprogs := mkpiggy
HOST_EXTRACFLAGS += -I$(srctree)/tools/include
-sed-voffset := -e 's/^\([0-9a-fA-F]*\) [ABbCDGRSTtVW] \(_text\|__start_rodata\|__bss_start\|_end\)$$/\#define VO_\2 _AC(0x\1,UL)/p'
+sed-voffset := -e 's/^\([0-9a-fA-F]*\) [ABbCDGRSTtVW] \(_text\|__start_rodata\|_sinittext\|__inittext_end\|__bss_start\|_end\)$$/\#define VO_\2 _AC(0x\1,UL)/p'
quiet_cmd_voffset = VOFFSET $@
cmd_voffset = $(NM) $< | sed -n $(sed-voffset) > $@
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 94b5991da001..0f41ca0e52c0 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -332,6 +332,8 @@ static size_t parse_elf(void *output)
}
const unsigned long kernel_text_size = VO___start_rodata - VO__text;
+const unsigned long kernel_inittext_offset = VO__sinittext - VO__text;
+const unsigned long kernel_inittext_size = VO___inittext_end - VO__sinittext;
const unsigned long kernel_total_size = VO__end - VO__text;
static u8 boot_heap[BOOT_HEAP_SIZE] __aligned(4);
diff --git a/arch/x86/include/asm/boot.h b/arch/x86/include/asm/boot.h
index 02b23aa78955..f7b67cb73915 100644
--- a/arch/x86/include/asm/boot.h
+++ b/arch/x86/include/asm/boot.h
@@ -82,6 +82,8 @@
#ifndef __ASSEMBLER__
extern unsigned int output_len;
extern const unsigned long kernel_text_size;
+extern const unsigned long kernel_inittext_offset;
+extern const unsigned long kernel_inittext_size;
extern const unsigned long kernel_total_size;
unsigned long decompress_kernel(unsigned char *outbuf, unsigned long virt_addr,
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 5d5e3a95e1f9..4277efb26358 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -227,6 +227,8 @@ SECTIONS
*/
.altinstr_aux : AT(ADDR(.altinstr_aux) - LOAD_OFFSET) {
*(.altinstr_aux)
+ . = ALIGN(PAGE_SIZE);
+ __inittext_end = .;
}
INIT_DATA_SECTION(16)
diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
index cafc90d4caaf..0d05eac7c72b 100644
--- a/drivers/firmware/efi/libstub/x86-stub.c
+++ b/drivers/firmware/efi/libstub/x86-stub.c
@@ -788,7 +788,9 @@ static efi_status_t efi_decompress_kernel(unsigned long *kernel_entry,
*kernel_entry = addr + entry;
- return efi_adjust_memory_range_protection(addr, kernel_text_size);
+ return efi_adjust_memory_range_protection(addr, kernel_text_size) ?:
+ efi_adjust_memory_range_protection(addr + kernel_inittext_offset,
+ kernel_inittext_size);
}
static void __noreturn enter_kernel(unsigned long kernel_addr,
--
2.50.0.727.gbf7dc18ff4-goog
* [PATCH v4 23/24] x86/boot: Move startup code out of __head section
2025-07-09 8:08 [PATCH v4 00/24] x86: strict separation of startup code Ard Biesheuvel
` (21 preceding siblings ...)
2025-07-09 8:09 ` [PATCH v4 22/24] efistub/x86: Remap inittext read-execute when needed Ard Biesheuvel
@ 2025-07-09 8:09 ` Ard Biesheuvel
2025-07-09 8:09 ` [PATCH v4 24/24] x86/boot: Get rid of the .head.text section Ard Biesheuvel
` (2 subsequent siblings)
25 siblings, 0 replies; 43+ messages in thread
From: Ard Biesheuvel @ 2025-07-09 8:09 UTC (permalink / raw)
To: linux-kernel
Cc: linux-efi, x86, Ard Biesheuvel, Borislav Petkov, Ingo Molnar,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky, Josh Poimboeuf,
Peter Zijlstra
From: Ard Biesheuvel <ardb@kernel.org>
Move startup code out of the __head section, now that this no longer has
a special significance. Move everything into .text or .init.text as
appropriate, so that startup code is not kept around unnecessarily.
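For reference, the __HEAD -> __INIT changes in the assembly files below
switch those entry points over to the generic init text section, where
__INIT is (roughly) defined in include/linux/init.h as

  #define __INIT	.section	".init.text","ax"

so this code is discarded together with the rest of the init sections
once booting completes.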
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/boot/compressed/sev.c | 3 --
arch/x86/boot/startup/gdt_idt.c | 4 +--
arch/x86/boot/startup/map_kernel.c | 4 +--
arch/x86/boot/startup/sev-shared.c | 38 ++++++++++----------
arch/x86/boot/startup/sev-startup.c | 14 ++++----
arch/x86/boot/startup/sme.c | 26 +++++++-------
arch/x86/include/asm/init.h | 6 ----
arch/x86/kernel/head_32.S | 2 +-
arch/x86/kernel/head_64.S | 2 +-
arch/x86/platform/pvh/head.S | 2 +-
10 files changed, 46 insertions(+), 55 deletions(-)
diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index 5e0fa165e10b..6521530fb701 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -32,9 +32,6 @@ struct ghcb *boot_ghcb;
#undef __init
#define __init
-#undef __head
-#define __head
-
#define __BOOT_COMPRESSED
u8 snp_vmpl;
diff --git a/arch/x86/boot/startup/gdt_idt.c b/arch/x86/boot/startup/gdt_idt.c
index a3112a69b06a..d16102abdaec 100644
--- a/arch/x86/boot/startup/gdt_idt.c
+++ b/arch/x86/boot/startup/gdt_idt.c
@@ -24,7 +24,7 @@
static gate_desc bringup_idt_table[NUM_EXCEPTION_VECTORS] __page_aligned_data;
/* This may run while still in the direct mapping */
-void __head startup_64_load_idt(void *vc_handler)
+void startup_64_load_idt(void *vc_handler)
{
struct desc_ptr desc = {
.address = (unsigned long)rip_rel_ptr(bringup_idt_table),
@@ -46,7 +46,7 @@ void __head startup_64_load_idt(void *vc_handler)
/*
* Setup boot CPU state needed before kernel switches to virtual addresses.
*/
-void __head startup_64_setup_gdt_idt(void)
+void __init startup_64_setup_gdt_idt(void)
{
struct gdt_page *gp = rip_rel_ptr((void *)(__force unsigned long)&gdt_page);
void *handler = NULL;
diff --git a/arch/x86/boot/startup/map_kernel.c b/arch/x86/boot/startup/map_kernel.c
index 332dbe6688c4..83ba98d61572 100644
--- a/arch/x86/boot/startup/map_kernel.c
+++ b/arch/x86/boot/startup/map_kernel.c
@@ -30,7 +30,7 @@ static inline bool check_la57_support(void)
return true;
}
-static unsigned long __head sme_postprocess_startup(struct boot_params *bp,
+static unsigned long __init sme_postprocess_startup(struct boot_params *bp,
pmdval_t *pmd,
unsigned long p2v_offset)
{
@@ -84,7 +84,7 @@ static unsigned long __head sme_postprocess_startup(struct boot_params *bp,
* the 1:1 mapping of memory. Kernel virtual addresses can be determined by
* subtracting p2v_offset from the RIP-relative address.
*/
-unsigned long __head __startup_64(unsigned long p2v_offset,
+unsigned long __init __startup_64(unsigned long p2v_offset,
struct boot_params *bp)
{
pmd_t (*early_pgts)[PTRS_PER_PMD] = rip_rel_ptr(early_dynamic_pgts);
diff --git a/arch/x86/boot/startup/sev-shared.c b/arch/x86/boot/startup/sev-shared.c
index 223eafa56d9d..ce11bab57d4f 100644
--- a/arch/x86/boot/startup/sev-shared.c
+++ b/arch/x86/boot/startup/sev-shared.c
@@ -29,7 +29,7 @@ static u32 cpuid_std_range_max __ro_after_init;
static u32 cpuid_hyp_range_max __ro_after_init;
static u32 cpuid_ext_range_max __ro_after_init;
-void __head __noreturn
+void __noreturn
sev_es_terminate(unsigned int set, unsigned int reason)
{
u64 val = GHCB_MSR_TERM_REQ;
@@ -45,7 +45,7 @@ sev_es_terminate(unsigned int set, unsigned int reason)
asm volatile("hlt\n" : : : "memory");
}
-static u64 __head get_hv_features(void)
+static u64 __init get_hv_features(void)
{
u64 val;
@@ -62,7 +62,7 @@ static u64 __head get_hv_features(void)
return GHCB_MSR_HV_FT_RESP_VAL(val);
}
-u64 __head snp_check_hv_features(void)
+u64 __init snp_check_hv_features(void)
{
/*
* SNP is supported in v2 of the GHCB spec which mandates support for HV
@@ -240,7 +240,7 @@ const struct snp_cpuid_table *snp_cpuid_get_table(void)
*
* Return: XSAVE area size on success, 0 otherwise.
*/
-static u32 __head snp_cpuid_calc_xsave_size(u64 xfeatures_en, bool compacted)
+static u32 snp_cpuid_calc_xsave_size(u64 xfeatures_en, bool compacted)
{
const struct snp_cpuid_table *cpuid_table = snp_cpuid_get_table();
u64 xfeatures_found = 0;
@@ -276,7 +276,7 @@ static u32 __head snp_cpuid_calc_xsave_size(u64 xfeatures_en, bool compacted)
return xsave_size;
}
-static bool __head
+static bool
snp_cpuid_get_validated_func(struct cpuid_leaf *leaf)
{
const struct snp_cpuid_table *cpuid_table = snp_cpuid_get_table();
@@ -318,8 +318,8 @@ static void snp_cpuid_msr_prot(void *ctx, struct cpuid_leaf *leaf)
sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_CPUID_HV);
}
-static int __head snp_cpuid_postprocess(void (*cpuid)(void *ctx, struct cpuid_leaf *),
- void *ctx, struct cpuid_leaf *leaf)
+static int snp_cpuid_postprocess(void (*cpuid)(void *ctx, struct cpuid_leaf *),
+ void *ctx, struct cpuid_leaf *leaf)
{
struct cpuid_leaf leaf_hv = *leaf;
@@ -413,8 +413,8 @@ static int __head snp_cpuid_postprocess(void (*cpuid)(void *ctx, struct cpuid_le
* Returns -EOPNOTSUPP if feature not enabled. Any other non-zero return value
* should be treated as fatal by caller.
*/
-int __head snp_cpuid(void (*cpuid)(void *ctx, struct cpuid_leaf *), void *ctx,
- struct cpuid_leaf *leaf)
+int snp_cpuid(void (*cpuid)(void *ctx, struct cpuid_leaf *), void *ctx,
+ struct cpuid_leaf *leaf)
{
const struct snp_cpuid_table *cpuid_table = snp_cpuid_get_table();
@@ -456,7 +456,7 @@ int __head snp_cpuid(void (*cpuid)(void *ctx, struct cpuid_leaf *), void *ctx,
* page yet, so it only supports the MSR based communication with the
* hypervisor and only the CPUID exit-code.
*/
-void __head do_vc_no_ghcb(struct pt_regs *regs, unsigned long exit_code)
+void do_vc_no_ghcb(struct pt_regs *regs, unsigned long exit_code)
{
unsigned int subfn = lower_bits(regs->cx, 32);
unsigned int fn = lower_bits(regs->ax, 32);
@@ -532,7 +532,7 @@ struct cc_setup_data {
* Search for a Confidential Computing blob passed in as a setup_data entry
* via the Linux Boot Protocol.
*/
-static __head
+static __init
struct cc_blob_sev_info *find_cc_blob_setup_data(struct boot_params *bp)
{
struct cc_setup_data *sd = NULL;
@@ -560,7 +560,7 @@ struct cc_blob_sev_info *find_cc_blob_setup_data(struct boot_params *bp)
* mapping needs to be updated in sync with all the changes to virtual memory
* layout and related mapping facilities throughout the boot process.
*/
-static void __head setup_cpuid_table(const struct cc_blob_sev_info *cc_info)
+static void __init setup_cpuid_table(const struct cc_blob_sev_info *cc_info)
{
const struct snp_cpuid_table *cpuid_table_fw, *cpuid_table;
int i;
@@ -588,8 +588,8 @@ static void __head setup_cpuid_table(const struct cc_blob_sev_info *cc_info)
}
}
-static void __head svsm_pval_4k_page(unsigned long paddr, bool validate,
- struct svsm_ca *caa, u64 caa_pa)
+static void svsm_pval_4k_page(unsigned long paddr, bool validate,
+ struct svsm_ca *caa, u64 caa_pa)
{
struct svsm_pvalidate_call *pc;
struct svsm_call call = {};
@@ -628,8 +628,8 @@ static void __head svsm_pval_4k_page(unsigned long paddr, bool validate,
native_local_irq_restore(flags);
}
-static void __head pvalidate_4k_page(unsigned long vaddr, unsigned long paddr,
- bool validate, struct svsm_ca *caa, u64 caa_pa)
+static void pvalidate_4k_page(unsigned long vaddr, unsigned long paddr,
+ bool validate, struct svsm_ca *caa, u64 caa_pa)
{
int ret;
@@ -642,8 +642,8 @@ static void __head pvalidate_4k_page(unsigned long vaddr, unsigned long paddr,
}
}
-static void __head __page_state_change(unsigned long vaddr, unsigned long paddr,
- enum psc_op op, struct svsm_ca *caa, u64 caa_pa)
+static void __page_state_change(unsigned long vaddr, unsigned long paddr,
+ enum psc_op op, struct svsm_ca *caa, u64 caa_pa)
{
u64 val;
@@ -675,7 +675,7 @@ static void __head __page_state_change(unsigned long vaddr, unsigned long paddr,
* Maintain the GPA of the SVSM Calling Area (CA) in order to utilize the SVSM
* services needed when not running in VMPL0.
*/
-static bool __head svsm_setup_ca(const struct cc_blob_sev_info *cc_info,
+static bool __init svsm_setup_ca(const struct cc_blob_sev_info *cc_info,
void *page)
{
struct snp_secrets_page *secrets_page;
diff --git a/arch/x86/boot/startup/sev-startup.c b/arch/x86/boot/startup/sev-startup.c
index 0fd80e63b639..f2fb5674283d 100644
--- a/arch/x86/boot/startup/sev-startup.c
+++ b/arch/x86/boot/startup/sev-startup.c
@@ -44,7 +44,7 @@
/* Include code shared with pre-decompression boot stage */
#include "sev-shared.c"
-void __head
+void __init
early_set_pages_state(unsigned long vaddr, unsigned long paddr,
unsigned long npages, enum psc_op op,
struct svsm_ca *caa, u64 caa_pa)
@@ -64,7 +64,7 @@ early_set_pages_state(unsigned long vaddr, unsigned long paddr,
}
}
-void __head early_snp_set_memory_private(unsigned long vaddr, unsigned long paddr,
+void __init early_snp_set_memory_private(unsigned long vaddr, unsigned long paddr,
unsigned long npages)
{
/*
@@ -84,7 +84,7 @@ void __head early_snp_set_memory_private(unsigned long vaddr, unsigned long padd
rip_rel_ptr(&boot_svsm_ca_page), boot_svsm_caa_pa);
}
-void __head early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr,
+void __init early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr,
unsigned long npages)
{
/*
@@ -114,7 +114,7 @@ void __head early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr
*
* Scan for the blob in that order.
*/
-static __head struct cc_blob_sev_info *find_cc_blob(struct boot_params *bp)
+static struct cc_blob_sev_info *__init find_cc_blob(struct boot_params *bp)
{
struct cc_blob_sev_info *cc_info;
@@ -140,7 +140,7 @@ static __head struct cc_blob_sev_info *find_cc_blob(struct boot_params *bp)
return cc_info;
}
-static __head void svsm_setup(struct cc_blob_sev_info *cc_info)
+static void __init svsm_setup(struct cc_blob_sev_info *cc_info)
{
struct snp_secrets_page *secrets = (void *)cc_info->secrets_phys;
struct svsm_call call = {};
@@ -183,7 +183,7 @@ static __head void svsm_setup(struct cc_blob_sev_info *cc_info)
boot_svsm_caa_pa = pa;
}
-bool __head snp_init(struct boot_params *bp)
+bool __init snp_init(struct boot_params *bp)
{
struct cc_blob_sev_info *cc_info;
@@ -212,7 +212,7 @@ bool __head snp_init(struct boot_params *bp)
return true;
}
-void __head __noreturn snp_abort(void)
+void __init __noreturn snp_abort(void)
{
sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SNP_UNSUPPORTED);
}
diff --git a/arch/x86/boot/startup/sme.c b/arch/x86/boot/startup/sme.c
index 56dde8c53139..c3eff6d5102c 100644
--- a/arch/x86/boot/startup/sme.c
+++ b/arch/x86/boot/startup/sme.c
@@ -91,7 +91,7 @@ struct sme_populate_pgd_data {
*/
static char sme_workarea[2 * PMD_SIZE] __section(".init.scratch");
-static void __head sme_clear_pgd(struct sme_populate_pgd_data *ppd)
+static void __init sme_clear_pgd(struct sme_populate_pgd_data *ppd)
{
unsigned long pgd_start, pgd_end, pgd_size;
pgd_t *pgd_p;
@@ -106,7 +106,7 @@ static void __head sme_clear_pgd(struct sme_populate_pgd_data *ppd)
memset(pgd_p, 0, pgd_size);
}
-static pud_t __head *sme_prepare_pgd(struct sme_populate_pgd_data *ppd)
+static pud_t __init *sme_prepare_pgd(struct sme_populate_pgd_data *ppd)
{
pgd_t *pgd;
p4d_t *p4d;
@@ -143,7 +143,7 @@ static pud_t __head *sme_prepare_pgd(struct sme_populate_pgd_data *ppd)
return pud;
}
-static void __head sme_populate_pgd_large(struct sme_populate_pgd_data *ppd)
+static void __init sme_populate_pgd_large(struct sme_populate_pgd_data *ppd)
{
pud_t *pud;
pmd_t *pmd;
@@ -159,7 +159,7 @@ static void __head sme_populate_pgd_large(struct sme_populate_pgd_data *ppd)
set_pmd(pmd, __pmd(ppd->paddr | ppd->pmd_flags));
}
-static void __head sme_populate_pgd(struct sme_populate_pgd_data *ppd)
+static void __init sme_populate_pgd(struct sme_populate_pgd_data *ppd)
{
pud_t *pud;
pmd_t *pmd;
@@ -185,7 +185,7 @@ static void __head sme_populate_pgd(struct sme_populate_pgd_data *ppd)
set_pte(pte, __pte(ppd->paddr | ppd->pte_flags));
}
-static void __head __sme_map_range_pmd(struct sme_populate_pgd_data *ppd)
+static void __init __sme_map_range_pmd(struct sme_populate_pgd_data *ppd)
{
while (ppd->vaddr < ppd->vaddr_end) {
sme_populate_pgd_large(ppd);
@@ -195,7 +195,7 @@ static void __head __sme_map_range_pmd(struct sme_populate_pgd_data *ppd)
}
}
-static void __head __sme_map_range_pte(struct sme_populate_pgd_data *ppd)
+static void __init __sme_map_range_pte(struct sme_populate_pgd_data *ppd)
{
while (ppd->vaddr < ppd->vaddr_end) {
sme_populate_pgd(ppd);
@@ -205,7 +205,7 @@ static void __head __sme_map_range_pte(struct sme_populate_pgd_data *ppd)
}
}
-static void __head __sme_map_range(struct sme_populate_pgd_data *ppd,
+static void __init __sme_map_range(struct sme_populate_pgd_data *ppd,
pmdval_t pmd_flags, pteval_t pte_flags)
{
unsigned long vaddr_end;
@@ -229,22 +229,22 @@ static void __head __sme_map_range(struct sme_populate_pgd_data *ppd,
__sme_map_range_pte(ppd);
}
-static void __head sme_map_range_encrypted(struct sme_populate_pgd_data *ppd)
+static void __init sme_map_range_encrypted(struct sme_populate_pgd_data *ppd)
{
__sme_map_range(ppd, PMD_FLAGS_ENC, PTE_FLAGS_ENC);
}
-static void __head sme_map_range_decrypted(struct sme_populate_pgd_data *ppd)
+static void __init sme_map_range_decrypted(struct sme_populate_pgd_data *ppd)
{
__sme_map_range(ppd, PMD_FLAGS_DEC, PTE_FLAGS_DEC);
}
-static void __head sme_map_range_decrypted_wp(struct sme_populate_pgd_data *ppd)
+static void __init sme_map_range_decrypted_wp(struct sme_populate_pgd_data *ppd)
{
__sme_map_range(ppd, PMD_FLAGS_DEC_WP, PTE_FLAGS_DEC_WP);
}
-static unsigned long __head sme_pgtable_calc(unsigned long len)
+static unsigned long __init sme_pgtable_calc(unsigned long len)
{
unsigned long entries = 0, tables = 0;
@@ -281,7 +281,7 @@ static unsigned long __head sme_pgtable_calc(unsigned long len)
return entries + tables;
}
-void __head sme_encrypt_kernel(struct boot_params *bp)
+void __init sme_encrypt_kernel(struct boot_params *bp)
{
unsigned long workarea_start, workarea_end, workarea_len;
unsigned long execute_start, execute_end, execute_len;
@@ -485,7 +485,7 @@ void __head sme_encrypt_kernel(struct boot_params *bp)
native_write_cr3(__native_read_cr3());
}
-void __head sme_enable(struct boot_params *bp)
+void __init sme_enable(struct boot_params *bp)
{
unsigned int eax, ebx, ecx, edx;
unsigned long feature_mask;
diff --git a/arch/x86/include/asm/init.h b/arch/x86/include/asm/init.h
index 8b1b1abcef15..01ccdd168df0 100644
--- a/arch/x86/include/asm/init.h
+++ b/arch/x86/include/asm/init.h
@@ -2,12 +2,6 @@
#ifndef _ASM_X86_INIT_H
#define _ASM_X86_INIT_H
-#if defined(CONFIG_CC_IS_CLANG) && CONFIG_CLANG_VERSION < 170000
-#define __head __section(".head.text") __no_sanitize_undefined __no_stack_protector
-#else
-#define __head __section(".head.text") __no_sanitize_undefined
-#endif
-
struct x86_mapping_info {
void *(*alloc_pgt_page)(void *); /* allocate buf for page table */
void (*free_pgt_page)(void *, void *); /* free buf for page table */
diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index 76743dfad6ab..437effb1ef03 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -61,7 +61,7 @@ RESERVE_BRK(pagetables, INIT_MAP_SIZE)
* any particular GDT layout, because we load our own as soon as we
* can.
*/
-__HEAD
+ __INIT
SYM_CODE_START(startup_32)
movl pa(initial_stack),%ecx
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index d219963ecb60..21816b48537c 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -33,7 +33,7 @@
* because we need identity-mapped pages.
*/
- __HEAD
+ __INIT
.code64
SYM_CODE_START_NOALIGN(startup_64)
UNWIND_HINT_END_OF_STACK
diff --git a/arch/x86/platform/pvh/head.S b/arch/x86/platform/pvh/head.S
index 1d78e5631bb8..344030c1a81d 100644
--- a/arch/x86/platform/pvh/head.S
+++ b/arch/x86/platform/pvh/head.S
@@ -24,7 +24,7 @@
#include <asm/nospec-branch.h>
#include <xen/interface/elfnote.h>
- __HEAD
+ __INIT
/*
* Entry point for PVH guests.
--
2.50.0.727.gbf7dc18ff4-goog
* [PATCH v4 24/24] x86/boot: Get rid of the .head.text section
2025-07-09 8:08 [PATCH v4 00/24] x86: strict separation of startup code Ard Biesheuvel
` (22 preceding siblings ...)
2025-07-09 8:09 ` [PATCH v4 23/24] x86/boot: Move startup code out of __head section Ard Biesheuvel
@ 2025-07-09 8:09 ` Ard Biesheuvel
2025-07-09 10:23 ` [PATCH v4 00/24] x86: strict separation of startup code Kirill A. Shutemov
2025-07-09 14:01 ` Nikunj A Dadhania
25 siblings, 0 replies; 43+ messages in thread
From: Ard Biesheuvel @ 2025-07-09 8:09 UTC (permalink / raw)
To: linux-kernel
Cc: linux-efi, x86, Ard Biesheuvel, Borislav Petkov, Ingo Molnar,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky, Josh Poimboeuf,
Peter Zijlstra
From: Ard Biesheuvel <ardb@kernel.org>
The .head.text section is now empty, so it can be dropped from the
linker script.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/kernel/vmlinux.lds.S | 5 -----
1 file changed, 5 deletions(-)
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 4277efb26358..d7af4a64c211 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -160,11 +160,6 @@ SECTIONS
} :text = 0xcccccccc
- /* bootstrapping code */
- .head.text : AT(ADDR(.head.text) - LOAD_OFFSET) {
- HEAD_TEXT
- } :text = 0xcccccccc
-
/* End of text section, which should occupy whole number of pages */
_etext = .;
. = ALIGN(PAGE_SIZE);
--
2.50.0.727.gbf7dc18ff4-goog
* Re: [PATCH v4 00/24] x86: strict separation of startup code
2025-07-09 8:08 [PATCH v4 00/24] x86: strict separation of startup code Ard Biesheuvel
` (23 preceding siblings ...)
2025-07-09 8:09 ` [PATCH v4 24/24] x86/boot: Get rid of the .head.text section Ard Biesheuvel
@ 2025-07-09 10:23 ` Kirill A. Shutemov
2025-07-09 10:32 ` Ard Biesheuvel
2025-07-09 14:01 ` Nikunj A Dadhania
25 siblings, 1 reply; 43+ messages in thread
From: Kirill A. Shutemov @ 2025-07-09 10:23 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: linux-kernel, linux-efi, x86, Ard Biesheuvel, Borislav Petkov,
Ingo Molnar, Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky,
Josh Poimboeuf, Peter Zijlstra
On Wed, Jul 09, 2025 at 10:08:41AM +0200, Ard Biesheuvel wrote:
> From: Ard Biesheuvel <ardb@kernel.org>
>
> !!! Boot tested on non-SEV guest ONLY !!!!
TDX guest boots fine.
--
Kiryl Shutsemau / Kirill A. Shutemov
* Re: [PATCH v4 00/24] x86: strict separation of startup code
2025-07-09 10:23 ` [PATCH v4 00/24] x86: strict separation of startup code Kirill A. Shutemov
@ 2025-07-09 10:32 ` Ard Biesheuvel
0 siblings, 0 replies; 43+ messages in thread
From: Ard Biesheuvel @ 2025-07-09 10:32 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: Ard Biesheuvel, linux-kernel, linux-efi, x86, Borislav Petkov,
Ingo Molnar, Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky,
Josh Poimboeuf, Peter Zijlstra
On Wed, 9 Jul 2025 at 20:23, Kirill A. Shutemov <kirill@shutemov.name> wrote:
>
> On Wed, Jul 09, 2025 at 10:08:41AM +0200, Ard Biesheuvel wrote:
> > From: Ard Biesheuvel <ardb@kernel.org>
> >
> > !!! Boot tested on non-SEV guest ONLY !!!!
>
> TDX guest boots fine.
>
Thanks!
* Re: [PATCH v4 00/24] x86: strict separation of startup code
2025-07-09 8:08 [PATCH v4 00/24] x86: strict separation of startup code Ard Biesheuvel
` (24 preceding siblings ...)
2025-07-09 10:23 ` [PATCH v4 00/24] x86: strict separation of startup code Kirill A. Shutemov
@ 2025-07-09 14:01 ` Nikunj A Dadhania
25 siblings, 0 replies; 43+ messages in thread
From: Nikunj A Dadhania @ 2025-07-09 14:01 UTC (permalink / raw)
To: Ard Biesheuvel, linux-kernel
Cc: linux-efi, x86, Ard Biesheuvel, Borislav Petkov, Ingo Molnar,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky, Josh Poimboeuf,
Peter Zijlstra
Ard Biesheuvel <ardb+git@google.com> writes:
> From: Ard Biesheuvel <ardb@kernel.org>
>
> Build tested defconfig and allmodconfig
>
> !!! Boot tested on non-SEV guest ONLY !!!!
>
> Again, I will need to lean on Tom to determine whether this breaks
> SEV-SNP guest boot. As I mentioned before, I am still waiting for
> SEV-SNP capable hardware to be delivered.
This series breaks SEV-SNP guest boot; bisect points to patch 12/24
"x86/sev: Unify SEV-SNP hypervisor feature check". If I revert this
patch, the SEV-SNP guest boots fine. I will continue debugging it further.
Regards,
Nikunj
* Re: [PATCH v4 01/24] x86/sev: Separate MSR and GHCB based snp_cpuid() via a callback
2025-07-09 8:08 ` [PATCH v4 01/24] x86/sev: Separate MSR and GHCB based snp_cpuid() via a callback Ard Biesheuvel
@ 2025-07-09 15:12 ` Tom Lendacky
2025-07-09 23:21 ` Ard Biesheuvel
2025-07-11 20:59 ` Borislav Petkov
0 siblings, 2 replies; 43+ messages in thread
From: Tom Lendacky @ 2025-07-09 15:12 UTC (permalink / raw)
To: Ard Biesheuvel, linux-kernel
Cc: linux-efi, x86, Ard Biesheuvel, Borislav Petkov, Ingo Molnar,
Dionna Amalie Glaze, Kevin Loughlin, Josh Poimboeuf,
Peter Zijlstra
On 7/9/25 03:08, Ard Biesheuvel wrote:
> From: Ard Biesheuvel <ardb@kernel.org>
>
> There are two distinct callers of snp_cpuid(): one where the MSR
> protocol is always used, and one where the GHCB page based interface is
> always used.
>
> The snp_cpuid() logic does not care about the distinction, which only
> matters at a lower level. But the fact that it supports both interfaces
> means that the GHCB page based logic is pulled into the early startup
> code where PA to VA conversions are problematic, given that it runs from
> the 1:1 mapping of memory.
>
> So keep snp_cpuid() itself in the startup code, but factor out the
> hypervisor calls via a callback, so that the GHCB page handling can be
> moved out.
>
> Code refactoring only - no functional change intended.
>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
> arch/x86/boot/startup/sev-shared.c | 64 ++++----------------
> arch/x86/coco/sev/vc-shared.c | 49 ++++++++++++++-
> arch/x86/include/asm/sev.h | 3 +-
> 3 files changed, 63 insertions(+), 53 deletions(-)
>
> diff --git a/arch/x86/boot/startup/sev-shared.c b/arch/x86/boot/startup/sev-shared.c
> index 7a706db87b93..992abfa50508 100644
> --- a/arch/x86/boot/startup/sev-shared.c
> +++ b/arch/x86/boot/startup/sev-shared.c
> @@ -319,7 +319,7 @@ static int __sev_cpuid_hv(u32 fn, int reg_idx, u32 *reg)
> return 0;
> }
>
> -static int __sev_cpuid_hv_msr(struct cpuid_leaf *leaf)
> +static int __sev_cpuid_msr_prot(struct cpuid_leaf *leaf)
Not sure the renaming makes it read any easier or say anything more. It
does add extra changes to the diff that have to be read through, though,
so I don't think it is beneficial.
> {
> int ret;
>
> @@ -342,44 +342,7 @@ static int __sev_cpuid_hv_msr(struct cpuid_leaf *leaf)
> return ret;
> }
>
> -static int __sev_cpuid_hv_ghcb(struct ghcb *ghcb, struct es_em_ctxt *ctxt, struct cpuid_leaf *leaf)
> -{
> - u32 cr4 = native_read_cr4();
> - int ret;
> -
> - ghcb_set_rax(ghcb, leaf->fn);
> - ghcb_set_rcx(ghcb, leaf->subfn);
> -
> - if (cr4 & X86_CR4_OSXSAVE)
> - /* Safe to read xcr0 */
> - ghcb_set_xcr0(ghcb, xgetbv(XCR_XFEATURE_ENABLED_MASK));
> - else
> - /* xgetbv will cause #UD - use reset value for xcr0 */
> - ghcb_set_xcr0(ghcb, 1);
> -
> - ret = sev_es_ghcb_hv_call(ghcb, ctxt, SVM_EXIT_CPUID, 0, 0);
> - if (ret != ES_OK)
> - return ret;
> -
> - if (!(ghcb_rax_is_valid(ghcb) &&
> - ghcb_rbx_is_valid(ghcb) &&
> - ghcb_rcx_is_valid(ghcb) &&
> - ghcb_rdx_is_valid(ghcb)))
> - return ES_VMM_ERROR;
>
> - leaf->eax = ghcb->save.rax;
> - leaf->ebx = ghcb->save.rbx;
> - leaf->ecx = ghcb->save.rcx;
> - leaf->edx = ghcb->save.rdx;
> -
> - return ES_OK;
> -}
> -
> -static int sev_cpuid_hv(struct ghcb *ghcb, struct es_em_ctxt *ctxt, struct cpuid_leaf *leaf)
> -{
> - return ghcb ? __sev_cpuid_hv_ghcb(ghcb, ctxt, leaf)
> - : __sev_cpuid_hv_msr(leaf);
> -}
>
> /*
> * This may be called early while still running on the initial identity
> @@ -484,21 +447,20 @@ snp_cpuid_get_validated_func(struct cpuid_leaf *leaf)
> return false;
> }
>
> -static void snp_cpuid_hv(struct ghcb *ghcb, struct es_em_ctxt *ctxt, struct cpuid_leaf *leaf)
> +static void snp_cpuid_msr_prot(void *ctx, struct cpuid_leaf *leaf)
> {
> - if (sev_cpuid_hv(ghcb, ctxt, leaf))
> + if (__sev_cpuid_msr_prot(leaf))
> sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_CPUID_HV);
> }
>
> -static int __head
> -snp_cpuid_postprocess(struct ghcb *ghcb, struct es_em_ctxt *ctxt,
> - struct cpuid_leaf *leaf)
> +static int __head snp_cpuid_postprocess(void (*cpuid)(void *ctx, struct cpuid_leaf *),
> + void *ctx, struct cpuid_leaf *leaf)
> {
> struct cpuid_leaf leaf_hv = *leaf;
>
> switch (leaf->fn) {
> case 0x1:
> - snp_cpuid_hv(ghcb, ctxt, &leaf_hv);
> + cpuid(ctx, &leaf_hv);
Maybe rename this parameter to snp_cpuid or snp_cpuid_fn or similar,
because it can be very confusing to see "cpuid" on its own like this.
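E.g. something like this (just to illustrate the naming):

	static int __head snp_cpuid_postprocess(void (*snp_cpuid_fn)(void *ctx, struct cpuid_leaf *),
						void *ctx, struct cpuid_leaf *leaf)

with the call sites then reading:

		snp_cpuid_fn(ctx, &leaf_hv);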
>
> /* initial APIC ID */
> leaf->ebx = (leaf_hv.ebx & GENMASK(31, 24)) | (leaf->ebx & GENMASK(23, 0));
> @@ -517,7 +479,7 @@ snp_cpuid_postprocess(struct ghcb *ghcb, struct es_em_ctxt *ctxt,
> break;
> case 0xB:
> leaf_hv.subfn = 0;
> - snp_cpuid_hv(ghcb, ctxt, &leaf_hv);
> + cpuid(ctx, &leaf_hv);
>
> /* extended APIC ID */
> leaf->edx = leaf_hv.edx;
> @@ -565,7 +527,7 @@ snp_cpuid_postprocess(struct ghcb *ghcb, struct es_em_ctxt *ctxt,
> }
> break;
> case 0x8000001E:
> - snp_cpuid_hv(ghcb, ctxt, &leaf_hv);
> + cpuid(ctx, &leaf_hv);
>
> /* extended APIC ID */
> leaf->eax = leaf_hv.eax;
> @@ -586,8 +548,8 @@ snp_cpuid_postprocess(struct ghcb *ghcb, struct es_em_ctxt *ctxt,
> * Returns -EOPNOTSUPP if feature not enabled. Any other non-zero return value
> * should be treated as fatal by caller.
> */
> -int __head
> -snp_cpuid(struct ghcb *ghcb, struct es_em_ctxt *ctxt, struct cpuid_leaf *leaf)
> +int __head snp_cpuid(void (*cpuid)(void *ctx, struct cpuid_leaf *), void *ctx,
> + struct cpuid_leaf *leaf)
> {
> const struct snp_cpuid_table *cpuid_table = snp_cpuid_get_table();
>
> @@ -621,7 +583,7 @@ snp_cpuid(struct ghcb *ghcb, struct es_em_ctxt *ctxt, struct cpuid_leaf *leaf)
> return 0;
> }
>
> - return snp_cpuid_postprocess(ghcb, ctxt, leaf);
> + return snp_cpuid_postprocess(cpuid, ctx, leaf);
> }
>
> /*
> @@ -648,14 +610,14 @@ void __head do_vc_no_ghcb(struct pt_regs *regs, unsigned long exit_code)
> leaf.fn = fn;
> leaf.subfn = subfn;
>
> - ret = snp_cpuid(NULL, NULL, &leaf);
> + ret = snp_cpuid(snp_cpuid_msr_prot, NULL, &leaf);
> if (!ret)
> goto cpuid_done;
>
> if (ret != -EOPNOTSUPP)
> goto fail;
>
> - if (__sev_cpuid_hv_msr(&leaf))
> + if (__sev_cpuid_msr_prot(&leaf))
> goto fail;
>
> cpuid_done:
> diff --git a/arch/x86/coco/sev/vc-shared.c b/arch/x86/coco/sev/vc-shared.c
> index 2c0ab0fdc060..776cb90be530 100644
> --- a/arch/x86/coco/sev/vc-shared.c
> +++ b/arch/x86/coco/sev/vc-shared.c
> @@ -409,15 +409,62 @@ static enum es_result vc_handle_ioio(struct ghcb *ghcb, struct es_em_ctxt *ctxt)
> return ret;
> }
>
> +static int __sev_cpuid_ghcb_prot(struct ghcb *ghcb, struct es_em_ctxt *ctxt, struct cpuid_leaf *leaf)
Ditto here and below; keeping the __sev_cpuid_hv_ghcb() /
sev_cpuid_hv_ghcb() names would be best.
Thanks,
Tom
> +{
> + u32 cr4 = native_read_cr4();
> + int ret;
> +
> + ghcb_set_rax(ghcb, leaf->fn);
> + ghcb_set_rcx(ghcb, leaf->subfn);
> +
> + if (cr4 & X86_CR4_OSXSAVE)
> + /* Safe to read xcr0 */
> + ghcb_set_xcr0(ghcb, xgetbv(XCR_XFEATURE_ENABLED_MASK));
> + else
> + /* xgetbv will cause #UD - use reset value for xcr0 */
> + ghcb_set_xcr0(ghcb, 1);
> +
> + ret = sev_es_ghcb_hv_call(ghcb, ctxt, SVM_EXIT_CPUID, 0, 0);
> + if (ret != ES_OK)
> + return ret;
> +
> + if (!(ghcb_rax_is_valid(ghcb) &&
> + ghcb_rbx_is_valid(ghcb) &&
> + ghcb_rcx_is_valid(ghcb) &&
> + ghcb_rdx_is_valid(ghcb)))
> + return ES_VMM_ERROR;
> +
> + leaf->eax = ghcb->save.rax;
> + leaf->ebx = ghcb->save.rbx;
> + leaf->ecx = ghcb->save.rcx;
> + leaf->edx = ghcb->save.rdx;
> +
> + return ES_OK;
> +}
> +
> +struct cpuid_ctx {
> + struct ghcb *ghcb;
> + struct es_em_ctxt *ctxt;
> +};
> +
> +static void snp_cpuid_ghcb_prot(void *p, struct cpuid_leaf *leaf)
> +{
> + struct cpuid_ctx *ctx = p;
> +
> + if (__sev_cpuid_ghcb_prot(ctx->ghcb, ctx->ctxt, leaf))
> + sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_CPUID_HV);
> +}
> +
> static int vc_handle_cpuid_snp(struct ghcb *ghcb, struct es_em_ctxt *ctxt)
> {
> + struct cpuid_ctx ctx = { ghcb, ctxt };
> struct pt_regs *regs = ctxt->regs;
> struct cpuid_leaf leaf;
> int ret;
>
> leaf.fn = regs->ax;
> leaf.subfn = regs->cx;
> - ret = snp_cpuid(ghcb, ctxt, &leaf);
> + ret = snp_cpuid(snp_cpuid_ghcb_prot, &ctx, &leaf);
> if (!ret) {
> regs->ax = leaf.eax;
> regs->bx = leaf.ebx;
> diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
> index 89075ff19afa..2cabf617de3c 100644
> --- a/arch/x86/include/asm/sev.h
> +++ b/arch/x86/include/asm/sev.h
> @@ -552,7 +552,8 @@ struct cpuid_leaf {
> u32 edx;
> };
>
> -int snp_cpuid(struct ghcb *ghcb, struct es_em_ctxt *ctxt, struct cpuid_leaf *leaf);
> +int snp_cpuid(void (*cpuid_hv)(void *ctx, struct cpuid_leaf *),
> + void *ctx, struct cpuid_leaf *leaf);
>
> void __noreturn sev_es_terminate(unsigned int set, unsigned int reason);
> enum es_result sev_es_ghcb_hv_call(struct ghcb *ghcb,
* Re: [PATCH v4 03/24] x86/sev: Use MSR protocol only for early SVSM PVALIDATE call
2025-07-09 8:08 ` [PATCH v4 03/24] x86/sev: Use MSR protocol only for early SVSM PVALIDATE call Ard Biesheuvel
@ 2025-07-09 15:50 ` Tom Lendacky
0 siblings, 0 replies; 43+ messages in thread
From: Tom Lendacky @ 2025-07-09 15:50 UTC (permalink / raw)
To: Ard Biesheuvel, linux-kernel
Cc: linux-efi, x86, Ard Biesheuvel, Borislav Petkov, Ingo Molnar,
Dionna Amalie Glaze, Kevin Loughlin, Josh Poimboeuf,
Peter Zijlstra
On 7/9/25 03:08, Ard Biesheuvel wrote:
> From: Ard Biesheuvel <ardb@kernel.org>
>
> The early page state change API performs an SVSM call to PVALIDATE each
> page when running under a SVSM, and this involves either a GHCB page
> based call or a call based on the MSR protocol.
>
> The GHCB page based variant involves VA to PA translation of the GHCB
> address, and this is best avoided in the startup code, where virtual
> addresses are ambiguous (1:1 or kernel virtual).
>
> As this is the last remaining occurrence of svsm_perform_call_protocol()
> in the startup code, switch to the MSR protocol exclusively in this
> particular case, so that the GHCB based plumbing can be moved out of the
> startup code entirely in a subsequent patch.
>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
With some comments below.
> ---
> arch/x86/boot/compressed/sev.c | 20 --------------------
> arch/x86/boot/startup/sev-shared.c | 4 +++-
> 2 files changed, 3 insertions(+), 21 deletions(-)
>
> diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
> index fd1b67dfea22..b71c1ab6a282 100644
> --- a/arch/x86/boot/compressed/sev.c
> +++ b/arch/x86/boot/compressed/sev.c
> @@ -50,31 +50,11 @@ u64 svsm_get_caa_pa(void)
> return boot_svsm_caa_pa;
> }
>
> -int svsm_perform_call_protocol(struct svsm_call *call);
> -
> u8 snp_vmpl;
>
> /* Include code for early handlers */
> #include "../../boot/startup/sev-shared.c"
>
> -int svsm_perform_call_protocol(struct svsm_call *call)
> -{
> - struct ghcb *ghcb;
> - int ret;
> -
> - if (boot_ghcb)
> - ghcb = boot_ghcb;
> - else
> - ghcb = NULL;
> -
> - do {
> - ret = ghcb ? svsm_perform_ghcb_protocol(ghcb, call)
> - : svsm_perform_msr_protocol(call);
> - } while (ret == -EAGAIN);
> -
> - return ret;
> -}
> -
> static bool sev_snp_enabled(void)
> {
> return sev_status & MSR_AMD64_SEV_SNP_ENABLED;
> diff --git a/arch/x86/boot/startup/sev-shared.c b/arch/x86/boot/startup/sev-shared.c
> index 992abfa50508..1bb4b522dfaa 100644
> --- a/arch/x86/boot/startup/sev-shared.c
> +++ b/arch/x86/boot/startup/sev-shared.c
> @@ -753,7 +753,9 @@ static void __head svsm_pval_4k_page(unsigned long paddr, bool validate)
> call.rax = SVSM_CORE_CALL(SVSM_CORE_PVALIDATE);
> call.rcx = pc_pa;
>
> - ret = svsm_perform_call_protocol(&call);
> + do {
> + ret = svsm_perform_msr_protocol(&call);
> + } while (ret == -EAGAIN);
There's no comment here, like in the previous change, about why we are
using this method directly. Can one be added?
Also, this is now the second place where this do-while call to the MSR
protocol is used; maybe a common function would be good to have now.
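Something like this, perhaps (untested, name just for illustration):

	static int svsm_perform_msr_protocol_loop(struct svsm_call *call)
	{
		int ret;

		do {
			ret = svsm_perform_msr_protocol(call);
		} while (ret == -EAGAIN);

		return ret;
	}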
Thanks,
Tom
> if (ret)
> sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PVALIDATE);
>
* Re: [PATCH v4 04/24] x86/sev: Run RMPADJUST on SVSM calling area page to test VMPL
2025-07-09 8:08 ` [PATCH v4 04/24] x86/sev: Run RMPADJUST on SVSM calling area page to test VMPL Ard Biesheuvel
@ 2025-07-09 16:13 ` Tom Lendacky
0 siblings, 0 replies; 43+ messages in thread
From: Tom Lendacky @ 2025-07-09 16:13 UTC (permalink / raw)
To: Ard Biesheuvel, linux-kernel
Cc: linux-efi, x86, Ard Biesheuvel, Borislav Petkov, Ingo Molnar,
Dionna Amalie Glaze, Kevin Loughlin, Josh Poimboeuf,
Peter Zijlstra
On 7/9/25 03:08, Ard Biesheuvel wrote:
> From: Ard Biesheuvel <ardb@kernel.org>
>
> Determining the VMPL at which the kernel runs involves performing a
> RMPADJUST operation on an arbitrary page of memory, and observing whether
> it succeeds.
>
> The use of boot_ghcb_page in the core kernel in this case is completely
> arbitrary, but results in the need to provide a PIC alias for it. So use
> boot_svsm_ca_page instead, which already needs this alias for other
> reasons.
>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
> ---
> arch/x86/boot/compressed/sev.c | 2 +-
> arch/x86/boot/startup/sev-shared.c | 5 +++--
> arch/x86/boot/startup/sev-startup.c | 2 +-
> 3 files changed, 5 insertions(+), 4 deletions(-)
>
* Re: [PATCH v4 06/24] x86/sev: Avoid global variable to store virtual address of SVSM area
2025-07-09 8:08 ` [PATCH v4 06/24] x86/sev: Avoid global variable to store virtual address of SVSM area Ard Biesheuvel
@ 2025-07-09 17:49 ` Tom Lendacky
0 siblings, 0 replies; 43+ messages in thread
From: Tom Lendacky @ 2025-07-09 17:49 UTC (permalink / raw)
To: Ard Biesheuvel, linux-kernel
Cc: linux-efi, x86, Ard Biesheuvel, Borislav Petkov, Ingo Molnar,
Dionna Amalie Glaze, Kevin Loughlin, Josh Poimboeuf,
Peter Zijlstra
On 7/9/25 03:08, Ard Biesheuvel wrote:
> From: Ard Biesheuvel <ardb@kernel.org>
>
> The boottime SVSM calling area is used both by the startup code running
> from a 1:1 mapping, and potentially later on running from the ordinary
> kernel mapping.
>
> This SVSM calling area is statically allocated, and so its physical
> address doesn't change. However, its virtual address depends on the
> calling context (1:1 mapping or kernel virtual mapping), and even though
> the variable that holds the virtual address of this calling area gets
> updated from 1:1 address to kernel address during the boot, it is hard
> to reason about why this is guaranteed to be safe.
>
> So instead, take the RIP-relative address of the boottime SVSM calling
> area whenever its virtual address is required, and only use a global
> variable for the physical address.
>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Just a minor comment below.
> ---
> arch/x86/boot/compressed/sev.c | 5 ++---
> arch/x86/boot/startup/sev-shared.c | 6 ------
> arch/x86/boot/startup/sev-startup.c | 4 ++--
> arch/x86/coco/sev/core.c | 9 ---------
> arch/x86/include/asm/sev-internal.h | 3 +--
> arch/x86/include/asm/sev.h | 2 --
> arch/x86/mm/mem_encrypt_amd.c | 6 ------
> 7 files changed, 5 insertions(+), 30 deletions(-)
>
> diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
> index 3628e9bddc6a..6c0f91d38595 100644
> --- a/arch/x86/boot/compressed/sev.c
> +++ b/arch/x86/boot/compressed/sev.c
> @@ -37,12 +37,12 @@ struct ghcb *boot_ghcb;
>
> #define __BOOT_COMPRESSED
>
> -extern struct svsm_ca *boot_svsm_caa;
> extern u64 boot_svsm_caa_pa;
>
> struct svsm_ca *svsm_get_caa(void)
> {
> - return boot_svsm_caa;
> + /* The decompressor is mapped 1:1 so VA == PA */
> + return (struct svsm_ca *)boot_svsm_caa_pa;
> }
>
> u64 svsm_get_caa_pa(void)
> @@ -530,7 +530,6 @@ bool early_is_sevsnp_guest(void)
>
> /* Obtain the address of the calling area to use */
> boot_rdmsr(MSR_SVSM_CAA, &m);
> - boot_svsm_caa = (void *)m.q;
> boot_svsm_caa_pa = m.q;
>
> /*
> diff --git a/arch/x86/boot/startup/sev-shared.c b/arch/x86/boot/startup/sev-shared.c
> index 7d2415d690c9..7ca59038269f 100644
> --- a/arch/x86/boot/startup/sev-shared.c
> +++ b/arch/x86/boot/startup/sev-shared.c
> @@ -26,7 +26,6 @@
> * early boot, both with identity mapped virtual addresses and proper kernel
> * virtual addresses.
> */
> -struct svsm_ca *boot_svsm_caa __ro_after_init;
> u64 boot_svsm_caa_pa __ro_after_init;
>
> /*
> @@ -697,11 +696,6 @@ static bool __head svsm_setup_ca(const struct cc_blob_sev_info *cc_info,
> if (caa & (PAGE_SIZE - 1))
> sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_SVSM_CAA);
>
> - /*
> - * The CA is identity mapped when this routine is called, both by the
> - * decompressor code and the early kernel code.
> - */
> - boot_svsm_caa = (struct svsm_ca *)caa;
> boot_svsm_caa_pa = caa;
>
> /* Advertise the SVSM presence via CPUID. */
> diff --git a/arch/x86/boot/startup/sev-startup.c b/arch/x86/boot/startup/sev-startup.c
> index 050e071504af..8edf1ba78a48 100644
> --- a/arch/x86/boot/startup/sev-startup.c
> +++ b/arch/x86/boot/startup/sev-startup.c
> @@ -252,6 +252,7 @@ static __head struct cc_blob_sev_info *find_cc_blob(struct boot_params *bp)
>
> static __head void svsm_setup(struct cc_blob_sev_info *cc_info)
> {
> + struct snp_secrets_page *secrets = (void *)cc_info->secrets_phys;
> struct svsm_call call = {};
> int ret;
> u64 pa;
> @@ -280,7 +281,7 @@ static __head void svsm_setup(struct cc_blob_sev_info *cc_info)
> * RAX = 0 (Protocol=0, CallID=0)
> * RCX = New CA GPA
> */
> - call.caa = svsm_get_caa();
> + call.caa = (struct svsm_ca *)secrets->svsm_caa;
It would be good to add an additional sentence to the block comment above
this section about how the identity mapping allows use of the direct CAA
value from the secrets page.
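E.g. something along the lines of:

	/*
	 * The CA is identity mapped when this routine is called, so the
	 * CAA GPA taken from the secrets page can be used directly as
	 * its virtual address.
	 */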
Thanks,
Tom
> call.rax = SVSM_CORE_CALL(SVSM_CORE_REMAP_CA);
> call.rcx = pa;
> do {
> @@ -289,7 +290,6 @@ static __head void svsm_setup(struct cc_blob_sev_info *cc_info)
> if (ret)
> sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_SVSM_CA_REMAP_FAIL);
>
> - boot_svsm_caa = (struct svsm_ca *)pa;
> boot_svsm_caa_pa = pa;
> }
>
> diff --git a/arch/x86/coco/sev/core.c b/arch/x86/coco/sev/core.c
> index 15be9e52848d..bea67d017bf0 100644
> --- a/arch/x86/coco/sev/core.c
> +++ b/arch/x86/coco/sev/core.c
> @@ -1643,15 +1643,6 @@ void sev_show_status(void)
> pr_cont("\n");
> }
>
> -void __init snp_update_svsm_ca(void)
> -{
> - if (!snp_vmpl)
> - return;
> -
> - /* Update the CAA to a proper kernel address */
> - boot_svsm_caa = &boot_svsm_ca_page;
> -}
> -
> #ifdef CONFIG_SYSFS
> static ssize_t vmpl_show(struct kobject *kobj,
> struct kobj_attribute *attr, char *buf)
> diff --git a/arch/x86/include/asm/sev-internal.h b/arch/x86/include/asm/sev-internal.h
> index 6199b35a82e4..ffe4755962fe 100644
> --- a/arch/x86/include/asm/sev-internal.h
> +++ b/arch/x86/include/asm/sev-internal.h
> @@ -60,7 +60,6 @@ void early_set_pages_state(unsigned long vaddr, unsigned long paddr,
> DECLARE_PER_CPU(struct svsm_ca *, svsm_caa);
> DECLARE_PER_CPU(u64, svsm_caa_pa);
>
> -extern struct svsm_ca *boot_svsm_caa;
> extern u64 boot_svsm_caa_pa;
>
> static __always_inline struct svsm_ca *svsm_get_caa(void)
> @@ -68,7 +67,7 @@ static __always_inline struct svsm_ca *svsm_get_caa(void)
> if (sev_cfg.use_cas)
> return this_cpu_read(svsm_caa);
> else
> - return boot_svsm_caa;
> + return rip_rel_ptr(&boot_svsm_ca_page);
> }
>
> static __always_inline u64 svsm_get_caa_pa(void)
> diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
> index 135e91a17d04..f3acbfcdca9a 100644
> --- a/arch/x86/include/asm/sev.h
> +++ b/arch/x86/include/asm/sev.h
> @@ -519,7 +519,6 @@ void snp_accept_memory(phys_addr_t start, phys_addr_t end);
> u64 snp_get_unsupported_features(u64 status);
> u64 sev_get_status(void);
> void sev_show_status(void);
> -void snp_update_svsm_ca(void);
> int prepare_pte_enc(struct pte_enc_desc *d);
> void set_pte_enc_mask(pte_t *kpte, unsigned long pfn, pgprot_t new_prot);
> void snp_kexec_finish(void);
> @@ -600,7 +599,6 @@ static inline void snp_accept_memory(phys_addr_t start, phys_addr_t end) { }
> static inline u64 snp_get_unsupported_features(u64 status) { return 0; }
> static inline u64 sev_get_status(void) { return 0; }
> static inline void sev_show_status(void) { }
> -static inline void snp_update_svsm_ca(void) { }
> static inline int prepare_pte_enc(struct pte_enc_desc *d) { return 0; }
> static inline void set_pte_enc_mask(pte_t *kpte, unsigned long pfn, pgprot_t new_prot) { }
> static inline void snp_kexec_finish(void) { }
> diff --git a/arch/x86/mm/mem_encrypt_amd.c b/arch/x86/mm/mem_encrypt_amd.c
> index faf3a13fb6ba..2f8c32173972 100644
> --- a/arch/x86/mm/mem_encrypt_amd.c
> +++ b/arch/x86/mm/mem_encrypt_amd.c
> @@ -536,12 +536,6 @@ void __init sme_early_init(void)
> x86_init.resources.dmi_setup = snp_dmi_setup;
> }
>
> - /*
> - * Switch the SVSM CA mapping (if active) from identity mapped to
> - * kernel mapped.
> - */
> - snp_update_svsm_ca();
> -
> if (sev_status & MSR_AMD64_SNP_SECURE_TSC)
> setup_force_cpu_cap(X86_FEATURE_TSC_RELIABLE);
> }
* Re: [PATCH v4 07/24] x86/sev: Move MSR save/restore out of early page state change helper
2025-07-09 8:08 ` [PATCH v4 07/24] x86/sev: Move MSR save/restore out of early page state change helper Ard Biesheuvel
@ 2025-07-09 18:16 ` Tom Lendacky
0 siblings, 0 replies; 43+ messages in thread
From: Tom Lendacky @ 2025-07-09 18:16 UTC (permalink / raw)
To: Ard Biesheuvel, linux-kernel
Cc: linux-efi, x86, Ard Biesheuvel, Borislav Petkov, Ingo Molnar,
Dionna Amalie Glaze, Kevin Loughlin, Josh Poimboeuf,
Peter Zijlstra
On 7/9/25 03:08, Ard Biesheuvel wrote:
> From: Ard Biesheuvel <ardb@kernel.org>
>
> The function __page_state_change() in the decompressor is very similar
> to the loop in early_set_pages_state(), and they can share this code
> once the MSR save/restore is moved out.
>
> This also avoids doing the preserve/restore for each page in a longer
> sequence unnecessarily.
>
> This simplifies subsequent changes, where the APIs used by
> __page_state_change() are modified for better separation between startup
> code and ordinary code.
The reason for the calls being in __page_state_change() is the call to
pvalidate_4k_page(). If that code path is ever changed in a way that
causes a #VC, the GHCB MSR will be clobbered when the UEFI #VC handler
gets control, which will be an issue.
Thanks,
Tom
>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
> arch/x86/boot/compressed/sev.c | 17 +++++++++++------
> 1 file changed, 11 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
> index 6c0f91d38595..f00f68175f14 100644
> --- a/arch/x86/boot/compressed/sev.c
> +++ b/arch/x86/boot/compressed/sev.c
> @@ -71,9 +71,6 @@ static void __page_state_change(unsigned long paddr, enum psc_op op)
> if (op == SNP_PAGE_STATE_SHARED)
> pvalidate_4k_page(paddr, paddr, false);
>
> - /* Save the current GHCB MSR value */
> - msr = sev_es_rd_ghcb_msr();
> -
> /* Issue VMGEXIT to change the page state in RMP table. */
> sev_es_wr_ghcb_msr(GHCB_MSR_PSC_REQ_GFN(paddr >> PAGE_SHIFT, op));
> VMGEXIT();
> @@ -83,9 +80,6 @@ static void __page_state_change(unsigned long paddr, enum psc_op op)
> if ((GHCB_RESP_CODE(val) != GHCB_MSR_PSC_RESP) || GHCB_MSR_PSC_RESP_VAL(val))
> sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PSC);
>
> - /* Restore the GHCB MSR value */
> - sev_es_wr_ghcb_msr(msr);
> -
> /*
> * Now that page state is changed in the RMP table, validate it so that it is
> * consistent with the RMP entry.
> @@ -96,18 +90,26 @@ static void __page_state_change(unsigned long paddr, enum psc_op op)
>
> void snp_set_page_private(unsigned long paddr)
> {
> + u64 msr;
> +
> if (!sev_snp_enabled())
> return;
>
> + msr = sev_es_rd_ghcb_msr();
> __page_state_change(paddr, SNP_PAGE_STATE_PRIVATE);
> + sev_es_wr_ghcb_msr(msr);
> }
>
> void snp_set_page_shared(unsigned long paddr)
> {
> + u64 msr;
> +
> if (!sev_snp_enabled())
> return;
>
> + msr = sev_es_rd_ghcb_msr();
> __page_state_change(paddr, SNP_PAGE_STATE_SHARED);
> + sev_es_wr_ghcb_msr(msr);
> }
>
> bool early_setup_ghcb(void)
> @@ -132,8 +134,11 @@ bool early_setup_ghcb(void)
>
> void snp_accept_memory(phys_addr_t start, phys_addr_t end)
> {
> + u64 msr = sev_es_rd_ghcb_msr();
> +
> for (phys_addr_t pa = start; pa < end; pa += PAGE_SIZE)
> __page_state_change(pa, SNP_PAGE_STATE_PRIVATE);
> + sev_es_wr_ghcb_msr(msr);
> }
>
> void sev_es_shutdown_ghcb(void)
* Re: [PATCH v4 08/24] x86/sev: Share implementation of MSR-based page state change
2025-07-09 8:08 ` [PATCH v4 08/24] x86/sev: Share implementation of MSR-based page state change Ard Biesheuvel
@ 2025-07-09 18:24 ` Tom Lendacky
2025-07-15 5:09 ` Ard Biesheuvel
0 siblings, 1 reply; 43+ messages in thread
From: Tom Lendacky @ 2025-07-09 18:24 UTC (permalink / raw)
To: Ard Biesheuvel, linux-kernel
Cc: linux-efi, x86, Ard Biesheuvel, Borislav Petkov, Ingo Molnar,
Dionna Amalie Glaze, Kevin Loughlin, Josh Poimboeuf,
Peter Zijlstra
On 7/9/25 03:08, Ard Biesheuvel wrote:
> From: Ard Biesheuvel <ardb@kernel.org>
>
> Both the decompressor and the SEV startup code implement the exact same
> sequence for invoking the MSR based communication protocol to effectuate
> a page state change.
>
> Before tweaking the internal APIs used in both versions, merge them and
> share them so those tweaks are only needed in a single place.
I think you can keep the save and restore of the MSR in the combined
code so that you don't need the previous patch, and that will keep
everything safe. We should be doing a minimal number of MSR protocol
page state changes, so it really shouldn't have much effect.
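I.e., something like (untested):

	static void __head __page_state_change(unsigned long vaddr, unsigned long paddr,
					       enum psc_op op)
	{
		u64 val, msr;

		/* Page validation must be rescinded before changing to shared */
		if (op == SNP_PAGE_STATE_SHARED)
			pvalidate_4k_page(vaddr, paddr, false);

		/* Save the current GHCB MSR value */
		msr = sev_es_rd_ghcb_msr();

		/* Issue VMGEXIT to change the page state in the RMP table. */
		sev_es_wr_ghcb_msr(GHCB_MSR_PSC_REQ_GFN(paddr >> PAGE_SHIFT, op));
		VMGEXIT();

		/* Read the response of the VMGEXIT. */
		val = sev_es_rd_ghcb_msr();
		if ((GHCB_RESP_CODE(val) != GHCB_MSR_PSC_RESP) || GHCB_MSR_PSC_RESP_VAL(val))
			sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PSC);

		/* Restore the GHCB MSR value */
		sev_es_wr_ghcb_msr(msr);

		/* Validate the page so it is consistent with the RMP entry. */
		if (op == SNP_PAGE_STATE_PRIVATE)
			pvalidate_4k_page(vaddr, paddr, true);
	}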
Thanks,
Tom
>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
> arch/x86/boot/compressed/sev.c | 34 ++------------------
> arch/x86/boot/startup/sev-shared.c | 29 +++++++++++++++++
> arch/x86/boot/startup/sev-startup.c | 29 +----------------
> 3 files changed, 33 insertions(+), 59 deletions(-)
>
> diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
> index f00f68175f14..6d3ed7ed03a4 100644
> --- a/arch/x86/boot/compressed/sev.c
> +++ b/arch/x86/boot/compressed/sev.c
> @@ -60,34 +60,6 @@ static bool sev_snp_enabled(void)
> return sev_status & MSR_AMD64_SEV_SNP_ENABLED;
> }
>
> -static void __page_state_change(unsigned long paddr, enum psc_op op)
> -{
> - u64 val, msr;
> -
> - /*
> - * If private -> shared then invalidate the page before requesting the
> - * state change in the RMP table.
> - */
> - if (op == SNP_PAGE_STATE_SHARED)
> - pvalidate_4k_page(paddr, paddr, false);
> -
> - /* Issue VMGEXIT to change the page state in RMP table. */
> - sev_es_wr_ghcb_msr(GHCB_MSR_PSC_REQ_GFN(paddr >> PAGE_SHIFT, op));
> - VMGEXIT();
> -
> - /* Read the response of the VMGEXIT. */
> - val = sev_es_rd_ghcb_msr();
> - if ((GHCB_RESP_CODE(val) != GHCB_MSR_PSC_RESP) || GHCB_MSR_PSC_RESP_VAL(val))
> - sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PSC);
> -
> - /*
> - * Now that page state is changed in the RMP table, validate it so that it is
> - * consistent with the RMP entry.
> - */
> - if (op == SNP_PAGE_STATE_PRIVATE)
> - pvalidate_4k_page(paddr, paddr, true);
> -}
> -
> void snp_set_page_private(unsigned long paddr)
> {
> u64 msr;
> @@ -96,7 +68,7 @@ void snp_set_page_private(unsigned long paddr)
> return;
>
> msr = sev_es_rd_ghcb_msr();
> - __page_state_change(paddr, SNP_PAGE_STATE_PRIVATE);
> + __page_state_change(paddr, paddr, SNP_PAGE_STATE_PRIVATE);
> sev_es_wr_ghcb_msr(msr);
> }
>
> @@ -108,7 +80,7 @@ void snp_set_page_shared(unsigned long paddr)
> return;
>
> msr = sev_es_rd_ghcb_msr();
> - __page_state_change(paddr, SNP_PAGE_STATE_SHARED);
> + __page_state_change(paddr, paddr, SNP_PAGE_STATE_SHARED);
> sev_es_wr_ghcb_msr(msr);
> }
>
> @@ -137,7 +109,7 @@ void snp_accept_memory(phys_addr_t start, phys_addr_t end)
> u64 msr = sev_es_rd_ghcb_msr();
>
> for (phys_addr_t pa = start; pa < end; pa += PAGE_SIZE)
> - __page_state_change(pa, SNP_PAGE_STATE_PRIVATE);
> + __page_state_change(pa, pa, SNP_PAGE_STATE_PRIVATE);
> sev_es_wr_ghcb_msr(msr);
> }
>
> diff --git a/arch/x86/boot/startup/sev-shared.c b/arch/x86/boot/startup/sev-shared.c
> index 7ca59038269f..f553268d31d7 100644
> --- a/arch/x86/boot/startup/sev-shared.c
> +++ b/arch/x86/boot/startup/sev-shared.c
> @@ -640,6 +640,35 @@ static void __head pvalidate_4k_page(unsigned long vaddr, unsigned long paddr,
> }
> }
>
> +static void __head __page_state_change(unsigned long vaddr, unsigned long paddr,
> + enum psc_op op)
> +{
> + u64 val;
> +
> + /*
> + * If private -> shared then invalidate the page before requesting the
> + * state change in the RMP table.
> + */
> + if (op == SNP_PAGE_STATE_SHARED)
> + pvalidate_4k_page(vaddr, paddr, false);
> +
> + /* Issue VMGEXIT to change the page state in RMP table. */
> + sev_es_wr_ghcb_msr(GHCB_MSR_PSC_REQ_GFN(paddr >> PAGE_SHIFT, op));
> + VMGEXIT();
> +
> + /* Read the response of the VMGEXIT. */
> + val = sev_es_rd_ghcb_msr();
> + if ((GHCB_RESP_CODE(val) != GHCB_MSR_PSC_RESP) || GHCB_MSR_PSC_RESP_VAL(val))
> + sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PSC);
> +
> + /*
> + * Now that page state is changed in the RMP table, validate it so that it is
> + * consistent with the RMP entry.
> + */
> + if (op == SNP_PAGE_STATE_PRIVATE)
> + pvalidate_4k_page(vaddr, paddr, true);
> +}
> +
> /*
> * Maintain the GPA of the SVSM Calling Area (CA) in order to utilize the SVSM
> * services needed when not running in VMPL0.
> diff --git a/arch/x86/boot/startup/sev-startup.c b/arch/x86/boot/startup/sev-startup.c
> index 8edf1ba78a48..2ffd8bf09357 100644
> --- a/arch/x86/boot/startup/sev-startup.c
> +++ b/arch/x86/boot/startup/sev-startup.c
> @@ -135,7 +135,6 @@ early_set_pages_state(unsigned long vaddr, unsigned long paddr,
> unsigned long npages, enum psc_op op)
> {
> unsigned long paddr_end;
> - u64 val;
>
> vaddr = vaddr & PAGE_MASK;
>
> @@ -143,37 +142,11 @@ early_set_pages_state(unsigned long vaddr, unsigned long paddr,
> paddr_end = paddr + (npages << PAGE_SHIFT);
>
> while (paddr < paddr_end) {
> - /* Page validation must be rescinded before changing to shared */
> - if (op == SNP_PAGE_STATE_SHARED)
> - pvalidate_4k_page(vaddr, paddr, false);
> -
> - /*
> - * Use the MSR protocol because this function can be called before
> - * the GHCB is established.
> - */
> - sev_es_wr_ghcb_msr(GHCB_MSR_PSC_REQ_GFN(paddr >> PAGE_SHIFT, op));
> - VMGEXIT();
> -
> - val = sev_es_rd_ghcb_msr();
> -
> - if (GHCB_RESP_CODE(val) != GHCB_MSR_PSC_RESP)
> - goto e_term;
> -
> - if (GHCB_MSR_PSC_RESP_VAL(val))
> - goto e_term;
> -
> - /* Page validation must be performed after changing to private */
> - if (op == SNP_PAGE_STATE_PRIVATE)
> - pvalidate_4k_page(vaddr, paddr, true);
> + __page_state_change(vaddr, paddr, op);
>
> vaddr += PAGE_SIZE;
> paddr += PAGE_SIZE;
> }
> -
> - return;
> -
> -e_term:
> - sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PSC);
> }
>
> void __head early_snp_set_memory_private(unsigned long vaddr, unsigned long paddr,
* Re: [PATCH v4 01/24] x86/sev: Separate MSR and GHCB based snp_cpuid() via a callback
2025-07-09 15:12 ` Tom Lendacky
@ 2025-07-09 23:21 ` Ard Biesheuvel
2025-07-11 20:59 ` Borislav Petkov
1 sibling, 0 replies; 43+ messages in thread
From: Ard Biesheuvel @ 2025-07-09 23:21 UTC (permalink / raw)
To: Tom Lendacky
Cc: Ard Biesheuvel, linux-kernel, linux-efi, x86, Borislav Petkov,
Ingo Molnar, Dionna Amalie Glaze, Kevin Loughlin, Josh Poimboeuf,
Peter Zijlstra
On Thu, 10 Jul 2025 at 01:12, Tom Lendacky <thomas.lendacky@amd.com> wrote:
>
> On 7/9/25 03:08, Ard Biesheuvel wrote:
> > From: Ard Biesheuvel <ardb@kernel.org>
> >
> > There are two distinct callers of snp_cpuid(): one where the MSR
> > protocol is always used, and one where the GHCB page based interface is
> > always used.
> >
> > The snp_cpuid() logic does not care about the distinction, which only
> > matters at a lower level. But the fact that it supports both interfaces
> > means that the GHCB page based logic is pulled into the early startup
> > code where PA to VA conversions are problematic, given that it runs from
> > the 1:1 mapping of memory.
> >
> > So keep snp_cpuid() itself in the startup code, but factor out the
> > hypervisor calls via a callback, so that the GHCB page handling can be
> > moved out.
> >
> > Code refactoring only - no functional change intended.
> >
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > ---
> > arch/x86/boot/startup/sev-shared.c | 64 ++++----------------
> > arch/x86/coco/sev/vc-shared.c | 49 ++++++++++++++-
> > arch/x86/include/asm/sev.h | 3 +-
> > 3 files changed, 63 insertions(+), 53 deletions(-)
> >
> > diff --git a/arch/x86/boot/startup/sev-shared.c b/arch/x86/boot/startup/sev-shared.c
> > index 7a706db87b93..992abfa50508 100644
> > --- a/arch/x86/boot/startup/sev-shared.c
> > +++ b/arch/x86/boot/startup/sev-shared.c
> > @@ -319,7 +319,7 @@ static int __sev_cpuid_hv(u32 fn, int reg_idx, u32 *reg)
> > return 0;
> > }
> >
> > -static int __sev_cpuid_hv_msr(struct cpuid_leaf *leaf)
> > +static int __sev_cpuid_msr_prot(struct cpuid_leaf *leaf)
>
> Not sure the renaming makes it read any easier or say anything more. It
> does add extra changes to the diff that have to be read through, though,
> so I don't think it is beneficial.
>
These additional changes were provided by Boris as a delta patch on
top of my v3, so I will leave it to him to respond to this.
https://lore.kernel.org/all/20250512190834.332684-24-ardb+git@google.com/T/#u
> > {
> > int ret;
> >
> > @@ -342,44 +342,7 @@ static int __sev_cpuid_hv_msr(struct cpuid_leaf *leaf)
> > return ret;
> > }
> >
> > -static int __sev_cpuid_hv_ghcb(struct ghcb *ghcb, struct es_em_ctxt *ctxt, struct cpuid_leaf *leaf)
> > -{
> > - u32 cr4 = native_read_cr4();
> > - int ret;
> > -
> > - ghcb_set_rax(ghcb, leaf->fn);
> > - ghcb_set_rcx(ghcb, leaf->subfn);
> > -
> > - if (cr4 & X86_CR4_OSXSAVE)
> > - /* Safe to read xcr0 */
> > - ghcb_set_xcr0(ghcb, xgetbv(XCR_XFEATURE_ENABLED_MASK));
> > - else
> > - /* xgetbv will cause #UD - use reset value for xcr0 */
> > - ghcb_set_xcr0(ghcb, 1);
> > -
> > - ret = sev_es_ghcb_hv_call(ghcb, ctxt, SVM_EXIT_CPUID, 0, 0);
> > - if (ret != ES_OK)
> > - return ret;
> > -
> > - if (!(ghcb_rax_is_valid(ghcb) &&
> > - ghcb_rbx_is_valid(ghcb) &&
> > - ghcb_rcx_is_valid(ghcb) &&
> > - ghcb_rdx_is_valid(ghcb)))
> > - return ES_VMM_ERROR;
> > -
> > - leaf->eax = ghcb->save.rax;
> > - leaf->ebx = ghcb->save.rbx;
> > - leaf->ecx = ghcb->save.rcx;
> > - leaf->edx = ghcb->save.rdx;
> > -
> > - return ES_OK;
> > -}
> > -
> > -static int sev_cpuid_hv(struct ghcb *ghcb, struct es_em_ctxt *ctxt, struct cpuid_leaf *leaf)
> > -{
> > - return ghcb ? __sev_cpuid_hv_ghcb(ghcb, ctxt, leaf)
> > - : __sev_cpuid_hv_msr(leaf);
> > -}
> >
> > /*
> > * This may be called early while still running on the initial identity
> > @@ -484,21 +447,20 @@ snp_cpuid_get_validated_func(struct cpuid_leaf *leaf)
> > return false;
> > }
> >
> > -static void snp_cpuid_hv(struct ghcb *ghcb, struct es_em_ctxt *ctxt, struct cpuid_leaf *leaf)
> > +static void snp_cpuid_msr_prot(void *ctx, struct cpuid_leaf *leaf)
> > {
> > - if (sev_cpuid_hv(ghcb, ctxt, leaf))
> > + if (__sev_cpuid_msr_prot(leaf))
> > sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_CPUID_HV);
> > }
> >
> > -static int __head
> > -snp_cpuid_postprocess(struct ghcb *ghcb, struct es_em_ctxt *ctxt,
> > - struct cpuid_leaf *leaf)
> > +static int __head snp_cpuid_postprocess(void (*cpuid)(void *ctx, struct cpuid_leaf *),
> > + void *ctx, struct cpuid_leaf *leaf)
> > {
> > struct cpuid_leaf leaf_hv = *leaf;
> >
> > switch (leaf->fn) {
> > case 0x1:
> > - snp_cpuid_hv(ghcb, ctxt, &leaf_hv);
> > + cpuid(ctx, &leaf_hv);
>
> Maybe rename this parameter to snp_cpuid or snp_cpuid_fn or similar,
> because it can be very confusing to see "cpuid" on its own like this.
>
> >
> > /* initial APIC ID */
> > leaf->ebx = (leaf_hv.ebx & GENMASK(31, 24)) | (leaf->ebx & GENMASK(23, 0));
> > @@ -517,7 +479,7 @@ snp_cpuid_postprocess(struct ghcb *ghcb, struct es_em_ctxt *ctxt,
> > break;
> > case 0xB:
> > leaf_hv.subfn = 0;
> > - snp_cpuid_hv(ghcb, ctxt, &leaf_hv);
> > + cpuid(ctx, &leaf_hv);
> >
> > /* extended APIC ID */
> > leaf->edx = leaf_hv.edx;
> > @@ -565,7 +527,7 @@ snp_cpuid_postprocess(struct ghcb *ghcb, struct es_em_ctxt *ctxt,
> > }
> > break;
> > case 0x8000001E:
> > - snp_cpuid_hv(ghcb, ctxt, &leaf_hv);
> > + cpuid(ctx, &leaf_hv);
> >
> > /* extended APIC ID */
> > leaf->eax = leaf_hv.eax;
> > @@ -586,8 +548,8 @@ snp_cpuid_postprocess(struct ghcb *ghcb, struct es_em_ctxt *ctxt,
> > * Returns -EOPNOTSUPP if feature not enabled. Any other non-zero return value
> > * should be treated as fatal by caller.
> > */
> > -int __head
> > -snp_cpuid(struct ghcb *ghcb, struct es_em_ctxt *ctxt, struct cpuid_leaf *leaf)
> > +int __head snp_cpuid(void (*cpuid)(void *ctx, struct cpuid_leaf *), void *ctx,
> > + struct cpuid_leaf *leaf)
> > {
> > const struct snp_cpuid_table *cpuid_table = snp_cpuid_get_table();
> >
> > @@ -621,7 +583,7 @@ snp_cpuid(struct ghcb *ghcb, struct es_em_ctxt *ctxt, struct cpuid_leaf *leaf)
> > return 0;
> > }
> >
> > - return snp_cpuid_postprocess(ghcb, ctxt, leaf);
> > + return snp_cpuid_postprocess(cpuid, ctx, leaf);
> > }
> >
> > /*
> > @@ -648,14 +610,14 @@ void __head do_vc_no_ghcb(struct pt_regs *regs, unsigned long exit_code)
> > leaf.fn = fn;
> > leaf.subfn = subfn;
> >
> > - ret = snp_cpuid(NULL, NULL, &leaf);
> > + ret = snp_cpuid(snp_cpuid_msr_prot, NULL, &leaf);
> > if (!ret)
> > goto cpuid_done;
> >
> > if (ret != -EOPNOTSUPP)
> > goto fail;
> >
> > - if (__sev_cpuid_hv_msr(&leaf))
> > + if (__sev_cpuid_msr_prot(&leaf))
> > goto fail;
> >
> > cpuid_done:
> > diff --git a/arch/x86/coco/sev/vc-shared.c b/arch/x86/coco/sev/vc-shared.c
> > index 2c0ab0fdc060..776cb90be530 100644
> > --- a/arch/x86/coco/sev/vc-shared.c
> > +++ b/arch/x86/coco/sev/vc-shared.c
> > @@ -409,15 +409,62 @@ static enum es_result vc_handle_ioio(struct ghcb *ghcb, struct es_em_ctxt *ctxt)
> > return ret;
> > }
> >
> > +static int __sev_cpuid_ghcb_prot(struct ghcb *ghcb, struct es_em_ctxt *ctxt, struct cpuid_leaf *leaf)
>
> Ditto here and below, keeping the __sev_cpuid_hv_ghcb() /
> sev_cpuid_hv_ghcb() name would be best.
>
> Thanks,
> Tom
>
> > +{
> > + u32 cr4 = native_read_cr4();
> > + int ret;
> > +
> > + ghcb_set_rax(ghcb, leaf->fn);
> > + ghcb_set_rcx(ghcb, leaf->subfn);
> > +
> > + if (cr4 & X86_CR4_OSXSAVE)
> > + /* Safe to read xcr0 */
> > + ghcb_set_xcr0(ghcb, xgetbv(XCR_XFEATURE_ENABLED_MASK));
> > + else
> > + /* xgetbv will cause #UD - use reset value for xcr0 */
> > + ghcb_set_xcr0(ghcb, 1);
> > +
> > + ret = sev_es_ghcb_hv_call(ghcb, ctxt, SVM_EXIT_CPUID, 0, 0);
> > + if (ret != ES_OK)
> > + return ret;
> > +
> > + if (!(ghcb_rax_is_valid(ghcb) &&
> > + ghcb_rbx_is_valid(ghcb) &&
> > + ghcb_rcx_is_valid(ghcb) &&
> > + ghcb_rdx_is_valid(ghcb)))
> > + return ES_VMM_ERROR;
> > +
> > + leaf->eax = ghcb->save.rax;
> > + leaf->ebx = ghcb->save.rbx;
> > + leaf->ecx = ghcb->save.rcx;
> > + leaf->edx = ghcb->save.rdx;
> > +
> > + return ES_OK;
> > +}
> > +
> > +struct cpuid_ctx {
> > + struct ghcb *ghcb;
> > + struct es_em_ctxt *ctxt;
> > +};
> > +
> > +static void snp_cpuid_ghcb_prot(void *p, struct cpuid_leaf *leaf)
> > +{
> > + struct cpuid_ctx *ctx = p;
> > +
> > + if (__sev_cpuid_ghcb_prot(ctx->ghcb, ctx->ctxt, leaf))
> > + sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_CPUID_HV);
> > +}
> > +
> > static int vc_handle_cpuid_snp(struct ghcb *ghcb, struct es_em_ctxt *ctxt)
> > {
> > + struct cpuid_ctx ctx = { ghcb, ctxt };
> > struct pt_regs *regs = ctxt->regs;
> > struct cpuid_leaf leaf;
> > int ret;
> >
> > leaf.fn = regs->ax;
> > leaf.subfn = regs->cx;
> > - ret = snp_cpuid(ghcb, ctxt, &leaf);
> > + ret = snp_cpuid(snp_cpuid_ghcb_prot, &ctx, &leaf);
> > if (!ret) {
> > regs->ax = leaf.eax;
> > regs->bx = leaf.ebx;
> > diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
> > index 89075ff19afa..2cabf617de3c 100644
> > --- a/arch/x86/include/asm/sev.h
> > +++ b/arch/x86/include/asm/sev.h
> > @@ -552,7 +552,8 @@ struct cpuid_leaf {
> > u32 edx;
> > };
> >
> > -int snp_cpuid(struct ghcb *ghcb, struct es_em_ctxt *ctxt, struct cpuid_leaf *leaf);
> > +int snp_cpuid(void (*cpuid_hv)(void *ctx, struct cpuid_leaf *),
> > + void *ctx, struct cpuid_leaf *leaf);
> >
> > void __noreturn sev_es_terminate(unsigned int set, unsigned int reason);
> > enum es_result sev_es_ghcb_hv_call(struct ghcb *ghcb,
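Condensing the hunks above, the two snp_cpuid() call sites end up
looking like this after the refactor (a sketch assembled from the
diff, not a verbatim copy of the sources):

	/* do_vc_no_ghcb(): early boot, 1:1 mapping - MSR protocol only. */
	ret = snp_cpuid(snp_cpuid_msr_prot, NULL, &leaf);

	/* vc_handle_cpuid_snp(): #VC handler, GHCB page established. */
	struct cpuid_ctx ctx = { ghcb, ctxt };

	ret = snp_cpuid(snp_cpuid_ghcb_prot, &ctx, &leaf);

The GHCB protocol details thus live entirely in vc-shared.c, and the
startup code only ever links against the MSR-protocol callback.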
* Re: [PATCH v4 12/24] x86/sev: Unify SEV-SNP hypervisor feature check
2025-07-09 8:08 ` [PATCH v4 12/24] x86/sev: Unify SEV-SNP hypervisor feature check Ard Biesheuvel
@ 2025-07-10 4:21 ` Nikunj A Dadhania
2025-07-10 4:24 ` Ard Biesheuvel
0 siblings, 1 reply; 43+ messages in thread
From: Nikunj A Dadhania @ 2025-07-10 4:21 UTC (permalink / raw)
To: Ard Biesheuvel, linux-kernel
Cc: linux-efi, x86, Ard Biesheuvel, Borislav Petkov, Ingo Molnar,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky, Josh Poimboeuf,
Peter Zijlstra
Ard Biesheuvel <ardb+git@google.com> writes:
> From: Ard Biesheuvel <ardb@kernel.org>
...
> So move the HV feature check into a helper function and call that
> instead. For the core kernel, move the check to an earlier boot stage,
> right after the point where it is established that the guest is
> executing in SEV-SNP mode.
This change is causing the SNP guest to fail ...
>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
> arch/x86/boot/startup/sme.c | 2 ++
> arch/x86/coco/sev/core.c | 11 -------
> diff --git a/arch/x86/boot/startup/sme.c b/arch/x86/boot/startup/sme.c
> index 70ea1748c0a7..529090e20d2a 100644
> --- a/arch/x86/boot/startup/sme.c
> +++ b/arch/x86/boot/startup/sme.c
> @@ -533,6 +533,8 @@ void __head sme_enable(struct boot_params *bp)
> if (snp_en ^ !!(msr & MSR_AMD64_SEV_SNP_ENABLED))
> snp_abort();
>
> + sev_hv_features = snp_check_hv_features();
> +
...
snp_check_hv_features()
-> get_hv_features() fails as ghcb_version is not yet initialized
ghcb_version is initialized as part of sev_es_negotiate_protocol(); at
least that function needs to be called before get_hv_features() is
called.
> /* Check if memory encryption is enabled */
> if (feature_mask == AMD_SME_BIT) {
> if (!(bp->hdr.xloadflags & XLF_MEM_ENCRYPTION))
> diff --git a/arch/x86/coco/sev/core.c b/arch/x86/coco/sev/core.c
> index 4fe0928bc0ad..f73dea313f55 100644
> --- a/arch/x86/coco/sev/core.c
> +++ b/arch/x86/coco/sev/core.c
> @@ -1344,17 +1344,6 @@ void __init sev_es_init_vc_handling(void)
> if (!sev_es_check_cpu_features())
> panic("SEV-ES CPU Features missing");
>
> - /*
> - * SNP is supported in v2 of the GHCB spec which mandates support for HV
> - * features.
> - */
> - if (cc_platform_has(CC_ATTR_GUEST_SEV_SNP)) {
> - sev_hv_features = get_hv_features();
> -
> - if (!(sev_hv_features & GHCB_HV_FT_SNP))
> - sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SNP_UNSUPPORTED);
> - }
> -
With the below change, the SNP guest boots fine. I will wait for Tom's
confirmation that this is enough.
diff --git a/arch/x86/boot/startup/exports.h b/arch/x86/boot/startup/exports.h
index 01d2363dc445..dc7ca6f14cfe 100644
--- a/arch/x86/boot/startup/exports.h
+++ b/arch/x86/boot/startup/exports.h
@@ -12,3 +12,4 @@ PROVIDE(snp_cpuid = __pi_snp_cpuid);
PROVIDE(snp_cpuid_get_table = __pi_snp_cpuid_get_table);
PROVIDE(svsm_issue_call = __pi_svsm_issue_call);
PROVIDE(svsm_process_result_codes = __pi_svsm_process_result_codes);
+PROVIDE(sev_es_negotiate_protocol = __pi_sev_es_negotiate_protocol);
diff --git a/arch/x86/boot/startup/sev-shared.c b/arch/x86/boot/startup/sev-shared.c
index ce11bab57d4f..ff6f3372cbb4 100644
--- a/arch/x86/boot/startup/sev-shared.c
+++ b/arch/x86/boot/startup/sev-shared.c
@@ -740,3 +740,24 @@ static bool __init svsm_setup_ca(const struct cc_blob_sev_info *cc_info,
return true;
}
+
+bool sev_es_negotiate_protocol(void)
+{
+ u64 val;
+
+ /* Do the GHCB protocol version negotiation */
+ sev_es_wr_ghcb_msr(GHCB_MSR_SEV_INFO_REQ);
+ VMGEXIT();
+ val = sev_es_rd_ghcb_msr();
+
+ if (GHCB_MSR_INFO(val) != GHCB_MSR_SEV_INFO_RESP)
+ return false;
+
+ if (GHCB_MSR_PROTO_MAX(val) < GHCB_PROTOCOL_MIN ||
+ GHCB_MSR_PROTO_MIN(val) > GHCB_PROTOCOL_MAX)
+ return false;
+
+ ghcb_version = min_t(size_t, GHCB_MSR_PROTO_MAX(val), GHCB_PROTOCOL_MAX);
+
+ return true;
+}
diff --git a/arch/x86/boot/startup/sme.c b/arch/x86/boot/startup/sme.c
index c3eff6d5102c..a7f8ee64e211 100644
--- a/arch/x86/boot/startup/sme.c
+++ b/arch/x86/boot/startup/sme.c
@@ -533,6 +533,9 @@ void __init sme_enable(struct boot_params *bp)
if (snp_en ^ !!(msr & MSR_AMD64_SEV_SNP_ENABLED))
snp_abort();
+ if (!sev_es_negotiate_protocol())
+ sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SEV_ES_GEN_REQ);
+
sev_hv_features = snp_check_hv_features();
/* Check if memory encryption is enabled */
diff --git a/arch/x86/coco/sev/vc-shared.c b/arch/x86/coco/sev/vc-shared.c
index 3d44474f46e7..af2d0fae2e18 100644
--- a/arch/x86/coco/sev/vc-shared.c
+++ b/arch/x86/coco/sev/vc-shared.c
@@ -622,24 +622,3 @@ bool __init sev_es_check_cpu_features(void)
return true;
}
-
-bool sev_es_negotiate_protocol(void)
-{
- u64 val;
-
- /* Do the GHCB protocol version negotiation */
- sev_es_wr_ghcb_msr(GHCB_MSR_SEV_INFO_REQ);
- VMGEXIT();
- val = sev_es_rd_ghcb_msr();
-
- if (GHCB_MSR_INFO(val) != GHCB_MSR_SEV_INFO_RESP)
- return false;
-
- if (GHCB_MSR_PROTO_MAX(val) < GHCB_PROTOCOL_MIN ||
- GHCB_MSR_PROTO_MIN(val) > GHCB_PROTOCOL_MAX)
- return false;
-
- ghcb_version = min_t(size_t, GHCB_MSR_PROTO_MAX(val), GHCB_PROTOCOL_MAX);
-
- return true;
-}
Regards
Nikunj
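Condensed, the failure chain Nikunj describes looks roughly like this
(a sketch; snp_check_hv_features() is assumed to terminate the guest
on a missing SNP feature bit, as the removed core.c hunk did):

	sme_enable()                          /* startup code, 1:1 mapping */
	  snp_check_hv_features()
	    get_hv_features()
	      if (ghcb_version < 2)           /* still 0: negotiation never ran */
	              return 0;               /* GHCB_HV_FT_SNP appears absent */
	    sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SNP_UNSUPPORTED);

Moving sev_es_negotiate_protocol() in front of the feature check, as
in the diff above, initializes ghcb_version before it is consulted.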
* Re: [PATCH v4 12/24] x86/sev: Unify SEV-SNP hypervisor feature check
2025-07-10 4:21 ` Nikunj A Dadhania
@ 2025-07-10 4:24 ` Ard Biesheuvel
[not found] ` <85o6tsv8m2.fsf@amd.com>
0 siblings, 1 reply; 43+ messages in thread
From: Ard Biesheuvel @ 2025-07-10 4:24 UTC (permalink / raw)
To: Nikunj A Dadhania
Cc: Ard Biesheuvel, linux-kernel, linux-efi, x86, Borislav Petkov,
Ingo Molnar, Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky,
Josh Poimboeuf, Peter Zijlstra
On Thu, 10 Jul 2025 at 14:21, Nikunj A Dadhania <nikunj@amd.com> wrote:
>
> Ard Biesheuvel <ardb+git@google.com> writes:
>
> > From: Ard Biesheuvel <ardb@kernel.org>
>
> ...
>
> > So move the HV feature check into a helper function and call that
> > instead. For the core kernel, move the check to an earlier boot stage,
> > right after the point where it is established that the guest is
> > executing in SEV-SNP mode.
>
> This change is causing the SNP guest to fail ...
>
> >
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > ---
>
> > arch/x86/boot/startup/sme.c | 2 ++
> > arch/x86/coco/sev/core.c | 11 -------
>
> > diff --git a/arch/x86/boot/startup/sme.c b/arch/x86/boot/startup/sme.c
> > index 70ea1748c0a7..529090e20d2a 100644
> > --- a/arch/x86/boot/startup/sme.c
> > +++ b/arch/x86/boot/startup/sme.c
> > @@ -533,6 +533,8 @@ void __head sme_enable(struct boot_params *bp)
> > if (snp_en ^ !!(msr & MSR_AMD64_SEV_SNP_ENABLED))
> > snp_abort();
> >
> > > + sev_hv_features = snp_check_hv_features();
> > +
>
> ...
> snp_check_hv_features()
> -> get_hv_features() fails as ghcb_version is not yet initialized
>
> ghcb_version is initialized as part of sev_es_negotiate_protocol(); at
> least that function needs to be called before get_hv_features() is
> called.
>
Thanks for the diagnosis.
I added back the ghcb_version check, even though it is redundant,
given that SNP support implies GHCB version >= 2.
Would the below change be sufficient too?
--- a/arch/x86/boot/startup/sev-shared.c
+++ b/arch/x86/boot/startup/sev-shared.c
@@ -49,9 +49,6 @@ static u64 __init get_hv_features(void)
{
u64 val;
- if (ghcb_version < 2)
- return 0;
-
sev_es_wr_ghcb_msr(GHCB_MSR_HV_FT_REQ);
VMGEXIT();
* Re: [PATCH v4 12/24] x86/sev: Unify SEV-SNP hypervisor feature check
[not found] ` <85o6tsv8m2.fsf@amd.com>
@ 2025-07-10 7:36 ` Ard Biesheuvel
2025-07-10 8:02 ` Nikunj A Dadhania
0 siblings, 1 reply; 43+ messages in thread
From: Ard Biesheuvel @ 2025-07-10 7:36 UTC (permalink / raw)
To: Nikunj A Dadhania
Cc: Linux Kernel Mailing List, linux-efi, X86 ML, Borislav Petkov,
Ingo Molnar, Kevin Loughlin, Tom Lendacky, Josh Poimboeuf,
Peter Zijlstra
(please keep the CCs)
On Thu, 10 Jul 2025 at 16:54, Nikunj A Dadhania <nikunj@amd.com> wrote:
>
> Ard Biesheuvel <ardb@kernel.org> writes:
>
> > On Thu, 10 Jul 2025 at 14:21, Nikunj A Dadhania <nikunj@amd.com> wrote:
> >>
> >> Ard Biesheuvel <ardb+git@google.com> writes:
> >>
> >> > From: Ard Biesheuvel <ardb@kernel.org>
> >>
> >> ...
> >>
> >> > So move the HV feature check into a helper function and call that
> >> > instead. For the core kernel, move the check to an earlier boot stage,
> >> > right after the point where it is established that the guest is
> >> > executing in SEV-SNP mode.
> >>
> >> This change is causing the SNP guest to fail ...
> >>
> >> >
> >> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> >> > ---
> >>
> >> > arch/x86/boot/startup/sme.c | 2 ++
> >> > arch/x86/coco/sev/core.c | 11 -------
> >>
> >> > diff --git a/arch/x86/boot/startup/sme.c b/arch/x86/boot/startup/sme.c
> >> > index 70ea1748c0a7..529090e20d2a 100644
> >> > --- a/arch/x86/boot/startup/sme.c
> >> > +++ b/arch/x86/boot/startup/sme.c
> >> > @@ -533,6 +533,8 @@ void __head sme_enable(struct boot_params *bp)
> >> > if (snp_en ^ !!(msr & MSR_AMD64_SEV_SNP_ENABLED))
> >> > snp_abort();
> >> >
> >> > + sev_hv_features = snp_check_hv_features();
> >> > +
> >>
> >> ...
> >> snp_check_hv_features()
> >> -> get_hv_features() fails as ghcb_version is not yet initialized
> >>
> >> ghcb_version is initialized as part of sev_es_negotiate_protocol(); at
> >> least that function needs to be called before get_hv_features() is
> >> called.
> >>
> >
> > Thanks for the diagnosis.
> >
> > I added back the ghcb_version check,
>
> Not sure I understand; this check has been there since:
>
> cbd3d4f7c4e5 x86/sev: Check SEV-SNP features support
>
> > even though it is redundant,
> > given that SNP support implies ghcb version >= 2
> >
> > Would the below change be sufficient too?
>
> KVM_SEV_INIT2 allows user space to provide the GHCB version; setting
> ghcb_version=1 in QEMU:
>
> with my changes: the SNP guest does not boot, as ghcb_version < 2
> with the below change: the SNP guest boots, which is not correct
>
> This needs to be fixed in KVM as well to not allow creating SNP guest if
> ghcb_version < 2.
>
OK, thanks for clarifying this - it means that checking the GHCB
version remains necessary, even though it is only done for SNP guests,
where GHCB version >= 2 is mandated.
I'll take this into account in the next revision.
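For reference, the feature-discovery helper with the version check
retained would look roughly like this (sketched from the hunk above
plus the pre-existing upstream body; details may differ):

	static u64 __init get_hv_features(void)
	{
		u64 val;

		/* HV feature advertisement only exists in GHCB protocol v2+. */
		if (ghcb_version < 2)
			return 0;

		sev_es_wr_ghcb_msr(GHCB_MSR_HV_FT_REQ);
		VMGEXIT();

		val = sev_es_rd_ghcb_msr();
		if (GHCB_RESP_CODE(val) != GHCB_MSR_HV_FT_RESP)
			return 0;

		return GHCB_MSR_HV_FT_RESP_VAL(val);
	}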
* Re: [PATCH v4 12/24] x86/sev: Unify SEV-SNP hypervisor feature check
2025-07-10 7:36 ` Ard Biesheuvel
@ 2025-07-10 8:02 ` Nikunj A Dadhania
0 siblings, 0 replies; 43+ messages in thread
From: Nikunj A Dadhania @ 2025-07-10 8:02 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: Linux Kernel Mailing List, linux-efi, X86 ML, Borislav Petkov,
Ingo Molnar, Kevin Loughlin, Tom Lendacky, Josh Poimboeuf,
Peter Zijlstra
Ard Biesheuvel <ardb@kernel.org> writes:
> (please keep the CCs)
I hit reply instead of reply-all, thanks for including all.
Regards
Nikunj
* Re: [PATCH v4 01/24] x86/sev: Separate MSR and GHCB based snp_cpuid() via a callback
2025-07-09 15:12 ` Tom Lendacky
2025-07-09 23:21 ` Ard Biesheuvel
@ 2025-07-11 20:59 ` Borislav Petkov
2025-07-12 14:54 ` Tom Lendacky
1 sibling, 1 reply; 43+ messages in thread
From: Borislav Petkov @ 2025-07-11 20:59 UTC (permalink / raw)
To: Tom Lendacky
Cc: Ard Biesheuvel, linux-kernel, linux-efi, x86, Ard Biesheuvel,
Ingo Molnar, Dionna Amalie Glaze, Kevin Loughlin, Josh Poimboeuf,
Peter Zijlstra
On Wed, Jul 09, 2025 at 10:12:48AM -0500, Tom Lendacky wrote:
> Not sure the renaming makes it read any easier or say anything more. It
> does add extra changes to the diff that have to be read through, though,
> so I don't think it is beneficial.
So it really comes naturally to split them into a msr_prot and a ghcb_prot
variant. If we added a separate patch on top that does only the renaming,
then that would probably be more churn than necessary.
> Maybe rename this parameter to snp_cpuid or snp_cpuid_fn or similar,
> because it can be very confusing to see "cpuid" on its own like this.
Yeah, that's a good point - snp_cpuid_fn clearly states that it is a function
pointer and not *the* cpuid() function.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
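The agreed parameter rename would amount to a small change to the
snp_cpuid() prototype (a sketch of the suggested form, assuming it is
adopted as-is in v5):

	int snp_cpuid(void (*snp_cpuid_fn)(void *ctx, struct cpuid_leaf *),
		      void *ctx, struct cpuid_leaf *leaf);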
* Re: [PATCH v4 01/24] x86/sev: Separate MSR and GHCB based snp_cpuid() via a callback
2025-07-11 20:59 ` Borislav Petkov
@ 2025-07-12 14:54 ` Tom Lendacky
2025-07-12 15:02 ` Borislav Petkov
0 siblings, 1 reply; 43+ messages in thread
From: Tom Lendacky @ 2025-07-12 14:54 UTC (permalink / raw)
To: Borislav Petkov
Cc: Ard Biesheuvel, linux-kernel, linux-efi, x86, Ard Biesheuvel,
Ingo Molnar, Dionna Amalie Glaze, Kevin Loughlin, Josh Poimboeuf,
Peter Zijlstra
On 7/11/25 15:59, Borislav Petkov wrote:
> On Wed, Jul 09, 2025 at 10:12:48AM -0500, Tom Lendacky wrote:
>> Not sure the renaming makes it read any easier or say anything more. It
>> does add extra changes to the diff that have to be read through, though,
>> so I don't think it is beneficial.
>
> So it really comes natural to split them into a msr_prot and a ghcb_prot
> variant. If we added a separate patch ontop that does only the renaming, then
> that would probably be more churn than necessary.
Right, they already are though:
__sev_cpuid_hv_msr() and __sev_cpuid_hv_ghcb()
the first one meaning that the hypervisor is being called using the msr
protocol and the second one meaning that the hypervisor is being called
using the ghcb protocol.
That's why I made the comment. Just changing
__sev_cpuid_hv_msr() to __sev_cpuid_msr_prot()
isn't saying anything more in my opinion.
Thanks,
Tom
>
>> Maybe rename this parameter to snp_cpuid or snp_cpuid_fn or similar,
>> because it can be very confusing to see "cpuid" on its own like this.a
>
> Yeah, that's a good point - snp_cpuid_fn clearly states that it is a function
> pointer and not *the* cpuid() function.
>
> Thx.
>
* Re: [PATCH v4 01/24] x86/sev: Separate MSR and GHCB based snp_cpuid() via a callback
2025-07-12 14:54 ` Tom Lendacky
@ 2025-07-12 15:02 ` Borislav Petkov
0 siblings, 0 replies; 43+ messages in thread
From: Borislav Petkov @ 2025-07-12 15:02 UTC (permalink / raw)
To: Tom Lendacky
Cc: Ard Biesheuvel, linux-kernel, linux-efi, x86, Ard Biesheuvel,
Ingo Molnar, Dionna Amalie Glaze, Kevin Loughlin, Josh Poimboeuf,
Peter Zijlstra
On Sat, Jul 12, 2025 at 09:54:20AM -0500, Tom Lendacky wrote:
> > So it really comes natural to split them into a msr_prot and a ghcb_prot
> > variant. If we added a separate patch ontop that does only the renaming, then
> > that would probably be more churn than necessary.
>
> Right, they already are though:
>
> __sev_cpuid_hv_msr() and __sev_cpuid_hv_ghcb()
>
> the first one meaning that the hypervisor is being called using the msr
> protocol and the second one meaning that the hypervisor is being called
> using the ghcb protocol.
>
> That's why I made the comment. Just changing
>
> __sev_cpuid_hv_msr() to __sev_cpuid_msr_prot()
>
> isn't saying anything more in my opinion.
Ok, then let's keep 'em that way. I was reacting to snp_cpuid_hv_no_ghcb()
which is snp_cpuid_hv_msr I guess.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
* Re: [PATCH v4 08/24] x86/sev: Share implementation of MSR-based page state change
2025-07-09 18:24 ` Tom Lendacky
@ 2025-07-15 5:09 ` Ard Biesheuvel
0 siblings, 0 replies; 43+ messages in thread
From: Ard Biesheuvel @ 2025-07-15 5:09 UTC (permalink / raw)
To: Tom Lendacky
Cc: Ard Biesheuvel, linux-kernel, linux-efi, x86, Borislav Petkov,
Ingo Molnar, Dionna Amalie Glaze, Kevin Loughlin, Josh Poimboeuf,
Peter Zijlstra
On Thu, 10 Jul 2025 at 04:24, Tom Lendacky <thomas.lendacky@amd.com> wrote:
>
> On 7/9/25 03:08, Ard Biesheuvel wrote:
> > From: Ard Biesheuvel <ardb@kernel.org>
> >
> > Both the decompressor and the SEV startup code implement the exact same
> > sequence for invoking the MSR based communication protocol to effectuate
> > a page state change.
> >
> > Before tweaking the internal APIs used in both versions, merge them and
> > share them so those tweaks are only needed in a single place.
>
> > I think you can keep the save and restore of the MSR in the combined
> > code so that you don't need the previous patch, and that will keep
> everything safe. We should be doing a minimal amount of MSR protocol
> page state changes, so it really shouldn't have much effect.
>
OK, so I'll drop the previous patch, and rework this one accordingly.
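Folding the MSR save/restore back into the shared helper, as
suggested, might look roughly like this (a sketch based on the hunks
earlier in the thread, not the actual v5 patch):

	static void __head __page_state_change(unsigned long vaddr, unsigned long paddr,
					       enum psc_op op)
	{
		u64 val, msr;

		/* Save the current GHCB MSR value so it can be restored on exit. */
		msr = sev_es_rd_ghcb_msr();

		/*
		 * If private -> shared then invalidate the page before requesting
		 * the state change in the RMP table.
		 */
		if (op == SNP_PAGE_STATE_SHARED)
			pvalidate_4k_page(vaddr, paddr, false);

		/* Issue VMGEXIT to change the page state in the RMP table. */
		sev_es_wr_ghcb_msr(GHCB_MSR_PSC_REQ_GFN(paddr >> PAGE_SHIFT, op));
		VMGEXIT();

		/* Read the response of the VMGEXIT. */
		val = sev_es_rd_ghcb_msr();
		if ((GHCB_RESP_CODE(val) != GHCB_MSR_PSC_RESP) || GHCB_MSR_PSC_RESP_VAL(val))
			sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PSC);

		/* Validate the page so it is consistent with the new RMP entry. */
		if (op == SNP_PAGE_STATE_PRIVATE)
			pvalidate_4k_page(vaddr, paddr, true);

		/* Restore the GHCB MSR to the value the caller had established. */
		sev_es_wr_ghcb_msr(msr);
	}

With this, snp_set_page_private(), snp_set_page_shared() and
snp_accept_memory() no longer need their own save/restore wrappers,
which is what makes the previous patch droppable.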
Thread overview: 43+ messages
2025-07-09 8:08 [PATCH v4 00/24] x86: strict separation of startup code Ard Biesheuvel
2025-07-09 8:08 ` [PATCH v4 01/24] x86/sev: Separate MSR and GHCB based snp_cpuid() via a callback Ard Biesheuvel
2025-07-09 15:12 ` Tom Lendacky
2025-07-09 23:21 ` Ard Biesheuvel
2025-07-11 20:59 ` Borislav Petkov
2025-07-12 14:54 ` Tom Lendacky
2025-07-12 15:02 ` Borislav Petkov
2025-07-09 8:08 ` [PATCH v4 02/24] x86/sev: Use MSR protocol for remapping SVSM calling area Ard Biesheuvel
2025-07-09 8:08 ` [PATCH v4 03/24] x86/sev: Use MSR protocol only for early SVSM PVALIDATE call Ard Biesheuvel
2025-07-09 15:50 ` Tom Lendacky
2025-07-09 8:08 ` [PATCH v4 04/24] x86/sev: Run RMPADJUST on SVSM calling area page to test VMPL Ard Biesheuvel
2025-07-09 16:13 ` Tom Lendacky
2025-07-09 8:08 ` [PATCH v4 05/24] x86/sev: Move GHCB page based HV communication out of startup code Ard Biesheuvel
2025-07-09 8:08 ` [PATCH v4 06/24] x86/sev: Avoid global variable to store virtual address of SVSM area Ard Biesheuvel
2025-07-09 17:49 ` Tom Lendacky
2025-07-09 8:08 ` [PATCH v4 07/24] x86/sev: Move MSR save/restore out of early page state change helper Ard Biesheuvel
2025-07-09 18:16 ` Tom Lendacky
2025-07-09 8:08 ` [PATCH v4 08/24] x86/sev: Share implementation of MSR-based page state change Ard Biesheuvel
2025-07-09 18:24 ` Tom Lendacky
2025-07-15 5:09 ` Ard Biesheuvel
2025-07-09 8:08 ` [PATCH v4 09/24] x86/sev: Pass SVSM calling area down to early page state change API Ard Biesheuvel
2025-07-09 8:08 ` [PATCH v4 10/24] x86/sev: Use boot SVSM CA for all startup and init code Ard Biesheuvel
2025-07-09 8:08 ` [PATCH v4 11/24] x86/boot: Drop redundant RMPADJUST in SEV SVSM presence check Ard Biesheuvel
2025-07-09 8:08 ` [PATCH v4 12/24] x86/sev: Unify SEV-SNP hypervisor feature check Ard Biesheuvel
2025-07-10 4:21 ` Nikunj A Dadhania
2025-07-10 4:24 ` Ard Biesheuvel
[not found] ` <85o6tsv8m2.fsf@amd.com>
2025-07-10 7:36 ` Ard Biesheuvel
2025-07-10 8:02 ` Nikunj A Dadhania
2025-07-09 8:08 ` [PATCH v4 13/24] x86/boot: Provide PIC aliases for 5-level paging related constants Ard Biesheuvel
2025-07-09 8:08 ` [PATCH v4 14/24] x86/sev: Provide PIC aliases for SEV related data objects Ard Biesheuvel
2025-07-09 8:08 ` [PATCH v4 15/24] x86/sev: Move __sev_[get|put]_ghcb() into separate noinstr object Ard Biesheuvel
2025-07-09 8:08 ` [PATCH v4 16/24] x86/sev: Export startup routines for later use Ard Biesheuvel
2025-07-09 8:08 ` [PATCH v4 17/24] objtool: Add action to check for absence of absolute relocations Ard Biesheuvel
2025-07-09 8:08 ` [PATCH v4 18/24] x86/boot: Check startup code " Ard Biesheuvel
2025-07-09 8:09 ` [PATCH v4 19/24] x86/boot: Revert "Reject absolute references in .head.text" Ard Biesheuvel
2025-07-09 8:09 ` [PATCH v4 20/24] x86/kbuild: Incorporate boot/startup/ via Kbuild makefile Ard Biesheuvel
2025-07-09 8:09 ` [PATCH v4 21/24] x86/boot: Create a confined code area for startup code Ard Biesheuvel
2025-07-09 8:09 ` [PATCH v4 22/24] efistub/x86: Remap inittext read-execute when needed Ard Biesheuvel
2025-07-09 8:09 ` [PATCH v4 23/24] x86/boot: Move startup code out of __head section Ard Biesheuvel
2025-07-09 8:09 ` [PATCH v4 24/24] x86/boot: Get rid of the .head.text section Ard Biesheuvel
2025-07-09 10:23 ` [PATCH v4 00/24] x86: strict separation of startup code Kirill A. Shutemov
2025-07-09 10:32 ` Ard Biesheuvel
2025-07-09 14:01 ` Nikunj A Dadhania