* [RFT PATCH v3 00/21] x86: strict separation of startup code
@ 2025-05-12 19:08 Ard Biesheuvel
2025-05-12 19:08 ` [RFT PATCH v3 01/21] x86/sev: Separate MSR and GHCB based snp_cpuid() via a callback Ard Biesheuvel
From: Ard Biesheuvel @ 2025-05-12 19:08 UTC (permalink / raw)
To: linux-kernel
Cc: linux-efi, x86, Ard Biesheuvel, Borislav Petkov, Ingo Molnar,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky
From: Ard Biesheuvel <ardb@kernel.org>
!!! Boot tested on non-SEV guest ONLY !!!
This RFT series implements a strict separation between startup code and
ordinary code, where startup code is built in a way that tolerates being
invoked from the initial 1:1 mapping of memory.
The existing approach of emitting this code into .head.text and
checking for absolute relocations in that section is not 100% safe, and
produces diagnostics that are sometimes difficult to interpret. [0]
Instead, rely on symbol prefixes, similar to how this is implemented for
the EFI stub and for the startup code in the arm64 port. This ensures
that startup code can only call other startup code, unless a special
symbol alias is emitted that exposes a non-startup routine to the
startup code.
This is somewhat intrusive, as there are many data objects that are
referenced both by startup code and by ordinary code, and an alias needs
to be emitted for each of those. If startup code references anything
that has not been made available to it explicitly, a build time link
error will occur.
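To illustrate the general shape of the scheme (a sketch modelled on the
arm64 pi/ code and the EFI stub - the exact macro and Makefile plumbing
used by this series may differ):

  /*
   * Startup objects are built with all of their symbols prefixed,
   * e.g. via something like
   *
   *     OBJCOPYFLAGS := --prefix-symbols=__pi_
   *
   * so a reference from startup code to sev_status actually resolves
   * against __pi_sev_status. That symbol only exists if an explicit
   * alias is emitted for it, along the lines of:
   */
  SYM_PIC_ALIAS(sev_status);	/* emits __pi_sev_status = sev_status */

Anything lacking such an alias is simply invisible to the startup code,
so a stray reference fails at link time instead of at runtime.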
This ultimately allows the .head.text section to be dropped entirely, as
it no longer has any special significance. Instead, code that only
executes at boot is emitted into .init.text, as it should be.
The majority of the changes are in the early SEV code. The main issue is
that the use of GHCB pages and SVSM calling areas in code that may run
from both the 1:1 mapping and the kernel virtual mapping is problematic,
as it relies on __pa() to perform VA to PA translations, which are
ambiguous in this context. Also, __pa() pulls in non-trivial instrumented
code when CONFIG_DEBUG_VIRTUAL=y, so it is better to avoid VA to PA
translations altogether in the startup code.
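To illustrate why (hypothetical helper, not part of this series): for a
direct map address, __pa() boils down to subtracting a fixed virtual
offset, which is only correct if the input really is a kernel virtual
address:

  static u64 ghcb_phys(struct ghcb *ghcb)
  {
  	/*
  	 * Correct when running from the kernel virtual mapping,
  	 * where 'ghcb' is a direct map or kernel image address.
  	 *
  	 * Wrong when running from the initial 1:1 mapping, where
  	 * 'ghcb' already equals the physical address, and
  	 * subtracting the virtual offset produces garbage.
  	 */
  	return __pa(ghcb);
  }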
Changes since RFT/v2:
- Rebase onto tip/x86/boot and drop the patches from the previous
revision that have been applied in the meantime.
- Omit the pgtable_l5_enabled() changes for now, and just expose PIC
aliases for the variables in question - this can be sorted later.
- Don't use the boot SVSM calling area in snp_kexec_finish(), but pass
down the correct per-CPU one to the early page state API.
- Rename arch/x86/coco/sev/sev-noinstr.o to arch/x86/coco/sev/noinstr.o
- Further reduce the amount of SEV code that needs to be constructed in
a special way.
Changes since RFC/v1:
- Include a major disentanglement/refactor of the SEV-SNP startup code,
so that only code that really needs to run from the 1:1 mapping is
included in the startup/ code
- Incorporate some early notes from Ingo
Build tested with defconfig and allmodconfig.
!!! Boot tested on non-SEV guest ONLY !!!
Again, I will need to lean on Tom to determine whether this breaks
SEV-SNP guest boot. As I mentioned before, I am still waiting for
SEV-SNP capable hardware to be delivered.
Cc: Borislav Petkov <bp@alien8.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Dionna Amalie Glaze <dionnaglaze@google.com>
Cc: Kevin Loughlin <kevinloughlin@google.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
[0] https://lore.kernel.org/all/CAHk-=wj7k9nvJn6cpa3-5Ciwn2RGyE605BMkjWE4MqnvC9E92A@mail.gmail.com/
Ard Biesheuvel (21):
x86/sev: Separate MSR and GHCB based snp_cpuid() via a callback
x86/sev: Use MSR protocol for remapping SVSM calling area
x86/sev: Use MSR protocol only for early SVSM PVALIDATE call
x86/sev: Run RMPADJUST on SVSM calling area page to test VMPL
x86/sev: Move GHCB page based HV communication out of startup code
x86/sev: Avoid global variable to store virtual address of SVSM area
x86/sev: Move MSR save/restore out of early page state change helper
x86/sev: Share implementation of MSR-based page state change
x86/sev: Pass SVSM calling area down to early page state change API
x86/sev: Use boot SVSM CA for all startup and init code
x86/boot: Drop redundant RMPADJUST in SEV SVSM presence check
x86/sev: Unify SEV-SNP hypervisor feature check
x86/sev: Provide PIC aliases for SEV related data objects
x86/boot: Provide PIC aliases for 5-level paging related constants
x86/sev: Move __sev_[get|put]_ghcb() into separate noinstr object
x86/sev: Export startup routines for later use
x86/boot: Create a confined code area for startup code
x86/boot: Move startup code out of __head section
x86/boot: Disallow absolute symbol references in startup code
x86/boot: Revert "Reject absolute references in .head.text"
x86/boot: Get rid of the .head.text section
arch/x86/boot/compressed/sev-handle-vc.c | 3 +
arch/x86/boot/compressed/sev.c | 132 ++-----
arch/x86/boot/startup/Makefile | 21 ++
arch/x86/boot/startup/exports.h | 12 +
arch/x86/boot/startup/gdt_idt.c | 4 +-
arch/x86/boot/startup/map_kernel.c | 4 +-
arch/x86/boot/startup/sev-shared.c | 361 +++++---------------
arch/x86/boot/startup/sev-startup.c | 190 ++---------
arch/x86/boot/startup/sme.c | 29 +-
arch/x86/coco/sev/Makefile | 6 +-
arch/x86/coco/sev/core.c | 179 +++++++---
arch/x86/coco/sev/{sev-nmi.c => noinstr.c} | 74 ++++
arch/x86/coco/sev/vc-handle.c | 3 +-
arch/x86/coco/sev/vc-shared.c | 143 +++++++-
arch/x86/include/asm/init.h | 6 -
arch/x86/include/asm/setup.h | 1 +
arch/x86/include/asm/sev-internal.h | 29 +-
arch/x86/include/asm/sev.h | 67 +++-
arch/x86/kernel/head64.c | 5 +-
arch/x86/kernel/head_32.S | 2 +-
arch/x86/kernel/head_64.S | 10 +-
arch/x86/kernel/vmlinux.lds.S | 7 +-
arch/x86/mm/mem_encrypt_amd.c | 6 -
arch/x86/mm/mem_encrypt_boot.S | 6 +-
arch/x86/platform/pvh/head.S | 2 +-
arch/x86/tools/relocs.c | 8 +-
26 files changed, 621 insertions(+), 689 deletions(-)
create mode 100644 arch/x86/boot/startup/exports.h
rename arch/x86/coco/sev/{sev-nmi.c => noinstr.c} (61%)
base-commit: ed4d95d033e359f9445e85bf5a768a5859a5830b
--
2.49.0.1045.g170613ef41-goog
* [RFT PATCH v3 01/21] x86/sev: Separate MSR and GHCB based snp_cpuid() via a callback
2025-05-12 19:08 [RFT PATCH v3 00/21] x86: strict separation of startup code Ard Biesheuvel
@ 2025-05-12 19:08 ` Ard Biesheuvel
2025-05-15 7:22 ` Ingo Molnar
2025-05-15 11:10 ` Borislav Petkov
2025-05-12 19:08 ` [RFT PATCH v3 02/21] x86/sev: Use MSR protocol for remapping SVSM calling area Ard Biesheuvel
From: Ard Biesheuvel @ 2025-05-12 19:08 UTC (permalink / raw)
To: linux-kernel
Cc: linux-efi, x86, Ard Biesheuvel, Borislav Petkov, Ingo Molnar,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky
From: Ard Biesheuvel <ardb@kernel.org>
There are two distinct callers of snp_cpuid(): one where the MSR
protocol is always used, and one where the GHCB page based interface is
always used.
The snp_cpuid() logic does not care about the distinction, which only
matters at a lower level. But the fact that it supports both interfaces
means that the GHCB page based logic is pulled into the early startup
code, where VA to PA conversions are problematic given that it runs from
the 1:1 mapping of memory.
So keep snp_cpuid() itself in the startup code, but factor out the
hypervisor calls via a callback, so that the GHCB page handling can be
moved out.
Code refactoring only - no functional change intended.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/boot/startup/sev-shared.c | 58 ++++----------------
arch/x86/coco/sev/vc-shared.c | 49 ++++++++++++++++-
arch/x86/include/asm/sev.h | 3 +-
3 files changed, 61 insertions(+), 49 deletions(-)
diff --git a/arch/x86/boot/startup/sev-shared.c b/arch/x86/boot/startup/sev-shared.c
index 7a706db87b93..408e064a80d9 100644
--- a/arch/x86/boot/startup/sev-shared.c
+++ b/arch/x86/boot/startup/sev-shared.c
@@ -342,44 +342,7 @@ static int __sev_cpuid_hv_msr(struct cpuid_leaf *leaf)
return ret;
}
-static int __sev_cpuid_hv_ghcb(struct ghcb *ghcb, struct es_em_ctxt *ctxt, struct cpuid_leaf *leaf)
-{
- u32 cr4 = native_read_cr4();
- int ret;
-
- ghcb_set_rax(ghcb, leaf->fn);
- ghcb_set_rcx(ghcb, leaf->subfn);
-
- if (cr4 & X86_CR4_OSXSAVE)
- /* Safe to read xcr0 */
- ghcb_set_xcr0(ghcb, xgetbv(XCR_XFEATURE_ENABLED_MASK));
- else
- /* xgetbv will cause #UD - use reset value for xcr0 */
- ghcb_set_xcr0(ghcb, 1);
-
- ret = sev_es_ghcb_hv_call(ghcb, ctxt, SVM_EXIT_CPUID, 0, 0);
- if (ret != ES_OK)
- return ret;
-
- if (!(ghcb_rax_is_valid(ghcb) &&
- ghcb_rbx_is_valid(ghcb) &&
- ghcb_rcx_is_valid(ghcb) &&
- ghcb_rdx_is_valid(ghcb)))
- return ES_VMM_ERROR;
- leaf->eax = ghcb->save.rax;
- leaf->ebx = ghcb->save.rbx;
- leaf->ecx = ghcb->save.rcx;
- leaf->edx = ghcb->save.rdx;
-
- return ES_OK;
-}
-
-static int sev_cpuid_hv(struct ghcb *ghcb, struct es_em_ctxt *ctxt, struct cpuid_leaf *leaf)
-{
- return ghcb ? __sev_cpuid_hv_ghcb(ghcb, ctxt, leaf)
- : __sev_cpuid_hv_msr(leaf);
-}
/*
* This may be called early while still running on the initial identity
@@ -484,21 +447,21 @@ snp_cpuid_get_validated_func(struct cpuid_leaf *leaf)
return false;
}
-static void snp_cpuid_hv(struct ghcb *ghcb, struct es_em_ctxt *ctxt, struct cpuid_leaf *leaf)
+static void snp_cpuid_hv_no_ghcb(void *ctx, struct cpuid_leaf *leaf)
{
- if (sev_cpuid_hv(ghcb, ctxt, leaf))
+ if (__sev_cpuid_hv_msr(leaf))
sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_CPUID_HV);
}
static int __head
-snp_cpuid_postprocess(struct ghcb *ghcb, struct es_em_ctxt *ctxt,
- struct cpuid_leaf *leaf)
+snp_cpuid_postprocess(void (*cpuid_hv)(void *ctx, struct cpuid_leaf *),
+ void *ctx, struct cpuid_leaf *leaf)
{
struct cpuid_leaf leaf_hv = *leaf;
switch (leaf->fn) {
case 0x1:
- snp_cpuid_hv(ghcb, ctxt, &leaf_hv);
+ cpuid_hv(ctx, &leaf_hv);
/* initial APIC ID */
leaf->ebx = (leaf_hv.ebx & GENMASK(31, 24)) | (leaf->ebx & GENMASK(23, 0));
@@ -517,7 +480,7 @@ snp_cpuid_postprocess(struct ghcb *ghcb, struct es_em_ctxt *ctxt,
break;
case 0xB:
leaf_hv.subfn = 0;
- snp_cpuid_hv(ghcb, ctxt, &leaf_hv);
+ cpuid_hv(ctx, &leaf_hv);
/* extended APIC ID */
leaf->edx = leaf_hv.edx;
@@ -565,7 +528,7 @@ snp_cpuid_postprocess(struct ghcb *ghcb, struct es_em_ctxt *ctxt,
}
break;
case 0x8000001E:
- snp_cpuid_hv(ghcb, ctxt, &leaf_hv);
+ cpuid_hv(ctx, &leaf_hv);
/* extended APIC ID */
leaf->eax = leaf_hv.eax;
@@ -587,7 +550,8 @@ snp_cpuid_postprocess(struct ghcb *ghcb, struct es_em_ctxt *ctxt,
* should be treated as fatal by caller.
*/
int __head
-snp_cpuid(struct ghcb *ghcb, struct es_em_ctxt *ctxt, struct cpuid_leaf *leaf)
+snp_cpuid(void (*cpuid_hv)(void *ctx, struct cpuid_leaf *),
+ void *ctx, struct cpuid_leaf *leaf)
{
const struct snp_cpuid_table *cpuid_table = snp_cpuid_get_table();
@@ -621,7 +585,7 @@ snp_cpuid(struct ghcb *ghcb, struct es_em_ctxt *ctxt, struct cpuid_leaf *leaf)
return 0;
}
- return snp_cpuid_postprocess(ghcb, ctxt, leaf);
+ return snp_cpuid_postprocess(cpuid_hv, ctx, leaf);
}
/*
@@ -648,7 +612,7 @@ void __head do_vc_no_ghcb(struct pt_regs *regs, unsigned long exit_code)
leaf.fn = fn;
leaf.subfn = subfn;
- ret = snp_cpuid(NULL, NULL, &leaf);
+ ret = snp_cpuid(snp_cpuid_hv_no_ghcb, NULL, &leaf);
if (!ret)
goto cpuid_done;
diff --git a/arch/x86/coco/sev/vc-shared.c b/arch/x86/coco/sev/vc-shared.c
index 2c0ab0fdc060..b4688f69102e 100644
--- a/arch/x86/coco/sev/vc-shared.c
+++ b/arch/x86/coco/sev/vc-shared.c
@@ -409,15 +409,62 @@ static enum es_result vc_handle_ioio(struct ghcb *ghcb, struct es_em_ctxt *ctxt)
return ret;
}
+static int __sev_cpuid_hv_ghcb(struct ghcb *ghcb, struct es_em_ctxt *ctxt, struct cpuid_leaf *leaf)
+{
+ u32 cr4 = native_read_cr4();
+ int ret;
+
+ ghcb_set_rax(ghcb, leaf->fn);
+ ghcb_set_rcx(ghcb, leaf->subfn);
+
+ if (cr4 & X86_CR4_OSXSAVE)
+ /* Safe to read xcr0 */
+ ghcb_set_xcr0(ghcb, xgetbv(XCR_XFEATURE_ENABLED_MASK));
+ else
+ /* xgetbv will cause #UD - use reset value for xcr0 */
+ ghcb_set_xcr0(ghcb, 1);
+
+ ret = sev_es_ghcb_hv_call(ghcb, ctxt, SVM_EXIT_CPUID, 0, 0);
+ if (ret != ES_OK)
+ return ret;
+
+ if (!(ghcb_rax_is_valid(ghcb) &&
+ ghcb_rbx_is_valid(ghcb) &&
+ ghcb_rcx_is_valid(ghcb) &&
+ ghcb_rdx_is_valid(ghcb)))
+ return ES_VMM_ERROR;
+
+ leaf->eax = ghcb->save.rax;
+ leaf->ebx = ghcb->save.rbx;
+ leaf->ecx = ghcb->save.rcx;
+ leaf->edx = ghcb->save.rdx;
+
+ return ES_OK;
+}
+
+struct cpuid_ctx {
+ struct ghcb *ghcb;
+ struct es_em_ctxt *ctxt;
+};
+
+static void snp_cpuid_hv_ghcb(void *p, struct cpuid_leaf *leaf)
+{
+ struct cpuid_ctx *ctx = p;
+
+ if (__sev_cpuid_hv_ghcb(ctx->ghcb, ctx->ctxt, leaf))
+ sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_CPUID_HV);
+}
+
static int vc_handle_cpuid_snp(struct ghcb *ghcb, struct es_em_ctxt *ctxt)
{
+ struct cpuid_ctx ctx = { ghcb, ctxt };
struct pt_regs *regs = ctxt->regs;
struct cpuid_leaf leaf;
int ret;
leaf.fn = regs->ax;
leaf.subfn = regs->cx;
- ret = snp_cpuid(ghcb, ctxt, &leaf);
+ ret = snp_cpuid(snp_cpuid_hv_ghcb, &ctx, &leaf);
if (!ret) {
regs->ax = leaf.eax;
regs->bx = leaf.ebx;
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 6158893786d6..f50a606be4f1 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -533,7 +533,8 @@ struct cpuid_leaf {
u32 edx;
};
-int snp_cpuid(struct ghcb *ghcb, struct es_em_ctxt *ctxt, struct cpuid_leaf *leaf);
+int snp_cpuid(void (*cpuid_hv)(void *ctx, struct cpuid_leaf *),
+ void *ctx, struct cpuid_leaf *leaf);
void __noreturn sev_es_terminate(unsigned int set, unsigned int reason);
enum es_result sev_es_ghcb_hv_call(struct ghcb *ghcb,
--
2.49.0.1045.g170613ef41-goog
* [RFT PATCH v3 02/21] x86/sev: Use MSR protocol for remapping SVSM calling area
2025-05-12 19:08 [RFT PATCH v3 00/21] x86: strict separation of startup code Ard Biesheuvel
2025-05-12 19:08 ` [RFT PATCH v3 01/21] x86/sev: Separate MSR and GHCB based snp_cpuid() via a callback Ard Biesheuvel
@ 2025-05-12 19:08 ` Ard Biesheuvel
2025-05-15 16:43 ` Borislav Petkov
2025-05-12 19:08 ` [RFT PATCH v3 03/21] x86/sev: Use MSR protocol only for early SVSM PVALIDATE call Ard Biesheuvel
From: Ard Biesheuvel @ 2025-05-12 19:08 UTC (permalink / raw)
To: linux-kernel
Cc: linux-efi, x86, Ard Biesheuvel, Borislav Petkov, Ingo Molnar,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky
From: Ard Biesheuvel <ardb@kernel.org>
As the preceding code comment already indicates, remapping the SVSM
calling area occurs long before the GHCB page is configured, and so
calling svsm_perform_call_protocol() is guaranteed to result in a call
to svsm_perform_msr_protocol().
So just call the latter directly. This allows most of the GHCB based API
infrastructure to be moved out of the startup code in a subsequent
patch.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/boot/startup/sev-startup.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/arch/x86/boot/startup/sev-startup.c b/arch/x86/boot/startup/sev-startup.c
index 435853a55768..a1d5a5632d58 100644
--- a/arch/x86/boot/startup/sev-startup.c
+++ b/arch/x86/boot/startup/sev-startup.c
@@ -325,7 +325,9 @@ static __head void svsm_setup(struct cc_blob_sev_info *cc_info)
call.caa = svsm_get_caa();
call.rax = SVSM_CORE_CALL(SVSM_CORE_REMAP_CA);
call.rcx = pa;
- ret = svsm_perform_call_protocol(&call);
+ do {
+ ret = svsm_perform_msr_protocol(&call);
+ } while (ret == -EAGAIN);
if (ret)
sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_SVSM_CA_REMAP_FAIL);
--
2.49.0.1045.g170613ef41-goog
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [RFT PATCH v3 03/21] x86/sev: Use MSR protocol only for early SVSM PVALIDATE call
2025-05-12 19:08 [RFT PATCH v3 00/21] x86: strict separation of startup code Ard Biesheuvel
2025-05-12 19:08 ` [RFT PATCH v3 01/21] x86/sev: Separate MSR and GHCB based snp_cpuid() via a callback Ard Biesheuvel
2025-05-12 19:08 ` [RFT PATCH v3 02/21] x86/sev: Use MSR protocol for remapping SVSM calling area Ard Biesheuvel
@ 2025-05-12 19:08 ` Ard Biesheuvel
2025-05-12 19:08 ` [RFT PATCH v3 04/21] x86/sev: Run RMPADJUST on SVSM calling area page to test VMPL Ard Biesheuvel
From: Ard Biesheuvel @ 2025-05-12 19:08 UTC (permalink / raw)
To: linux-kernel
Cc: linux-efi, x86, Ard Biesheuvel, Borislav Petkov, Ingo Molnar,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky
From: Ard Biesheuvel <ardb@kernel.org>
The early page state change API performs an SVSM call to PVALIDATE each
page when running under an SVSM, and this involves either a GHCB page
based call or a call based on the MSR protocol.
The GHCB page based variant involves VA to PA translation of the GHCB
address, and this is best avoided in the startup code, where virtual
addresses are ambiguous (1:1 or kernel virtual).
As this is the last remaining occurrence of svsm_perform_call_protocol()
in the startup code, switch to the MSR protocol exclusively in this
particular case, so that the GHCB based plumbing can be moved out of the
startup code entirely in a subsequent patch.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/boot/compressed/sev.c | 20 --------------------
arch/x86/boot/startup/sev-shared.c | 4 +++-
2 files changed, 3 insertions(+), 21 deletions(-)
diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index 612b443296d3..bc4ec45d9935 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -50,31 +50,11 @@ u64 svsm_get_caa_pa(void)
return boot_svsm_caa_pa;
}
-int svsm_perform_call_protocol(struct svsm_call *call);
-
u8 snp_vmpl;
/* Include code for early handlers */
#include "../../boot/startup/sev-shared.c"
-int svsm_perform_call_protocol(struct svsm_call *call)
-{
- struct ghcb *ghcb;
- int ret;
-
- if (boot_ghcb)
- ghcb = boot_ghcb;
- else
- ghcb = NULL;
-
- do {
- ret = ghcb ? svsm_perform_ghcb_protocol(ghcb, call)
- : svsm_perform_msr_protocol(call);
- } while (ret == -EAGAIN);
-
- return ret;
-}
-
static bool sev_snp_enabled(void)
{
return sev_status & MSR_AMD64_SEV_SNP_ENABLED;
diff --git a/arch/x86/boot/startup/sev-shared.c b/arch/x86/boot/startup/sev-shared.c
index 408e064a80d9..297d2abe8e3d 100644
--- a/arch/x86/boot/startup/sev-shared.c
+++ b/arch/x86/boot/startup/sev-shared.c
@@ -755,7 +755,9 @@ static void __head svsm_pval_4k_page(unsigned long paddr, bool validate)
call.rax = SVSM_CORE_CALL(SVSM_CORE_PVALIDATE);
call.rcx = pc_pa;
- ret = svsm_perform_call_protocol(&call);
+ do {
+ ret = svsm_perform_msr_protocol(&call);
+ } while (ret == -EAGAIN);
if (ret)
sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PVALIDATE);
--
2.49.0.1045.g170613ef41-goog
* [RFT PATCH v3 04/21] x86/sev: Run RMPADJUST on SVSM calling area page to test VMPL
2025-05-12 19:08 [RFT PATCH v3 00/21] x86: strict separation of startup code Ard Biesheuvel
2025-05-12 19:08 ` [RFT PATCH v3 03/21] x86/sev: Use MSR protocol only for early SVSM PVALIDATE call Ard Biesheuvel
@ 2025-05-12 19:08 ` Ard Biesheuvel
2025-05-20 9:44 ` Borislav Petkov
2025-05-12 19:08 ` [RFT PATCH v3 05/21] x86/sev: Move GHCB page based HV communication out of startup code Ard Biesheuvel
From: Ard Biesheuvel @ 2025-05-12 19:08 UTC (permalink / raw)
To: linux-kernel
Cc: linux-efi, x86, Ard Biesheuvel, Borislav Petkov, Ingo Molnar,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky
From: Ard Biesheuvel <ardb@kernel.org>
Determining the VMPL at which the kernel runs involves performing an
RMPADJUST operation on an arbitrary page of memory, and observing whether
it succeeds.
The use of boot_ghcb_page in the core kernel in this case is completely
arbitrary, but results in the need to provide a PIC alias for it. So use
boot_svsm_ca_page instead, which already needs this alias for other
reasons.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/boot/compressed/sev.c | 2 +-
arch/x86/boot/startup/sev-shared.c | 5 +++--
arch/x86/boot/startup/sev-startup.c | 2 +-
3 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index bc4ec45d9935..2141936daba7 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -327,7 +327,7 @@ static bool early_snp_init(struct boot_params *bp)
* running at VMPL0. The CA will be used to communicate with the
* SVSM and request its services.
*/
- svsm_setup_ca(cc_info);
+ svsm_setup_ca(cc_info, rip_rel_ptr(&boot_ghcb_page));
/*
* Pass run-time kernel a pointer to CC info via boot_params so EFI
diff --git a/arch/x86/boot/startup/sev-shared.c b/arch/x86/boot/startup/sev-shared.c
index 297d2abe8e3d..9c8dd6bfe833 100644
--- a/arch/x86/boot/startup/sev-shared.c
+++ b/arch/x86/boot/startup/sev-shared.c
@@ -782,7 +782,8 @@ static void __head pvalidate_4k_page(unsigned long vaddr, unsigned long paddr,
* Maintain the GPA of the SVSM Calling Area (CA) in order to utilize the SVSM
* services needed when not running in VMPL0.
*/
-static bool __head svsm_setup_ca(const struct cc_blob_sev_info *cc_info)
+static bool __head svsm_setup_ca(const struct cc_blob_sev_info *cc_info,
+ void *page)
{
struct snp_secrets_page *secrets_page;
struct snp_cpuid_table *cpuid_table;
@@ -805,7 +806,7 @@ static bool __head svsm_setup_ca(const struct cc_blob_sev_info *cc_info)
* routine is running identity mapped when called, both by the decompressor
* code and the early kernel code.
*/
- if (!rmpadjust((unsigned long)rip_rel_ptr(&boot_ghcb_page), RMP_PG_SIZE_4K, 1))
+ if (!rmpadjust((unsigned long)page, RMP_PG_SIZE_4K, 1))
return false;
/*
diff --git a/arch/x86/boot/startup/sev-startup.c b/arch/x86/boot/startup/sev-startup.c
index a1d5a5632d58..1f928e8264bb 100644
--- a/arch/x86/boot/startup/sev-startup.c
+++ b/arch/x86/boot/startup/sev-startup.c
@@ -303,7 +303,7 @@ static __head void svsm_setup(struct cc_blob_sev_info *cc_info)
* running at VMPL0. The CA will be used to communicate with the
* SVSM to perform the SVSM services.
*/
- if (!svsm_setup_ca(cc_info))
+ if (!svsm_setup_ca(cc_info, rip_rel_ptr(&boot_svsm_ca_page)))
return;
/*
--
2.49.0.1045.g170613ef41-goog
* [RFT PATCH v3 05/21] x86/sev: Move GHCB page based HV communication out of startup code
2025-05-12 19:08 [RFT PATCH v3 00/21] x86: strict separation of startup code Ard Biesheuvel
2025-05-12 19:08 ` [RFT PATCH v3 04/21] x86/sev: Run RMPADJUST on SVSM calling area page to test VMPL Ard Biesheuvel
@ 2025-05-12 19:08 ` Ard Biesheuvel
2025-05-20 11:38 ` Borislav Petkov
2025-05-12 19:08 ` [RFT PATCH v3 06/21] x86/sev: Avoid global variable to store virtual address of SVSM area Ard Biesheuvel
From: Ard Biesheuvel @ 2025-05-12 19:08 UTC (permalink / raw)
To: linux-kernel
Cc: linux-efi, x86, Ard Biesheuvel, Borislav Petkov, Ingo Molnar,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky
From: Ard Biesheuvel <ardb@kernel.org>
Both the decompressor and the core kernel implement an early #VC
handler, which only deals with CPUID instructions, and a full-featured
one, which can handle any #VC exception.
The former communicates with the hypervisor using the MSR based
protocol, whereas the latter uses a shared GHCB page, which is
configured a bit later during the boot, when the kernel runs from its
ordinary virtual mapping, rather than the 1:1 mapping that the startup
code uses.
Accessing this shared GHCB page from the core kernel's startup code is
problematic, because it involves converting the GHCB address provided by
the caller to a physical address. In the startup code, virtual to
physical address translations are problematic, given that the virtual
address might be a 1:1 mapped address, and such translations should
therefore be avoided.
This means that exposing startup code dealing with the GHCB to callers
that execute from the ordinary kernel virtual mapping should be avoided
too. So move all GHCB page based communication out of the startup code,
now that all communication occurring before the kernel virtual mapping
is up relies on the MSR protocol only.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/boot/compressed/sev-handle-vc.c | 3 +
arch/x86/boot/startup/sev-shared.c | 190 +-------------------
arch/x86/boot/startup/sev-startup.c | 42 -----
arch/x86/coco/sev/core.c | 76 ++++++++
arch/x86/coco/sev/vc-handle.c | 2 +
arch/x86/coco/sev/vc-shared.c | 94 ++++++++++
arch/x86/include/asm/sev-internal.h | 7 +-
arch/x86/include/asm/sev.h | 59 +++++-
8 files changed, 236 insertions(+), 237 deletions(-)
diff --git a/arch/x86/boot/compressed/sev-handle-vc.c b/arch/x86/boot/compressed/sev-handle-vc.c
index 89dd02de2a0f..7530ad8b768b 100644
--- a/arch/x86/boot/compressed/sev-handle-vc.c
+++ b/arch/x86/boot/compressed/sev-handle-vc.c
@@ -1,6 +1,7 @@
// SPDX-License-Identifier: GPL-2.0
#include "misc.h"
+#include "error.h"
#include "sev.h"
#include <linux/kernel.h>
@@ -14,6 +15,8 @@
#include <asm/fpu/xcr.h>
#define __BOOT_COMPRESSED
+#undef __init
+#define __init
/* Basic instruction decoding support needed */
#include "../../lib/inat.c"
diff --git a/arch/x86/boot/startup/sev-shared.c b/arch/x86/boot/startup/sev-shared.c
index 9c8dd6bfe833..7884884c0898 100644
--- a/arch/x86/boot/startup/sev-shared.c
+++ b/arch/x86/boot/startup/sev-shared.c
@@ -13,12 +13,9 @@
#ifndef __BOOT_COMPRESSED
#define error(v) pr_err(v)
-#define has_cpuflag(f) boot_cpu_has(f)
#else
#undef WARN
#define WARN(condition, format...) (!!(condition))
-#undef vc_forward_exception
-#define vc_forward_exception(c) panic("SNP: Hypervisor requested exception\n")
#endif
/*
@@ -39,7 +36,7 @@ u64 boot_svsm_caa_pa __ro_after_init;
*
* GHCB protocol version negotiated with the hypervisor.
*/
-static u16 ghcb_version __ro_after_init;
+u16 ghcb_version __ro_after_init;
/* Copy of the SNP firmware's CPUID page. */
static struct snp_cpuid_table cpuid_table_copy __ro_after_init;
@@ -54,16 +51,6 @@ static u32 cpuid_std_range_max __ro_after_init;
static u32 cpuid_hyp_range_max __ro_after_init;
static u32 cpuid_ext_range_max __ro_after_init;
-bool __init sev_es_check_cpu_features(void)
-{
- if (!has_cpuflag(X86_FEATURE_RDRAND)) {
- error("RDRAND instruction not supported - no trusted source of randomness available\n");
- return false;
- }
-
- return true;
-}
-
void __head __noreturn
sev_es_terminate(unsigned int set, unsigned int reason)
{
@@ -100,123 +87,7 @@ u64 get_hv_features(void)
return GHCB_MSR_HV_FT_RESP_VAL(val);
}
-void snp_register_ghcb_early(unsigned long paddr)
-{
- unsigned long pfn = paddr >> PAGE_SHIFT;
- u64 val;
-
- sev_es_wr_ghcb_msr(GHCB_MSR_REG_GPA_REQ_VAL(pfn));
- VMGEXIT();
-
- val = sev_es_rd_ghcb_msr();
-
- /* If the response GPA is not ours then abort the guest */
- if ((GHCB_RESP_CODE(val) != GHCB_MSR_REG_GPA_RESP) ||
- (GHCB_MSR_REG_GPA_RESP_VAL(val) != pfn))
- sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_REGISTER);
-}
-
-bool sev_es_negotiate_protocol(void)
-{
- u64 val;
-
- /* Do the GHCB protocol version negotiation */
- sev_es_wr_ghcb_msr(GHCB_MSR_SEV_INFO_REQ);
- VMGEXIT();
- val = sev_es_rd_ghcb_msr();
-
- if (GHCB_MSR_INFO(val) != GHCB_MSR_SEV_INFO_RESP)
- return false;
-
- if (GHCB_MSR_PROTO_MAX(val) < GHCB_PROTOCOL_MIN ||
- GHCB_MSR_PROTO_MIN(val) > GHCB_PROTOCOL_MAX)
- return false;
-
- ghcb_version = min_t(size_t, GHCB_MSR_PROTO_MAX(val), GHCB_PROTOCOL_MAX);
-
- return true;
-}
-
-static enum es_result verify_exception_info(struct ghcb *ghcb, struct es_em_ctxt *ctxt)
-{
- u32 ret;
-
- ret = ghcb->save.sw_exit_info_1 & GENMASK_ULL(31, 0);
- if (!ret)
- return ES_OK;
-
- if (ret == 1) {
- u64 info = ghcb->save.sw_exit_info_2;
- unsigned long v = info & SVM_EVTINJ_VEC_MASK;
-
- /* Check if exception information from hypervisor is sane. */
- if ((info & SVM_EVTINJ_VALID) &&
- ((v == X86_TRAP_GP) || (v == X86_TRAP_UD)) &&
- ((info & SVM_EVTINJ_TYPE_MASK) == SVM_EVTINJ_TYPE_EXEPT)) {
- ctxt->fi.vector = v;
-
- if (info & SVM_EVTINJ_VALID_ERR)
- ctxt->fi.error_code = info >> 32;
-
- return ES_EXCEPTION;
- }
- }
-
- return ES_VMM_ERROR;
-}
-
-static inline int svsm_process_result_codes(struct svsm_call *call)
-{
- switch (call->rax_out) {
- case SVSM_SUCCESS:
- return 0;
- case SVSM_ERR_INCOMPLETE:
- case SVSM_ERR_BUSY:
- return -EAGAIN;
- default:
- return -EINVAL;
- }
-}
-
-/*
- * Issue a VMGEXIT to call the SVSM:
- * - Load the SVSM register state (RAX, RCX, RDX, R8 and R9)
- * - Set the CA call pending field to 1
- * - Issue VMGEXIT
- * - Save the SVSM return register state (RAX, RCX, RDX, R8 and R9)
- * - Perform atomic exchange of the CA call pending field
- *
- * - See the "Secure VM Service Module for SEV-SNP Guests" specification for
- * details on the calling convention.
- * - The calling convention loosely follows the Microsoft X64 calling
- * convention by putting arguments in RCX, RDX, R8 and R9.
- * - RAX specifies the SVSM protocol/callid as input and the return code
- * as output.
- */
-static __always_inline void svsm_issue_call(struct svsm_call *call, u8 *pending)
-{
- register unsigned long rax asm("rax") = call->rax;
- register unsigned long rcx asm("rcx") = call->rcx;
- register unsigned long rdx asm("rdx") = call->rdx;
- register unsigned long r8 asm("r8") = call->r8;
- register unsigned long r9 asm("r9") = call->r9;
-
- call->caa->call_pending = 1;
-
- asm volatile("rep; vmmcall\n\t"
- : "+r" (rax), "+r" (rcx), "+r" (rdx), "+r" (r8), "+r" (r9)
- : : "memory");
-
- *pending = xchg(&call->caa->call_pending, *pending);
-
- call->rax_out = rax;
- call->rcx_out = rcx;
- call->rdx_out = rdx;
- call->r8_out = r8;
- call->r9_out = r9;
-}
-
-static int svsm_perform_msr_protocol(struct svsm_call *call)
+int svsm_perform_msr_protocol(struct svsm_call *call)
{
u8 pending = 0;
u64 val, resp;
@@ -247,63 +118,6 @@ static int svsm_perform_msr_protocol(struct svsm_call *call)
return svsm_process_result_codes(call);
}
-static int svsm_perform_ghcb_protocol(struct ghcb *ghcb, struct svsm_call *call)
-{
- struct es_em_ctxt ctxt;
- u8 pending = 0;
-
- vc_ghcb_invalidate(ghcb);
-
- /*
- * Fill in protocol and format specifiers. This can be called very early
- * in the boot, so use rip-relative references as needed.
- */
- ghcb->protocol_version = ghcb_version;
- ghcb->ghcb_usage = GHCB_DEFAULT_USAGE;
-
- ghcb_set_sw_exit_code(ghcb, SVM_VMGEXIT_SNP_RUN_VMPL);
- ghcb_set_sw_exit_info_1(ghcb, 0);
- ghcb_set_sw_exit_info_2(ghcb, 0);
-
- sev_es_wr_ghcb_msr(__pa(ghcb));
-
- svsm_issue_call(call, &pending);
-
- if (pending)
- return -EINVAL;
-
- switch (verify_exception_info(ghcb, &ctxt)) {
- case ES_OK:
- break;
- case ES_EXCEPTION:
- vc_forward_exception(&ctxt);
- fallthrough;
- default:
- return -EINVAL;
- }
-
- return svsm_process_result_codes(call);
-}
-
-enum es_result sev_es_ghcb_hv_call(struct ghcb *ghcb,
- struct es_em_ctxt *ctxt,
- u64 exit_code, u64 exit_info_1,
- u64 exit_info_2)
-{
- /* Fill in protocol and format specifiers */
- ghcb->protocol_version = ghcb_version;
- ghcb->ghcb_usage = GHCB_DEFAULT_USAGE;
-
- ghcb_set_sw_exit_code(ghcb, exit_code);
- ghcb_set_sw_exit_info_1(ghcb, exit_info_1);
- ghcb_set_sw_exit_info_2(ghcb, exit_info_2);
-
- sev_es_wr_ghcb_msr(__pa(ghcb));
- VMGEXIT();
-
- return verify_exception_info(ghcb, ctxt);
-}
-
static int __sev_cpuid_hv(u32 fn, int reg_idx, u32 *reg)
{
u64 val;
diff --git a/arch/x86/boot/startup/sev-startup.c b/arch/x86/boot/startup/sev-startup.c
index 1f928e8264bb..0000885dc24c 100644
--- a/arch/x86/boot/startup/sev-startup.c
+++ b/arch/x86/boot/startup/sev-startup.c
@@ -41,15 +41,6 @@
#include <asm/cpuid.h>
#include <asm/cmdline.h>
-/* For early boot hypervisor communication in SEV-ES enabled guests */
-struct ghcb boot_ghcb_page __bss_decrypted __aligned(PAGE_SIZE);
-
-/*
- * Needs to be in the .data section because we need it NULL before bss is
- * cleared
- */
-struct ghcb *boot_ghcb __section(".data");
-
/* Bitmap of SEV features supported by the hypervisor */
u64 sev_hv_features __ro_after_init;
@@ -139,39 +130,6 @@ noinstr void __sev_put_ghcb(struct ghcb_state *state)
}
}
-int svsm_perform_call_protocol(struct svsm_call *call)
-{
- struct ghcb_state state;
- unsigned long flags;
- struct ghcb *ghcb;
- int ret;
-
- /*
- * This can be called very early in the boot, use native functions in
- * order to avoid paravirt issues.
- */
- flags = native_local_irq_save();
-
- if (sev_cfg.ghcbs_initialized)
- ghcb = __sev_get_ghcb(&state);
- else if (boot_ghcb)
- ghcb = boot_ghcb;
- else
- ghcb = NULL;
-
- do {
- ret = ghcb ? svsm_perform_ghcb_protocol(ghcb, call)
- : svsm_perform_msr_protocol(call);
- } while (ret == -EAGAIN);
-
- if (sev_cfg.ghcbs_initialized)
- __sev_put_ghcb(&state);
-
- native_local_irq_restore(flags);
-
- return ret;
-}
-
void __head
early_set_pages_state(unsigned long vaddr, unsigned long paddr,
unsigned long npages, enum psc_op op)
diff --git a/arch/x86/coco/sev/core.c b/arch/x86/coco/sev/core.c
index ac400525de73..310d867be4dc 100644
--- a/arch/x86/coco/sev/core.c
+++ b/arch/x86/coco/sev/core.c
@@ -100,6 +100,15 @@ DEFINE_PER_CPU(struct sev_es_save_area *, sev_vmsa);
u8 snp_vmpl __ro_after_init;
EXPORT_SYMBOL_GPL(snp_vmpl);
+/* For early boot hypervisor communication in SEV-ES enabled guests */
+static struct ghcb boot_ghcb_page __bss_decrypted __aligned(PAGE_SIZE);
+
+/*
+ * Needs to be in the .data section because we need it NULL before bss is
+ * cleared
+ */
+struct ghcb *boot_ghcb __section(".data");
+
static u64 __init get_snp_jump_table_addr(void)
{
struct snp_secrets_page *secrets;
@@ -153,6 +162,73 @@ static u64 __init get_jump_table_addr(void)
return ret;
}
+static int svsm_perform_ghcb_protocol(struct ghcb *ghcb, struct svsm_call *call)
+{
+ struct es_em_ctxt ctxt;
+ u8 pending = 0;
+
+ vc_ghcb_invalidate(ghcb);
+
+ /*
+ * Fill in protocol and format specifiers. This can be called very early
+ * in the boot, so use rip-relative references as needed.
+ */
+ ghcb->protocol_version = ghcb_version;
+ ghcb->ghcb_usage = GHCB_DEFAULT_USAGE;
+
+ ghcb_set_sw_exit_code(ghcb, SVM_VMGEXIT_SNP_RUN_VMPL);
+ ghcb_set_sw_exit_info_1(ghcb, 0);
+ ghcb_set_sw_exit_info_2(ghcb, 0);
+
+ sev_es_wr_ghcb_msr(__pa(ghcb));
+
+ svsm_issue_call(call, &pending);
+
+ if (pending)
+ return -EINVAL;
+
+ switch (verify_exception_info(ghcb, &ctxt)) {
+ case ES_OK:
+ break;
+ case ES_EXCEPTION:
+ vc_forward_exception(&ctxt);
+ fallthrough;
+ default:
+ return -EINVAL;
+ }
+
+ return svsm_process_result_codes(call);
+}
+
+static int svsm_perform_call_protocol(struct svsm_call *call)
+{
+ struct ghcb_state state;
+ unsigned long flags;
+ struct ghcb *ghcb;
+ int ret;
+
+ flags = native_local_irq_save();
+
+ if (sev_cfg.ghcbs_initialized)
+ ghcb = __sev_get_ghcb(&state);
+ else if (boot_ghcb)
+ ghcb = boot_ghcb;
+ else
+ ghcb = NULL;
+
+ do {
+ ret = ghcb ? svsm_perform_ghcb_protocol(ghcb, call)
+ : svsm_perform_msr_protocol(call);
+ } while (ret == -EAGAIN);
+
+ if (sev_cfg.ghcbs_initialized)
+ __sev_put_ghcb(&state);
+
+ native_local_irq_restore(flags);
+
+ return ret;
+}
+
static inline void __pval_terminate(u64 pfn, bool action, unsigned int page_size,
int ret, u64 svsm_ret)
{
diff --git a/arch/x86/coco/sev/vc-handle.c b/arch/x86/coco/sev/vc-handle.c
index b4895c648024..3cadaa46b9d7 100644
--- a/arch/x86/coco/sev/vc-handle.c
+++ b/arch/x86/coco/sev/vc-handle.c
@@ -344,6 +344,8 @@ static enum es_result vc_read_mem(struct es_em_ctxt *ctxt,
}
#define sev_printk(fmt, ...) printk(fmt, ##__VA_ARGS__)
+#define error(v)
+#define has_cpuflag(f) boot_cpu_has(f)
#include "vc-shared.c"
diff --git a/arch/x86/coco/sev/vc-shared.c b/arch/x86/coco/sev/vc-shared.c
index b4688f69102e..9b01c9ad81be 100644
--- a/arch/x86/coco/sev/vc-shared.c
+++ b/arch/x86/coco/sev/vc-shared.c
@@ -409,6 +409,53 @@ static enum es_result vc_handle_ioio(struct ghcb *ghcb, struct es_em_ctxt *ctxt)
return ret;
}
+enum es_result verify_exception_info(struct ghcb *ghcb, struct es_em_ctxt *ctxt)
+{
+ u32 ret;
+
+ ret = ghcb->save.sw_exit_info_1 & GENMASK_ULL(31, 0);
+ if (!ret)
+ return ES_OK;
+
+ if (ret == 1) {
+ u64 info = ghcb->save.sw_exit_info_2;
+ unsigned long v = info & SVM_EVTINJ_VEC_MASK;
+
+ /* Check if exception information from hypervisor is sane. */
+ if ((info & SVM_EVTINJ_VALID) &&
+ ((v == X86_TRAP_GP) || (v == X86_TRAP_UD)) &&
+ ((info & SVM_EVTINJ_TYPE_MASK) == SVM_EVTINJ_TYPE_EXEPT)) {
+ ctxt->fi.vector = v;
+
+ if (info & SVM_EVTINJ_VALID_ERR)
+ ctxt->fi.error_code = info >> 32;
+
+ return ES_EXCEPTION;
+ }
+ }
+
+ return ES_VMM_ERROR;
+}
+
+enum es_result sev_es_ghcb_hv_call(struct ghcb *ghcb,
+ struct es_em_ctxt *ctxt,
+ u64 exit_code, u64 exit_info_1,
+ u64 exit_info_2)
+{
+ /* Fill in protocol and format specifiers */
+ ghcb->protocol_version = ghcb_version;
+ ghcb->ghcb_usage = GHCB_DEFAULT_USAGE;
+
+ ghcb_set_sw_exit_code(ghcb, exit_code);
+ ghcb_set_sw_exit_info_1(ghcb, exit_info_1);
+ ghcb_set_sw_exit_info_2(ghcb, exit_info_2);
+
+ sev_es_wr_ghcb_msr(__pa(ghcb));
+ VMGEXIT();
+
+ return verify_exception_info(ghcb, ctxt);
+}
+
static int __sev_cpuid_hv_ghcb(struct ghcb *ghcb, struct es_em_ctxt *ctxt, struct cpuid_leaf *leaf)
{
u32 cr4 = native_read_cr4();
@@ -549,3 +596,50 @@ static enum es_result vc_handle_rdtsc(struct ghcb *ghcb,
return ES_OK;
}
+
+void snp_register_ghcb_early(unsigned long paddr)
+{
+ unsigned long pfn = paddr >> PAGE_SHIFT;
+ u64 val;
+
+ sev_es_wr_ghcb_msr(GHCB_MSR_REG_GPA_REQ_VAL(pfn));
+ VMGEXIT();
+
+ val = sev_es_rd_ghcb_msr();
+
+ /* If the response GPA is not ours then abort the guest */
+ if ((GHCB_RESP_CODE(val) != GHCB_MSR_REG_GPA_RESP) ||
+ (GHCB_MSR_REG_GPA_RESP_VAL(val) != pfn))
+ sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_REGISTER);
+}
+
+bool __init sev_es_check_cpu_features(void)
+{
+ if (!has_cpuflag(X86_FEATURE_RDRAND)) {
+ error("RDRAND instruction not supported - no trusted source of randomness available\n");
+ return false;
+ }
+
+ return true;
+}
+
+bool sev_es_negotiate_protocol(void)
+{
+ u64 val;
+
+ /* Do the GHCB protocol version negotiation */
+ sev_es_wr_ghcb_msr(GHCB_MSR_SEV_INFO_REQ);
+ VMGEXIT();
+ val = sev_es_rd_ghcb_msr();
+
+ if (GHCB_MSR_INFO(val) != GHCB_MSR_SEV_INFO_RESP)
+ return false;
+
+ if (GHCB_MSR_PROTO_MAX(val) < GHCB_PROTOCOL_MIN ||
+ GHCB_MSR_PROTO_MIN(val) > GHCB_PROTOCOL_MAX)
+ return false;
+
+ ghcb_version = min_t(size_t, GHCB_MSR_PROTO_MAX(val), GHCB_PROTOCOL_MAX);
+
+ return true;
+}
diff --git a/arch/x86/include/asm/sev-internal.h b/arch/x86/include/asm/sev-internal.h
index b7232081f8f7..4269d9dbefdf 100644
--- a/arch/x86/include/asm/sev-internal.h
+++ b/arch/x86/include/asm/sev-internal.h
@@ -2,7 +2,6 @@
#define DR7_RESET_VALUE 0x400
-extern struct ghcb boot_ghcb_page;
extern u64 sev_hv_features;
extern u64 sev_secrets_pa;
@@ -80,7 +79,8 @@ static __always_inline u64 svsm_get_caa_pa(void)
return boot_svsm_caa_pa;
}
-int svsm_perform_call_protocol(struct svsm_call *call);
+enum es_result verify_exception_info(struct ghcb *ghcb, struct es_em_ctxt *ctxt);
+void vc_forward_exception(struct es_em_ctxt *ctxt);
static inline u64 sev_es_rd_ghcb_msr(void)
{
@@ -97,9 +97,6 @@ static __always_inline void sev_es_wr_ghcb_msr(u64 val)
native_wrmsr(MSR_AMD64_SEV_ES_GHCB, low, high);
}
-void snp_register_ghcb_early(unsigned long paddr);
-bool sev_es_negotiate_protocol(void);
-bool sev_es_check_cpu_features(void);
u64 get_hv_features(void);
const struct snp_cpuid_table *snp_cpuid_get_table(void);
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index f50a606be4f1..07081bb85331 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -485,6 +485,7 @@ static inline int pvalidate(unsigned long vaddr, bool rmp_psize, bool validate)
struct snp_guest_request_ioctl;
void setup_ghcb(void);
+void snp_register_ghcb_early(unsigned long paddr);
void early_snp_set_memory_private(unsigned long vaddr, unsigned long paddr,
unsigned long npages);
void early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr,
@@ -521,8 +522,6 @@ static __always_inline void vc_ghcb_invalidate(struct ghcb *ghcb)
__builtin_memset(ghcb->save.valid_bitmap, 0, sizeof(ghcb->save.valid_bitmap));
}
-void vc_forward_exception(struct es_em_ctxt *ctxt);
-
/* I/O parameters for CPUID-related helpers */
struct cpuid_leaf {
u32 fn;
@@ -533,15 +532,71 @@ struct cpuid_leaf {
u32 edx;
};
+int svsm_perform_msr_protocol(struct svsm_call *call);
int snp_cpuid(void (*cpuid_hv)(void *ctx, struct cpuid_leaf *),
void *ctx, struct cpuid_leaf *leaf);
+/*
+ * Issue a VMGEXIT to call the SVSM:
+ * - Load the SVSM register state (RAX, RCX, RDX, R8 and R9)
+ * - Set the CA call pending field to 1
+ * - Issue VMGEXIT
+ * - Save the SVSM return register state (RAX, RCX, RDX, R8 and R9)
+ * - Perform atomic exchange of the CA call pending field
+ *
+ * - See the "Secure VM Service Module for SEV-SNP Guests" specification for
+ * details on the calling convention.
+ * - The calling convention loosely follows the Microsoft X64 calling
+ * convention by putting arguments in RCX, RDX, R8 and R9.
+ * - RAX specifies the SVSM protocol/callid as input and the return code
+ * as output.
+ */
+static __always_inline void svsm_issue_call(struct svsm_call *call, u8 *pending)
+{
+ register unsigned long rax asm("rax") = call->rax;
+ register unsigned long rcx asm("rcx") = call->rcx;
+ register unsigned long rdx asm("rdx") = call->rdx;
+ register unsigned long r8 asm("r8") = call->r8;
+ register unsigned long r9 asm("r9") = call->r9;
+
+ call->caa->call_pending = 1;
+
+ asm volatile("rep; vmmcall\n\t"
+ : "+r" (rax), "+r" (rcx), "+r" (rdx), "+r" (r8), "+r" (r9)
+ : : "memory");
+
+ *pending = xchg(&call->caa->call_pending, *pending);
+
+ call->rax_out = rax;
+ call->rcx_out = rcx;
+ call->rdx_out = rdx;
+ call->r8_out = r8;
+ call->r9_out = r9;
+}
+
+static inline int svsm_process_result_codes(struct svsm_call *call)
+{
+ switch (call->rax_out) {
+ case SVSM_SUCCESS:
+ return 0;
+ case SVSM_ERR_INCOMPLETE:
+ case SVSM_ERR_BUSY:
+ return -EAGAIN;
+ default:
+ return -EINVAL;
+ }
+}
+
void __noreturn sev_es_terminate(unsigned int set, unsigned int reason);
enum es_result sev_es_ghcb_hv_call(struct ghcb *ghcb,
struct es_em_ctxt *ctxt,
u64 exit_code, u64 exit_info_1,
u64 exit_info_2);
+bool sev_es_negotiate_protocol(void);
+bool sev_es_check_cpu_features(void);
+
+extern u16 ghcb_version;
extern struct ghcb *boot_ghcb;
#else /* !CONFIG_AMD_MEM_ENCRYPT */
--
2.49.0.1045.g170613ef41-goog
* [RFT PATCH v3 06/21] x86/sev: Avoid global variable to store virtual address of SVSM area
2025-05-12 19:08 [RFT PATCH v3 00/21] x86: strict separation of startup code Ard Biesheuvel
2025-05-12 19:08 ` [RFT PATCH v3 05/21] x86/sev: Move GHCB page based HV communication out of startup code Ard Biesheuvel
@ 2025-05-12 19:08 ` Ard Biesheuvel
2025-05-12 19:08 ` [RFT PATCH v3 07/21] x86/sev: Move MSR save/restore out of early page state change helper Ard Biesheuvel
From: Ard Biesheuvel @ 2025-05-12 19:08 UTC (permalink / raw)
To: linux-kernel
Cc: linux-efi, x86, Ard Biesheuvel, Borislav Petkov, Ingo Molnar,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky
From: Ard Biesheuvel <ardb@kernel.org>
The boot-time SVSM calling area is used both by the startup code running
from the 1:1 mapping, and potentially later on by code running from the
ordinary kernel virtual mapping.
This SVSM calling area is statically allocated, and so its physical
address doesn't change. However, its virtual address depends on the
calling context (1:1 mapping or kernel virtual mapping), and even though
the variable that holds the virtual address of this calling area gets
updated from the 1:1 address to the kernel address during boot, it is
hard to reason about why this is guaranteed to be safe.
So instead, take the RIP-relative address of the boot-time SVSM calling
area whenever its virtual address is required, and only use a global
variable for the physical address.
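Roughly, the reason a RIP-relative reference is safe in both contexts
(a sketch of the idea behind rip_rel_ptr() - see <asm/asm.h> for the
real implementation):

  static __always_inline void *rip_rel_ptr(void *p)
  {
  	/*
  	 * A RIP-relative LEA yields the address of the symbol in
  	 * whatever mapping the code is currently executing from:
  	 * the 1:1 address for startup code, and the kernel virtual
  	 * address otherwise.
  	 */
  	asm("leaq %c1(%%rip), %0" : "=r" (p) : "i" (p));

  	return p;
  }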
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/boot/compressed/sev.c | 5 ++---
arch/x86/boot/startup/sev-shared.c | 6 ------
arch/x86/boot/startup/sev-startup.c | 4 ++--
arch/x86/coco/sev/core.c | 9 ---------
arch/x86/include/asm/sev-internal.h | 3 +--
arch/x86/include/asm/sev.h | 2 --
arch/x86/mm/mem_encrypt_amd.c | 6 ------
7 files changed, 5 insertions(+), 30 deletions(-)
diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index 2141936daba7..70c3f4fc4349 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -37,12 +37,12 @@ struct ghcb *boot_ghcb;
#define __BOOT_COMPRESSED
-extern struct svsm_ca *boot_svsm_caa;
extern u64 boot_svsm_caa_pa;
struct svsm_ca *svsm_get_caa(void)
{
- return boot_svsm_caa;
+ /* The decompressor is mapped 1:1 so VA == PA */
+ return (struct svsm_ca *)boot_svsm_caa_pa;
}
u64 svsm_get_caa_pa(void)
@@ -530,7 +530,6 @@ bool early_is_sevsnp_guest(void)
/* Obtain the address of the calling area to use */
boot_rdmsr(MSR_SVSM_CAA, &m);
- boot_svsm_caa = (void *)m.q;
boot_svsm_caa_pa = m.q;
/*
diff --git a/arch/x86/boot/startup/sev-shared.c b/arch/x86/boot/startup/sev-shared.c
index 7884884c0898..9e0573aa29c1 100644
--- a/arch/x86/boot/startup/sev-shared.c
+++ b/arch/x86/boot/startup/sev-shared.c
@@ -26,7 +26,6 @@
* early boot, both with identity mapped virtual addresses and proper kernel
* virtual addresses.
*/
-struct svsm_ca *boot_svsm_caa __ro_after_init;
u64 boot_svsm_caa_pa __ro_after_init;
/*
@@ -648,11 +647,6 @@ static bool __head svsm_setup_ca(const struct cc_blob_sev_info *cc_info,
if (caa & (PAGE_SIZE - 1))
sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_SVSM_CAA);
- /*
- * The CA is identity mapped when this routine is called, both by the
- * decompressor code and the early kernel code.
- */
- boot_svsm_caa = (struct svsm_ca *)caa;
boot_svsm_caa_pa = caa;
/* Advertise the SVSM presence via CPUID. */
diff --git a/arch/x86/boot/startup/sev-startup.c b/arch/x86/boot/startup/sev-startup.c
index 0000885dc24c..24e7082e1a50 100644
--- a/arch/x86/boot/startup/sev-startup.c
+++ b/arch/x86/boot/startup/sev-startup.c
@@ -252,6 +252,7 @@ static __head struct cc_blob_sev_info *find_cc_blob(struct boot_params *bp)
static __head void svsm_setup(struct cc_blob_sev_info *cc_info)
{
+ struct snp_secrets_page *secrets = (void *)cc_info->secrets_phys;
struct svsm_call call = {};
int ret;
u64 pa;
@@ -280,7 +281,7 @@ static __head void svsm_setup(struct cc_blob_sev_info *cc_info)
* RAX = 0 (Protocol=0, CallID=0)
* RCX = New CA GPA
*/
- call.caa = svsm_get_caa();
+ call.caa = (struct svsm_ca *)secrets->svsm_caa;
call.rax = SVSM_CORE_CALL(SVSM_CORE_REMAP_CA);
call.rcx = pa;
do {
@@ -289,7 +290,6 @@ static __head void svsm_setup(struct cc_blob_sev_info *cc_info)
if (ret)
sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_SVSM_CA_REMAP_FAIL);
- boot_svsm_caa = (struct svsm_ca *)pa;
boot_svsm_caa_pa = pa;
}
diff --git a/arch/x86/coco/sev/core.c b/arch/x86/coco/sev/core.c
index 310d867be4dc..0e0ddf4c92aa 100644
--- a/arch/x86/coco/sev/core.c
+++ b/arch/x86/coco/sev/core.c
@@ -1496,15 +1496,6 @@ void sev_show_status(void)
pr_cont("\n");
}
-void __init snp_update_svsm_ca(void)
-{
- if (!snp_vmpl)
- return;
-
- /* Update the CAA to a proper kernel address */
- boot_svsm_caa = &boot_svsm_ca_page;
-}
-
#ifdef CONFIG_SYSFS
static ssize_t vmpl_show(struct kobject *kobj,
struct kobj_attribute *attr, char *buf)
diff --git a/arch/x86/include/asm/sev-internal.h b/arch/x86/include/asm/sev-internal.h
index 4269d9dbefdf..e3b203c280aa 100644
--- a/arch/x86/include/asm/sev-internal.h
+++ b/arch/x86/include/asm/sev-internal.h
@@ -60,7 +60,6 @@ void early_set_pages_state(unsigned long vaddr, unsigned long paddr,
DECLARE_PER_CPU(struct svsm_ca *, svsm_caa);
DECLARE_PER_CPU(u64, svsm_caa_pa);
-extern struct svsm_ca *boot_svsm_caa;
extern u64 boot_svsm_caa_pa;
static __always_inline struct svsm_ca *svsm_get_caa(void)
@@ -68,7 +67,7 @@ static __always_inline struct svsm_ca *svsm_get_caa(void)
if (sev_cfg.use_cas)
return this_cpu_read(svsm_caa);
else
- return boot_svsm_caa;
+ return rip_rel_ptr(&boot_svsm_ca_page);
}
static __always_inline u64 svsm_get_caa_pa(void)
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 07081bb85331..ae2502253bd3 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -501,7 +501,6 @@ void snp_accept_memory(phys_addr_t start, phys_addr_t end);
u64 snp_get_unsupported_features(u64 status);
u64 sev_get_status(void);
void sev_show_status(void);
-void snp_update_svsm_ca(void);
int prepare_pte_enc(struct pte_enc_desc *d);
void set_pte_enc_mask(pte_t *kpte, unsigned long pfn, pgprot_t new_prot);
void snp_kexec_finish(void);
@@ -629,7 +628,6 @@ static inline void snp_accept_memory(phys_addr_t start, phys_addr_t end) { }
static inline u64 snp_get_unsupported_features(u64 status) { return 0; }
static inline u64 sev_get_status(void) { return 0; }
static inline void sev_show_status(void) { }
-static inline void snp_update_svsm_ca(void) { }
static inline int prepare_pte_enc(struct pte_enc_desc *d) { return 0; }
static inline void set_pte_enc_mask(pte_t *kpte, unsigned long pfn, pgprot_t new_prot) { }
static inline void snp_kexec_finish(void) { }
diff --git a/arch/x86/mm/mem_encrypt_amd.c b/arch/x86/mm/mem_encrypt_amd.c
index faf3a13fb6ba..2f8c32173972 100644
--- a/arch/x86/mm/mem_encrypt_amd.c
+++ b/arch/x86/mm/mem_encrypt_amd.c
@@ -536,12 +536,6 @@ void __init sme_early_init(void)
x86_init.resources.dmi_setup = snp_dmi_setup;
}
- /*
- * Switch the SVSM CA mapping (if active) from identity mapped to
- * kernel mapped.
- */
- snp_update_svsm_ca();
-
if (sev_status & MSR_AMD64_SNP_SECURE_TSC)
setup_force_cpu_cap(X86_FEATURE_TSC_RELIABLE);
}
--
2.49.0.1045.g170613ef41-goog
* [RFT PATCH v3 07/21] x86/sev: Move MSR save/restore out of early page state change helper
2025-05-12 19:08 [RFT PATCH v3 00/21] x86: strict separation of startup code Ard Biesheuvel
2025-05-12 19:08 ` [RFT PATCH v3 06/21] x86/sev: Avoid global variable to store virtual address of SVSM area Ard Biesheuvel
@ 2025-05-12 19:08 ` Ard Biesheuvel
2025-05-12 19:08 ` [RFT PATCH v3 08/21] x86/sev: Share implementation of MSR-based page state change Ard Biesheuvel
From: Ard Biesheuvel @ 2025-05-12 19:08 UTC (permalink / raw)
To: linux-kernel
Cc: linux-efi, x86, Ard Biesheuvel, Borislav Petkov, Ingo Molnar,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky
From: Ard Biesheuvel <ardb@kernel.org>
The function __page_state_change() in the decompressor is very similar
to the loop in early_set_pages_state(), and the two can share this code
once the MSR save/restore is moved out of it.
This also avoids unnecessarily saving and restoring the GHCB MSR for
each individual page in a longer sequence.
This simplifies subsequent changes, where the APIs used by
__page_state_change() are modified for better separation between startup
code and ordinary code.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/boot/compressed/sev.c | 17 +++++++++++------
1 file changed, 11 insertions(+), 6 deletions(-)
diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index 70c3f4fc4349..bdedf4bd23ec 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -71,9 +71,6 @@ static void __page_state_change(unsigned long paddr, enum psc_op op)
if (op == SNP_PAGE_STATE_SHARED)
pvalidate_4k_page(paddr, paddr, false);
- /* Save the current GHCB MSR value */
- msr = sev_es_rd_ghcb_msr();
-
/* Issue VMGEXIT to change the page state in RMP table. */
sev_es_wr_ghcb_msr(GHCB_MSR_PSC_REQ_GFN(paddr >> PAGE_SHIFT, op));
VMGEXIT();
@@ -83,9 +80,6 @@ static void __page_state_change(unsigned long paddr, enum psc_op op)
if ((GHCB_RESP_CODE(val) != GHCB_MSR_PSC_RESP) || GHCB_MSR_PSC_RESP_VAL(val))
sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PSC);
- /* Restore the GHCB MSR value */
- sev_es_wr_ghcb_msr(msr);
-
/*
* Now that page state is changed in the RMP table, validate it so that it is
* consistent with the RMP entry.
@@ -96,18 +90,26 @@ static void __page_state_change(unsigned long paddr, enum psc_op op)
void snp_set_page_private(unsigned long paddr)
{
+ u64 msr;
+
if (!sev_snp_enabled())
return;
+ msr = sev_es_rd_ghcb_msr();
__page_state_change(paddr, SNP_PAGE_STATE_PRIVATE);
+ sev_es_wr_ghcb_msr(msr);
}
void snp_set_page_shared(unsigned long paddr)
{
+ u64 msr;
+
if (!sev_snp_enabled())
return;
+ msr = sev_es_rd_ghcb_msr();
__page_state_change(paddr, SNP_PAGE_STATE_SHARED);
+ sev_es_wr_ghcb_msr(msr);
}
bool early_setup_ghcb(void)
@@ -132,8 +134,11 @@ bool early_setup_ghcb(void)
void snp_accept_memory(phys_addr_t start, phys_addr_t end)
{
+ u64 msr = sev_es_rd_ghcb_msr();
+
for (phys_addr_t pa = start; pa < end; pa += PAGE_SIZE)
__page_state_change(pa, SNP_PAGE_STATE_PRIVATE);
+ sev_es_wr_ghcb_msr(msr);
}
void sev_es_shutdown_ghcb(void)
--
2.49.0.1045.g170613ef41-goog
* [RFT PATCH v3 08/21] x86/sev: Share implementation of MSR-based page state change
2025-05-12 19:08 [RFT PATCH v3 00/21] x86: strict separation of startup code Ard Biesheuvel
2025-05-12 19:08 ` [RFT PATCH v3 07/21] x86/sev: Move MSR save/restore out of early page state change helper Ard Biesheuvel
@ 2025-05-12 19:08 ` Ard Biesheuvel
2025-05-12 19:08 ` [RFT PATCH v3 09/21] x86/sev: Pass SVSM calling area down to early page state change API Ard Biesheuvel
From: Ard Biesheuvel @ 2025-05-12 19:08 UTC (permalink / raw)
To: linux-kernel
Cc: linux-efi, x86, Ard Biesheuvel, Borislav Petkov, Ingo Molnar,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky
From: Ard Biesheuvel <ardb@kernel.org>
Both the decompressor and the SEV startup code implement the exact same
sequence for invoking the MSR based communication protocol to effectuate
a page state change.
Before tweaking the internal APIs used in both versions, merge them and
share them so those tweaks are only needed in a single place.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/boot/compressed/sev.c | 28 -------------------
arch/x86/boot/startup/sev-shared.c | 28 +++++++++++++++++++
arch/x86/boot/startup/sev-startup.c | 29 +-------------------
3 files changed, 29 insertions(+), 56 deletions(-)
diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index bdedf4bd23ec..7a01eef9ae01 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -60,34 +60,6 @@ static bool sev_snp_enabled(void)
return sev_status & MSR_AMD64_SEV_SNP_ENABLED;
}
-static void __page_state_change(unsigned long paddr, enum psc_op op)
-{
- u64 val, msr;
-
- /*
- * If private -> shared then invalidate the page before requesting the
- * state change in the RMP table.
- */
- if (op == SNP_PAGE_STATE_SHARED)
- pvalidate_4k_page(paddr, paddr, false);
-
- /* Issue VMGEXIT to change the page state in RMP table. */
- sev_es_wr_ghcb_msr(GHCB_MSR_PSC_REQ_GFN(paddr >> PAGE_SHIFT, op));
- VMGEXIT();
-
- /* Read the response of the VMGEXIT. */
- val = sev_es_rd_ghcb_msr();
- if ((GHCB_RESP_CODE(val) != GHCB_MSR_PSC_RESP) || GHCB_MSR_PSC_RESP_VAL(val))
- sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PSC);
-
- /*
- * Now that page state is changed in the RMP table, validate it so that it is
- * consistent with the RMP entry.
- */
- if (op == SNP_PAGE_STATE_PRIVATE)
- pvalidate_4k_page(paddr, paddr, true);
-}
-
void snp_set_page_private(unsigned long paddr)
{
u64 msr;
diff --git a/arch/x86/boot/startup/sev-shared.c b/arch/x86/boot/startup/sev-shared.c
index 9e0573aa29c1..dae770327b50 100644
--- a/arch/x86/boot/startup/sev-shared.c
+++ b/arch/x86/boot/startup/sev-shared.c
@@ -591,6 +591,34 @@ static void __head pvalidate_4k_page(unsigned long vaddr, unsigned long paddr,
}
}
+static void __head __page_state_change(unsigned long paddr, enum psc_op op)
+{
+ u64 val;
+
+ /*
+ * If private -> shared then invalidate the page before requesting the
+ * state change in the RMP table.
+ */
+ if (op == SNP_PAGE_STATE_SHARED)
+ pvalidate_4k_page(paddr, paddr, false);
+
+ /* Issue VMGEXIT to change the page state in RMP table. */
+ sev_es_wr_ghcb_msr(GHCB_MSR_PSC_REQ_GFN(paddr >> PAGE_SHIFT, op));
+ VMGEXIT();
+
+ /* Read the response of the VMGEXIT. */
+ val = sev_es_rd_ghcb_msr();
+ if ((GHCB_RESP_CODE(val) != GHCB_MSR_PSC_RESP) || GHCB_MSR_PSC_RESP_VAL(val))
+ sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PSC);
+
+ /*
+ * Now that page state is changed in the RMP table, validate it so that it is
+ * consistent with the RMP entry.
+ */
+ if (op == SNP_PAGE_STATE_PRIVATE)
+ pvalidate_4k_page(paddr, paddr, true);
+}
+
/*
* Maintain the GPA of the SVSM Calling Area (CA) in order to utilize the SVSM
* services needed when not running in VMPL0.
diff --git a/arch/x86/boot/startup/sev-startup.c b/arch/x86/boot/startup/sev-startup.c
index 24e7082e1a50..28bf68753580 100644
--- a/arch/x86/boot/startup/sev-startup.c
+++ b/arch/x86/boot/startup/sev-startup.c
@@ -135,7 +135,6 @@ early_set_pages_state(unsigned long vaddr, unsigned long paddr,
unsigned long npages, enum psc_op op)
{
unsigned long paddr_end;
- u64 val;
vaddr = vaddr & PAGE_MASK;
@@ -143,37 +142,11 @@ early_set_pages_state(unsigned long vaddr, unsigned long paddr,
paddr_end = paddr + (npages << PAGE_SHIFT);
while (paddr < paddr_end) {
- /* Page validation must be rescinded before changing to shared */
- if (op == SNP_PAGE_STATE_SHARED)
- pvalidate_4k_page(vaddr, paddr, false);
-
- /*
- * Use the MSR protocol because this function can be called before
- * the GHCB is established.
- */
- sev_es_wr_ghcb_msr(GHCB_MSR_PSC_REQ_GFN(paddr >> PAGE_SHIFT, op));
- VMGEXIT();
-
- val = sev_es_rd_ghcb_msr();
-
- if (GHCB_RESP_CODE(val) != GHCB_MSR_PSC_RESP)
- goto e_term;
-
- if (GHCB_MSR_PSC_RESP_VAL(val))
- goto e_term;
-
- /* Page validation must be performed after changing to private */
- if (op == SNP_PAGE_STATE_PRIVATE)
- pvalidate_4k_page(vaddr, paddr, true);
+ __page_state_change(paddr, op);
vaddr += PAGE_SIZE;
paddr += PAGE_SIZE;
}
-
- return;
-
-e_term:
- sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PSC);
}
void __head early_snp_set_memory_private(unsigned long vaddr, unsigned long paddr,
--
2.49.0.1045.g170613ef41-goog
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [RFT PATCH v3 09/21] x86/sev: Pass SVSM calling area down to early page state change API
2025-05-12 19:08 [RFT PATCH v3 00/21] x86: strict separation of startup code Ard Biesheuvel
` (7 preceding siblings ...)
2025-05-12 19:08 ` [RFT PATCH v3 08/21] x86/sev: Share implementation of MSR-based page state change Ard Biesheuvel
@ 2025-05-12 19:08 ` Ard Biesheuvel
2025-05-13 13:55 ` Ard Biesheuvel
2025-05-12 19:08 ` [RFT PATCH v3 10/21] x86/sev: Use boot SVSM CA for all startup and init code Ard Biesheuvel
` (13 subsequent siblings)
22 siblings, 1 reply; 58+ messages in thread
From: Ard Biesheuvel @ 2025-05-12 19:08 UTC (permalink / raw)
To: linux-kernel
Cc: linux-efi, x86, Ard Biesheuvel, Borislav Petkov, Ingo Molnar,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky
From: Ard Biesheuvel <ardb@kernel.org>
The early page state change API is mostly used very early, when only
the boot-time SVSM calling area is in use. However, this API is also
called by the kexec finishing code, which runs very late, and
potentially from a different CPU (which uses a different calling area).
To avoid pulling the per-CPU SVSM calling area pointers and related SEV
state into the startup code, refactor the page state change API so the
SVSM calling area virtual and physical addresses can be provided by the
caller.
No functional change intended.
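As a sketch of the resulting calling convention: the decompressor,
which runs from a 1:1 mapping where VA == PA, passes the boot calling
area address for both arguments (taken from the hunks below):

	__page_state_change(paddr, SNP_PAGE_STATE_PRIVATE,
			    (struct svsm_ca *)boot_svsm_caa_pa,
			    boot_svsm_caa_pa);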
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/boot/compressed/sev.c | 12 +++++++++---
arch/x86/boot/startup/sev-shared.c | 18 ++++++++++--------
arch/x86/boot/startup/sev-startup.c | 11 +++++++----
arch/x86/coco/sev/core.c | 3 ++-
arch/x86/include/asm/sev-internal.h | 3 ++-
5 files changed, 30 insertions(+), 17 deletions(-)
diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index 7a01eef9ae01..04bc39d065ff 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -68,7 +68,9 @@ void snp_set_page_private(unsigned long paddr)
return;
msr = sev_es_rd_ghcb_msr();
- __page_state_change(paddr, SNP_PAGE_STATE_PRIVATE);
+ __page_state_change(paddr, SNP_PAGE_STATE_PRIVATE,
+ (struct svsm_ca *)boot_svsm_caa_pa,
+ boot_svsm_caa_pa);
sev_es_wr_ghcb_msr(msr);
}
@@ -80,7 +82,9 @@ void snp_set_page_shared(unsigned long paddr)
return;
msr = sev_es_rd_ghcb_msr();
- __page_state_change(paddr, SNP_PAGE_STATE_SHARED);
+ __page_state_change(paddr, SNP_PAGE_STATE_SHARED,
+ (struct svsm_ca *)boot_svsm_caa_pa,
+ boot_svsm_caa_pa);
sev_es_wr_ghcb_msr(msr);
}
@@ -109,7 +113,9 @@ void snp_accept_memory(phys_addr_t start, phys_addr_t end)
u64 msr = sev_es_rd_ghcb_msr();
for (phys_addr_t pa = start; pa < end; pa += PAGE_SIZE)
- __page_state_change(pa, SNP_PAGE_STATE_PRIVATE);
+ __page_state_change(pa, SNP_PAGE_STATE_PRIVATE,
+ (struct svsm_ca *)boot_svsm_caa_pa,
+ boot_svsm_caa_pa);
sev_es_wr_ghcb_msr(msr);
}
diff --git a/arch/x86/boot/startup/sev-shared.c b/arch/x86/boot/startup/sev-shared.c
index dae770327b50..70ad9a0aa023 100644
--- a/arch/x86/boot/startup/sev-shared.c
+++ b/arch/x86/boot/startup/sev-shared.c
@@ -538,7 +538,8 @@ static void __head setup_cpuid_table(const struct cc_blob_sev_info *cc_info)
}
}
-static void __head svsm_pval_4k_page(unsigned long paddr, bool validate)
+static void __head svsm_pval_4k_page(unsigned long paddr, bool validate,
+ struct svsm_ca *caa, u64 caa_pa)
{
struct svsm_pvalidate_call *pc;
struct svsm_call call = {};
@@ -552,10 +553,10 @@ static void __head svsm_pval_4k_page(unsigned long paddr, bool validate)
*/
flags = native_local_irq_save();
- call.caa = svsm_get_caa();
+ call.caa = caa;
pc = (struct svsm_pvalidate_call *)call.caa->svsm_buffer;
- pc_pa = svsm_get_caa_pa() + offsetof(struct svsm_ca, svsm_buffer);
+ pc_pa = caa_pa + offsetof(struct svsm_ca, svsm_buffer);
pc->num_entries = 1;
pc->cur_index = 0;
@@ -578,12 +579,12 @@ static void __head svsm_pval_4k_page(unsigned long paddr, bool validate)
}
static void __head pvalidate_4k_page(unsigned long vaddr, unsigned long paddr,
- bool validate)
+ bool validate, struct svsm_ca *caa, u64 caa_pa)
{
int ret;
if (snp_vmpl) {
- svsm_pval_4k_page(paddr, validate);
+ svsm_pval_4k_page(paddr, validate, caa, caa_pa);
} else {
ret = pvalidate(vaddr, RMP_PG_SIZE_4K, validate);
if (ret)
@@ -591,7 +592,8 @@ static void __head pvalidate_4k_page(unsigned long vaddr, unsigned long paddr,
}
}
-static void __head __page_state_change(unsigned long paddr, enum psc_op op)
+static void __head __page_state_change(unsigned long paddr, enum psc_op op,
+ struct svsm_ca *caa, u64 caa_pa)
{
u64 val;
@@ -600,7 +602,7 @@ static void __head __page_state_change(unsigned long paddr, enum psc_op op)
* state change in the RMP table.
*/
if (op == SNP_PAGE_STATE_SHARED)
- pvalidate_4k_page(paddr, paddr, false);
+ pvalidate_4k_page(paddr, paddr, false, caa, caa_pa);
/* Issue VMGEXIT to change the page state in RMP table. */
sev_es_wr_ghcb_msr(GHCB_MSR_PSC_REQ_GFN(paddr >> PAGE_SHIFT, op));
@@ -616,7 +618,7 @@ static void __head __page_state_change(unsigned long paddr, enum psc_op op)
* consistent with the RMP entry.
*/
if (op == SNP_PAGE_STATE_PRIVATE)
- pvalidate_4k_page(paddr, paddr, true);
+ pvalidate_4k_page(paddr, paddr, true, caa, caa_pa);
}
/*
diff --git a/arch/x86/boot/startup/sev-startup.c b/arch/x86/boot/startup/sev-startup.c
index 28bf68753580..7a3ad17d06f6 100644
--- a/arch/x86/boot/startup/sev-startup.c
+++ b/arch/x86/boot/startup/sev-startup.c
@@ -132,7 +132,8 @@ noinstr void __sev_put_ghcb(struct ghcb_state *state)
void __head
early_set_pages_state(unsigned long vaddr, unsigned long paddr,
- unsigned long npages, enum psc_op op)
+ unsigned long npages, enum psc_op op,
+ struct svsm_ca *caa, u64 caa_pa)
{
unsigned long paddr_end;
@@ -142,7 +143,7 @@ early_set_pages_state(unsigned long vaddr, unsigned long paddr,
paddr_end = paddr + (npages << PAGE_SHIFT);
while (paddr < paddr_end) {
- __page_state_change(paddr, op);
+ __page_state_change(paddr, op, caa, caa_pa);
vaddr += PAGE_SIZE;
paddr += PAGE_SIZE;
@@ -165,7 +166,8 @@ void __head early_snp_set_memory_private(unsigned long vaddr, unsigned long padd
* Ask the hypervisor to mark the memory pages as private in the RMP
* table.
*/
- early_set_pages_state(vaddr, paddr, npages, SNP_PAGE_STATE_PRIVATE);
+ early_set_pages_state(vaddr, paddr, npages, SNP_PAGE_STATE_PRIVATE,
+ svsm_get_caa(), svsm_get_caa_pa());
}
void __head early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr,
@@ -181,7 +183,8 @@ void __head early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr
return;
/* Ask hypervisor to mark the memory pages shared in the RMP table. */
- early_set_pages_state(vaddr, paddr, npages, SNP_PAGE_STATE_SHARED);
+ early_set_pages_state(vaddr, paddr, npages, SNP_PAGE_STATE_SHARED,
+ svsm_get_caa(), svsm_get_caa_pa());
}
/*
diff --git a/arch/x86/coco/sev/core.c b/arch/x86/coco/sev/core.c
index 0e0ddf4c92aa..39bbbea09c24 100644
--- a/arch/x86/coco/sev/core.c
+++ b/arch/x86/coco/sev/core.c
@@ -584,7 +584,8 @@ static void set_pages_state(unsigned long vaddr, unsigned long npages, int op)
/* Use the MSR protocol when a GHCB is not available. */
if (!boot_ghcb)
- return early_set_pages_state(vaddr, __pa(vaddr), npages, op);
+ return early_set_pages_state(vaddr, __pa(vaddr), npages, op,
+ svsm_get_caa(), svsm_get_caa_pa());
vaddr = vaddr & PAGE_MASK;
vaddr_end = vaddr + (npages << PAGE_SHIFT);
diff --git a/arch/x86/include/asm/sev-internal.h b/arch/x86/include/asm/sev-internal.h
index e3b203c280aa..08e2cfdef512 100644
--- a/arch/x86/include/asm/sev-internal.h
+++ b/arch/x86/include/asm/sev-internal.h
@@ -55,7 +55,8 @@ DECLARE_PER_CPU(struct sev_es_runtime_data*, runtime_data);
DECLARE_PER_CPU(struct sev_es_save_area *, sev_vmsa);
void early_set_pages_state(unsigned long vaddr, unsigned long paddr,
- unsigned long npages, enum psc_op op);
+ unsigned long npages, enum psc_op op,
+ struct svsm_ca *ca, u64 caa_pa);
DECLARE_PER_CPU(struct svsm_ca *, svsm_caa);
DECLARE_PER_CPU(u64, svsm_caa_pa);
--
2.49.0.1045.g170613ef41-goog
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [RFT PATCH v3 10/21] x86/sev: Use boot SVSM CA for all startup and init code
2025-05-12 19:08 [RFT PATCH v3 00/21] x86: strict separation of startup code Ard Biesheuvel
` (8 preceding siblings ...)
2025-05-12 19:08 ` [RFT PATCH v3 09/21] x86/sev: Pass SVSM calling area down to early page state change API Ard Biesheuvel
@ 2025-05-12 19:08 ` Ard Biesheuvel
2025-05-12 19:08 ` [RFT PATCH v3 11/21] x86/boot: Drop redundant RMPADJUST in SEV SVSM presence check Ard Biesheuvel
` (12 subsequent siblings)
22 siblings, 0 replies; 58+ messages in thread
From: Ard Biesheuvel @ 2025-05-12 19:08 UTC (permalink / raw)
To: linux-kernel
Cc: linux-efi, x86, Ard Biesheuvel, Borislav Petkov, Ingo Molnar,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky
From: Ard Biesheuvel <ardb@kernel.org>
To avoid having to reason about whether or not to use the per-CPU SVSM
calling area when running startup and init code on the boot CPU, reuse
the boot SVSM calling area as the per-CPU area for CPU #0.
This removes the need to make the per-CPU variables and the associated
state in sev_cfg accessible to the startup code once it is confined.
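The allocation path then only needs a special case for CPU #0, which
adopts the boot calling area instead of a freshly allocated page
(sketch matching the core.c hunk below):

	/* CPU #0 reuses the boot SVSM CA page; other CPUs get a new one */
	caa = cpu ? memblock_alloc_or_panic(sizeof(*caa), PAGE_SIZE)
		  : &boot_svsm_ca_page;
	per_cpu(svsm_caa, cpu) = caa;
	per_cpu(svsm_caa_pa, cpu) = __pa(caa);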
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/boot/compressed/sev.c | 13 ------
arch/x86/boot/startup/sev-startup.c | 7 +--
arch/x86/coco/sev/core.c | 47 +++++++++-----------
arch/x86/include/asm/sev-internal.h | 16 -------
4 files changed, 24 insertions(+), 59 deletions(-)
diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index 04bc39d065ff..fc0119bdc878 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -37,19 +37,6 @@ struct ghcb *boot_ghcb;
#define __BOOT_COMPRESSED
-extern u64 boot_svsm_caa_pa;
-
-struct svsm_ca *svsm_get_caa(void)
-{
- /* The decompressor is mapped 1:1 so VA == PA */
- return (struct svsm_ca *)boot_svsm_caa_pa;
-}
-
-u64 svsm_get_caa_pa(void)
-{
- return boot_svsm_caa_pa;
-}
-
u8 snp_vmpl;
/* Include code for early handlers */
diff --git a/arch/x86/boot/startup/sev-startup.c b/arch/x86/boot/startup/sev-startup.c
index 7a3ad17d06f6..2e946dab036c 100644
--- a/arch/x86/boot/startup/sev-startup.c
+++ b/arch/x86/boot/startup/sev-startup.c
@@ -50,9 +50,6 @@ u64 sev_secrets_pa __ro_after_init;
/* For early boot SVSM communication */
struct svsm_ca boot_svsm_ca_page __aligned(PAGE_SIZE);
-DEFINE_PER_CPU(struct svsm_ca *, svsm_caa);
-DEFINE_PER_CPU(u64, svsm_caa_pa);
-
/*
* Nothing shall interrupt this code path while holding the per-CPU
* GHCB. The backup GHCB is only for NMIs interrupting this path.
@@ -167,7 +164,7 @@ void __head early_snp_set_memory_private(unsigned long vaddr, unsigned long padd
* table.
*/
early_set_pages_state(vaddr, paddr, npages, SNP_PAGE_STATE_PRIVATE,
- svsm_get_caa(), svsm_get_caa_pa());
+ rip_rel_ptr(&boot_svsm_ca_page), boot_svsm_caa_pa);
}
void __head early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr,
@@ -184,7 +181,7 @@ void __head early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr
/* Ask hypervisor to mark the memory pages shared in the RMP table. */
early_set_pages_state(vaddr, paddr, npages, SNP_PAGE_STATE_SHARED,
- svsm_get_caa(), svsm_get_caa_pa());
+ rip_rel_ptr(&boot_svsm_ca_page), boot_svsm_caa_pa);
}
/*
diff --git a/arch/x86/coco/sev/core.c b/arch/x86/coco/sev/core.c
index 39bbbea09c24..fa7fdd11a45b 100644
--- a/arch/x86/coco/sev/core.c
+++ b/arch/x86/coco/sev/core.c
@@ -45,6 +45,25 @@
#include <asm/cpuid.h>
#include <asm/cmdline.h>
+DEFINE_PER_CPU(struct svsm_ca *, svsm_caa);
+DEFINE_PER_CPU(u64, svsm_caa_pa);
+
+static inline struct svsm_ca *svsm_get_caa(void)
+{
+ if (sev_cfg.use_cas)
+ return this_cpu_read(svsm_caa);
+ else
+ return rip_rel_ptr(&boot_svsm_ca_page);
+}
+
+static inline u64 svsm_get_caa_pa(void)
+{
+ if (sev_cfg.use_cas)
+ return this_cpu_read(svsm_caa_pa);
+ else
+ return boot_svsm_caa_pa;
+}
+
/* AP INIT values as documented in the APM2 section "Processor Initialization State" */
#define AP_INIT_CS_LIMIT 0xffff
#define AP_INIT_DS_LIMIT 0xffff
@@ -1207,7 +1226,8 @@ static void __init alloc_runtime_data(int cpu)
struct svsm_ca *caa;
/* Allocate the SVSM CA page if an SVSM is present */
- caa = memblock_alloc_or_panic(sizeof(*caa), PAGE_SIZE);
+ caa = cpu ? memblock_alloc_or_panic(sizeof(*caa), PAGE_SIZE)
+ : &boot_svsm_ca_page;
per_cpu(svsm_caa, cpu) = caa;
per_cpu(svsm_caa_pa, cpu) = __pa(caa);
@@ -1261,32 +1281,9 @@ void __init sev_es_init_vc_handling(void)
init_ghcb(cpu);
}
- /* If running under an SVSM, switch to the per-cpu CA */
- if (snp_vmpl) {
- struct svsm_call call = {};
- unsigned long flags;
- int ret;
-
- local_irq_save(flags);
-
- /*
- * SVSM_CORE_REMAP_CA call:
- * RAX = 0 (Protocol=0, CallID=0)
- * RCX = New CA GPA
- */
- call.caa = svsm_get_caa();
- call.rax = SVSM_CORE_CALL(SVSM_CORE_REMAP_CA);
- call.rcx = this_cpu_read(svsm_caa_pa);
- ret = svsm_perform_call_protocol(&call);
- if (ret)
- panic("Can't remap the SVSM CA, ret=%d, rax_out=0x%llx\n",
- ret, call.rax_out);
-
+ if (snp_vmpl)
sev_cfg.use_cas = true;
- local_irq_restore(flags);
- }
-
sev_es_setup_play_dead();
/* Secondary CPUs use the runtime #VC handler */
diff --git a/arch/x86/include/asm/sev-internal.h b/arch/x86/include/asm/sev-internal.h
index 08e2cfdef512..3690994275dd 100644
--- a/arch/x86/include/asm/sev-internal.h
+++ b/arch/x86/include/asm/sev-internal.h
@@ -63,22 +63,6 @@ DECLARE_PER_CPU(u64, svsm_caa_pa);
extern u64 boot_svsm_caa_pa;
-static __always_inline struct svsm_ca *svsm_get_caa(void)
-{
- if (sev_cfg.use_cas)
- return this_cpu_read(svsm_caa);
- else
- return rip_rel_ptr(&boot_svsm_ca_page);
-}
-
-static __always_inline u64 svsm_get_caa_pa(void)
-{
- if (sev_cfg.use_cas)
- return this_cpu_read(svsm_caa_pa);
- else
- return boot_svsm_caa_pa;
-}
-
enum es_result verify_exception_info(struct ghcb *ghcb, struct es_em_ctxt *ctxt);
void vc_forward_exception(struct es_em_ctxt *ctxt);
--
2.49.0.1045.g170613ef41-goog
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [RFT PATCH v3 11/21] x86/boot: Drop redundant RMPADJUST in SEV SVSM presence check
2025-05-12 19:08 [RFT PATCH v3 00/21] x86: strict separation of startup code Ard Biesheuvel
` (9 preceding siblings ...)
2025-05-12 19:08 ` [RFT PATCH v3 10/21] x86/sev: Use boot SVSM CA for all startup and init code Ard Biesheuvel
@ 2025-05-12 19:08 ` Ard Biesheuvel
2025-05-12 19:08 ` [RFT PATCH v3 12/21] x86/sev: Unify SEV-SNP hypervisor feature check Ard Biesheuvel
` (11 subsequent siblings)
22 siblings, 0 replies; 58+ messages in thread
From: Ard Biesheuvel @ 2025-05-12 19:08 UTC (permalink / raw)
To: linux-kernel
Cc: linux-efi, x86, Ard Biesheuvel, Borislav Petkov, Ingo Molnar,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky
From: Ard Biesheuvel <ardb@kernel.org>
snp_vmpl will be assigned a non-zero value when executing at a VMPL
other than 0, and this is inferred from a call to RMPADJUST, which only
works when running at VMPL0.
This means that testing snp_vmpl is sufficient, and there is no need to
perform the same check again.
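The VMPL0 enforcement thus collapses into a single test of snp_vmpl
against the multi-VMPL hypervisor feature, as in the hunk below:

	if (snp_vmpl > 0 && !(hv_features & GHCB_HV_FT_SNP_MULTI_VMPL))
		sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_NOT_VMPL0);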
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/boot/compressed/sev.c | 20 +++-----------------
1 file changed, 3 insertions(+), 17 deletions(-)
diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index fc0119bdc878..68fc3d179bbe 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -403,30 +403,16 @@ void sev_enable(struct boot_params *bp)
*/
if (sev_status & MSR_AMD64_SEV_SNP_ENABLED) {
u64 hv_features;
- int ret;
hv_features = get_hv_features();
if (!(hv_features & GHCB_HV_FT_SNP))
sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SNP_UNSUPPORTED);
/*
- * Enforce running at VMPL0 or with an SVSM.
- *
- * Use RMPADJUST (see the rmpadjust() function for a description of
- * what the instruction does) to update the VMPL1 permissions of a
- * page. If the guest is running at VMPL0, this will succeed. If the
- * guest is running at any other VMPL, this will fail. Linux SNP guests
- * only ever run at a single VMPL level so permission mask changes of a
- * lesser-privileged VMPL are a don't-care.
+ * Running at VMPL0 is required unless an SVSM is present and
+ * the hypervisor supports the required SVSM GHCB events.
*/
- ret = rmpadjust((unsigned long)&boot_ghcb_page, RMP_PG_SIZE_4K, 1);
-
- /*
- * Running at VMPL0 is not required if an SVSM is present and the hypervisor
- * supports the required SVSM GHCB events.
- */
- if (ret &&
- !(snp_vmpl && (hv_features & GHCB_HV_FT_SNP_MULTI_VMPL)))
+ if (snp_vmpl > 0 && !(hv_features & GHCB_HV_FT_SNP_MULTI_VMPL))
sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_NOT_VMPL0);
}
--
2.49.0.1045.g170613ef41-goog
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [RFT PATCH v3 12/21] x86/sev: Unify SEV-SNP hypervisor feature check
2025-05-12 19:08 [RFT PATCH v3 00/21] x86: strict separation of startup code Ard Biesheuvel
` (10 preceding siblings ...)
2025-05-12 19:08 ` [RFT PATCH v3 11/21] x86/boot: Drop redundant RMPADJUST in SEV SVSM presence check Ard Biesheuvel
@ 2025-05-12 19:08 ` Ard Biesheuvel
2025-05-30 11:16 ` Borislav Petkov
2025-05-12 19:08 ` [RFT PATCH v3 13/21] x86/sev: Provide PIC aliases for SEV related data objects Ard Biesheuvel
` (10 subsequent siblings)
22 siblings, 1 reply; 58+ messages in thread
From: Ard Biesheuvel @ 2025-05-12 19:08 UTC (permalink / raw)
To: linux-kernel
Cc: linux-efi, x86, Ard Biesheuvel, Borislav Petkov, Ingo Molnar,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky
From: Ard Biesheuvel <ardb@kernel.org>
The decompressor and the core kernel both check the hypervisor feature
mask exposed by the hypervisor, but test it in slightly different ways.
This disparity appears to be unintentional, and simply a result of the
fact that the decompressor and the core kernel have evolved differently
over time when it comes to setting up the SEV-SNP execution context.
So move the HV feature check into a helper function and call that
instead. For the core kernel, move the check to an earlier boot stage,
right after the point where it is established that the guest is
executing in SEV-SNP mode.
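Both boot stages then reduce to a call of the shared helper (sketch,
using the names from the hunks below):

	/* decompressor, in sev_enable(): */
	snp_check_hv_features();

	/* core kernel, early in sme_enable(): */
	sev_hv_features = snp_check_hv_features();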
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/boot/compressed/sev.c | 19 +----------
arch/x86/boot/startup/sev-shared.c | 33 +++++++++++++++-----
arch/x86/boot/startup/sme.c | 2 ++
arch/x86/coco/sev/core.c | 11 -------
arch/x86/include/asm/sev-internal.h | 2 +-
arch/x86/include/asm/sev.h | 2 ++
6 files changed, 32 insertions(+), 37 deletions(-)
diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index 68fc3d179bbe..4b7a99b2f822 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -397,24 +397,7 @@ void sev_enable(struct boot_params *bp)
sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SEV_ES_PROT_UNSUPPORTED);
}
- /*
- * SNP is supported in v2 of the GHCB spec which mandates support for HV
- * features.
- */
- if (sev_status & MSR_AMD64_SEV_SNP_ENABLED) {
- u64 hv_features;
-
- hv_features = get_hv_features();
- if (!(hv_features & GHCB_HV_FT_SNP))
- sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SNP_UNSUPPORTED);
-
- /*
- * Running at VMPL0 is required unless an SVSM is present and
- * the hypervisor supports the required SVSM GHCB events.
- */
- if (snp_vmpl > 0 && !(hv_features & GHCB_HV_FT_SNP_MULTI_VMPL))
- sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_NOT_VMPL0);
- }
+ snp_check_hv_features();
if (snp && !(sev_status & MSR_AMD64_SEV_SNP_ENABLED))
error("SEV-SNP supported indicated by CC blob, but not SEV status MSR.");
diff --git a/arch/x86/boot/startup/sev-shared.c b/arch/x86/boot/startup/sev-shared.c
index 70ad9a0aa023..560985ef8df6 100644
--- a/arch/x86/boot/startup/sev-shared.c
+++ b/arch/x86/boot/startup/sev-shared.c
@@ -66,16 +66,10 @@ sev_es_terminate(unsigned int set, unsigned int reason)
asm volatile("hlt\n" : : : "memory");
}
-/*
- * The hypervisor features are available from GHCB version 2 onward.
- */
-u64 get_hv_features(void)
+static u64 __head get_hv_features(void)
{
u64 val;
- if (ghcb_version < 2)
- return 0;
-
sev_es_wr_ghcb_msr(GHCB_MSR_HV_FT_REQ);
VMGEXIT();
@@ -86,6 +80,31 @@ u64 get_hv_features(void)
return GHCB_MSR_HV_FT_RESP_VAL(val);
}
+u64 __head snp_check_hv_features(void)
+{
+ /*
+ * SNP is supported in v2 of the GHCB spec which mandates support for HV
+ * features.
+ */
+ if (sev_status & MSR_AMD64_SEV_SNP_ENABLED) {
+ u64 hv_features;
+
+ hv_features = get_hv_features();
+ if (!(hv_features & GHCB_HV_FT_SNP))
+ sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SNP_UNSUPPORTED);
+
+ /*
+ * Running at VMPL0 is required unless an SVSM is present and
+ * the hypervisor supports the required SVSM GHCB events.
+ */
+ if (snp_vmpl > 0 && !(hv_features & GHCB_HV_FT_SNP_MULTI_VMPL))
+ sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_NOT_VMPL0);
+
+ return hv_features;
+ }
+ return 0;
+}
+
int svsm_perform_msr_protocol(struct svsm_call *call)
{
u8 pending = 0;
diff --git a/arch/x86/boot/startup/sme.c b/arch/x86/boot/startup/sme.c
index 753cd2094080..0ae04e333f51 100644
--- a/arch/x86/boot/startup/sme.c
+++ b/arch/x86/boot/startup/sme.c
@@ -533,6 +533,8 @@ void __head sme_enable(struct boot_params *bp)
if (snp_en ^ !!(msr & MSR_AMD64_SEV_SNP_ENABLED))
snp_abort();
+ sev_hv_features = snp_check_hv_features();
+
/* Check if memory encryption is enabled */
if (feature_mask == AMD_SME_BIT) {
if (!(bp->hdr.xloadflags & XLF_MEM_ENCRYPTION))
diff --git a/arch/x86/coco/sev/core.c b/arch/x86/coco/sev/core.c
index fa7fdd11a45b..fc4f6f188d42 100644
--- a/arch/x86/coco/sev/core.c
+++ b/arch/x86/coco/sev/core.c
@@ -1264,17 +1264,6 @@ void __init sev_es_init_vc_handling(void)
if (!sev_es_check_cpu_features())
panic("SEV-ES CPU Features missing");
- /*
- * SNP is supported in v2 of the GHCB spec which mandates support for HV
- * features.
- */
- if (cc_platform_has(CC_ATTR_GUEST_SEV_SNP)) {
- sev_hv_features = get_hv_features();
-
- if (!(sev_hv_features & GHCB_HV_FT_SNP))
- sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SNP_UNSUPPORTED);
- }
-
/* Initialize per-cpu GHCB pages */
for_each_possible_cpu(cpu) {
alloc_runtime_data(cpu);
diff --git a/arch/x86/include/asm/sev-internal.h b/arch/x86/include/asm/sev-internal.h
index 3690994275dd..7ad8faf5e88b 100644
--- a/arch/x86/include/asm/sev-internal.h
+++ b/arch/x86/include/asm/sev-internal.h
@@ -81,6 +81,6 @@ static __always_inline void sev_es_wr_ghcb_msr(u64 val)
native_wrmsr(MSR_AMD64_SEV_ES_GHCB, low, high);
}
-u64 get_hv_features(void);
+void check_hv_features(void);
const struct snp_cpuid_table *snp_cpuid_get_table(void);
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index ae2502253bd3..17b03a1f5694 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -418,6 +418,7 @@ struct svsm_call {
#ifdef CONFIG_AMD_MEM_ENCRYPT
extern u8 snp_vmpl;
+extern u64 sev_hv_features;
extern void __sev_es_ist_enter(struct pt_regs *regs);
extern void __sev_es_ist_exit(void);
@@ -495,6 +496,7 @@ void snp_set_memory_private(unsigned long vaddr, unsigned long npages);
void snp_set_wakeup_secondary_cpu(void);
bool snp_init(struct boot_params *bp);
void __noreturn snp_abort(void);
+u64 snp_check_hv_features(void);
void snp_dmi_setup(void);
int snp_issue_svsm_attest_req(u64 call_id, struct svsm_call *call, struct svsm_attest_call *input);
void snp_accept_memory(phys_addr_t start, phys_addr_t end);
--
2.49.0.1045.g170613ef41-goog
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [RFT PATCH v3 13/21] x86/sev: Provide PIC aliases for SEV related data objects
2025-05-12 19:08 [RFT PATCH v3 00/21] x86: strict separation of startup code Ard Biesheuvel
` (11 preceding siblings ...)
2025-05-12 19:08 ` [RFT PATCH v3 12/21] x86/sev: Unify SEV-SNP hypervisor feature check Ard Biesheuvel
@ 2025-05-12 19:08 ` Ard Biesheuvel
2025-05-12 19:08 ` [RFT PATCH v3 14/21] x86/boot: Provide PIC aliases for 5-level paging related constants Ard Biesheuvel
` (9 subsequent siblings)
22 siblings, 0 replies; 58+ messages in thread
From: Ard Biesheuvel @ 2025-05-12 19:08 UTC (permalink / raw)
To: linux-kernel
Cc: linux-efi, x86, Ard Biesheuvel, Borislav Petkov, Ingo Molnar,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky
From: Ard Biesheuvel <ardb@kernel.org>
Provide PIC aliases for data objects that are shared between the SEV
startup code and the SEV code that executes later. This is needed so
that the confined startup code is permitted to access them.
This requires some of these variables to be moved into a source file
that is not part of the startup code: within the startup code itself
the PIC alias is already implied for every symbol, and exporting
variables in the opposite direction is not supported.
Move ghcb_version as well, but don't provide a PIC alias as it is not
actually needed.
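The pattern is uniform: keep the definition in ordinary code and emit
the PIC alias right next to it, e.g. (from the core.c hunk below):

	/* Secrets page physical address from the CC blob */
	u64 sev_secrets_pa __ro_after_init;
	SYM_PIC_ALIAS(sev_secrets_pa);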
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/boot/compressed/sev.c | 3 ++
arch/x86/boot/startup/sev-shared.c | 19 -----------
arch/x86/boot/startup/sev-startup.c | 9 ------
arch/x86/coco/sev/core.c | 33 ++++++++++++++++++++
4 files changed, 36 insertions(+), 28 deletions(-)
diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index 4b7a99b2f822..750f08d1c7a1 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -38,6 +38,9 @@ struct ghcb *boot_ghcb;
#define __BOOT_COMPRESSED
u8 snp_vmpl;
+u16 ghcb_version;
+
+u64 boot_svsm_caa_pa;
/* Include code for early handlers */
#include "../../boot/startup/sev-shared.c"
diff --git a/arch/x86/boot/startup/sev-shared.c b/arch/x86/boot/startup/sev-shared.c
index 560985ef8df6..21d5a0e33145 100644
--- a/arch/x86/boot/startup/sev-shared.c
+++ b/arch/x86/boot/startup/sev-shared.c
@@ -18,25 +18,6 @@
#define WARN(condition, format...) (!!(condition))
#endif
-/*
- * SVSM related information:
- * During boot, the page tables are set up as identity mapped and later
- * changed to use kernel virtual addresses. Maintain separate virtual and
- * physical addresses for the CAA to allow SVSM functions to be used during
- * early boot, both with identity mapped virtual addresses and proper kernel
- * virtual addresses.
- */
-u64 boot_svsm_caa_pa __ro_after_init;
-
-/*
- * Since feature negotiation related variables are set early in the boot
- * process they must reside in the .data section so as not to be zeroed
- * out when the .bss section is later cleared.
- *
- * GHCB protocol version negotiated with the hypervisor.
- */
-u16 ghcb_version __ro_after_init;
-
/* Copy of the SNP firmware's CPUID page. */
static struct snp_cpuid_table cpuid_table_copy __ro_after_init;
diff --git a/arch/x86/boot/startup/sev-startup.c b/arch/x86/boot/startup/sev-startup.c
index 2e946dab036c..9dc0e64ef040 100644
--- a/arch/x86/boot/startup/sev-startup.c
+++ b/arch/x86/boot/startup/sev-startup.c
@@ -41,15 +41,6 @@
#include <asm/cpuid.h>
#include <asm/cmdline.h>
-/* Bitmap of SEV features supported by the hypervisor */
-u64 sev_hv_features __ro_after_init;
-
-/* Secrets page physical address from the CC blob */
-u64 sev_secrets_pa __ro_after_init;
-
-/* For early boot SVSM communication */
-struct svsm_ca boot_svsm_ca_page __aligned(PAGE_SIZE);
-
/*
* Nothing shall interrupt this code path while holding the per-CPU
* GHCB. The backup GHCB is only for NMIs interrupting this path.
diff --git a/arch/x86/coco/sev/core.c b/arch/x86/coco/sev/core.c
index fc4f6f188d42..ebf1f5ee8386 100644
--- a/arch/x86/coco/sev/core.c
+++ b/arch/x86/coco/sev/core.c
@@ -45,6 +45,29 @@
#include <asm/cpuid.h>
#include <asm/cmdline.h>
+/* Bitmap of SEV features supported by the hypervisor */
+u64 sev_hv_features __ro_after_init;
+SYM_PIC_ALIAS(sev_hv_features);
+
+/* Secrets page physical address from the CC blob */
+u64 sev_secrets_pa __ro_after_init;
+SYM_PIC_ALIAS(sev_secrets_pa);
+
+/* For early boot SVSM communication */
+struct svsm_ca boot_svsm_ca_page __aligned(PAGE_SIZE);
+SYM_PIC_ALIAS(boot_svsm_ca_page);
+
+/*
+ * SVSM related information:
+ * During boot, the page tables are set up as identity mapped and later
+ * changed to use kernel virtual addresses. Maintain separate virtual and
+ * physical addresses for the CAA to allow SVSM functions to be used during
+ * early boot, both with identity mapped virtual addresses and proper kernel
+ * virtual addresses.
+ */
+u64 boot_svsm_caa_pa __ro_after_init;
+SYM_PIC_ALIAS(boot_svsm_caa_pa);
+
DEFINE_PER_CPU(struct svsm_ca *, svsm_caa);
DEFINE_PER_CPU(u64, svsm_caa_pa);
@@ -118,6 +141,16 @@ DEFINE_PER_CPU(struct sev_es_save_area *, sev_vmsa);
*/
u8 snp_vmpl __ro_after_init;
EXPORT_SYMBOL_GPL(snp_vmpl);
+SYM_PIC_ALIAS(snp_vmpl);
+
+/*
+ * Since feature negotiation related variables are set early in the boot
+ * process they must reside in the .data section so as not to be zeroed
+ * out when the .bss section is later cleared.
+ *
+ * GHCB protocol version negotiated with the hypervisor.
+ */
+u16 ghcb_version __ro_after_init;
/* For early boot hypervisor communication in SEV-ES enabled guests */
static struct ghcb boot_ghcb_page __bss_decrypted __aligned(PAGE_SIZE);
--
2.49.0.1045.g170613ef41-goog
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [RFT PATCH v3 14/21] x86/boot: Provide PIC aliases for 5-level paging related constants
2025-05-12 19:08 [RFT PATCH v3 00/21] x86: strict separation of startup code Ard Biesheuvel
` (12 preceding siblings ...)
2025-05-12 19:08 ` [RFT PATCH v3 13/21] x86/sev: Provide PIC aliases for SEV related data objects Ard Biesheuvel
@ 2025-05-12 19:08 ` Ard Biesheuvel
2025-05-12 19:08 ` [RFT PATCH v3 15/21] x86/sev: Move __sev_[get|put]_ghcb() into separate noinstr object Ard Biesheuvel
` (8 subsequent siblings)
22 siblings, 0 replies; 58+ messages in thread
From: Ard Biesheuvel @ 2025-05-12 19:08 UTC (permalink / raw)
To: linux-kernel
Cc: linux-efi, x86, Ard Biesheuvel, Borislav Petkov, Ingo Molnar,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky
From: Ard Biesheuvel <ardb@kernel.org>
For the time being, provide PIC aliases for the global variables related
to 5-level paging. Some or all of these are in the process of being
removed, but currently, they are still assigned by the startup code.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/kernel/head64.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 510fb41f55fc..8c69cea84297 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -53,10 +53,13 @@ pmdval_t early_pmd_flags = __PAGE_KERNEL_LARGE & ~(_PAGE_GLOBAL | _PAGE_NX);
#ifdef CONFIG_X86_5LEVEL
unsigned int __pgtable_l5_enabled __ro_after_init;
+SYM_PIC_ALIAS(__pgtable_l5_enabled);
unsigned int pgdir_shift __ro_after_init = 39;
EXPORT_SYMBOL(pgdir_shift);
+SYM_PIC_ALIAS(pgdir_shift);
unsigned int ptrs_per_p4d __ro_after_init = 1;
EXPORT_SYMBOL(ptrs_per_p4d);
+SYM_PIC_ALIAS(ptrs_per_p4d);
#endif
#ifdef CONFIG_DYNAMIC_MEMORY_LAYOUT
--
2.49.0.1045.g170613ef41-goog
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [RFT PATCH v3 15/21] x86/sev: Move __sev_[get|put]_ghcb() into separate noinstr object
2025-05-12 19:08 [RFT PATCH v3 00/21] x86: strict separation of startup code Ard Biesheuvel
` (13 preceding siblings ...)
2025-05-12 19:08 ` [RFT PATCH v3 14/21] x86/boot: Provide PIC aliases for 5-level paging related constants Ard Biesheuvel
@ 2025-05-12 19:08 ` Ard Biesheuvel
2025-05-12 19:08 ` [RFT PATCH v3 16/21] x86/sev: Export startup routines for later use Ard Biesheuvel
` (7 subsequent siblings)
22 siblings, 0 replies; 58+ messages in thread
From: Ard Biesheuvel @ 2025-05-12 19:08 UTC (permalink / raw)
To: linux-kernel
Cc: linux-efi, x86, Ard Biesheuvel, Borislav Petkov, Ingo Molnar,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky
From: Ard Biesheuvel <ardb@kernel.org>
Rename sev-nmi.c to noinstr.c, and move the get/put GHCB routines into
it as well. These routines are also annotated 'noinstr' and suffer from
the same problem as the NMI code, i.e., that GCC may ignore the
__no_sanitize_address function attribute implied by 'noinstr' and
insert KASAN instrumentation anyway.
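The per-object sanitizer opt-outs simply follow the rename, so the
whole translation unit stays free of instrumentation (from the
Makefile hunk below):

	# Clang 14 and older may fail to respect __no_sanitize_undefined when inlining
	UBSAN_SANITIZE_noinstr.o := n

	# GCC may fail to respect __no_sanitize_address when inlining
	KASAN_SANITIZE_noinstr.o := n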
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/boot/startup/sev-startup.c | 74 --------------------
arch/x86/coco/sev/Makefile | 6 +-
arch/x86/coco/sev/{sev-nmi.c => noinstr.c} | 74 ++++++++++++++++++++
3 files changed, 77 insertions(+), 77 deletions(-)
diff --git a/arch/x86/boot/startup/sev-startup.c b/arch/x86/boot/startup/sev-startup.c
index 9dc0e64ef040..b56a585f57ab 100644
--- a/arch/x86/boot/startup/sev-startup.c
+++ b/arch/x86/boot/startup/sev-startup.c
@@ -41,83 +41,9 @@
#include <asm/cpuid.h>
#include <asm/cmdline.h>
-/*
- * Nothing shall interrupt this code path while holding the per-CPU
- * GHCB. The backup GHCB is only for NMIs interrupting this path.
- *
- * Callers must disable local interrupts around it.
- */
-noinstr struct ghcb *__sev_get_ghcb(struct ghcb_state *state)
-{
- struct sev_es_runtime_data *data;
- struct ghcb *ghcb;
-
- WARN_ON(!irqs_disabled());
-
- data = this_cpu_read(runtime_data);
- ghcb = &data->ghcb_page;
-
- if (unlikely(data->ghcb_active)) {
- /* GHCB is already in use - save its contents */
-
- if (unlikely(data->backup_ghcb_active)) {
- /*
- * Backup-GHCB is also already in use. There is no way
- * to continue here so just kill the machine. To make
- * panic() work, mark GHCBs inactive so that messages
- * can be printed out.
- */
- data->ghcb_active = false;
- data->backup_ghcb_active = false;
-
- instrumentation_begin();
- panic("Unable to handle #VC exception! GHCB and Backup GHCB are already in use");
- instrumentation_end();
- }
-
- /* Mark backup_ghcb active before writing to it */
- data->backup_ghcb_active = true;
-
- state->ghcb = &data->backup_ghcb;
-
- /* Backup GHCB content */
- *state->ghcb = *ghcb;
- } else {
- state->ghcb = NULL;
- data->ghcb_active = true;
- }
-
- return ghcb;
-}
-
/* Include code shared with pre-decompression boot stage */
#include "sev-shared.c"
-noinstr void __sev_put_ghcb(struct ghcb_state *state)
-{
- struct sev_es_runtime_data *data;
- struct ghcb *ghcb;
-
- WARN_ON(!irqs_disabled());
-
- data = this_cpu_read(runtime_data);
- ghcb = &data->ghcb_page;
-
- if (state->ghcb) {
- /* Restore GHCB from Backup */
- *ghcb = *state->ghcb;
- data->backup_ghcb_active = false;
- state->ghcb = NULL;
- } else {
- /*
- * Invalidate the GHCB so a VMGEXIT instruction issued
- * from userspace won't appear to be valid.
- */
- vc_ghcb_invalidate(ghcb);
- data->ghcb_active = false;
- }
-}
-
void __head
early_set_pages_state(unsigned long vaddr, unsigned long paddr,
unsigned long npages, enum psc_op op,
diff --git a/arch/x86/coco/sev/Makefile b/arch/x86/coco/sev/Makefile
index db3255b979bd..53e964a22759 100644
--- a/arch/x86/coco/sev/Makefile
+++ b/arch/x86/coco/sev/Makefile
@@ -1,9 +1,9 @@
# SPDX-License-Identifier: GPL-2.0
-obj-y += core.o sev-nmi.o vc-handle.o
+obj-y += core.o noinstr.o vc-handle.o
# Clang 14 and older may fail to respect __no_sanitize_undefined when inlining
-UBSAN_SANITIZE_sev-nmi.o := n
+UBSAN_SANITIZE_noinstr.o := n
# GCC may fail to respect __no_sanitize_address when inlining
-KASAN_SANITIZE_sev-nmi.o := n
+KASAN_SANITIZE_noinstr.o := n
diff --git a/arch/x86/coco/sev/sev-nmi.c b/arch/x86/coco/sev/noinstr.c
similarity index 61%
rename from arch/x86/coco/sev/sev-nmi.c
rename to arch/x86/coco/sev/noinstr.c
index d8dfaddfb367..b527eafb6312 100644
--- a/arch/x86/coco/sev/sev-nmi.c
+++ b/arch/x86/coco/sev/noinstr.c
@@ -106,3 +106,77 @@ void noinstr __sev_es_nmi_complete(void)
__sev_put_ghcb(&state);
}
+
+/*
+ * Nothing shall interrupt this code path while holding the per-CPU
+ * GHCB. The backup GHCB is only for NMIs interrupting this path.
+ *
+ * Callers must disable local interrupts around it.
+ */
+noinstr struct ghcb *__sev_get_ghcb(struct ghcb_state *state)
+{
+ struct sev_es_runtime_data *data;
+ struct ghcb *ghcb;
+
+ WARN_ON(!irqs_disabled());
+
+ data = this_cpu_read(runtime_data);
+ ghcb = &data->ghcb_page;
+
+ if (unlikely(data->ghcb_active)) {
+ /* GHCB is already in use - save its contents */
+
+ if (unlikely(data->backup_ghcb_active)) {
+ /*
+ * Backup-GHCB is also already in use. There is no way
+ * to continue here so just kill the machine. To make
+ * panic() work, mark GHCBs inactive so that messages
+ * can be printed out.
+ */
+ data->ghcb_active = false;
+ data->backup_ghcb_active = false;
+
+ instrumentation_begin();
+ panic("Unable to handle #VC exception! GHCB and Backup GHCB are already in use");
+ instrumentation_end();
+ }
+
+ /* Mark backup_ghcb active before writing to it */
+ data->backup_ghcb_active = true;
+
+ state->ghcb = &data->backup_ghcb;
+
+ /* Backup GHCB content */
+ *state->ghcb = *ghcb;
+ } else {
+ state->ghcb = NULL;
+ data->ghcb_active = true;
+ }
+
+ return ghcb;
+}
+
+noinstr void __sev_put_ghcb(struct ghcb_state *state)
+{
+ struct sev_es_runtime_data *data;
+ struct ghcb *ghcb;
+
+ WARN_ON(!irqs_disabled());
+
+ data = this_cpu_read(runtime_data);
+ ghcb = &data->ghcb_page;
+
+ if (state->ghcb) {
+ /* Restore GHCB from Backup */
+ *ghcb = *state->ghcb;
+ data->backup_ghcb_active = false;
+ state->ghcb = NULL;
+ } else {
+ /*
+ * Invalidate the GHCB so a VMGEXIT instruction issued
+ * from userspace won't appear to be valid.
+ */
+ vc_ghcb_invalidate(ghcb);
+ data->ghcb_active = false;
+ }
+}
--
2.49.0.1045.g170613ef41-goog
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [RFT PATCH v3 16/21] x86/sev: Export startup routines for later use
2025-05-12 19:08 [RFT PATCH v3 00/21] x86: strict separation of startup code Ard Biesheuvel
` (14 preceding siblings ...)
2025-05-12 19:08 ` [RFT PATCH v3 15/21] x86/sev: Move __sev_[get|put]_ghcb() into separate noinstr object Ard Biesheuvel
@ 2025-05-12 19:08 ` Ard Biesheuvel
2025-05-12 19:08 ` [RFT PATCH v3 17/21] x86/boot: Create a confined code area for startup code Ard Biesheuvel
` (6 subsequent siblings)
22 siblings, 0 replies; 58+ messages in thread
From: Ard Biesheuvel @ 2025-05-12 19:08 UTC (permalink / raw)
To: linux-kernel
Cc: linux-efi, x86, Ard Biesheuvel, Borislav Petkov, Ingo Molnar,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky
From: Ard Biesheuvel <ardb@kernel.org>
Create aliases that expose routines implemented by the startup code to
other code in the core kernel, so that they can be called later as well.
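Each alias is a linker script PROVIDE() statement that maps the
familiar unprefixed name onto the __pi_ prefixed startup symbol, e.g.:

	PROVIDE(early_set_pages_state = __pi_early_set_pages_state);

This way, callers in the core kernel need no source changes: the
reference simply resolves at link time to the startup implementation.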
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/boot/startup/exports.h | 12 ++++++++++++
arch/x86/kernel/vmlinux.lds.S | 2 ++
2 files changed, 14 insertions(+)
diff --git a/arch/x86/boot/startup/exports.h b/arch/x86/boot/startup/exports.h
new file mode 100644
index 000000000000..00ea376fde33
--- /dev/null
+++ b/arch/x86/boot/startup/exports.h
@@ -0,0 +1,12 @@
+
+/*
+ * The symbols below are functions that are implemented by the startup code,
+ * but called at runtime by the SEV code residing in the core kernel.
+ */
+PROVIDE(early_set_pages_state = __pi_early_set_pages_state);
+PROVIDE(early_snp_set_memory_private = __pi_early_snp_set_memory_private);
+PROVIDE(early_snp_set_memory_shared = __pi_early_snp_set_memory_shared);
+PROVIDE(get_hv_features = __pi_get_hv_features);
+PROVIDE(sev_es_terminate = __pi_sev_es_terminate);
+PROVIDE(snp_cpuid = __pi_snp_cpuid);
+PROVIDE(snp_cpuid_get_table = __pi_snp_cpuid_get_table);
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 9340c74b680d..4aaa1693b262 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -517,3 +517,5 @@ xen_elfnote_entry_value =
xen_elfnote_phys32_entry_value =
ABSOLUTE(xen_elfnote_phys32_entry) + ABSOLUTE(pvh_start_xen - LOAD_OFFSET);
#endif
+
+#include "../boot/startup/exports.h"
--
2.49.0.1045.g170613ef41-goog
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [RFT PATCH v3 17/21] x86/boot: Create a confined code area for startup code
2025-05-12 19:08 [RFT PATCH v3 00/21] x86: strict separation of startup code Ard Biesheuvel
` (15 preceding siblings ...)
2025-05-12 19:08 ` [RFT PATCH v3 16/21] x86/sev: Export startup routines for later use Ard Biesheuvel
@ 2025-05-12 19:08 ` Ard Biesheuvel
2025-05-12 19:08 ` [RFT PATCH v3 18/21] x86/boot: Move startup code out of __head section Ard Biesheuvel
` (5 subsequent siblings)
22 siblings, 0 replies; 58+ messages in thread
From: Ard Biesheuvel @ 2025-05-12 19:08 UTC (permalink / raw)
To: linux-kernel
Cc: linux-efi, x86, Ard Biesheuvel, Borislav Petkov, Ingo Molnar,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky
From: Ard Biesheuvel <ardb@kernel.org>
In order to have tight control over which code may execute from the
early 1:1 mapping of memory, but still link vmlinux as a single
executable, prefix all symbols in the startup code with __pi_, and
invoke them from the outside using the __pi_ prefix.
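The prefixing is applied at build time using objcopy, wrapping each
startup object in a *.pi.o counterpart (as in the Makefile hunk below):

	$(obj)/%.pi.o: OBJCOPYFLAGS := --prefix-symbols=__pi_
	$(obj)/%.pi.o: $(obj)/%.o FORCE
		$(call if_changed,objcopy)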
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/boot/startup/Makefile | 13 +++++++++++++
arch/x86/boot/startup/sev-shared.c | 4 +---
arch/x86/boot/startup/sme.c | 1 -
arch/x86/coco/sev/core.c | 2 +-
arch/x86/coco/sev/vc-handle.c | 1 -
arch/x86/include/asm/setup.h | 1 +
arch/x86/include/asm/sev.h | 1 +
arch/x86/kernel/head64.c | 2 +-
arch/x86/kernel/head_64.S | 8 ++++----
arch/x86/mm/mem_encrypt_boot.S | 6 +++---
10 files changed, 25 insertions(+), 14 deletions(-)
diff --git a/arch/x86/boot/startup/Makefile b/arch/x86/boot/startup/Makefile
index b514f7e81332..c7e690091f81 100644
--- a/arch/x86/boot/startup/Makefile
+++ b/arch/x86/boot/startup/Makefile
@@ -28,3 +28,16 @@ lib-$(CONFIG_EFI_MIXED) += efi-mixed.o
# to be linked into the decompressor or the EFI stub but not vmlinux
#
$(patsubst %.o,$(obj)/%.o,$(lib-y)): OBJECT_FILES_NON_STANDARD := y
+
+#
+# Confine the startup code by prefixing all symbols with __pi_ (for position
+# independent). This ensures that startup code can only call other startup
+# code, or code that has explicitly been made accessible to it via a symbol
+# alias.
+#
+$(obj)/%.pi.o: OBJCOPYFLAGS := --prefix-symbols=__pi_
+$(obj)/%.pi.o: $(obj)/%.o FORCE
+ $(call if_changed,objcopy)
+
+extra-y := $(obj-y)
+obj-y := $(patsubst %.o,%.pi.o,$(obj-y))
diff --git a/arch/x86/boot/startup/sev-shared.c b/arch/x86/boot/startup/sev-shared.c
index 21d5a0e33145..49440955885a 100644
--- a/arch/x86/boot/startup/sev-shared.c
+++ b/arch/x86/boot/startup/sev-shared.c
@@ -11,9 +11,7 @@
#include <asm/setup_data.h>
-#ifndef __BOOT_COMPRESSED
-#define error(v) pr_err(v)
-#else
+#ifdef __BOOT_COMPRESSED
#undef WARN
#define WARN(condition, format...) (!!(condition))
#endif
diff --git a/arch/x86/boot/startup/sme.c b/arch/x86/boot/startup/sme.c
index 0ae04e333f51..ffd0185aaa9d 100644
--- a/arch/x86/boot/startup/sme.c
+++ b/arch/x86/boot/startup/sme.c
@@ -569,7 +569,6 @@ void __head sme_enable(struct boot_params *bp)
#ifdef CONFIG_MITIGATION_PAGE_TABLE_ISOLATION
/* Local version for startup code, which never operates on user page tables */
-__weak
pgd_t __pti_set_user_pgtbl(pgd_t *pgdp, pgd_t pgd)
{
return pgd;
diff --git a/arch/x86/coco/sev/core.c b/arch/x86/coco/sev/core.c
index ebf1f5ee8386..8e316b09a646 100644
--- a/arch/x86/coco/sev/core.c
+++ b/arch/x86/coco/sev/core.c
@@ -270,7 +270,7 @@ static int svsm_perform_call_protocol(struct svsm_call *call)
do {
ret = ghcb ? svsm_perform_ghcb_protocol(ghcb, call)
- : svsm_perform_msr_protocol(call);
+ : __pi_svsm_perform_msr_protocol(call);
} while (ret == -EAGAIN);
if (sev_cfg.ghcbs_initialized)
diff --git a/arch/x86/coco/sev/vc-handle.c b/arch/x86/coco/sev/vc-handle.c
index 3cadaa46b9d7..47991807b231 100644
--- a/arch/x86/coco/sev/vc-handle.c
+++ b/arch/x86/coco/sev/vc-handle.c
@@ -1060,4 +1060,3 @@ bool __init handle_vc_boot_ghcb(struct pt_regs *regs)
sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SEV_ES_GEN_REQ);
}
-
diff --git a/arch/x86/include/asm/setup.h b/arch/x86/include/asm/setup.h
index 6324f4c6c545..895d09faaf83 100644
--- a/arch/x86/include/asm/setup.h
+++ b/arch/x86/include/asm/setup.h
@@ -53,6 +53,7 @@ extern void i386_reserve_resources(void);
extern unsigned long __startup_64(unsigned long p2v_offset, struct boot_params *bp);
extern void startup_64_setup_gdt_idt(void);
extern void startup_64_load_idt(void *vc_handler);
+extern void __pi_startup_64_load_idt(void *vc_handler);
extern void early_setup_idt(void);
extern void __init do_early_exception(struct pt_regs *regs, int trapnr);
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index 17b03a1f5694..4ebdffdc5856 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -534,6 +534,7 @@ struct cpuid_leaf {
};
int svsm_perform_msr_protocol(struct svsm_call *call);
+int __pi_svsm_perform_msr_protocol(struct svsm_call *call);
int snp_cpuid(void (*cpuid_hv)(void *ctx, struct cpuid_leaf *),
void *ctx, struct cpuid_leaf *leaf);
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 8c69cea84297..1d038ba07f29 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -320,5 +320,5 @@ void early_setup_idt(void)
handler = vc_boot_ghcb;
}
- startup_64_load_idt(handler);
+ __pi_startup_64_load_idt(handler);
}
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 069420853304..88cbf932f7c2 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -71,7 +71,7 @@ SYM_CODE_START_NOALIGN(startup_64)
xorl %edx, %edx
wrmsr
- call startup_64_setup_gdt_idt
+ call __pi_startup_64_setup_gdt_idt
/* Now switch to __KERNEL_CS so IRET works reliably */
pushq $__KERNEL_CS
@@ -91,7 +91,7 @@ SYM_CODE_START_NOALIGN(startup_64)
* subsequent code. Pass the boot_params pointer as the first argument.
*/
movq %r15, %rdi
- call sme_enable
+ call __pi_sme_enable
#endif
/* Sanitize CPU configuration */
@@ -111,7 +111,7 @@ SYM_CODE_START_NOALIGN(startup_64)
* programmed into CR3.
*/
movq %r15, %rsi
- call __startup_64
+ call __pi___startup_64
/* Form the CR3 value being sure to include the CR3 modifier */
leaq early_top_pgt(%rip), %rcx
@@ -562,7 +562,7 @@ SYM_CODE_START_NOALIGN(vc_no_ghcb)
/* Call C handler */
movq %rsp, %rdi
movq ORIG_RAX(%rsp), %rsi
- call do_vc_no_ghcb
+ call __pi_do_vc_no_ghcb
/* Unwind pt_regs */
POP_REGS
diff --git a/arch/x86/mm/mem_encrypt_boot.S b/arch/x86/mm/mem_encrypt_boot.S
index f8a33b25ae86..edbf9c998848 100644
--- a/arch/x86/mm/mem_encrypt_boot.S
+++ b/arch/x86/mm/mem_encrypt_boot.S
@@ -16,7 +16,7 @@
.text
.code64
-SYM_FUNC_START(sme_encrypt_execute)
+SYM_FUNC_START(__pi_sme_encrypt_execute)
/*
* Entry parameters:
@@ -69,9 +69,9 @@ SYM_FUNC_START(sme_encrypt_execute)
ANNOTATE_UNRET_SAFE
ret
int3
-SYM_FUNC_END(sme_encrypt_execute)
+SYM_FUNC_END(__pi_sme_encrypt_execute)
-SYM_FUNC_START(__enc_copy)
+SYM_FUNC_START_LOCAL(__enc_copy)
ANNOTATE_NOENDBR
/*
* Routine used to encrypt memory in place.
--
2.49.0.1045.g170613ef41-goog
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [RFT PATCH v3 18/21] x86/boot: Move startup code out of __head section
2025-05-12 19:08 [RFT PATCH v3 00/21] x86: strict separation of startup code Ard Biesheuvel
` (16 preceding siblings ...)
2025-05-12 19:08 ` [RFT PATCH v3 17/21] x86/boot: Create a confined code area for startup code Ard Biesheuvel
@ 2025-05-12 19:08 ` Ard Biesheuvel
2025-05-12 19:08 ` [RFT PATCH v3 19/21] x86/boot: Disallow absolute symbol references in startup code Ard Biesheuvel
` (4 subsequent siblings)
22 siblings, 0 replies; 58+ messages in thread
From: Ard Biesheuvel @ 2025-05-12 19:08 UTC (permalink / raw)
To: linux-kernel
Cc: linux-efi, x86, Ard Biesheuvel, Borislav Petkov, Ingo Molnar,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky
From: Ard Biesheuvel <ardb@kernel.org>
Move the startup code out of the __head section, now that that section
no longer has any special significance, and into .text or .init.text as
appropriate, so that code that only runs at boot is not kept around
unnecessarily.
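For context, __head was merely shorthand for emitting code into
.head.text - roughly the following, assuming the pre-change definition
in <asm/init.h> that this patch removes:

	/* assumed prior definition, dropped by this patch */
	#define __head	__section(".head.text")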
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/boot/compressed/sev.c | 3 --
arch/x86/boot/startup/gdt_idt.c | 4 +--
arch/x86/boot/startup/map_kernel.c | 4 +--
arch/x86/boot/startup/sev-shared.c | 34 ++++++++++----------
arch/x86/boot/startup/sev-startup.c | 14 ++++----
arch/x86/boot/startup/sme.c | 26 +++++++--------
arch/x86/include/asm/init.h | 6 ----
arch/x86/kernel/head_32.S | 2 +-
arch/x86/kernel/head_64.S | 2 +-
arch/x86/platform/pvh/head.S | 2 +-
10 files changed, 44 insertions(+), 53 deletions(-)
diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index 750f08d1c7a1..79309944cb19 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -32,9 +32,6 @@ struct ghcb *boot_ghcb;
#undef __init
#define __init
-#undef __head
-#define __head
-
#define __BOOT_COMPRESSED
u8 snp_vmpl;
diff --git a/arch/x86/boot/startup/gdt_idt.c b/arch/x86/boot/startup/gdt_idt.c
index a3112a69b06a..d16102abdaec 100644
--- a/arch/x86/boot/startup/gdt_idt.c
+++ b/arch/x86/boot/startup/gdt_idt.c
@@ -24,7 +24,7 @@
static gate_desc bringup_idt_table[NUM_EXCEPTION_VECTORS] __page_aligned_data;
/* This may run while still in the direct mapping */
-void __head startup_64_load_idt(void *vc_handler)
+void startup_64_load_idt(void *vc_handler)
{
struct desc_ptr desc = {
.address = (unsigned long)rip_rel_ptr(bringup_idt_table),
@@ -46,7 +46,7 @@ void __head startup_64_load_idt(void *vc_handler)
/*
* Setup boot CPU state needed before kernel switches to virtual addresses.
*/
-void __head startup_64_setup_gdt_idt(void)
+void __init startup_64_setup_gdt_idt(void)
{
struct gdt_page *gp = rip_rel_ptr((void *)(__force unsigned long)&gdt_page);
void *handler = NULL;
diff --git a/arch/x86/boot/startup/map_kernel.c b/arch/x86/boot/startup/map_kernel.c
index 099ae2559336..75b3dd62da50 100644
--- a/arch/x86/boot/startup/map_kernel.c
+++ b/arch/x86/boot/startup/map_kernel.c
@@ -36,7 +36,7 @@ static inline bool check_la57_support(void)
return true;
}
-static unsigned long __head sme_postprocess_startup(struct boot_params *bp,
+static unsigned long __init sme_postprocess_startup(struct boot_params *bp,
pmdval_t *pmd,
unsigned long p2v_offset)
{
@@ -90,7 +90,7 @@ static unsigned long __head sme_postprocess_startup(struct boot_params *bp,
* the 1:1 mapping of memory. Kernel virtual addresses can be determined by
* subtracting p2v_offset from the RIP-relative address.
*/
-unsigned long __head __startup_64(unsigned long p2v_offset,
+unsigned long __init __startup_64(unsigned long p2v_offset,
struct boot_params *bp)
{
pmd_t (*early_pgts)[PTRS_PER_PMD] = rip_rel_ptr(early_dynamic_pgts);
diff --git a/arch/x86/boot/startup/sev-shared.c b/arch/x86/boot/startup/sev-shared.c
index 49440955885a..77b34ab6c7d8 100644
--- a/arch/x86/boot/startup/sev-shared.c
+++ b/arch/x86/boot/startup/sev-shared.c
@@ -29,7 +29,7 @@ static u32 cpuid_std_range_max __ro_after_init;
static u32 cpuid_hyp_range_max __ro_after_init;
static u32 cpuid_ext_range_max __ro_after_init;
-void __head __noreturn
+void __noreturn
sev_es_terminate(unsigned int set, unsigned int reason)
{
u64 val = GHCB_MSR_TERM_REQ;
@@ -45,7 +45,7 @@ sev_es_terminate(unsigned int set, unsigned int reason)
asm volatile("hlt\n" : : : "memory");
}
-static u64 __head get_hv_features(void)
+static u64 __init get_hv_features(void)
{
u64 val;
@@ -59,7 +59,7 @@ static u64 __head get_hv_features(void)
return GHCB_MSR_HV_FT_RESP_VAL(val);
}
-u64 __head snp_check_hv_features(void)
+u64 __init snp_check_hv_features(void)
{
/*
* SNP is supported in v2 of the GHCB spec which mandates support for HV
@@ -186,7 +186,7 @@ const struct snp_cpuid_table *snp_cpuid_get_table(void)
*
* Return: XSAVE area size on success, 0 otherwise.
*/
-static u32 __head snp_cpuid_calc_xsave_size(u64 xfeatures_en, bool compacted)
+static u32 snp_cpuid_calc_xsave_size(u64 xfeatures_en, bool compacted)
{
const struct snp_cpuid_table *cpuid_table = snp_cpuid_get_table();
u64 xfeatures_found = 0;
@@ -222,7 +222,7 @@ static u32 __head snp_cpuid_calc_xsave_size(u64 xfeatures_en, bool compacted)
return xsave_size;
}
-static bool __head
+static bool
snp_cpuid_get_validated_func(struct cpuid_leaf *leaf)
{
const struct snp_cpuid_table *cpuid_table = snp_cpuid_get_table();
@@ -264,7 +264,7 @@ static void snp_cpuid_hv_no_ghcb(void *ctx, struct cpuid_leaf *leaf)
sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_CPUID_HV);
}
-static int __head
+static int
snp_cpuid_postprocess(void (*cpuid_hv)(void *ctx, struct cpuid_leaf *),
void *ctx, struct cpuid_leaf *leaf)
{
@@ -360,7 +360,7 @@ snp_cpuid_postprocess(void (*cpuid_hv)(void *ctx, struct cpuid_leaf *),
* Returns -EOPNOTSUPP if feature not enabled. Any other non-zero return value
* should be treated as fatal by caller.
*/
-int __head
+int
snp_cpuid(void (*cpuid_hv)(void *ctx, struct cpuid_leaf *),
void *ctx, struct cpuid_leaf *leaf)
{
@@ -404,7 +404,7 @@ snp_cpuid(void (*cpuid_hv)(void *ctx, struct cpuid_leaf *),
* page yet, so it only supports the MSR based communication with the
* hypervisor and only the CPUID exit-code.
*/
-void __head do_vc_no_ghcb(struct pt_regs *regs, unsigned long exit_code)
+void do_vc_no_ghcb(struct pt_regs *regs, unsigned long exit_code)
{
unsigned int subfn = lower_bits(regs->cx, 32);
unsigned int fn = lower_bits(regs->ax, 32);
@@ -480,7 +480,7 @@ struct cc_setup_data {
* Search for a Confidential Computing blob passed in as a setup_data entry
* via the Linux Boot Protocol.
*/
-static __head
+static __init
struct cc_blob_sev_info *find_cc_blob_setup_data(struct boot_params *bp)
{
struct cc_setup_data *sd = NULL;
@@ -508,7 +508,7 @@ struct cc_blob_sev_info *find_cc_blob_setup_data(struct boot_params *bp)
* mapping needs to be updated in sync with all the changes to virtual memory
* layout and related mapping facilities throughout the boot process.
*/
-static void __head setup_cpuid_table(const struct cc_blob_sev_info *cc_info)
+static void __init setup_cpuid_table(const struct cc_blob_sev_info *cc_info)
{
const struct snp_cpuid_table *cpuid_table_fw, *cpuid_table;
int i;
@@ -536,8 +536,8 @@ static void __head setup_cpuid_table(const struct cc_blob_sev_info *cc_info)
}
}
-static void __head svsm_pval_4k_page(unsigned long paddr, bool validate,
- struct svsm_ca *caa, u64 caa_pa)
+static void svsm_pval_4k_page(unsigned long paddr, bool validate,
+ struct svsm_ca *caa, u64 caa_pa)
{
struct svsm_pvalidate_call *pc;
struct svsm_call call = {};
@@ -576,8 +576,8 @@ static void __head svsm_pval_4k_page(unsigned long paddr, bool validate,
native_local_irq_restore(flags);
}
-static void __head pvalidate_4k_page(unsigned long vaddr, unsigned long paddr,
- bool validate, struct svsm_ca *caa, u64 caa_pa)
+static void pvalidate_4k_page(unsigned long vaddr, unsigned long paddr,
+ bool validate, struct svsm_ca *caa, u64 caa_pa)
{
int ret;
@@ -590,8 +590,8 @@ static void __head pvalidate_4k_page(unsigned long vaddr, unsigned long paddr,
}
}
-static void __head __page_state_change(unsigned long paddr, enum psc_op op,
- struct svsm_ca *caa, u64 caa_pa)
+static void __page_state_change(unsigned long paddr, enum psc_op op,
+ struct svsm_ca *caa, u64 caa_pa)
{
u64 val;
@@ -623,7 +623,7 @@ static void __head __page_state_change(unsigned long paddr, enum psc_op op,
* Maintain the GPA of the SVSM Calling Area (CA) in order to utilize the SVSM
* services needed when not running in VMPL0.
*/
-static bool __head svsm_setup_ca(const struct cc_blob_sev_info *cc_info,
+static bool __init svsm_setup_ca(const struct cc_blob_sev_info *cc_info,
void *page)
{
struct snp_secrets_page *secrets_page;
diff --git a/arch/x86/boot/startup/sev-startup.c b/arch/x86/boot/startup/sev-startup.c
index b56a585f57ab..21424157819c 100644
--- a/arch/x86/boot/startup/sev-startup.c
+++ b/arch/x86/boot/startup/sev-startup.c
@@ -44,7 +44,7 @@
/* Include code shared with pre-decompression boot stage */
#include "sev-shared.c"
-void __head
+void __init
early_set_pages_state(unsigned long vaddr, unsigned long paddr,
unsigned long npages, enum psc_op op,
struct svsm_ca *caa, u64 caa_pa)
@@ -64,7 +64,7 @@ early_set_pages_state(unsigned long vaddr, unsigned long paddr,
}
}
-void __head early_snp_set_memory_private(unsigned long vaddr, unsigned long paddr,
+void __init early_snp_set_memory_private(unsigned long vaddr, unsigned long paddr,
unsigned long npages)
{
/*
@@ -84,7 +84,7 @@ void __head early_snp_set_memory_private(unsigned long vaddr, unsigned long padd
rip_rel_ptr(&boot_svsm_ca_page), boot_svsm_caa_pa);
}
-void __head early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr,
+void __init early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr,
unsigned long npages)
{
/*
@@ -114,7 +114,7 @@ void __head early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr
*
* Scan for the blob in that order.
*/
-static __head struct cc_blob_sev_info *find_cc_blob(struct boot_params *bp)
+static struct cc_blob_sev_info *__init find_cc_blob(struct boot_params *bp)
{
struct cc_blob_sev_info *cc_info;
@@ -140,7 +140,7 @@ static __head struct cc_blob_sev_info *find_cc_blob(struct boot_params *bp)
return cc_info;
}
-static __head void svsm_setup(struct cc_blob_sev_info *cc_info)
+static void __init svsm_setup(struct cc_blob_sev_info *cc_info)
{
struct snp_secrets_page *secrets = (void *)cc_info->secrets_phys;
struct svsm_call call = {};
@@ -183,7 +183,7 @@ static __head void svsm_setup(struct cc_blob_sev_info *cc_info)
boot_svsm_caa_pa = pa;
}
-bool __head snp_init(struct boot_params *bp)
+bool __init snp_init(struct boot_params *bp)
{
struct cc_blob_sev_info *cc_info;
@@ -212,7 +212,7 @@ bool __head snp_init(struct boot_params *bp)
return true;
}
-void __head __noreturn snp_abort(void)
+void __init __noreturn snp_abort(void)
{
sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SNP_UNSUPPORTED);
}
diff --git a/arch/x86/boot/startup/sme.c b/arch/x86/boot/startup/sme.c
index ffd0185aaa9d..846f5d7c24e1 100644
--- a/arch/x86/boot/startup/sme.c
+++ b/arch/x86/boot/startup/sme.c
@@ -91,7 +91,7 @@ struct sme_populate_pgd_data {
*/
static char sme_workarea[2 * PMD_SIZE] __section(".init.scratch");
-static void __head sme_clear_pgd(struct sme_populate_pgd_data *ppd)
+static void __init sme_clear_pgd(struct sme_populate_pgd_data *ppd)
{
unsigned long pgd_start, pgd_end, pgd_size;
pgd_t *pgd_p;
@@ -106,7 +106,7 @@ static void __head sme_clear_pgd(struct sme_populate_pgd_data *ppd)
memset(pgd_p, 0, pgd_size);
}
-static pud_t __head *sme_prepare_pgd(struct sme_populate_pgd_data *ppd)
+static pud_t __init *sme_prepare_pgd(struct sme_populate_pgd_data *ppd)
{
pgd_t *pgd;
p4d_t *p4d;
@@ -143,7 +143,7 @@ static pud_t __head *sme_prepare_pgd(struct sme_populate_pgd_data *ppd)
return pud;
}
-static void __head sme_populate_pgd_large(struct sme_populate_pgd_data *ppd)
+static void __init sme_populate_pgd_large(struct sme_populate_pgd_data *ppd)
{
pud_t *pud;
pmd_t *pmd;
@@ -159,7 +159,7 @@ static void __head sme_populate_pgd_large(struct sme_populate_pgd_data *ppd)
set_pmd(pmd, __pmd(ppd->paddr | ppd->pmd_flags));
}
-static void __head sme_populate_pgd(struct sme_populate_pgd_data *ppd)
+static void __init sme_populate_pgd(struct sme_populate_pgd_data *ppd)
{
pud_t *pud;
pmd_t *pmd;
@@ -185,7 +185,7 @@ static void __head sme_populate_pgd(struct sme_populate_pgd_data *ppd)
set_pte(pte, __pte(ppd->paddr | ppd->pte_flags));
}
-static void __head __sme_map_range_pmd(struct sme_populate_pgd_data *ppd)
+static void __init __sme_map_range_pmd(struct sme_populate_pgd_data *ppd)
{
while (ppd->vaddr < ppd->vaddr_end) {
sme_populate_pgd_large(ppd);
@@ -195,7 +195,7 @@ static void __head __sme_map_range_pmd(struct sme_populate_pgd_data *ppd)
}
}
-static void __head __sme_map_range_pte(struct sme_populate_pgd_data *ppd)
+static void __init __sme_map_range_pte(struct sme_populate_pgd_data *ppd)
{
while (ppd->vaddr < ppd->vaddr_end) {
sme_populate_pgd(ppd);
@@ -205,7 +205,7 @@ static void __head __sme_map_range_pte(struct sme_populate_pgd_data *ppd)
}
}
-static void __head __sme_map_range(struct sme_populate_pgd_data *ppd,
+static void __init __sme_map_range(struct sme_populate_pgd_data *ppd,
pmdval_t pmd_flags, pteval_t pte_flags)
{
unsigned long vaddr_end;
@@ -229,22 +229,22 @@ static void __head __sme_map_range(struct sme_populate_pgd_data *ppd,
__sme_map_range_pte(ppd);
}
-static void __head sme_map_range_encrypted(struct sme_populate_pgd_data *ppd)
+static void __init sme_map_range_encrypted(struct sme_populate_pgd_data *ppd)
{
__sme_map_range(ppd, PMD_FLAGS_ENC, PTE_FLAGS_ENC);
}
-static void __head sme_map_range_decrypted(struct sme_populate_pgd_data *ppd)
+static void __init sme_map_range_decrypted(struct sme_populate_pgd_data *ppd)
{
__sme_map_range(ppd, PMD_FLAGS_DEC, PTE_FLAGS_DEC);
}
-static void __head sme_map_range_decrypted_wp(struct sme_populate_pgd_data *ppd)
+static void __init sme_map_range_decrypted_wp(struct sme_populate_pgd_data *ppd)
{
__sme_map_range(ppd, PMD_FLAGS_DEC_WP, PTE_FLAGS_DEC_WP);
}
-static unsigned long __head sme_pgtable_calc(unsigned long len)
+static unsigned long __init sme_pgtable_calc(unsigned long len)
{
unsigned long entries = 0, tables = 0;
@@ -281,7 +281,7 @@ static unsigned long __head sme_pgtable_calc(unsigned long len)
return entries + tables;
}
-void __head sme_encrypt_kernel(struct boot_params *bp)
+void __init sme_encrypt_kernel(struct boot_params *bp)
{
unsigned long workarea_start, workarea_end, workarea_len;
unsigned long execute_start, execute_end, execute_len;
@@ -485,7 +485,7 @@ void __head sme_encrypt_kernel(struct boot_params *bp)
native_write_cr3(__native_read_cr3());
}
-void __head sme_enable(struct boot_params *bp)
+void __init sme_enable(struct boot_params *bp)
{
unsigned int eax, ebx, ecx, edx;
unsigned long feature_mask;
diff --git a/arch/x86/include/asm/init.h b/arch/x86/include/asm/init.h
index 8b1b1abcef15..01ccdd168df0 100644
--- a/arch/x86/include/asm/init.h
+++ b/arch/x86/include/asm/init.h
@@ -2,12 +2,6 @@
#ifndef _ASM_X86_INIT_H
#define _ASM_X86_INIT_H
-#if defined(CONFIG_CC_IS_CLANG) && CONFIG_CLANG_VERSION < 170000
-#define __head __section(".head.text") __no_sanitize_undefined __no_stack_protector
-#else
-#define __head __section(".head.text") __no_sanitize_undefined
-#endif
-
struct x86_mapping_info {
void *(*alloc_pgt_page)(void *); /* allocate buf for page table */
void (*free_pgt_page)(void *, void *); /* free buf for page table */
diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index 2e42056d2306..5962ff2a189a 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -61,7 +61,7 @@ RESERVE_BRK(pagetables, INIT_MAP_SIZE)
* any particular GDT layout, because we load our own as soon as we
* can.
*/
-__HEAD
+ __INIT
SYM_CODE_START(startup_32)
movl pa(initial_stack),%ecx
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 88cbf932f7c2..ac1116f623d3 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -33,7 +33,7 @@
* because we need identity-mapped pages.
*/
- __HEAD
+ __INIT
.code64
SYM_CODE_START_NOALIGN(startup_64)
UNWIND_HINT_END_OF_STACK
diff --git a/arch/x86/platform/pvh/head.S b/arch/x86/platform/pvh/head.S
index cfa18ec7d55f..16aa1f018b80 100644
--- a/arch/x86/platform/pvh/head.S
+++ b/arch/x86/platform/pvh/head.S
@@ -24,7 +24,7 @@
#include <asm/nospec-branch.h>
#include <xen/interface/elfnote.h>
- __HEAD
+ __INIT
/*
* Entry point for PVH guests.
--
2.49.0.1045.g170613ef41-goog
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [RFT PATCH v3 19/21] x86/boot: Disallow absolute symbol references in startup code
2025-05-12 19:08 [RFT PATCH v3 00/21] x86: strict separation of startup code Ard Biesheuvel
` (17 preceding siblings ...)
2025-05-12 19:08 ` [RFT PATCH v3 18/21] x86/boot: Move startup code out of __head section Ard Biesheuvel
@ 2025-05-12 19:08 ` Ard Biesheuvel
2025-05-12 19:08 ` [RFT PATCH v3 20/21] x86/boot: Revert "Reject absolute references in .head.text" Ard Biesheuvel
` (3 subsequent siblings)
22 siblings, 0 replies; 58+ messages in thread
From: Ard Biesheuvel @ 2025-05-12 19:08 UTC (permalink / raw)
To: linux-kernel
Cc: linux-efi, x86, Ard Biesheuvel, Borislav Petkov, Ingo Molnar,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky
From: Ard Biesheuvel <ardb@kernel.org>
Check that the objects built under arch/x86/boot/startup do not contain
any absolute symbol reference. Given that the code is built with -fPIC,
such references can only be emitted using R_X86_64_64 relocations, so
checking that those are absent is sufficient.
Note that debug sections and the __patchable_function_entries section
may nonetheless contain such relocations, but neither is needed in the
startup code, so they can be dropped first.
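As a rough illustration (not taken from the series - the symbol names
below are made up), this is the kind of construct that forces an
R_X86_64_64 relocation even under -fPIC, next to an ordinary access
that stays RIP-relative:
  static int some_var;

  /* pointer-sized static initializer: R_X86_64_64 against some_var */
  unsigned long absolute_ref = (unsigned long)&some_var;

  int read_var(void)
  {
          return some_var;    /* RIP-relative load, no absolute reloc */
  }
Running readelf -r on such an object built with -fPIC shows the
R_X86_64_64 entry for the initializer, which is what the check below
rejects.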
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/boot/startup/Makefile | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/arch/x86/boot/startup/Makefile b/arch/x86/boot/startup/Makefile
index c7e690091f81..7b83c8f597ef 100644
--- a/arch/x86/boot/startup/Makefile
+++ b/arch/x86/boot/startup/Makefile
@@ -35,9 +35,17 @@ $(patsubst %.o,$(obj)/%.o,$(lib-y)): OBJECT_FILES_NON_STANDARD := y
# code, or code that has explicitly been made accessible to it via a symbol
# alias.
#
-$(obj)/%.pi.o: OBJCOPYFLAGS := --prefix-symbols=__pi_
+$(obj)/%.pi.o: OBJCOPYFLAGS := --prefix-symbols=__pi_ --strip-debug \
+ --remove-section=.rela__patchable_function_entries
$(obj)/%.pi.o: $(obj)/%.o FORCE
- $(call if_changed,objcopy)
+ $(call if_changed,piobjcopy)
+
+quiet_cmd_piobjcopy = $(quiet_cmd_objcopy)
+ cmd_piobjcopy = $(cmd_objcopy); \
+ if $(READELF) -r $(@) | grep R_X86_64_64; then \
+ echo "$@: R_X86_64_64 references not allowed in startup code" >&2; \
+ /bin/false; \
+ fi
extra-y := $(obj-y)
obj-y := $(patsubst %.o,%.pi.o,$(obj-y))
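To illustrate the prefixing scheme (a hedged sketch - the helper name
is hypothetical, and the series may spell its aliases differently,
e.g. via the linker script):
  /* ordinary (non-startup) code */
  void my_helper(void) { }

  /* explicitly expose it to startup code under the prefixed name */
  extern typeof(my_helper) __pi_my_helper __attribute__((alias("my_helper")));
Once objcopy has rewritten every symbol in a startup object to carry
the __pi_ prefix, a my_helper() call made from startup code references
__pi_my_helper and links only if such an alias has been provided.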
--
2.49.0.1045.g170613ef41-goog
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [RFT PATCH v3 20/21] x86/boot: Revert "Reject absolute references in .head.text"
2025-05-12 19:08 [RFT PATCH v3 00/21] x86: strict separation of startup code Ard Biesheuvel
` (18 preceding siblings ...)
2025-05-12 19:08 ` [RFT PATCH v3 19/21] x86/boot: Disallow absolute symbol references in startup code Ard Biesheuvel
@ 2025-05-12 19:08 ` Ard Biesheuvel
2025-06-01 9:39 ` Borislav Petkov
2025-05-12 19:08 ` [RFT PATCH v3 21/21] x86/boot: Get rid of the .head.text section Ard Biesheuvel
` (2 subsequent siblings)
22 siblings, 1 reply; 58+ messages in thread
From: Ard Biesheuvel @ 2025-05-12 19:08 UTC (permalink / raw)
To: linux-kernel
Cc: linux-efi, x86, Ard Biesheuvel, Borislav Petkov, Ingo Molnar,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky
From: Ard Biesheuvel <ardb@kernel.org>
This reverts commit faf0ed487415f76fe4acf7980ce360901f5e1698.
The startup code is checked directly for the absence of absolute symbol
references, so checking the .head.text section in the relocs tool is no
longer needed.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/tools/relocs.c | 8 +-------
1 file changed, 1 insertion(+), 7 deletions(-)
diff --git a/arch/x86/tools/relocs.c b/arch/x86/tools/relocs.c
index 5778bc498415..e5a2b9a912d1 100644
--- a/arch/x86/tools/relocs.c
+++ b/arch/x86/tools/relocs.c
@@ -740,10 +740,10 @@ static void walk_relocs(int (*process)(struct section *sec, Elf_Rel *rel,
static int do_reloc64(struct section *sec, Elf_Rel *rel, ElfW(Sym) *sym,
const char *symname)
{
- int headtext = !strcmp(sec_name(sec->shdr.sh_info), ".head.text");
unsigned r_type = ELF64_R_TYPE(rel->r_info);
ElfW(Addr) offset = rel->r_offset;
int shn_abs = (sym->st_shndx == SHN_ABS) && !is_reloc(S_REL, symname);
+
if (sym->st_shndx == SHN_UNDEF)
return 0;
@@ -783,12 +783,6 @@ static int do_reloc64(struct section *sec, Elf_Rel *rel, ElfW(Sym) *sym,
break;
}
- if (headtext) {
- die("Absolute reference to symbol '%s' not permitted in .head.text\n",
- symname);
- break;
- }
-
/*
* Relocation offsets for 64 bit kernels are output
* as 32 bits and sign extended back to 64 bits when
--
2.49.0.1045.g170613ef41-goog
^ permalink raw reply related [flat|nested] 58+ messages in thread
* [RFT PATCH v3 21/21] x86/boot: Get rid of the .head.text section
2025-05-12 19:08 [RFT PATCH v3 00/21] x86: strict separation of startup code Ard Biesheuvel
` (19 preceding siblings ...)
2025-05-12 19:08 ` [RFT PATCH v3 20/21] x86/boot: Revert "Reject absolute references in .head.text" Ard Biesheuvel
@ 2025-05-12 19:08 ` Ard Biesheuvel
2025-05-12 19:17 ` [RFT PATCH v3 00/21] x86: strict separation of startup code Borislav Petkov
2025-05-14 17:21 ` Borislav Petkov
22 siblings, 0 replies; 58+ messages in thread
From: Ard Biesheuvel @ 2025-05-12 19:08 UTC (permalink / raw)
To: linux-kernel
Cc: linux-efi, x86, Ard Biesheuvel, Borislav Petkov, Ingo Molnar,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky
From: Ard Biesheuvel <ardb@kernel.org>
The .head.text section is now empty, so it can be dropped from the
linker script.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
arch/x86/kernel/vmlinux.lds.S | 5 -----
1 file changed, 5 deletions(-)
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 4aaa1693b262..af8bc4a6f6c3 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -160,11 +160,6 @@ SECTIONS
} :text = 0xcccccccc
- /* bootstrapping code */
- .head.text : AT(ADDR(.head.text) - LOAD_OFFSET) {
- HEAD_TEXT
- } :text = 0xcccccccc
-
/* End of text section, which should occupy whole number of pages */
_etext = .;
. = ALIGN(PAGE_SIZE);
--
2.49.0.1045.g170613ef41-goog
^ permalink raw reply related [flat|nested] 58+ messages in thread
* Re: [RFT PATCH v3 00/21] x86: strict separation of startup code
2025-05-12 19:08 [RFT PATCH v3 00/21] x86: strict separation of startup code Ard Biesheuvel
` (20 preceding siblings ...)
2025-05-12 19:08 ` [RFT PATCH v3 21/21] x86/boot: Get rid of the .head.text section Ard Biesheuvel
@ 2025-05-12 19:17 ` Borislav Petkov
2025-05-13 10:02 ` Ingo Molnar
2025-05-14 17:21 ` Borislav Petkov
22 siblings, 1 reply; 58+ messages in thread
From: Borislav Petkov @ 2025-05-12 19:17 UTC (permalink / raw)
To: Ingo Molnar
Cc: Ard Biesheuvel, linux-kernel, linux-efi, x86, Ard Biesheuvel,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky
On Mon, May 12, 2025 at 09:08:35PM +0200, Ard Biesheuvel wrote:
> From: Ard Biesheuvel <ardb@kernel.org>
>
> !!! Boot tested on non-SEV guest ONLY !!!!
...
> !!! Boot tested on non-SEV guest ONLY !!!!
>
> Again, I will need to lean on Tom to determine whether this breaks
> SEV-SNP guest boot. As I mentioned before, I am still waiting for
> SEV-SNP capable hardware to be delivered.
Ingo, please do not rush this stuff in before Tom and I have tested it
successfully with SEV* guests.
Thanks!
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [RFT PATCH v3 00/21] x86: strict separation of startup code
2025-05-12 19:17 ` [RFT PATCH v3 00/21] x86: strict separation of startup code Borislav Petkov
@ 2025-05-13 10:02 ` Ingo Molnar
2025-05-13 10:12 ` Borislav Petkov
0 siblings, 1 reply; 58+ messages in thread
From: Ingo Molnar @ 2025-05-13 10:02 UTC (permalink / raw)
To: Borislav Petkov
Cc: Ard Biesheuvel, linux-kernel, linux-efi, x86, Ard Biesheuvel,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky, Linus Torvalds
* Borislav Petkov <bp@alien8.de> wrote:
> On Mon, May 12, 2025 at 09:08:35PM +0200, Ard Biesheuvel wrote:
> > From: Ard Biesheuvel <ardb@kernel.org>
> >
> > !!! Boot tested on non-SEV guest ONLY !!!!
>
> ...
>
> > !!! Boot tested on non-SEV guest ONLY !!!!
> >
> > Again, I will need to lean on Tom to determine whether this breaks
> > SEV-SNP guest boot. As I mentioned before, I am still waiting for
> > SEV-SNP capable hardware to be delivered.
>
> Ingo, please do not rush this stuff in before Tom and I have tested it
> successfully with SEV* guests.
>
> Thanks!
I don't intend to rush it, but note that AMD's SEV-SNP testing is
lagging a *lot* at the moment: Ard asked for testing the -v2 series on
May 4:
https://lore.kernel.org/r/20250504095230.2932860-25-ardb+git@google.com
That request for testing was ignored AFAICS. It's May 13 and still
crickets.
We also had SEV-SNP boot bugs pending since August 2024, that nobody
but (eventually) AMD triggered. Ie. very few people outside of the
vendor are testing SEV-SNP AFAICS, and even vendor testing is sporadic
...
Please ask AMD internally to get SEV-SNP tested more reliably. Testing
this -v3 series would be a good start. Hint, hint. ;-)
Thanks!
Ingo
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [RFT PATCH v3 00/21] x86: strict separation of startup code
2025-05-13 10:02 ` Ingo Molnar
@ 2025-05-13 10:12 ` Borislav Petkov
2025-05-13 11:22 ` Ingo Molnar
0 siblings, 1 reply; 58+ messages in thread
From: Borislav Petkov @ 2025-05-13 10:12 UTC (permalink / raw)
To: Ingo Molnar
Cc: Ard Biesheuvel, linux-kernel, linux-efi, x86, Ard Biesheuvel,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky, Linus Torvalds
On Tue, May 13, 2025 at 12:02:22PM +0200, Ingo Molnar wrote:
> I don't intend to rush it,
Thanks.
> That request for testing was ignored AFAICS. It's May 13 and still
> crickets.
Not ignored - Tom and I are testing but we're busy as hell too.
> We also had SEV-SNP boot bugs pending since August 2024, that nobody
> but (eventually) AMD triggered.
Where?
> Ie. very few people outside of the vendor are testing SEV-SNP AFAICS, and
> even vendor testing is sporadic ...
Not true - SEV* testing happens on a daily basis.
> Please ask AMD internally to get SEV-SNP tested more reliably. Testing
> this -v3 series would be a good start. Hint, hint. ;-)
We test everything that goes into linux-next. We haven't started testing
unreviewed patchsets yet because we don't do that - that stuff is moving.
So if you want to merge something, just ping me or Tom and we'll test it. But
you have to give us ample time to do so - you can't merge something which Ard
sent *on the same day*.
If you can't test it, you don't merge it but ask people to test it. You know
that.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [RFT PATCH v3 00/21] x86: strict separation of startup code
2025-05-13 10:12 ` Borislav Petkov
@ 2025-05-13 11:22 ` Ingo Molnar
2025-05-13 14:16 ` Borislav Petkov
0 siblings, 1 reply; 58+ messages in thread
From: Ingo Molnar @ 2025-05-13 11:22 UTC (permalink / raw)
To: Borislav Petkov
Cc: Ard Biesheuvel, linux-kernel, linux-efi, x86, Ard Biesheuvel,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky, Linus Torvalds
* Borislav Petkov <bp@alien8.de> wrote:
> On Tue, May 13, 2025 at 12:02:22PM +0200, Ingo Molnar wrote:
> > I don't intend to rush it,
>
> Thanks.
>
> > That request for testing was ignored AFAICS. It's May 13 and still
> > crickets.
>
> Not ignored - Tom and I are testing but we're busy as hell too.
Yeah, so the problem is that SEV* is hardware that basically no active
tester outside of the vendor (AMD) owns and is testing against
development trees AFAICS.
> > We also had SEV-SNP boot bugs pending since August 2024, that
> > nobody but (eventually) AMD triggered.
>
> Where?
I did a quick Git search, and here are a few examples:
For example, this commit from last summer:
6c3211796326 ("x86/sev: Add SNP-specific unaccepted memory support")
... was only fixed recently:
d54d610243a4 ("x86/boot/sev: Avoid shared GHCB page for early memory acceptance")
Or this commit from June 2024:
34ff65901735 ("x86/sev: Use kernel provided SVSM Calling Areas")
... was only fixed a few days ago:
f7387eff4bad ("x86/sev: Fix operator precedence in GHCB_MSR_VMPL_REQ_LEVEL macro")
Or this commit from June 2024:
fcd042e86422 ("x86/sev: Perform PVALIDATE using the SVSM when not at VMPL0")
... was fixed a few weeks ago:
8ed12ab1319b ("x86/boot/sev: Support memory acceptance in the EFI stub under SVSM")
Ie. bugfix latencies here were 10+ months.
Note that two of those fixes were from Ard who is working on further
robustifying the startup code - a much needed change.
Ie. when Ard is asking for SEV-SNP testing for WIP series, which he did
10+ days ago, you should not ignore it ... or if you do ignore his
request for testing, you should not complain about the changes being
merged eventually, once they pass review & testing on non-SEV
platforms.
> > Ie. very few people outside of the vendor are testing SEV-SNP
> > AFAICS, and even vendor testing is sporadic ...
>
> Not true - SEV* testing happens on a daily basis.
If you didn't have time to personally test Ard's -v2 series since May
2, that's OK: I can merge these proposed changes in an RFT branch so
that it gets tested in the daily testing flow. [see further below for
the Git link]
In other words: please no "gatekeeping". Please don't force Ard into a
catch-22 situation where he cannot test the patches on SEV-SNP, but you
are blocking these x86 startup code changes on the grounds that they
weren't tested on SEV-SNP ...
> > Please ask AMD internally to get SEV-SNP tested more reliably.
> > Testing this -v3 series would be a good start. Hint, hint. ;-)
>
> We test everything that goes into linux-next. We haven't started
> testing unreviewed patchsets yet because we don't do that - that
> stuff is moving.
>
> So if you want to merge something, just ping me or Tom and we'll test
> it.
Here's Ard's request from May 2:
https://lore.kernel.org/r/20250504095230.2932860-25-ardb+git@google.com
"Again, I will need to lean on Tom to determine whether this breaks
SEV-SNP guest boot. As I mentioned before, I am still waiting for
SEV-SNP capable hardware to be delivered."
This request for testing was ignored AFAICS.
> But you have to give us ample time to do so - you can't merge
> something which Ard sent *on the same day*.
Sure: -v2 was sent more than 10 days ago, and the testing request was
ignored AFAICS. Do 10 days count as 'ample time'?
Anyway, to make it even lower-overhead to test these changes, I've put
the -v3 series into the WIP.x86/boot tree:
git://git.kernel.org/pub/scm/linux/kernel/git/mingo/tip.git WIP.x86/boot
Only lightly tested. Just a "boots/doesn't boot" kind of quick feedback
would be much appreciated.
Note that naturally this tree is still subject to rebasing, as review
feedback is incorporated.
Thanks!
Ingo
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [RFT PATCH v3 09/21] x86/sev: Pass SVSM calling area down to early page state change API
2025-05-12 19:08 ` [RFT PATCH v3 09/21] x86/sev: Pass SVSM calling area down to early page state change API Ard Biesheuvel
@ 2025-05-13 13:55 ` Ard Biesheuvel
2025-05-13 13:58 ` Ard Biesheuvel
0 siblings, 1 reply; 58+ messages in thread
From: Ard Biesheuvel @ 2025-05-13 13:55 UTC (permalink / raw)
Cc: linux-kernel, linux-efi, x86, Borislav Petkov, Ingo Molnar,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky
On Mon, 12 May 2025 at 20:11, Ard Biesheuvel <ardb+git@google.com> wrote:
>
> From: Ard Biesheuvel <ardb@kernel.org>
>
> The early page state change API is mostly only used very early, when
> only the boot time SVSM calling area is in use. However, this API is
> also called by the kexec finishing code, which runs very late, and
> potentially from a different CPU (which uses a different calling area).
>
> To avoid pulling the per-CPU SVSM calling area pointers and related SEV
> state into the startup code, refactor the page state change API so the
> SVSM calling area virtual and physical addresses can be provided by the
> caller.
>
> No functional change intended.
>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
This patch is broken - I'll send a followup fix asap.
> ---
> arch/x86/boot/compressed/sev.c | 12 +++++++++---
> arch/x86/boot/startup/sev-shared.c | 18 ++++++++++--------
> arch/x86/boot/startup/sev-startup.c | 11 +++++++----
> arch/x86/coco/sev/core.c | 3 ++-
> arch/x86/include/asm/sev-internal.h | 3 ++-
> 5 files changed, 30 insertions(+), 17 deletions(-)
>
> diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
> index 7a01eef9ae01..04bc39d065ff 100644
> --- a/arch/x86/boot/compressed/sev.c
> +++ b/arch/x86/boot/compressed/sev.c
> @@ -68,7 +68,9 @@ void snp_set_page_private(unsigned long paddr)
> return;
>
> msr = sev_es_rd_ghcb_msr();
> - __page_state_change(paddr, SNP_PAGE_STATE_PRIVATE);
> + __page_state_change(paddr, SNP_PAGE_STATE_PRIVATE,
> + (struct svsm_ca *)boot_svsm_caa_pa,
> + boot_svsm_caa_pa);
> sev_es_wr_ghcb_msr(msr);
> }
>
> @@ -80,7 +82,9 @@ void snp_set_page_shared(unsigned long paddr)
> return;
>
> msr = sev_es_rd_ghcb_msr();
> - __page_state_change(paddr, SNP_PAGE_STATE_SHARED);
> + __page_state_change(paddr, SNP_PAGE_STATE_SHARED,
> + (struct svsm_ca *)boot_svsm_caa_pa,
> + boot_svsm_caa_pa);
> sev_es_wr_ghcb_msr(msr);
> }
>
> @@ -109,7 +113,9 @@ void snp_accept_memory(phys_addr_t start, phys_addr_t end)
> u64 msr = sev_es_rd_ghcb_msr();
>
> for (phys_addr_t pa = start; pa < end; pa += PAGE_SIZE)
> - __page_state_change(pa, SNP_PAGE_STATE_PRIVATE);
> + __page_state_change(pa, SNP_PAGE_STATE_PRIVATE,
> + (struct svsm_ca *)boot_svsm_caa_pa,
> + boot_svsm_caa_pa);
> sev_es_wr_ghcb_msr(msr);
> }
>
> diff --git a/arch/x86/boot/startup/sev-shared.c b/arch/x86/boot/startup/sev-shared.c
> index dae770327b50..70ad9a0aa023 100644
> --- a/arch/x86/boot/startup/sev-shared.c
> +++ b/arch/x86/boot/startup/sev-shared.c
> @@ -538,7 +538,8 @@ static void __head setup_cpuid_table(const struct cc_blob_sev_info *cc_info)
> }
> }
>
> -static void __head svsm_pval_4k_page(unsigned long paddr, bool validate)
> +static void __head svsm_pval_4k_page(unsigned long paddr, bool validate,
> + struct svsm_ca *caa, u64 caa_pa)
> {
> struct svsm_pvalidate_call *pc;
> struct svsm_call call = {};
> @@ -552,10 +553,10 @@ static void __head svsm_pval_4k_page(unsigned long paddr, bool validate)
> */
> flags = native_local_irq_save();
>
> - call.caa = svsm_get_caa();
> + call.caa = caa;
>
> pc = (struct svsm_pvalidate_call *)call.caa->svsm_buffer;
> - pc_pa = svsm_get_caa_pa() + offsetof(struct svsm_ca, svsm_buffer);
> + pc_pa = caa_pa + offsetof(struct svsm_ca, svsm_buffer);
>
> pc->num_entries = 1;
> pc->cur_index = 0;
> @@ -578,12 +579,12 @@ static void __head svsm_pval_4k_page(unsigned long paddr, bool validate)
> }
>
> static void __head pvalidate_4k_page(unsigned long vaddr, unsigned long paddr,
> - bool validate)
> + bool validate, struct svsm_ca *caa, u64 caa_pa)
> {
> int ret;
>
> if (snp_vmpl) {
> - svsm_pval_4k_page(paddr, validate);
> + svsm_pval_4k_page(paddr, validate, caa, caa_pa);
> } else {
> ret = pvalidate(vaddr, RMP_PG_SIZE_4K, validate);
> if (ret)
> @@ -591,7 +592,8 @@ static void __head pvalidate_4k_page(unsigned long vaddr, unsigned long paddr,
> }
> }
>
> -static void __head __page_state_change(unsigned long paddr, enum psc_op op)
> +static void __head __page_state_change(unsigned long paddr, enum psc_op op,
> + struct svsm_ca *caa, u64 caa_pa)
> {
> u64 val;
>
> @@ -600,7 +602,7 @@ static void __head __page_state_change(unsigned long paddr, enum psc_op op)
> * state change in the RMP table.
> */
> if (op == SNP_PAGE_STATE_SHARED)
> - pvalidate_4k_page(paddr, paddr, false);
> + pvalidate_4k_page(paddr, paddr, false, caa, caa_pa);
>
> /* Issue VMGEXIT to change the page state in RMP table. */
> sev_es_wr_ghcb_msr(GHCB_MSR_PSC_REQ_GFN(paddr >> PAGE_SHIFT, op));
> @@ -616,7 +618,7 @@ static void __head __page_state_change(unsigned long paddr, enum psc_op op)
> * consistent with the RMP entry.
> */
> if (op == SNP_PAGE_STATE_PRIVATE)
> - pvalidate_4k_page(paddr, paddr, true);
> + pvalidate_4k_page(paddr, paddr, true, caa, caa_pa);
> }
>
> /*
> diff --git a/arch/x86/boot/startup/sev-startup.c b/arch/x86/boot/startup/sev-startup.c
> index 28bf68753580..7a3ad17d06f6 100644
> --- a/arch/x86/boot/startup/sev-startup.c
> +++ b/arch/x86/boot/startup/sev-startup.c
> @@ -132,7 +132,8 @@ noinstr void __sev_put_ghcb(struct ghcb_state *state)
>
> void __head
> early_set_pages_state(unsigned long vaddr, unsigned long paddr,
> - unsigned long npages, enum psc_op op)
> + unsigned long npages, enum psc_op op,
> + struct svsm_ca *caa, u64 caa_pa)
> {
> unsigned long paddr_end;
>
> @@ -142,7 +143,7 @@ early_set_pages_state(unsigned long vaddr, unsigned long paddr,
> paddr_end = paddr + (npages << PAGE_SHIFT);
>
> while (paddr < paddr_end) {
> - __page_state_change(paddr, op);
> + __page_state_change(paddr, op, caa, caa_pa);
>
> vaddr += PAGE_SIZE;
> paddr += PAGE_SIZE;
> @@ -165,7 +166,8 @@ void __head early_snp_set_memory_private(unsigned long vaddr, unsigned long padd
> * Ask the hypervisor to mark the memory pages as private in the RMP
> * table.
> */
> - early_set_pages_state(vaddr, paddr, npages, SNP_PAGE_STATE_PRIVATE);
> + early_set_pages_state(vaddr, paddr, npages, SNP_PAGE_STATE_PRIVATE,
> + svsm_get_caa(), svsm_get_caa_pa());
> }
>
> void __head early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr,
> @@ -181,7 +183,8 @@ void __head early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr
> return;
>
> /* Ask hypervisor to mark the memory pages shared in the RMP table. */
> - early_set_pages_state(vaddr, paddr, npages, SNP_PAGE_STATE_SHARED);
> + early_set_pages_state(vaddr, paddr, npages, SNP_PAGE_STATE_SHARED,
> + svsm_get_caa(), svsm_get_caa_pa());
> }
>
> /*
> diff --git a/arch/x86/coco/sev/core.c b/arch/x86/coco/sev/core.c
> index 0e0ddf4c92aa..39bbbea09c24 100644
> --- a/arch/x86/coco/sev/core.c
> +++ b/arch/x86/coco/sev/core.c
> @@ -584,7 +584,8 @@ static void set_pages_state(unsigned long vaddr, unsigned long npages, int op)
>
> /* Use the MSR protocol when a GHCB is not available. */
> if (!boot_ghcb)
> - return early_set_pages_state(vaddr, __pa(vaddr), npages, op);
> + return early_set_pages_state(vaddr, __pa(vaddr), npages, op,
> + svsm_get_caa(), svsm_get_caa_pa());
>
> vaddr = vaddr & PAGE_MASK;
> vaddr_end = vaddr + (npages << PAGE_SHIFT);
> diff --git a/arch/x86/include/asm/sev-internal.h b/arch/x86/include/asm/sev-internal.h
> index e3b203c280aa..08e2cfdef512 100644
> --- a/arch/x86/include/asm/sev-internal.h
> +++ b/arch/x86/include/asm/sev-internal.h
> @@ -55,7 +55,8 @@ DECLARE_PER_CPU(struct sev_es_runtime_data*, runtime_data);
> DECLARE_PER_CPU(struct sev_es_save_area *, sev_vmsa);
>
> void early_set_pages_state(unsigned long vaddr, unsigned long paddr,
> - unsigned long npages, enum psc_op op);
> + unsigned long npages, enum psc_op op,
> + struct svsm_ca *ca, u64 caa_pa);
>
> DECLARE_PER_CPU(struct svsm_ca *, svsm_caa);
> DECLARE_PER_CPU(u64, svsm_caa_pa);
> --
> 2.49.0.1045.g170613ef41-goog
>
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [RFT PATCH v3 09/21] x86/sev: Pass SVSM calling area down to early page state change API
2025-05-13 13:55 ` Ard Biesheuvel
@ 2025-05-13 13:58 ` Ard Biesheuvel
0 siblings, 0 replies; 58+ messages in thread
From: Ard Biesheuvel @ 2025-05-13 13:58 UTC (permalink / raw)
Cc: linux-kernel, linux-efi, x86, Borislav Petkov, Ingo Molnar,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky
On Tue, 13 May 2025 at 14:55, Ard Biesheuvel <ardb@kernel.org> wrote:
>
> On Mon, 12 May 2025 at 20:11, Ard Biesheuvel <ardb+git@google.com> wrote:
> >
> > From: Ard Biesheuvel <ardb@kernel.org>
> >
> > The early page state change API is mostly only used very early, when
> > only the boot time SVSM calling area is in use. However, this API is
> > also called by the kexec finishing code, which runs very late, and
> > potentially from a different CPU (which uses a different calling area).
> >
> > To avoid pulling the per-CPU SVSM calling area pointers and related SEV
> > state into the startup code, refactor the page state change API so the
> > SVSM calling area virtual and physical addresses can be provided by the
> > caller.
> >
> > No functional change intended.
> >
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
>
> This patch is broken - I'll send a followup fix asap.
... or actually, the one before it.
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [RFT PATCH v3 00/21] x86: strict separation of startup code
2025-05-13 11:22 ` Ingo Molnar
@ 2025-05-13 14:16 ` Borislav Petkov
2025-05-13 15:01 ` Ard Biesheuvel
2025-05-14 6:20 ` Ingo Molnar
0 siblings, 2 replies; 58+ messages in thread
From: Borislav Petkov @ 2025-05-13 14:16 UTC (permalink / raw)
To: Ingo Molnar
Cc: Ard Biesheuvel, linux-kernel, linux-efi, x86, Ard Biesheuvel,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky, Linus Torvalds
On Tue, May 13, 2025 at 01:22:16PM +0200, Ingo Molnar wrote:
> Yeah, so the problem is that SEV* is hardware that basically no active
> tester outside of the vendor (AMD) owns and is testing against
> development trees AFAICS.
I don't think you even know what you're talking about. Hell, I only recently
gave you the 101 on SEV because you asked me on IRC.
So no, not even close. If you had bothered to ask a search engine of your
choosing, you would've known better.
All the cloud vendors have it and we are getting bug reports and fixes from
them and everyone else that's using it. You could've seen that by doing a git
log on the SEV files in the kernel even.
> I did a quick Git search, and here are a few examples:
>
> For example, this commit from last summer:
>
> 6c3211796326 ("x86/sev: Add SNP-specific unaccepted memory support")
>
> ... was only fixed recently:
>
> d54d610243a4 ("x86/boot/sev: Avoid shared GHCB page for early memory acceptance")
Because we pretty much test with huge-page aligned memory sizes... memory
acceptance tracks pages at the 2MB level. It will accept memory if there is an
unaccepted memory EFI entry that isn't 2MB aligned at the start or end.
So when you have a 4G guest or 16G guest you don't have that. If you specify
4095M for the guest memory, then it will trigger. And since that was done
before SEV was initialized (at least in the EFI stub path, not the
decompressor path) things just didn't work.
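(To make the alignment point concrete - a trivial userspace check, not
kernel code:
  #include <stdio.h>

  int main(void)
  {
          unsigned long long MiB = 1024 * 1024ULL;

          printf("%llu\n", (4095 * MiB) % (2 * MiB)); /* 1 MiB: unaligned */
          printf("%llu\n", (4096 * MiB) % (2 * MiB)); /* 0: aligned */
          return 0;
  }
Only the odd-sized guest leaves a start/end that isn't 2MB aligned,
which is what triggers the early acceptance path.)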
> Or this commit from June 2024:
>
> 34ff65901735 ("x86/sev: Use kernel provided SVSM Calling Areas")
>
> ... was only fixed a few days ago:
>
> f7387eff4bad ("x86/sev: Fix operator precedence in GHCB_MSR_VMPL_REQ_LEVEL
> macro")
That was "fixed" because we never had to run a multi-VMPL level setup in Linux
yet, as we run Linux guests differently in that respect. So we couldn't have
hit it even if we wanted to.
And even in the SVSM testing, Linux never requests a non-zero VMPL, and so it
wasn't caught during testing. Linux will always request VMPL0.
There is a lot of testing of the guest code with Coconut-SVSM, and the
non-zero VMPL request is a scenario that simply doesn't exist there.
> Or this commit from June 2024:
>
> fcd042e86422 ("x86/sev: Perform PVALIDATE using the SVSM when not at VMPL0")
>
> ... was fixed a few weeks ago:
>
> 8ed12ab1319b ("x86/boot/sev: Support memory acceptance in the EFI stub under SVSM")
That's a fix for the above fix d54d610243a4 which relates to the multiple VMPL
thing which we don't do (yet) in Linux.
> Ie. bugfix latencies here were 10+ months.
While doing your git search, did you check my reaction time when fixes are
sent too?
Or you decided to find some random patches with Fixes: tags pointing to SEV
code?
Or are you saying our crystal ball of what's broken is not working fast
enough?
Lemme see:
https://lore.kernel.org/r/c0af2efa-aea4-43aa-b1da-46ac4c50314b@amd.com
This is only the latest test report I requested.
So no, we test the hell out of it and as much as possible. What you claim here
is unfounded and completely false.
> Note that two of those fixes were from Ard who is working on further
> robustifying the startup code - a much needed change.
Really? Much needed huh?
Please do explain why is it much needed?
Because the reason Ard is doing it is a different one but maybe
I misunderstood him...
> Ie. when Ard is asking for SEV-SNP testing for WIP series, which he did
> 10+ days ago, you should not ignore it
More unfounded claims. Here's Tom and me ignoring it:
https://lore.kernel.org/r/836eb6be-926b-dfb4-2c67-f55cba4a072b@amd.com
https://lore.kernel.org/r/20250507095801.GNaBsuqd7m15z0kHji@fat_crate.local
https://lore.kernel.org/r/20250508110800.GBaByQkJwmZlihk6Xp@fat_crate.local
https://lore.kernel.org/r/f4750413-a2e6-15c4-7fa5-2595b509500b@amd.com
https://lore.kernel.org/r/20250505160346.GJaBjhYp09sLZ5AyyJ@fat_crate.local
https://lore.kernel.org/r/20250505164759.GKaBjrv5SI4MX_NiX-@fat_crate.local
Nah, this is not ignoring - this is Tom and me rushing to see whether
something broke because *you* applied stuff on the same day without waiting
for any review!
This is basically you doing whatever the hell you like and not even asking
people working on that code.
And you completely ignored my requests on IRC to wait with that code a bit so
that we can take a look.
> ... or if you do ignore his request for testing, you should not complain
> about the changes being merged eventually, once they pass review & testing
> on non-SEV platforms.
What review?
Show me.
Commit-ID: bd4a58beaaf1f4aff025282c6e8b130bdb4a29e4
Gitweb: https://git.kernel.org/tip/bd4a58beaaf1f4aff025282c6e8b130bdb4a29e4
Author: Ard Biesheuvel <ardb@kernel.org>
AuthorDate: Sun, 04 May 2025 11:52:31 +02:00
Committer: Ingo Molnar <mingo@kernel.org>
CommitterDate: Sun, 04 May 2025 15:27:23 +02:00
What review do you expect to see in 3 hours on a Sunday?!?!
> If you didn't have time to personally test Ard's -v2 series since May
> 2, that's OK
It was May 4th, it was a Sunday. And you can see my replies on the next
days.
> I can merge these proposed changes in an RFT branch so that it gets tested
How about you wait first for those patches to be reviewed like every other
patchset on lkml and then take them?
I mean, normal review process. Remember?
The thing we all are supposed to do...
> In other words: please no "gatekeeping".
Sorry, zero gatekeeping here.
> Please don't force Ard into a catch-22 situation where he cannot test the
> patches on SEV-SNP, but you are blocking these x86 startup code changes on
> the grounds that they weren't tested on SEV-SNP ...
I'm helping Ard to get and set up a SEV machine. And I'm testing too.
If you had asked, you would've learned all that but you haven't.
> This request for testing was ignored AFAICS.
Review, then test. As always. You know that.
I keep telling you that but I don't think you're hearing me - you just do
whatever the hell you like.
> Sure: -v2 was sent more than 10 days ago, and the testing request was
> ignored AFAICS. Do 10 days count as 'ample time'?
I am reviewing. I can't just drop everything and concentrate only on this.
Hell, that set is not even ready yet:
https://lore.kernel.org/r/CAMj1kXH5C6FzMyrki_23TTk_Yma5NJdHTo-nv4DmZoz_qaGbVQ@mail.gmail.com
Now you tell me: why are *YOU* in such a hurry with that set?
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [RFT PATCH v3 00/21] x86: strict separation of startup code
2025-05-13 14:16 ` Borislav Petkov
@ 2025-05-13 15:01 ` Ard Biesheuvel
2025-05-13 16:44 ` Borislav Petkov
2025-05-14 6:32 ` Ingo Molnar
2025-05-14 6:20 ` Ingo Molnar
1 sibling, 2 replies; 58+ messages in thread
From: Ard Biesheuvel @ 2025-05-13 15:01 UTC (permalink / raw)
To: Borislav Petkov
Cc: Ingo Molnar, Ard Biesheuvel, linux-kernel, linux-efi, x86,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky, Linus Torvalds
On Tue, 13 May 2025 at 15:17, Borislav Petkov <bp@alien8.de> wrote:
>
> On Tue, May 13, 2025 at 01:22:16PM +0200, Ingo Molnar wrote:
...
> > Note that two of those fixes were from Ard who is working on further
> > robustifying the startup code - a much needed change.
>
> Really? Much needed huh?
>
> Please do explain why is it much needed?
>
> Because the reason Ard is doing it is a different one but maybe
> I misunderstood him...
>
I will refrain from inserting myself into the intra-tip review and
testing policy debate, but let me at least provide a quick recap of
what I am doing here and why.
Since commit
c88d71508e36 x86/boot/64: Rewrite startup_64() in C
dated Jun 6 2017, we have been using C code on the boot path in a way
that is not supported by the toolchain, i.e., to execute non-PIC C
code from a mapping of memory that is different from the one provided
to the linker. It should have been obvious at the time that this was a
bad idea, given the need to sprinkle fixup_pointer() calls left and
right to manipulate global variables (including non-pointer variables)
without crashing.
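To make the wart concrete, here is a minimal sketch of the
fixup_pointer() pattern, simplified from the old head64.c code:
  /* rebase a link-time address into the current 1:1 mapping */
  static void *fixup_pointer(void *ptr, unsigned long physaddr)
  {
          return ptr - (void *)_text + (void *)physaddr;
  }

  unsigned long *p = fixup_pointer(&phys_base, physaddr);
  *p += load_delta;   /* every global access needed this treatment */
Forgetting one of these fixups produces no diagnostic at build time,
only a crash (or silent corruption) at boot.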
This C startup code has been expanding, and in particular, the SEV-SNP
startup code has been expanding over the past couple of years, and
grown many of these warts, where the C code needs to use special
annotations or helpers to access global objects.
Google uses Clang internally, and as expected, it does not behave
quite like GCC in this regard either. The result is that the SEV-SNP
boot tended to break in cryptic ways with Clang built kernels, due to
absolute references in the startup code that runs before those
absolute references are mapped.
I've done a preliminary pass upstream with RIP_REL_REF() and
rip_rel_ptr() and the use of the .head.text section for startup code
to ensure that we detect such issues at build time, and it has already
resulted in a number of true positives where the code in question
would have failed at boot time. At this point, I'm not aware of any
issues caused by absolute references going undetected.
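The usage pattern is roughly the following, with sme_me_mask being one
of the globals that needed the treatment:
  /* force a %rip-relative access from code running at the 1:1 mapping */
  unsigned long mask = RIP_REL_REF(sme_me_mask);
instead of a plain sme_me_mask access, which the compiler is otherwise
free to lower to an absolute address.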
However, Linus kindly indicated that the resulting diagnostics
produced by the relocs tool do not meet his high standards, and so I
proposed another approach, which I am implementing now (see cover
letter for details). Note that this approach is also much more robust,
as annotating things as __head by hand to get it emitted into the
section to which the diagnostics are applied is obviously not
foolproof.
Fixing the existing 5-level paging and kernel mapping code was rather
straight-forward. However, splitting up the SEV-SNP code has been
rather challenging due to the way it was put together, i.e., as a
single source file used everywhere, and to which additional
functionality has been added piecemeal (i.e., the SVSM support).
It is obvious that these changes should be tested before being merged,
hence the RFT in the subject. And I have been struggling a bit to get
access to usable hardware. (I do have access to internal development
systems, but those do not fit the 'usable' description by any measure,
given that I have to go through the cloud VM orchestration APIs to
test boot a simple kernel image).
What Boris might allude to is the fact that some of these changes also
form a prerequisite for being able to construct a generic EFI zboot
image for x86, which is a long-term objective that I am working on in
the background. But this is not the main reason.
In any case, there is no urgency wrt these changes as far as I am
concerned, and given that I already found an issue myself with v3,
perhaps it is better if we disregard it for the time being, and we can
come back to it for the next cycle. In the mean time, I can compare
notes with Boris and Tom directly to ensure that this is in the right
shape, and perhaps we could at least fix the pgtable_l5_enabled() mess
as well (for which I sent out a RFC/v3 today).
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [RFT PATCH v3 00/21] x86: strict separation of startup code
2025-05-13 15:01 ` Ard Biesheuvel
@ 2025-05-13 16:44 ` Borislav Petkov
2025-05-13 21:31 ` Ard Biesheuvel
2025-05-14 6:32 ` Ingo Molnar
1 sibling, 1 reply; 58+ messages in thread
From: Borislav Petkov @ 2025-05-13 16:44 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: Ingo Molnar, Ard Biesheuvel, linux-kernel, linux-efi, x86,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky, Linus Torvalds
On Tue, May 13, 2025 at 04:01:08PM +0100, Ard Biesheuvel wrote:
> I will refrain from inserting myself into the intra-tip review and
> testing policy debate, but let me at least provide a quick recap of
> what I am doing here and why.
Thanks for taking the time - much appreciated!
> Since commit
>
> c88d71508e36 x86/boot/64: Rewrite startup_64() in C
>
> dated Jun 6 2017, we have been using C code on the boot path in a way
> that is not supported by the toolchain, i.e., to execute non-PIC C
> code from a mapping of memory that is different from the one provided
> to the linker. It should have been obvious at the time that this was a
> bad idea, given the need to sprinkle fixup_pointer() calls left and
> right to manipulate global variables (including non-pointer variables)
> without crashing.
>
> This C startup code has been expanding, and in particular, the SEV-SNP
> startup code has been expanding over the past couple of years, and
> grown many of these warts, where the C code needs to use special
> annotations or helpers to access global objects.
>
> Google uses Clang internally, and as expected, it does not behave
> quite like GCC in this regard either. The result is that the SEV-SNP
> boot tended to break in cryptic ways with Clang built kernels, due to
> absolute references in the startup code that runs before those
> absolute references are mapped.
Oh look, Google is using SEV-SNP too. Non! Si! Oh!
https://www.youtube.com/watch?v=pFjSDM6D500
> I've done a preliminary pass upstream with RIP_REL_REF() and
> rip_rel_ptr() and the use of the .head.text section for startup code
> to ensure that we detect such issues at build time, and it has already
> resulted in a number of true positives where the code in question
> would have failed at boot time. At this point, I'm not aware of any
> issues caused by absolute references going undetected.
>
> However, Linus kindly indicated that the resulting diagnostics
> produced by the relocs tool do not meet his high standards, and so I
> proposed another approach, which I am implementing now (see cover
> letter for details). Note that this approach is also much more robust,
> as annotating things as __head by hand to get it emitted into the
> section to which the diagnostics are applied is obviously not
> foolproof.
AFAIR, this is what you've done for arm64 too, right?
> Fixing the existing 5-level paging and kernel mapping code was rather
> straight-forward. However, splitting up the SEV-SNP code has been
> rather challenging due to the way it was put together, i.e., as a
> single source file used everywhere, and to which additional
> functionality has been added piecemeal (i.e., the SVSM support).
>
> It is obvious that these changes should be tested before being merged,
Preach!
> hence the RFT in the subject. And I have been struggling a bit to get
> access to usable hardware. (I do have access to internal development
> systems, but those do not fit the 'usable' description by any measure,
> given that I have to go through the cloud VM orchestration APIs to
> test boot a simple kernel image).
>
> What Boris might allude to is the fact that some of these changes also
> form a prerequisite for being able to construct a generic EFI zboot
> image for x86, which is a long-term objective that I am working on in
> the background. But this is not the main reason.
>
> In any case, there is no urgency wrt these changes as far as I am
THANK YOU!
So basically we can all slow down and do normal review and testing. Without
any of that hurried patching back'n'forth. And we didn't need any of that
grief!
Basically something which I've been trying to say from the very beginning but
no one listens to me!
Geez.
> concerned, and given that I already found an issue myself with v3,
> perhaps it is better if we disregard it for the time being, and we can
> come back to it for the next cycle. In the mean time, I can compare
> notes with Boris and Tom directly to ensure that this is in the right
> shape, and perhaps we could at least fix the pgtable_l5_enabled() mess
> as well (for which I sent out a RFC/v3 today).
Thanks man, let's also sync on IRC.
We have enough time to run the set through all our testing pile and shake out
any outstanding issues.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [RFT PATCH v3 00/21] x86: strict separation of startup code
2025-05-13 16:44 ` Borislav Petkov
@ 2025-05-13 21:31 ` Ard Biesheuvel
0 siblings, 0 replies; 58+ messages in thread
From: Ard Biesheuvel @ 2025-05-13 21:31 UTC (permalink / raw)
To: Borislav Petkov
Cc: Ingo Molnar, Ard Biesheuvel, linux-kernel, linux-efi, x86,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky, Linus Torvalds
On Tue, 13 May 2025 at 17:44, Borislav Petkov <bp@alien8.de> wrote:
>
...
> > I've done a preliminary pass upstream with RIP_REL_REF() and
> > rip_rel_ptr() and the use of the .head.text section for startup code
> > to ensure that we detect such issues at build time, and it has already
> > resulted in a number of true positives where the code in question
> > would have failed at boot time. At this point, I'm not aware of any
> > issues caused by absolute references going undetected.
> >
> > However, Linus kindly indicated that the resulting diagnostics
> > produced by the relocs tool do not meet his high standards, and so I
> > proposed another approach, which I am implementing now (see cover
> > letter for details). Note that this approach is also much more robust,
> > as annotating things as __head by hand to get it emitted into the
> > section to which the diagnostics are applied is obviously not
> > foolproof.
>
> AFAIR, this is what you've done for arm64 too, right?
>
The new approach proposed in this series is what I implemented before
for arm64, yes.
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [RFT PATCH v3 00/21] x86: strict separation of startup code
2025-05-13 14:16 ` Borislav Petkov
2025-05-13 15:01 ` Ard Biesheuvel
@ 2025-05-14 6:20 ` Ingo Molnar
2025-05-14 8:17 ` Borislav Petkov
` (2 more replies)
1 sibling, 3 replies; 58+ messages in thread
From: Ingo Molnar @ 2025-05-14 6:20 UTC (permalink / raw)
To: Borislav Petkov
Cc: Ard Biesheuvel, linux-kernel, linux-efi, x86, Ard Biesheuvel,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky, Linus Torvalds
* Borislav Petkov <bp@alien8.de> wrote:
> > Note that two of those fixes were from Ard who is working on
> > further robustifying the startup code - a much needed change.
>
> Please do explain why is it much needed?
See Ard's excellent description:
https://lore.kernel.org/r/CAMj1kXEzKEuePEiHB+HxvfQbFz0sTiHdn4B++zVBJ2mhkPkQ4Q@mail.gmail.com
This C startup code has been expanding, and in particular, the SEV-SNP
startup code has been expanding over the past couple of years, and
grown many of these warts, where the C code needs to use special
^^^^^^^^^^^^^^^^^^^^^^^^^^
annotations or helpers to access global objects.
Google uses Clang internally, and as expected, it does not behave
quite like GCC in this regard either. The result is that the SEV-SNP
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
boot tended to break in cryptic ways with Clang built kernels, due to
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
absolute references in the startup code that runs before those
absolute references are mapped.
I've done a preliminary pass upstream with RIP_REL_REF() and
rip_rel_ptr() and the use of the .head.text section for startup code
to ensure that we detect such issues at build time, and it has already
resulted in a number of true positives where the code in question
would have failed at boot time. At this point, I'm not aware of any
issues caused by absolute references going undetected.
However, Linus kindly indicated that the resulting diagnostics
produced by the relocs tool do not meet his high standards, and so I
proposed another approach, which I am implementing now (see cover
letter for details). Note that this approach is also much more robust,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
as annotating things as __head by hand to get it emitted into the
section to which the diagnostics are applied is obviously not
foolproof.
Fixing the existing 5-level paging and kernel mapping code was rather
straight-forward. However, splitting up the SEV-SNP code has been
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
rather challenging due to the way it was put together, i.e., as a
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
single source file used everywhere, and to which additional
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
functionality has been added piecemeal (i.e., the SVSM support).
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
I've highlighted all the robustness problems and design flaws of the
existing SEV* startup code...
> Really? Much needed huh?
Your failure to follow and understand this series is not an excuse for
the flippant and condescending tone you are using in your replies ...
Ard is being very generous in his responses, and I'll defer to him as
to the timing of this series: v6.17 is perfectly fine too.
Thanks,
Ingo
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [RFT PATCH v3 00/21] x86: strict separation of startup code
2025-05-13 15:01 ` Ard Biesheuvel
2025-05-13 16:44 ` Borislav Petkov
@ 2025-05-14 6:32 ` Ingo Molnar
2025-05-14 7:41 ` Ard Biesheuvel
1 sibling, 1 reply; 58+ messages in thread
From: Ingo Molnar @ 2025-05-14 6:32 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: Borislav Petkov, Ard Biesheuvel, linux-kernel, linux-efi, x86,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky, Linus Torvalds
* Ard Biesheuvel <ardb@kernel.org> wrote:
> On Tue, 13 May 2025 at 15:17, Borislav Petkov <bp@alien8.de> wrote:
> >
> > On Tue, May 13, 2025 at 01:22:16PM +0200, Ingo Molnar wrote:
> ...
> > > Note that two of those fixes were from Ard who is working on further
> > > robustifying the startup code - a much needed change.
> >
> > Really? Much needed huh?
> >
> > Please do explain why is it much needed?
> >
> > Because the reason Ard is doing it is a different one but maybe
> > I misunderstood him...
> >
>
> I will refrain from inserting myself into the intra-tip review and
> testing policy debate, but let me at least provide a quick recap of
> what I am doing here and why.
>
> Since commit
>
> c88d71508e36 x86/boot/64: Rewrite startup_64() in C
>
> dated Jun 6 2017, we have been using C code on the boot path in a way
> that is not supported by the toolchain, i.e., to execute non-PIC C
> code from a mapping of memory that is different from the one provided
> to the linker. It should have been obvious at the time that this was a
> bad idea, given the need to sprinkle fixup_pointer() calls left and
> right to manipulate global variables (including non-pointer variables)
> without crashing.
>
> This C startup code has been expanding, and in particular, the SEV-SNP
> startup code has been expanding over the past couple of years, and
> grown many of these warts, where the C code needs to use special
> annotations or helpers to access global objects.
>
> Google uses Clang internally, and as expected, it does not behave
> quite like GCC in this regard either. The result is that the SEV-SNP
> boot tended to break in cryptic ways with Clang built kernels, due to
> absolute references in the startup code that runs before those
> absolute references are mapped.
>
> I've done a preliminary pass upstream with RIP_REL_REF() and
> rip_rel_ptr() and the use of the .head.text section for startup code
> to ensure that we detect such issues at build time, and it has already
> resulted in a number of true positives where the code in question
> would have failed at boot time. At this point, I'm not aware of any
> issues caused by absolute references going undetected.
>
> However, Linus kindly indicated that the resulting diagnostics
> produced by the relocs tool do not meet his high standards, and so I
> proposed another approach, which I am implementing now (see cover
> letter for details). Note that this approach is also much more robust,
> as annotating things as __head by hand to get it emitted into the
> section to which the diagnostics are applied is obviously not
> foolproof.
Exactly.
> Fixing the existing 5-level paging and kernel mapping code was rather
> straightforward. However, splitting up the SEV-SNP code has been
> rather challenging due to the way it was put together, i.e., as a
> single source file used everywhere, to which additional
> functionality has been added piecemeal (e.g., the SVSM support).
Yeah.
> It is obvious that these changes should be tested before being merged,
> hence the RFT in the subject. And I have been struggling a bit to get
> access to usable hardware. (I do have access to internal development
> systems, but those do not fit the 'usable' description by any measure,
> given that I have to go through the cloud VM orchestration APIs to
> test boot a simple kernel image).
:-/ This is one of the reasons why bugs have such long latencies here.
For example it appears nobody has run kdump on SEV-SNP since last
August:
d2062cc1b1c3 x86/sev: Do not touch VMSA pages during SNP guest memory kdump
...
It then results in unrecoverable #NPF/RMP faults as the VMSA page is
marked busy/in-use when the vCPU is running and subsequently causes a
guest softlockup/hang.
...
1 file changed, 158 insertions(+), 86 deletions(-)
Yet the lack of testability of your SEV-SNP series is still somehow
blocking ongoing development work.
> What Boris might allude to is the fact that some of these changes also
> form a prerequisite for being able to construct a generic EFI zboot
> image for x86, which is a long-term objective that I am working on in
> the background. But this is not the main reason.
>
> In any case, there is no urgency wrt these changes as far as I am
> concerned, and given that I already found an issue myself with v3,
> perhaps it is better if we disregard it for the time being, and we can
> come back to it for the next cycle. In the meantime, I can compare
> notes with Boris and Tom directly to ensure that this is in the right
> shape, and perhaps we could at least fix the pgtable_l5_enabled() mess
> as well (for which I sent out a RFC/v3 today).
You are being exceedingly generous here, but obviously boot code
changes need quite a bit of testing, and v6.17 (or later) is perfectly
fine too.
We could perhaps do the mechanical code movement to
arch/x86/boot/startup/ alone, without any of the followup functional
changes. This would reduce the cross section of the riskiest part of
your series substantially. If that sounds good to you, please send a
series for review.
Thanks,
Ingo
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [RFT PATCH v3 00/21] x86: strict separation of startup code
2025-05-14 6:32 ` Ingo Molnar
@ 2025-05-14 7:41 ` Ard Biesheuvel
2025-05-15 7:17 ` Ingo Molnar
0 siblings, 1 reply; 58+ messages in thread
From: Ard Biesheuvel @ 2025-05-14 7:41 UTC (permalink / raw)
To: Ingo Molnar
Cc: Borislav Petkov, Ard Biesheuvel, linux-kernel, linux-efi, x86,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky, Linus Torvalds
On Wed, 14 May 2025 at 07:32, Ingo Molnar <mingo@kernel.org> wrote:
>
>
> * Ard Biesheuvel <ardb@kernel.org> wrote:
>
...
> > In any case, there is no urgency wrt these changes as far as I am
> > concerned, and given that I already found an issue myself with v3,
> > perhaps it is better if we disregard it for the time being, and we can
> > come back to it for the next cycle. In the meantime, I can compare
> > notes with Boris and Tom directly to ensure that this is in the right
> > shape, and perhaps we could at least fix the pgtable_l5_enabled() mess
> > as well (for which I sent out a RFC/v3 today).
>
...
> We could perhaps do the mechanical code movement to
> arch/x86/boot/startup/ alone, without any of the followup functional
> changes. This would reduce the cross section of the riskiest part of
> your series substantially.
The first phase of this work, which is already queued up, was to move
all of the source files that were using RIP_REL_REF() into
arch/x86/boot/startup to be built with -fPIC so that RIP_REL_REF()
could be removed.
The current phase is to separate code that really needs to live under
startup/ from code that doesn't. This is the bit that was
straightforward for mapping the kernel (including the SME encryption
pieces) because they were already in dedicated source files, but not
so straightforward for SEV-SNP.
In reality, the mechanical code movement in this phase is mostly in
the opposite direction, where things are separated into startup and
non-startup code at a high level of detail, and the latter is moved
out again.
> If that sounds good to you, please send a
> series for review.
>
Not sure what happened to the tip/x86/boot branch in the meantime, but
assuming that what was already queued up is still scheduled for the
next cycle, I don't think there are any parts of this series that
could be meaningfully rearranged. IOW, the SEV-SNP refactoring needs
to be completed first, which accounts for most of the code movement.
Then, implementing the confined symbol space is just a couple of
patches on top.
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [RFT PATCH v3 00/21] x86: strict separation of startup code
2025-05-14 6:20 ` Ingo Molnar
@ 2025-05-14 8:17 ` Borislav Petkov
2025-05-14 8:21 ` Borislav Petkov
2025-05-14 9:54 ` Thomas Gleixner
2 siblings, 0 replies; 58+ messages in thread
From: Borislav Petkov @ 2025-05-14 8:17 UTC (permalink / raw)
To: Ingo Molnar
Cc: Ard Biesheuvel, linux-kernel, linux-efi, x86, Ard Biesheuvel,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky, Linus Torvalds
On Wed, May 14, 2025 at 08:20:31AM +0200, Ingo Molnar wrote:
> I've highlighted all the robustness problems and design flaws of the
> existing SEV* startup code...
You can highlight all you want - nothing warrants rushing those in without
testing. Period. No matter how much you try to spin it and flip it.
> Your failure to follow and understand this series is not an excuse for
> the flippant and condescending tone you are using in your replies ...
You must be confusing me with someone else to constantly give me that
condescending schoolteacher tone.
If you do shit like you do, you will get called on it until you behave
according to the rules we all have agreed upon.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [RFT PATCH v3 00/21] x86: strict separation of startup code
2025-05-14 6:20 ` Ingo Molnar
2025-05-14 8:17 ` Borislav Petkov
@ 2025-05-14 8:21 ` Borislav Petkov
2025-05-14 9:54 ` Thomas Gleixner
2 siblings, 0 replies; 58+ messages in thread
From: Borislav Petkov @ 2025-05-14 8:21 UTC (permalink / raw)
To: Ingo Molnar
Cc: Ard Biesheuvel, linux-kernel, linux-efi, x86, Ard Biesheuvel,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky, Linus Torvalds
On Wed, May 14, 2025 at 08:20:31AM +0200, Ingo Molnar wrote:
> See Ard's excellent description:
Btw, you somehow conveniently ignored the patch Ard mentioned, the one
that started all this: a patch you committed without any review.
Sounds familiar?
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [RFT PATCH v3 00/21] x86: strict separation of startup code
2025-05-14 6:20 ` Ingo Molnar
2025-05-14 8:17 ` Borislav Petkov
2025-05-14 8:21 ` Borislav Petkov
@ 2025-05-14 9:54 ` Thomas Gleixner
2 siblings, 0 replies; 58+ messages in thread
From: Thomas Gleixner @ 2025-05-14 9:54 UTC (permalink / raw)
To: Ingo Molnar, Borislav Petkov
Cc: Ard Biesheuvel, linux-kernel, linux-efi, x86, Ard Biesheuvel,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky, Linus Torvalds
Ingo!
On Wed, May 14 2025 at 08:20, Ingo Molnar wrote:
> * Borislav Petkov <bp@alien8.de> wrote:
> Your failure to follow and understand this series is not an excuse for
> the flippant and condescending tone you are using in your replies ...
Can you please stop right there?
I've been observing this for a while now and to be blatantly honest,
_you_ are the one who is constantly failing to work in a team and _you_
are the one who acts like a 19th century schoolmaster.
Your outbreaks of showing up and applying random patches fresh from the
press, and your absolute refusal to slow down when asked to, are annoying
as hell to everyone working in and against the tip tree.
You can do whatever you want on your private mingo/*.git branches, but
this frenzy of shoving crap into tip has to stop once and forever.
Borislav, Dave and Peter are doing a great job in taming the influx and
reviewing the patch firehose, but there are limitations on bandwidth and
you cannot demand that everyone drops everything just because you
declare that a particular patch series is the most important thing since
the invention of sliced bread.
You are not the one setting the rules of how the tip team works together
and if you can't agree with the rest of the team, then either we all sit
together and hash it out or you step back.
Continuing this sh*tshow, which has been going on for way too long now
(hint: years), is not an option at all.
Thanks,
Thomas
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [RFT PATCH v3 00/21] x86: strict separation of startup code
2025-05-12 19:08 [RFT PATCH v3 00/21] x86: strict separation of startup code Ard Biesheuvel
` (21 preceding siblings ...)
2025-05-12 19:17 ` [RFT PATCH v3 00/21] x86: strict separation of startup code Borislav Petkov
@ 2025-05-14 17:21 ` Borislav Petkov
2025-05-14 17:37 ` Ard Biesheuvel
22 siblings, 1 reply; 58+ messages in thread
From: Borislav Petkov @ 2025-05-14 17:21 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: linux-kernel, linux-efi, x86, Ard Biesheuvel, Ingo Molnar,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky
On Mon, May 12, 2025 at 09:08:35PM +0200, Ard Biesheuvel wrote:
> This is somewhat intrusive, as there are many data objects that are
> referenced both by startup code and by ordinary code, and an alias needs
> to be emitted for each of those. If startup code references anything
> that has not been made available to it explicitly, a build time link
> error will occur.
Makes me wonder: what will be our rule for what should be made available to
startup code, what should be moved to startup code etc....
I guess as little as possible, but past experience teaches us differently.
I guess applying the usual skeptical approach should be sane enough...
We'll see.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [RFT PATCH v3 00/21] x86: strict separation of startup code
2025-05-14 17:21 ` Borislav Petkov
@ 2025-05-14 17:37 ` Ard Biesheuvel
2025-05-14 18:53 ` Borislav Petkov
0 siblings, 1 reply; 58+ messages in thread
From: Ard Biesheuvel @ 2025-05-14 17:37 UTC (permalink / raw)
To: Borislav Petkov
Cc: Ard Biesheuvel, linux-kernel, linux-efi, x86, Ingo Molnar,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky
On Wed, 14 May 2025 at 18:21, Borislav Petkov <bp@alien8.de> wrote:
>
> On Mon, May 12, 2025 at 09:08:35PM +0200, Ard Biesheuvel wrote:
> > This is somewhat intrusive, as there are many data objects that are
> > referenced both by startup code and by ordinary code, and an alias needs
> > to be emitted for each of those. If startup code references anything
> > that has not been made available to it explicitly, a build time link
> > error will occur.
>
> Makes me wonder: what will be our rule for what should be made available to
> startup code, what should be moved to startup code etc....
>
The rule is really that you can no longer randomly call other code
without being forced to consider carefully what else gets pulled in,
and whether or not that code is guaranteed to behave correctly when
being called via the 1:1 mapping.
> I guess as little as possible, but past experience teaches us differently.
> I guess applying the usual skeptical approach should be sane enough...
>
Basically, the first order of business when calling the kernel via the
1:1 mapping is to create the kernel virtual mapping in the page
tables. It is just really unfortunate that SEV-SNP requires so much
prep work before we can even map the kernel.
I don't anticipate that this code will grow a lot after I'm done with it.
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [RFT PATCH v3 00/21] x86: strict separation of startup code
2025-05-14 17:37 ` Ard Biesheuvel
@ 2025-05-14 18:53 ` Borislav Petkov
0 siblings, 0 replies; 58+ messages in thread
From: Borislav Petkov @ 2025-05-14 18:53 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: Ard Biesheuvel, linux-kernel, linux-efi, x86, Ingo Molnar,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky
On Wed, May 14, 2025 at 06:37:14PM +0100, Ard Biesheuvel wrote:
> The rule is really that you can no longer randomly call other code
> without being forced to consider carefully what else gets pulled in,
> and whether or not that code is guaranteed to behave correctly when
> being called via the 1:1 mapping.
I like "consider carefully". Right now, the decompressor is a mess and
untangling it is a losing game.
> Basically, the first order of business when calling the kernel via the
> 1:1 mapping is to create the kernel virtual mapping in the page
> tables. It is just really unfortunate that SEV-SNP requires so much
> prep work before we can even map the kernel.
>
> I don't anticipate that this code will grow a lot after I'm done with it.
Right, ok.
Lemme keep looking.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [RFT PATCH v3 00/21] x86: strict separation of startup code
2025-05-14 7:41 ` Ard Biesheuvel
@ 2025-05-15 7:17 ` Ingo Molnar
0 siblings, 0 replies; 58+ messages in thread
From: Ingo Molnar @ 2025-05-15 7:17 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: Borislav Petkov, Ard Biesheuvel, linux-kernel, linux-efi, x86,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky, Linus Torvalds
* Ard Biesheuvel <ardb@kernel.org> wrote:
> On Wed, 14 May 2025 at 07:32, Ingo Molnar <mingo@kernel.org> wrote:
> >
> >
> > * Ard Biesheuvel <ardb@kernel.org> wrote:
> >
> ...
> > > In any case, there is no urgency wrt these changes as far as I am
> > > concerned, and given that I already found an issue myself with v3,
> > > perhaps it is better if we disregard it for the time being, and we can
> > > come back to it for the next cycle. In the meantime, I can compare
> > > notes with Boris and Tom directly to ensure that this is in the right
> > > shape, and perhaps we could at least fix the pgtable_l5_enabled() mess
> > > as well (for which I sent out a RFC/v3 today).
> >
> ...
> > We could perhaps do the mechanical code movement to
> > arch/x86/boot/startup/ alone, without any of the followup functional
> > changes. This would reduce the cross section of the riskiest part of
> > your series substantially.
>
> The first phase of this work, which is already queued up, was to move
> all of the source files that were using RIP_REL_REF() into
> arch/x86/boot/startup to be built with -fPIC so that RIP_REL_REF()
> could be removed.
>
> The current phase is to separate code that really needs to live under
> startup/ from code that doesn't. This is the bit that was
> straightforward for mapping the kernel (including the SME encryption
> pieces) because they were already in dedicated source files, but not
> so straightforward for SEV-SNP.
>
> In reality, the mechanical code movement in this phase is mostly in
> the opposite direction, where things are separated into startup and
> non-startup code at a high level of detail, and the latter is moved
> out again.
>
> > If that sounds good to you, please send a
> > series for review.
> >
>
> Not sure what happened to the tip/x86/boot branch in the meantime,
It got merged into tip:x86/core. I wrote you an email about it
yesterday, should be somewhere in your inbox. :)
> [...] but assuming that what was already queued up is still scheduled
> for the next cycle, I don't think there are any parts of this series
> that could be meaningfully rearranged. IOW, the SEV-SNP refactoring
> needs to be completed first, which accounts for most of the code
> movement.
Understood.
Thanks,
Ingo
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [RFT PATCH v3 01/21] x86/sev: Separate MSR and GHCB based snp_cpuid() via a callback
2025-05-12 19:08 ` [RFT PATCH v3 01/21] x86/sev: Separate MSR and GHCB based snp_cpuid() via a callback Ard Biesheuvel
@ 2025-05-15 7:22 ` Ingo Molnar
2025-05-15 10:24 ` Ard Biesheuvel
2025-05-15 11:10 ` Borislav Petkov
1 sibling, 1 reply; 58+ messages in thread
From: Ingo Molnar @ 2025-05-15 7:22 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: linux-kernel, linux-efi, x86, Ard Biesheuvel, Borislav Petkov,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky
* Ard Biesheuvel <ardb+git@google.com> wrote:
> + if (cr4 & X86_CR4_OSXSAVE)
> + /* Safe to read xcr0 */
> + ghcb_set_xcr0(ghcb, xgetbv(XCR_XFEATURE_ENABLED_MASK));
> + else
> + /* xgetbv will cause #UD - use reset value for xcr0 */
> + ghcb_set_xcr0(ghcb, 1);
Just a couple of style nits here - this new __sev_cpuid_hv_ghcb()
function should follow the existing SEV* (and general x86) style:
- When mentioning instruction mnemonics, please capitalize them, such
as XGETBV here.
- Same goes for registers, such as XCR0.
- Finally, if a comment line turns a branch into a multi-line
statement, curly braces are needed, even if it's technically still a
> single statement per the C syntax, which eliminates comments at the
preprocessor level.
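(Applied to the quoted hunk, those three nits would yield something
like:)

        if (cr4 & X86_CR4_OSXSAVE) {
                /* Safe to read XCR0 */
                ghcb_set_xcr0(ghcb, xgetbv(XCR_XFEATURE_ENABLED_MASK));
        } else {
                /* XGETBV will cause #UD - use reset value for XCR0 */
                ghcb_set_xcr0(ghcb, 1);
        }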
Thanks,
Ingo
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [RFT PATCH v3 01/21] x86/sev: Separate MSR and GHCB based snp_cpuid() via a callback
2025-05-15 7:22 ` Ingo Molnar
@ 2025-05-15 10:24 ` Ard Biesheuvel
2025-05-15 15:18 ` Ingo Molnar
0 siblings, 1 reply; 58+ messages in thread
From: Ard Biesheuvel @ 2025-05-15 10:24 UTC (permalink / raw)
To: Ingo Molnar
Cc: Ard Biesheuvel, linux-kernel, linux-efi, x86, Borislav Petkov,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky
On Thu, 15 May 2025 at 08:22, Ingo Molnar <mingo@kernel.org> wrote:
>
>
> * Ard Biesheuvel <ardb+git@google.com> wrote:
>
> > + if (cr4 & X86_CR4_OSXSAVE)
> > + /* Safe to read xcr0 */
> > + ghcb_set_xcr0(ghcb, xgetbv(XCR_XFEATURE_ENABLED_MASK));
> > + else
> > + /* xgetbv will cause #UD - use reset value for xcr0 */
> > + ghcb_set_xcr0(ghcb, 1);
>
> Just a couple of style nits here - this new __sev_cpuid_hv_ghcb()
> function
__sev_cpuid_hv_ghcb() is just being moved from one source file to
another - I didn't change a single line, and so I don't think tweaking
the style is appropriate for this patch.
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [RFT PATCH v3 01/21] x86/sev: Separate MSR and GHCB based snp_cpuid() via a callback
2025-05-12 19:08 ` [RFT PATCH v3 01/21] x86/sev: Separate MSR and GHCB based snp_cpuid() via a callback Ard Biesheuvel
2025-05-15 7:22 ` Ingo Molnar
@ 2025-05-15 11:10 ` Borislav Petkov
2025-05-15 14:22 ` Ard Biesheuvel
1 sibling, 1 reply; 58+ messages in thread
From: Borislav Petkov @ 2025-05-15 11:10 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: linux-kernel, linux-efi, x86, Ard Biesheuvel, Ingo Molnar,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky
On Mon, May 12, 2025 at 09:08:36PM +0200, Ard Biesheuvel wrote:
> From: Ard Biesheuvel <ardb@kernel.org>
>
> There are two distinct callers of snp_cpuid(): one where the MSR
> protocol is always used, and one where the GHCB page based interface is
> always used.
Yeah, let's stick to the nomenclature, pls: you have a GHCB protocol and an MSR
protocol. We call both protocols. :)
> The snp_cpuid() logic does not care about the distinction, which only
> matters at a lower level. But the fact that it supports both interfaces
> means that the GHCB page based logic is pulled into the early startup
> code where PA to VA conversions are problematic, given that it runs from
> the 1:1 mapping of memory.
>
> So keep snp_cpuid() itself in the startup code, but factor out the
> hypervisor calls via a callback, so that the GHCB page handling can be
> moved out.
>
> Code refactoring only - no functional change intended.
>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
> arch/x86/boot/startup/sev-shared.c | 58 ++++----------------
> arch/x86/coco/sev/vc-shared.c | 49 ++++++++++++++++-
> arch/x86/include/asm/sev.h | 3 +-
> 3 files changed, 61 insertions(+), 49 deletions(-)
...
> @@ -484,21 +447,21 @@ snp_cpuid_get_validated_func(struct cpuid_leaf *leaf)
> return false;
> }
>
> -static void snp_cpuid_hv(struct ghcb *ghcb, struct es_em_ctxt *ctxt, struct cpuid_leaf *leaf)
> +static void snp_cpuid_hv_no_ghcb(void *ctx, struct cpuid_leaf *leaf)
Uff, those suffixes make my head hurt. So this is the MSR prot CPUID. Let's
call it this way:
snp_cpuid_msr_prot()
and the other one
snp_cpuid_ghcb_prot()
All clear this way.
> {
> - if (sev_cpuid_hv(ghcb, ctxt, leaf))
> + if (__sev_cpuid_hv_msr(leaf))
__sev_cpuid_msr_prot
> sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_CPUID_HV);
> }
>
> static int __head
Let's zap that ugly linebreak.
> -snp_cpuid_postprocess(struct ghcb *ghcb, struct es_em_ctxt *ctxt,
> - struct cpuid_leaf *leaf)
> +snp_cpuid_postprocess(void (*cpuid_hv)(void *ctx, struct cpuid_leaf *),
Let's call that just "cpuid" now that it can be different things and it is
a pointer.
> + void *ctx, struct cpuid_leaf *leaf)
> {
> struct cpuid_leaf leaf_hv = *leaf;
>
> switch (leaf->fn) {
> case 0x1:
> - snp_cpuid_hv(ghcb, ctxt, &leaf_hv);
> + cpuid_hv(ctx, &leaf_hv);
>
> /* initial APIC ID */
> leaf->ebx = (leaf_hv.ebx & GENMASK(31, 24)) | (leaf->ebx & GENMASK(23, 0));
> @@ -517,7 +480,7 @@ snp_cpuid_postprocess(struct ghcb *ghcb, struct es_em_ctxt *ctxt,
> break;
> case 0xB:
> leaf_hv.subfn = 0;
> - snp_cpuid_hv(ghcb, ctxt, &leaf_hv);
> + cpuid_hv(ctx, &leaf_hv);
>
> /* extended APIC ID */
> leaf->edx = leaf_hv.edx;
> @@ -565,7 +528,7 @@ snp_cpuid_postprocess(struct ghcb *ghcb, struct es_em_ctxt *ctxt,
> }
> break;
> case 0x8000001E:
> - snp_cpuid_hv(ghcb, ctxt, &leaf_hv);
> + cpuid_hv(ctx, &leaf_hv);
>
> /* extended APIC ID */
> leaf->eax = leaf_hv.eax;
> @@ -587,7 +550,8 @@ snp_cpuid_postprocess(struct ghcb *ghcb, struct es_em_ctxt *ctxt,
> * should be treated as fatal by caller.
> */
> int __head
And that ugly linebreak too pls.
...
Here's a diff on top with my changes. I think it looks a lot saner now and one
can really differentiate which is which.
Could more cleanup be done? Ofc. But that's for later patches.
Thx.
---
diff --git a/arch/x86/boot/startup/sev-shared.c b/arch/x86/boot/startup/sev-shared.c
index 408e064a80d9..992abfa50508 100644
--- a/arch/x86/boot/startup/sev-shared.c
+++ b/arch/x86/boot/startup/sev-shared.c
@@ -319,7 +319,7 @@ static int __sev_cpuid_hv(u32 fn, int reg_idx, u32 *reg)
return 0;
}
-static int __sev_cpuid_hv_msr(struct cpuid_leaf *leaf)
+static int __sev_cpuid_msr_prot(struct cpuid_leaf *leaf)
{
int ret;
@@ -447,21 +447,20 @@ snp_cpuid_get_validated_func(struct cpuid_leaf *leaf)
return false;
}
-static void snp_cpuid_hv_no_ghcb(void *ctx, struct cpuid_leaf *leaf)
+static void snp_cpuid_msr_prot(void *ctx, struct cpuid_leaf *leaf)
{
- if (__sev_cpuid_hv_msr(leaf))
+ if (__sev_cpuid_msr_prot(leaf))
sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_CPUID_HV);
}
-static int __head
-snp_cpuid_postprocess(void (*cpuid_hv)(void *ctx, struct cpuid_leaf *),
- void *ctx, struct cpuid_leaf *leaf)
+static int __head snp_cpuid_postprocess(void (*cpuid)(void *ctx, struct cpuid_leaf *),
+ void *ctx, struct cpuid_leaf *leaf)
{
struct cpuid_leaf leaf_hv = *leaf;
switch (leaf->fn) {
case 0x1:
- cpuid_hv(ctx, &leaf_hv);
+ cpuid(ctx, &leaf_hv);
/* initial APIC ID */
leaf->ebx = (leaf_hv.ebx & GENMASK(31, 24)) | (leaf->ebx & GENMASK(23, 0));
@@ -480,7 +479,7 @@ snp_cpuid_postprocess(void (*cpuid_hv)(void *ctx, struct cpuid_leaf *),
break;
case 0xB:
leaf_hv.subfn = 0;
- cpuid_hv(ctx, &leaf_hv);
+ cpuid(ctx, &leaf_hv);
/* extended APIC ID */
leaf->edx = leaf_hv.edx;
@@ -528,7 +527,7 @@ snp_cpuid_postprocess(void (*cpuid_hv)(void *ctx, struct cpuid_leaf *),
}
break;
case 0x8000001E:
- cpuid_hv(ctx, &leaf_hv);
+ cpuid(ctx, &leaf_hv);
/* extended APIC ID */
leaf->eax = leaf_hv.eax;
@@ -549,9 +548,8 @@ snp_cpuid_postprocess(void (*cpuid_hv)(void *ctx, struct cpuid_leaf *),
* Returns -EOPNOTSUPP if feature not enabled. Any other non-zero return value
* should be treated as fatal by caller.
*/
-int __head
-snp_cpuid(void (*cpuid_hv)(void *ctx, struct cpuid_leaf *),
- void *ctx, struct cpuid_leaf *leaf)
+int __head snp_cpuid(void (*cpuid)(void *ctx, struct cpuid_leaf *), void *ctx,
+ struct cpuid_leaf *leaf)
{
const struct snp_cpuid_table *cpuid_table = snp_cpuid_get_table();
@@ -585,7 +583,7 @@ snp_cpuid(void (*cpuid_hv)(void *ctx, struct cpuid_leaf *),
return 0;
}
- return snp_cpuid_postprocess(cpuid_hv, ctx, leaf);
+ return snp_cpuid_postprocess(cpuid, ctx, leaf);
}
/*
@@ -612,14 +610,14 @@ void __head do_vc_no_ghcb(struct pt_regs *regs, unsigned long exit_code)
leaf.fn = fn;
leaf.subfn = subfn;
- ret = snp_cpuid(snp_cpuid_hv_no_ghcb, NULL, &leaf);
+ ret = snp_cpuid(snp_cpuid_msr_prot, NULL, &leaf);
if (!ret)
goto cpuid_done;
if (ret != -EOPNOTSUPP)
goto fail;
- if (__sev_cpuid_hv_msr(&leaf))
+ if (__sev_cpuid_msr_prot(&leaf))
goto fail;
cpuid_done:
diff --git a/arch/x86/coco/sev/vc-shared.c b/arch/x86/coco/sev/vc-shared.c
index b4688f69102e..776cb90be530 100644
--- a/arch/x86/coco/sev/vc-shared.c
+++ b/arch/x86/coco/sev/vc-shared.c
@@ -409,7 +409,7 @@ static enum es_result vc_handle_ioio(struct ghcb *ghcb, struct es_em_ctxt *ctxt)
return ret;
}
-static int __sev_cpuid_hv_ghcb(struct ghcb *ghcb, struct es_em_ctxt *ctxt, struct cpuid_leaf *leaf)
+static int __sev_cpuid_ghcb_prot(struct ghcb *ghcb, struct es_em_ctxt *ctxt, struct cpuid_leaf *leaf)
{
u32 cr4 = native_read_cr4();
int ret;
@@ -447,11 +447,11 @@ struct cpuid_ctx {
struct es_em_ctxt *ctxt;
};
-static void snp_cpuid_hv_ghcb(void *p, struct cpuid_leaf *leaf)
+static void snp_cpuid_ghcb_prot(void *p, struct cpuid_leaf *leaf)
{
struct cpuid_ctx *ctx = p;
- if (__sev_cpuid_hv_ghcb(ctx->ghcb, ctx->ctxt, leaf))
+ if (__sev_cpuid_ghcb_prot(ctx->ghcb, ctx->ctxt, leaf))
sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_CPUID_HV);
}
@@ -464,7 +464,7 @@ static int vc_handle_cpuid_snp(struct ghcb *ghcb, struct es_em_ctxt *ctxt)
leaf.fn = regs->ax;
leaf.subfn = regs->cx;
- ret = snp_cpuid(snp_cpuid_hv_ghcb, &ctx, &leaf);
+ ret = snp_cpuid(snp_cpuid_ghcb_prot, &ctx, &leaf);
if (!ret) {
regs->ax = leaf.eax;
regs->bx = leaf.ebx;
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply related [flat|nested] 58+ messages in thread
* Re: [RFT PATCH v3 01/21] x86/sev: Separate MSR and GHCB based snp_cpuid() via a callback
2025-05-15 11:10 ` Borislav Petkov
@ 2025-05-15 14:22 ` Ard Biesheuvel
0 siblings, 0 replies; 58+ messages in thread
From: Ard Biesheuvel @ 2025-05-15 14:22 UTC (permalink / raw)
To: Borislav Petkov
Cc: Ard Biesheuvel, linux-kernel, linux-efi, x86, Ingo Molnar,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky
On Thu, 15 May 2025 at 12:10, Borislav Petkov <bp@alien8.de> wrote:
>
> On Mon, May 12, 2025 at 09:08:36PM +0200, Ard Biesheuvel wrote:
> > From: Ard Biesheuvel <ardb@kernel.org>
> >
> > There are two distinct callers of snp_cpuid(): one where the MSR
> > protocol is always used, and one where the GHCB page based interface is
> > always used.
>
> Yeah, let's stick to the nomenclature, pls: you have a GHCB protocol and an MSR
> protocol. We call both protocols. :)
>
> > The snp_cpuid() logic does not care about the distinction, which only
> > matters at a lower level. But the fact that it supports both interfaces
> > means that the GHCB page based logic is pulled into the early startup
> > code where PA to VA conversions are problematic, given that it runs from
> > the 1:1 mapping of memory.
> >
> > So keep snp_cpuid() itself in the startup code, but factor out the
> > hypervisor calls via a callback, so that the GHCB page handling can be
> > moved out.
> >
> > Code refactoring only - no functional change intended.
> >
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > ---
> > arch/x86/boot/startup/sev-shared.c | 58 ++++----------------
> > arch/x86/coco/sev/vc-shared.c | 49 ++++++++++++++++-
> > arch/x86/include/asm/sev.h | 3 +-
> > 3 files changed, 61 insertions(+), 49 deletions(-)
>
> ...
>
> > @@ -484,21 +447,21 @@ snp_cpuid_get_validated_func(struct cpuid_leaf *leaf)
> > return false;
> > }
> >
> > -static void snp_cpuid_hv(struct ghcb *ghcb, struct es_em_ctxt *ctxt, struct cpuid_leaf *leaf)
> > +static void snp_cpuid_hv_no_ghcb(void *ctx, struct cpuid_leaf *leaf)
>
> Uff, those suffixes make my head hurt. So this is the MSR prot CPUID. Let's
> call it this way:
>
> snp_cpuid_msr_prot()
>
> and the other one
>
> snp_cpuid_ghcb_prot()
>
> All clear this way.
>
> > {
> > - if (sev_cpuid_hv(ghcb, ctxt, leaf))
> > + if (__sev_cpuid_hv_msr(leaf))
>
> __sev_cpuid_msr_prot
>
> > sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_CPUID_HV);
> > }
> >
> > static int __head
>
> Let's zap that ugly linebreak.
>
> > -snp_cpuid_postprocess(struct ghcb *ghcb, struct es_em_ctxt *ctxt,
> > - struct cpuid_leaf *leaf)
> > +snp_cpuid_postprocess(void (*cpuid_hv)(void *ctx, struct cpuid_leaf *),
>
> Let's call that just "cpuid" now that it can be different things and it is
> a pointer.
>
> > + void *ctx, struct cpuid_leaf *leaf)
> > {
> > struct cpuid_leaf leaf_hv = *leaf;
> >
> > switch (leaf->fn) {
> > case 0x1:
> > - snp_cpuid_hv(ghcb, ctxt, &leaf_hv);
> > + cpuid_hv(ctx, &leaf_hv);
> >
> > /* initial APIC ID */
> > leaf->ebx = (leaf_hv.ebx & GENMASK(31, 24)) | (leaf->ebx & GENMASK(23, 0));
> > @@ -517,7 +480,7 @@ snp_cpuid_postprocess(struct ghcb *ghcb, struct es_em_ctxt *ctxt,
> > break;
> > case 0xB:
> > leaf_hv.subfn = 0;
> > - snp_cpuid_hv(ghcb, ctxt, &leaf_hv);
> > + cpuid_hv(ctx, &leaf_hv);
> >
> > /* extended APIC ID */
> > leaf->edx = leaf_hv.edx;
> > @@ -565,7 +528,7 @@ snp_cpuid_postprocess(struct ghcb *ghcb, struct es_em_ctxt *ctxt,
> > }
> > break;
> > case 0x8000001E:
> > - snp_cpuid_hv(ghcb, ctxt, &leaf_hv);
> > + cpuid_hv(ctx, &leaf_hv);
> >
> > /* extended APIC ID */
> > leaf->eax = leaf_hv.eax;
> > @@ -587,7 +550,8 @@ snp_cpuid_postprocess(struct ghcb *ghcb, struct es_em_ctxt *ctxt,
> > * should be treated as fatal by caller.
> > */
> > int __head
>
> And that ugly linebreak too pls.
>
> ...
>
> Here's a diff on top with my changes. I think it looks a lot saner now and one
> can really differentiate which is which.
>
Thanks, I'll fold that in.
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [RFT PATCH v3 01/21] x86/sev: Separate MSR and GHCB based snp_cpuid() via a callback
2025-05-15 10:24 ` Ard Biesheuvel
@ 2025-05-15 15:18 ` Ingo Molnar
0 siblings, 0 replies; 58+ messages in thread
From: Ingo Molnar @ 2025-05-15 15:18 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: Ard Biesheuvel, linux-kernel, linux-efi, x86, Borislav Petkov,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky
* Ard Biesheuvel <ardb@kernel.org> wrote:
> On Thu, 15 May 2025 at 08:22, Ingo Molnar <mingo@kernel.org> wrote:
> >
> >
> > * Ard Biesheuvel <ardb+git@google.com> wrote:
> >
> > > + if (cr4 & X86_CR4_OSXSAVE)
> > > + /* Safe to read xcr0 */
> > > + ghcb_set_xcr0(ghcb, xgetbv(XCR_XFEATURE_ENABLED_MASK));
> > > + else
> > > + /* xgetbv will cause #UD - use reset value for xcr0 */
> > > + ghcb_set_xcr0(ghcb, 1);
> >
> > Just a couple of style nits here - this new __sev_cpuid_hv_ghcb()
> > function
>
> __sev_cpuid_hv_ghcb() is just being moved from one source file to
> another - I didn't change a single line, and so I don't think tweaking
> the style is appropriate for this patch.
Yeah, fair enough - in fact changing anything in a pure-movement patch
would be counterproductive.
Thanks,
Ingo
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [RFT PATCH v3 02/21] x86/sev: Use MSR protocol for remapping SVSM calling area
2025-05-12 19:08 ` [RFT PATCH v3 02/21] x86/sev: Use MSR protocol for remapping SVSM calling area Ard Biesheuvel
@ 2025-05-15 16:43 ` Borislav Petkov
0 siblings, 0 replies; 58+ messages in thread
From: Borislav Petkov @ 2025-05-15 16:43 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: linux-kernel, linux-efi, x86, Ard Biesheuvel, Ingo Molnar,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky
On Mon, May 12, 2025 at 09:08:37PM +0200, Ard Biesheuvel wrote:
> From: Ard Biesheuvel <ardb@kernel.org>
>
> As the preceding code comment already indicates, remapping the SVSM
> calling area occurs long before the GHCB page is configured, and so
> calling svsm_perform_call_protocol() is guaranteed to result in a call
> to svsm_perform_msr_protocol().
>
> So just call the latter directly. This allows most of the GHCB based API
> infrastructure to be moved out of the startup code in a subsequent
> patch.
>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
> arch/x86/boot/startup/sev-startup.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/boot/startup/sev-startup.c b/arch/x86/boot/startup/sev-startup.c
> index 435853a55768..a1d5a5632d58 100644
> --- a/arch/x86/boot/startup/sev-startup.c
> +++ b/arch/x86/boot/startup/sev-startup.c
> @@ -325,7 +325,9 @@ static __head void svsm_setup(struct cc_blob_sev_info *cc_info)
> call.caa = svsm_get_caa();
> call.rax = SVSM_CORE_CALL(SVSM_CORE_REMAP_CA);
> call.rcx = pa;
> - ret = svsm_perform_call_protocol(&call);
> + do {
> + ret = svsm_perform_msr_protocol(&call);
> + } while (ret == -EAGAIN);
Right, a future cleanup for another patch would be to wrap that loop into
a function. But not now.
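(For illustration, such a wrapper could be as small as this; a sketch
only, the function name is made up:)

        static int svsm_call_msr_protocol(struct svsm_call *call)
        {
                int ret;

                /* the MSR protocol may ask the caller to retry */
                do {
                        ret = svsm_perform_msr_protocol(call);
                } while (ret == -EAGAIN);

                return ret;
        }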
> if (ret)
> sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_SVSM_CA_REMAP_FAIL);
>
> --
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [RFT PATCH v3 04/21] x86/sev: Run RMPADJUST on SVSM calling area page to test VMPL
2025-05-12 19:08 ` [RFT PATCH v3 04/21] x86/sev: Run RMPADJUST on SVSM calling area page to test VMPL Ard Biesheuvel
@ 2025-05-20 9:44 ` Borislav Petkov
0 siblings, 0 replies; 58+ messages in thread
From: Borislav Petkov @ 2025-05-20 9:44 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: linux-kernel, linux-efi, x86, Ard Biesheuvel, Ingo Molnar,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky
On Mon, May 12, 2025 at 09:08:39PM +0200, Ard Biesheuvel wrote:
> From: Ard Biesheuvel <ardb@kernel.org>
>
> Determining the VMPL at which the kernel runs involves performing a
> RMPADJUST operation on an arbitary page of memory, and observing whether
Time to turn on that spellchecker... :-)
RMPADJUST operation on an arbitary page of memory, and observing whether
Unknown word [arbitary] in commit message.
Suggestions: ['arbitrary', 'obituary', 'arbiter', 'arbitrate', 'arbiters', 'Arbitron', 'arbitrage', 'artery', "arbiter's", 'orbiter']
arbitary, but results in the need to provide a PIC alias for it. So use
Unknown word [arbitary] in commit message.
Suggestions: ['arbitrary', 'obituary', 'arbiter', 'arbitrate', 'arbiters', 'Arbitron', 'arbitrage', 'artery', "arbiter's", 'orbiter']
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [RFT PATCH v3 05/21] x86/sev: Move GHCB page based HV communication out of startup code
2025-05-12 19:08 ` [RFT PATCH v3 05/21] x86/sev: Move GHCB page based HV communication out of startup code Ard Biesheuvel
@ 2025-05-20 11:38 ` Borislav Petkov
2025-05-20 11:49 ` Ard Biesheuvel
0 siblings, 1 reply; 58+ messages in thread
From: Borislav Petkov @ 2025-05-20 11:38 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: linux-kernel, linux-efi, x86, Ard Biesheuvel, Ingo Molnar,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky
On Mon, May 12, 2025 at 09:08:40PM +0200, Ard Biesheuvel wrote:
> diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
> index f50a606be4f1..07081bb85331 100644
> --- a/arch/x86/include/asm/sev.h
> +++ b/arch/x86/include/asm/sev.h
> @@ -485,6 +485,7 @@ static inline int pvalidate(unsigned long vaddr, bool rmp_psize, bool validate)
> struct snp_guest_request_ioctl;
>
> void setup_ghcb(void);
> +void snp_register_ghcb_early(unsigned long paddr);
> void early_snp_set_memory_private(unsigned long vaddr, unsigned long paddr,
> unsigned long npages);
> void early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr,
> @@ -521,8 +522,6 @@ static __always_inline void vc_ghcb_invalidate(struct ghcb *ghcb)
> __builtin_memset(ghcb->save.valid_bitmap, 0, sizeof(ghcb->save.valid_bitmap));
> }
>
> -void vc_forward_exception(struct es_em_ctxt *ctxt);
> -
> /* I/O parameters for CPUID-related helpers */
> struct cpuid_leaf {
> u32 fn;
> @@ -533,15 +532,71 @@ struct cpuid_leaf {
> u32 edx;
> };
>
> +int svsm_perform_msr_protocol(struct svsm_call *call);
> int snp_cpuid(void (*cpuid_hv)(void *ctx, struct cpuid_leaf *),
> void *ctx, struct cpuid_leaf *leaf);
>
> +/*
> + * Issue a VMGEXIT to call the SVSM:
> + * - Load the SVSM register state (RAX, RCX, RDX, R8 and R9)
> + * - Set the CA call pending field to 1
> + * - Issue VMGEXIT
> + * - Save the SVSM return register state (RAX, RCX, RDX, R8 and R9)
> + * - Perform atomic exchange of the CA call pending field
> + *
> + * - See the "Secure VM Service Module for SEV-SNP Guests" specification for
> + * details on the calling convention.
> + * - The calling convention loosely follows the Microsoft X64 calling
> + * convention by putting arguments in RCX, RDX, R8 and R9.
> + * - RAX specifies the SVSM protocol/callid as input and the return code
> + * as output.
> + */
> +static __always_inline void svsm_issue_call(struct svsm_call *call, u8 *pending)
Why in a header?
At least in sev-internal.h or so.
I'd prefer for this to be in a .c file, though.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [RFT PATCH v3 05/21] x86/sev: Move GHCB page based HV communication out of startup code
2025-05-20 11:38 ` Borislav Petkov
@ 2025-05-20 11:49 ` Ard Biesheuvel
2025-05-20 13:58 ` Borislav Petkov
0 siblings, 1 reply; 58+ messages in thread
From: Ard Biesheuvel @ 2025-05-20 11:49 UTC (permalink / raw)
To: Borislav Petkov
Cc: Ard Biesheuvel, linux-kernel, linux-efi, x86, Ingo Molnar,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky
On Tue, 20 May 2025 at 13:39, Borislav Petkov <bp@alien8.de> wrote:
>
> On Mon, May 12, 2025 at 09:08:40PM +0200, Ard Biesheuvel wrote:
> > diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
> > index f50a606be4f1..07081bb85331 100644
> > --- a/arch/x86/include/asm/sev.h
> > +++ b/arch/x86/include/asm/sev.h
> > @@ -485,6 +485,7 @@ static inline int pvalidate(unsigned long vaddr, bool rmp_psize, bool validate)
> > struct snp_guest_request_ioctl;
> >
> > void setup_ghcb(void);
> > +void snp_register_ghcb_early(unsigned long paddr);
> > void early_snp_set_memory_private(unsigned long vaddr, unsigned long paddr,
> > unsigned long npages);
> > void early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr,
> > @@ -521,8 +522,6 @@ static __always_inline void vc_ghcb_invalidate(struct ghcb *ghcb)
> > __builtin_memset(ghcb->save.valid_bitmap, 0, sizeof(ghcb->save.valid_bitmap));
> > }
> >
> > -void vc_forward_exception(struct es_em_ctxt *ctxt);
> > -
> > /* I/O parameters for CPUID-related helpers */
> > struct cpuid_leaf {
> > u32 fn;
> > @@ -533,15 +532,71 @@ struct cpuid_leaf {
> > u32 edx;
> > };
> >
> > +int svsm_perform_msr_protocol(struct svsm_call *call);
> > int snp_cpuid(void (*cpuid_hv)(void *ctx, struct cpuid_leaf *),
> > void *ctx, struct cpuid_leaf *leaf);
> >
> > +/*
> > + * Issue a VMGEXIT to call the SVSM:
> > + * - Load the SVSM register state (RAX, RCX, RDX, R8 and R9)
> > + * - Set the CA call pending field to 1
> > + * - Issue VMGEXIT
> > + * - Save the SVSM return register state (RAX, RCX, RDX, R8 and R9)
> > + * - Perform atomic exchange of the CA call pending field
> > + *
> > + * - See the "Secure VM Service Module for SEV-SNP Guests" specification for
> > + * details on the calling convention.
> > + * - The calling convention loosely follows the Microsoft X64 calling
> > + * convention by putting arguments in RCX, RDX, R8 and R9.
> > + * - RAX specifies the SVSM protocol/callid as input and the return code
> > + * as output.
> > + */
> > +static __always_inline void svsm_issue_call(struct svsm_call *call, u8 *pending)
>
> Why in a header?
>
> At least in sev-internal.h or so.
>
> I'd prefer for this to be in a .c file, though.
>
OK. In that case, it will be implemented in the startup code then, and
exported to the rest of the core kernel via a PIC alias.
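(A PIC alias here means a second linker-visible name for the same
definition, so both the startup symbol namespace and the ordinary kernel
can reference it. A hypothetical sketch of the idea; the __pi_ prefix is
borrowed from arm64 and the helper name is made up, this is not the
series' exact macro:)

        /* definition lives in the startup code */
        int startup_helper(int x)
        {
                return x + 1;
        }

        /* second linker name for the same code, reachable from the
         * other symbol namespace (GCC alias attribute): */
        extern typeof(startup_helper) __pi_startup_helper
                __attribute__((alias("startup_helper")));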
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [RFT PATCH v3 05/21] x86/sev: Move GHCB page based HV communication out of startup code
2025-05-20 11:49 ` Ard Biesheuvel
@ 2025-05-20 13:58 ` Borislav Petkov
0 siblings, 0 replies; 58+ messages in thread
From: Borislav Petkov @ 2025-05-20 13:58 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: Ard Biesheuvel, linux-kernel, linux-efi, x86, Ingo Molnar,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky
On Tue, May 20, 2025 at 01:49:35PM +0200, Ard Biesheuvel wrote:
> OK. In that case, it will be implemented in the startup code then, and
> exported to the rest of the core kernel via a PIC alias.
Right, let's try that for now - we'll see how it all settles with time and
whether we have to stick it into an internal header eventually.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [RFT PATCH v3 12/21] x86/sev: Unify SEV-SNP hypervisor feature check
2025-05-12 19:08 ` [RFT PATCH v3 12/21] x86/sev: Unify SEV-SNP hypervisor feature check Ard Biesheuvel
@ 2025-05-30 11:16 ` Borislav Petkov
2025-05-30 14:28 ` Ard Biesheuvel
0 siblings, 1 reply; 58+ messages in thread
From: Borislav Petkov @ 2025-05-30 11:16 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: linux-kernel, linux-efi, x86, Ard Biesheuvel, Ingo Molnar,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky
On Mon, May 12, 2025 at 09:08:47PM +0200, Ard Biesheuvel wrote:
> diff --git a/arch/x86/boot/startup/sev-shared.c b/arch/x86/boot/startup/sev-shared.c
> index 70ad9a0aa023..560985ef8df6 100644
> --- a/arch/x86/boot/startup/sev-shared.c
> +++ b/arch/x86/boot/startup/sev-shared.c
> @@ -66,16 +66,10 @@ sev_es_terminate(unsigned int set, unsigned int reason)
> asm volatile("hlt\n" : : : "memory");
> }
>
> -/*
> - * The hypervisor features are available from GHCB version 2 onward.
> - */
> -u64 get_hv_features(void)
> +static u64 __head get_hv_features(void)
> {
> u64 val;
>
> - if (ghcb_version < 2)
> - return 0;
Why?
> -
> sev_es_wr_ghcb_msr(GHCB_MSR_HV_FT_REQ);
> VMGEXIT();
>
> @@ -86,6 +80,31 @@ u64 get_hv_features(void)
> return GHCB_MSR_HV_FT_RESP_VAL(val);
> }
>
> +u64 __head snp_check_hv_features(void)
> +{
> + /*
> + * SNP is supported in v2 of the GHCB spec which mandates support for HV
> + * features.
> + */
> + if (sev_status & MSR_AMD64_SEV_SNP_ENABLED) {
> + u64 hv_features;
> +
> + hv_features = get_hv_features();
> + if (!(hv_features & GHCB_HV_FT_SNP))
> + sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SNP_UNSUPPORTED);
> +
> + /*
> + * Running at VMPL0 is required unless an SVSM is present and
> + * the hypervisor supports the required SVSM GHCB events.
> + */
> + if (snp_vmpl > 0 && !(hv_features & GHCB_HV_FT_SNP_MULTI_VMPL))
> + sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_NOT_VMPL0);
> +
> + return hv_features;
> + }
> + return 0;
> +}
> +
> int svsm_perform_msr_protocol(struct svsm_call *call)
> {
> u8 pending = 0;
> diff --git a/arch/x86/boot/startup/sme.c b/arch/x86/boot/startup/sme.c
> index 753cd2094080..0ae04e333f51 100644
> --- a/arch/x86/boot/startup/sme.c
> +++ b/arch/x86/boot/startup/sme.c
> @@ -533,6 +533,8 @@ void __head sme_enable(struct boot_params *bp)
> if (snp_en ^ !!(msr & MSR_AMD64_SEV_SNP_ENABLED))
> snp_abort();
>
> + sev_hv_features = snp_check_hv_features();
Is this writing the sev_hv_features declared in
arch/x86/boot/startup/sev-startup.c
?
arch/x86/boot/startup/sev-startup.c:45:u64 sev_hv_features __ro_after_init;
arch/x86/boot/startup/sme.c:536: sev_hv_features = snp_check_hv_features();
arch/x86/coco/sev/core.c:980: if (!(sev_hv_features & GHCB_HV_FT_SNP_AP_CREATION))
arch/x86/include/asm/sev-internal.h:5:extern u64 sev_hv_features;
arch/x86/include/asm/sev.h:428:extern u64 sev_hv_features;
I'm confused.
If sev_hv_features is startup data, why isn't it accessed this way...?
/me goes and looks forward in the set...
oh my, that is coming.
> +
> /* Check if memory encryption is enabled */
> if (feature_mask == AMD_SME_BIT) {
> if (!(bp->hdr.xloadflags & XLF_MEM_ENCRYPTION))
> diff --git a/arch/x86/coco/sev/core.c b/arch/x86/coco/sev/core.c
> index fa7fdd11a45b..fc4f6f188d42 100644
> --- a/arch/x86/coco/sev/core.c
> +++ b/arch/x86/coco/sev/core.c
> @@ -1264,17 +1264,6 @@ void __init sev_es_init_vc_handling(void)
> if (!sev_es_check_cpu_features())
> panic("SEV-ES CPU Features missing");
>
> - /*
> - * SNP is supported in v2 of the GHCB spec which mandates support for HV
> - * features.
> - */
> - if (cc_platform_has(CC_ATTR_GUEST_SEV_SNP)) {
> - sev_hv_features = get_hv_features();
> -
> - if (!(sev_hv_features & GHCB_HV_FT_SNP))
> - sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SNP_UNSUPPORTED);
> - }
I guess we would've terminated earlier anyway so that's probably ok...
> -
> /* Initialize per-cpu GHCB pages */
> for_each_possible_cpu(cpu) {
> alloc_runtime_data(cpu);
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [RFT PATCH v3 12/21] x86/sev: Unify SEV-SNP hypervisor feature check
2025-05-30 11:16 ` Borislav Petkov
@ 2025-05-30 14:28 ` Ard Biesheuvel
2025-05-30 16:08 ` Borislav Petkov
0 siblings, 1 reply; 58+ messages in thread
From: Ard Biesheuvel @ 2025-05-30 14:28 UTC (permalink / raw)
To: Borislav Petkov
Cc: Ard Biesheuvel, linux-kernel, linux-efi, x86, Ingo Molnar,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky
On Fri, 30 May 2025 at 13:17, Borislav Petkov <bp@alien8.de> wrote:
>
> On Mon, May 12, 2025 at 09:08:47PM +0200, Ard Biesheuvel wrote:
> > diff --git a/arch/x86/boot/startup/sev-shared.c b/arch/x86/boot/startup/sev-shared.c
> > index 70ad9a0aa023..560985ef8df6 100644
> > --- a/arch/x86/boot/startup/sev-shared.c
> > +++ b/arch/x86/boot/startup/sev-shared.c
> > @@ -66,16 +66,10 @@ sev_es_terminate(unsigned int set, unsigned int reason)
> > asm volatile("hlt\n" : : : "memory");
> > }
> >
> > -/*
> > - * The hypervisor features are available from GHCB version 2 onward.
> > - */
> > -u64 get_hv_features(void)
> > +static u64 __head get_hv_features(void)
> > {
> > u64 val;
> >
> > - if (ghcb_version < 2)
> > - return 0;
>
> Why?
>
It is explained below ...
> > -
> > sev_es_wr_ghcb_msr(GHCB_MSR_HV_FT_REQ);
> > VMGEXIT();
> >
> > @@ -86,6 +80,31 @@ u64 get_hv_features(void)
> > return GHCB_MSR_HV_FT_RESP_VAL(val);
> > }
> >
> > +u64 __head snp_check_hv_features(void)
> > +{
> > + /*
> > + * SNP is supported in v2 of the GHCB spec which mandates support for HV
> > + * features.
> > + */
... get_hv_features() is only called when SEV-SNP has already been detected.
> > + if (sev_status & MSR_AMD64_SEV_SNP_ENABLED) {
> > + u64 hv_features;
> > +
> > + hv_features = get_hv_features();
> > + if (!(hv_features & GHCB_HV_FT_SNP))
> > + sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SNP_UNSUPPORTED);
> > +
> > + /*
> > + * Running at VMPL0 is required unless an SVSM is present and
> > + * the hypervisor supports the required SVSM GHCB events.
> > + */
> > + if (snp_vmpl > 0 && !(hv_features & GHCB_HV_FT_SNP_MULTI_VMPL))
> > + sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_NOT_VMPL0);
> > +
> > + return hv_features;
> > + }
> > + return 0;
> > +}
> > +
> > int svsm_perform_msr_protocol(struct svsm_call *call)
> > {
> > u8 pending = 0;
> > diff --git a/arch/x86/boot/startup/sme.c b/arch/x86/boot/startup/sme.c
> > index 753cd2094080..0ae04e333f51 100644
> > --- a/arch/x86/boot/startup/sme.c
> > +++ b/arch/x86/boot/startup/sme.c
> > @@ -533,6 +533,8 @@ void __head sme_enable(struct boot_params *bp)
> > if (snp_en ^ !!(msr & MSR_AMD64_SEV_SNP_ENABLED))
> > snp_abort();
> >
> > + sev_hv_features = snp_check_hv_features();
>
> Is this writing the sev_hv_features declared in
>
> arch/x86/boot/startup/sev-startup.c
>
> ?
>
> arch/x86/boot/startup/sev-startup.c:45:u64 sev_hv_features __ro_after_init;
> arch/x86/boot/startup/sme.c:536: sev_hv_features = snp_check_hv_features();
> arch/x86/coco/sev/core.c:980: if (!(sev_hv_features & GHCB_HV_FT_SNP_AP_CREATION))
> arch/x86/include/asm/sev-internal.h:5:extern u64 sev_hv_features;
> arch/x86/include/asm/sev.h:428:extern u64 sev_hv_features;
>
> I'm confused.
>
> If sev_hv_features is startup code, why isn't it accessed this way...?
>
> /me goes and looks forward in the set...
>
> oh my, that is coming.
>
> > +
> > /* Check if memory encryption is enabled */
> > if (feature_mask == AMD_SME_BIT) {
> > if (!(bp->hdr.xloadflags & XLF_MEM_ENCRYPTION))
> > diff --git a/arch/x86/coco/sev/core.c b/arch/x86/coco/sev/core.c
> > index fa7fdd11a45b..fc4f6f188d42 100644
> > --- a/arch/x86/coco/sev/core.c
> > +++ b/arch/x86/coco/sev/core.c
> > @@ -1264,17 +1264,6 @@ void __init sev_es_init_vc_handling(void)
> > if (!sev_es_check_cpu_features())
> > panic("SEV-ES CPU Features missing");
> >
> > - /*
> > - * SNP is supported in v2 of the GHCB spec which mandates support for HV
> > - * features.
> > - */
> > - if (cc_platform_has(CC_ATTR_GUEST_SEV_SNP)) {
> > - sev_hv_features = get_hv_features();
> > -
> > - if (!(sev_hv_features & GHCB_HV_FT_SNP))
> > - sev_es_terminate(SEV_TERM_SET_GEN, GHCB_SNP_UNSUPPORTED);
> > - }
>
> I guess we would've terminated earlier anyway so that's probably ok...
>
> > -
> > /* Initialize per-cpu GHCB pages */
> > for_each_possible_cpu(cpu) {
> > alloc_runtime_data(cpu);
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [RFT PATCH v3 12/21] x86/sev: Unify SEV-SNP hypervisor feature check
2025-05-30 14:28 ` Ard Biesheuvel
@ 2025-05-30 16:08 ` Borislav Petkov
2025-05-30 16:12 ` Ard Biesheuvel
0 siblings, 1 reply; 58+ messages in thread
From: Borislav Petkov @ 2025-05-30 16:08 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: Ard Biesheuvel, linux-kernel, linux-efi, x86, Ingo Molnar,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky
On Fri, May 30, 2025 at 04:28:52PM +0200, Ard Biesheuvel wrote:
> > > +u64 __head snp_check_hv_features(void)
> > > +{
> > > + /*
> > > + * SNP is supported in v2 of the GHCB spec which mandates support for HV
> > > + * features.
> > > + */
>
> ... get_hv_features() is only called when SEV-SNP has already been detected.
Hmm, I see
void sev_enable(struct boot_params *bp)
{
...
/*
* Setup/preliminary detection of SNP. This will be sanity-checked
* against CPUID/MSR values later.
*/
snp = early_snp_init(bp);
...
snp_check_hv_features();
if (snp && !(sev_status & MSR_AMD64_SEV_SNP_ENABLED))
This is called here without checking the snp boolean.
And without checking the version it is fragile anyway. Why do you even need to
remove the version check?
Just leave it in.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [RFT PATCH v3 12/21] x86/sev: Unify SEV-SNP hypervisor feature check
2025-05-30 16:08 ` Borislav Petkov
@ 2025-05-30 16:12 ` Ard Biesheuvel
2025-05-30 16:55 ` Borislav Petkov
0 siblings, 1 reply; 58+ messages in thread
From: Ard Biesheuvel @ 2025-05-30 16:12 UTC (permalink / raw)
To: Borislav Petkov
Cc: Ard Biesheuvel, linux-kernel, linux-efi, x86, Ingo Molnar,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky
On Fri, 30 May 2025 at 18:08, Borislav Petkov <bp@alien8.de> wrote:
>
> On Fri, May 30, 2025 at 04:28:52PM +0200, Ard Biesheuvel wrote:
> > > > +u64 __head snp_check_hv_features(void)
> > > > +{
> > > > + /*
> > > > + * SNP is supported in v2 of the GHCB spec which mandates support for HV
> > > > + * features.
> > > > + */
> >
> > ... get_hv_features() is only called when SEV-SNP has already been detected.
>
> Hmm, I see
>
> void sev_enable(struct boot_params *bp)
> {
> ...
>
> /*
> * Setup/preliminary detection of SNP. This will be sanity-checked
> * against CPUID/MSR values later.
> */
> snp = early_snp_init(bp);
>
> ...
>
> snp_check_hv_features();
>
> if (snp && !(sev_status & MSR_AMD64_SEV_SNP_ENABLED))
>
> This is called here without checking the snp boolean.
>
> And without checking the version it is fragile anyway. Why do you even need to
> remove the version check?
>
Because the assignment is moved out of the startup code later.
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [RFT PATCH v3 12/21] x86/sev: Unify SEV-SNP hypervisor feature check
2025-05-30 16:12 ` Ard Biesheuvel
@ 2025-05-30 16:55 ` Borislav Petkov
0 siblings, 0 replies; 58+ messages in thread
From: Borislav Petkov @ 2025-05-30 16:55 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: Ard Biesheuvel, linux-kernel, linux-efi, x86, Ingo Molnar,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky
On Fri, May 30, 2025 at 06:12:43PM +0200, Ard Biesheuvel wrote:
> Because the assignment is moved out of the startup code later.
Yeah, we will have to check the GHCB version, though.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 58+ messages in thread
* Re: [RFT PATCH v3 20/21] x86/boot: Revert "Reject absolute references in .head.text"
2025-05-12 19:08 ` [RFT PATCH v3 20/21] x86/boot: Revert "Reject absolute references in .head.text" Ard Biesheuvel
@ 2025-06-01 9:39 ` Borislav Petkov
0 siblings, 0 replies; 58+ messages in thread
From: Borislav Petkov @ 2025-06-01 9:39 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: linux-kernel, linux-efi, x86, Ard Biesheuvel, Ingo Molnar,
Dionna Amalie Glaze, Kevin Loughlin, Tom Lendacky
On Mon, May 12, 2025 at 09:08:55PM +0200, Ard Biesheuvel wrote:
> From: Ard Biesheuvel <ardb@kernel.org>
>
> This reverts commit faf0ed487415f76fe4acf7980ce360901f5e1698.
Let's make that pretty:
faf0ed487415 ("x86/boot: Reject absolute references in .head.text")
Other than that, I guess the changes look ok modulo the comments I've made.
You could send a new version once the dust settles so that Tom and I can run
them on everything.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 58+ messages in thread
end of thread, other threads:[~2025-06-01 9:39 UTC | newest]
Thread overview: 58+ messages (download: mbox.gz / follow: Atom feed
-- links below jump to the message on this page --
2025-05-12 19:08 [RFT PATCH v3 00/21] x86: strict separation of startup code Ard Biesheuvel
2025-05-12 19:08 ` [RFT PATCH v3 01/21] x86/sev: Separate MSR and GHCB based snp_cpuid() via a callback Ard Biesheuvel
2025-05-15 7:22 ` Ingo Molnar
2025-05-15 10:24 ` Ard Biesheuvel
2025-05-15 15:18 ` Ingo Molnar
2025-05-15 11:10 ` Borislav Petkov
2025-05-15 14:22 ` Ard Biesheuvel
2025-05-12 19:08 ` [RFT PATCH v3 02/21] x86/sev: Use MSR protocol for remapping SVSM calling area Ard Biesheuvel
2025-05-15 16:43 ` Borislav Petkov
2025-05-12 19:08 ` [RFT PATCH v3 03/21] x86/sev: Use MSR protocol only for early SVSM PVALIDATE call Ard Biesheuvel
2025-05-12 19:08 ` [RFT PATCH v3 04/21] x86/sev: Run RMPADJUST on SVSM calling area page to test VMPL Ard Biesheuvel
2025-05-20 9:44 ` Borislav Petkov
2025-05-12 19:08 ` [RFT PATCH v3 05/21] x86/sev: Move GHCB page based HV communication out of startup code Ard Biesheuvel
2025-05-20 11:38 ` Borislav Petkov
2025-05-20 11:49 ` Ard Biesheuvel
2025-05-20 13:58 ` Borislav Petkov
2025-05-12 19:08 ` [RFT PATCH v3 06/21] x86/sev: Avoid global variable to store virtual address of SVSM area Ard Biesheuvel
2025-05-12 19:08 ` [RFT PATCH v3 07/21] x86/sev: Move MSR save/restore out of early page state change helper Ard Biesheuvel
2025-05-12 19:08 ` [RFT PATCH v3 08/21] x86/sev: Share implementation of MSR-based page state change Ard Biesheuvel
2025-05-12 19:08 ` [RFT PATCH v3 09/21] x86/sev: Pass SVSM calling area down to early page state change API Ard Biesheuvel
2025-05-13 13:55 ` Ard Biesheuvel
2025-05-13 13:58 ` Ard Biesheuvel
2025-05-12 19:08 ` [RFT PATCH v3 10/21] x86/sev: Use boot SVSM CA for all startup and init code Ard Biesheuvel
2025-05-12 19:08 ` [RFT PATCH v3 11/21] x86/boot: Drop redundant RMPADJUST in SEV SVSM presence check Ard Biesheuvel
2025-05-12 19:08 ` [RFT PATCH v3 12/21] x86/sev: Unify SEV-SNP hypervisor feature check Ard Biesheuvel
2025-05-30 11:16 ` Borislav Petkov
2025-05-30 14:28 ` Ard Biesheuvel
2025-05-30 16:08 ` Borislav Petkov
2025-05-30 16:12 ` Ard Biesheuvel
2025-05-30 16:55 ` Borislav Petkov
2025-05-12 19:08 ` [RFT PATCH v3 13/21] x86/sev: Provide PIC aliases for SEV related data objects Ard Biesheuvel
2025-05-12 19:08 ` [RFT PATCH v3 14/21] x86/boot: Provide PIC aliases for 5-level paging related constants Ard Biesheuvel
2025-05-12 19:08 ` [RFT PATCH v3 15/21] x86/sev: Move __sev_[get|put]_ghcb() into separate noinstr object Ard Biesheuvel
2025-05-12 19:08 ` [RFT PATCH v3 16/21] x86/sev: Export startup routines for later use Ard Biesheuvel
2025-05-12 19:08 ` [RFT PATCH v3 17/21] x86/boot: Create a confined code area for startup code Ard Biesheuvel
2025-05-12 19:08 ` [RFT PATCH v3 18/21] x86/boot: Move startup code out of __head section Ard Biesheuvel
2025-05-12 19:08 ` [RFT PATCH v3 19/21] x86/boot: Disallow absolute symbol references in startup code Ard Biesheuvel
2025-05-12 19:08 ` [RFT PATCH v3 20/21] x86/boot: Revert "Reject absolute references in .head.text" Ard Biesheuvel
2025-06-01 9:39 ` Borislav Petkov
2025-05-12 19:08 ` [RFT PATCH v3 21/21] x86/boot: Get rid of the .head.text section Ard Biesheuvel
2025-05-12 19:17 ` [RFT PATCH v3 00/21] x86: strict separation of startup code Borislav Petkov
2025-05-13 10:02 ` Ingo Molnar
2025-05-13 10:12 ` Borislav Petkov
2025-05-13 11:22 ` Ingo Molnar
2025-05-13 14:16 ` Borislav Petkov
2025-05-13 15:01 ` Ard Biesheuvel
2025-05-13 16:44 ` Borislav Petkov
2025-05-13 21:31 ` Ard Biesheuvel
2025-05-14 6:32 ` Ingo Molnar
2025-05-14 7:41 ` Ard Biesheuvel
2025-05-15 7:17 ` Ingo Molnar
2025-05-14 6:20 ` Ingo Molnar
2025-05-14 8:17 ` Borislav Petkov
2025-05-14 8:21 ` Borislav Petkov
2025-05-14 9:54 ` Thomas Gleixner
2025-05-14 17:21 ` Borislav Petkov
2025-05-14 17:37 ` Ard Biesheuvel
2025-05-14 18:53 ` Borislav Petkov