From: Tom Lendacky <thomas.lendacky@amd.com>
To: Ard Biesheuvel <ardb@kernel.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
Ard Biesheuvel <ardb+git@google.com>,
linux-efi@vger.kernel.org, linux-coco@lists.linux.dev,
Borislav Petkov <bp@alien8.de>,
Dionna Amalie Glaze <dionnaglaze@google.com>,
Kevin Loughlin <kevinloughlin@google.com>
Subject: Re: [PATCH] efi/libstub: Do not accept parts of memory before ExitBootServices()
Date: Wed, 2 Apr 2025 13:32:02 -0500 [thread overview]
Message-ID: <afb6dc48-bde3-ffeb-19a3-70b1e5a15cd0@amd.com> (raw)
In-Reply-To: <CAMj1kXEUARHvxQjucikoy_qWkrVnqsMGQyt7mUrBeV-f-_ZQJg@mail.gmail.com>
On 4/1/25 13:45, Ard Biesheuvel wrote:
> On Tue, 1 Apr 2025 at 18:51, Tom Lendacky <thomas.lendacky@amd.com> wrote:
>>
>> On 3/26/25 04:28, Ard Biesheuvel wrote:
>>> On Tue, 25 Mar 2025 at 17:30, Tom Lendacky <thomas.lendacky@amd.com> wrote:
>>>>
>>>> On 3/25/25 09:39, Ard Biesheuvel wrote:
>>>>> On Tue, 25 Mar 2025 at 14:44, Kirill A. Shutemov
>>>>> <kirill.shutemov@linux.intel.com> wrote:
>>>>>>
>>>>>> On Tue, Mar 25, 2025 at 02:09:54PM +0100, Ard Biesheuvel wrote:
>>>>>>>> Since the problem happens before ExitBootServices(), can we allocate this
>>>>>>>> memory range with EFI API and free it back?
>>>>>>>>
>>>>>>>
>>>>>>> In principle, yes - we could allocate these misaligned chunks as
>>>>>>> EfiLoaderData, and it wouldn't even be necessary to free them, as they
>>>>>>> would become available to the OS automatically.
>>>>>>>
>>>>>>> But doing this in setup_e820() is tricky, because every page
>>>>>>> allocation modifies the EFI memory map, and we may have to restart
>>>>>>> from the beginning. And there is no guarantee that some asynchronous
>>>>>>> event in the firmware context does not attempt to allocate some pages,
>>>>>>> in a way that might result in another misaligned unaccepted region.
>>>>>>
>>>>>> Looking again at the code, setup_e820() (and therefore
>>>>>> process_unaccepted_memory()) called after efi_exit_boot_services() in
>>>>>> exit_boot(), so we can't use EFI API to allocate memory.
>>>>>>
>>>>>
>>>>> Ah yes, I misremembered that. It also means that it is fine in
>>>>> principle to take over the communication with the hypervisor.
>>>>>
>>>>> However, this is still tricky, because on SEV-SNP, accepting memory
>>>>> appears to rely on the GHCB page based communication being enabled,
>>>>> and this involves mapping it down to a single page so the C bit can be
>>>>> cleared. It would be nice if we could simply use the MSR based
>>>>> protocol for accepting memory.
>>>>
>>>> We can probably do something along this line since there is an existing
>>>> function, __page_state_change(), that performs MSR protocol PSC. If we
>>>> change the arch_accept_memory() calls in process_unaccepted_memory() to
>>>> arch_accept_memory_early() then we can differentiate between this early
>>>> alignment setup timeframe. The early function can also use
>>>> sev_get_status() instead of sev_snp_enabled().
>>>>
>>>> Let me mess around with it a bit and see what I come up with.
>>>>
>>>
>>> Cheers.
>>
>> This is what I came up with below. @Ard and @Kirill, let me know if it
>> looks ok to you and, if so, I'll submit a formal patch where we can work
>> on naming, etc.
>>
>
> Thanks for putting this together.
>
> Some questions below.
>
>>>
>>> So IIUC, it would be sufficient to check sev_get_status() against
>>> MSR_AMD64_SEV_SNP_ENABLED, and use the PSC MSR to transition each
>>> unaccepted page that is in the misaligned head or tail of the region
>>> to private.
>>>
>>> Pardon my ignorance, but does that mean that in principle,
>>> sev_enable() et al could be deferred to early startup of the kernel
>>> proper (where the other SEV startup code lives) ?
>>
>> I'm not sure if it can be. There is a bunch of code in the sev_enable()
>> path that I'm not sure can be moved. I'd have to look a lot closer to
>> determine that.
>>
>> Thanks,
>> Tom
>>
>>>
>>> We have been playing whack-a-mole with PIC codegen issues there, and
>>> so it might make sense to consolidate that logic into a single [PIC]
>>> chunk of code that is somewhat isolated from the rest of the code
>>> (like the kernel/pi code on arm64)
>>
>> diff --git a/arch/x86/boot/compressed/mem.c b/arch/x86/boot/compressed/mem.c
>> index dbba332e4a12..b115a73ca25e 100644
>> --- a/arch/x86/boot/compressed/mem.c
>> +++ b/arch/x86/boot/compressed/mem.c
>> @@ -32,19 +32,42 @@ static bool early_is_tdx_guest(void)
>> return is_tdx;
>> }
>>
>> -void arch_accept_memory(phys_addr_t start, phys_addr_t end)
>> +static bool is_sev_snp_enabled(bool early)
>> +{
>> + return early ? early_sev_snp_enabled() : sev_snp_enabled();
>
> Why is the latter test not suitable early on? Simply because
> sev_status is not initialized yet?
Right. I was following the example of calling sev_get_status() before
sev_enable() has been invoked.
>
>> +}
>> +
>> +static void __arch_accept_memory(phys_addr_t start, phys_addr_t end, bool early)
>> {
>> /* Platform-specific memory-acceptance call goes here */
>> if (early_is_tdx_guest()) {
>> if (!tdx_accept_memory(start, end))
>> panic("TDX: Failed to accept memory\n");
>> - } else if (sev_snp_enabled()) {
>> - snp_accept_memory(start, end);
>> + } else if (is_sev_snp_enabled(early)) {
>> + /*
>> + * Calls when memory acceptance is being setup require SNP
>> + * to use the GHCB protocol because the current pagetable
>> + * mappings can't change the early GHCB page to shared.
>> + */
>> + if (early)
>
> I think it would be better to structure this slightly differently.
> I'll have a stab at this myself and get back to you.
Sounds good.
>
>> + snp_accept_memory_early(start, end);
>> + else
>> + snp_accept_memory(start, end);
>> } else {
>> error("Cannot accept memory: unknown platform\n");
>> }
>> }
>>
>> +void arch_accept_memory(phys_addr_t start, phys_addr_t end)
>> +{
>> + __arch_accept_memory(start, end, false);
>> +}
>> +
>> +void arch_accept_memory_early(phys_addr_t start, phys_addr_t end)
>> +{
>> + __arch_accept_memory(start, end, true);
>> +}
>> +
>> bool init_unaccepted_memory(void)
>> {
>> guid_t guid = LINUX_EFI_UNACCEPTED_MEM_TABLE_GUID;
>> diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
>> index bb55934c1cee..162484d662f1 100644
>> --- a/arch/x86/boot/compressed/sev.c
>> +++ b/arch/x86/boot/compressed/sev.c
>> @@ -157,6 +157,12 @@ static int svsm_perform_call_protocol(struct svsm_call *call)
>> return ret;
>> }
>>
>> +/* Used before sev_enable() has been called during unaccepted memory init */
>> +bool early_sev_snp_enabled(void)
>> +{
>> + return sev_get_status() & MSR_AMD64_SEV_SNP_ENABLED;
>> +}
>> +
>> bool sev_snp_enabled(void)
>> {
>> return sev_status & MSR_AMD64_SEV_SNP_ENABLED;
>> @@ -164,10 +170,7 @@ bool sev_snp_enabled(void)
>>
>> static void __page_state_change(unsigned long paddr, enum psc_op op)
>> {
>> - u64 val;
>> -
>> - if (!sev_snp_enabled())
>> - return;
>> + u64 val, msr;
>>
>
> Could you explain how the below code now knows how to decide whether
> to use the MSR protocol or the GHCB page based one?
The __page_state_change() routine only performs the MSR protocol version
of a page state change, no decision is made between the two at this stage.
>
>> /*
>> * If private -> shared then invalidate the page before requesting the
>> @@ -176,6 +179,9 @@ static void __page_state_change(unsigned long paddr, enum psc_op op)
>> if (op == SNP_PAGE_STATE_SHARED)
>> pvalidate_4k_page(paddr, paddr, false);
>>
>> + /* Save the current GHCB MSR value */
>> + msr = sev_es_rd_ghcb_msr();
>> +
>> /* Issue VMGEXIT to change the page state in RMP table. */
>> sev_es_wr_ghcb_msr(GHCB_MSR_PSC_REQ_GFN(paddr >> PAGE_SHIFT, op));
>> VMGEXIT();
>> @@ -185,6 +191,9 @@ static void __page_state_change(unsigned long paddr, enum psc_op op)
>> if ((GHCB_RESP_CODE(val) != GHCB_MSR_PSC_RESP) || GHCB_MSR_PSC_RESP_VAL(val))
>> sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PSC);
>>
>> + /* Restore the GHCB MSR value */
>> + sev_es_wr_ghcb_msr(msr);
>> +
>> /*
>> * Now that page state is changed in the RMP table, validate it so that it is
>> * consistent with the RMP entry.
>> @@ -195,11 +204,17 @@ static void __page_state_change(unsigned long paddr, enum psc_op op)
>>
>> void snp_set_page_private(unsigned long paddr)
>> {
>> + if (!sev_snp_enabled())
>> + return;
>> +
>> __page_state_change(paddr, SNP_PAGE_STATE_PRIVATE);
>> }
>>
>> void snp_set_page_shared(unsigned long paddr)
>> {
>> + if (!sev_snp_enabled())
>> + return;
>> +
>> __page_state_change(paddr, SNP_PAGE_STATE_SHARED);
>> }
>>
>> @@ -261,6 +276,11 @@ static phys_addr_t __snp_accept_memory(struct snp_psc_desc *desc,
>> return pa;
>> }
>>
>> +/*
>> + * The memory acceptance support uses the boot GHCB page to perform
>> + * the required page state change operation before validating the
>> + * pages.
>> + */
>> void snp_accept_memory(phys_addr_t start, phys_addr_t end)
>> {
>> struct snp_psc_desc desc = {};
>> @@ -275,6 +295,23 @@ void snp_accept_memory(phys_addr_t start, phys_addr_t end)
>> pa = __snp_accept_memory(&desc, pa, end);
>> }
>>
>> +/*
>> + * The early version of memory acceptance is needed when being called
>> + * from the EFI stub driver. The pagetable manipulation to mark the
>> + * boot GHCB page as shared can't be performed at this stage, so use
>> + * the GHCB page state change MSR protocol instead.
>> + */
>> +void snp_accept_memory_early(phys_addr_t start, phys_addr_t end)
>> +{
>> + phys_addr_t pa;
>> +
>> + pa = start;
>> + while (pa < end) {
>
> Nit: please make this
>
> for (phys_addr_t pa = start; pa < end; pa += PAGE_SIZE)
>
> and drop the braces
Doh! Yes.
>
>> + __page_state_change(pa, SNP_PAGE_STATE_PRIVATE);
>> + pa += PAGE_SIZE;
>> + }
>> +}
>> +
>> void sev_es_shutdown_ghcb(void)
>> {
>> if (!boot_ghcb)
>> diff --git a/arch/x86/boot/compressed/sev.h b/arch/x86/boot/compressed/sev.h
>> index fc725a981b09..8a135c9c043a 100644
>> --- a/arch/x86/boot/compressed/sev.h
>> +++ b/arch/x86/boot/compressed/sev.h
>> @@ -10,13 +10,17 @@
>>
>> #ifdef CONFIG_AMD_MEM_ENCRYPT
>>
>> +bool early_sev_snp_enabled(void);
>> bool sev_snp_enabled(void);
>> void snp_accept_memory(phys_addr_t start, phys_addr_t end);
>> +void snp_accept_memory_early(phys_addr_t start, phys_addr_t end);
>>
>> #else
>>
>> +static inline bool early_sev_snp_enabled(void) { return false; }
>> static inline bool sev_snp_enabled(void) { return false; }
>> static inline void snp_accept_memory(phys_addr_t start, phys_addr_t end) { }
>> +static inline void snp_accept_memory_early(phys_addr_t start, phys_addr_t end) { }
>>
>> #endif
>>
>> diff --git a/drivers/firmware/efi/libstub/efistub.h b/drivers/firmware/efi/libstub/efistub.h
>> index d96d4494070d..676aa30df52e 100644
>> --- a/drivers/firmware/efi/libstub/efistub.h
>> +++ b/drivers/firmware/efi/libstub/efistub.h
>> @@ -1233,5 +1233,6 @@ efi_status_t allocate_unaccepted_bitmap(__u32 nr_desc,
>> void process_unaccepted_memory(u64 start, u64 end);
>> void accept_memory(phys_addr_t start, unsigned long size);
>> void arch_accept_memory(phys_addr_t start, phys_addr_t end);
>> +void arch_accept_memory_early(phys_addr_t start, phys_addr_t end);
>>
>> #endif
>> diff --git a/drivers/firmware/efi/libstub/unaccepted_memory.c b/drivers/firmware/efi/libstub/unaccepted_memory.c
>> index 757dbe734a47..1955eddc85f1 100644
>> --- a/drivers/firmware/efi/libstub/unaccepted_memory.c
>> +++ b/drivers/firmware/efi/libstub/unaccepted_memory.c
>> @@ -118,7 +118,7 @@ void process_unaccepted_memory(u64 start, u64 end)
>> * immediately accepted in its entirety.
>> */
>> if (end - start < 2 * unit_size) {
>> - arch_accept_memory(start, end);
>> + arch_accept_memory_early(start, end);
>> return;
>> }
>>
>> @@ -129,13 +129,13 @@ void process_unaccepted_memory(u64 start, u64 end)
>>
>> /* Immediately accept a <unit_size piece at the start: */
>> if (start & unit_mask) {
>> - arch_accept_memory(start, round_up(start, unit_size));
>> + arch_accept_memory_early(start, round_up(start, unit_size));
>> start = round_up(start, unit_size);
>> }
>>
>> /* Immediately accept a <unit_size piece at the end: */
>> if (end & unit_mask) {
>> - arch_accept_memory(round_down(end, unit_size), end);
>> + arch_accept_memory_early(round_down(end, unit_size), end);
>> end = round_down(end, unit_size);
>> }
>>
>> @@ -144,8 +144,8 @@ void process_unaccepted_memory(u64 start, u64 end)
>> * into the bitmap.
>> */
>> if (start < unaccepted_table->phys_base) {
>> - arch_accept_memory(start,
>> - min(unaccepted_table->phys_base, end));
>> + arch_accept_memory_early(start,
>> + min(unaccepted_table->phys_base, end));
>> start = unaccepted_table->phys_base;
>> }
>>
>> @@ -165,7 +165,7 @@ void process_unaccepted_memory(u64 start, u64 end)
>> unaccepted_table->phys_base;
>> phys_end = end + unaccepted_table->phys_base;
>>
>> - arch_accept_memory(phys_start, phys_end);
>> + arch_accept_memory_early(phys_start, phys_end);
>> end = bitmap_size * unit_size * BITS_PER_BYTE;
>> }
>>
>>
>>
prev parent reply other threads:[~2025-04-02 18:32 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-03-25 9:16 [PATCH] efi/libstub: Do not accept parts of memory before ExitBootServices() Ard Biesheuvel
2025-03-25 12:36 ` Kirill A. Shutemov
2025-03-25 12:41 ` Ard Biesheuvel
2025-03-25 12:59 ` Kirill A. Shutemov
2025-03-25 13:09 ` Ard Biesheuvel
2025-03-25 13:44 ` Kirill A. Shutemov
2025-03-25 14:39 ` Ard Biesheuvel
2025-03-25 16:30 ` Tom Lendacky
2025-03-26 9:28 ` Ard Biesheuvel
2025-04-01 15:51 ` Tom Lendacky
2025-04-01 18:45 ` Ard Biesheuvel
2025-04-02 18:32 ` Tom Lendacky [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=afb6dc48-bde3-ffeb-19a3-70b1e5a15cd0@amd.com \
--to=thomas.lendacky@amd.com \
--cc=ardb+git@google.com \
--cc=ardb@kernel.org \
--cc=bp@alien8.de \
--cc=dionnaglaze@google.com \
--cc=kevinloughlin@google.com \
--cc=kirill.shutemov@linux.intel.com \
--cc=linux-coco@lists.linux.dev \
--cc=linux-efi@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox