* Re: [PATCHv8 17/17] ACPI: tables: Print MULTIPROC_WAKEUP when MADT is parsed
From: Kuppuswamy Sathyanarayanan @ 2024-02-27 21:30 UTC
To: Kirill A. Shutemov, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86
Cc: Rafael J. Wysocki, Peter Zijlstra, Adrian Hunter, Elena Reshetova,
Jun Nakajima, Rick Edgecombe, Tom Lendacky, Kalra, Ashish,
Sean Christopherson, Huang, Kai, Baoquan He, kexec, linux-coco,
linux-kernel
On 2/27/24 1:24 PM, Kirill A. Shutemov wrote:
> When MADT is parsed, print MULTIPROC_WAKEUP information:
>
> ACPI: MP Wakeup (version[1], mailbox[0x7fffd000], reset[0x7fffe068])
>
> This debug information will be very helpful during bring up.
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Reviewed-by: Baoquan He <bhe@redhat.com>
> ---
Looks good to me.
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> drivers/acpi/tables.c | 14 ++++++++++++++
> 1 file changed, 14 insertions(+)
>
> diff --git a/drivers/acpi/tables.c b/drivers/acpi/tables.c
> index b07f7d091d13..c59a3617bca7 100644
> --- a/drivers/acpi/tables.c
> +++ b/drivers/acpi/tables.c
> @@ -198,6 +198,20 @@ void acpi_table_print_madt_entry(struct acpi_subtable_header *header)
> }
> break;
>
> + case ACPI_MADT_TYPE_MULTIPROC_WAKEUP:
> + {
> + struct acpi_madt_multiproc_wakeup *p =
> + (struct acpi_madt_multiproc_wakeup *)header;
> + u64 reset_vector = 0;
> +
> + if (p->version >= ACPI_MADT_MP_WAKEUP_VERSION_V1)
> + reset_vector = p->reset_vector;
> +
> + pr_debug("MP Wakeup (version[%d], mailbox[%#llx], reset[%#llx])\n",
> + p->version, p->mailbox_address, reset_vector);
> + }
> + break;
> +
> case ACPI_MADT_TYPE_CORE_PIC:
> {
> struct acpi_madt_core_pic *p = (struct acpi_madt_core_pic *)header;
--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer
_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec
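The version gate in the patch only reads `reset_vector` when the MADT entry is version 1 or later, and otherwise prints 0. The following is a minimal userspace sketch of that logic. `struct mp_wakeup_entry` and `MP_WAKEUP_VERSION_V1` are hypothetical stand-ins for `struct acpi_madt_multiproc_wakeup` and `ACPI_MADT_MP_WAKEUP_VERSION_V1`; the layout is simplified and is not the packed ACPICA definition.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical, simplified stand-in for struct acpi_madt_multiproc_wakeup.
 * Field names follow the patch; the real structure is packed and starts
 * with an ACPI subtable header, which is omitted here.
 */
struct mp_wakeup_entry {
	uint16_t version;
	uint64_t mailbox_address;
	uint64_t reset_vector; /* only defined for version >= 1 */
};

/* Assumed to mirror ACPI_MADT_MP_WAKEUP_VERSION_V1. */
#define MP_WAKEUP_VERSION_V1 1

/* Version 0 entries have no reset vector field, so report 0 instead. */
static uint64_t effective_reset_vector(const struct mp_wakeup_entry *p)
{
	return p->version >= MP_WAKEUP_VERSION_V1 ? p->reset_vector : 0;
}

/* Format the entry like the patch's pr_debug(); returns snprintf()'s length. */
static int format_mp_wakeup(const struct mp_wakeup_entry *p, char *buf, size_t len)
{
	return snprintf(buf, len,
			"MP Wakeup (version[%d], mailbox[%#llx], reset[%#llx])",
			p->version,
			(unsigned long long)p->mailbox_address,
			(unsigned long long)effective_reset_vector(p));
}
```

With version 1, mailbox 0x7fffd000 and reset vector 0x7fffe068, this reproduces the example line from the commit message; for a version 0 entry the reset field prints as `reset[0]`.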
* Re: [PATCHv8 17/17] ACPI: tables: Print MULTIPROC_WAKEUP when MADT is parsed
From: Huang, Kai @ 2024-02-27 22:08 UTC
To: Kirill A. Shutemov, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86
Cc: Rafael J. Wysocki, Peter Zijlstra, Adrian Hunter,
Kuppuswamy Sathyanarayanan, Elena Reshetova, Jun Nakajima,
Rick Edgecombe, Tom Lendacky, Kalra, Ashish, Sean Christopherson,
Baoquan He, kexec, linux-coco, linux-kernel
On 28/02/2024 10:24 am, Kirill A. Shutemov wrote:
> When MADT is parsed, print MULTIPROC_WAKEUP information:
>
> ACPI: MP Wakeup (version[1], mailbox[0x7fffd000], reset[0x7fffe068])
>
> This debug information will be very helpful during bring up.
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Reviewed-by: Baoquan He <bhe@redhat.com>
> ---
> drivers/acpi/tables.c | 14 ++++++++++++++
> 1 file changed, 14 insertions(+)
>
> diff --git a/drivers/acpi/tables.c b/drivers/acpi/tables.c
> index b07f7d091d13..c59a3617bca7 100644
> --- a/drivers/acpi/tables.c
> +++ b/drivers/acpi/tables.c
> @@ -198,6 +198,20 @@ void acpi_table_print_madt_entry(struct acpi_subtable_header *header)
> }
> break;
>
> + case ACPI_MADT_TYPE_MULTIPROC_WAKEUP:
> + {
> + struct acpi_madt_multiproc_wakeup *p =
> + (struct acpi_madt_multiproc_wakeup *)header;
> + u64 reset_vector = 0;
> +
> + if (p->version >= ACPI_MADT_MP_WAKEUP_VERSION_V1)
> + reset_vector = p->reset_vector;
> +
> + pr_debug("MP Wakeup (version[%d], mailbox[%#llx], reset[%#llx])\n",
> + p->version, p->mailbox_address, reset_vector);
> + }
> + break;
> +
Hmm.. I hate to say, but maybe it is better to put this patch at some
early place in this series w/o mailbox version and reset_vector, and add
incremental changes where mailbox/reset_vector is introduced in this series.
The advantage is in this way someone can just backport this patch to the
old kernel if they care -- this should be part of commit f39642d0dbacd
("x86/acpi/x86/boot: Add multiprocessor wake-up support") anyway.
But I guess nobody really cares since it just prints some dmesg, and
nobody really noticed this until this series.
So, up to you, and feel free to add:
Acked-by: Kai Huang <kai.huang@intel.com>
* Re: [PATCHv8 08/17] x86/tdx: Account shared memory
From: Huang, Kai @ 2024-02-27 23:12 UTC
To: Kirill A. Shutemov, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86
Cc: Rafael J. Wysocki, Peter Zijlstra, Adrian Hunter,
Kuppuswamy Sathyanarayanan, Elena Reshetova, Jun Nakajima,
Rick Edgecombe, Tom Lendacky, Kalra, Ashish, Sean Christopherson,
Baoquan He, kexec, linux-coco, linux-kernel
On 28/02/2024 10:24 am, Kirill A. Shutemov wrote:
> The kernel will convert all shared memory back to private during kexec.
> The direct mapping page tables will provide information on which memory
> is shared.
>
> It is extremely important to convert all shared memory. If a page is
> missed, it will cause the second kernel to crash when it accesses it.
>
> Keep track of the number of shared pages. This will allow for
> cross-checking against the shared information in the direct mapping and
> reporting if the shared bit is lost.
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
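The accounting idea in this patch (keep a running count of shared pages, then cross-check it against what the kexec-time walk of the direct mapping actually finds) can be illustrated with a small userspace simulation. Everything here is hypothetical: `page_shared[]` stands in for the shared bit in the direct-map page tables, and `convert_to_shared()` / `unshare_all()` are illustrative names, not kernel functions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define SIM_NR_PAGES 16 /* hypothetical size of the simulated direct map */

/* Simulated direct-map state: true means the page is mapped shared (decrypted). */
static bool page_shared[SIM_NR_PAGES];
/* Running count updated on every conversion, independent of the page tables. */
static long nr_shared;

/* Hypothetical analogue of set_memory_decrypted(): mark pages shared and account them. */
static void convert_to_shared(size_t pfn, size_t npages)
{
	for (size_t i = pfn; i < pfn + npages && i < SIM_NR_PAGES; i++) {
		if (!page_shared[i]) {
			page_shared[i] = true;
			nr_shared++;
		}
	}
}

/*
 * kexec-time walk: convert every page found shared back to private and
 * cross-check the result against the running count.
 */
static long unshare_all(void)
{
	long found = 0;

	for (size_t i = 0; i < SIM_NR_PAGES; i++) {
		if (page_shared[i]) {
			page_shared[i] = false;
			found++;
		}
	}
	if (found != nr_shared)
		fprintf(stderr, "shared page count mismatch: accounted %ld, found %ld\n",
			nr_shared, found);
	return found;
}
```

If a shared bit is lost (simulated by clearing one entry behind the counter's back), the walk finds fewer pages than were accounted and the mismatch is reported, which is exactly the situation the counter is meant to expose.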
* Re: [PATCHv8 09/17] x86/mm: Adding callbacks to prepare encrypted memory for kexec
From: Huang, Kai @ 2024-02-27 23:16 UTC
To: Kirill A. Shutemov, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86
Cc: Rafael J. Wysocki, Peter Zijlstra, Adrian Hunter,
Kuppuswamy Sathyanarayanan, Elena Reshetova, Jun Nakajima,
Rick Edgecombe, Tom Lendacky, Kalra, Ashish, Sean Christopherson,
Baoquan He, kexec, linux-coco, linux-kernel, Nikolay Borisov
On 28/02/2024 10:24 am, Kirill A. Shutemov wrote:
> AMD SEV and Intel TDX guests allocate shared buffers for performing I/O.
> This is done by allocating pages normally from the buddy allocator and
> then converting them to shared using set_memory_decrypted().
>
> On kexec, the second kernel is unaware of which memory has been
> converted in this manner. It only sees E820_TYPE_RAM. Accessing shared
> memory as private is fatal.
>
> Therefore, the memory state must be reset to its original state before
> starting the new kernel with kexec.
>
> The process of converting shared memory back to private occurs in two
> steps:
>
> - enc_kexec_stop_conversion() stops new conversions.
>
> - enc_kexec_unshare_mem() unshares all existing shared memory, reverting
> it back to private.
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
> ---
Reviewed-by: Kai Huang <kai.huang@intel.com>
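The two-step shape described in this patch (first `enc_kexec_stop_conversion()`, then `enc_kexec_unshare_mem()`) can be sketched as an ops table whose callbacks are invoked in a fixed order. The struct and the demo callbacks below are hypothetical and only record call order; they are not the kernel's `x86_platform` hooks.

```c
#include <stdbool.h>

/* Hypothetical ops table holding the two callbacks named in the patch. */
struct enc_kexec_ops {
	void (*stop_conversion)(bool crash);
	void (*unshare_mem)(void);
};

/* Call-order bookkeeping for the demo callbacks below. */
static int next_step, stop_step, unshare_step;

static void demo_stop_conversion(bool crash)
{
	(void)crash;
	stop_step = ++next_step;
}

static void demo_unshare_mem(void)
{
	unshare_step = ++next_step;
}

/*
 * Order matters: block new private<->shared conversions first, then revert
 * whatever is already shared. Per the series' description, the other CPUs
 * are stopped and interrupts disabled between the two steps.
 */
static void prepare_for_kexec(const struct enc_kexec_ops *ops, bool crash)
{
	if (ops->stop_conversion)
		ops->stop_conversion(crash);
	if (ops->unshare_mem)
		ops->unshare_mem();
}
```

Stopping conversions first matters because otherwise a conversion racing with the unshare walk could create new shared memory after the walk has already passed over it.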
* Re: [PATCHv8 10/17] x86/tdx: Convert shared memory back to private on kexec
From: Huang, Kai @ 2024-02-27 23:30 UTC
To: Kirill A. Shutemov, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86
Cc: Rafael J. Wysocki, Peter Zijlstra, Adrian Hunter,
Kuppuswamy Sathyanarayanan, Elena Reshetova, Jun Nakajima,
Rick Edgecombe, Tom Lendacky, Kalra, Ashish, Sean Christopherson,
Baoquan He, kexec, linux-coco, linux-kernel
>
> +/* Stop new private<->shared conversions */
> +static void tdx_kexec_stop_conversion(bool crash)
> +{
> + /*
> + * Crash kernel reaches here with interrupts disabled: can't wait for
> + * conversions to finish.
> + *
> + * If race happened, just report and proceed.
> + */
> + bool wait_for_lock = !crash;
> +
> + if (!stop_memory_enc_conversion(wait_for_lock))
> + pr_warn("Failed to finish shared<->private conversions\n");
"Failed to finish" -> "Failed to stop"? stop_memory_enc_conversion()
doesn't actually finish any conversion.
Other than that:
Reviewed-by: Kai Huang <kai.huang@intel.com>
* Re: [PATCHv8 06/17] x86/mm: Make x86_platform.guest.enc_status_change_*() return errno
From: Huang, Kai @ 2024-02-27 23:33 UTC
To: Kirill A. Shutemov, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
Dave Hansen, x86
Cc: Rafael J. Wysocki, Peter Zijlstra, Adrian Hunter,
Kuppuswamy Sathyanarayanan, Elena Reshetova, Jun Nakajima,
Rick Edgecombe, Tom Lendacky, Kalra, Ashish, Sean Christopherson,
Baoquan He, kexec, linux-coco, linux-kernel, Dave Hansen
On 28/02/2024 10:24 am, Kirill A. Shutemov wrote:
> TDX is going to have more than one reason to fail
> enc_status_change_prepare().
>
> Change the callback to return errno instead of assuming -EIO;
> enc_status_change_finish() changed too to keep the interface symmetric.
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Reviewed-by: Dave Hansen <dave.hansen@intel.com>
> ---
Acked-by: Kai Huang <kai.huang@intel.com>
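Returning an errno from the status-change callbacks lets the caller propagate the specific failure reason instead of mapping every failure to -EIO. The following is a hedged sketch of that calling convention; `struct enc_status_ops`, `change_enc_status()` and the demo hooks are hypothetical and only loosely modeled on `x86_platform.guest.enc_status_change_prepare()/finish()`.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical hooks, now returning 0 on success or a negative errno. */
struct enc_status_ops {
	int (*prepare)(unsigned long vaddr, int npages, bool enc);
	int (*finish)(unsigned long vaddr, int npages, bool enc);
};

/* Propagate the callback's own error instead of mapping every failure to -EIO. */
static int change_enc_status(const struct enc_status_ops *ops,
			     unsigned long vaddr, int npages, bool enc)
{
	int ret = ops->prepare(vaddr, npages, enc);

	if (ret)
		return ret;
	/* ...the page-table update would happen here... */
	return ops->finish(vaddr, npages, enc);
}

static int hook_ok(unsigned long vaddr, int npages, bool enc)
{
	(void)vaddr; (void)npages; (void)enc;
	return 0;
}

/* Hypothetical second failure reason, e.g. conversions being blocked. */
static int hook_busy(unsigned long vaddr, int npages, bool enc)
{
	(void)vaddr; (void)npages; (void)enc;
	return -EBUSY;
}
```

Keeping `finish()` on the same `int` convention as `prepare()` is what the patch means by keeping the interface symmetric.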
* Re: [PATCHv8 17/17] ACPI: tables: Print MULTIPROC_WAKEUP when MADT is parsed
From: Kirill A. Shutemov @ 2024-02-28 15:22 UTC
To: Huang, Kai
Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
Rafael J. Wysocki, Peter Zijlstra, Adrian Hunter,
Kuppuswamy Sathyanarayanan, Elena Reshetova, Jun Nakajima,
Rick Edgecombe, Tom Lendacky, Kalra, Ashish, Sean Christopherson,
Baoquan He, kexec, linux-coco, linux-kernel
On Wed, Feb 28, 2024 at 11:08:38AM +1300, Huang, Kai wrote:
>
>
> On 28/02/2024 10:24 am, Kirill A. Shutemov wrote:
> > When MADT is parsed, print MULTIPROC_WAKEUP information:
> >
> > ACPI: MP Wakeup (version[1], mailbox[0x7fffd000], reset[0x7fffe068])
> >
> > This debug information will be very helpful during bring up.
> >
> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > Reviewed-by: Baoquan He <bhe@redhat.com>
> > ---
> > drivers/acpi/tables.c | 14 ++++++++++++++
> > 1 file changed, 14 insertions(+)
> >
> > diff --git a/drivers/acpi/tables.c b/drivers/acpi/tables.c
> > index b07f7d091d13..c59a3617bca7 100644
> > --- a/drivers/acpi/tables.c
> > +++ b/drivers/acpi/tables.c
> > @@ -198,6 +198,20 @@ void acpi_table_print_madt_entry(struct acpi_subtable_header *header)
> > }
> > break;
> > + case ACPI_MADT_TYPE_MULTIPROC_WAKEUP:
> > + {
> > + struct acpi_madt_multiproc_wakeup *p =
> > + (struct acpi_madt_multiproc_wakeup *)header;
> > + u64 reset_vector = 0;
> > +
> > + if (p->version >= ACPI_MADT_MP_WAKEUP_VERSION_V1)
> > + reset_vector = p->reset_vector;
> > +
> > + pr_debug("MP Wakeup (version[%d], mailbox[%#llx], reset[%#llx])\n",
> > + p->version, p->mailbox_address, reset_vector);
> > + }
> > + break;
> > +
>
> Hmm.. I hate to say, but maybe it is better to put this patch at some early
> place in this series w/o mailbox version and reset_vector, and add
> incremental changes where mailbox/reset_vector is introduced in this series.
>
> The advantage is in this way someone can just backport this patch to the old
> kernel if they care -- this should be part of commit f39642d0dbacd
> ("x86/acpi/x86/boot: Add multiprocessor wake-up support") anyway.
It is not a candidate for backporting. It is just a cosmetic fix (or debug
facility). Any MADT type the kernel does not recognize generates a warning;
there is nothing wrong with that.
--
Kiryl Shutsemau / Kirill A. Shutemov
* Re: [PATCHv8 17/17] ACPI: tables: Print MULTIPROC_WAKEUP when MADT is parsed
From: Huang, Kai @ 2024-02-28 21:19 UTC
To: Kirill A. Shutemov
Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
Rafael J. Wysocki, Peter Zijlstra, Adrian Hunter,
Kuppuswamy Sathyanarayanan, Elena Reshetova, Jun Nakajima,
Rick Edgecombe, Tom Lendacky, Kalra, Ashish, Sean Christopherson,
Baoquan He, kexec, linux-coco, linux-kernel
On 29/02/2024 4:22 am, Kirill A. Shutemov wrote:
> On Wed, Feb 28, 2024 at 11:08:38AM +1300, Huang, Kai wrote:
>>
>>
>> On 28/02/2024 10:24 am, Kirill A. Shutemov wrote:
>>> When MADT is parsed, print MULTIPROC_WAKEUP information:
>>>
>>> ACPI: MP Wakeup (version[1], mailbox[0x7fffd000], reset[0x7fffe068])
>>>
>>> This debug information will be very helpful during bring up.
>>>
>>> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>>> Reviewed-by: Baoquan He <bhe@redhat.com>
>>> ---
>>> drivers/acpi/tables.c | 14 ++++++++++++++
>>> 1 file changed, 14 insertions(+)
>>>
>>> diff --git a/drivers/acpi/tables.c b/drivers/acpi/tables.c
>>> index b07f7d091d13..c59a3617bca7 100644
>>> --- a/drivers/acpi/tables.c
>>> +++ b/drivers/acpi/tables.c
>>> @@ -198,6 +198,20 @@ void acpi_table_print_madt_entry(struct acpi_subtable_header *header)
>>> }
>>> break;
>>> + case ACPI_MADT_TYPE_MULTIPROC_WAKEUP:
>>> + {
>>> + struct acpi_madt_multiproc_wakeup *p =
>>> + (struct acpi_madt_multiproc_wakeup *)header;
>>> + u64 reset_vector = 0;
>>> +
>>> + if (p->version >= ACPI_MADT_MP_WAKEUP_VERSION_V1)
>>> + reset_vector = p->reset_vector;
>>> +
>>> + pr_debug("MP Wakeup (version[%d], mailbox[%#llx], reset[%#llx])\n",
>>> + p->version, p->mailbox_address, reset_vector);
>>> + }
>>> + break;
>>> +
>>
>> Hmm.. I hate to say, but maybe it is better to put this patch at some early
>> place in this series w/o mailbox version and reset_vector, and add
>> incremental changes where mailbox/reset_vector is introduced in this series.
>>
>> The advantage is in this way someone can just backport this patch to the old
>> kernel if they care -- this should be part of commit f39642d0dbacd
>> ("x86/acpi/x86/boot: Add multiprocessor wake-up support") anyway.
>
> It is not a candidate for backporting. It is just a cosmetic fix (or debug
> facility). Any MADT type the kernel does not recognize generates a warning;
> there is nothing wrong with that.
>
OK fine to me. Thanks.
* Re: [PATCHv8 00/17, CORRECTED] x86/tdx: Add kexec support
From: Kirill A. Shutemov @ 2024-03-06 15:02 UTC
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86
Cc: Rafael J. Wysocki, Peter Zijlstra, Adrian Hunter,
Kuppuswamy Sathyanarayanan, Elena Reshetova, Jun Nakajima,
Rick Edgecombe, Tom Lendacky, Kalra, Ashish, Sean Christopherson,
Huang, Kai, Baoquan He, kexec, linux-coco, linux-kernel
On Tue, Feb 27, 2024 at 11:24:35PM +0200, Kirill A. Shutemov wrote:
> The patchset adds bits and pieces to get kexec (and crashkernel) working on
> TDX guests.
>
> The last patch implements CPU offlining according to the approved ACPI
> spec change proposal[1]. It unlocks kexec with all CPUs visible in the target
> kernel. It requires BIOS-side enabling. If it is missing, we fall back to
> booting the 2nd kernel with a single CPU.
>
> Please review. I would be glad for any feedback.
Thomas, Ingo, Borislav, Dave,
Any feedback?
Is there anything else I can do to get the patchset moving?
--
Kiryl Shutsemau / Kirill A. Shutemov
* Re: [PATCHv8 00/17, CORRECTED] x86/tdx: Add kexec support
From: Tao Liu @ 2024-03-07 6:57 UTC
To: Kirill A. Shutemov
Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
Rafael J. Wysocki, Peter Zijlstra, Adrian Hunter,
Kuppuswamy Sathyanarayanan, Elena Reshetova, Jun Nakajima,
Rick Edgecombe, Tom Lendacky, Kalra, Ashish, Sean Christopherson,
Huang, Kai, Baoquan He, kexec, linux-coco, linux-kernel
Hi Kirill,
On Wed, Mar 06, 2024 at 05:02:45PM +0200, Kirill A. Shutemov wrote:
> On Tue, Feb 27, 2024 at 11:24:35PM +0200, Kirill A. Shutemov wrote:
> > The patchset adds bits and pieces to get kexec (and crashkernel) working on
> > TDX guests.
> >
> > The last patch implements CPU offlining according to the approved ACPI
> > spec change proposal[1]. It unlocks kexec with all CPUs visible in the target
> > kernel. It requires BIOS-side enabling. If it is missing, we fall back to
> > booting the 2nd kernel with a single CPU.
> >
> > Please review. I would be glad for any feedback.
>
> Thomas, Ingo, Borislav, Dave,
>
> Any feedback?
>
> Is there anything else I can do to get the patchset moving?
>
I tested the patchset on Linux 6.8-rc6 and found no problems. With the
patchset, a vmcore can be generated successfully in a TDX VM and can be
analyzed with the crash utility.
Tested-by: Tao Liu <ltao@redhat.com>
Thanks,
Tao Liu
> --
> Kiryl Shutsemau / Kirill A. Shutemov
* [PATCH v2 0/3] x86/snp: Add kexec support
From: Ashish Kalra @ 2024-03-18 7:02 UTC
To: tglx, mingo, dave.hansen
Cc: rafael, peterz, adrian.hunter, sathyanarayanan.kuppuswamy,
elena.reshetova, jun.nakajima, rick.p.edgecombe, thomas.lendacky,
seanjc, michael.roth, kai.huang, bhe, kexec, linux-coco,
linux-kernel, kirill.shutemov, bdas, vkuznets, dionnaglaze,
anisinha, jroedel
From: Ashish Kalra <ashish.kalra@amd.com>
The patchset adds bits and pieces to get kexec (and crashkernel) working on
SNP guests.
v2:
- address zeroing of unaccepted memory table mappings at all page table
  levels by adding the check to phys_pte_init(), phys_pud_init() and
  phys_p4d_init().
- include "efi/x86: skip efi_arch_mem_reserve() in case of kexec" as part
  of this patch set.
- rename last_address_shd_kexec to the more descriptive
  kexec_last_addr_to_make_private.
- remove code duplicated with TDX and use the common kexec/kdump
  interfaces defined for SNP and TDX.
- remove the set_pte_enc() dependency on pg_level_to_pfn() and simplify
  the function.
- rename unshare_pte() to make_pte_private().
- clarify the comment explaining the use of kexec_last_addr_to_make_private.
- general cleanup.
Ashish Kalra (3):
efi/x86: skip efi_arch_mem_reserve() in case of kexec.
x86/mm: Do not zap page table entries mapping unaccepted memory table
during kdump.
x86/snp: Convert shared memory back to private on kexec
arch/x86/include/asm/probe_roms.h | 1 +
arch/x86/include/asm/sev.h | 4 +
arch/x86/kernel/probe_roms.c | 16 +++
arch/x86/kernel/sev.c | 169 ++++++++++++++++++++++++++++++
arch/x86/mm/init_64.c | 16 ++-
arch/x86/mm/mem_encrypt_amd.c | 3 +
arch/x86/platform/efi/quirks.c | 10 ++
7 files changed, 215 insertions(+), 4 deletions(-)
--
2.34.1
* [PATCH v2 1/3] efi/x86: skip efi_arch_mem_reserve() in case of kexec.
From: Ashish Kalra @ 2024-03-18 7:02 UTC
To: tglx, mingo, dave.hansen
Cc: rafael, peterz, adrian.hunter, sathyanarayanan.kuppuswamy,
elena.reshetova, jun.nakajima, rick.p.edgecombe, thomas.lendacky,
seanjc, michael.roth, kai.huang, bhe, kexec, linux-coco,
linux-kernel, kirill.shutemov, bdas, vkuznets, dionnaglaze,
anisinha, jroedel
From: Ashish Kalra <ashish.kalra@amd.com>
For the kexec use case, the kernel needs to use and stick to the EFI memmap
passed from the first kernel via boot_params/setup data, so skip
efi_arch_mem_reserve() during kexec.
Additionally, SNP guest kexec testing revealed that the EFI memmap is
corrupted during chained kexec: kexec_enter_virtual_mode() during late init
remaps the efi_memmap physical pages allocated in efi_arch_mem_reserve() via
memblock, which subsequently causes random EFI memmap corruption once
memblock is freed/torn down.
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
arch/x86/platform/efi/quirks.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index f0cc00032751..d4562d074371 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -258,6 +258,16 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size)
int num_entries;
void *new;
+ /*
+ * For the kexec use case, we need to use the EFI memmap passed from the
+ * first kernel via setup data, so skip this.
+ * Additionally, kexec_enter_virtual_mode() during late init would remap
+ * the efi_memmap physical pages allocated here via memblock and then
+ * cause random EFI memmap corruption once memblock is freed.
+ */
+ if (efi_setup)
+ return;
+
if (efi_mem_desc_lookup(addr, &md) ||
md.type != EFI_BOOT_SERVICES_DATA) {
pr_err("Failed to lookup EFI memory descriptor for %pa\n", &addr);
--
2.34.1
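The effect of the early return is that a kexec'd kernel never rewrites the EFI memmap it inherited. A toy model of that guard follows; `efi_setup` mirrors the name tested in the patch, while `memmap_rewrites` and `arch_mem_reserve()` are hypothetical stand-ins that only count how often a new memmap would have been installed.

```c
#include <stdint.h>

/* Stand-in for efi_setup: nonzero when booted via kexec with EFI setup data. */
static uint64_t efi_setup;
/* Counts how often a new memmap would have been installed (illustration only). */
static int memmap_rewrites;

static void arch_mem_reserve(uint64_t addr, uint64_t size)
{
	(void)addr;
	(void)size;
	/* kexec boot: keep the memmap inherited from the first kernel untouched. */
	if (efi_setup)
		return;
	/* First boot: the real code would insert an entry and install a new memmap. */
	memmap_rewrites++;
}
```

Only the first-boot call changes the memmap; once `efi_setup` is set, repeated reservations leave it alone, so no memblock-backed copy is created that could later be freed under it.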
* [PATCH v2 2/3] x86/mm: Do not zap page table entries mapping unaccepted memory table during kdump.
From: Ashish Kalra @ 2024-03-18 7:02 UTC
To: tglx, mingo, dave.hansen
Cc: rafael, peterz, adrian.hunter, sathyanarayanan.kuppuswamy,
elena.reshetova, jun.nakajima, rick.p.edgecombe, thomas.lendacky,
seanjc, michael.roth, kai.huang, bhe, kexec, linux-coco,
linux-kernel, kirill.shutemov, bdas, vkuznets, dionnaglaze,
anisinha, jroedel
From: Ashish Kalra <ashish.kalra@amd.com>
During crashkernel boot, only the pre-allocated crash memory is presented
as E820_TYPE_RAM. This can cause page table entries mapping the unaccepted
memory table to be zapped during phys_pte_init(), phys_pmd_init(),
phys_pud_init() and phys_p4d_init(), because SNP/TDX guests use
E820_TYPE_ACPI to store the unaccepted memory table and pass it between
the kernels on kexec/kdump.
E820_TYPE_ACPI covers not only ACPI data but also EFI tables, and that
memory may be required by the kernel to function properly.
The problem was discovered while debugging kdump for an SNP guest. The
unaccepted memory table, stored as E820_TYPE_ACPI and passed between the
kernels on kdump, was getting zapped because the PMD entry mapping it lies
above the E820_TYPE_RAM range of the reserved crashkernel memory.
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
arch/x86/mm/init_64.c | 16 ++++++++++++----
1 file changed, 12 insertions(+), 4 deletions(-)
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index a0dffaca6d2b..cc294a9e9fd7 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -469,7 +469,9 @@ phys_pte_init(pte_t *pte_page, unsigned long paddr, unsigned long paddr_end,
!e820__mapped_any(paddr & PAGE_MASK, paddr_next,
E820_TYPE_RAM) &&
!e820__mapped_any(paddr & PAGE_MASK, paddr_next,
- E820_TYPE_RESERVED_KERN))
+ E820_TYPE_RESERVED_KERN) &&
+ !e820__mapped_any(paddr & PAGE_MASK, paddr_next,
+ E820_TYPE_ACPI))
set_pte_init(pte, __pte(0), init);
continue;
}
@@ -524,7 +526,9 @@ phys_pmd_init(pmd_t *pmd_page, unsigned long paddr, unsigned long paddr_end,
!e820__mapped_any(paddr & PMD_MASK, paddr_next,
E820_TYPE_RAM) &&
!e820__mapped_any(paddr & PMD_MASK, paddr_next,
- E820_TYPE_RESERVED_KERN))
+ E820_TYPE_RESERVED_KERN) &&
+ !e820__mapped_any(paddr & PMD_MASK, paddr_next,
+ E820_TYPE_ACPI))
set_pmd_init(pmd, __pmd(0), init);
continue;
}
@@ -611,7 +615,9 @@ phys_pud_init(pud_t *pud_page, unsigned long paddr, unsigned long paddr_end,
!e820__mapped_any(paddr & PUD_MASK, paddr_next,
E820_TYPE_RAM) &&
!e820__mapped_any(paddr & PUD_MASK, paddr_next,
- E820_TYPE_RESERVED_KERN))
+ E820_TYPE_RESERVED_KERN) &&
+ !e820__mapped_any(paddr & PUD_MASK, paddr_next,
+ E820_TYPE_ACPI))
set_pud_init(pud, __pud(0), init);
continue;
}
@@ -698,7 +704,9 @@ phys_p4d_init(p4d_t *p4d_page, unsigned long paddr, unsigned long paddr_end,
!e820__mapped_any(paddr & P4D_MASK, paddr_next,
E820_TYPE_RAM) &&
!e820__mapped_any(paddr & P4D_MASK, paddr_next,
- E820_TYPE_RESERVED_KERN))
+ E820_TYPE_RESERVED_KERN) &&
+ !e820__mapped_any(paddr & P4D_MASK, paddr_next,
+ E820_TYPE_ACPI))
set_p4d_init(p4d, __p4d(0), init);
continue;
}
--
2.34.1
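The change adds a third condition to the "zap this entry" test at every page table level: a range is only cleared if it overlaps none of RAM, kernel-reserved or ACPI-typed memory. The sketch below models that decision over a toy E820 table. The enum values, table contents and function names are hypothetical; `mapped_any()` follows the overlap semantics of `e820__mapped_any()` (true if any part of the range overlaps an entry of the given type).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type tags; the kernel's E820_TYPE_* values differ. */
enum sim_e820_type { SIM_RAM, SIM_RESERVED_KERN, SIM_ACPI };

struct sim_e820_range {
	uint64_t start, end; /* half-open [start, end) */
	enum sim_e820_type type;
};

/* Crashkernel view: only the crash region is RAM; a table sits in ACPI memory. */
static const struct sim_e820_range table[] = {
	{ 0x01000000, 0x09000000, SIM_RAM },
	{ 0x7f000000, 0x7f010000, SIM_ACPI },
};

/* True if any part of [start, end) overlaps an entry of the given type. */
static bool mapped_any(uint64_t start, uint64_t end, enum sim_e820_type type)
{
	for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
		if (table[i].type == type &&
		    table[i].start < end && start < table[i].end)
			return true;
	}
	return false;
}

/* Condition before the patch. */
static bool should_zap_old(uint64_t start, uint64_t end)
{
	return !mapped_any(start, end, SIM_RAM) &&
	       !mapped_any(start, end, SIM_RESERVED_KERN);
}

/* Condition after the patch: ACPI-typed ranges keep their mapping too. */
static bool should_zap(uint64_t start, uint64_t end)
{
	return should_zap_old(start, end) &&
	       !mapped_any(start, end, SIM_ACPI);
}
```

For a 2 MiB PMD-sized range covering the ACPI-typed table, the old condition would zap the entry while the new one keeps it; unrelated ranges are still cleared.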
* [PATCH v2 3/3] x86/snp: Convert shared memory back to private on kexec
From: Ashish Kalra @ 2024-03-18 7:02 UTC
To: tglx, mingo, dave.hansen
Cc: rafael, peterz, adrian.hunter, sathyanarayanan.kuppuswamy,
elena.reshetova, jun.nakajima, rick.p.edgecombe, thomas.lendacky,
seanjc, michael.roth, kai.huang, bhe, kexec, linux-coco,
linux-kernel, kirill.shutemov, bdas, vkuznets, dionnaglaze,
anisinha, jroedel
From: Ashish Kalra <ashish.kalra@amd.com>
SNP guests allocate shared buffers to perform I/O. This is done by
allocating pages normally from the buddy allocator and converting them
to shared with set_memory_decrypted().
The second kernel has no idea which memory was converted this way; it only
sees E820_TYPE_RAM.
Accessing shared memory via a private mapping causes unrecoverable RMP
page faults.
On kexec, walk the direct mapping and convert all shared memory back to
private. This makes all RAM private again, so the second kernel can use it
normally. Additionally, for SNP guests, convert all pages of the decrypted
BSS section back to private and switch the ROM regions back to shared, so
that their revalidation does not fail during the kexec kernel boot.
The conversion occurs in two steps: stopping new conversions and
unsharing all memory. In the case of a normal kexec, conversions are
stopped while scheduling is still functioning, which allows waiting until
any ongoing conversions finish. The second step is carried out when all
CPUs except one are inactive and interrupts are disabled, which prevents
conflicts with code that may access shared memory.
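Two small pieces of arithmetic in this patch can be checked in isolation: `set_pte_enc()` makes a mapping private by setting the C-bit in a non-empty PTE, and `snp_kexec_unprep_rom_memory()` turns an inclusive resource range into a page count with `PAGE_ALIGN(sz) >> PAGE_SHIFT`. The sketch below assumes the C-bit is bit 51 (real guests read its position from CPUID 0x8000001F) and omits the cache flush; all `sim_*` names are hypothetical, and 0xc0000-0xfffff is only an example range.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: C-bit at bit 51; real guests read the position from CPUID 0x8000001F. */
#define SIM_C_BIT (1ULL << 51)
#define SIM_PAGE_SHIFT 12
#define SIM_PAGE_SIZE (1UL << SIM_PAGE_SHIFT)
#define SIM_PAGE_ALIGN(x) (((x) + SIM_PAGE_SIZE - 1) & ~(SIM_PAGE_SIZE - 1))

/* Rough analogue of cc_mkenc() on SEV: setting the C-bit marks the mapping private. */
static uint64_t sim_mkenc(uint64_t pteval)
{
	return pteval | SIM_C_BIT;
}

/* Control flow of set_pte_enc(), minus the clflush of present pages. */
static bool sim_set_pte_enc(uint64_t *pte)
{
	if (*pte == 0) /* pte_none(): nothing is mapped here */
		return false;
	*pte = sim_mkenc(*pte);
	return true;
}

/* Page count for an inclusive [start, end] resource, as in the ROM helper. */
static unsigned long sim_range_pages(unsigned long start, unsigned long end_inclusive)
{
	unsigned long sz = (end_inclusive + 1) - start;

	return SIM_PAGE_ALIGN(sz) >> SIM_PAGE_SHIFT;
}
```

The `+ 1` matters because resource `end` values are inclusive: a 256 KiB region from 0xc0000 to 0xfffff spans 64 pages, and even a single-byte range rounds up to one page.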
Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
---
arch/x86/include/asm/probe_roms.h | 1 +
arch/x86/include/asm/sev.h | 4 +
arch/x86/kernel/probe_roms.c | 16 +++
arch/x86/kernel/sev.c | 169 ++++++++++++++++++++++++++++++
arch/x86/mm/mem_encrypt_amd.c | 3 +
5 files changed, 193 insertions(+)
diff --git a/arch/x86/include/asm/probe_roms.h b/arch/x86/include/asm/probe_roms.h
index 1c7f3815bbd6..d50b67dbff33 100644
--- a/arch/x86/include/asm/probe_roms.h
+++ b/arch/x86/include/asm/probe_roms.h
@@ -6,4 +6,5 @@ struct pci_dev;
extern void __iomem *pci_map_biosrom(struct pci_dev *pdev);
extern void pci_unmap_biosrom(void __iomem *rom);
extern size_t pci_biosrom_size(struct pci_dev *pdev);
+extern void snp_kexec_unprep_rom_memory(void);
#endif
diff --git a/arch/x86/include/asm/sev.h b/arch/x86/include/asm/sev.h
index d7b27cb34c2b..867518b9bcad 100644
--- a/arch/x86/include/asm/sev.h
+++ b/arch/x86/include/asm/sev.h
@@ -229,6 +229,8 @@ void snp_accept_memory(phys_addr_t start, phys_addr_t end);
u64 snp_get_unsupported_features(u64 status);
u64 sev_get_status(void);
void kdump_sev_callback(void);
+void snp_kexec_unshare_mem(void);
+void snp_kexec_stop_conversion(bool crash);
#else
static inline void sev_es_ist_enter(struct pt_regs *regs) { }
static inline void sev_es_ist_exit(void) { }
@@ -258,6 +260,8 @@ static inline void snp_accept_memory(phys_addr_t start, phys_addr_t end) { }
static inline u64 snp_get_unsupported_features(u64 status) { return 0; }
static inline u64 sev_get_status(void) { return 0; }
static inline void kdump_sev_callback(void) { }
+static inline void snp_kexec_unshare_mem(void) { }
+static inline void snp_kexec_stop_conversion(bool crash) { }
#endif
#ifdef CONFIG_KVM_AMD_SEV
diff --git a/arch/x86/kernel/probe_roms.c b/arch/x86/kernel/probe_roms.c
index 319fef37d9dc..457f1e5c8d00 100644
--- a/arch/x86/kernel/probe_roms.c
+++ b/arch/x86/kernel/probe_roms.c
@@ -177,6 +177,22 @@ size_t pci_biosrom_size(struct pci_dev *pdev)
}
EXPORT_SYMBOL(pci_biosrom_size);
+void snp_kexec_unprep_rom_memory(void)
+{
+ unsigned long vaddr, npages, sz;
+
+ /*
+ * Switch ROM regions back to shared so that their validation
+ * does not fail when the kexec kernel boots.
+ */
+ vaddr = (unsigned long)__va(video_rom_resource.start);
+ sz = (system_rom_resource.end + 1) - video_rom_resource.start;
+ npages = PAGE_ALIGN(sz) >> PAGE_SHIFT;
+
+ snp_set_memory_shared(vaddr, npages);
+}
+EXPORT_SYMBOL(snp_kexec_unprep_rom_memory);
+
#define ROMSIGNATURE 0xaa55
static int __init romsignature(const unsigned char *rom)
diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
index 1ef7ae806a01..7443a9620a31 100644
--- a/arch/x86/kernel/sev.c
+++ b/arch/x86/kernel/sev.c
@@ -40,6 +40,7 @@
#include <asm/apic.h>
#include <asm/cpuid.h>
#include <asm/cmdline.h>
+#include <asm/probe_roms.h>
#define DR7_RESET_VALUE 0x400
@@ -71,6 +72,9 @@ static struct ghcb *boot_ghcb __section(".data");
/* Bitmap of SEV features supported by the hypervisor */
static u64 sev_hv_features __ro_after_init;
+/* Last address to be switched to private during kexec */
+static unsigned long kexec_last_addr_to_make_private;
+
/* #VC handler runtime per-CPU data */
struct sev_es_runtime_data {
struct ghcb ghcb_page;
@@ -906,6 +910,171 @@ void snp_accept_memory(phys_addr_t start, phys_addr_t end)
set_pages_state(vaddr, npages, SNP_PAGE_STATE_PRIVATE);
}
+static bool set_pte_enc(pte_t *kpte, int level, void *va)
+{
+ pte_t new_pte;
+
+ if (pte_none(*kpte))
+ return false;
+
+ /*
+ * Change the physical page attribute from C=0 to C=1. Flush the
+ * caches to ensure that data gets accessed with the correct C-bit.
+ */
+ if (pte_present(*kpte))
+ clflush_cache_range(va, page_level_size(level));
+
+ new_pte = __pte(cc_mkenc(pte_val(*kpte)));
+ set_pte_atomic(kpte, new_pte);
+
+ return true;
+}
+
+static bool make_pte_private(pte_t *pte, unsigned long addr, int pages, int level)
+{
+ struct sev_es_runtime_data *data;
+ struct ghcb *ghcb;
+
+ data = this_cpu_read(runtime_data);
+ ghcb = &data->ghcb_page;
+
+ /* Check whether the GHCB lies within this (possibly PMD-sized) range. */
+ if ((unsigned long)ghcb >= addr &&
+ (unsigned long)ghcb < (addr + (pages * PAGE_SIZE))) {
+ /*
+ * Defer making the current CPU's GHCB private until the end of
+ * the unshare loop, so that the optimized GHCB protocol stays in
+ * use and the switch to the MSR protocol is forced only at the
+ * very end.
+ */
+ pr_debug("setting boot_ghcb to NULL for this cpu ghcb\n");
+ kexec_last_addr_to_make_private = addr;
+ return true;
+ }
+
+ if (!set_pte_enc(pte, level, (void *)addr))
+ return false;
+
+ snp_set_memory_private(addr, pages);
+
+ return true;
+}
+
+static void unshare_all_memory(void)
+{
+ unsigned long addr, end;
+
+ /*
+ * Walk the direct mapping and convert all shared memory back to private.
+ */
+
+ addr = PAGE_OFFSET;
+ end = PAGE_OFFSET + get_max_mapped();
+
+ while (addr < end) {
+ unsigned long size;
+ unsigned int level;
+ pte_t *pte;
+
+ pte = lookup_address(addr, &level);
+ size = page_level_size(level);
+
+ /*
+ * pte_none() check is required to skip physical memory holes in the direct mapping.
+ */
+ if (pte && pte_decrypted(*pte) && !pte_none(*pte)) {
+ int pages = size / PAGE_SIZE;
+
+ if (!make_pte_private(pte, addr, pages, level)) {
+ pr_err("Failed to unshare range %#lx-%#lx\n",
+ addr, addr + size);
+ }
+
+ }
+
+ addr += size;
+ }
+ __flush_tlb_all();
+
+}
+
+static void unshare_all_bss_decrypted_memory(void)
+{
+ unsigned long vaddr, vaddr_end;
+ unsigned long size;
+ unsigned int level;
+ unsigned int npages;
+ pte_t *pte;
+
+ vaddr = (unsigned long)__start_bss_decrypted;
+ vaddr_end = (unsigned long)__start_bss_decrypted_unused;
+ npages = (vaddr_end - vaddr) >> PAGE_SHIFT;
+ for (; vaddr < vaddr_end; vaddr += PAGE_SIZE) {
+ pte = lookup_address(vaddr, &level);
+ if (!pte || !pte_decrypted(*pte) || pte_none(*pte))
+ continue;
+
+ size = page_level_size(level);
+ set_pte_enc(pte, level, (void *)vaddr);
+ }
+ vaddr = (unsigned long)__start_bss_decrypted;
+ snp_set_memory_private(vaddr, npages);
+}
+
+/* Stop new private<->shared conversions */
+void snp_kexec_stop_conversion(bool crash)
+{
+ /*
+ * Crash kernel reaches here with interrupts disabled: can't wait for
+ * conversions to finish.
+ *
+ * If race happened, just report and proceed.
+ */
+ bool wait_for_lock = !crash;
+
+ if (!stop_memory_enc_conversion(wait_for_lock))
+ pr_warn("Failed to finish shared<->private conversions\n");
+}
+
+void snp_kexec_unshare_mem(void)
+{
+ if (!cc_platform_has(CC_ATTR_GUEST_SEV_SNP))
+ return;
+
+ /*
+ * Switch specific memory regions, such as option ROM regions,
+ * back to shared so that (re)validation does not fail when the
+ * kexec kernel boots.
+ */
+ snp_kexec_unprep_rom_memory();
+
+ unshare_all_memory();
+
+ unshare_all_bss_decrypted_memory();
+
+ if (kexec_last_addr_to_make_private) {
+ unsigned long size;
+ unsigned int level;
+ pte_t *pte;
+
+ /*
+ * Switch to using the MSR protocol to change this CPU's
+ * GHCB to private.
+ * All the per-CPU GHCBs have been switched back to private,
+ * so no more GHCB calls can be made to the hypervisor beyond
+ * this point until the kexec kernel starts running.
+ */
+ boot_ghcb = NULL;
+ sev_cfg.ghcbs_initialized = false;
+
+ pr_debug("boot ghcb 0x%lx\n", kexec_last_addr_to_make_private);
+ pte = lookup_address(kexec_last_addr_to_make_private, &level);
+ size = page_level_size(level);
+ set_pte_enc(pte, level, (void *)kexec_last_addr_to_make_private);
+ snp_set_memory_private(kexec_last_addr_to_make_private, (size / PAGE_SIZE));
+ }
+}
+
static int snp_set_vmsa(void *va, bool vmsa)
{
u64 attrs;
diff --git a/arch/x86/mm/mem_encrypt_amd.c b/arch/x86/mm/mem_encrypt_amd.c
index d314e577836d..dab2dc2207fb 100644
--- a/arch/x86/mm/mem_encrypt_amd.c
+++ b/arch/x86/mm/mem_encrypt_amd.c
@@ -468,6 +468,9 @@ void __init sme_early_init(void)
x86_platform.guest.enc_tlb_flush_required = amd_enc_tlb_flush_required;
x86_platform.guest.enc_cache_flush_required = amd_enc_cache_flush_required;
+ x86_platform.guest.enc_kexec_stop_conversion = snp_kexec_stop_conversion;
+ x86_platform.guest.enc_kexec_unshare_mem = snp_kexec_unshare_mem;
+
/*
* AMD-SEV-ES intercepts the RDMSR to read the X2APIC ID in the
* parallel bringup low level code. That raises #VC which cannot be
--
2.34.1
_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec
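To make the control flow of unshare_all_memory() and the deferred handling of the current CPU's GHCB easier to follow, here is a minimal user-space model in C. It is a sketch under stated assumptions, not kernel code: struct map_entry and unshare_model() are hypothetical, each array element stands in for one lookup_address() result, "present" models !pte_none() and "shared" models pte_decrypted(). The GHCB test uses a half-open [addr, addr + size) range.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one direct-map entry; not a kernel structure. */
struct map_entry {
	unsigned long addr;	/* start of the mapping */
	unsigned long size;	/* mapping size: 4 KiB, 2 MiB, ... */
	bool present;		/* false models a hole (pte_none()) */
	bool shared;		/* true models a decrypted (C=0) mapping */
};

/*
 * Convert every present, shared entry to private, deferring the entry
 * that contains the current CPU's GHCB until all others are done.
 * Returns the number of entries converted; *last_converted receives
 * the index of the entry converted last (n if none was converted).
 */
static size_t unshare_model(struct map_entry *map, size_t n,
			    unsigned long ghcb, size_t *last_converted)
{
	size_t deferred = n, converted = 0;

	*last_converted = n;
	for (size_t i = 0; i < n; i++) {
		struct map_entry *e = &map[i];

		if (!e->present || !e->shared)
			continue;	/* skip holes and private memory */

		/* Half-open test: the GHCB lies in [addr, addr + size). */
		if (ghcb >= e->addr && ghcb < e->addr + e->size) {
			deferred = i;	/* keep the GHCB usable for now */
			continue;
		}
		e->shared = false;
		converted++;
		*last_converted = i;
	}

	/* Final step; the real code switches to the MSR protocol first. */
	if (deferred < n) {
		map[deferred].shared = false;
		converted++;
		*last_converted = deferred;
	}
	return converted;
}
```

With a 2 MiB shared entry holding the GHCB and a separate 4 KiB shared entry, the model converts the 4 KiB entry first and the GHCB entry last. The strict `<` upper bound matters: with `<=`, a GHCB sitting exactly at the end of one range would also be claimed by that range, which would then be skipped and never converted.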
* Re: [PATCH v2 1/3] efi/x86: skip efi_arch_mem_reserve() in case of kexec.
2024-03-18 7:02 ` [PATCH v2 1/3] efi/x86: skip efi_arch_mem_reserve() in case of kexec Ashish Kalra
@ 2024-03-19 4:00 ` Dave Young
2024-03-24 22:32 ` Kalra, Ashish
0 siblings, 1 reply; 17+ messages in thread
From: Dave Young @ 2024-03-19 4:00 UTC (permalink / raw)
To: Ashish Kalra
Cc: tglx, mingo, dave.hansen, rafael, peterz, adrian.hunter,
sathyanarayanan.kuppuswamy, elena.reshetova, jun.nakajima,
rick.p.edgecombe, thomas.lendacky, seanjc, michael.roth,
kai.huang, bhe, kexec, linux-coco, linux-kernel, kirill.shutemov,
bdas, vkuznets, dionnaglaze, anisinha, jroedel, Ard Biesheuvel
Hi,
Added Ard in cc.
On 03/18/24 at 07:02am, Ashish Kalra wrote:
> From: Ashish Kalra <ashish.kalra@amd.com>
>
> In the kexec case, the EFI memmap passed from the first kernel via
> boot-params/setup data must be used as-is, so skip
> efi_arch_mem_reserve() during kexec.
>
> Additionally, SNP guest kexec testing showed that the EFI memmap is
> corrupted during chained kexec: kexec_enter_virtual_mode() during
> late init remaps the efi_memmap physical pages allocated in
> efi_arch_mem_reserve() via memblock, which subsequently causes random
> EFI memmap corruption once memblock is freed/torn down.
>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> ---
> arch/x86/platform/efi/quirks.c | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
> diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
> index f0cc00032751..d4562d074371 100644
> --- a/arch/x86/platform/efi/quirks.c
> +++ b/arch/x86/platform/efi/quirks.c
> @@ -258,6 +258,16 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size)
> int num_entries;
> void *new;
>
> + /*
> + * For kexec use case, we need to use the EFI memmap passed from the first
> + * kernel via setup data, so we need to skip this.
> + * Additionally kexec_enter_virtual_mode() during late init will remap
> + * the efi_memmap physical pages allocated here via memblock & then
> + * subsequently cause random EFI memmap corruption once memblock is freed.
Can you elaborate a bit on the corruption? Is it reproducible without
SNP?
> + */
> + if (efi_setup)
> + return;
> +
How about checking the md attribute instead of efi_setup? Personally
I feel it is a bit better; something like below:
diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index f0cc00032751..699332b075bb 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -255,15 +255,24 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size)
struct efi_memory_map_data data = { 0 };
struct efi_mem_range mr;
efi_memory_desc_t md;
- int num_entries;
+ int num_entries, ret;
void *new;
- if (efi_mem_desc_lookup(addr, &md) ||
- md.type != EFI_BOOT_SERVICES_DATA) {
+ ret = efi_mem_desc_lookup(addr, &md);
+ if (ret) {
pr_err("Failed to lookup EFI memory descriptor for %pa\n", &addr);
return;
}
+ if (md.type != EFI_BOOT_SERVICES_DATA) {
+ pr_err("Skip reserving non EFI Boot Service Data memory for %pa\n", &addr);
+ return;
+ }
+
+ /* Kexec copied the efi memmap from the 1st kernel, thus skip the case. */
+ if (md.attribute & EFI_MEMORY_RUNTIME)
+ return;
+
if (addr + size > md.phys_addr + (md.num_pages << EFI_PAGE_SHIFT)) {
pr_err("Region spans EFI memory descriptors, %pa\n", &addr);
return;
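The check order in the proposal above can be modelled as a small pure function. This is a sketch, not the kernel's code: struct md_model, enum reserve_decision and classify_md() are hypothetical, while the two constants use their UEFI-specification values. The assumption behind the EFI_MEMORY_RUNTIME test is that, as far as I can tell from the existing quirks.c code, the first kernel's efi_arch_mem_reserve() already ORs EFI_MEMORY_RUNTIME into the attribute of the range it reserves, so a kexec'd kernel that inherits that memmap sees the bit set.

```c
#include <stdint.h>

/* Values from the UEFI specification. */
#define EFI_BOOT_SERVICES_DATA	4			/* EfiBootServicesData */
#define EFI_MEMORY_RUNTIME	((uint64_t)1 << 63)	/* runtime attribute bit */

/* Hypothetical, trimmed-down memory descriptor for illustration only. */
struct md_model {
	uint32_t type;
	uint64_t attribute;
};

enum reserve_decision {
	RESERVE,		/* go on to split the memmap and reserve */
	SKIP_NOT_BS_DATA,	/* not boot services data: log and return */
	SKIP_ALREADY_RUNTIME,	/* already marked runtime: likely a kexec boot */
};

/* Mirrors the order of the checks in the proposed diff. */
static enum reserve_decision classify_md(const struct md_model *md)
{
	if (md->type != EFI_BOOT_SERVICES_DATA)
		return SKIP_NOT_BS_DATA;
	if (md->attribute & EFI_MEMORY_RUNTIME)
		return SKIP_ALREADY_RUNTIME;
	return RESERVE;
}
```

Compared with testing efi_setup, this keys the decision off the descriptor itself, so it only skips ranges that were actually reserved by a previous kernel.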
* Re: [PATCH v2 2/3] x86/mm: Do not zap page table entries mapping unaccepted memory table during kdump.
2024-03-18 7:02 ` [PATCH v2 2/3] x86/mm: Do not zap page table entries mapping unaccepted memory table during kdump Ashish Kalra
@ 2024-03-21 14:58 ` Kirill A. Shutemov
0 siblings, 0 replies; 17+ messages in thread
From: Kirill A. Shutemov @ 2024-03-21 14:58 UTC (permalink / raw)
To: Ashish Kalra
Cc: tglx, mingo, dave.hansen, rafael, peterz, adrian.hunter,
sathyanarayanan.kuppuswamy, elena.reshetova, jun.nakajima,
rick.p.edgecombe, thomas.lendacky, seanjc, michael.roth,
kai.huang, bhe, kexec, linux-coco, linux-kernel, bdas, vkuznets,
dionnaglaze, anisinha, jroedel
On Mon, Mar 18, 2024 at 07:02:45AM +0000, Ashish Kalra wrote:
> From: Ashish Kalra <ashish.kalra@amd.com>
>
> During crashkernel boot, only the pre-allocated crash memory is presented
> as E820_TYPE_RAM. This can cause the page table entries mapping the
> unaccepted memory table to be zapped during phys_pte_init(),
> phys_pmd_init(), phys_pud_init() and phys_p4d_init(), because SNP/TDX
> guests use E820_TYPE_ACPI to store the unaccepted memory table and pass
> it between the kernels on kexec/kdump.
>
> E820_TYPE_ACPI covers not only ACPI data but also EFI tables, and
> might be required by the kernel to function properly.
>
> The problem was discovered while debugging kdump for an SNP guest. The
> unaccepted memory table, stored as E820_TYPE_ACPI and passed between
> the kernels on kdump, was getting zapped because the PMD entry mapping
> it is above the E820_TYPE_RAM range for the reserved crashkernel memory.
>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
I guess it would be better if I take this patch into my kexec patchset. I
guess I just got lucky not to step on the issue.
--
Kiryl Shutsemau / Kirill A. Shutemov
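The zapping decision described in the quoted commit message can be sketched as an overlap test against the E820 table. This is a simplified model under assumptions: the type and function names are hypothetical, ranges are half-open, and it assumes the fix extends the existing E820_TYPE_RAM / E820_TYPE_RESERVED_KERN overlap checks in phys_pmd_init() and friends with an E820_TYPE_ACPI check (the patch body itself is not quoted in this message).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the E820 types involved. */
enum e820_type_model { T_RAM, T_RESERVED_KERN, T_ACPI };

struct e820_entry_model {
	unsigned long start, end;	/* half-open [start, end) */
	enum e820_type_model type;
};

/* Does any entry of @type overlap [start, end)? */
static bool mapped_any(const struct e820_entry_model *tbl, size_t n,
		       unsigned long start, unsigned long end,
		       enum e820_type_model type)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].type == type && tbl[i].start < end &&
		    start < tbl[i].end)
			return true;
	return false;
}

/*
 * May a PMD-sized range beyond the RAM being mapped be cleared?
 * keep_acpi == false models the old rule, true models the fix.
 */
static bool may_zap(const struct e820_entry_model *tbl, size_t n,
		    unsigned long start, unsigned long end, bool keep_acpi)
{
	if (mapped_any(tbl, n, start, end, T_RAM) ||
	    mapped_any(tbl, n, start, end, T_RESERVED_KERN))
		return false;
	if (keep_acpi && mapped_any(tbl, n, start, end, T_ACPI))
		return false;
	return true;
}
```

For a crashkernel RAM range followed by a PMD-sized range that holds only the ACPI-typed unaccepted memory table, the old rule allows that PMD entry to be zapped, while the extended rule keeps it.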
* Re: [PATCH v2 1/3] efi/x86: skip efi_arch_mem_reserve() in case of kexec.
2024-03-19 4:00 ` Dave Young
@ 2024-03-24 22:32 ` Kalra, Ashish
0 siblings, 0 replies; 17+ messages in thread
From: Kalra, Ashish @ 2024-03-24 22:32 UTC (permalink / raw)
To: Dave Young
Cc: tglx, mingo, dave.hansen, rafael, peterz, adrian.hunter,
sathyanarayanan.kuppuswamy, elena.reshetova, jun.nakajima,
rick.p.edgecombe, thomas.lendacky, seanjc, michael.roth,
kai.huang, bhe, kexec, linux-coco, linux-kernel, kirill.shutemov,
bdas, vkuznets, dionnaglaze, anisinha, jroedel, Ard Biesheuvel
Hello,
On 3/18/2024 11:00 PM, Dave Young wrote:
> Hi,
>
> Added Ard in cc.
>
> On 03/18/24 at 07:02am, Ashish Kalra wrote:
>> From: Ashish Kalra <ashish.kalra@amd.com>
>>
>> In the kexec case, the EFI memmap passed from the first kernel via
>> boot-params/setup data must be used as-is, so skip
>> efi_arch_mem_reserve() during kexec.
>>
>> Additionally, SNP guest kexec testing showed that the EFI memmap is
>> corrupted during chained kexec: kexec_enter_virtual_mode() during
>> late init remaps the efi_memmap physical pages allocated in
>> efi_arch_mem_reserve() via memblock, which subsequently causes random
>> EFI memmap corruption once memblock is freed/torn down.
>>
>> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
>> ---
>> arch/x86/platform/efi/quirks.c | 10 ++++++++++
>> 1 file changed, 10 insertions(+)
>>
>> diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
>> index f0cc00032751..d4562d074371 100644
>> --- a/arch/x86/platform/efi/quirks.c
>> +++ b/arch/x86/platform/efi/quirks.c
>> @@ -258,6 +258,16 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size)
>> int num_entries;
>> void *new;
>>
>> + /*
>> + * For kexec use case, we need to use the EFI memmap passed from the first
>> + * kernel via setup data, so we need to skip this.
>> + * Additionally kexec_enter_virtual_mode() during late init will remap
>> + * the efi_memmap physical pages allocated here via memblock & then
>> + * subsequently cause random EFI memmap corruption once memblock is freed.
> Can you elaborate a bit on the corruption? Is it reproducible without
> SNP?
This is only reproducible on SNP.
This is the call-stack for the above function:
[ 0.313377] efi_arch_mem_reserve+0x64/0x220
[ 0.314060] ? memblock_add_range+0x2a0/0x2e0
[ 0.314763] efi_mem_reserve+0x36/0x60
[ 0.315360] efi_bgrt_init+0x17d/0x1a0
[ 0.315959] ? __pfx_acpi_parse_bgrt+0x10/0x10
[ 0.316711] acpi_parse_bgrt+0x12/0x20
[ 0.317310] acpi_table_parse+0x77/0xd0
[ 0.317922] acpi_boot_init+0x362/0x630
[ 0.318535] setup_arch+0xa4e/0xf90
[ 0.319091] start_kernel+0x68/0xa70
[ 0.319664] x86_64_start_reservations+0x1c/0x30
[ 0.320431] x86_64_start_kernel+0xbf/0x110
[ 0.321099] secondary_startup_64_no_verify+0x179/0x17b
efi_arch_mem_reserve() calls efi_memmap_alloc(), which in turn calls
__efi_memmap_alloc_early(), which does memblock_phys_alloc(). It then
calls efi_memmap_install(), which does early_memremap() of the EFI
memmap in this memblock-allocated physical memory. So the EFI memmap
is now remapped into the memblock-allocated memory.
Later, kexec_enter_virtual_mode() calls efi_memmap_init_late(), which
memremap()s the EFI memmap from the above memblock-allocated physical
range.
When memblocks are later freed during late init, this memblock-allocated
physical range gets freed and reallocated, which eventually overwrites
and corrupts the EFI memmap and leads to a crash on the subsequent
kexec boot.
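The lifetime problem described above (a long-lived pointer into memory whose allocation is later released and reused) can be reproduced in a toy user-space model. Everything here is hypothetical: early_alloc() and early_release_all() only loosely stand in for memblock allocation and the late-init release, and install_memmap() loosely mirrors efi_memmap_alloc() plus efi_memmap_install(). It illustrates the class of bug, not memblock's actual semantics.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical one-slot early allocator standing in for memblock. */
static char arena[64];
static bool arena_busy;

static void *early_alloc(size_t n)
{
	if (arena_busy || n > sizeof(arena))
		return NULL;
	arena_busy = true;
	return arena;
}

/* Models the allocation being released (and thus reusable) in late init. */
static void early_release_all(void)
{
	arena_busy = false;
}

/* Stand-in for the installed EFI memmap pointer; it outlives the release. */
static const char *memmap;

/* Copy the memmap into allocator memory and keep a pointer to it. */
static bool install_memmap(const char *contents)
{
	char *buf = early_alloc(strlen(contents) + 1);

	if (!buf)
		return false;
	strcpy(buf, contents);
	memmap = buf;
	return true;
}
```

After early_release_all(), a new allocation lands on the same bytes and overwrites what memmap still points to; nothing fails at that moment, and the damage only surfaces when the stale data is consumed later, as with the EFI memmap on the next kexec boot.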
>> + */
>> + if (efi_setup)
>> + return;
>> +
> How about checking the md attribute instead of efi_setup? Personally
> I feel it is a bit better; something like below:
I based the above on the following code checking for kexec boot:
void __init efi_enter_virtual_mode(void)
{
	...
	if (efi_setup)
		kexec_enter_virtual_mode();
	else
		__efi_enter_virtual_mode();
But I have tested the code you shared below, which checks the md
attribute, and it works, so I can resend my v2 patch based on it.
Thanks, Ashish
>
> diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
> index f0cc00032751..699332b075bb 100644
> --- a/arch/x86/platform/efi/quirks.c
> +++ b/arch/x86/platform/efi/quirks.c
> @@ -255,15 +255,24 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size)
> struct efi_memory_map_data data = { 0 };
> struct efi_mem_range mr;
> efi_memory_desc_t md;
> - int num_entries;
> + int num_entries, ret;
> void *new;
>
> - if (efi_mem_desc_lookup(addr, &md) ||
> - md.type != EFI_BOOT_SERVICES_DATA) {
> + ret = efi_mem_desc_lookup(addr, &md);
> + if (ret) {
> pr_err("Failed to lookup EFI memory descriptor for %pa\n", &addr);
> return;
> }
>
> + if (md.type != EFI_BOOT_SERVICES_DATA) {
> + pr_err("Skip reserving non EFI Boot Service Data memory for %pa\n", &addr);
> + return;
> + }
> +
> + /* Kexec copied the efi memmap from the 1st kernel, thus skip the case. */
> + if (md.attribute & EFI_MEMORY_RUNTIME)
> + return;
> +
> if (addr + size > md.phys_addr + (md.num_pages << EFI_PAGE_SHIFT)) {
> pr_err("Region spans EFI memory descriptors, %pa\n", &addr);
> return;
>
>
end of thread, other threads:[~2024-03-24 22:33 UTC | newest]
Thread overview: 17+ messages
[not found] <20240227212452.3228893-1-kirill.shutemov@linux.intel.com>
[not found] ` <20240227212452.3228893-18-kirill.shutemov@linux.intel.com>
2024-02-27 21:30 ` [PATCHv8 17/17] ACPI: tables: Print MULTIPROC_WAKEUP when MADT is parsed Kuppuswamy Sathyanarayanan
2024-02-27 22:08 ` Huang, Kai
2024-02-28 15:22 ` Kirill A. Shutemov
2024-02-28 21:19 ` Huang, Kai
[not found] ` <20240227212452.3228893-9-kirill.shutemov@linux.intel.com>
2024-02-27 23:12 ` [PATCHv8 08/17] x86/tdx: Account shared memory Huang, Kai
[not found] ` <20240227212452.3228893-10-kirill.shutemov@linux.intel.com>
2024-02-27 23:16 ` [PATCHv8 09/17] x86/mm: Adding callbacks to prepare encrypted memory for kexec Huang, Kai
[not found] ` <20240227212452.3228893-11-kirill.shutemov@linux.intel.com>
2024-02-27 23:30 ` [PATCHv8 10/17] x86/tdx: Convert shared memory back to private on kexec Huang, Kai
[not found] ` <20240227212452.3228893-7-kirill.shutemov@linux.intel.com>
2024-02-27 23:33 ` [PATCHv8 06/17] x86/mm: Make x86_platform.guest.enc_status_change_*() return errno Huang, Kai
2024-03-06 15:02 ` [PATCHv8 00/17, CORRECTED] x86/tdx: Add kexec support Kirill A. Shutemov
2024-03-07 6:57 ` Tao Liu
2024-03-18 7:02 ` [PATCH v2 0/3] x86/snp: " Ashish Kalra
2024-03-18 7:02 ` [PATCH v2 1/3] efi/x86: skip efi_arch_mem_reserve() in case of kexec Ashish Kalra
2024-03-19 4:00 ` Dave Young
2024-03-24 22:32 ` Kalra, Ashish
2024-03-18 7:02 ` [PATCH v2 2/3] x86/mm: Do not zap page table entries mapping unaccepted memory table during kdump Ashish Kalra
2024-03-21 14:58 ` Kirill A. Shutemov
2024-03-18 7:02 ` [PATCH v2 3/3] x86/snp: Convert shared memory back to private on kexec Ashish Kalra