Re: [PATCHv4 10/14] x86/tdx: Convert shared memory back to private on kexec

From: "Edgecombe, Rick P" <rick.p.edgecombe@intel.com>
To: "kirill.shutemov@linux.intel.com"
	<kirill.shutemov@linux.intel.com>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	"mingo@redhat.com" <mingo@redhat.com>,
	"x86@kernel.org" <x86@kernel.org>, "bp@alien8.de" <bp@alien8.de>,
	"dave.hansen@linux.intel.com" <dave.hansen@linux.intel.com>
Cc: "kexec@lists.infradead.org" <kexec@lists.infradead.org>,
	"Reshetova, Elena" <elena.reshetova@intel.com>,
	"Nakajima, Jun" <jun.nakajima@intel.com>,
	"rafael@kernel.org" <rafael@kernel.org>,
	"peterz@infradead.org" <peterz@infradead.org>,
	"Huang, Kai" <kai.huang@intel.com>,
	"sathyanarayanan.kuppuswamy@linux.intel.com"
	<sathyanarayanan.kuppuswamy@linux.intel.com>,
	"Hunter, Adrian" <adrian.hunter@intel.com>,
	"thomas.lendacky@amd.com" <thomas.lendacky@amd.com>,
	"ashish.kalra@amd.com" <ashish.kalra@amd.com>,
	"linux-coco@lists.linux.dev" <linux-coco@lists.linux.dev>,
	"seanjc@google.com" <seanjc@google.com>,
	"bhe@redhat.com" <bhe@redhat.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCHv4 10/14] x86/tdx: Convert shared memory back to private on kexec
Date: Wed, 6 Dec 2023 01:28:08 +0000
Message-ID: <3cf8b953c449320cc4c085924ef0e2eed5eadcf7.camel@intel.com>
In-Reply-To: <20231205004510.27164-11-kirill.shutemov@linux.intel.com>

On Tue, 2023-12-05 at 03:45 +0300, Kirill A. Shutemov wrote: 
> +static void tdx_kexec_unshare_mem(bool crash)
> +{
> +       unsigned long addr, end;
> +       long found = 0, shared;
> +
> +       /* Stop new private<->shared conversions */
> +       conversion_allowed = false;

I wonder if this might need a compiler barrier here to be totally safe.
I'm not sure.
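
To make the concern concrete, here is a minimal userspace sketch using
C11 atomics. It is not kernel code and not the patch's implementation:
the helper names conversion_begin(), conversion_end() and
conversion_stop() are hypothetical, and the converter-side ordering
(announce in the counter first, then check the flag) is an assumption
about how the two variables would have to interact. In the kernel the
plain store would presumably become WRITE_ONCE() plus whatever barrier
the counter check needs.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace stand-ins for the patch's variables (assumed semantics). */
static atomic_bool conversion_allowed = true;
static atomic_int conversions_in_progress = 0;

/*
 * Hypothetical converter side: bump the counter first, then check the
 * flag. Sequentially consistent atomics keep the compiler and the CPU
 * from reordering the two accesses.
 */
static bool conversion_begin(void)
{
	atomic_fetch_add(&conversions_in_progress, 1);
	if (!atomic_load(&conversion_allowed)) {
		/* Shutdown already started: back out. */
		atomic_fetch_sub(&conversions_in_progress, 1);
		return false;
	}
	return true;
}

/* Hypothetical converter side: conversion finished. */
static void conversion_end(void)
{
	atomic_fetch_sub(&conversions_in_progress, 1);
}

/*
 * Hypothetical shutdown side: clear the flag, then sample the counter.
 * The store must not be moved after the load, which is the reordering
 * a plain "conversion_allowed = false;" would permit.
 */
static int conversion_stop(void)
{
	atomic_store(&conversion_allowed, false);
	return atomic_load(&conversions_in_progress);
}
```

With both sides ordered this way, a converter either sees the cleared
flag and backs out, or is visible in the counter by the time the
shutdown side samples it.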

> +
> +       /*
> +        * Crash kernel reaches here with interrupts disabled: can't wait for
> +        * conversions to finish.
> +        *
> +        * If race happened, just report and proceed.
> +        */
> +       if (!crash) {
> +               unsigned long timeout;
> +
> +               /*
> +                * Wait for in-flight conversions to complete.
> +                *
> +                * Do not wait more than 30 seconds.
> +                */
> +               timeout = 30 * USEC_PER_SEC;
> +               while (atomic_read(&conversions_in_progress) && timeout--)
> +                       udelay(1);
> +       }
> +
> +       if (atomic_read(&conversions_in_progress))
> +               pr_warn("Failed to finish shared<->private conversions\n");

I can't think of any non-ridiculous way to handle this case. Maybe we
need VMM help.

> 
> diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
> index 830425e6d38e..c81afffaa954 100644
> --- a/arch/x86/kernel/reboot.c
> +++ b/arch/x86/kernel/reboot.c
> @@ -12,6 +12,7 @@
>  #include <linux/delay.h>
>  #include <linux/objtool.h>
>  #include <linux/pgtable.h>
> +#include <linux/kexec.h>
>  #include <acpi/reboot.h>
>  #include <asm/io.h>
>  #include <asm/apic.h>
> @@ -31,6 +32,7 @@
>  #include <asm/realmode.h>
>  #include <asm/x86_init.h>
>  #include <asm/efi.h>
> +#include <asm/tdx.h>
>  
>  /*
>   * Power off function, if any
> @@ -716,6 +718,14 @@ static void native_machine_emergency_restart(void)
>  
>  void native_machine_shutdown(void)
>  {
> +       /*
> +        * Call enc_kexec_unshare_mem() while all CPUs are still active and
> +        * interrupts are enabled. This will allow all in-flight memory
> +        * conversions to finish cleanly before unsharing all memory.
> +        */
> +       if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT) && kexec_in_progress)
> +               x86_platform.guest.enc_kexec_unshare_mem(false);

These questions are coming from an incomplete understanding of the
kexec/reboot operation. Please disregard if it is not helpful.

Doing this while other tasks can still run handles the conversion
races in the !crash case. But it then sets the shared pages to
not-present (NP). What happens if another active task tries to write
to one?

I guess we rely on the kernel_restart_prepare()->device_shutdown() to
clean up, which runs before native_machine_shutdown(). So there might
be conversions in progress when tdx_kexec_unshare_mem() is called, from
the allocator work queues. But the actual memory won't be accessed
during that operation.

But the console must still be active, otherwise who would see these
warnings? Does it not use a shared page? And what about the KVM clock,
which looks to be cleaned up at CPU teardown, which now happens after
tdx_kexec_unshare_mem()? So I wonder if there might be cases where a
shared page is still accessed.

If so, maybe you could halt the conversions in
native_machine_shutdown(), then do the actual reset to private once
tasks can no longer be scheduled. I'd still wonder whether anything
triggered by the console output might try to access a shared page.
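
A rough userspace sketch of that two-phase split, under stated
assumptions: stop_conversions() and unshare_all() are hypothetical
names, page_shared[] is a toy stand-in for the shared-page accounting,
and the busy-poll replaces the udelay(1) loop from the patch. It only
illustrates the ordering, not how the real page-table walk would look.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_PAGES 8 /* toy address space, illustration only */

static atomic_bool conversion_allowed = true;
static atomic_int conversions_in_progress = 0;
static bool page_shared[NR_PAGES];

/*
 * Phase 1 (would run from native_machine_shutdown(), tasks still
 * schedulable): forbid new conversions and, unless this is a crash,
 * wait a bounded number of polls for in-flight ones to drain.
 * Returns true if no conversion is left in flight.
 */
static bool stop_conversions(bool crash, unsigned long max_polls)
{
	atomic_store(&conversion_allowed, false);

	if (!crash) {
		while (atomic_load(&conversions_in_progress) && max_polls--)
			; /* the kernel version would udelay(1) here */
	}

	return atomic_load(&conversions_in_progress) == 0;
}

/*
 * Phase 2 (would run once nothing else can be scheduled): flip every
 * page still marked shared back to private. Returns the number of
 * pages that were converted.
 */
static size_t unshare_all(void)
{
	size_t found = 0;

	for (size_t i = 0; i < NR_PAGES; i++) {
		if (page_shared[i]) {
			page_shared[i] = false;
			found++;
		}
	}

	return found;
}
```

The point of the split is that shared pages stay mapped and usable
between the two phases, so anything that still touches them (console,
kvmclock) keeps working until the final reset.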


> +
>         /* Stop the cpus and apics */
>  #ifdef CONFIG_X86_IO_APIC
>         /*
