Re: [RFC PATCH 1/1] x86/mm: Mark CoCo VM pages invalid while moving between private and shared

From: "Edgecombe, Rick P" <rick.p.edgecombe@intel.com>
To: "linux-hyperv@vger.kernel.org" <linux-hyperv@vger.kernel.org>,
	"Lutomirski, Andy" <luto@kernel.org>,
	"dave.hansen@linux.intel.com" <dave.hansen@linux.intel.com>,
	"thomas.lendacky@amd.com" <thomas.lendacky@amd.com>,
	"haiyangz@microsoft.com" <haiyangz@microsoft.com>,
	"kirill.shutemov@linux.intel.com"
	<kirill.shutemov@linux.intel.com>, "Christopherson,,
	Sean" <seanjc@google.com>, "mingo@redhat.com" <mingo@redhat.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"kys@microsoft.com" <kys@microsoft.com>,
	"Cui, Dexuan" <decui@microsoft.com>,
	"mikelley@microsoft.com" <mikelley@microsoft.com>,
	"hpa@zytor.com" <hpa@zytor.com>,
	"peterz@infradead.org" <peterz@infradead.org>,
	"wei.liu@kernel.org" <wei.liu@kernel.org>,
	"bp@alien8.de" <bp@alien8.de>,
	"sathyanarayanan.kuppuswamy@linux.intel.com" 
	<sathyanarayanan.kuppuswamy@linux.intel.com>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	"x86@kernel.org" <x86@kernel.org>
Subject: Re: [RFC PATCH 1/1] x86/mm: Mark CoCo VM pages invalid while moving between private and shared
Date: Wed, 30 Aug 2023 00:02:44 +0000
Message-ID: <28cfc19ac3171c270896d080f30aeda11b587bb8.camel@intel.com>
In-Reply-To: <1688661719-60329-1-git-send-email-mikelley@microsoft.com>

On Thu, 2023-07-06 at 09:41 -0700, Michael Kelley wrote:
> In a CoCo VM when a page transitions from private to shared, or vice
> versa, attributes in the PTE must be updated *and* the hypervisor must
> be notified of the change. Because there are two separate steps,
> there's a window where the settings are inconsistent.  Normally the
> code that initiates the transition (via set_memory_decrypted() or
> set_memory_encrypted()) ensures that the memory is not being accessed
> during a transition, so the window of inconsistency is not a problem.
> However, the load_unaligned_zeropad() function can read arbitrary
> memory pages at arbitrary times, which could access a transitioning
> page during the window.  In such a case, CoCo VM specific exceptions
> are taken (depending on the CoCo architecture in use).  Current code
> in those exception handlers recovers and does "fixup" on the result
> returned by load_unaligned_zeropad().  Unfortunately, this exception
> handling and fixup code is tricky and somewhat fragile.  At the
> moment, it is broken for both TDX and SEV-SNP.
> 
> There's also a problem with the current code in paravisor scenarios:
> TDX Partitioning and SEV-SNP in vTOM mode. The exceptions need
> to be forwarded from the paravisor to the Linux guest, but there
> are no architectural specs for how to do that.
> 
> To avoid these complexities of the CoCo exception handlers, change
> the core transition code in __set_memory_enc_pgtable() to do the
> following:
> 
> 1.  Remove aliasing mappings
> 2.  Remove the PRESENT bit from the PTEs of all transitioning pages
> 3.  Flush the TLB globally
> 4.  Flush the data cache if needed
> 5.  Set/clear the encryption attribute as appropriate
> 6.  Notify the hypervisor of the page status change
> 7.  Add back the PRESENT bit
> 
> With this approach, load_unaligned_zeropad() just takes its normal
> page-fault-based fixup path if it touches a page that is
> transitioning. As a result, load_unaligned_zeropad() and CoCo VM page
> transitioning are completely decoupled.  CoCo VM page transitions can
> proceed without needing to handle architecture-specific exceptions
> and fix things up. This decoupling reduces the complexity due to
> separate TDX and SEV-SNP fixup paths, and gives more freedom to revise
> and introduce new capabilities in future versions of the TDX and
> SEV-SNP architectures. Paravisor scenarios work properly without
> needing to forward exceptions.
> 
> This approach may make __set_memory_enc_pgtable() slightly slower
> because of touching the PTEs three times instead of just once. But
> the run time of this function is already dominated by the hypercall
> and the need to flush the TLB at least once and maybe twice. In any
> case, this function is only used for CoCo VM page transitions, and
> is already unsuitable for hot paths.

Excluding vm_unmap_aliases(), and just looking at the TLB flushes, it
kind of looks like this:
1. Clear present
2. TLB flush
3. Set C bit
4. Set Present bit
5. TLB flush

But if you instead did:
1. Clear Present and set C bit
2. TLB flush
3. Set Present bit (no flush)

Then you could still have only 1 TLB flush, and 2 PTE operations
instead of 3. Otherwise I think it gives the same
load_unaligned_zeropad() benefits you are looking for. I'm not very
educated on the private<->shared conversion HW rules though, so maybe
not.
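
To make the comparison concrete, here is a minimal user-space sketch of
the two orderings. It is only a toy model for counting PTE writes and
TLB flushes: the toy_* names, the bit positions, and the stats struct
are all made up for illustration and are not kernel code. The assert in
toy_write_pte() encodes the property both orderings rely on, namely that
the encryption attribute never changes while the entry stays present.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: bit positions and names are illustrative, not the kernel's. */
#define TOY_PRESENT (1u << 0)
#define TOY_ENC     (1u << 1)

struct toy_stats {
	int pte_writes;   /* number of times the PTE was rewritten */
	int tlb_flushes;  /* number of simulated TLB flushes */
};

/* Write a new PTE value and count it. */
static void toy_write_pte(uint32_t *pte, uint32_t val, struct toy_stats *st)
{
	bool enc_changed = ((*pte ^ val) & TOY_ENC) != 0;
	bool stays_present = (*pte & TOY_PRESENT) && (val & TOY_PRESENT);

	/* The encryption attribute must never flip under a present mapping. */
	assert(!(enc_changed && stays_present));
	*pte = val;
	st->pte_writes++;
}

/* Stand-in for a global TLB flush; here it only counts. */
static void toy_flush_tlb(struct toy_stats *st)
{
	st->tlb_flushes++;
}

/* First ordering: clear P, flush, set C, set P, flush. */
struct toy_stats toy_three_step(uint32_t *pte)
{
	struct toy_stats st = { 0, 0 };

	toy_write_pte(pte, *pte & ~TOY_PRESENT, &st);
	toy_flush_tlb(&st);
	toy_write_pte(pte, *pte | TOY_ENC, &st);
	toy_write_pte(pte, *pte | TOY_PRESENT, &st);
	toy_flush_tlb(&st);
	return st;
}

/* Second ordering: clear P and set C together, flush, set P (no flush). */
struct toy_stats toy_two_step(uint32_t *pte)
{
	struct toy_stats st = { 0, 0 };

	toy_write_pte(pte, (*pte & ~TOY_PRESENT) | TOY_ENC, &st);
	toy_flush_tlb(&st);
	toy_write_pte(pte, *pte | TOY_PRESENT, &st);
	return st;
}
```

Starting from a present, unencrypted toy PTE, both functions end with
the same final value. The model says nothing about whether the hardware
conversion rules or the hypervisor notification allow the combined
first step.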

> 
> The architecture specific callback function for notifying the
> hypervisor typically must translate guest kernel virtual addresses
> into guest physical addresses to pass to the hypervisor.  Because
> the PTEs are invalid at the time of callback, the code for doing the
> translation needs updating.  virt_to_phys() or equivalent continues
> to work for direct map addresses.  But vmalloc addresses cannot use
> vmalloc_to_page() because that function requires the leaf PTE to be
> valid. Instead, slow_virt_to_phys() must be used. Both functions
> manually walk the page table hierarchy, so performance is the same.

Just curious. Are vmalloc addresses supported here? It looks like they
are in SEV, but not TDX.
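
As an aside on the translation point in the quoted text, the difference
between a lookup that requires a valid leaf PTE and one that does not
can be sketched with a toy single-level table. This is a user-space
illustration under made-up assumptions: the toy_* names, the table
layout, and the 4 KiB page size are hypothetical. The "strict" lookup
stands in for the behavior the quoted text attributes to
vmalloc_to_page(), and the "lenient" one for slow_virt_to_phys().

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12
#define TOY_PAGE_MASK  ((1u << TOY_PAGE_SHIFT) - 1)
#define TOY_PRESENT    (1ull << 0)
#define TOY_NR_PTES    4

/* Toy leaf entry: bit 0 = present, bits 12 and up = frame address, 0 = no mapping. */
typedef uint64_t toy_pte_t;

/* Combine the frame address from the entry with the page offset of vaddr. */
static uint64_t toy_pte_to_paddr(toy_pte_t pte, uint64_t vaddr)
{
	return (pte & ~(uint64_t)TOY_PAGE_MASK) | (vaddr & TOY_PAGE_MASK);
}

/* Strict lookup: fails unless the leaf entry has the present bit set. */
bool toy_translate_strict(const toy_pte_t *table, uint64_t vaddr, uint64_t *paddr)
{
	uint64_t idx = vaddr >> TOY_PAGE_SHIFT;

	if (idx >= TOY_NR_PTES || !(table[idx] & TOY_PRESENT))
		return false;
	*paddr = toy_pte_to_paddr(table[idx], vaddr);
	return true;
}

/* Lenient lookup: only needs a non-empty entry, present or not. */
bool toy_translate_lenient(const toy_pte_t *table, uint64_t vaddr, uint64_t *paddr)
{
	uint64_t idx = vaddr >> TOY_PAGE_SHIFT;

	if (idx >= TOY_NR_PTES || table[idx] == 0)
		return false;
	*paddr = toy_pte_to_paddr(table[idx], vaddr);
	return true;
}
```

With an entry whose present bit has been cleared but whose frame
address is intact, only the lenient lookup still produces a physical
address, which is the situation the callback would be in while the
pages are marked invalid.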
