linux-hyperv.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Edgecombe, Rick P" <rick.p.edgecombe@intel.com>
To: "linux-hyperv@vger.kernel.org" <linux-hyperv@vger.kernel.org>,
	"Lutomirski, Andy" <luto@kernel.org>,
	"dave.hansen@linux.intel.com" <dave.hansen@linux.intel.com>,
	"thomas.lendacky@amd.com" <thomas.lendacky@amd.com>,
	"haiyangz@microsoft.com" <haiyangz@microsoft.com>,
	"kirill.shutemov@linux.intel.com"
	<kirill.shutemov@linux.intel.com>, "Christopherson,,
	Sean" <seanjc@google.com>, "mingo@redhat.com" <mingo@redhat.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"kys@microsoft.com" <kys@microsoft.com>,
	"Cui, Dexuan" <decui@microsoft.com>,
	"mikelley@microsoft.com" <mikelley@microsoft.com>,
	"hpa@zytor.com" <hpa@zytor.com>,
	"peterz@infradead.org" <peterz@infradead.org>,
	"wei.liu@kernel.org" <wei.liu@kernel.org>,
	"bp@alien8.de" <bp@alien8.de>,
	"sathyanarayanan.kuppuswamy@linux.intel.com" 
	<sathyanarayanan.kuppuswamy@linux.intel.com>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	"x86@kernel.org" <x86@kernel.org>
Subject: Re: [RFC PATCH 1/1] x86/mm: Mark CoCo VM pages invalid while moving between private and shared
Date: Wed, 30 Aug 2023 00:02:44 +0000	[thread overview]
Message-ID: <28cfc19ac3171c270896d080f30aeda11b587bb8.camel@intel.com> (raw)
In-Reply-To: <1688661719-60329-1-git-send-email-mikelley@microsoft.com>

On Thu, 2023-07-06 at 09:41 -0700, Michael Kelley wrote:
> In a CoCo VM when a page transitions from private to shared, or vice
> versa, attributes in the PTE must be updated *and* the hypervisor
> must
> be notified of the change. Because there are two separate steps,
> there's
> a window where the settings are inconsistent.  Normally the code that
> initiates the transition (via set_memory_decrypted() or
> set_memory_encrypted()) ensures that the memory is not being accessed
> during a transition, so the window of inconsistency is not a problem.
> However, the load_unaligned_zeropad() function can read arbitrary
> memory
> pages at arbitrary times, which could access a transitioning page
> during
> the window.  In such a case, CoCo VM specific exceptions are taken
> (depending on the CoCo architecture in use).  Current code in those
> exception handlers recovers and does "fixup" on the result returned
> by
> load_unaligned_zeropad().  Unfortunately, this exception handling and
> fixup code is tricky and somewhat fragile.  At the moment, it is
> broken for both TDX and SEV-SNP.
> 
> There's also a problem with the current code in paravisor scenarios:
> TDX Partitioning and SEV-SNP in vTOM mode. The exceptions need
> to be forwarded from the paravisor to the Linux guest, but there
> are no architectural specs for how to do that.
> 
> To avoid these complexities of the CoCo exception handlers, change
> the core transition code in __set_memory_enc_pgtable() to do the
> following:
> 
> 1.  Remove aliasing mappings
> 2.  Remove the PRESENT bit from the PTEs of all transitioning pages
> 3.  Flush the TLB globally
> 4.  Flush the data cache if needed
> 5.  Set/clear the encryption attribute as appropriate
> 6.  Notify the hypervisor of the page status change
> 7.  Add back the PRESENT bit
> 
> With this approach, load_unaligned_zeropad() just takes its normal
> page-fault-based fixup path if it touches a page that is
> transitioning.
> As a result, load_unaligned_zeropad() and CoCo VM page transitioning
> are completely decoupled.  CoCo VM page transitions can proceed
> without needing to handle architecture-specific exceptions and fix
> things up. This decoupling reduces the complexity due to separate
> TDX and SEV-SNP fixup paths, and gives more freedom to revise and
> introduce new capabilities in future versions of the TDX and SEV-SNP
> architectures. Paravisor scenarios work properly without needing
> to forward exceptions.
> 
> This approach may make __set_memory_enc_pgtable() slightly slower
> because of touching the PTEs three times instead of just once. But
> the run time of this function is already dominated by the hypercall
> and the need to flush the TLB at least once and maybe twice. In any
> case, this function is only used for CoCo VM page transitions, and
> is already unsuitable for hot paths.

Excluding vm_unmap_aliases(), and just looking at the TLB flushes, it
kind of looks like this:
1. Clear present
2. TLB flush
3. Set C bit
4. Set Present bit
5. TLB flush

But if you instead did:
1. Clear Present and set C bit
2. TLB flush
3. Set Present bit (no flush)

Then you could still have only 1 TLB flush and 2 operations instead of
3. Otherwise it's the same load_unaligned_zeropad() benefits you are
looking for I think. But I'm not very educated on the private<->shared
conversion HW rules though, so maybe not.

> 
> The architecture specific callback function for notifying the
> hypervisor typically must translate guest kernel virtual addresses
> into guest physical addresses to pass to the hypervisor.  Because
> the PTEs are invalid at the time of callback, the code for doing the
> translation needs updating.  virt_to_phys() or equivalent continues
> to work for direct map addresses.  But vmalloc addresses cannot use
> vmalloc_to_page() because that function requires the leaf PTE to be
> valid. Instead, slow_virt_to_phys() must be used. Both functions
> manually walk the page table hierarchy, so performance is the same.

Just curious. Are vmalloc addresses supported here? It looks like in
SEV, but not TDX.


  parent reply	other threads:[~2023-08-30  0:03 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-06 16:41 [RFC PATCH 1/1] x86/mm: Mark CoCo VM pages invalid while moving between private and shared Michael Kelley
2023-08-02 21:57 ` Tom Lendacky
2023-08-05 14:38   ` Michael Kelley (LINUX)
2023-08-06 22:19 ` kirill.shutemov
2023-08-16  2:54   ` Michael Kelley (LINUX)
2023-08-28 14:22     ` Michael Kelley (LINUX)
2023-08-28 16:13       ` kirill.shutemov
2023-08-28 21:00         ` Michael Kelley (LINUX)
2023-08-28 22:13           ` kirill.shutemov
2023-08-28 23:23             ` Michael Kelley (LINUX)
2023-08-28 23:57               ` kirill.shutemov
2023-08-30  0:02 ` Edgecombe, Rick P [this message]
2023-08-30  3:33   ` Michael Kelley (LINUX)
2023-08-30 23:40 ` Edgecombe, Rick P
2023-08-31 17:29   ` Edgecombe, Rick P
2023-09-01 14:44     ` Michael Kelley (LINUX)
2023-09-01 16:33       ` Edgecombe, Rick P

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=28cfc19ac3171c270896d080f30aeda11b587bb8.camel@intel.com \
    --to=rick.p.edgecombe@intel.com \
    --cc=bp@alien8.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=decui@microsoft.com \
    --cc=haiyangz@microsoft.com \
    --cc=hpa@zytor.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=kys@microsoft.com \
    --cc=linux-hyperv@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=mikelley@microsoft.com \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=sathyanarayanan.kuppuswamy@linux.intel.com \
    --cc=seanjc@google.com \
    --cc=tglx@linutronix.de \
    --cc=thomas.lendacky@amd.com \
    --cc=wei.liu@kernel.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).