All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mike Rapoport <rppt@kernel.org>
To: Patrick Roy <roypat@amazon.co.uk>
Cc: seanjc@google.com, pbonzini@redhat.com,
	akpm@linux-foundation.org, dwmw@amazon.co.uk, david@redhat.com,
	tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
	dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com,
	willy@infradead.org, graf@amazon.com, derekmn@amazon.com,
	kalyazin@amazon.com, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	dmatlack@google.com, tabba@google.com,
	chao.p.peng@linux.intel.com, xmarcalx@amazon.co.uk
Subject: Re: [RFC PATCH 5/8] kvm: gmem: add option to remove guest private memory from direct map
Date: Wed, 10 Jul 2024 10:31:35 +0300	[thread overview]
Message-ID: <Zo441yz7Yw2JZcPs@kernel.org> (raw)
In-Reply-To: <20240709132041.3625501-6-roypat@amazon.co.uk>

On Tue, Jul 09, 2024 at 02:20:33PM +0100, Patrick Roy wrote:
> While guest_memfd is not available to be mapped by userspace, it is
> still accessible through the kernel's direct map. This means that in
> scenarios where guest-private memory is not hardware protected, it can
> be speculatively read and its contents potentially leaked through
> hardware side-channels. Removing guest-private memory from the direct
> map, thus mitigates a large class of speculative execution issues
> [1, Table 1].
> 
> This patch adds a flag to the `KVM_CREATE_GUEST_MEMFD` which, if set, removes the
> struct pages backing guest-private memory from the direct map. Should
> `CONFIG_HAVE_KVM_GMEM_{INVALIDATE, PREPARE}` be set, pages are removed
> after preparation and before invalidation, so that the
> prepare/invalidate routines do not have to worry about potentially
> absent direct map entries.
> 
> Direct map removal do not reuse the `KVM_GMEM_PREPARE` machinery, since `prepare` can be
> called multiple time, and it is the responsibility of the preparation
> routine to not "prepare" the same folio twice [2]. Thus, instead
> explicitly check if `filemap_grab_folio` allocated a new folio, and
> remove the returned folio from the direct map only if this was the case.
> 
> The patch uses release_folio instead of free_folio to reinsert pages
> back into the direct map as by the time free_folio is called,
> folio->mapping can already be NULL. This means that a call to
> folio_inode inside free_folio might deference a NULL pointer, leaving no
> way to access the inode which stores the flags that allow determining
> whether the page was removed from the direct map in the first place.
> 
> Lastly, the patch uses set_direct_map_{invalid,default}_noflush instead
> of `set_memory_[n]p` to avoid expensive flushes of TLBs and the L*-cache
> hierarchy. This is especially important once KVM restores direct map
> entries on-demand in later patches, where simple FIO benchmarks of a
> virtio-blk device have shown that TLB flushes on a Intel(R) Xeon(R)
> Platinum 8375C CPU @ 2.90GHz resulted in 80% degradation in throughput
> compared to a non-flushing solution.
> 
> Not flushing the TLB means that until TLB entries for temporarily
> restored direct map entries get naturally evicted, they can be used
> during speculative execution, and effectively "unhide" the memory for
> longer than intended. We consider this acceptable, as the only pages
> that are temporarily reinserted into the direct map like this will
> either hold PV data structures (kvm-clock, asyncpf, etc), or pages
> containing privileged instructions inside the guest kernel image (in the
> MMIO emulation case).
> 
> [1]: https://download.vusec.net/papers/quarantine_raid23.pdf
> 
> Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
> ---
>  include/uapi/linux/kvm.h |  2 ++
>  virt/kvm/guest_memfd.c   | 52 ++++++++++++++++++++++++++++++++++------
>  2 files changed, 47 insertions(+), 7 deletions(-)
> 
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index e065d9fe7ab2..409116aa23c9 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1563,4 +1563,6 @@ struct kvm_create_guest_memfd {
>  	__u64 reserved[6];
>  };
>  
> +#define KVM_GMEM_NO_DIRECT_MAP                 (1ULL << 0)
> +
>  #endif /* __LINUX_KVM_H */
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index 9148b9679bb1..dc9b0c2d0b0e 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -4,6 +4,7 @@
>  #include <linux/kvm_host.h>
>  #include <linux/pagemap.h>
>  #include <linux/anon_inodes.h>
> +#include <linux/set_memory.h>
>  
>  #include "kvm_mm.h"
>  
> @@ -49,9 +50,16 @@ static int kvm_gmem_prepare_folio(struct inode *inode, pgoff_t index, struct fol
>  	return 0;
>  }
>  
> +static bool kvm_gmem_not_present(struct inode *inode)
> +{
> +	return ((unsigned long)inode->i_private & KVM_GMEM_NO_DIRECT_MAP) != 0;
> +}
> +
>  static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index, bool prepare)
>  {
>  	struct folio *folio;
> +	bool zap_direct_map = false;
> +	int r;
>  
>  	/* TODO: Support huge pages. */
>  	folio = filemap_grab_folio(inode->i_mapping, index);
> @@ -74,16 +82,30 @@ static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index, bool
>  		for (i = 0; i < nr_pages; i++)
>  			clear_highpage(folio_page(folio, i));
>  
> +		// We need to clear the folio before calling kvm_gmem_prepare_folio,
> +		// but can only remove it from the direct map _after_ preparation is done.

No C++ comments please

> +		zap_direct_map = kvm_gmem_not_present(inode);
> +
>  		folio_mark_uptodate(folio);
>  	}
>  
>  	if (prepare) {
> -		int r =	kvm_gmem_prepare_folio(inode, index, folio);
> -		if (r < 0) {
> -			folio_unlock(folio);
> -			folio_put(folio);
> -			return ERR_PTR(r);
> -		}
> +		r = kvm_gmem_prepare_folio(inode, index, folio);
> +		if (r < 0)
> +			goto out_err;
> +	}
> +
> +	if (zap_direct_map) {
> +		r = set_direct_map_invalid_noflush(&folio->page);

It's not future proof to presume that folio is a single page here.
You should loop over folio pages and add a TLB flush after the loop.

> +		if (r < 0)
> +			goto out_err;
> +
> +		// We use the private flag to track whether the folio has been removed
> +		// from the direct map. This is because inside of ->free_folio,
> +		// we do not have access to the address_space anymore, meaning we
> +		// cannot check folio_inode(folio)->i_private to determine whether
> +		// KVM_GMEM_NO_DIRECT_MAP was set.
> +		folio_set_private(folio);
>  	}
>  
>  	/*
> @@ -91,6 +113,10 @@ static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index, bool
>  	 * unevictable and there is no storage to write back to.
>  	 */
>  	return folio;
> +out_err:
> +	folio_unlock(folio);
> +	folio_put(folio);
> +	return ERR_PTR(r);
>  }
>  
>  static void kvm_gmem_invalidate_begin(struct kvm_gmem *gmem, pgoff_t start,
> @@ -354,10 +380,22 @@ static void kvm_gmem_free_folio(struct folio *folio)
>  }
>  #endif
>  
> +static void kvm_gmem_invalidate_folio(struct folio *folio, size_t start, size_t end)
> +{
> +	if (start == 0 && end == PAGE_SIZE) {
> +		// We only get here if PG_private is set, which only happens if kvm_gmem_not_present
> +		// returned true in kvm_gmem_get_folio. Thus no need to do that check again.
> +		BUG_ON(set_direct_map_default_noflush(&folio->page));

Ditto.

> +
> +		folio_clear_private(folio);
> +	}
> +}
> +
>  static const struct address_space_operations kvm_gmem_aops = {
>  	.dirty_folio = noop_dirty_folio,
>  	.migrate_folio	= kvm_gmem_migrate_folio,
>  	.error_remove_folio = kvm_gmem_error_folio,
> +	.invalidate_folio = kvm_gmem_invalidate_folio,
>  #ifdef CONFIG_HAVE_KVM_GMEM_INVALIDATE
>  	.free_folio = kvm_gmem_free_folio,
>  #endif
> @@ -443,7 +481,7 @@ int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args)
>  {
>  	loff_t size = args->size;
>  	u64 flags = args->flags;
> -	u64 valid_flags = 0;
> +	u64 valid_flags = KVM_GMEM_NO_DIRECT_MAP;
>  
>  	if (flags & ~valid_flags)
>  		return -EINVAL;
> -- 
> 2.45.2
> 

-- 
Sincerely yours,
Mike.

  reply	other threads:[~2024-07-10  7:34 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-07-09 13:20 [RFC PATCH 0/8] Unmapping guest_memfd from Direct Map Patrick Roy
2024-07-09 13:20 ` [RFC PATCH 1/8] kvm: Allow reading/writing gmem using kvm_{read,write}_guest Patrick Roy
2024-07-09 13:20 ` [RFC PATCH 2/8] kvm: use slowpath in gfn_to_hva_cache if memory is private Patrick Roy
2024-07-09 13:20 ` [RFC PATCH 3/8] kvm: pfncache: enlighten about gmem Patrick Roy
2024-07-09 14:36   ` David Woodhouse
2024-07-10  9:49     ` Patrick Roy
2024-07-10 10:20       ` David Woodhouse
2024-07-10 10:46         ` Patrick Roy
2024-07-10 10:50           ` David Woodhouse
2024-07-09 13:20 ` [RFC PATCH 4/8] kvm: x86: support walking guest page tables in gmem Patrick Roy
2024-07-09 13:20 ` [RFC PATCH 5/8] kvm: gmem: add option to remove guest private memory from direct map Patrick Roy
2024-07-10  7:31   ` Mike Rapoport [this message]
2024-07-10  9:50     ` Patrick Roy
2024-07-09 13:20 ` [RFC PATCH 6/8] kvm: gmem: Temporarily restore direct map entries when needed Patrick Roy
2024-07-11  6:25   ` Paolo Bonzini
2024-07-09 13:20 ` [RFC PATCH 7/8] mm: secretmem: use AS_INACCESSIBLE to prohibit GUP Patrick Roy
2024-07-09 21:09   ` David Hildenbrand
2024-07-10  7:32     ` Mike Rapoport
2024-07-10  9:50       ` Patrick Roy
2024-07-10 21:14         ` David Hildenbrand
2024-07-09 13:20 ` [RFC PATCH 8/8] kvm: gmem: Allow restricted userspace mappings Patrick Roy
2024-07-09 14:48   ` Fuad Tabba
2024-07-09 21:13     ` David Hildenbrand
2024-07-10  9:51       ` Patrick Roy
2024-07-10 21:12         ` David Hildenbrand
2024-07-10 21:53           ` Sean Christopherson
2024-07-10 21:56             ` David Hildenbrand
2024-07-12 15:59           ` Patrick Roy
2024-07-30 10:15             ` David Hildenbrand
2024-08-01 10:30               ` Patrick Roy
2024-07-22 12:28 ` [RFC PATCH 0/8] Unmapping guest_memfd from Direct Map Vlastimil Babka (SUSE)
2024-07-26  6:55   ` Patrick Roy
2024-07-30 10:17     ` David Hildenbrand
2024-07-26 16:44 ` Yosry Ahmed

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Zo441yz7Yw2JZcPs@kernel.org \
    --to=rppt@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=bp@alien8.de \
    --cc=chao.p.peng@linux.intel.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=david@redhat.com \
    --cc=derekmn@amazon.com \
    --cc=dmatlack@google.com \
    --cc=dwmw@amazon.co.uk \
    --cc=graf@amazon.com \
    --cc=hpa@zytor.com \
    --cc=kalyazin@amazon.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mingo@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=roypat@amazon.co.uk \
    --cc=seanjc@google.com \
    --cc=tabba@google.com \
    --cc=tglx@linutronix.de \
    --cc=willy@infradead.org \
    --cc=x86@kernel.org \
    --cc=xmarcalx@amazon.co.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.