From: Mike Rapoport <rppt@kernel.org>
To: Patrick Roy <roypat@amazon.co.uk>
Cc: seanjc@google.com, pbonzini@redhat.com, tglx@linutronix.de,
mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
x86@kernel.org, hpa@zytor.com, rostedt@goodmis.org,
mhiramat@kernel.org, mathieu.desnoyers@efficios.com,
kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-trace-kernel@vger.kernel.org, quic_eberman@quicinc.com,
dwmw@amazon.com, david@redhat.com, tabba@google.com,
linux-mm@kvack.org, dmatlack@google.com, graf@amazon.com,
jgowans@amazon.com, derekmn@amazon.com, kalyazin@amazon.com,
xmarcalx@amazon.com
Subject: Re: [RFC PATCH v2 01/10] kvm: gmem: Add option to remove gmem from direct map
Date: Wed, 18 Sep 2024 07:48:54 +0200
Message-ID: <Zuppxn_uW5JhDBjR@kernel.org>
In-Reply-To: <20240910163038.1298452-2-roypat@amazon.co.uk>

On Tue, Sep 10, 2024 at 05:30:27PM +0100, Patrick Roy wrote:
> Add a flag to the KVM_CREATE_GUEST_MEMFD ioctl that causes gmem pfns
> to be removed from the host kernel's direct map. Memory is removed
> immediately after allocation and preparation of gmem folios (after
> preparation, as the prepare callback might expect the direct map entry
> to be present). Direct map entries are restored before
> kvm_arch_gmem_invalidate is called (as ->invalidate_folio is called
> before ->free_folio), for the same reason.
>
> Use the PG_private flag to indicate that a folio is part of gmem with
> direct map removal enabled. While in this patch, PG_private does have a
> meaning of "folio not in direct map", this will no longer be true in
> follow up patches. Gmem folios might get temporarily reinserted into the
> direct map, but the PG_private flag needs to remain set, as the folios
> will have private data that needs to be freed independently of direct
> map status. This is why kvm_gmem_folio_clear_private does not call
> folio_clear_private.
>
> kvm_gmem_{set,clear}_folio_private must be called with the folio lock
> held.
>
> To ensure that failures in kvm_gmem_{clear,set}_private do not cause
> system instability due to leaving holes in the direct map, try to always
> restore direct map entries on failure. Pages for which restoration of
> direct map entries fails are marked as HWPOISON, to prevent the
> kernel from ever touching them again.
>
> Signed-off-by: Patrick Roy <roypat@amazon.co.uk>
> ---
> include/uapi/linux/kvm.h | 2 +
> virt/kvm/guest_memfd.c | 96 +++++++++++++++++++++++++++++++++++++---
> 2 files changed, 91 insertions(+), 7 deletions(-)
>
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 637efc0551453..81b0f4a236b8c 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1564,6 +1564,8 @@ struct kvm_create_guest_memfd {
> 	__u64 reserved[6];
> };
>
> +#define KVM_GMEM_NO_DIRECT_MAP (1ULL << 0)
> +
> #define KVM_PRE_FAULT_MEMORY _IOWR(KVMIO, 0xd5, struct kvm_pre_fault_memory)
>
> struct kvm_pre_fault_memory {
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index 1c509c3512614..2ed27992206f3 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -4,6 +4,7 @@
> #include <linux/kvm_host.h>
> #include <linux/pagemap.h>
> #include <linux/anon_inodes.h>
> +#include <linux/set_memory.h>
>
> #include "kvm_mm.h"
>
> @@ -49,8 +50,69 @@ static int kvm_gmem_prepare_folio(struct inode *inode, pgoff_t index, struct fol
> 	return 0;
> }
>
> +static bool kvm_gmem_test_no_direct_map(struct inode *inode)
> +{
> +	return ((unsigned long)inode->i_private & KVM_GMEM_NO_DIRECT_MAP) == KVM_GMEM_NO_DIRECT_MAP;
> +}
> +
> +static int kvm_gmem_folio_set_private(struct folio *folio)
> +{
> +	unsigned long start, npages, i;
> +	int r;
> +
> +	start = (unsigned long) folio_address(folio);
> +	npages = folio_nr_pages(folio);
> +
> +	for (i = 0; i < npages; ++i) {
> +		r = set_direct_map_invalid_noflush(folio_page(folio, i));
> +		if (r)
> +			goto out_remap;
> +	}

It feels like we need a new helper that takes care of contiguous pages.
arm64 already has set_memory_valid(), so it may be something like

	int set_direct_map_valid_noflush(struct page *p, unsigned nr, bool valid);
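
To make the idea a bit more concrete, here is a rough userspace mock-up,
not kernel code. Only the prototype comes from the suggestion above; the
mock `struct page`, the `mock_pages[]` array, `mock_fail_index` and
`mock_reset()` are made up for illustration, and the per-page functions
are stubs that merely borrow the names of the real helpers. Rolling back
on a failed invalidation is an assumption about what such a helper might
do, not a settled design.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration only, NOT kernel definitions. */
#define MOCK_NR_PAGES 8

struct page {
	bool mapped;			/* mock: is the direct map entry present? */
};

static struct page mock_pages[MOCK_NR_PAGES];
static long mock_fail_index = -1;	/* page index whose invalidation fails, -1 = none */

static void mock_reset(void)
{
	for (int i = 0; i < MOCK_NR_PAGES; i++)
		mock_pages[i].mapped = true;
	mock_fail_index = -1;
}

/* Stub borrowing the name of the existing per-page helper. */
static int set_direct_map_invalid_noflush(struct page *page)
{
	if (mock_fail_index >= 0 && page == &mock_pages[mock_fail_index])
		return -ENOMEM;
	page->mapped = false;
	return 0;
}

/* Stub borrowing the name of the existing per-page helper. */
static int set_direct_map_default_noflush(struct page *page)
{
	page->mapped = true;
	return 0;
}

/* Sketch of the proposed helper: handle nr contiguous pages in one call. */
int set_direct_map_valid_noflush(struct page *p, unsigned nr, bool valid)
{
	unsigned i;
	int r = 0;

	for (i = 0; i < nr; i++) {
		r = valid ? set_direct_map_default_noflush(p + i)
			  : set_direct_map_invalid_noflush(p + i);
		if (r)
			break;
	}

	/*
	 * Assumption: if invalidation fails part way, restore the pages
	 * already unmapped so the range is not left half-unmapped.
	 */
	if (r && !valid)
		while (i--)
			set_direct_map_default_noflush(p + i);

	return r;
}
```

With something along these lines, the per-page loop in
kvm_gmem_folio_set_private() could presumably collapse into a single call
on folio_page(folio, 0) with folio_nr_pages(folio) and valid == false,
followed by the TLB flush.
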
> +	flush_tlb_kernel_range(start, start + folio_size(folio));
> +	folio_set_private(folio);
> +	return 0;
> +out_remap:
> +	for (; i > 0; i--) {
> +		struct page *page = folio_page(folio, i - 1);
> +
> +		if (WARN_ON_ONCE(set_direct_map_default_noflush(page))) {
> +			/*
> +			 * Random holes in the direct map are bad, let's mark
> +			 * these pages as corrupted memory so that the kernel
> +			 * avoids ever touching them again.
> +			 */
> +			folio_set_hwpoison(folio);
> +			r = -EHWPOISON;
> +		}
> +	}
> +	return r;
> +}
> +
--
Sincerely yours,
Mike.