From: Fuad Tabba <tabba@google.com>
To: kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org, linux-mm@kvack.org
Cc: pbonzini@redhat.com, chenhuacai@kernel.org, mpe@ellerman.id.au,
anup@brainfault.org, paul.walmsley@sifive.com,
palmer@dabbelt.com, aou@eecs.berkeley.edu, seanjc@google.com,
viro@zeniv.linux.org.uk, brauner@kernel.org,
willy@infradead.org, akpm@linux-foundation.org,
xiaoyao.li@intel.com, yilun.xu@intel.com,
chao.p.peng@linux.intel.com, jarkko@kernel.org,
amoorthy@google.com, dmatlack@google.com,
isaku.yamahata@intel.com, mic@digikod.net, vbabka@suse.cz,
vannapurve@google.com, ackerleytng@google.com,
mail@maciej.szmigiero.name, david@redhat.com,
michael.roth@amd.com, wei.w.wang@intel.com,
liam.merwick@oracle.com, isaku.yamahata@gmail.com,
kirill.shutemov@linux.intel.com, suzuki.poulose@arm.com,
steven.price@arm.com, quic_eberman@quicinc.com,
quic_mnalajal@quicinc.com, quic_tsoni@quicinc.com,
quic_svaddagi@quicinc.com, quic_cvanscha@quicinc.com,
quic_pderrin@quicinc.com, quic_pheragu@quicinc.com,
catalin.marinas@arm.com, james.morse@arm.com,
yuzenghui@huawei.com, oliver.upton@linux.dev, maz@kernel.org,
will@kernel.org, qperret@google.com, keirf@google.com,
roypat@amazon.co.uk, shuah@kernel.org, hch@infradead.org,
jgg@nvidia.com, rientjes@google.com, jhubbard@nvidia.com,
fvdl@google.com, hughd@google.com, jthoughton@google.com,
peterx@redhat.com, tabba@google.com
Subject: [PATCH v6 5/7] KVM: guest_memfd: Restore folio state after final folio_put()
Date: Tue, 18 Mar 2025 16:20:44 +0000
Message-ID: <20250318162046.4016367-6-tabba@google.com>
In-Reply-To: <20250318162046.4016367-1-tabba@google.com>
Before transitioning a guest_memfd folio to unshared, thereby
disallowing access by the host and allowing the hypervisor to transition
its view of the guest page to private, we need to be sure that the host
doesn't hold any references to the folio.
This patch uses the guest_memfd folio type to register a callback that
informs the guest_memfd subsystem when the last reference is dropped,
so that it knows the host has no remaining references.
Signed-off-by: Fuad Tabba <tabba@google.com>
---
The function kvm_gmem_slot_register_callback() isn't used in this
series. It will be used later in code that performs unsharing of
memory. I have tested it with pKVM, based on downstream code [*].
It's included in this RFC since it demonstrates the plan to
handle unsharing of private folios.
[*] https://android-kvm.googlesource.com/linux/+/refs/heads/tabba/guestmem-6.13-v6-pkvm
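For readers following the refcount handling, the flow can be modeled in
userspace roughly as below. This is only a sketch under stated assumptions,
not the kernel code: every model_* name is hypothetical, locking and the
refcount freeze are left out because the model is single-threaded, and gmem
is assumed to hold one reference per page of the folio.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
enum model_state {
	MODEL_ALL_SHARED,	/* host and guest may access */
	MODEL_GUEST_SHARED,	/* only the guest may access */
	MODEL_NONE_SHARED,	/* transient: waiting for host references to drain */
};

struct model_folio {
	int refcount;		/* all references, gmem's own included */
	int nr_pages;		/* gmem is assumed to hold one reference per page */
	bool mapped;		/* stands in for folio_mapped() */
	bool guestmem;		/* stands in for the guestmem folio type */
	enum model_state state;
};

/* Mirrors kvm_gmem_restore_pending_folio(): disarm, give gmem its refs back. */
static void model_restore_pending(struct model_folio *f)
{
	f->guestmem = false;
	f->refcount += f->nr_pages;
}

/* Mirrors kvm_gmem_handle_folio_put(): runs when the count reaches zero. */
static void model_handle_folio_put(struct model_folio *f)
{
	model_restore_pending(f);
	f->state = MODEL_GUEST_SHARED;
}

/* Stands in for folio_put(); an armed folio fires the callback at zero. */
static void model_put(struct model_folio *f)
{
	assert(f->refcount > 0);
	if (--f->refcount == 0 && f->guestmem)
		model_handle_folio_put(f);
}

/*
 * Mirrors kvm_gmem_register_callback(). The real code freezes the refcount
 * while inspecting it; this single-threaded model has no need to.
 */
static int model_register_callback(struct model_folio *f)
{
	if (f->guestmem)
		return 0;
	if (f->mapped)
		return -EAGAIN;

	if (f->refcount > 1) {
		/* Arm the callback and hide gmem's own references. */
		f->guestmem = true;
		f->refcount -= f->nr_pages;
	} else {
		/* Nothing outstanding: complete the transition now. */
		f->state = MODEL_GUEST_SHARED;
	}
	return 0;
}

/* Mirrors the new NONE_SHARED branch in kvm_gmem_offset_set_shared(). */
static void model_set_shared(struct model_folio *f)
{
	if (f->state == MODEL_NONE_SHARED && f->guestmem)
		model_restore_pending(f);
	f->state = MODEL_ALL_SHARED;
}
```

In this model, a single-page folio in NONE_SHARED with two outstanding host
references stays in NONE_SHARED after registration and only becomes
GUEST_SHARED when the second model_put() brings the hidden count to zero.
Calling model_set_shared() before that point disarms the callback and
restores gmem's references instead.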
---
include/linux/kvm_host.h | 6 ++
virt/kvm/guest_memfd.c | 142 ++++++++++++++++++++++++++++++++++++++-
2 files changed, 147 insertions(+), 1 deletion(-)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index bf82faf16c53..d9d9d72d8beb 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2607,6 +2607,7 @@ int kvm_gmem_slot_set_shared(struct kvm_memory_slot *slot, gfn_t start,
int kvm_gmem_slot_clear_shared(struct kvm_memory_slot *slot, gfn_t start,
gfn_t end);
bool kvm_gmem_slot_is_guest_shared(struct kvm_memory_slot *slot, gfn_t gfn);
+int kvm_gmem_slot_register_callback(struct kvm_memory_slot *slot, gfn_t gfn);
void kvm_gmem_handle_folio_put(struct folio *folio);
#else
static inline int kvm_gmem_set_shared(struct kvm *kvm, gfn_t start, gfn_t end)
@@ -2638,6 +2639,11 @@ static inline bool kvm_gmem_slot_is_guest_shared(struct kvm_memory_slot *slot,
WARN_ON_ONCE(1);
return false;
}
+static inline int kvm_gmem_slot_register_callback(struct kvm_memory_slot *slot, gfn_t gfn)
+{
+ WARN_ON_ONCE(1);
+ return -EINVAL;
+}
#endif /* CONFIG_KVM_GMEM_SHARED_MEM */
#endif
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 4b857ab421bf..4fd9e5760503 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -391,6 +391,28 @@ enum folio_shareability {
KVM_GMEM_NONE_SHARED = 0b11, /* Not shared, transient state. */
};
+/*
+ * Unregisters the __folio_put() callback from the folio.
+ *
+ * Restores a folio's refcount after all pending references have been released,
+ * and removes the folio type, thereby removing the callback. Now the folio can
+ * be freed normally once all actual references have been dropped.
+ *
+ * Must be called with the filemap (inode->i_mapping) invalidate_lock held, and
+ * the folio must be locked.
+ */
+static void kvm_gmem_restore_pending_folio(struct folio *folio, const struct inode *inode)
+{
+ rwsem_assert_held_write_nolockdep(&inode->i_mapping->invalidate_lock);
+ WARN_ON_ONCE(!folio_test_locked(folio));
+
+ if (WARN_ON_ONCE(folio_mapped(folio) || !folio_test_guestmem(folio)))
+ return;
+
+ __folio_clear_guestmem(folio);
+ folio_ref_add(folio, folio_nr_pages(folio));
+}
+
static int kvm_gmem_offset_set_shared(struct inode *inode, pgoff_t index)
{
struct xarray *shared_offsets = &kvm_gmem_private(inode)->shared_offsets;
@@ -398,6 +420,24 @@ static int kvm_gmem_offset_set_shared(struct inode *inode, pgoff_t index)
rwsem_assert_held_write_nolockdep(&inode->i_mapping->invalidate_lock);
+ /*
+ * If the folio is NONE_SHARED, it indicates that it is transitioning to
+ * private (GUEST_SHARED). Transition it to shared (ALL_SHARED)
+ * immediately, and remove the callback.
+ */
+ if (xa_to_value(xa_load(shared_offsets, index)) == KVM_GMEM_NONE_SHARED) {
+ struct folio *folio = filemap_lock_folio(inode->i_mapping, index);
+
+ if (WARN_ON_ONCE(IS_ERR(folio)))
+ return PTR_ERR(folio);
+
+ if (folio_test_guestmem(folio))
+ kvm_gmem_restore_pending_folio(folio, inode);
+
+ folio_unlock(folio);
+ folio_put(folio);
+ }
+
return xa_err(xa_store(shared_offsets, index, xval, GFP_KERNEL));
}
@@ -498,9 +538,109 @@ static int kvm_gmem_offset_range_clear_shared(struct inode *inode,
return r;
}
+/*
+ * Registers a callback to __folio_put(), so that gmem knows that the host does
+ * not have any references to the folio. The callback itself is registered by
+ * setting the folio type to guestmem.
+ *
+ * Returns 0 if a callback was registered or already has been registered, or
+ * -EAGAIN if the host has references, indicating a callback wasn't registered.
+ *
+ * Must be called with the filemap (inode->i_mapping) invalidate_lock held, and
+ * the folio must be locked.
+ */
+static int kvm_gmem_register_callback(struct folio *folio, struct inode *inode, pgoff_t index)
+{
+ struct xarray *shared_offsets = &kvm_gmem_private(inode)->shared_offsets;
+ void *xval_guest = xa_mk_value(KVM_GMEM_GUEST_SHARED);
+ int refcount;
+ int r = 0;
+
+ rwsem_assert_held_write_nolockdep(&inode->i_mapping->invalidate_lock);
+ WARN_ON_ONCE(!folio_test_locked(folio));
+
+ if (folio_test_guestmem(folio))
+ return 0;
+
+ if (folio_mapped(folio))
+ return -EAGAIN;
+
+ refcount = folio_ref_count(folio);
+ if (!folio_ref_freeze(folio, refcount))
+ return -EAGAIN;
+
+ /*
+ * Register callback by setting the folio type and subtracting gmem's
+ * references for it to trigger once outstanding references are dropped.
+ */
+ if (refcount > 1) {
+ __folio_set_guestmem(folio);
+ refcount -= folio_nr_pages(folio);
+ } else {
+ /* No outstanding references, transition it to guest shared. */
+ r = WARN_ON_ONCE(xa_err(xa_store(shared_offsets, index, xval_guest, GFP_KERNEL)));
+ }
+
+ folio_ref_unfreeze(folio, refcount);
+ return r;
+}
+
+int kvm_gmem_slot_register_callback(struct kvm_memory_slot *slot, gfn_t gfn)
+{
+ unsigned long pgoff = slot->gmem.pgoff + gfn - slot->base_gfn;
+ struct inode *inode = file_inode(READ_ONCE(slot->gmem.file));
+ struct folio *folio;
+ int r;
+
+ filemap_invalidate_lock(inode->i_mapping);
+
+ folio = filemap_lock_folio(inode->i_mapping, pgoff);
+ if (WARN_ON_ONCE(IS_ERR(folio))) {
+ r = PTR_ERR(folio);
+ goto out;
+ }
+
+ r = kvm_gmem_register_callback(folio, inode, pgoff);
+
+ folio_unlock(folio);
+ folio_put(folio);
+out:
+ filemap_invalidate_unlock(inode->i_mapping);
+
+ return r;
+}
+EXPORT_SYMBOL_GPL(kvm_gmem_slot_register_callback);
+
+/*
+ * Callback function for __folio_put(), i.e., called once all references by the
+ * host to the folio have been dropped. This allows gmem to transition the state
+ * of the folio to shared with the guest, and allows the hypervisor to continue
+ * transitioning its state to private, since the host cannot attempt to access
+ * it anymore.
+ */
void kvm_gmem_handle_folio_put(struct folio *folio)
{
- WARN_ONCE(1, "A placeholder that shouldn't trigger. Work in progress.");
+ struct address_space *mapping;
+ struct xarray *shared_offsets;
+ struct inode *inode;
+ pgoff_t index;
+ void *xval;
+
+ mapping = folio->mapping;
+ if (WARN_ON_ONCE(!mapping))
+ return;
+
+ inode = mapping->host;
+ index = folio->index;
+ shared_offsets = &kvm_gmem_private(inode)->shared_offsets;
+ xval = xa_mk_value(KVM_GMEM_GUEST_SHARED);
+
+ filemap_invalidate_lock(inode->i_mapping);
+ folio_lock(folio);
+ kvm_gmem_restore_pending_folio(folio, inode);
+ folio_unlock(folio);
+ WARN_ON_ONCE(xa_err(xa_store(shared_offsets, index, xval, GFP_KERNEL)));
+ filemap_invalidate_unlock(inode->i_mapping);
}
EXPORT_SYMBOL_GPL(kvm_gmem_handle_folio_put);
--
2.49.0.rc1.451.g8f38331e32-goog
Thread overview:
2025-03-18 16:20 [PATCH v6 0/7] KVM: Restricted mapping of guest_memfd at the host and arm64 support Fuad Tabba
2025-03-18 16:20 ` [PATCH v6 1/7] KVM: guest_memfd: Make guest mem use guest mem inodes instead of anonymous inodes Fuad Tabba
2025-03-18 16:20 ` [PATCH v6 2/7] KVM: guest_memfd: Introduce kvm_gmem_get_pfn_locked(), which retains the folio lock Fuad Tabba
2025-03-18 16:20 ` [PATCH v6 3/7] KVM: guest_memfd: Track folio sharing within a struct kvm_gmem_private Fuad Tabba
2025-03-18 16:20 ` [PATCH v6 4/7] KVM: guest_memfd: Folio sharing states and functions that manage their transition Fuad Tabba
2025-03-18 16:20 ` Fuad Tabba [this message]
2025-03-21 20:09 ` [PATCH v6 5/7] KVM: guest_memfd: Restore folio state after final folio_put() Vishal Annapurve
2025-03-25 15:57 ` Fuad Tabba
2025-04-02 22:17 ` Michael Roth
2025-03-18 16:20 ` [PATCH v6 6/7] KVM: guest_memfd: Handle invalidation of shared memory Fuad Tabba
2025-03-18 16:20 ` [PATCH v6 7/7] KVM: guest_memfd: Add a guest_memfd() flag to initialize it as shared Fuad Tabba