* [PATCH] KVM: guest_memfd: Elaborate on how release() vs. get_pfn() is safe against UAF
@ 2025-11-13 23:22 Sean Christopherson
2025-11-17 11:36 ` Yan Zhao
2026-06-03 15:37 ` Ackerley Tng
0 siblings, 2 replies; 9+ messages in thread
From: Sean Christopherson @ 2025-11-13 23:22 UTC (permalink / raw)
To: Paolo Bonzini
Cc: kvm, linux-kernel, Yan Zhao, Vishal Annapurve,
Sean Christopherson
Add more context and information to the comment in kvm_gmem_release() that
explains why there's no synchronization on RCU _or_ kvm->srcu. Point (b)
from commit 67b43038ce14 ("KVM: guest_memfd: Remove RCU-protected attribute
from slot->gmem.file")
b) kvm->srcu ensures that kvm_gmem_unbind() and freeing of a memslot
occur after the memslot is no longer visible to kvm_gmem_get_pfn().
is especially difficult to fully grok, particularly in light of commit
ae431059e75d ("KVM: guest_memfd: Remove bindings on memslot deletion when
gmem is dying"), which addressed a race between unbind() and release().
No functional change intended.
Cc: Yan Zhao <yan.y.zhao@intel.com>
Cc: Vishal Annapurve <vannapurve@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
virt/kvm/guest_memfd.c | 20 ++++++++++++++------
1 file changed, 14 insertions(+), 6 deletions(-)
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index fdaea3422c30..2e09d7ec0cfc 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -338,17 +338,25 @@ static int kvm_gmem_release(struct inode *inode, struct file *file)
* dereferencing the slot for existing bindings needs to be protected
* against memslot updates, specifically so that unbind doesn't race
* and free the memslot (kvm_gmem_get_file() will return NULL).
- *
- * Since .release is called only when the reference count is zero,
- * after which file_ref_get() and get_file_active() fail,
- * kvm_gmem_get_pfn() cannot be using the file concurrently.
- * file_ref_put() provides a full barrier, and get_file_active() the
- * matching acquire barrier.
*/
mutex_lock(&kvm->slots_lock);
filemap_invalidate_lock(inode->i_mapping);
+ /*
+ * Note! synchronize_srcu() is _not_ needed after nullifying memslot
+ * bindings as slot->gmem.file cannot be set back to a non-null value
+ * without the memslot first being deleted. I.e. this relies on the
+ * synchronize_srcu_expedited() in kvm_swap_active_memslots() to ensure
+ * kvm_gmem_get_pfn() (which runs with kvm->srcu held for read) can't
+ * grab a reference to slot->gmem.file even if the struct file object
+ * is reallocated.
+ *
+ * file_ref_put() provides a full barrier, and __get_file_rcu() the
+ * matching acquire barrier, to ensure that kvm_gmem_get_file() (via
+ * __get_file_rcu()) sees refcount==0 or fails the "file reloaded"
+ * check (file != NULL due to nullifying the file pointer here).
+ */
xa_for_each(&f->bindings, index, slot)
WRITE_ONCE(slot->gmem.file, NULL);
base-commit: 16ec4fb4ac95d878b879192d280db2baeec43272
--
2.52.0.rc1.455.g30608eb744-goog
^ permalink raw reply related [flat|nested] 9+ messages in thread* Re: [PATCH] KVM: guest_memfd: Elaborate on how release() vs. get_pfn() is safe against UAF 2025-11-13 23:22 [PATCH] KVM: guest_memfd: Elaborate on how release() vs. get_pfn() is safe against UAF Sean Christopherson @ 2025-11-17 11:36 ` Yan Zhao 2026-06-03 0:40 ` Ackerley Tng 2026-06-04 0:10 ` Sean Christopherson 2026-06-03 15:37 ` Ackerley Tng 1 sibling, 2 replies; 9+ messages in thread From: Yan Zhao @ 2025-11-17 11:36 UTC (permalink / raw) To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Vishal Annapurve On Thu, Nov 13, 2025 at 03:22:29PM -0800, Sean Christopherson wrote: > Add more context and information to the comment in kvm_gmem_release() that > explains why there's no synchronization on RCU _or_ kvm->srcu. Point (b) > from commit 67b43038ce14 ("KVM: guest_memfd: Remove RCU-protected attribute > from slot->gmem.file") > > b) kvm->srcu ensures that kvm_gmem_unbind() and freeing of a memslot > occur after the memslot is no longer visible to kvm_gmem_get_pfn(). > > is especially difficult to fully grok, particularly in light of commit > ae431059e75d ("KVM: guest_memfd: Remove bindings on memslot deletion when > gmem is dying"), which addressed a race between unbind() and release(). As mentioned in commit ae431059e75d ("KVM: guest_memfd: Remove bindings on memslot deletion when gmem is dying"), unbind() and release() are mutually exclusive, i.e., both protected by slots_lock, mentioning "race" here is confusing. So, that commit addressed the mishandling in unbind() after kvm_gmem_get_file() returning NULL? > No functional change intended. > > Cc: Yan Zhao <yan.y.zhao@intel.com> > Cc: Vishal Annapurve <vannapurve@google.com> > Signed-off-by: Sean Christopherson <seanjc@google.com> > --- > virt/kvm/guest_memfd.c | 20 ++++++++++++++------ > 1 file changed, 14 insertions(+), 6 deletions(-) > > diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c > index fdaea3422c30..2e09d7ec0cfc 100644 > --- a/virt/kvm/guest_memfd.c > +++ b/virt/kvm/guest_memfd.c > @@ -338,17 +338,25 @@ static int kvm_gmem_release(struct inode *inode, struct file *file) > * dereferencing the slot for existing bindings needs to be protected > * against memslot updates, specifically so that unbind doesn't race > * and free the memslot (kvm_gmem_get_file() will return NULL). > - * > - * Since .release is called only when the reference count is zero, > - * after which file_ref_get() and get_file_active() fail, > - * kvm_gmem_get_pfn() cannot be using the file concurrently. > - * file_ref_put() provides a full barrier, and get_file_active() the > - * matching acquire barrier. > */ > mutex_lock(&kvm->slots_lock); > > filemap_invalidate_lock(inode->i_mapping); > > + /* > + * Note! synchronize_srcu() is _not_ needed after nullifying memslot > + * bindings as slot->gmem.file cannot be set back to a non-null value > + * without the memslot first being deleted. I.e. this relies on the > + * synchronize_srcu_expedited() in kvm_swap_active_memslots() to ensure > + * kvm_gmem_get_pfn() (which runs with kvm->srcu held for read) can't > + * grab a reference to slot->gmem.file even if the struct file object > + * is reallocated. Do you mean that as kvm_gmem_get_pfn() can't find a stale slot, it can't grab reference to stale slot->gmem.file, even if slot->gmem.file has been set to a different value, i.e., after invoking unbind(), bind() ? But I'm not sure why to put the kvm_gmem_get_pfn() relying on synchronize_srcu_expedited() in kvm_swap_active_memslots() in the comment of release(). Without the guard of kvm_gmem_get_file(), kvm_gmem_get_pfn() may need to provide some RCU read-critial section too for release() to wait by synchronize_srcu()? So * Since .release is called only when the reference count is zero, * after which file_ref_get() and get_file_active() fail, is still helpful? > + * > + * file_ref_put() provides a full barrier, and __get_file_rcu() the > + * matching acquire barrier, to ensure that kvm_gmem_get_file() (via > + * __get_file_rcu()) sees refcount==0 or fails the "file reloaded" > + * check (file != NULL due to nullifying the file pointer here). > + */ > xa_for_each(&f->bindings, index, slot) > WRITE_ONCE(slot->gmem.file, NULL); > > > base-commit: 16ec4fb4ac95d878b879192d280db2baeec43272 > -- > 2.52.0.rc1.455.g30608eb744-goog > ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] KVM: guest_memfd: Elaborate on how release() vs. get_pfn() is safe against UAF 2025-11-17 11:36 ` Yan Zhao @ 2026-06-03 0:40 ` Ackerley Tng 2026-06-03 11:42 ` Yan Zhao 2026-06-04 0:10 ` Sean Christopherson 1 sibling, 1 reply; 9+ messages in thread From: Ackerley Tng @ 2026-06-03 0:40 UTC (permalink / raw) To: Yan Zhao; +Cc: seanjc, pbonzini, kvm, linux-kernel, vannapurve Yan Zhao <yan.y.zhao@intel.com> writes: Coming here from https://lore.kernel.org/all/ahn6qysOGfa74A2E@google.com/. > On Thu, Nov 13, 2025 at 03:22:29PM -0800, Sean Christopherson wrote: >> Add more context and information to the comment in kvm_gmem_release() that >> explains why there's no synchronization on RCU _or_ kvm->srcu. Point (b) >> from commit 67b43038ce14 ("KVM: guest_memfd: Remove RCU-protected attribute >> from slot->gmem.file") >> >> b) kvm->srcu ensures that kvm_gmem_unbind() and freeing of a memslot >> occur after the memslot is no longer visible to kvm_gmem_get_pfn(). >> >> is especially difficult to fully grok, particularly in light of commit >> ae431059e75d ("KVM: guest_memfd: Remove bindings on memslot deletion when >> gmem is dying"), which addressed a race between unbind() and release(). > As mentioned in commit ae431059e75d ("KVM: guest_memfd: Remove bindings on > memslot deletion when gmem is dying"), unbind() and release() are mutually > exclusive, i.e., both protected by slots_lock, mentioning "race" here is > confusing. So, that commit addressed the mishandling in unbind() after > kvm_gmem_get_file() returning NULL? > I agree that the use of the word "race" here is odd since both unbind and release() are mutually exclusive and protected by slots_lock. Looking at ae431059e75d again, there are really 3 changes: 1. Refactoring out __kvm_gmem_unbind(), which is mostly about setting bindings in the xarray to NULL and setting slot->gmem.file to NULL, 2. Checking that if slot->gmem.file is NULL, skip unbinding 3. Checking that if kvm_gmem_get_file() returns NULL, just unbind without taking filemap_invalidate_lock(). (2.) is explained in the comment /* * Nothing to do if the underlying file was _already_ closed, as * kvm_gmem_release() invalidates and nullifies all bindings. */ The real fix is in (3.), I think? Could you explain how kvm_gmem_get_file() works? Why would the file be dying with the slots_lock making the unbind and release mutually exclusive? >> No functional change intended. >> >> Cc: Yan Zhao <yan.y.zhao@intel.com> >> Cc: Vishal Annapurve <vannapurve@google.com> >> Signed-off-by: Sean Christopherson <seanjc@google.com> >> --- >> virt/kvm/guest_memfd.c | 20 ++++++++++++++------ >> 1 file changed, 14 insertions(+), 6 deletions(-) >> >> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c >> index fdaea3422c30..2e09d7ec0cfc 100644 >> --- a/virt/kvm/guest_memfd.c >> +++ b/virt/kvm/guest_memfd.c >> @@ -338,17 +338,25 @@ static int kvm_gmem_release(struct inode *inode, struct file *file) >> * dereferencing the slot for existing bindings needs to be protected >> * against memslot updates, specifically so that unbind doesn't race >> * and free the memslot (kvm_gmem_get_file() will return NULL). >> - * >> - * Since .release is called only when the reference count is zero, >> - * after which file_ref_get() and get_file_active() fail, >> - * kvm_gmem_get_pfn() cannot be using the file concurrently. >> - * file_ref_put() provides a full barrier, and get_file_active() the >> - * matching acquire barrier. >> */ > >> mutex_lock(&kvm->slots_lock); >> >> filemap_invalidate_lock(inode->i_mapping); >> >> + /* >> + * Note! synchronize_srcu() is _not_ needed after nullifying memslot >> + * bindings as slot->gmem.file cannot be set back to a non-null value >> + * without the memslot first being deleted. I.e. this relies on the >> + * synchronize_srcu_expedited() in kvm_swap_active_memslots() to ensure >> + * kvm_gmem_get_pfn() (which runs with kvm->srcu held for read) can't >> + * grab a reference to slot->gmem.file even if the struct file object >> + * is reallocated. > Do you mean that > as kvm_gmem_get_pfn() can't find a stale slot, it can't grab reference to stale > slot->gmem.file, even if slot->gmem.file has been set to a different value, > i.e., after invoking unbind(), bind() ? > > But I'm not sure why to put the kvm_gmem_get_pfn() relying on > synchronize_srcu_expedited() in kvm_swap_active_memslots() in the comment of > release(). This is a good question, I think the comment can be improved to explain this. > Without the guard of kvm_gmem_get_file(), kvm_gmem_get_pfn() may need to > provide some RCU read-critial section too for release() to wait by > synchronize_srcu()? > > So > * Since .release is called only when the reference count is zero, > * after which file_ref_get() and get_file_active() fail, > is still helpful? > >> + * >> + * file_ref_put() provides a full barrier, and __get_file_rcu() the >> + * matching acquire barrier, to ensure that kvm_gmem_get_file() (via >> + * __get_file_rcu()) sees refcount==0 or fails the "file reloaded" >> + * check (file != NULL due to nullifying the file pointer here). >> + */ > >> xa_for_each(&f->bindings, index, slot) >> WRITE_ONCE(slot->gmem.file, NULL); >> >> >> base-commit: 16ec4fb4ac95d878b879192d280db2baeec43272 >> -- >> 2.52.0.rc1.455.g30608eb744-goog >> ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] KVM: guest_memfd: Elaborate on how release() vs. get_pfn() is safe against UAF 2026-06-03 0:40 ` Ackerley Tng @ 2026-06-03 11:42 ` Yan Zhao 2026-06-03 15:11 ` Ackerley Tng 0 siblings, 1 reply; 9+ messages in thread From: Yan Zhao @ 2026-06-03 11:42 UTC (permalink / raw) To: Ackerley Tng; +Cc: seanjc, pbonzini, kvm, linux-kernel, vannapurve On Tue, Jun 02, 2026 at 05:40:00PM -0700, Ackerley Tng wrote: > Yan Zhao <yan.y.zhao@intel.com> writes: > > Coming here from https://lore.kernel.org/all/ahn6qysOGfa74A2E@google.com/. > > > On Thu, Nov 13, 2025 at 03:22:29PM -0800, Sean Christopherson wrote: > >> Add more context and information to the comment in kvm_gmem_release() that > >> explains why there's no synchronization on RCU _or_ kvm->srcu. Point (b) > >> from commit 67b43038ce14 ("KVM: guest_memfd: Remove RCU-protected attribute > >> from slot->gmem.file") > >> > >> b) kvm->srcu ensures that kvm_gmem_unbind() and freeing of a memslot > >> occur after the memslot is no longer visible to kvm_gmem_get_pfn(). > >> > >> is especially difficult to fully grok, particularly in light of commit > >> ae431059e75d ("KVM: guest_memfd: Remove bindings on memslot deletion when > >> gmem is dying"), which addressed a race between unbind() and release(). > > As mentioned in commit ae431059e75d ("KVM: guest_memfd: Remove bindings on > > memslot deletion when gmem is dying"), unbind() and release() are mutually > > exclusive, i.e., both protected by slots_lock, mentioning "race" here is > > confusing. So, that commit addressed the mishandling in unbind() after > > kvm_gmem_get_file() returning NULL? > > > > I agree that the use of the word "race" here is odd since both unbind > and release() are mutually exclusive and protected by slots_lock. > > Looking at ae431059e75d again, there are really 3 changes: > > 1. Refactoring out __kvm_gmem_unbind(), which is mostly about setting > bindings in the xarray to NULL and setting slot->gmem.file to NULL, > 2. Checking that if slot->gmem.file is NULL, skip unbinding > 3. Checking that if kvm_gmem_get_file() returns NULL, just unbind > without taking filemap_invalidate_lock(). > > (2.) is explained in the comment > > /* > * Nothing to do if the underlying file was _already_ closed, as > * kvm_gmem_release() invalidates and nullifies all bindings. > */ > > The real fix is in (3.), I think? Could you explain how > kvm_gmem_get_file() works? Why would the file be dying with the > slots_lock making the unbind and release mutually exclusive? Below is my understanding: kvm_gmem_unbind() is invoked under slots_lock. kvm_gmem_release() holds slots_lock before it accesses any slot and sets slot->gmem.file to NULL. When kvm_gmem_get_file() returns NULL, it could be 1) kvm_gmem_release() is to be invoked soon. 2) kvm_gmem_release() is being invoked, before holding slots_lock. 3) kvm_gmem_release() is being invoked, holding slots_lock. 4) kvm_gmem_release() is being invoked, after releasing slots_lock. 5) kvm_gmem_release() has been invoked and completed. So, when kvm_gmem_unbind() finds slot->gmem.file != NULL while kvm_gmem_get_file() returns NULL, it could only be 1) or 2). And at that point, the remaining user of the slot->gmem.file should only be kvm_gmem_release(). For both 1) and 2), kvm_gmem_unbind() and kvm_gmem_release() can't operate on f->bindings simultaneously. kvm_gmem_unbind() needs to remove the slot from slot->gmem.file's binding. Otherwise, after kvm_gmem_unbind() returns, the slot will be deleted and kvm_gmem_release() will later hold slots_lock and cause use-after-free. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] KVM: guest_memfd: Elaborate on how release() vs. get_pfn() is safe against UAF 2026-06-03 11:42 ` Yan Zhao @ 2026-06-03 15:11 ` Ackerley Tng 0 siblings, 0 replies; 9+ messages in thread From: Ackerley Tng @ 2026-06-03 15:11 UTC (permalink / raw) To: Yan Zhao; +Cc: seanjc, pbonzini, kvm, linux-kernel, vannapurve Yan Zhao <yan.y.zhao@intel.com> writes: > On Tue, Jun 02, 2026 at 05:40:00PM -0700, Ackerley Tng wrote: >> Yan Zhao <yan.y.zhao@intel.com> writes: >> >> Coming here from https://lore.kernel.org/all/ahn6qysOGfa74A2E@google.com/. >> >> > On Thu, Nov 13, 2025 at 03:22:29PM -0800, Sean Christopherson wrote: >> >> Add more context and information to the comment in kvm_gmem_release() that >> >> explains why there's no synchronization on RCU _or_ kvm->srcu. Point (b) >> >> from commit 67b43038ce14 ("KVM: guest_memfd: Remove RCU-protected attribute >> >> from slot->gmem.file") >> >> >> >> b) kvm->srcu ensures that kvm_gmem_unbind() and freeing of a memslot >> >> occur after the memslot is no longer visible to kvm_gmem_get_pfn(). >> >> >> >> is especially difficult to fully grok, particularly in light of commit >> >> ae431059e75d ("KVM: guest_memfd: Remove bindings on memslot deletion when >> >> gmem is dying"), which addressed a race between unbind() and release(). >> > As mentioned in commit ae431059e75d ("KVM: guest_memfd: Remove bindings on >> > memslot deletion when gmem is dying"), unbind() and release() are mutually >> > exclusive, i.e., both protected by slots_lock, mentioning "race" here is >> > confusing. So, that commit addressed the mishandling in unbind() after >> > kvm_gmem_get_file() returning NULL? >> > >> >> I agree that the use of the word "race" here is odd since both unbind >> and release() are mutually exclusive and protected by slots_lock. >> >> Looking at ae431059e75d again, there are really 3 changes: >> >> 1. Refactoring out __kvm_gmem_unbind(), which is mostly about setting >> bindings in the xarray to NULL and setting slot->gmem.file to NULL, >> 2. Checking that if slot->gmem.file is NULL, skip unbinding >> 3. Checking that if kvm_gmem_get_file() returns NULL, just unbind >> without taking filemap_invalidate_lock(). >> >> (2.) is explained in the comment >> >> /* >> * Nothing to do if the underlying file was _already_ closed, as >> * kvm_gmem_release() invalidates and nullifies all bindings. >> */ >> >> The real fix is in (3.), I think? Could you explain how >> kvm_gmem_get_file() works? Why would the file be dying with the >> slots_lock making the unbind and release mutually exclusive? > Below is my understanding: > Thanks for explaining! > kvm_gmem_unbind() is invoked under slots_lock. > kvm_gmem_release() holds slots_lock before it accesses any slot and sets > slot->gmem.file to NULL. > > When kvm_gmem_get_file() returns NULL, it could be > 1) kvm_gmem_release() is to be invoked soon. > 2) kvm_gmem_release() is being invoked, before holding slots_lock. > 3) kvm_gmem_release() is being invoked, holding slots_lock. > 4) kvm_gmem_release() is being invoked, after releasing slots_lock. > 5) kvm_gmem_release() has been invoked and completed. > > So, when kvm_gmem_unbind() finds slot->gmem.file != NULL while > kvm_gmem_get_file() returns NULL, it could only be 1) or 2). And at To clarify, 3, 4, 5 are not possible because kvm_gmem_unbind() is called with slots_lock held, right? > that point, > the remaining user of the slot->gmem.file should only be kvm_gmem_release(). > > For both 1) and 2), kvm_gmem_unbind() and kvm_gmem_release() can't operate on > f->bindings simultaneously. kvm_gmem_unbind() needs to remove the slot from > slot->gmem.file's binding. Otherwise, after kvm_gmem_unbind() returns, the slot > will be deleted and kvm_gmem_release() will later hold slots_lock and cause > use-after-free. Thanks, this makes sense. The purpose of using kvm_gmem_get_file() is that if the file is _being_ closed, as stated in the comment in kvm_gmem_unbind(), kvm_gmem_get_file() would find the refcount to be 0 and return NULL. (The missing part for me was that _being_ closed leads to kvm_gmem_get_file() finding a 0 refcount.) Annotating all of these: > void kvm_gmem_unbind(struct kvm_memory_slot *slot) > { > /* > * Nothing to do if the underlying file was _already_ closed, as > * kvm_gmem_release() invalidates and nullifies all bindings. > */ > Above comment rephrased: !slot->gmem.file means kvm_gmem_release() had already taken slots_lock, iterated all bindings and set slot->gmem.file to NULL, and dropped slots_lock. > > if (!slot->gmem.file) > return; > If we get here, we know for sure that slot->gmem.file is valid (not freed) and can be dereferenced safely. We know this because kvm_gmem_release() is the place that frees slot->gmem.file, and if there was any chance of slot->gmem.file being freed, slot->gmem.file would have been NULL-ed first. Hence we can also safely proceed to do filemap_invalidate_lock(slot->gmem.file->f_mapping), but as described in ae431059e75d, file->f_mapping is not under KVM's purview. A future change could NULL out file->f_mapping before .release() is called, for example, and break the assumption that gmem.file->f_mapping is still valid. Hence, rely on taking a reference on gmem.file from slot instead, which is within KVM's control. > CLASS(gmem_get_file, file)(slot); > If we do get the reference, we're good and we can take filemap_invalidate_lock() and then unbind. > /* > * However, if the file is _being_ closed, then the bindings need to be If we don't get the reference, then the file is _being_ closed, so the file's refcount is already 0, but kvm_gmem_release() hasn't yet been called. Later, when kvm_gmem_release() is called, it will iterate bindings and then dereference slot. Hence, kvm_gmem_release() must not see the slot - the slot must be removed from bindings. Hence we must unbind. Unbinding without taking filemap_invalidate_lock() is fine since if !file, we're sure there's only one last user of file - kvm_gmem_release(), and kvm_gmem_release() is locked out by slots_lock. > * removed as kvm_gmem_release() might not run until after the memslot > * is freed. Note, modifying the bindings is safe even though the file > * is dying as kvm_gmem_release() nullifies slot->gmem.file under > * slots_lock, and only puts its reference to KVM after destroying all > * bindings. I.e. reaching this point means kvm_gmem_release() hasn't > * yet destroyed the bindings or freed the gmem_file, and can't do so > * until the caller drops slots_lock. > */ > if (!file) { > __kvm_gmem_unbind(slot, slot->gmem.file->private_data); > return; > } > If we did get the reference, this is the most straightforward case. Take the filemap_invalidate_lock(), unbind. > filemap_invalidate_lock(file->f_mapping); > __kvm_gmem_unbind(slot, file->private_data); > filemap_invalidate_unlock(file->f_mapping); > } ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] KVM: guest_memfd: Elaborate on how release() vs. get_pfn() is safe against UAF 2025-11-17 11:36 ` Yan Zhao 2026-06-03 0:40 ` Ackerley Tng @ 2026-06-04 0:10 ` Sean Christopherson 2026-06-05 8:32 ` Yan Zhao 1 sibling, 1 reply; 9+ messages in thread From: Sean Christopherson @ 2026-06-04 0:10 UTC (permalink / raw) To: Yan Zhao; +Cc: Paolo Bonzini, kvm, linux-kernel, Vishal Annapurve On Mon, Nov 17, 2025, Yan Zhao wrote: > On Thu, Nov 13, 2025 at 03:22:29PM -0800, Sean Christopherson wrote: > > Add more context and information to the comment in kvm_gmem_release() that > > explains why there's no synchronization on RCU _or_ kvm->srcu. Point (b) > > from commit 67b43038ce14 ("KVM: guest_memfd: Remove RCU-protected attribute > > from slot->gmem.file") > > > > b) kvm->srcu ensures that kvm_gmem_unbind() and freeing of a memslot > > occur after the memslot is no longer visible to kvm_gmem_get_pfn(). > > > > is especially difficult to fully grok, particularly in light of commit > > ae431059e75d ("KVM: guest_memfd: Remove bindings on memslot deletion when > > gmem is dying"), which addressed a race between unbind() and release(). > As mentioned in commit ae431059e75d ("KVM: guest_memfd: Remove bindings on > memslot deletion when gmem is dying"), unbind() and release() are mutually > exclusive, i.e., both protected by slots_lock, mentioning "race" here is > confusing. So, that commit addressed the mishandling in unbind() after > kvm_gmem_get_file() returning NULL? Yes, that's the race. Just because two chunks of code are mutually exclusive doesn't mean there's no race, it just changes the granularity of the race, and/or how it manifests. I consider it a race because in the failing case, there's simply no way for release() and unbind() to run sequentially. I.e. the only way to encounter the bug was to run two operations concurrently, *and* for a specific ordering to occur. > > No functional change intended. > > > > Cc: Yan Zhao <yan.y.zhao@intel.com> > > Cc: Vishal Annapurve <vannapurve@google.com> > > Signed-off-by: Sean Christopherson <seanjc@google.com> > > --- > > virt/kvm/guest_memfd.c | 20 ++++++++++++++------ > > 1 file changed, 14 insertions(+), 6 deletions(-) > > > > diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c > > index fdaea3422c30..2e09d7ec0cfc 100644 > > --- a/virt/kvm/guest_memfd.c > > +++ b/virt/kvm/guest_memfd.c > > @@ -338,17 +338,25 @@ static int kvm_gmem_release(struct inode *inode, struct file *file) > > * dereferencing the slot for existing bindings needs to be protected > > * against memslot updates, specifically so that unbind doesn't race > > * and free the memslot (kvm_gmem_get_file() will return NULL). > > - * > > - * Since .release is called only when the reference count is zero, > > - * after which file_ref_get() and get_file_active() fail, > > - * kvm_gmem_get_pfn() cannot be using the file concurrently. > > - * file_ref_put() provides a full barrier, and get_file_active() the > > - * matching acquire barrier. > > */ > > > mutex_lock(&kvm->slots_lock); > > > > filemap_invalidate_lock(inode->i_mapping); > > > > + /* > > + * Note! synchronize_srcu() is _not_ needed after nullifying memslot > > + * bindings as slot->gmem.file cannot be set back to a non-null value > > + * without the memslot first being deleted. I.e. this relies on the > > + * synchronize_srcu_expedited() in kvm_swap_active_memslots() to ensure > > + * kvm_gmem_get_pfn() (which runs with kvm->srcu held for read) can't > > + * grab a reference to slot->gmem.file even if the struct file object > > + * is reallocated. > Do you mean that as kvm_gmem_get_pfn() can't find a stale slot, it can't grab > reference to stale slot->gmem.file, even if slot->gmem.file has been set to a > different value, i.e., after invoking unbind(), bind() ? I was specifically trying to say this: 1. slot->gmem.file == old == synchronize_srcu() |===> window #1 for readers 2. slot->gmem.file == NULL == synchronize_srcu() |===> window #2 for readers 3. slot->gmem.file == new == kvm_gmem_get_pfn() can run in two "windows". In the window #1, it can see @old or NULL. In window #2, it can see NULL or @new. The synchronization ensures it can't grab a reference to @old once window #2 has been opened. > But I'm not sure why to put the kvm_gmem_get_pfn() relying on > synchronize_srcu_expedited() in kvm_swap_active_memslots() in the comment of > release(). Without the guard of kvm_gmem_get_file(), kvm_gmem_get_pfn() may > need to provide some RCU read-critial section too for release() to wait by > synchronize_srcu()? No, what I'm saying is that without synchronize_srcu() to create distinct window #1 and window #2 above, KVM would need this: diff --git virt/kvm/guest_memfd.c virt/kvm/guest_memfd.c index a1cb72e66288..e66e431bdb98 100644 --- virt/kvm/guest_memfd.c +++ virt/kvm/guest_memfd.c @@ -347,6 +347,8 @@ static int kvm_gmem_release(struct inode *inode, struct file *file) xa_for_each(&f->bindings, index, slot) WRITE_ONCE(slot->gmem.file, NULL); + synchronize_srcu(); + /* * All in-flight operations are gone and new bindings can be created. * Zap all SPTEs pointed at by this file. Do not free the backing Otherwise there is no guarantee in-flight readers can't reach the old slot->gmem.file. ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH] KVM: guest_memfd: Elaborate on how release() vs. get_pfn() is safe against UAF 2026-06-04 0:10 ` Sean Christopherson @ 2026-06-05 8:32 ` Yan Zhao 0 siblings, 0 replies; 9+ messages in thread From: Yan Zhao @ 2026-06-05 8:32 UTC (permalink / raw) To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Vishal Annapurve On Wed, Jun 03, 2026 at 05:10:59PM -0700, Sean Christopherson wrote: > On Mon, Nov 17, 2025, Yan Zhao wrote: > > On Thu, Nov 13, 2025 at 03:22:29PM -0800, Sean Christopherson wrote: > > > Add more context and information to the comment in kvm_gmem_release() that > > > explains why there's no synchronization on RCU _or_ kvm->srcu. Point (b) > > > from commit 67b43038ce14 ("KVM: guest_memfd: Remove RCU-protected attribute > > > from slot->gmem.file") > > > > > > b) kvm->srcu ensures that kvm_gmem_unbind() and freeing of a memslot > > > occur after the memslot is no longer visible to kvm_gmem_get_pfn(). > > > > > > is especially difficult to fully grok, particularly in light of commit > > > ae431059e75d ("KVM: guest_memfd: Remove bindings on memslot deletion when > > > gmem is dying"), which addressed a race between unbind() and release(). > > As mentioned in commit ae431059e75d ("KVM: guest_memfd: Remove bindings on > > memslot deletion when gmem is dying"), unbind() and release() are mutually > > exclusive, i.e., both protected by slots_lock, mentioning "race" here is > > confusing. So, that commit addressed the mishandling in unbind() after > > kvm_gmem_get_file() returning NULL? > > Yes, that's the race. Just because two chunks of code are mutually exclusive > doesn't mean there's no race, it just changes the granularity of the race, and/or > how it manifests. > > I consider it a race because in the failing case, there's simply no way for > release() and unbind() to run sequentially. I.e. the only way to encounter the > bug was to run two operations concurrently, *and* for a specific ordering to > occur. Thanks for the explanation! > > > No functional change intended. > > > > > > Cc: Yan Zhao <yan.y.zhao@intel.com> > > > Cc: Vishal Annapurve <vannapurve@google.com> > > > Signed-off-by: Sean Christopherson <seanjc@google.com> > > > --- > > > virt/kvm/guest_memfd.c | 20 ++++++++++++++------ > > > 1 file changed, 14 insertions(+), 6 deletions(-) > > > > > > diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c > > > index fdaea3422c30..2e09d7ec0cfc 100644 > > > --- a/virt/kvm/guest_memfd.c > > > +++ b/virt/kvm/guest_memfd.c > > > @@ -338,17 +338,25 @@ static int kvm_gmem_release(struct inode *inode, struct file *file) > > > * dereferencing the slot for existing bindings needs to be protected > > > * against memslot updates, specifically so that unbind doesn't race > > > * and free the memslot (kvm_gmem_get_file() will return NULL). > > > - * > > > - * Since .release is called only when the reference count is zero, > > > - * after which file_ref_get() and get_file_active() fail, > > > - * kvm_gmem_get_pfn() cannot be using the file concurrently. > > > - * file_ref_put() provides a full barrier, and get_file_active() the > > > - * matching acquire barrier. > > > */ > > > > > mutex_lock(&kvm->slots_lock); > > > > > > filemap_invalidate_lock(inode->i_mapping); > > > > > > + /* > > > + * Note! synchronize_srcu() is _not_ needed after nullifying memslot > > > + * bindings as slot->gmem.file cannot be set back to a non-null value > > > + * without the memslot first being deleted. I.e. this relies on the > > > + * synchronize_srcu_expedited() in kvm_swap_active_memslots() to ensure > > > + * kvm_gmem_get_pfn() (which runs with kvm->srcu held for read) can't > > > + * grab a reference to slot->gmem.file even if the struct file object > > > + * is reallocated. > > Do you mean that as kvm_gmem_get_pfn() can't find a stale slot, it can't grab > > reference to stale slot->gmem.file, even if slot->gmem.file has been set to a > > different value, i.e., after invoking unbind(), bind() ? > > I was specifically trying to say this: > > 1. slot->gmem.file == old == > synchronize_srcu() |===> window #1 for readers > 2. slot->gmem.file == NULL == > synchronize_srcu() |===> window #2 for readers > 3. slot->gmem.file == new == > > kvm_gmem_get_pfn() can run in two "windows". In the window #1, it can see @old > or NULL. In window #2, it can see NULL or @new. The synchronization ensures > it can't grab a reference to @old once window #2 has been opened. Thanks for the clarification. > > But I'm not sure why to put the kvm_gmem_get_pfn() relying on > > synchronize_srcu_expedited() in kvm_swap_active_memslots() in the comment of > > release(). Without the guard of kvm_gmem_get_file(), kvm_gmem_get_pfn() may > > need to provide some RCU read-critial section too for release() to wait by > > synchronize_srcu()? > > No, what I'm saying is that without synchronize_srcu() to create distinct > window #1 and window #2 above, KVM would need this: > > diff --git virt/kvm/guest_memfd.c virt/kvm/guest_memfd.c > index a1cb72e66288..e66e431bdb98 100644 > --- virt/kvm/guest_memfd.c > +++ virt/kvm/guest_memfd.c > @@ -347,6 +347,8 @@ static int kvm_gmem_release(struct inode *inode, struct file *file) > xa_for_each(&f->bindings, index, slot) > WRITE_ONCE(slot->gmem.file, NULL); > > + synchronize_srcu(); > + > /* > * All in-flight operations are gone and new bindings can be created. > * Zap all SPTEs pointed at by this file. Do not free the backing > > Otherwise there is no guarantee in-flight readers can't reach the old > slot->gmem.file. Hmm, however, if the reader (kvm_gmem_get_pfn()) has grabbed a reference to @old, kvm_gmem_release() can't come; if kvm_gmem_release() comes, kvm_gmem_get_pfn() should not have been able to successfully invoke get_file_active() on @old. After kvm_gmem_release() returns, any subsequent calls to get_file_active() on @old should fail and return NULL. So, even if the reader can see @old, we're still fine without synchronize_srcu(), as get_file_active() and SLAB_TYPESAFE_BY_RCU should have provide the safety. Am I missing something? ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] KVM: guest_memfd: Elaborate on how release() vs. get_pfn() is safe against UAF 2025-11-13 23:22 [PATCH] KVM: guest_memfd: Elaborate on how release() vs. get_pfn() is safe against UAF Sean Christopherson 2025-11-17 11:36 ` Yan Zhao @ 2026-06-03 15:37 ` Ackerley Tng 2026-06-03 23:29 ` Sean Christopherson 1 sibling, 1 reply; 9+ messages in thread From: Ackerley Tng @ 2026-06-03 15:37 UTC (permalink / raw) To: Sean Christopherson, Paolo Bonzini Cc: kvm, linux-kernel, Yan Zhao, Vishal Annapurve Sean Christopherson <seanjc@google.com> writes: > Add more context and information to the comment in kvm_gmem_release() that > explains why there's no synchronization on RCU _or_ kvm->srcu. Point (b) > from commit 67b43038ce14 ("KVM: guest_memfd: Remove RCU-protected attribute > from slot->gmem.file") > > b) kvm->srcu ensures that kvm_gmem_unbind() and freeing of a memslot > occur after the memslot is no longer visible to kvm_gmem_get_pfn(). > > is especially difficult to fully grok, particularly in light of commit > ae431059e75d ("KVM: guest_memfd: Remove bindings on memslot deletion when > gmem is dying"), which addressed a race between unbind() and release(). > > No functional change intended. > > Cc: Yan Zhao <yan.y.zhao@intel.com> > Cc: Vishal Annapurve <vannapurve@google.com> > Signed-off-by: Sean Christopherson <seanjc@google.com> > --- > virt/kvm/guest_memfd.c | 20 ++++++++++++++------ > 1 file changed, 14 insertions(+), 6 deletions(-) > > diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c > index fdaea3422c30..2e09d7ec0cfc 100644 > --- a/virt/kvm/guest_memfd.c > +++ b/virt/kvm/guest_memfd.c > @@ -338,17 +338,25 @@ static int kvm_gmem_release(struct inode *inode, struct file *file) > * dereferencing the slot for existing bindings needs to be protected > * against memslot updates, specifically so that unbind doesn't race > * and free the memslot (kvm_gmem_get_file() will return NULL). > - * > - * Since .release is called only when the reference count is zero, > - * after which file_ref_get() and get_file_active() fail, > - * kvm_gmem_get_pfn() cannot be using the file concurrently. > - * file_ref_put() provides a full barrier, and get_file_active() the > - * matching acquire barrier. > */ > mutex_lock(&kvm->slots_lock); > > filemap_invalidate_lock(inode->i_mapping); > > + /* > + * Note! synchronize_srcu() is _not_ needed after nullifying memslot > + * bindings as slot->gmem.file cannot be set back to a non-null value > + * without the memslot first being deleted. I.e. this relies on the > + * synchronize_srcu_expedited() in kvm_swap_active_memslots() to ensure Is this kvm_swap_active_memslots() referring not to swapping out of the deleted memslot, but swapping in of a new memslot? > + * kvm_gmem_get_pfn() (which runs with kvm->srcu held for read) can't > + * grab a reference to slot->gmem.file even if the struct file object > + * is reallocated. > + * I think the part that's missing for me is: why is synchronize_srcu() required after NULLifying memslot bindings? Normally, synchronize_srcu() will be needed after memslot updates to make sure the next user of kvm->srcu gets the new state of the memslots? The "new state" here could be 1. slot->gmem.file == NULL 2. slot->gmem.file == some new file, set after the old memslot was replaced > + * file_ref_put() provides a full barrier, and __get_file_rcu() the > + * matching acquire barrier, to ensure that kvm_gmem_get_file() (via > + * __get_file_rcu()) sees refcount==0 or fails the "file reloaded" > + * check (file != NULL due to nullifying the file pointer here). In this case, kvm_gmem_get_pfn() uses kvm_gmem_get_file(), and because of the above, kvm_gmem_get_pfn() will not try to get a folio from a NULL file (new state 1). To handle new state 2, kvm_swap_active_memslots() to swap in a new memslot must have called synchronize_srcu(), so synchronize_srcu() can be skipped here. > + */ > xa_for_each(&f->bindings, index, slot) > WRITE_ONCE(slot->gmem.file, NULL); > > > base-commit: 16ec4fb4ac95d878b879192d280db2baeec43272 > -- > 2.52.0.rc1.455.g30608eb744-goog ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] KVM: guest_memfd: Elaborate on how release() vs. get_pfn() is safe against UAF 2026-06-03 15:37 ` Ackerley Tng @ 2026-06-03 23:29 ` Sean Christopherson 0 siblings, 0 replies; 9+ messages in thread From: Sean Christopherson @ 2026-06-03 23:29 UTC (permalink / raw) To: Ackerley Tng; +Cc: Paolo Bonzini, kvm, linux-kernel, Yan Zhao, Vishal Annapurve On Wed, Jun 03, 2026, Ackerley Tng wrote: > Sean Christopherson <seanjc@google.com> writes: > > > Add more context and information to the comment in kvm_gmem_release() that > > explains why there's no synchronization on RCU _or_ kvm->srcu. Point (b) > > from commit 67b43038ce14 ("KVM: guest_memfd: Remove RCU-protected attribute > > from slot->gmem.file") > > > > b) kvm->srcu ensures that kvm_gmem_unbind() and freeing of a memslot > > occur after the memslot is no longer visible to kvm_gmem_get_pfn(). > > > > is especially difficult to fully grok, particularly in light of commit > > ae431059e75d ("KVM: guest_memfd: Remove bindings on memslot deletion when > > gmem is dying"), which addressed a race between unbind() and release(). > > > > No functional change intended. > > > > Cc: Yan Zhao <yan.y.zhao@intel.com> > > Cc: Vishal Annapurve <vannapurve@google.com> > > Signed-off-by: Sean Christopherson <seanjc@google.com> > > --- > > virt/kvm/guest_memfd.c | 20 ++++++++++++++------ > > 1 file changed, 14 insertions(+), 6 deletions(-) > > > > diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c > > index fdaea3422c30..2e09d7ec0cfc 100644 > > --- a/virt/kvm/guest_memfd.c > > +++ b/virt/kvm/guest_memfd.c > > @@ -338,17 +338,25 @@ static int kvm_gmem_release(struct inode *inode, struct file *file) > > * dereferencing the slot for existing bindings needs to be protected > > * against memslot updates, specifically so that unbind doesn't race > > * and free the memslot (kvm_gmem_get_file() will return NULL). > > - * > > - * Since .release is called only when the reference count is zero, > > - * after which file_ref_get() and get_file_active() fail, > > - * kvm_gmem_get_pfn() cannot be using the file concurrently. > > - * file_ref_put() provides a full barrier, and get_file_active() the > > - * matching acquire barrier. > > */ > > mutex_lock(&kvm->slots_lock); > > > > filemap_invalidate_lock(inode->i_mapping); > > > > + /* > > + * Note! synchronize_srcu() is _not_ needed after nullifying memslot > > + * bindings as slot->gmem.file cannot be set back to a non-null value > > + * without the memslot first being deleted. I.e. this relies on the > > + * synchronize_srcu_expedited() in kvm_swap_active_memslots() to ensure > > Is this kvm_swap_active_memslots() referring not to swapping out of the > deleted memslot, but swapping in of a new memslot? No, it's the deletion that matters. When doing an (S)RCU-protect pointer swap, readers can see _either_ pointer (the old or the new). I.e. when creating a new memslot, synchronize_srcu_expedited() would only guarantee readers would see the new slot *after* synchronization completes. But that's not what we care about. What we care about is that an in-flight kvm_gmem_get_pfn() _can't_ reach the file for new, re-allocated memslot. That's guaranteed because deleting the old memslot waiting for readers to away, i.e. guarantees that any readers that saw a non-NULL slot->gmem.file saw the one associated with the old memslot. > > + * kvm_gmem_get_pfn() (which runs with kvm->srcu held for read) can't > > + * grab a reference to slot->gmem.file even if the struct file object > > + * is reallocated. > > + * > > I think the part that's missing for me is: why is synchronize_srcu() > required after NULLifying memslot bindings? Normally, synchronize_srcu() > will be needed after memslot updates to make sure the next user of > kvm->srcu gets the new state of the memslots? > > The "new state" here could be > > 1. slot->gmem.file == NULL > 2. slot->gmem.file == some new file, set after the old memslot was > replaced Only because there's a synchronize_srcu(). Without that, it could be: 1. slot->gmem.file == old == 2. slot->gmem.file == NULL |==> window for readers 3. slot->gmem.file == new == versus 1. slot->gmem.file == old == synchronize_srcu() |===> window for readers 2. slot->gmem.file == NULL == synchronize_srcu() |===> window #2 for readers 3. slot->gmem.file == new == ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2026-06-05 9:13 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-11-13 23:22 [PATCH] KVM: guest_memfd: Elaborate on how release() vs. get_pfn() is safe against UAF Sean Christopherson 2025-11-17 11:36 ` Yan Zhao 2026-06-03 0:40 ` Ackerley Tng 2026-06-03 11:42 ` Yan Zhao 2026-06-03 15:11 ` Ackerley Tng 2026-06-04 0:10 ` Sean Christopherson 2026-06-05 8:32 ` Yan Zhao 2026-06-03 15:37 ` Ackerley Tng 2026-06-03 23:29 ` Sean Christopherson
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox