The Linux Kernel Mailing List
 help / color / mirror / Atom feed
* [PATCH] KVM: guest_memfd: Elaborate on how release() vs. get_pfn() is safe against UAF
@ 2025-11-13 23:22 Sean Christopherson
  2025-11-17 11:36 ` Yan Zhao
  2026-06-03 15:37 ` Ackerley Tng
  0 siblings, 2 replies; 8+ messages in thread
From: Sean Christopherson @ 2025-11-13 23:22 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: kvm, linux-kernel, Yan Zhao, Vishal Annapurve,
	Sean Christopherson

Add more context and information to the comment in kvm_gmem_release() that
explains why there's no synchronization on RCU _or_ kvm->srcu.  Point (b)
from commit 67b43038ce14 ("KVM: guest_memfd: Remove RCU-protected attribute
from slot->gmem.file")

      b) kvm->srcu ensures that kvm_gmem_unbind() and freeing of a memslot
         occur after the memslot is no longer visible to kvm_gmem_get_pfn().

is especially difficult to fully grok, particularly in light of commit
ae431059e75d ("KVM: guest_memfd: Remove bindings on memslot deletion when
gmem is dying"), which addressed a race between unbind() and release().

No functional change intended.

Cc: Yan Zhao <yan.y.zhao@intel.com>
Cc: Vishal Annapurve <vannapurve@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 virt/kvm/guest_memfd.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index fdaea3422c30..2e09d7ec0cfc 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -338,17 +338,25 @@ static int kvm_gmem_release(struct inode *inode, struct file *file)
 	 * dereferencing the slot for existing bindings needs to be protected
 	 * against memslot updates, specifically so that unbind doesn't race
 	 * and free the memslot (kvm_gmem_get_file() will return NULL).
-	 *
-	 * Since .release is called only when the reference count is zero,
-	 * after which file_ref_get() and get_file_active() fail,
-	 * kvm_gmem_get_pfn() cannot be using the file concurrently.
-	 * file_ref_put() provides a full barrier, and get_file_active() the
-	 * matching acquire barrier.
 	 */
 	mutex_lock(&kvm->slots_lock);
 
 	filemap_invalidate_lock(inode->i_mapping);
 
+	/*
+	 * Note!  synchronize_srcu() is _not_ needed after nullifying memslot
+	 * bindings as slot->gmem.file cannot be set back to a non-null value
+	 * without the memslot first being deleted.  I.e. this relies on the
+	 * synchronize_srcu_expedited() in kvm_swap_active_memslots() to ensure
+	 * kvm_gmem_get_pfn() (which runs with kvm->srcu held for read) can't
+	 * grab a reference to slot->gmem.file even if the struct file object
+	 * is reallocated.
+	 *
+	 * file_ref_put() provides a full barrier, and __get_file_rcu() the
+	 * matching acquire barrier, to ensure that kvm_gmem_get_file() (via
+	 * __get_file_rcu()) sees refcount==0 or fails the "file reloaded"
+	 * check (file != NULL due to nullifying the file pointer here).
+	 */
 	xa_for_each(&f->bindings, index, slot)
 		WRITE_ONCE(slot->gmem.file, NULL);
 

base-commit: 16ec4fb4ac95d878b879192d280db2baeec43272
-- 
2.52.0.rc1.455.g30608eb744-goog


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] KVM: guest_memfd: Elaborate on how release() vs. get_pfn() is safe against UAF
  2025-11-13 23:22 [PATCH] KVM: guest_memfd: Elaborate on how release() vs. get_pfn() is safe against UAF Sean Christopherson
@ 2025-11-17 11:36 ` Yan Zhao
  2026-06-03  0:40   ` Ackerley Tng
  2026-06-04  0:10   ` Sean Christopherson
  2026-06-03 15:37 ` Ackerley Tng
  1 sibling, 2 replies; 8+ messages in thread
From: Yan Zhao @ 2025-11-17 11:36 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: Paolo Bonzini, kvm, linux-kernel, Vishal Annapurve

On Thu, Nov 13, 2025 at 03:22:29PM -0800, Sean Christopherson wrote:
> Add more context and information to the comment in kvm_gmem_release() that
> explains why there's no synchronization on RCU _or_ kvm->srcu.  Point (b)
> from commit 67b43038ce14 ("KVM: guest_memfd: Remove RCU-protected attribute
> from slot->gmem.file")
> 
>       b) kvm->srcu ensures that kvm_gmem_unbind() and freeing of a memslot
>          occur after the memslot is no longer visible to kvm_gmem_get_pfn().
> 
> is especially difficult to fully grok, particularly in light of commit
> ae431059e75d ("KVM: guest_memfd: Remove bindings on memslot deletion when
> gmem is dying"), which addressed a race between unbind() and release().
As mentioned in commit ae431059e75d ("KVM: guest_memfd: Remove bindings on
memslot deletion when gmem is dying"), unbind() and release() are mutually
exclusive, i.e., both protected by slots_lock, mentioning "race" here is
confusing. So, that commit addressed the mishandling in unbind() after
kvm_gmem_get_file() returning NULL?

> No functional change intended.
> 
> Cc: Yan Zhao <yan.y.zhao@intel.com>
> Cc: Vishal Annapurve <vannapurve@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  virt/kvm/guest_memfd.c | 20 ++++++++++++++------
>  1 file changed, 14 insertions(+), 6 deletions(-)
> 
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index fdaea3422c30..2e09d7ec0cfc 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -338,17 +338,25 @@ static int kvm_gmem_release(struct inode *inode, struct file *file)
>  	 * dereferencing the slot for existing bindings needs to be protected
>  	 * against memslot updates, specifically so that unbind doesn't race
>  	 * and free the memslot (kvm_gmem_get_file() will return NULL).
> -	 *
> -	 * Since .release is called only when the reference count is zero,
> -	 * after which file_ref_get() and get_file_active() fail,
> -	 * kvm_gmem_get_pfn() cannot be using the file concurrently.
> -	 * file_ref_put() provides a full barrier, and get_file_active() the
> -	 * matching acquire barrier.
>  	 */

>  	mutex_lock(&kvm->slots_lock);
>  
>  	filemap_invalidate_lock(inode->i_mapping);
> 
> +	/*
> +	 * Note!  synchronize_srcu() is _not_ needed after nullifying memslot
> +	 * bindings as slot->gmem.file cannot be set back to a non-null value
> +	 * without the memslot first being deleted.  I.e. this relies on the
> +	 * synchronize_srcu_expedited() in kvm_swap_active_memslots() to ensure
> +	 * kvm_gmem_get_pfn() (which runs with kvm->srcu held for read) can't
> +	 * grab a reference to slot->gmem.file even if the struct file object
> +	 * is reallocated.
Do you mean that
as kvm_gmem_get_pfn() can't find a stale slot, it can't grab reference to stale
slot->gmem.file, even if slot->gmem.file has been set to a different value,
i.e., after invoking unbind(), bind() ?

But I'm not sure why to put the kvm_gmem_get_pfn() relying on
synchronize_srcu_expedited() in kvm_swap_active_memslots() in the comment of
release(). 
Without the guard of kvm_gmem_get_file(), kvm_gmem_get_pfn() may need to
provide some RCU read-critial section too for release() to wait by
synchronize_srcu()?

So
	 * Since .release is called only when the reference count is zero,
	 * after which file_ref_get() and get_file_active() fail,
is still helpful?

> +	 *
> +	 * file_ref_put() provides a full barrier, and __get_file_rcu() the
> +	 * matching acquire barrier, to ensure that kvm_gmem_get_file() (via
> +	 * __get_file_rcu()) sees refcount==0 or fails the "file reloaded"
> +	 * check (file != NULL due to nullifying the file pointer here).
> +	 */

>  	xa_for_each(&f->bindings, index, slot)
>  		WRITE_ONCE(slot->gmem.file, NULL);
>  
> 
> base-commit: 16ec4fb4ac95d878b879192d280db2baeec43272
> -- 
> 2.52.0.rc1.455.g30608eb744-goog
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] KVM: guest_memfd: Elaborate on how release() vs. get_pfn() is safe against UAF
  2025-11-17 11:36 ` Yan Zhao
@ 2026-06-03  0:40   ` Ackerley Tng
  2026-06-03 11:42     ` Yan Zhao
  2026-06-04  0:10   ` Sean Christopherson
  1 sibling, 1 reply; 8+ messages in thread
From: Ackerley Tng @ 2026-06-03  0:40 UTC (permalink / raw)
  To: Yan Zhao; +Cc: seanjc, pbonzini, kvm, linux-kernel, vannapurve

Yan Zhao <yan.y.zhao@intel.com> writes:

Coming here from https://lore.kernel.org/all/ahn6qysOGfa74A2E@google.com/.

> On Thu, Nov 13, 2025 at 03:22:29PM -0800, Sean Christopherson wrote:
>> Add more context and information to the comment in kvm_gmem_release() that
>> explains why there's no synchronization on RCU _or_ kvm->srcu.  Point (b)
>> from commit 67b43038ce14 ("KVM: guest_memfd: Remove RCU-protected attribute
>> from slot->gmem.file")
>>
>>       b) kvm->srcu ensures that kvm_gmem_unbind() and freeing of a memslot
>>          occur after the memslot is no longer visible to kvm_gmem_get_pfn().
>>
>> is especially difficult to fully grok, particularly in light of commit
>> ae431059e75d ("KVM: guest_memfd: Remove bindings on memslot deletion when
>> gmem is dying"), which addressed a race between unbind() and release().
> As mentioned in commit ae431059e75d ("KVM: guest_memfd: Remove bindings on
> memslot deletion when gmem is dying"), unbind() and release() are mutually
> exclusive, i.e., both protected by slots_lock, mentioning "race" here is
> confusing. So, that commit addressed the mishandling in unbind() after
> kvm_gmem_get_file() returning NULL?
>

I agree that the use of the word "race" here is odd since both unbind
and release() are mutually exclusive and protected by slots_lock.

Looking at ae431059e75d again, there are really 3 changes:

1. Refactoring out __kvm_gmem_unbind(), which is mostly about setting
   bindings in the xarray to NULL and setting slot->gmem.file to NULL,
2. Checking that if slot->gmem.file is NULL, skip unbinding
3. Checking that if kvm_gmem_get_file() returns NULL, just unbind
   without taking filemap_invalidate_lock().

(2.) is explained in the comment

	/*
	 * Nothing to do if the underlying file was _already_ closed, as
	 * kvm_gmem_release() invalidates and nullifies all bindings.
	 */

The real fix is in (3.), I think? Could you explain how
kvm_gmem_get_file() works? Why would the file be dying with the
slots_lock making the unbind and release mutually exclusive?

>> No functional change intended.
>>
>> Cc: Yan Zhao <yan.y.zhao@intel.com>
>> Cc: Vishal Annapurve <vannapurve@google.com>
>> Signed-off-by: Sean Christopherson <seanjc@google.com>
>> ---
>>  virt/kvm/guest_memfd.c | 20 ++++++++++++++------
>>  1 file changed, 14 insertions(+), 6 deletions(-)
>>
>> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
>> index fdaea3422c30..2e09d7ec0cfc 100644
>> --- a/virt/kvm/guest_memfd.c
>> +++ b/virt/kvm/guest_memfd.c
>> @@ -338,17 +338,25 @@ static int kvm_gmem_release(struct inode *inode, struct file *file)
>>  	 * dereferencing the slot for existing bindings needs to be protected
>>  	 * against memslot updates, specifically so that unbind doesn't race
>>  	 * and free the memslot (kvm_gmem_get_file() will return NULL).
>> -	 *
>> -	 * Since .release is called only when the reference count is zero,
>> -	 * after which file_ref_get() and get_file_active() fail,
>> -	 * kvm_gmem_get_pfn() cannot be using the file concurrently.
>> -	 * file_ref_put() provides a full barrier, and get_file_active() the
>> -	 * matching acquire barrier.
>>  	 */
>
>>  	mutex_lock(&kvm->slots_lock);
>>
>>  	filemap_invalidate_lock(inode->i_mapping);
>>
>> +	/*
>> +	 * Note!  synchronize_srcu() is _not_ needed after nullifying memslot
>> +	 * bindings as slot->gmem.file cannot be set back to a non-null value
>> +	 * without the memslot first being deleted.  I.e. this relies on the
>> +	 * synchronize_srcu_expedited() in kvm_swap_active_memslots() to ensure
>> +	 * kvm_gmem_get_pfn() (which runs with kvm->srcu held for read) can't
>> +	 * grab a reference to slot->gmem.file even if the struct file object
>> +	 * is reallocated.
> Do you mean that
> as kvm_gmem_get_pfn() can't find a stale slot, it can't grab reference to stale
> slot->gmem.file, even if slot->gmem.file has been set to a different value,
> i.e., after invoking unbind(), bind() ?
>
> But I'm not sure why to put the kvm_gmem_get_pfn() relying on
> synchronize_srcu_expedited() in kvm_swap_active_memslots() in the comment of
> release().

This is a good question, I think the comment can be improved to explain
this.

> Without the guard of kvm_gmem_get_file(), kvm_gmem_get_pfn() may need to
> provide some RCU read-critial section too for release() to wait by
> synchronize_srcu()?
>
> So
> 	 * Since .release is called only when the reference count is zero,
> 	 * after which file_ref_get() and get_file_active() fail,
> is still helpful?
>
>> +	 *
>> +	 * file_ref_put() provides a full barrier, and __get_file_rcu() the
>> +	 * matching acquire barrier, to ensure that kvm_gmem_get_file() (via
>> +	 * __get_file_rcu()) sees refcount==0 or fails the "file reloaded"
>> +	 * check (file != NULL due to nullifying the file pointer here).
>> +	 */
>
>>  	xa_for_each(&f->bindings, index, slot)
>>  		WRITE_ONCE(slot->gmem.file, NULL);
>>
>>
>> base-commit: 16ec4fb4ac95d878b879192d280db2baeec43272
>> --
>> 2.52.0.rc1.455.g30608eb744-goog
>>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] KVM: guest_memfd: Elaborate on how release() vs. get_pfn() is safe against UAF
  2026-06-03  0:40   ` Ackerley Tng
@ 2026-06-03 11:42     ` Yan Zhao
  2026-06-03 15:11       ` Ackerley Tng
  0 siblings, 1 reply; 8+ messages in thread
From: Yan Zhao @ 2026-06-03 11:42 UTC (permalink / raw)
  To: Ackerley Tng; +Cc: seanjc, pbonzini, kvm, linux-kernel, vannapurve

On Tue, Jun 02, 2026 at 05:40:00PM -0700, Ackerley Tng wrote:
> Yan Zhao <yan.y.zhao@intel.com> writes:
> 
> Coming here from https://lore.kernel.org/all/ahn6qysOGfa74A2E@google.com/.
> 
> > On Thu, Nov 13, 2025 at 03:22:29PM -0800, Sean Christopherson wrote:
> >> Add more context and information to the comment in kvm_gmem_release() that
> >> explains why there's no synchronization on RCU _or_ kvm->srcu.  Point (b)
> >> from commit 67b43038ce14 ("KVM: guest_memfd: Remove RCU-protected attribute
> >> from slot->gmem.file")
> >>
> >>       b) kvm->srcu ensures that kvm_gmem_unbind() and freeing of a memslot
> >>          occur after the memslot is no longer visible to kvm_gmem_get_pfn().
> >>
> >> is especially difficult to fully grok, particularly in light of commit
> >> ae431059e75d ("KVM: guest_memfd: Remove bindings on memslot deletion when
> >> gmem is dying"), which addressed a race between unbind() and release().
> > As mentioned in commit ae431059e75d ("KVM: guest_memfd: Remove bindings on
> > memslot deletion when gmem is dying"), unbind() and release() are mutually
> > exclusive, i.e., both protected by slots_lock, mentioning "race" here is
> > confusing. So, that commit addressed the mishandling in unbind() after
> > kvm_gmem_get_file() returning NULL?
> >
> 
> I agree that the use of the word "race" here is odd since both unbind
> and release() are mutually exclusive and protected by slots_lock.
> 
> Looking at ae431059e75d again, there are really 3 changes:
> 
> 1. Refactoring out __kvm_gmem_unbind(), which is mostly about setting
>    bindings in the xarray to NULL and setting slot->gmem.file to NULL,
> 2. Checking that if slot->gmem.file is NULL, skip unbinding
> 3. Checking that if kvm_gmem_get_file() returns NULL, just unbind
>    without taking filemap_invalidate_lock().
> 
> (2.) is explained in the comment
> 
> 	/*
> 	 * Nothing to do if the underlying file was _already_ closed, as
> 	 * kvm_gmem_release() invalidates and nullifies all bindings.
> 	 */
> 
> The real fix is in (3.), I think? Could you explain how
> kvm_gmem_get_file() works? Why would the file be dying with the
> slots_lock making the unbind and release mutually exclusive?
Below is my understanding:

kvm_gmem_unbind() is invoked under slots_lock.
kvm_gmem_release() holds slots_lock before it accesses any slot and sets
slot->gmem.file to NULL.

When kvm_gmem_get_file() returns NULL, it could be
1) kvm_gmem_release() is to be invoked soon.
2) kvm_gmem_release() is being invoked, before holding slots_lock.
3) kvm_gmem_release() is being invoked, holding slots_lock.
4) kvm_gmem_release() is being invoked, after releasing slots_lock.
5) kvm_gmem_release() has been invoked and completed.

So, when kvm_gmem_unbind() finds slot->gmem.file != NULL while
kvm_gmem_get_file() returns NULL, it could only be 1) or 2). And at that point,
the remaining user of the slot->gmem.file should only be kvm_gmem_release().

For both 1) and 2), kvm_gmem_unbind() and kvm_gmem_release() can't operate on
f->bindings simultaneously. kvm_gmem_unbind() needs to remove the slot from
slot->gmem.file's binding. Otherwise, after kvm_gmem_unbind() returns, the slot
will be deleted and kvm_gmem_release() will later hold slots_lock and cause
use-after-free.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] KVM: guest_memfd: Elaborate on how release() vs. get_pfn() is safe against UAF
  2026-06-03 11:42     ` Yan Zhao
@ 2026-06-03 15:11       ` Ackerley Tng
  0 siblings, 0 replies; 8+ messages in thread
From: Ackerley Tng @ 2026-06-03 15:11 UTC (permalink / raw)
  To: Yan Zhao; +Cc: seanjc, pbonzini, kvm, linux-kernel, vannapurve

Yan Zhao <yan.y.zhao@intel.com> writes:

> On Tue, Jun 02, 2026 at 05:40:00PM -0700, Ackerley Tng wrote:
>> Yan Zhao <yan.y.zhao@intel.com> writes:
>>
>> Coming here from https://lore.kernel.org/all/ahn6qysOGfa74A2E@google.com/.
>>
>> > On Thu, Nov 13, 2025 at 03:22:29PM -0800, Sean Christopherson wrote:
>> >> Add more context and information to the comment in kvm_gmem_release() that
>> >> explains why there's no synchronization on RCU _or_ kvm->srcu.  Point (b)
>> >> from commit 67b43038ce14 ("KVM: guest_memfd: Remove RCU-protected attribute
>> >> from slot->gmem.file")
>> >>
>> >>       b) kvm->srcu ensures that kvm_gmem_unbind() and freeing of a memslot
>> >>          occur after the memslot is no longer visible to kvm_gmem_get_pfn().
>> >>
>> >> is especially difficult to fully grok, particularly in light of commit
>> >> ae431059e75d ("KVM: guest_memfd: Remove bindings on memslot deletion when
>> >> gmem is dying"), which addressed a race between unbind() and release().
>> > As mentioned in commit ae431059e75d ("KVM: guest_memfd: Remove bindings on
>> > memslot deletion when gmem is dying"), unbind() and release() are mutually
>> > exclusive, i.e., both protected by slots_lock, mentioning "race" here is
>> > confusing. So, that commit addressed the mishandling in unbind() after
>> > kvm_gmem_get_file() returning NULL?
>> >
>>
>> I agree that the use of the word "race" here is odd since both unbind
>> and release() are mutually exclusive and protected by slots_lock.
>>
>> Looking at ae431059e75d again, there are really 3 changes:
>>
>> 1. Refactoring out __kvm_gmem_unbind(), which is mostly about setting
>>    bindings in the xarray to NULL and setting slot->gmem.file to NULL,
>> 2. Checking that if slot->gmem.file is NULL, skip unbinding
>> 3. Checking that if kvm_gmem_get_file() returns NULL, just unbind
>>    without taking filemap_invalidate_lock().
>>
>> (2.) is explained in the comment
>>
>> 	/*
>> 	 * Nothing to do if the underlying file was _already_ closed, as
>> 	 * kvm_gmem_release() invalidates and nullifies all bindings.
>> 	 */
>>
>> The real fix is in (3.), I think? Could you explain how
>> kvm_gmem_get_file() works? Why would the file be dying with the
>> slots_lock making the unbind and release mutually exclusive?
> Below is my understanding:
>

Thanks for explaining!

> kvm_gmem_unbind() is invoked under slots_lock.
> kvm_gmem_release() holds slots_lock before it accesses any slot and sets
> slot->gmem.file to NULL.
>
> When kvm_gmem_get_file() returns NULL, it could be
> 1) kvm_gmem_release() is to be invoked soon.
> 2) kvm_gmem_release() is being invoked, before holding slots_lock.
> 3) kvm_gmem_release() is being invoked, holding slots_lock.
> 4) kvm_gmem_release() is being invoked, after releasing slots_lock.
> 5) kvm_gmem_release() has been invoked and completed.
>
> So, when kvm_gmem_unbind() finds slot->gmem.file != NULL while
> kvm_gmem_get_file() returns NULL, it could only be 1) or 2). And at

To clarify, 3, 4, 5 are not possible because kvm_gmem_unbind() is called
with slots_lock held, right?

> that point,
> the remaining user of the slot->gmem.file should only be kvm_gmem_release().
>
> For both 1) and 2), kvm_gmem_unbind() and kvm_gmem_release() can't operate on
> f->bindings simultaneously. kvm_gmem_unbind() needs to remove the slot from
> slot->gmem.file's binding. Otherwise, after kvm_gmem_unbind() returns, the slot
> will be deleted and kvm_gmem_release() will later hold slots_lock and cause
> use-after-free.

Thanks, this makes sense. The purpose of using kvm_gmem_get_file() is
that if the file is _being_ closed, as stated in the comment in
kvm_gmem_unbind(), kvm_gmem_get_file() would find the refcount to be 0
and return NULL. (The missing part for me was that _being_ closed leads
to kvm_gmem_get_file() finding a 0 refcount.)


Annotating all of these:

> void kvm_gmem_unbind(struct kvm_memory_slot *slot)
> {
> 	/*
> 	 * Nothing to do if the underlying file was _already_ closed, as
> 	 * kvm_gmem_release() invalidates and nullifies all bindings.
> 	 */
>

Above comment rephrased: !slot->gmem.file means kvm_gmem_release() had
already taken slots_lock, iterated all bindings and set slot->gmem.file
to NULL, and dropped slots_lock.

>
> 	if (!slot->gmem.file)
> 		return;
>

If we get here, we know for sure that slot->gmem.file is valid (not
freed) and can be dereferenced safely. We know this because
kvm_gmem_release() is the place that frees slot->gmem.file, and if there
was any chance of slot->gmem.file being freed, slot->gmem.file would
have been NULL-ed first.

Hence we can also safely proceed to do
filemap_invalidate_lock(slot->gmem.file->f_mapping), but as described in
ae431059e75d, file->f_mapping is not under KVM's purview. A future
change could NULL out file->f_mapping before .release() is called, for
example, and break the assumption that gmem.file->f_mapping is still
valid. Hence, rely on taking a reference on gmem.file from slot instead,
which is within KVM's control.

> 	CLASS(gmem_get_file, file)(slot);
>

If we do get the reference, we're good and we can take
filemap_invalidate_lock() and then unbind.

> 	/*
> 	 * However, if the file is _being_ closed, then the bindings need to be

If we don't get the reference, then the file is _being_ closed, so the
file's refcount is already 0, but kvm_gmem_release() hasn't yet been
called.

Later, when kvm_gmem_release() is called, it will iterate bindings and
then dereference slot. Hence, kvm_gmem_release() must not see the slot -
the slot must be removed from bindings. Hence we must unbind.

Unbinding without taking filemap_invalidate_lock() is fine since if
!file, we're sure there's only one last user of file -
kvm_gmem_release(), and kvm_gmem_release() is locked out by slots_lock.

> 	 * removed as kvm_gmem_release() might not run until after the memslot
> 	 * is freed.  Note, modifying the bindings is safe even though the file
> 	 * is dying as kvm_gmem_release() nullifies slot->gmem.file under
> 	 * slots_lock, and only puts its reference to KVM after destroying all
> 	 * bindings.  I.e. reaching this point means kvm_gmem_release() hasn't
> 	 * yet destroyed the bindings or freed the gmem_file, and can't do so
> 	 * until the caller drops slots_lock.
> 	 */
> 	if (!file) {
> 		__kvm_gmem_unbind(slot, slot->gmem.file->private_data);
> 		return;
> 	}
>

If we did get the reference, this is the most straightforward case. Take
the filemap_invalidate_lock(), unbind.

> 	filemap_invalidate_lock(file->f_mapping);
> 	__kvm_gmem_unbind(slot, file->private_data);
> 	filemap_invalidate_unlock(file->f_mapping);
> }

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] KVM: guest_memfd: Elaborate on how release() vs. get_pfn() is safe against UAF
  2025-11-13 23:22 [PATCH] KVM: guest_memfd: Elaborate on how release() vs. get_pfn() is safe against UAF Sean Christopherson
  2025-11-17 11:36 ` Yan Zhao
@ 2026-06-03 15:37 ` Ackerley Tng
  2026-06-03 23:29   ` Sean Christopherson
  1 sibling, 1 reply; 8+ messages in thread
From: Ackerley Tng @ 2026-06-03 15:37 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: kvm, linux-kernel, Yan Zhao, Vishal Annapurve

Sean Christopherson <seanjc@google.com> writes:

> Add more context and information to the comment in kvm_gmem_release() that
> explains why there's no synchronization on RCU _or_ kvm->srcu.  Point (b)
> from commit 67b43038ce14 ("KVM: guest_memfd: Remove RCU-protected attribute
> from slot->gmem.file")
>
>       b) kvm->srcu ensures that kvm_gmem_unbind() and freeing of a memslot
>          occur after the memslot is no longer visible to kvm_gmem_get_pfn().
>
> is especially difficult to fully grok, particularly in light of commit
> ae431059e75d ("KVM: guest_memfd: Remove bindings on memslot deletion when
> gmem is dying"), which addressed a race between unbind() and release().
>
> No functional change intended.
>
> Cc: Yan Zhao <yan.y.zhao@intel.com>
> Cc: Vishal Annapurve <vannapurve@google.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  virt/kvm/guest_memfd.c | 20 ++++++++++++++------
>  1 file changed, 14 insertions(+), 6 deletions(-)
>
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index fdaea3422c30..2e09d7ec0cfc 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -338,17 +338,25 @@ static int kvm_gmem_release(struct inode *inode, struct file *file)
>  	 * dereferencing the slot for existing bindings needs to be protected
>  	 * against memslot updates, specifically so that unbind doesn't race
>  	 * and free the memslot (kvm_gmem_get_file() will return NULL).
> -	 *
> -	 * Since .release is called only when the reference count is zero,
> -	 * after which file_ref_get() and get_file_active() fail,
> -	 * kvm_gmem_get_pfn() cannot be using the file concurrently.
> -	 * file_ref_put() provides a full barrier, and get_file_active() the
> -	 * matching acquire barrier.
>  	 */
>  	mutex_lock(&kvm->slots_lock);
>
>  	filemap_invalidate_lock(inode->i_mapping);
>
> +	/*
> +	 * Note!  synchronize_srcu() is _not_ needed after nullifying memslot
> +	 * bindings as slot->gmem.file cannot be set back to a non-null value
> +	 * without the memslot first being deleted.  I.e. this relies on the
> +	 * synchronize_srcu_expedited() in kvm_swap_active_memslots() to ensure

Is this kvm_swap_active_memslots() referring not to swapping out of the
deleted memslot, but swapping in of a new memslot?

> +	 * kvm_gmem_get_pfn() (which runs with kvm->srcu held for read) can't
> +	 * grab a reference to slot->gmem.file even if the struct file object
> +	 * is reallocated.
> +	 *

I think the part that's missing for me is: why is synchronize_srcu()
required after NULLifying memslot bindings? Normally, synchronize_srcu()
will be needed after memslot updates to make sure the next user of
kvm->srcu gets the new state of the memslots?

The "new state" here could be

1. slot->gmem.file == NULL
2. slot->gmem.file == some new file, set after the old memslot was
   replaced

> +	 * file_ref_put() provides a full barrier, and __get_file_rcu() the
> +	 * matching acquire barrier, to ensure that kvm_gmem_get_file() (via
> +	 * __get_file_rcu()) sees refcount==0 or fails the "file reloaded"
> +	 * check (file != NULL due to nullifying the file pointer here).

In this case, kvm_gmem_get_pfn() uses kvm_gmem_get_file(), and because
of the above, kvm_gmem_get_pfn() will not try to get a folio from a NULL
file (new state 1).

To handle new state 2, kvm_swap_active_memslots() to swap in a new
memslot must have called synchronize_srcu(), so synchronize_srcu() can
be skipped here.

> +	 */
>  	xa_for_each(&f->bindings, index, slot)
>  		WRITE_ONCE(slot->gmem.file, NULL);
>
>
> base-commit: 16ec4fb4ac95d878b879192d280db2baeec43272
> --
> 2.52.0.rc1.455.g30608eb744-goog

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] KVM: guest_memfd: Elaborate on how release() vs. get_pfn() is safe against UAF
  2026-06-03 15:37 ` Ackerley Tng
@ 2026-06-03 23:29   ` Sean Christopherson
  0 siblings, 0 replies; 8+ messages in thread
From: Sean Christopherson @ 2026-06-03 23:29 UTC (permalink / raw)
  To: Ackerley Tng; +Cc: Paolo Bonzini, kvm, linux-kernel, Yan Zhao, Vishal Annapurve

On Wed, Jun 03, 2026, Ackerley Tng wrote:
> Sean Christopherson <seanjc@google.com> writes:
> 
> > Add more context and information to the comment in kvm_gmem_release() that
> > explains why there's no synchronization on RCU _or_ kvm->srcu.  Point (b)
> > from commit 67b43038ce14 ("KVM: guest_memfd: Remove RCU-protected attribute
> > from slot->gmem.file")
> >
> >       b) kvm->srcu ensures that kvm_gmem_unbind() and freeing of a memslot
> >          occur after the memslot is no longer visible to kvm_gmem_get_pfn().
> >
> > is especially difficult to fully grok, particularly in light of commit
> > ae431059e75d ("KVM: guest_memfd: Remove bindings on memslot deletion when
> > gmem is dying"), which addressed a race between unbind() and release().
> >
> > No functional change intended.
> >
> > Cc: Yan Zhao <yan.y.zhao@intel.com>
> > Cc: Vishal Annapurve <vannapurve@google.com>
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> >  virt/kvm/guest_memfd.c | 20 ++++++++++++++------
> >  1 file changed, 14 insertions(+), 6 deletions(-)
> >
> > diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> > index fdaea3422c30..2e09d7ec0cfc 100644
> > --- a/virt/kvm/guest_memfd.c
> > +++ b/virt/kvm/guest_memfd.c
> > @@ -338,17 +338,25 @@ static int kvm_gmem_release(struct inode *inode, struct file *file)
> >  	 * dereferencing the slot for existing bindings needs to be protected
> >  	 * against memslot updates, specifically so that unbind doesn't race
> >  	 * and free the memslot (kvm_gmem_get_file() will return NULL).
> > -	 *
> > -	 * Since .release is called only when the reference count is zero,
> > -	 * after which file_ref_get() and get_file_active() fail,
> > -	 * kvm_gmem_get_pfn() cannot be using the file concurrently.
> > -	 * file_ref_put() provides a full barrier, and get_file_active() the
> > -	 * matching acquire barrier.
> >  	 */
> >  	mutex_lock(&kvm->slots_lock);
> >
> >  	filemap_invalidate_lock(inode->i_mapping);
> >
> > +	/*
> > +	 * Note!  synchronize_srcu() is _not_ needed after nullifying memslot
> > +	 * bindings as slot->gmem.file cannot be set back to a non-null value
> > +	 * without the memslot first being deleted.  I.e. this relies on the
> > +	 * synchronize_srcu_expedited() in kvm_swap_active_memslots() to ensure
> 
> Is this kvm_swap_active_memslots() referring not to swapping out of the
> deleted memslot, but swapping in of a new memslot?

No, it's the deletion that matters.  When doing an (S)RCU-protect pointer swap,
readers can see _either_ pointer (the old or the new).  I.e. when creating a new
memslot, synchronize_srcu_expedited() would only guarantee readers would see the
new slot *after* synchronization completes.

But that's not what we care about.  What we care about is that an in-flight
kvm_gmem_get_pfn() _can't_ reach the file for new, re-allocated memslot.  That's
guaranteed because deleting the old memslot waiting for readers to away, i.e.
guarantees that any readers that saw a non-NULL slot->gmem.file saw the one
associated with the old memslot.

> > +	 * kvm_gmem_get_pfn() (which runs with kvm->srcu held for read) can't
> > +	 * grab a reference to slot->gmem.file even if the struct file object
> > +	 * is reallocated.
> > +	 *
> 
> I think the part that's missing for me is: why is synchronize_srcu()
> required after NULLifying memslot bindings? Normally, synchronize_srcu()
> will be needed after memslot updates to make sure the next user of
> kvm->srcu gets the new state of the memslots?
> 
> The "new state" here could be
> 
> 1. slot->gmem.file == NULL
> 2. slot->gmem.file == some new file, set after the old memslot was
>    replaced

Only because there's a synchronize_srcu().  Without that, it could be:

 1. slot->gmem.file == old  ==
 2. slot->gmem.file == NULL  |==> window for readers
 3. slot->gmem.file == new  ==

versus

 1. slot->gmem.file == old  ==
    synchronize_srcu()       |===> window for readers
 2. slot->gmem.file == NULL ==
    synchronize_srcu()       |===> window #2 for readers
 3. slot->gmem.file == new  ==

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] KVM: guest_memfd: Elaborate on how release() vs. get_pfn() is safe against UAF
  2025-11-17 11:36 ` Yan Zhao
  2026-06-03  0:40   ` Ackerley Tng
@ 2026-06-04  0:10   ` Sean Christopherson
  1 sibling, 0 replies; 8+ messages in thread
From: Sean Christopherson @ 2026-06-04  0:10 UTC (permalink / raw)
  To: Yan Zhao; +Cc: Paolo Bonzini, kvm, linux-kernel, Vishal Annapurve

On Mon, Nov 17, 2025, Yan Zhao wrote:
> On Thu, Nov 13, 2025 at 03:22:29PM -0800, Sean Christopherson wrote:
> > Add more context and information to the comment in kvm_gmem_release() that
> > explains why there's no synchronization on RCU _or_ kvm->srcu.  Point (b)
> > from commit 67b43038ce14 ("KVM: guest_memfd: Remove RCU-protected attribute
> > from slot->gmem.file")
> > 
> >       b) kvm->srcu ensures that kvm_gmem_unbind() and freeing of a memslot
> >          occur after the memslot is no longer visible to kvm_gmem_get_pfn().
> > 
> > is especially difficult to fully grok, particularly in light of commit
> > ae431059e75d ("KVM: guest_memfd: Remove bindings on memslot deletion when
> > gmem is dying"), which addressed a race between unbind() and release().
> As mentioned in commit ae431059e75d ("KVM: guest_memfd: Remove bindings on
> memslot deletion when gmem is dying"), unbind() and release() are mutually
> exclusive, i.e., both protected by slots_lock, mentioning "race" here is
> confusing. So, that commit addressed the mishandling in unbind() after
> kvm_gmem_get_file() returning NULL?

Yes, that's the race.  Just because two chunks of code are mutually exclusive
doesn't mean there's no race, it just changes the granularity of the race, and/or
how it manifests.

I consider it a race because in the failing case, there's simply no way for
release() and unbind() to run sequentially.  I.e. the only way to encounter the
bug was to run two operations concurrently, *and* for a specific ordering to
occur.

> > No functional change intended.
> > 
> > Cc: Yan Zhao <yan.y.zhao@intel.com>
> > Cc: Vishal Annapurve <vannapurve@google.com>
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> >  virt/kvm/guest_memfd.c | 20 ++++++++++++++------
> >  1 file changed, 14 insertions(+), 6 deletions(-)
> > 
> > diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> > index fdaea3422c30..2e09d7ec0cfc 100644
> > --- a/virt/kvm/guest_memfd.c
> > +++ b/virt/kvm/guest_memfd.c
> > @@ -338,17 +338,25 @@ static int kvm_gmem_release(struct inode *inode, struct file *file)
> >  	 * dereferencing the slot for existing bindings needs to be protected
> >  	 * against memslot updates, specifically so that unbind doesn't race
> >  	 * and free the memslot (kvm_gmem_get_file() will return NULL).
> > -	 *
> > -	 * Since .release is called only when the reference count is zero,
> > -	 * after which file_ref_get() and get_file_active() fail,
> > -	 * kvm_gmem_get_pfn() cannot be using the file concurrently.
> > -	 * file_ref_put() provides a full barrier, and get_file_active() the
> > -	 * matching acquire barrier.
> >  	 */
> 
> >  	mutex_lock(&kvm->slots_lock);
> >  
> >  	filemap_invalidate_lock(inode->i_mapping);
> > 
> > +	/*
> > +	 * Note!  synchronize_srcu() is _not_ needed after nullifying memslot
> > +	 * bindings as slot->gmem.file cannot be set back to a non-null value
> > +	 * without the memslot first being deleted.  I.e. this relies on the
> > +	 * synchronize_srcu_expedited() in kvm_swap_active_memslots() to ensure
> > +	 * kvm_gmem_get_pfn() (which runs with kvm->srcu held for read) can't
> > +	 * grab a reference to slot->gmem.file even if the struct file object
> > +	 * is reallocated.
> Do you mean that as kvm_gmem_get_pfn() can't find a stale slot, it can't grab
> reference to stale slot->gmem.file, even if slot->gmem.file has been set to a
> different value, i.e., after invoking unbind(), bind() ?

I was specifically trying to say this:

 1. slot->gmem.file == old  ==
    synchronize_srcu()       |===> window #1 for readers
 2. slot->gmem.file == NULL ==
    synchronize_srcu()       |===> window #2 for readers
 3. slot->gmem.file == new  ==

kvm_gmem_get_pfn() can run in two "windows".  In the window #1, it can see @old
or NULL.  In window #2, it can see NULL or @new.  The synchronization ensures
it can't grab a reference to @old once window #2 has been opened.

> But I'm not sure why to put the kvm_gmem_get_pfn() relying on
> synchronize_srcu_expedited() in kvm_swap_active_memslots() in the comment of
> release().  Without the guard of kvm_gmem_get_file(), kvm_gmem_get_pfn() may
> need to provide some RCU read-critial section too for release() to wait by
> synchronize_srcu()?

No, what I'm saying is that without synchronize_srcu() to create distinct
window #1 and window #2 above, KVM would need this:

diff --git virt/kvm/guest_memfd.c virt/kvm/guest_memfd.c
index a1cb72e66288..e66e431bdb98 100644
--- virt/kvm/guest_memfd.c
+++ virt/kvm/guest_memfd.c
@@ -347,6 +347,8 @@ static int kvm_gmem_release(struct inode *inode, struct file *file)
        xa_for_each(&f->bindings, index, slot)
                WRITE_ONCE(slot->gmem.file, NULL);
 
+       synchronize_srcu();
+
        /*
         * All in-flight operations are gone and new bindings can be created.
         * Zap all SPTEs pointed at by this file.  Do not free the backing

Otherwise there is no guarantee in-flight readers can't reach the old
slot->gmem.file.

^ permalink raw reply related	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2026-06-04  0:11 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-11-13 23:22 [PATCH] KVM: guest_memfd: Elaborate on how release() vs. get_pfn() is safe against UAF Sean Christopherson
2025-11-17 11:36 ` Yan Zhao
2026-06-03  0:40   ` Ackerley Tng
2026-06-03 11:42     ` Yan Zhao
2026-06-03 15:11       ` Ackerley Tng
2026-06-04  0:10   ` Sean Christopherson
2026-06-03 15:37 ` Ackerley Tng
2026-06-03 23:29   ` Sean Christopherson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox