* Re: [PATCH 1/3] mm/hmm: Add hmm_range_fault_unlockable() for mmap lock-drop support
From: David Hildenbrand (Arm) @ 2026-05-12 8:42 UTC (permalink / raw)
To: Stanislav Kinsburskii, kys, Liam.Howlett, akpm, decui, haiyangz,
jgg, corbet, leon, longli, ljs, mhocko, rppt, shuah, skhan,
surenb, vbabka, wei.liu
Cc: linux-doc, linux-hyperv, linux-kernel, linux-kselftest, linux-mm
> + for (; addr < end; addr += PAGE_SIZE) {
> + vm_fault_t ret;
> +
> + ret = handle_mm_fault(vma, addr, fault_flags, NULL);
> +
> + if (ret & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)) {
> + /*
> + * The mmap lock has been dropped by the fault handler.
> + * Record the failing address and signal lock-drop to
> + * the caller.
> + */
> + *hmm_vma_walk->locked = 0;
> + hmm_vma_walk->last = addr;
> + return -EAGAIN;
Okay, so we'll return straight from hmm_vma_fault() to
hmm_vma_handle_pte()/hmm_vma_walk_pmd() -> walk_page_range() machinery.
Hopefully we don't refer to the MM/VMA on any path there? It would be nicer if
the hmm_vma_fault() could be called by the caller of walk_page_range(), but
that's tricky I guess, as hmm_vma_fault() consumes the walk structure and
requires the vma in there.
Note: am I wrong, or is hmm_vma_fault() really always called with
required_fault=true?
> + }
> +
> + if (ret & VM_FAULT_ERROR)
> return -EFAULT;
> + }
> return -EBUSY;
> }
>
> @@ -566,6 +585,17 @@ static int hmm_vma_walk_hugetlb_entry(pte_t *pte, unsigned long hmask,
> if (required_fault) {
> int ret;
>
> + /*
> + * Faulting hugetlb pages on the unlockable path is not
> + * supported. The walk framework holds hugetlb_vma_lock_read
> + * which must be dropped before handle_mm_fault, but if the
> + * mmap lock is also dropped (VM_FAULT_RETRY), the vma may
> + * be freed and the walk framework's unconditional unlock
> + * becomes a use-after-free.
> + */
> + if (hmm_vma_walk->locked)
> + return -EFAULT;
Just because it's unlockable doesn't mean that you must unlock. Can't this be
kept working as is, just simulating here as if it would not be unlockable?
--
Cheers,
David
* Re: [PATCH 1/3] mm/hmm: Add hmm_range_fault_unlockable() for mmap lock-drop support
From: Stanislav Kinsburskii @ 2026-05-12 16:18 UTC (permalink / raw)
To: David Hildenbrand (Arm)
Cc: kys, Liam.Howlett, akpm, decui, haiyangz, jgg, corbet, leon,
longli, ljs, mhocko, rppt, shuah, skhan, surenb, vbabka, wei.liu,
linux-doc, linux-hyperv, linux-kernel, linux-kselftest, linux-mm
On Tue, May 12, 2026 at 10:42:14AM +0200, David Hildenbrand (Arm) wrote:
>
> > + for (; addr < end; addr += PAGE_SIZE) {
> > + vm_fault_t ret;
> > +
> > + ret = handle_mm_fault(vma, addr, fault_flags, NULL);
> > +
> > + if (ret & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)) {
> > + /*
> > + * The mmap lock has been dropped by the fault handler.
> > + * Record the failing address and signal lock-drop to
> > + * the caller.
> > + */
> > + *hmm_vma_walk->locked = 0;
> > + hmm_vma_walk->last = addr;
> > + return -EAGAIN;
>
>
> Okay, so we'll return straight from hmm_vma_fault() to
> hmm_vma_handle_pte()/hmm_vma_walk_pmd() -> walk_page_range() machinery.
>
> Hopefully we don't refer to the MM/VMA on any path there? It would be nicer if
> the hmm_vma_fault() could be called by the caller of walk_page_range(), but
> that's tricky I guess, as hmm_vma_fault() consumes the walk structure and
> requires the vma in there.
>
It looks like a caller can provide a post_vma callback in mm_walk_ops. I
missed that case here. This callback cannot be supported by this change.
I will update the patch.
>
> Note: am I wrong, or is hmm_vma_fault() really always called with
> required_fault=true?
>
No, hmm_pte_need_fault can return false.
> > + }
> > +
> > + if (ret & VM_FAULT_ERROR)
> > return -EFAULT;
> > + }
> > return -EBUSY;
> > }
> >
> > @@ -566,6 +585,17 @@ static int hmm_vma_walk_hugetlb_entry(pte_t *pte, unsigned long hmask,
> > if (required_fault) {
> > int ret;
> >
> > + /*
> > + * Faulting hugetlb pages on the unlockable path is not
> > + * supported. The walk framework holds hugetlb_vma_lock_read
> > + * which must be dropped before handle_mm_fault, but if the
> > + * mmap lock is also dropped (VM_FAULT_RETRY), the vma may
> > + * be freed and the walk framework's unconditional unlock
> > + * becomes a use-after-free.
> > + */
> > + if (hmm_vma_walk->locked)
> > + return -EFAULT;
>
> Just because it's unlockable doesn't mean that you must unlock. Can't this be
> kept working as is, just simulating here as if it would not be unlockable?
>
I’m not sure how to implement this. The walk_page_range code expects the
hugetlb VMA to still be read-locked when we return from
hmm_vma_walk_hugetlb_entry. How can we guarantee that if the VMA might
be gone?
I added a note in the docs. Whoever tackles this will likely need to
either rework `walk_page_range` to handle the case where the VMA is
gone, or use a different approach.
Do you have any other suggestions on how to implement it?
Thanks,
Stanislav
>
> --
> Cheers,
>
> David
* Re: [PATCH 1/3] mm/hmm: Add hmm_range_fault_unlockable() for mmap lock-drop support
From: David Hildenbrand (Arm) @ 2026-05-12 19:18 UTC (permalink / raw)
To: Stanislav Kinsburskii
Cc: kys, Liam.Howlett, akpm, decui, haiyangz, jgg, corbet, leon,
longli, ljs, mhocko, rppt, shuah, skhan, surenb, vbabka, wei.liu,
linux-doc, linux-hyperv, linux-kernel, linux-kselftest, linux-mm
On 5/12/26 18:18, Stanislav Kinsburskii wrote:
> On Tue, May 12, 2026 at 10:42:14AM +0200, David Hildenbrand (Arm) wrote:
>>
>>> + for (; addr < end; addr += PAGE_SIZE) {
>>> + vm_fault_t ret;
>>> +
>>> + ret = handle_mm_fault(vma, addr, fault_flags, NULL);
>>> +
>>> + if (ret & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)) {
>>> + /*
>>> + * The mmap lock has been dropped by the fault handler.
>>> + * Record the failing address and signal lock-drop to
>>> + * the caller.
>>> + */
>>> + *hmm_vma_walk->locked = 0;
>>> + hmm_vma_walk->last = addr;
>>> + return -EAGAIN;
>>
>>
>> Okay, so we'll return straight from hmm_vma_fault() to
>> hmm_vma_handle_pte()/hmm_vma_walk_pmd() -> walk_page_range() machinery.
>>
>> Hopefully we don't refer to the MM/VMA on any path there? It would be nicer if
>> the hmm_vma_fault() could be called by the caller of walk_page_range(), but
>> that's tricky I guess, as hmm_vma_fault() consumes the walk structure and
>> requires the vma in there.
>>
>
> It looks like a caller can provide a post_vma callback in mm_walk_ops. I
> missed that case here. This callback cannot be supported by this change.
> I will update the patch.
>
>>
>> Note: am I wrong, or is hmm_vma_fault() really always called with
>> required_fault=true?
>>
>
> No, hmm_pte_need_fault can return false.
That's not what I mean. Looks like all paths leading to hmm_vma_fault() have
required_fault = true;
IOW, there is always an "if (required_fault)" before it one way or the other.
Ah, and there even is a "WARN_ON_ONCE(!required_fault)" in the function. What an
odd thing to do :)
>
>>> + }
>>> +
>>> + if (ret & VM_FAULT_ERROR)
>>> return -EFAULT;
>>> + }
>>> return -EBUSY;
>>> }
>>>
>>> @@ -566,6 +585,17 @@ static int hmm_vma_walk_hugetlb_entry(pte_t *pte, unsigned long hmask,
>>> if (required_fault) {
>>> int ret;
>>>
>>> + /*
>>> + * Faulting hugetlb pages on the unlockable path is not
>>> + * supported. The walk framework holds hugetlb_vma_lock_read
>>> + * which must be dropped before handle_mm_fault, but if the
>>> + * mmap lock is also dropped (VM_FAULT_RETRY), the vma may
>>> + * be freed and the walk framework's unconditional unlock
>>> + * becomes a use-after-free.
>>> + */
>>> + if (hmm_vma_walk->locked)
>>> + return -EFAULT;
>>
>> Just because it's unlockable doesn't mean that you must unlock. Can't this be
>> kept working as is, just simulating here as if it would not be unlockable?
>>
>
> I’m not sure how to implement this. The walk_page_range code expects the
> hugetlb VMA to still be read-locked when we return from
> hmm_vma_walk_hugetlb_entry. How can we guarantee that if the VMA might
> be gone?
>
> I added a note in the docs. Whoever tackles this will likely need to
> either rework `walk_page_range` to handle the case where the VMA is
> gone, or use a different approach.
>
> Do you have any other suggestions on how to implement it?
You just want hmm_vma_fault() to not set
"FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE".
The hacky way could be:
diff --git a/mm/hmm.c b/mm/hmm.c
index 5955f2f0c83d..83dba990e10a 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -564,6 +564,7 @@ static int hmm_vma_walk_hugetlb_entry(pte_t *pte, unsigned long hmask,
 	required_fault =
 		hmm_pte_need_fault(hmm_vma_walk, pfn_req_flags, cpu_flags);
 	if (required_fault) {
+		int *saved_locked = hmm_vma_walk->locked;
 		int ret;
 
 		spin_unlock(ptl);
@@ -576,7 +577,9 @@ static int hmm_vma_walk_hugetlb_entry(pte_t *pte, unsigned long hmask,
 		 * use here of either pte or ptl after dropping the vma
 		 * lock.
 		 */
+		hmm_vma_walk->locked = NULL;
 		ret = hmm_vma_fault(addr, end, required_fault, walk);
+		hmm_vma_walk->locked = saved_locked;
 		hugetlb_vma_lock_read(vma);
 		return ret;
 	}
But really, I think we should just try to get uffd support working properly, not
excluding hugetlb.
GUP achieves it properly by performing the fault handling outside of page table
walking context ... essentially what I described in my first comment above:
return the information to the caller and let it just trigger the fault.
The issue here is that we trigger a fault out of walk_hugetlb_range() where we
still hold locks, resulting in this questionable hugetlb_vma_unlock_read +
hugetlb_vma_lock_read pattern.
The fault should just be triggered from a place where we don't have to play with
hugetlb vma locks or be afraid that dropping the mmap lock causes other problems.
--
Cheers,
David