* [RFC PATCH] mm/hmm: Add userfaultfd support to fault handling
@ 2026-03-31 22:24 Stanislav Kinsburskii
2026-04-22 7:27 ` David Hildenbrand (Arm)
0 siblings, 1 reply; 5+ messages in thread
From: Stanislav Kinsburskii @ 2026-03-31 22:24 UTC (permalink / raw)
To: jgg, leon, akpm; +Cc: linux-mm, linux-kernel
Add support for userfaultfd-enabled VMAs to the HMM framework.
Extract fault handling logic into hmm_handle_mm_fault() to handle both
regular and userfaultfd-backed mappings. The implementation follows
fixup_user_fault() for handling VM_FAULT_RETRY and VM_FAULT_COMPLETED, with
a key difference: instead of retrying or moving forward respectively,
return -EBUSY after reacquiring mmap_read_lock. Since the lock was
released, the VMA could have changed, so defer retry logic to the caller.
This approach is inefficient for userfaultfd-backed VMAs, as HMM can only
populate one page at a time, but keeps the framework simple by avoiding
complex retry logic within HMM itself.
Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
---
mm/hmm.c | 40 ++++++++++++++++++++++++++++++++++++----
1 file changed, 36 insertions(+), 4 deletions(-)
diff --git a/mm/hmm.c b/mm/hmm.c
index f6c4ddff4bd6..d04d68e21473 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -59,6 +59,35 @@ static int hmm_pfns_fill(unsigned long addr, unsigned long end,
return 0;
}
+static int hmm_handle_mm_fault(struct vm_area_struct *vma,
+ unsigned long addr,
+ unsigned int fault_flags)
+{
+ int ret;
+
+ if (userfaultfd_missing(vma)) {
+ struct mm_struct *mm = vma->vm_mm;
+
+ fault_flags |= FAULT_FLAG_ALLOW_RETRY |
+ FAULT_FLAG_USER;
+
+ ret = handle_mm_fault(vma, addr, fault_flags, NULL);
+
+ if (ret & (VM_FAULT_COMPLETED | VM_FAULT_RETRY)) {
+ mmap_read_lock(mm);
+ return -EBUSY;
+ }
+
+ if (ret & VM_FAULT_ERROR)
+ return vm_fault_to_errno(ret, 0);
+ } else {
+ ret = handle_mm_fault(vma, addr, fault_flags, NULL);
+ if (ret & VM_FAULT_ERROR)
+ return vm_fault_to_errno(ret, 0);
+ }
+ return 0;
+}
+
/*
* hmm_vma_fault() - fault in a range lacking valid pmd or pte(s)
* @addr: range virtual start address (inclusive)
@@ -86,10 +115,13 @@ static int hmm_vma_fault(unsigned long addr, unsigned long end,
fault_flags |= FAULT_FLAG_WRITE;
}
- for (; addr < end; addr += PAGE_SIZE)
- if (handle_mm_fault(vma, addr, fault_flags, NULL) &
- VM_FAULT_ERROR)
- return -EFAULT;
+ for (; addr < end; addr += PAGE_SIZE) {
+ int ret;
+
+ ret = hmm_handle_mm_fault(vma, addr, fault_flags);
+ if (ret)
+ return ret;
+ }
return -EBUSY;
}
* Re: [RFC PATCH] mm/hmm: Add userfaultfd support to fault handling
2026-03-31 22:24 [RFC PATCH] mm/hmm: Add userfaultfd support to fault handling Stanislav Kinsburskii
@ 2026-04-22 7:27 ` David Hildenbrand (Arm)
2026-04-22 14:59 ` Stanislav Kinsburskii
0 siblings, 1 reply; 5+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-22 7:27 UTC (permalink / raw)
To: Stanislav Kinsburskii, jgg, leon, akpm; +Cc: linux-mm, linux-kernel
On 4/1/26 00:24, Stanislav Kinsburskii wrote:
> Add support for userfaultfd-enabled VMAs to the HMM framework.
>
> Extract fault handling logic into hmm_handle_mm_fault() to handle both
> regular and userfaultfd-backed mappings. The implementation follows
> fixup_user_fault() for handling VM_FAULT_RETRY and VM_FAULT_COMPLETED, with
> a key difference: instead of retrying or moving forward respectively,
> return -EBUSY after reacquiring mmap_read_lock. Since the lock was
> released, the VMA could have changed, so defer retry logic to the caller.
>
> This approach is inefficient for userfaultfd-backed VMAs, as HMM can only
> populate one page at a time, but keeps the framework simple by avoiding
> complex retry logic within HMM itself.
>
> Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> ---
> mm/hmm.c | 40 ++++++++++++++++++++++++++++++++++++----
> 1 file changed, 36 insertions(+), 4 deletions(-)
>
> diff --git a/mm/hmm.c b/mm/hmm.c
> index f6c4ddff4bd6..d04d68e21473 100644
> --- a/mm/hmm.c
> +++ b/mm/hmm.c
> @@ -59,6 +59,35 @@ static int hmm_pfns_fill(unsigned long addr, unsigned long end,
> return 0;
> }
>
> +static int hmm_handle_mm_fault(struct vm_area_struct *vma,
> + unsigned long addr,
> + unsigned int fault_flags)
> +{
> + int ret;
> +
> + if (userfaultfd_missing(vma)) {
> + struct mm_struct *mm = vma->vm_mm;
> +
> + fault_flags |= FAULT_FLAG_ALLOW_RETRY |
> + FAULT_FLAG_USER;
> +
> + ret = handle_mm_fault(vma, addr, fault_flags, NULL);
> +
> + if (ret & (VM_FAULT_COMPLETED | VM_FAULT_RETRY)) {
> + mmap_read_lock(mm);
> + return -EBUSY;
> + }
> +
> + if (ret & VM_FAULT_ERROR)
> + return vm_fault_to_errno(ret, 0);
> + } else {
> + ret = handle_mm_fault(vma, addr, fault_flags, NULL);
> + if (ret & VM_FAULT_ERROR)
> + return vm_fault_to_errno(ret, 0);
> + }
I'm surprised that there is userfaultfd_missing() logic required here at
all.
What prevents us from always calling handle_mm_fault() in a way +
handling return values, such that we will just do the right thing
independent of userfaultfd_missing()?
--
Cheers,
David
* Re: [RFC PATCH] mm/hmm: Add userfaultfd support to fault handling
2026-04-22 7:27 ` David Hildenbrand (Arm)
@ 2026-04-22 14:59 ` Stanislav Kinsburskii
2026-04-22 18:14 ` David Hildenbrand (Arm)
0 siblings, 1 reply; 5+ messages in thread
From: Stanislav Kinsburskii @ 2026-04-22 14:59 UTC (permalink / raw)
To: David Hildenbrand (Arm); +Cc: jgg, leon, akpm, linux-mm, linux-kernel
On Wed, Apr 22, 2026 at 09:27:44AM +0200, David Hildenbrand (Arm) wrote:
> On 4/1/26 00:24, Stanislav Kinsburskii wrote:
> > Add support for userfaultfd-enabled VMAs to the HMM framework.
> >
> > Extract fault handling logic into hmm_handle_mm_fault() to handle both
> > regular and userfaultfd-backed mappings. The implementation follows
> > fixup_user_fault() for handling VM_FAULT_RETRY and VM_FAULT_COMPLETED, with
> > a key difference: instead of retrying or moving forward respectively,
> > return -EBUSY after reacquiring mmap_read_lock. Since the lock was
> > released, the VMA could have changed, so defer retry logic to the caller.
> >
> > This approach is inefficient for userfaultfd-backed VMAs, as HMM can only
> > populate one page at a time, but keeps the framework simple by avoiding
> > complex retry logic within HMM itself.
> >
> > Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
> > ---
> > mm/hmm.c | 40 ++++++++++++++++++++++++++++++++++++----
> > 1 file changed, 36 insertions(+), 4 deletions(-)
> >
> > diff --git a/mm/hmm.c b/mm/hmm.c
> > index f6c4ddff4bd6..d04d68e21473 100644
> > --- a/mm/hmm.c
> > +++ b/mm/hmm.c
> > @@ -59,6 +59,35 @@ static int hmm_pfns_fill(unsigned long addr, unsigned long end,
> > return 0;
> > }
> >
> > +static int hmm_handle_mm_fault(struct vm_area_struct *vma,
> > + unsigned long addr,
> > + unsigned int fault_flags)
> > +{
> > + int ret;
> > +
> > + if (userfaultfd_missing(vma)) {
> > + struct mm_struct *mm = vma->vm_mm;
> > +
> > + fault_flags |= FAULT_FLAG_ALLOW_RETRY |
> > + FAULT_FLAG_USER;
> > +
> > + ret = handle_mm_fault(vma, addr, fault_flags, NULL);
> > +
> > + if (ret & (VM_FAULT_COMPLETED | VM_FAULT_RETRY)) {
> > + mmap_read_lock(mm);
> > + return -EBUSY;
> > + }
> > +
> > + if (ret & VM_FAULT_ERROR)
> > + return vm_fault_to_errno(ret, 0);
> > + } else {
> > + ret = handle_mm_fault(vma, addr, fault_flags, NULL);
> > + if (ret & VM_FAULT_ERROR)
> > + return vm_fault_to_errno(ret, 0);
> > + }
>
> I'm surprised that there is userfaultfd_missing() logic required here at
> all.
>
> What prevents us from always calling handle_mm_fault() in a way +
> handling return values, such that we will just do the right thing
> independent of userfaultfd_missing()?
>
Essentially, the main issue with the current HMM framework is that
handle_mm_fault() with FAULT_FLAG_ALLOW_RETRY (needed for userfaultfd)
can drop the mm lock. The framework does not expect or support that yet.
That is why I had to add the branching.
The worst part is that handle_mm_fault() can unlock via
mmap_read_unlock(), while svm_range_restore_work() takes the mmap write
lock; this RFC patch doesn't address that problem.
The main question where I’d like feedback is: should userfaultfd be
supported in the HMM framework at all, given the complexity around
unlocking and retry handling (similar to other gup helpers), or should
this complexity be offloaded to the caller side?
If it should be supported, then I see two ways to implement this:
1. Introduce a new HMM_NEED_USER_FAULT flag to tell the framework that
the caller needs userfaultfd support, and extend the framework
accordingly. This is not intrusive to existing code. But if we want lazy
migration for GPU state in the future (and we likely do), GPU drivers
will have to be updated to set this flag later. The only real user of
this feature right now is the MSHV driver and I'd extend it to set
this flag right away and leave GPU HMM users for later.
2. Add retry logic to hmm_handle_mm_fault() to handle VM_FAULT_RETRY and
VM_FAULT_COMPLETED, as required for userfaultfd support. This would be
transparent to users. But it would require significant changes: mm
relocking, VMA lookup, and other parts. Also, before doing that, the
kfd_svm driver would need to be changed to downgrade the mm lock to
read.
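To make the -EBUSY contract concrete, here is a minimal userspace sketch of
the retry loop a caller would need under option 1. Everything here is a mock
stand-in (mock_hmm_range_fault(), busy_faults_left, driver_populate_range())
invented for illustration, not the kernel API:

```c
/* Userspace simulation of the caller-side retry contract this RFC
 * proposes: hmm_range_fault() returning -EBUSY means the mmap lock was
 * dropped and reacquired, so the caller must revalidate and retry.
 * All names here are mock stand-ins, not the kernel API. */
#include <errno.h>

static int busy_faults_left;    /* how many times the mock reports -EBUSY */

/* stand-in for hmm_range_fault() on a userfaultfd-backed VMA */
static int mock_hmm_range_fault(void)
{
    if (busy_faults_left > 0) {
        busy_faults_left--;
        return -EBUSY;  /* lock was dropped; caller must retry */
    }
    return 0;           /* range populated */
}

/* bounded retry loop a driver would run under mmap_read_lock() */
static int driver_populate_range(int max_retries)
{
    int ret;

    do {
        /* in real code: revalidate the VMA here, since it may have
         * changed while the lock was dropped in handle_mm_fault() */
        ret = mock_hmm_range_fault();
    } while (ret == -EBUSY && max_retries-- > 0);

    return ret;
}
```

The point of the sketch is that the retry bound and the VMA revalidation live
entirely on the caller side, which is what keeps HMM itself simple.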
Thanks,
Stanislav
> --
> Cheers,
>
> David
* Re: [RFC PATCH] mm/hmm: Add userfaultfd support to fault handling
2026-04-22 14:59 ` Stanislav Kinsburskii
@ 2026-04-22 18:14 ` David Hildenbrand (Arm)
0 siblings, 0 replies; 5+ messages in thread
From: David Hildenbrand (Arm) @ 2026-04-22 18:14 UTC (permalink / raw)
To: Stanislav Kinsburskii; +Cc: jgg, leon, akpm, linux-mm, linux-kernel
On 4/22/26 16:59, Stanislav Kinsburskii wrote:
> On Wed, Apr 22, 2026 at 09:27:44AM +0200, David Hildenbrand (Arm) wrote:
>> On 4/1/26 00:24, Stanislav Kinsburskii wrote:
>>> Add support for userfaultfd-enabled VMAs to the HMM framework.
>>>
>>> Extract fault handling logic into hmm_handle_mm_fault() to handle both
>>> regular and userfaultfd-backed mappings. The implementation follows
>>> fixup_user_fault() for handling VM_FAULT_RETRY and VM_FAULT_COMPLETED, with
>>> a key difference: instead of retrying or moving forward respectively,
>>> return -EBUSY after reacquiring mmap_read_lock. Since the lock was
>>> released, the VMA could have changed, so defer retry logic to the caller.
>>>
>>> This approach is inefficient for userfaultfd-backed VMAs, as HMM can only
>>> populate one page at a time, but keeps the framework simple by avoiding
>>> complex retry logic within HMM itself.
>>>
>>> Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
>>> ---
>>> mm/hmm.c | 40 ++++++++++++++++++++++++++++++++++++----
>>> 1 file changed, 36 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/mm/hmm.c b/mm/hmm.c
>>> index f6c4ddff4bd6..d04d68e21473 100644
>>> --- a/mm/hmm.c
>>> +++ b/mm/hmm.c
>>> @@ -59,6 +59,35 @@ static int hmm_pfns_fill(unsigned long addr, unsigned long end,
>>> return 0;
>>> }
>>>
>>> +static int hmm_handle_mm_fault(struct vm_area_struct *vma,
>>> + unsigned long addr,
>>> + unsigned int fault_flags)
>>> +{
>>> + int ret;
>>> +
>>> + if (userfaultfd_missing(vma)) {
>>> + struct mm_struct *mm = vma->vm_mm;
>>> +
>>> + fault_flags |= FAULT_FLAG_ALLOW_RETRY |
>>> + FAULT_FLAG_USER;
>>> +
>>> + ret = handle_mm_fault(vma, addr, fault_flags, NULL);
>>> +
>>> + if (ret & (VM_FAULT_COMPLETED | VM_FAULT_RETRY)) {
>>> + mmap_read_lock(mm);
>>> + return -EBUSY;
>>> + }
>>> +
>>> + if (ret & VM_FAULT_ERROR)
>>> + return vm_fault_to_errno(ret, 0);
>>> + } else {
>>> + ret = handle_mm_fault(vma, addr, fault_flags, NULL);
>>> + if (ret & VM_FAULT_ERROR)
>>> + return vm_fault_to_errno(ret, 0);
>>> + }
>>
>> I'm surprised that there is userfaultfd_missing() logic required here at
>> all.
>>
>> What prevents us from always calling handle_mm_fault() in a way +
>> handling return values, such that we will just do the right thing
>> independent of userfaultfd_missing()?
>>
>
> Essentially, the main issue with the current HMM framework is that
> handle_mm_fault() with FAULT_FLAG_ALLOW_RETRY (needed for userfaultfd)
> can drop the mm lock. The framework does not expect or support that yet.
> That is why I had to add the branching.
>
> The worst part is that handle_mm_fault() can unlock via
> mmap_read_unlock(), while svm_range_restore_work() takes the mmap write
> lock; this RFC patch doesn't address that problem.
>
> The main question where I’d like feedback is: should userfaultfd be
> supported in the HMM framework at all, given the complexity around
> unlocking and retry handling (similar to other gup helpers), or should
> this complexity be offloaded to the caller side?
>
> If it should be supported, then I see two ways to implement this:
>
> 1. Introduce a new HMM_NEED_USER_FAULT flag to tell the framework that
> the caller needs userfaultfd support, and extend the framework
> accordingly. This is not intrusive to existing code. But if we want lazy
> migration for GPU state in the future (and we likely do), GPU drivers
> will have to be updated to set this flag later. The only real user of
> this feature right now is the MSHV driver and I'd extend it to set
> this flag right away and leave GPU HMM users for later.
>
> 2. Add retry logic to hmm_handle_mm_fault() to handle VM_FAULT_RETRY and
> VM_FAULT_COMPLETED, as required for userfaultfd support. This would be
> transparent to users. But it would require significant changes: mm
> relocking, VMA lookup, and other parts. Also, before doing that, the
> kfd_svm driver would need to be changed to downgrade the mm lock to
> read.
Exactly this. If we want this, it should be done the proper way, by teaching
calling code that the lock was dropped etc.
There should not be userfaultfd special-casing anywhere; it's just a matter of
teaching the code that we might get the mmap lock dropped.
Now, that might be a bit of work :)
What we do in mm/gup.c is let callers opt in to whether they can handle
getting the mmap lock dropped -- not opt in to userfaultfd.
See FOLL_UNLOCKABLE. Such an approach enables a step-wise implementation.
But note that FOLL_UNLOCKABLE is an internal flag. We set it automatically when
someone passes us a "locked" variable, indicating that they can deal with it.
See populate_vma_page_range() as one example.
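For illustration, the gup-style contract can be sketched in userspace like
this; mock_fault_one_page(), the locked flag, and relock_count are invented
stand-ins for this sketch, not the real mm/gup.c interface:

```c
/* Userspace sketch of the gup-style "locked" contract: the helper may
 * drop the lock and then clears *locked so the caller knows to re-take
 * it and revalidate. Mock names only; see mm/gup.c for the real thing. */
#include <stddef.h>
#include <stdbool.h>

static bool uffd_wait_pending;  /* pretend one fault must wait on uffd */
static int relock_count;        /* how often the caller had to re-lock */

/* stand-in for a fault helper; passing a non-NULL 'locked' pointer is
 * the opt-in (what FOLL_UNLOCKABLE expresses internally in gup) */
static int mock_fault_one_page(int *locked)
{
    if (uffd_wait_pending && locked != NULL) {
        uffd_wait_pending = false;
        *locked = 0;    /* we dropped the mmap lock while waiting */
        return 0;       /* fault resolved after userspace responded */
    }
    return 0;           /* ordinary fault, lock never dropped */
}

static int populate_range(int npages)
{
    int locked = 1;

    for (int i = 0; i < npages; i++) {
        int ret = mock_fault_one_page(&locked);
        if (ret)
            return ret;
        if (!locked) {
            /* real code: mmap_read_lock(mm) and revalidate the VMA */
            relock_count++;
            locked = 1;
        }
    }
    return 0;
}
```

Passing a non-NULL locked pointer is the opt-in; a caller that cannot
tolerate the lock being dropped would simply pass NULL and never see it
cleared.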
--
Cheers,
David
* [RFC PATCH] mm/hmm: Add userfaultfd support to fault handling
@ 2026-04-02 16:37 Stanislav Kinsburskii
0 siblings, 0 replies; 5+ messages in thread
From: Stanislav Kinsburskii @ 2026-04-02 16:37 UTC (permalink / raw)
To: jgg, leon, akpm; +Cc: linux-mm, linux-kernel
Add support for userfaultfd-enabled VMAs to the HMM framework.
Extract fault handling logic into hmm_handle_mm_fault() to handle both
regular and userfaultfd-backed mappings. The implementation follows
fixup_user_fault() for handling VM_FAULT_RETRY and VM_FAULT_COMPLETED, with
a key difference: instead of retrying or moving forward respectively,
return -EBUSY after reacquiring mmap_read_lock. Since the lock was
released, the VMA could have changed, so defer retry logic to the caller.
This approach is inefficient for userfaultfd-backed VMAs, as HMM can only
populate one page at a time, but keeps the framework simple by avoiding
complex retry logic within HMM itself.
v2: Addressed sashiko review comments
Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
---
mm/hmm.c | 58 ++++++++++++++++++++++++++++++++++++++++++++++++++++++----
1 file changed, 54 insertions(+), 4 deletions(-)
diff --git a/mm/hmm.c b/mm/hmm.c
index f6c4ddff4bd6..284032c4b2c8 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -59,6 +59,53 @@ static int hmm_pfns_fill(unsigned long addr, unsigned long end,
return 0;
}
+static int hmm_handle_mm_fault(struct vm_area_struct *vma,
+ unsigned long addr,
+ unsigned int fault_flags)
+{
+ int ret;
+
+ if (userfaultfd_armed(vma)) {
+ struct mm_struct *mm = vma->vm_mm;
+
+ /*
+ * handle_mm_fault() with FAULT_FLAG_ALLOW_RETRY will
+ * release the mmap lock via mmap_read_unlock() on
+ * VM_FAULT_RETRY or VM_FAULT_COMPLETED.
+ * If the caller holds the lock in write mode, the
+ * mmap_read_unlock() call would corrupt the rwsem state.
+ * Assert that we only hold a read lock here.
+ */
+ lockdep_assert_held_read(&mm->mmap_lock);
+
+ fault_flags |= FAULT_FLAG_ALLOW_RETRY |
+ FAULT_FLAG_USER |
+ FAULT_FLAG_KILLABLE;
+
+ ret = handle_mm_fault(vma, addr, fault_flags, NULL);
+
+ if (ret & VM_FAULT_COMPLETED) {
+ mmap_read_lock(mm);
+ return -EBUSY;
+ }
+
+ if (ret & VM_FAULT_RETRY) {
+ mmap_read_lock(mm);
+ if (fatal_signal_pending(current))
+ return -EINTR;
+ return -EBUSY;
+ }
+
+ if (ret & VM_FAULT_ERROR)
+ return vm_fault_to_errno(ret, 0);
+ } else {
+ ret = handle_mm_fault(vma, addr, fault_flags, NULL);
+ if (ret & VM_FAULT_ERROR)
+ return vm_fault_to_errno(ret, 0);
+ }
+ return 0;
+}
+
/*
* hmm_vma_fault() - fault in a range lacking valid pmd or pte(s)
* @addr: range virtual start address (inclusive)
@@ -86,10 +133,13 @@ static int hmm_vma_fault(unsigned long addr, unsigned long end,
fault_flags |= FAULT_FLAG_WRITE;
}
- for (; addr < end; addr += PAGE_SIZE)
- if (handle_mm_fault(vma, addr, fault_flags, NULL) &
- VM_FAULT_ERROR)
- return -EFAULT;
+ for (; addr < end; addr += PAGE_SIZE) {
+ int ret;
+
+ ret = hmm_handle_mm_fault(vma, addr, fault_flags);
+ if (ret)
+ return ret;
+ }
return -EBUSY;
}