From: "David Hildenbrand (Arm)" <david@kernel.org>
To: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Cc: kys@microsoft.com, Liam.Howlett@oracle.com,
akpm@linux-foundation.org, decui@microsoft.com,
haiyangz@microsoft.com, jgg@ziepe.ca, corbet@lwn.net,
leon@kernel.org, longli@microsoft.com, ljs@kernel.org,
mhocko@suse.com, rppt@kernel.org, shuah@kernel.org,
skhan@linuxfoundation.org, surenb@google.com, vbabka@kernel.org,
wei.liu@kernel.org, linux-doc@vger.kernel.org,
linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-kselftest@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH 1/3] mm/hmm: Add hmm_range_fault_unlockable() for mmap lock-drop support
Date: Tue, 12 May 2026 21:18:11 +0200 [thread overview]
Message-ID: <f073a8d7-5761-4f7b-a5e5-c6aeae5fdc72@kernel.org> (raw)
In-Reply-To: <agNS4llNtAHBkMA2@skinsburskii.localdomain>
On 5/12/26 18:18, Stanislav Kinsburskii wrote:
> On Tue, May 12, 2026 at 10:42:14AM +0200, David Hildenbrand (Arm) wrote:
>>
>>> + for (; addr < end; addr += PAGE_SIZE) {
>>> + vm_fault_t ret;
>>> +
>>> + ret = handle_mm_fault(vma, addr, fault_flags, NULL);
>>> +
>>> + if (ret & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)) {
>>> + /*
>>> + * The mmap lock has been dropped by the fault handler.
>>> + * Record the failing address and signal lock-drop to
>>> + * the caller.
>>> + */
>>> + *hmm_vma_walk->locked = 0;
>>> + hmm_vma_walk->last = addr;
>>> + return -EAGAIN;
>>
>>
>> Okay, so we'll return straight from hmm_vma_fault() to
>> hmm_vma_handle_pte()/hmm_vma_walk_pmd() -> walk_page_range() machinery.
>>
>> Hopefully we don't refer to the MM/VMA on any path there? It would be nicer if
>> the hmm_vma_fault() could be called by the caller of walk_page_range(), but
>> that's tricky I guess, as hmm_vma_fault() consumes the walk structure and
>> requires the vma in there.
>>
>
> It looks like a caller can provide a post_vma callback in mm_walk_ops. I
> missed that case here. This callback cannot be supported by this change.
> I will update the patch.
>
>>
>> Note: am I wrong, or is hmm_vma_fault() really always called with
>> required_fault=true?
>>
>
> No, hmm_pte_need_fault can return false.
That's not what I mean. Looks like all paths leading to hmm_vma_fault() have
required_fault = true;
IOW, there is always a "if (required_fault)" before it one way or the other.
Ah, and there even is a "WARN_ON_ONCE(!required_fault)" in the function. What an
odd thing to do :)
>
>>> + }
>>> +
>>> + if (ret & VM_FAULT_ERROR)
>>> return -EFAULT;
>>> + }
>>> return -EBUSY;
>>> }
>>>
>>> @@ -566,6 +585,17 @@ static int hmm_vma_walk_hugetlb_entry(pte_t *pte, unsigned long hmask,
>>> if (required_fault) {
>>> int ret;
>>>
>>> + /*
>>> + * Faulting hugetlb pages on the unlockable path is not
>>> + * supported. The walk framework holds hugetlb_vma_lock_read
>>> + * which must be dropped before handle_mm_fault, but if the
>>> + * mmap lock is also dropped (VM_FAULT_RETRY), the vma may
>>> + * be freed and the walk framework's unconditional unlock
>>> + * becomes a use-after-free.
>>> + */
>>> + if (hmm_vma_walk->locked)
>>> + return -EFAULT;
>>
>> Just because it's unlockable doesn't mean that you must unlock. Can't this be
>> kept working as is, just simulating here as if it would not be unlockable?
>>
>
> I’m not sure how to implement this. The walk_page_range code expects the
> hugetlb VMA to still be read-locked when we return from
> hmm_vma_walk_hugetlb_entry. How can we guarantee that if the VMA might
> be gone?
>
> I added a note in the docs. Whoever tackles this will likely need to
> either rework `walk_page_range` to handle the case where the VMA is
> gone, or use a different approach.
>
> Do you have any other suggestions on how to implement it?
You just want hmm_vma_fault() to not set
"FAULT_FLAG_ALLOW_RETRY·|·FAULT_FLAG_KILLABLE".
The hacky way could be:
diff --git a/mm/hmm.c b/mm/hmm.c
index 5955f2f0c83d..83dba990e10a 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -564,6 +564,7 @@ static int hmm_vma_walk_hugetlb_entry(pte_t *pte, unsigned
long hmask,
required_fault =
hmm_pte_need_fault(hmm_vma_walk, pfn_req_flags, cpu_flags);
if (required_fault) {
+ int *saved_locked = hmm_vma_walk->locked;
int ret;
spin_unlock(ptl);
@@ -576,7 +577,9 @@ static int hmm_vma_walk_hugetlb_entry(pte_t *pte, unsigned
long hmask,
* use here of either pte or ptl after dropping the vma
* lock.
*/
+ hmm_vma_walk->locked = NULL;
ret = hmm_vma_fault(addr, end, required_fault, walk);
+ hmm_vma_walk->locked = saved_locked;
hugetlb_vma_lock_read(vma);
return ret;
}
But really, I think we should just try to get uffd support working properly, not
excluding hugetlb.
GUP achieves it properly by performing the fault handling outside of page table
walking context ... essentially what I described in my first comment above:
return the information to the caller and let it just trigger the fault.
The issue here is that we trigger a fault out of walk_hugetlb_range() where we
still hold locks, resulting in this questionable hugetlb_vma_unlock_read +
hugetlb_vma_lock_read pattern.
The fault should just be triggered from a place where we don't have to play with
hugetlb vma locks or be afraid that dropping the mmap lock causes other problems.
--
Cheers,
David
next prev parent reply other threads:[~2026-05-12 19:18 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-01 1:20 [PATCH 0/3] mm/hmm: Add mmap lock-drop support for userfaultfd-backed mappings Stanislav Kinsburskii
2026-05-01 1:20 ` [PATCH 1/3] mm/hmm: Add hmm_range_fault_unlockable() for mmap lock-drop support Stanislav Kinsburskii
2026-05-12 8:42 ` David Hildenbrand (Arm)
2026-05-12 16:18 ` Stanislav Kinsburskii
2026-05-12 19:18 ` David Hildenbrand (Arm) [this message]
2026-05-01 1:20 ` [PATCH 2/3] mshv: Use hmm_range_fault_unlockable() for userfaultfd support Stanislav Kinsburskii
2026-05-01 1:20 ` [PATCH 3/3] selftests/mm: Add userfaultfd test for HMM unlockable path Stanislav Kinsburskii
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=f073a8d7-5761-4f7b-a5e5-c6aeae5fdc72@kernel.org \
--to=david@kernel.org \
--cc=Liam.Howlett@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=corbet@lwn.net \
--cc=decui@microsoft.com \
--cc=haiyangz@microsoft.com \
--cc=jgg@ziepe.ca \
--cc=kys@microsoft.com \
--cc=leon@kernel.org \
--cc=linux-doc@vger.kernel.org \
--cc=linux-hyperv@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-kselftest@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=ljs@kernel.org \
--cc=longli@microsoft.com \
--cc=mhocko@suse.com \
--cc=rppt@kernel.org \
--cc=shuah@kernel.org \
--cc=skhan@linuxfoundation.org \
--cc=skinsburskii@linux.microsoft.com \
--cc=surenb@google.com \
--cc=vbabka@kernel.org \
--cc=wei.liu@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox