From: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
To: "David Hildenbrand (Arm)" <david@kernel.org>
Cc: kys@microsoft.com, Liam.Howlett@oracle.com,
akpm@linux-foundation.org, decui@microsoft.com,
haiyangz@microsoft.com, jgg@ziepe.ca, corbet@lwn.net,
leon@kernel.org, longli@microsoft.com, ljs@kernel.org,
mhocko@suse.com, rppt@kernel.org, shuah@kernel.org,
skhan@linuxfoundation.org, surenb@google.com, vbabka@kernel.org,
wei.liu@kernel.org, linux-doc@vger.kernel.org,
linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-kselftest@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH 1/3] mm/hmm: Add hmm_range_fault_unlockable() for mmap lock-drop support
Date: Tue, 12 May 2026 19:42:57 -0700 [thread overview]
Message-ID: <agPlMYFCrdD2WKYZ@skinsburskii.localdomain> (raw)
In-Reply-To: <f073a8d7-5761-4f7b-a5e5-c6aeae5fdc72@kernel.org>
On Tue, May 12, 2026 at 09:18:11PM +0200, David Hildenbrand (Arm) wrote:
> On 5/12/26 18:18, Stanislav Kinsburskii wrote:
> > On Tue, May 12, 2026 at 10:42:14AM +0200, David Hildenbrand (Arm) wrote:
> >>
> >>> + for (; addr < end; addr += PAGE_SIZE) {
> >>> + vm_fault_t ret;
> >>> +
> >>> + ret = handle_mm_fault(vma, addr, fault_flags, NULL);
> >>> +
> >>> + if (ret & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)) {
> >>> + /*
> >>> + * The mmap lock has been dropped by the fault handler.
> >>> + * Record the failing address and signal lock-drop to
> >>> + * the caller.
> >>> + */
> >>> + *hmm_vma_walk->locked = 0;
> >>> + hmm_vma_walk->last = addr;
> >>> + return -EAGAIN;
> >>
> >>
> >> Okay, so we'll return straight from hmm_vma_fault() to
> >> hmm_vma_handle_pte()/hmm_vma_walk_pmd() -> walk_page_range() machinery.
> >>
> >> Hopefully we don't refer to the MM/VMA on any path there? It would be nicer if
> >> the hmm_vma_fault() could be called by the caller of walk_page_range(), but
> >> that's tricky I guess, as hmm_vma_fault() consumes the walk structure and
> >> requires the vma in there.
> >>
> >
> > It looks like a caller can provide a post_vma callback in mm_walk_ops. I
> > missed that case here. This callback cannot be supported by this change.
> > I will update the patch.
> >
> >>
> >> Note: am I wrong, or is hmm_vma_fault() really always called with
> >> required_fault=true?
> >>
> >
> > No, hmm_pte_need_fault can return false.
>
> That's not what I mean. Looks like all paths leading to hmm_vma_fault() have
> required_fault = true;
>
> IOW, there is always a "if (required_fault)" before it one way or the other.
>
> Ah, and there even is a "WARN_ON_ONCE(!required_fault)" in the function. What an
> odd thing to do :)
>
> >
> >>> + }
> >>> +
> >>> + if (ret & VM_FAULT_ERROR)
> >>> return -EFAULT;
> >>> + }
> >>> return -EBUSY;
> >>> }
> >>>
> >>> @@ -566,6 +585,17 @@ static int hmm_vma_walk_hugetlb_entry(pte_t *pte, unsigned long hmask,
> >>> if (required_fault) {
> >>> int ret;
> >>>
> >>> + /*
> >>> + * Faulting hugetlb pages on the unlockable path is not
> >>> + * supported. The walk framework holds hugetlb_vma_lock_read
> >>> + * which must be dropped before handle_mm_fault, but if the
> >>> + * mmap lock is also dropped (VM_FAULT_RETRY), the vma may
> >>> + * be freed and the walk framework's unconditional unlock
> >>> + * becomes a use-after-free.
> >>> + */
> >>> + if (hmm_vma_walk->locked)
> >>> + return -EFAULT;
> >>
> >> Just because it's unlockable doesn't mean that you must unlock. Can't this be
> >> kept working as is, just simulating here as if it would not be unlockable?
> >>
> >
> > I’m not sure how to implement this. The walk_page_range code expects the
> > hugetlb VMA to still be read-locked when we return from
> > hmm_vma_walk_hugetlb_entry. How can we guarantee that if the VMA might
> > be gone?
> >
> > I added a note in the docs. Whoever tackles this will likely need to
> > either rework `walk_page_range` to handle the case where the VMA is
> > gone, or use a different approach.
> >
> > Do you have any other suggestions on how to implement it?
>
> You just want hmm_vma_fault() to not set
> "FAULT_FLAG_ALLOW_RETRY·|·FAULT_FLAG_KILLABLE".
>
> The hacky way could be:
>
> diff --git a/mm/hmm.c b/mm/hmm.c
> index 5955f2f0c83d..83dba990e10a 100644
> --- a/mm/hmm.c
> +++ b/mm/hmm.c
> @@ -564,6 +564,7 @@ static int hmm_vma_walk_hugetlb_entry(pte_t *pte, unsigned
> long hmask,
> required_fault =
> hmm_pte_need_fault(hmm_vma_walk, pfn_req_flags, cpu_flags);
> if (required_fault) {
> + int *saved_locked = hmm_vma_walk->locked;
> int ret;
>
> spin_unlock(ptl);
> @@ -576,7 +577,9 @@ static int hmm_vma_walk_hugetlb_entry(pte_t *pte, unsigned
> long hmask,
> * use here of either pte or ptl after dropping the vma
> * lock.
> */
> + hmm_vma_walk->locked = NULL;
> ret = hmm_vma_fault(addr, end, required_fault, walk);
> + hmm_vma_walk->locked = saved_locked;
> hugetlb_vma_lock_read(vma);
> return ret;
> }
>
I see. AFAIU the outcome would be the same.
> But really, I think we should just try to get uffd support working properly, not
> excluding hugetlb.
>
> GUP achieves it properly by performing the fault handling outside of page table
> walking context ... essentially what I described in my first comment above:
> return the information to the caller and let it just trigger the fault.
>
> The issue here is that we trigger a fault out of walk_hugetlb_range() where we
> still hold locks, resulting in this questionable hugetlb_vma_unlock_read +
> hugetlb_vma_lock_read pattern.
>
Fair enough.
> The fault should just be triggered from a place where we don't have to play with
> hugetlb vma locks or be afraid that dropping the mmap lock causes other problems.
>
I reworked this part. Please take a look at v2.
Thanks,
Stanislav
>
> --
> Cheers,
>
> David
next prev parent reply other threads:[~2026-05-13 2:42 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-01 1:20 [PATCH 0/3] mm/hmm: Add mmap lock-drop support for userfaultfd-backed mappings Stanislav Kinsburskii
2026-05-01 1:20 ` [PATCH 1/3] mm/hmm: Add hmm_range_fault_unlockable() for mmap lock-drop support Stanislav Kinsburskii
2026-05-12 8:42 ` David Hildenbrand (Arm)
2026-05-12 16:18 ` Stanislav Kinsburskii
2026-05-12 19:18 ` David Hildenbrand (Arm)
2026-05-13 2:42 ` Stanislav Kinsburskii [this message]
2026-05-01 1:20 ` [PATCH 2/3] mshv: Use hmm_range_fault_unlockable() for userfaultfd support Stanislav Kinsburskii
2026-05-01 1:20 ` [PATCH 3/3] selftests/mm: Add userfaultfd test for HMM unlockable path Stanislav Kinsburskii
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=agPlMYFCrdD2WKYZ@skinsburskii.localdomain \
--to=skinsburskii@linux.microsoft.com \
--cc=Liam.Howlett@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=corbet@lwn.net \
--cc=david@kernel.org \
--cc=decui@microsoft.com \
--cc=haiyangz@microsoft.com \
--cc=jgg@ziepe.ca \
--cc=kys@microsoft.com \
--cc=leon@kernel.org \
--cc=linux-doc@vger.kernel.org \
--cc=linux-hyperv@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-kselftest@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=ljs@kernel.org \
--cc=longli@microsoft.com \
--cc=mhocko@suse.com \
--cc=rppt@kernel.org \
--cc=shuah@kernel.org \
--cc=skhan@linuxfoundation.org \
--cc=surenb@google.com \
--cc=vbabka@kernel.org \
--cc=wei.liu@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox