From: "Harry Yoo (Oracle)" <harry@kernel.org>
To: Mike Rapoport <rppt@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Andrea Arcangeli <aarcange@redhat.com>,
Andrei Vagin <avagin@google.com>,
Axel Rasmussen <axelrasmussen@google.com>,
Baolin Wang <baolin.wang@linux.alibaba.com>,
David Hildenbrand <david@kernel.org>,
Hugh Dickins <hughd@google.com>,
James Houghton <jthoughton@google.com>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
"Lorenzo Stoakes (Oracle)" <ljs@kernel.org>,
"Matthew Wilcox (Oracle)" <willy@infradead.org>,
Michal Hocko <mhocko@suse.com>,
Muchun Song <muchun.song@linux.dev>,
Nikita Kalyazin <kalyazin@amazon.com>,
Oscar Salvador <osalvador@suse.de>,
Paolo Bonzini <pbonzini@redhat.com>, Peter Xu <peterx@redhat.com>,
Sean Christopherson <seanjc@google.com>,
Shuah Khan <shuah@kernel.org>,
Suren Baghdasaryan <surenb@google.com>,
Vlastimil Babka <vbabka@suse.cz>,
kvm@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
linux-mm@kvack.org
Subject: Re: [PATCH v3 02/15] userfaultfd: introduce struct mfill_state
Date: Wed, 1 Apr 2026 00:24:01 +0900
Message-ID: <acvnEd3-s6XI26vb@hyeyoo>
In-Reply-To: <acva_NKf7T5rbIC8@kernel.org>
On Tue, Mar 31, 2026 at 05:32:28PM +0300, Mike Rapoport wrote:
> Hi Harry,
>
> On Tue, Mar 31, 2026 at 04:03:13PM +0900, Harry Yoo (Oracle) wrote:
> > On Mon, Mar 30, 2026 at 01:11:03PM +0300, Mike Rapoport wrote:
> > > From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
> > >
> > > mfill_atomic() passes a lot of parameters down to its callees.
> > >
> > > Aggregate them all into mfill_state structure and pass this structure to
> > > functions that implement various UFFDIO_ commands.
> > >
> > > Tracking the state in a structure will allow moving the code that retries
> > > copying of data for UFFDIO_COPY into mfill_atomic_pte_copy() and make the
> > > loop in mfill_atomic() identical for all UFFDIO operations on PTE-mapped
> > > memory.
> > >
> > > The mfill_state definition is deliberately local to mm/userfaultfd.c,
> > > hence shmem_mfill_atomic_pte() is not updated.
> > >
> > > [harry.yoo@oracle.com: properly initialize mfill_state.len to fix
> > > folio_add_new_anon_rmap() WARN]
> > > Link: https://lkml.kernel.org/r/abehBY7QakYF9bK4@hyeyoo
> > > Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
> > > Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
> > > Acked-by: David Hildenbrand (Arm) <david@kernel.org>
> > > ---
> > > mm/userfaultfd.c | 148 ++++++++++++++++++++++++++---------------------
> > > 1 file changed, 82 insertions(+), 66 deletions(-)
> > >
> > > @@ -790,12 +804,14 @@ static __always_inline ssize_t mfill_atomic(struct userfaultfd_ctx *ctx,
> > > uffd_flags_mode_is(flags, MFILL_ATOMIC_CONTINUE))
> > > goto out_unlock;
> > >
> > > - while (src_addr < src_start + len) {
> > > - pmd_t dst_pmdval;
> > > + state.vma = dst_vma;
> >
> > Oh wait, the lock leak was introduced in patch 2.
>
> Lock leak was introduced in patch 4 that moved getting the vma.
Still not sure what I could possibly be missing, so let me try again.
When I check out this commit, "userfaultfd: introduce struct mfill_state", I see:
| static __always_inline ssize_t mfill_atomic(struct userfaultfd_ctx *ctx,
| unsigned long dst_start,
| unsigned long src_start,
| unsigned long len,
| uffd_flags_t flags)
| {
| struct mfill_state state = (struct mfill_state){
| .ctx = ctx,
| .dst_start = dst_start,
| .src_start = src_start,
| .flags = flags,
| .len = len,
| .src_addr = src_start,
| .dst_addr = dst_start,
| };
|
[ ...snip...]
| retry:
| /*
| * Make sure the vma is not shared, that the dst range is
| * both valid and fully within a single existing vma.
| */
| dst_vma = uffd_mfill_lock(dst_mm, dst_start, len);
It acquires the vma lock (or mmap_lock) here, but doesn't set state.vma.
| if (IS_ERR(dst_vma)) {
| err = PTR_ERR(dst_vma);
| goto out;
| }
|
| /*
| * If memory mappings are changing because of non-cooperative
| * operation (e.g. mremap) running in parallel, bail out and
| * request the user to retry later
| */
| down_read(&ctx->map_changing_lock);
| err = -EAGAIN;
| if (atomic_read(&ctx->mmap_changing))
| goto out_unlock;
|
| err = -EINVAL;
| /*
| * shmem_zero_setup is invoked in mmap for MAP_ANONYMOUS|MAP_SHARED but
| * it will overwrite vm_ops, so vma_is_anonymous must return false.
| */
| if (WARN_ON_ONCE(vma_is_anonymous(dst_vma) &&
| dst_vma->vm_flags & VM_SHARED))
| goto out_unlock;
|
| /*
| * validate 'mode' now that we know the dst_vma: don't allow
| * a wrprotect copy if the userfaultfd didn't register as WP.
| */
| if ((flags & MFILL_ATOMIC_WP) && !(dst_vma->vm_flags & VM_UFFD_WP))
| goto out_unlock;
|
| /*
| * If this is a HUGETLB vma, pass off to appropriate routine
| */
| if (is_vm_hugetlb_page(dst_vma))
| return mfill_atomic_hugetlb(ctx, dst_vma, dst_start,
| src_start, len, flags);
|
| if (!vma_is_anonymous(dst_vma) && !vma_is_shmem(dst_vma))
| goto out_unlock;
| if (!vma_is_shmem(dst_vma) &&
| uffd_flags_mode_is(flags, MFILL_ATOMIC_CONTINUE))
| goto out_unlock;
|
| state.vma = dst_vma;
It is set here. So if anything before this jumps to the `out_unlock`
label due to a sanity check,
[...]
| while (state.src_addr < src_start + len) {
| pmd_t dst_pmdval;
|
| VM_WARN_ON_ONCE(state.dst_addr >= dst_start + len);
| [...]
|
| out_unlock:
| up_read(&ctx->map_changing_lock);
| uffd_mfill_unlock(state.vma);
the `vma` parameter will be NULL?
If I'm not missing something, this was introduced in patch 2 and
fixed in patch 4.
| out:
| if (state.folio)
| folio_put(state.folio);
| VM_WARN_ON_ONCE(copied < 0);
| VM_WARN_ON_ONCE(err > 0);
| VM_WARN_ON_ONCE(!copied && !err);
| return copied ? copied : err;
| }
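To make the ordering concern concrete, here is a minimal userspace sketch
(not kernel code). All names in it (struct vma with a `locked` flag,
mock_lock(), mock_unlock(), fill_late_assign(), fill_early_assign()) are
hypothetical stand-ins for illustration only. It assumes the unlock helper
cannot release anything when it is handed NULL, and compares assigning
state.vma after the sanity checks with assigning it right after the lock
is taken.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the kernel definitions. */
struct vma { int locked; };
struct mfill_state { struct vma *vma; };

/* Counts how often the unlock helper was handed a NULL vma. */
static int null_unlocks;

static struct vma *mock_lock(struct vma *v)
{
	v->locked = 1;
	return v;
}

static void mock_unlock(struct vma *v)
{
	if (!v) {
		/* Nothing to release: the lock taken earlier stays held. */
		null_unlocks++;
		return;
	}
	v->locked = 0;
}

/* Ordering as read in the quoted code: state.vma is set after the checks. */
static int fill_late_assign(struct vma *target, int fail_check)
{
	struct mfill_state state = { 0 };
	struct vma *dst_vma = mock_lock(target);
	int err = -EINVAL;

	if (fail_check)
		goto out_unlock;	/* state.vma is still NULL here */

	state.vma = dst_vma;
	err = 0;
out_unlock:
	mock_unlock(state.vma);
	return err;
}

/* Alternative ordering: record the vma as soon as the lock is held. */
static int fill_early_assign(struct vma *target, int fail_check)
{
	struct mfill_state state = { 0 };
	int err = -EINVAL;

	state.vma = mock_lock(target);
	if (fail_check)
		goto out_unlock;

	err = 0;
out_unlock:
	mock_unlock(state.vma);
	return err;
}
```

With a failing check, fill_late_assign() returns -EINVAL and leaves the
mock vma locked, while fill_early_assign() returns the same error with the
lock released. Whether the real uffd_mfill_unlock() tolerates a NULL
argument is not modelled here.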
> > > @@ -866,10 +882,10 @@ static __always_inline ssize_t mfill_atomic(struct userfaultfd_ctx *ctx,
> > >
> > > out_unlock:
> > > up_read(&ctx->map_changing_lock);
> > > - uffd_mfill_unlock(dst_vma);
> > > + uffd_mfill_unlock(state.vma);
> > > out:
> > > - if (folio)
> > > - folio_put(folio);
> > > + if (state.folio)
> > > + folio_put(state.folio);
> >
> > Sashiko raised a concern [2] that the VMA might be unmapped and
> > a new mapping created as a uffd hugetlb vma, leaking the folio by
> > going through
> >
> > `if (is_vm_hugetlb_page(dst_vma))
> > return mfill_atomic_hugetlb(ctx, dst_vma, dst_start,
> > src_start, len, flags);`
> >
> > but it appears to be a false positive (to me) because
> >
> > `if (atomic_read(&ctx->mmap_changing))` check should have detected the
> > unmapping and freed the folio?
>
> I think it's real, and it's there more or less from the beginning, although
> nobody hit it yet :)
>
> Before retrying the copy we drop all the locks, so if the copy is really
> long the old mapping can be wiped and a new mapping can be created instead.
Oops, perhaps I should have imagined harder :)
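For readers following along, the scenario can be modelled with a small
userspace sketch (not kernel code). Every name in it (struct mapping,
struct folio, folio_alloc_mock(), folio_put_mock(), retry_pass()) is
hypothetical, and the `release_on_hugetlb` switch is only an illustrative
remedy; it is not taken from the v4 patch linked below. The sketch assumes
a folio is kept across the retry and that the hugetlb branch returns
before the common exit path that would drop it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types; these are NOT the kernel definitions. */
struct mapping { bool hugetlb; };
struct folio { int refcount; };

/* Number of mock folios that still hold a reference. */
static int live_folios;

static struct folio *folio_alloc_mock(struct folio *f)
{
	f->refcount = 1;
	live_folios++;
	return f;
}

static void folio_put_mock(struct folio *f)
{
	if (--f->refcount == 0)
		live_folios--;
}

/*
 * One pass after the locks were dropped and re-taken. `held` is a folio
 * kept from the previous attempt, `m` is whatever mapping now lives at
 * the destination address.
 */
static int retry_pass(const struct mapping *m, struct folio *held,
		      bool release_on_hugetlb)
{
	if (m->hugetlb) {
		/* Early return: the common exit below is never reached. */
		if (release_on_hugetlb && held)
			folio_put_mock(held);
		return 0;
	}

	/* Common exit path drops the folio kept across the retry. */
	if (held)
		folio_put_mock(held);
	return 0;
}
```

If the mapping found on the second pass is hugetlb and nothing releases
the held folio before the early return, live_folios stays at 1, which is
the leak being discussed.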
> There's already a v4 of a patch that attempts to solve this:
>
> https://lore.kernel.org/all/20260331134158.622084-1-devnexen@gmail.com
Thanks for the pointer!
> > [2] https://sashiko.dev/#/patchset/20260330101116.1117699-1-rppt%40kernel.org?patch=13671
> >
> > > VM_WARN_ON_ONCE(copied < 0);
> > > VM_WARN_ON_ONCE(err > 0);
> > > VM_WARN_ON_ONCE(!copied && !err);
--
Cheers,
Harry / Hyeonggon