From: Mike Rapoport <rppt@kernel.org>
To: Peter Xu <peterx@redhat.com>
Cc: linux-mm@kvack.org, Andrea Arcangeli <aarcange@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
Axel Rasmussen <axelrasmussen@google.com>,
Baolin Wang <baolin.wang@linux.alibaba.com>,
David Hildenbrand <david@redhat.com>,
Hugh Dickins <hughd@google.com>,
James Houghton <jthoughton@google.com>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
Michal Hocko <mhocko@suse.com>,
Muchun Song <muchun.song@linux.dev>,
Nikita Kalyazin <kalyazin@amazon.com>,
Oscar Salvador <osalvador@suse.de>,
Paolo Bonzini <pbonzini@redhat.com>,
Sean Christopherson <seanjc@google.com>,
Shuah Khan <shuah@kernel.org>,
Suren Baghdasaryan <surenb@google.com>,
Vlastimil Babka <vbabka@suse.cz>,
linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
linux-kselftest@vger.kernel.org
Subject: Re: [PATCH RFC 10/17] shmem, userfaultfd: implement shmem uffd operations using vm_uffd_ops
Date: Sun, 8 Feb 2026 12:35:43 +0200 [thread overview]
Message-ID: <aYhm_4difwN5XXxe@kernel.org> (raw)
In-Reply-To: <aYIzCuh8cjd09zrP@x1.local>
On Tue, Feb 03, 2026 at 12:40:26PM -0500, Peter Xu wrote:
> On Tue, Jan 27, 2026 at 09:29:29PM +0200, Mike Rapoport wrote:
> > From: "Mike Rapoport (Microsoft)" <rppt@kernel.org>
> >
> > Add filemap_add() and filemap_remove() methods to vm_uffd_ops and use
> > them in __mfill_atomic_pte() to add shmem folios to page cache and
> > remove them in case of error.
> >
> > Implement these methods in shmem along with vm_uffd_ops->alloc_folio()
> > and drop shmem_mfill_atomic_pte().
> >
> > Since userfaultfd now does not reference any functions from shmem, drop
> > include if linux/shmem_fs.h from mm/userfaultfd.c
> >
> > mfill_atomic_install_pte() is not used anywhere outside of
> > mm/userfaultfd, make it static.
> >
> > Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
>
> This patch looks like a real nice cleanup on its own, thanks Mike!
>
> I guess I never tried to read into shmem accountings, now after I read some
> of the codes I don't see any issue with your change. We can also wait for
> some shmem developers double check those. Comments inline below on
> something I spot.
>
> >
> > fixup
> >
> > Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
>
> [unexpected lines can be removed here]
Sure :)
> > ---
> > include/linux/shmem_fs.h | 14 ----
> > include/linux/userfaultfd_k.h | 20 +++--
> > mm/shmem.c | 148 ++++++++++++----------------------
> > mm/userfaultfd.c | 79 +++++++++---------
> > 4 files changed, 106 insertions(+), 155 deletions(-)
> >
> > diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
> > index e2069b3179c4..754f17e5b53c 100644
> > --- a/include/linux/shmem_fs.h
> > +++ b/include/linux/shmem_fs.h
> > @@ -97,6 +97,21 @@ struct vm_uffd_ops {
> > */
> > struct folio *(*alloc_folio)(struct vm_area_struct *vma,
> > unsigned long addr);
> > + /*
> > + * Called during resolution of UFFDIO_COPY request.
> > + * Should lock the folio and add it to VMA's page cache.
> > + * Returns 0 on success, error code on failre.
>
> failure
Thanks, will fix.
> > + */
> > + int (*filemap_add)(struct folio *folio, struct vm_area_struct *vma,
> > + unsigned long addr);
> > + /*
> > + * Called during resolution of UFFDIO_COPY request on the error
> > + * handling path.
> > + * Should revert the operation of ->filemap_add().
> > + * The folio should be unlocked, but the reference to it should not be
> > + * dropped.
>
> Might be slightly misleading to explicitly mention this? As page cache
> also holds references and IIUC they need to be dropped there. But I get
> your point, on keeping the last refcount due to allocation.
>
> IMHO the "should revert the operation of ->filemap_add()" is good enough
> and accurately describes it.
Yeah, sounds good.
> > + */
> > + void (*filemap_remove)(struct folio *folio, struct vm_area_struct *vma);
> > };
> >
> > /* A combined operation mode + behavior flags. */
...
> > +static int shmem_mfill_filemap_add(struct folio *folio,
> > + struct vm_area_struct *vma,
> > + unsigned long addr)
> > +{
> > + struct inode *inode = file_inode(vma->vm_file);
> > + struct address_space *mapping = inode->i_mapping;
> > + pgoff_t pgoff = linear_page_index(vma, addr);
> > + gfp_t gfp = mapping_gfp_mask(mapping);
> > + int err;
> > +
> > __folio_set_locked(folio);
> > __folio_set_swapbacked(folio);
> > - __folio_mark_uptodate(folio);
> > -
> > - ret = -EFAULT;
> > - max_off = DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE);
> > - if (unlikely(pgoff >= max_off))
> > - goto out_release;
> >
> > - ret = mem_cgroup_charge(folio, dst_vma->vm_mm, gfp);
> > - if (ret)
> > - goto out_release;
> > - ret = shmem_add_to_page_cache(folio, mapping, pgoff, NULL, gfp);
> > - if (ret)
> > - goto out_release;
> > + err = shmem_add_to_page_cache(folio, mapping, pgoff, NULL, gfp);
> > + if (err)
> > + goto err_unlock;
> >
> > - ret = mfill_atomic_install_pte(dst_pmd, dst_vma, dst_addr,
> > - &folio->page, true, flags);
> > - if (ret)
> > - goto out_delete_from_cache;
> > + if (shmem_inode_acct_blocks(inode, 1)) {
>
> We used to do this early before allocation, IOW, I think we still have an
> option to leave this to alloc_folio() hook. However I don't see an issue
> either keeping it in filemap_add(). Maybe this movement should better be
> spelled out in the commit message anyway on how this decision is made.
>
> IIUC it's indeed safe we move this acct_blocks() here, I even see Hugh
> mentioned such in an older commit 3022fd7af96, but Hugh left uffd alone at
> that time:
>
> Userfaultfd is a foreign country: they do things differently there, and
> for good reason - to avoid mmap_lock deadlock. Leave ordering in
> shmem_mfill_atomic_pte() untouched for now, but I would rather like to
> mesh it better with shmem_get_folio_gfp() in the future.
>
> I'm not sure if that's also what you wanted to do - to make userfaultfd
> code work similarly like what shmem_alloc_and_add_folio() does right now.
> Maybe you want to mention that too somewhere in the commit log when posting
> a formal patch.
>
> One thing not directly relevant is, shmem_alloc_and_add_folio() also does
> proper recalc of inode allocation info when acct_blocks() fails here. But
> if that's a problem, that's pre-existing for userfaultfd, so IIUC we can
> also leave it alone until someone (maybe quota user) complains about shmem
> allocation failures on UFFDIO_COPY.. It's just that it looks similar
> problem here in userfaultfd path.
I actually wanted to have ordering as close as possible to
shmem_alloc_and_add_folio(), that's the first reason on moving acct_blocks
to ->filemap_add().
Another reason, is that it simplifies rollback in case of a failure, as
shmem_recalc_inode(inode, 0, 0); in ->filemap_remove() takes care of the
block accounting as well.
> > + err = -ENOMEM;
> > + goto err_delete_from_cache;
> > + }
> >
> > + folio_add_lru(folio);
>
> This change is pretty separate from the work, but looks correct to me: IIUC
> we moved the lru add earlier now, and it should be safe as long as we're
> holding folio lock all through the process, and folio_put() (ultimately,
> __page_cache_release()) will always properly undo the lru change. Please
> help double check if my understanding is correct.
This follows shmem_alloc_and_add_folio(), and my understanding as well that
this is safe as long as we hold folio lock.
> > +static void shmem_mfill_filemap_remove(struct folio *folio,
> > + struct vm_area_struct *vma)
> > +{
> > + struct inode *inode = file_inode(vma->vm_file);
> > +
> > + filemap_remove_folio(folio);
> > + shmem_recalc_inode(inode, 0, 0);
> > folio_unlock(folio);
> > - folio_put(folio);
> > -out_unacct_blocks:
> > - shmem_inode_unacct_blocks(inode, 1);
>
> This looks wrong, or maybe I miss somewhere we did the unacct_blocks()?
This is handled by shmem_recalc_inode(inode, 0, 0).
> > @@ -401,6 +397,9 @@ int mfill_atomic_install_pte(pmd_t *dst_pmd,
> >
> > set_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte);
> >
> > + if (page_in_cache)
> > + folio_unlock(folio);
>
> Nitpick: another small change that looks correct, but IMHO would be nice to
> either make it a small separate patch, or mention in the commit message.
I'll address this in the commit log,
> > +
> > /* No need to invalidate - it was non-present before */
> > update_mmu_cache(dst_vma, dst_addr, dst_pte);
> > ret = 0;
--
Sincerely yours,
Mike.
next prev parent reply other threads:[~2026-02-08 10:35 UTC|newest]
Thread overview: 43+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-01-27 19:29 [PATCH RFC 00/17] mm, kvm: allow uffd suppot in guest_memfd Mike Rapoport
2026-01-27 19:29 ` [PATCH RFC 01/17] userfaultfd: introduce mfill_copy_folio_locked() helper Mike Rapoport
2026-02-03 17:45 ` Peter Xu
2026-02-08 9:49 ` Mike Rapoport
2026-01-27 19:29 ` [PATCH RFC 02/17] userfaultfd: introduce struct mfill_state Mike Rapoport
2026-01-27 19:29 ` [PATCH RFC 03/17] userfaultfd: introduce mfill_get_pmd() helper Mike Rapoport
2026-01-27 19:29 ` [PATCH RFC 04/17] userfaultfd: introduce mfill_get_vma() and mfill_put_vma() Mike Rapoport
2026-02-02 21:49 ` Peter Xu
2026-02-08 9:54 ` Mike Rapoport
2026-01-27 19:29 ` [PATCH RFC 05/17] userfaultfd: retry copying with locks dropped in mfill_atomic_pte_copy() Mike Rapoport
2026-02-02 21:23 ` Peter Xu
2026-02-08 10:01 ` Mike Rapoport
2026-01-27 19:29 ` [PATCH RFC 06/17] userfaultfd: move vma_can_userfault out of line Mike Rapoport
2026-01-27 19:29 ` [PATCH RFC 07/17] userfaultfd: introduce vm_uffd_ops Mike Rapoport
2026-02-02 21:36 ` Peter Xu
2026-02-08 10:13 ` Mike Rapoport
2026-02-11 19:35 ` Peter Xu
2026-02-15 17:47 ` Mike Rapoport
2026-02-18 21:34 ` Peter Xu
2026-01-27 19:29 ` [PATCH RFC 08/17] userfaultfd, shmem: use a VMA callback to handle UFFDIO_CONTINUE Mike Rapoport
2026-01-27 19:29 ` [PATCH RFC 09/17] userfaultfd: introduce vm_uffd_ops->alloc_folio() Mike Rapoport
2026-02-02 22:13 ` Peter Xu
2026-02-08 10:22 ` Mike Rapoport
2026-02-11 19:37 ` Peter Xu
2026-01-27 19:29 ` [PATCH RFC 10/17] shmem, userfaultfd: implement shmem uffd operations using vm_uffd_ops Mike Rapoport
2026-02-03 17:40 ` Peter Xu
2026-02-08 10:35 ` Mike Rapoport [this message]
2026-02-11 20:00 ` Peter Xu
2026-02-15 17:45 ` Mike Rapoport
2026-02-18 21:45 ` Peter Xu
2026-01-27 19:29 ` [PATCH RFC 11/17] userfaultfd: mfill_atomic() remove retry logic Mike Rapoport
2026-01-27 19:29 ` [PATCH RFC 12/17] mm: introduce VM_FAULT_UFFD_MINOR fault reason Mike Rapoport
2026-01-27 19:29 ` [PATCH RFC 13/17] mm: introduce VM_FAULT_UFFD_MISSING " Mike Rapoport
2026-01-27 19:29 ` [PATCH RFC 14/17] KVM: guest_memfd: implement userfaultfd minor mode Mike Rapoport
2026-01-27 19:29 ` [PATCH RFC 15/17] KVM: guest_memfd: implement userfaultfd missing mode Mike Rapoport
2026-03-04 17:12 ` Nikita Kalyazin
2026-03-04 17:17 ` Nikita Kalyazin
2026-01-27 19:29 ` [PATCH RFC 16/17] KVM: selftests: test userfaultfd minor for guest_memfd Mike Rapoport
2026-01-27 19:29 ` [PATCH RFC 17/17] KVM: selftests: test userfaultfd missing " Mike Rapoport
2026-02-03 20:56 ` [PATCH RFC 00/17] mm, kvm: allow uffd suppot in guest_memfd Peter Xu
2026-02-09 15:35 ` David Hildenbrand (Arm)
2026-02-11 6:04 ` Mike Rapoport
2026-02-11 9:52 ` David Hildenbrand (Arm)
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aYhm_4difwN5XXxe@kernel.org \
--to=rppt@kernel.org \
--cc=Liam.Howlett@oracle.com \
--cc=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=axelrasmussen@google.com \
--cc=baolin.wang@linux.alibaba.com \
--cc=david@redhat.com \
--cc=hughd@google.com \
--cc=jthoughton@google.com \
--cc=kalyazin@amazon.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-kselftest@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=lorenzo.stoakes@oracle.com \
--cc=mhocko@suse.com \
--cc=muchun.song@linux.dev \
--cc=osalvador@suse.de \
--cc=pbonzini@redhat.com \
--cc=peterx@redhat.com \
--cc=seanjc@google.com \
--cc=shuah@kernel.org \
--cc=surenb@google.com \
--cc=vbabka@suse.cz \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.