From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 27 Jun 2025 14:23:30 -0700
To:
mm-commits@vger.kernel.org,vbabka@suse.cz,surenb@google.com,rppt@kernel.org,osalvador@suse.de,muchun.song@linux.dev,mhocko@suse.com,lorenzo.stoakes@oracle.com,liam.howlett@oracle.com,jthoughton@google.com,hughd@google.com,david@redhat.com,axelrasmussen@google.com,aarcange@redhat.com,peterx@redhat.com,akpm@linux-foundation.org
From: Andrew Morton
Subject: + mm-apply-vm_uffd_ops-api-to-core-mm.patch added to mm-new branch
Message-Id: <20250627212330.AF304C4CEE3@smtp.kernel.org>
Precedence: bulk
X-Mailing-List: mm-commits@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:


The patch titled
     Subject: mm: apply vm_uffd_ops API to core mm
has been added to the -mm mm-new branch.  Its filename is
     mm-apply-vm_uffd_ops-api-to-core-mm.patch

This patch will shortly appear at
     https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/mm-apply-vm_uffd_ops-api-to-core-mm.patch

This patch will later appear in the mm-new branch at
    git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Note, mm-new is a provisional staging ground for work-in-progress
patches, and acceptance into mm-new is a notification for others to take
notice and to finish up reviews.  Please do not hesitate to respond to
review feedback and post updated versions to replace or incrementally
fix up patches in mm-new.

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days

------------------------------------------------------
From: Peter Xu
Subject: mm: apply vm_uffd_ops API to core mm
Date: Fri, 27 Jun 2025 11:46:55 -0400

This patch completely moves the old userfaultfd core to use the new
vm_uffd_ops API.  After this change, existing file systems will start to
use the new API for userfault operations.

While at it, move vma_can_userfault() into mm/userfaultfd.c, because it
is getting too big for an inline helper.  It is only used in slow paths,
so this shouldn't be an issue.

Move the pte marker check before the wp_async check, which might be more
intuitive because wp_async depends on pte markers.  That shouldn't cause
any functional change, because only one of the two checks would take
effect, depending on whether pte markers were selected in the config.

This also removes quite a few hard-coded checks for either shmem or
hugetlbfs.  All the old checks should still work, but now go through
vm_uffd_ops.

Note that anonymous memory will still need to be processed separately
because it doesn't have vm_ops at all.

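For readers skimming the changelog, the registration check it describes can be modelled in user space roughly as below.  This is only a sketch under stated assumptions: struct vma_model, struct uffd_ops_model, can_userfault_model() and the flag values are hypothetical stand-ins and not kernel definitions, and the VM_DROPPABLE and CONFIG_PTE_MARKER_UFFD_WP checks are left out.  The point it shows is that anonymous memory gets a fixed mask, while everything else is judged by the feature mask its ops advertise.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel flag bits; values are illustrative. */
#define VM_UFFD_MISSING	0x1UL
#define VM_UFFD_WP	0x2UL
#define VM_UFFD_MINOR	0x4UL
#define VM_UFFD_FLAGS	(VM_UFFD_MISSING | VM_UFFD_WP | VM_UFFD_MINOR)

/* Simplified model of vm_uffd_ops: only the feature mask is kept. */
struct uffd_ops_model {
	unsigned long uffd_features;
};

/* Simplified model of a VMA: anonymous VMAs carry no ops at all. */
struct vma_model {
	bool anonymous;
	const struct uffd_ops_model *uffd_ops;	/* NULL if none provided */
};

/*
 * Model of the registration check: a request is accepted only if every
 * requested mode is in the mask supported by the memory type.
 */
bool can_userfault_model(const struct vma_model *vma,
			 unsigned long vm_flags, bool wp_async)
{
	unsigned long supported;

	vm_flags &= VM_UFFD_FLAGS;

	/* Async write-protect alone is allowed on any memory type. */
	if (wp_async && vm_flags == VM_UFFD_WP)
		return true;

	if (vma->anonymous)
		/* No page cache behind anonymous memory, so no MINOR. */
		supported = VM_UFFD_MISSING | VM_UFFD_WP;
	else if (vma->uffd_ops)
		supported = vma->uffd_ops->uffd_features;
	else
		return false;

	return !(vm_flags & ~supported);
}

/* Example fixtures: an anonymous VMA, a shmem-like VMA, a VMA without ops. */
static const struct uffd_ops_model full_ops = {
	.uffd_features = VM_UFFD_MISSING | VM_UFFD_WP | VM_UFFD_MINOR,
};
static const struct vma_model anon_vma = { .anonymous = true, .uffd_ops = NULL };
static const struct vma_model file_vma = { .anonymous = false, .uffd_ops = &full_ops };
static const struct vma_model plain_vma = { .anonymous = false, .uffd_ops = NULL };
```

With these fixtures, a MINOR request is refused for anon_vma but accepted for file_vma, and plain_vma is refused for everything except the WP-only case with wp_async set.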
Link: https://lkml.kernel.org/r/20250627154655.2085903-5-peterx@redhat.com
Signed-off-by: Peter Xu
Reviewed-by: James Houghton
Cc: Andrea Arcangeli
Cc: Axel Rasmussen
Cc: David Hildenbrand
Cc: Hugh Dickins
Cc: Liam Howlett
Cc: Lorenzo Stoakes
Cc: Michal Hocko
Cc: Mike Rapoport
Cc: Muchun Song
Cc: Oscar Salvador
Cc: Suren Baghdasaryan
Cc: Vlastimil Babka
Signed-off-by: Andrew Morton
---

 include/linux/shmem_fs.h      |   14 ---
 include/linux/userfaultfd_k.h |   48 +++----------
 mm/shmem.c                    |    2
 mm/userfaultfd.c              |  115 ++++++++++++++++++++++++--------
 4 files changed, 102 insertions(+), 77 deletions(-)

--- a/include/linux/shmem_fs.h~mm-apply-vm_uffd_ops-api-to-core-mm
+++ a/include/linux/shmem_fs.h
@@ -195,20 +195,6 @@ static inline pgoff_t shmem_fallocend(st
 extern bool shmem_charge(struct inode *inode, long pages);
 extern void shmem_uncharge(struct inode *inode, long pages);
 
-#ifdef CONFIG_USERFAULTFD
-#ifdef CONFIG_SHMEM
-extern int shmem_mfill_atomic_pte(pmd_t *dst_pmd,
-				  struct vm_area_struct *dst_vma,
-				  unsigned long dst_addr,
-				  unsigned long src_addr,
-				  uffd_flags_t flags,
-				  struct folio **foliop);
-#else /* !CONFIG_SHMEM */
-#define shmem_mfill_atomic_pte(dst_pmd, dst_vma, dst_addr, \
-			       src_addr, flags, foliop) ({ BUG(); 0; })
-#endif /* CONFIG_SHMEM */
-#endif /* CONFIG_USERFAULTFD */
-
 /*
  * Used space is stored as unsigned 64-bit value in bytes but
  * quota core supports only signed 64-bit values so use that
--- a/include/linux/userfaultfd_k.h~mm-apply-vm_uffd_ops-api-to-core-mm
+++ a/include/linux/userfaultfd_k.h
@@ -149,9 +149,14 @@ typedef struct vm_uffd_ops vm_uffd_ops;
 #define MFILL_ATOMIC_FLAG(nr) ((__force uffd_flags_t) MFILL_ATOMIC_BIT(nr))
 #define MFILL_ATOMIC_MODE_MASK ((__force uffd_flags_t) (MFILL_ATOMIC_BIT(0) - 1))
 
+static inline enum mfill_atomic_mode uffd_flags_get_mode(uffd_flags_t flags)
+{
+	return (__force enum mfill_atomic_mode)(flags & MFILL_ATOMIC_MODE_MASK);
+}
+
 static inline bool uffd_flags_mode_is(uffd_flags_t flags, enum mfill_atomic_mode expected)
 {
-	return (flags & MFILL_ATOMIC_MODE_MASK) == ((__force uffd_flags_t) expected);
+	return uffd_flags_get_mode(flags) == expected;
 }
 
 static inline uffd_flags_t uffd_flags_set_mode(uffd_flags_t flags, enum mfill_atomic_mode mode)
@@ -260,41 +265,16 @@ static inline bool userfaultfd_armed(str
 	return vma->vm_flags & __VM_UFFD_FLAGS;
 }
 
-static inline bool vma_can_userfault(struct vm_area_struct *vma,
-				     vm_flags_t vm_flags,
-				     bool wp_async)
-{
-	vm_flags &= __VM_UFFD_FLAGS;
-
-	if (vma->vm_flags & VM_DROPPABLE)
-		return false;
-
-	if ((vm_flags & VM_UFFD_MINOR) &&
-	    (!is_vm_hugetlb_page(vma) && !vma_is_shmem(vma)))
-		return false;
-
-	/*
-	 * If wp async enabled, and WP is the only mode enabled, allow any
-	 * memory type.
-	 */
-	if (wp_async && (vm_flags == VM_UFFD_WP))
-		return true;
-
-#ifndef CONFIG_PTE_MARKER_UFFD_WP
-	/*
-	 * If user requested uffd-wp but not enabled pte markers for
-	 * uffd-wp, then shmem & hugetlbfs are not supported but only
-	 * anonymous.
-	 */
-	if ((vm_flags & VM_UFFD_WP) && !vma_is_anonymous(vma))
-		return false;
-#endif
-
-	/* By default, allow any of anon|shmem|hugetlb */
-	return vma_is_anonymous(vma) || is_vm_hugetlb_page(vma) ||
-	    vma_is_shmem(vma);
+static inline const vm_uffd_ops *vma_get_uffd_ops(struct vm_area_struct *vma)
+{
+	if (vma->vm_ops && vma->vm_ops->userfaultfd_ops)
+		return vma->vm_ops->userfaultfd_ops;
+	return NULL;
 }
 
+bool vma_can_userfault(struct vm_area_struct *vma,
+		       unsigned long vm_flags, bool wp_async);
+
 static inline bool vma_has_uffd_without_event_remap(struct vm_area_struct *vma)
 {
 	struct userfaultfd_ctx *uffd_ctx = vma->vm_userfaultfd_ctx.ctx;
--- a/mm/shmem.c~mm-apply-vm_uffd_ops-api-to-core-mm
+++ a/mm/shmem.c
@@ -3168,7 +3168,7 @@ static int shmem_uffd_get_folio(struct i
 	return shmem_get_folio(inode, pgoff, 0, folio, SGP_NOALLOC);
 }
 
-int shmem_mfill_atomic_pte(pmd_t *dst_pmd,
+static int shmem_mfill_atomic_pte(pmd_t *dst_pmd,
 			   struct vm_area_struct *dst_vma,
 			   unsigned long dst_addr,
 			   unsigned long src_addr,
--- a/mm/userfaultfd.c~mm-apply-vm_uffd_ops-api-to-core-mm
+++ a/mm/userfaultfd.c
@@ -14,12 +14,48 @@
 #include
 #include
 #include
-#include
 #include
 #include
 #include "internal.h"
 #include "swap.h"
 
+bool vma_can_userfault(struct vm_area_struct *vma, vm_flags_t vm_flags,
+		       bool wp_async)
+{
+	unsigned long supported;
+
+	if (vma->vm_flags & VM_DROPPABLE)
+		return false;
+
+	vm_flags &= __VM_UFFD_FLAGS;
+
+#ifndef CONFIG_PTE_MARKER_UFFD_WP
+	/*
+	 * If user requested uffd-wp but not enabled pte markers for
+	 * uffd-wp, then any file system (like shmem or hugetlbfs) are not
+	 * supported but only anonymous.
+	 */
+	if ((vm_flags & VM_UFFD_WP) && !vma_is_anonymous(vma))
+		return false;
+#endif
+	/*
+	 * If wp async enabled, and WP is the only mode enabled, allow any
+	 * memory type.
+	 */
+	if (wp_async && (vm_flags == VM_UFFD_WP))
+		return true;
+
+	if (vma_is_anonymous(vma))
+		/* Anonymous has no page cache, MINOR not supported */
+		supported = VM_UFFD_MISSING | VM_UFFD_WP;
+	else if (vma_get_uffd_ops(vma))
+		supported = vma_get_uffd_ops(vma)->uffd_features;
+	else
+		return false;
+
+	return !(vm_flags & (~supported));
+}
+
 static __always_inline
 bool validate_dst_vma(struct vm_area_struct *dst_vma, unsigned long dst_end)
 {
@@ -384,11 +420,15 @@ static int mfill_atomic_pte_continue(pmd
 {
 	struct inode *inode = file_inode(dst_vma->vm_file);
 	pgoff_t pgoff = linear_page_index(dst_vma, dst_addr);
+	const vm_uffd_ops *uffd_ops = vma_get_uffd_ops(dst_vma);
 	struct folio *folio;
 	struct page *page;
 	int ret;
 
-	ret = shmem_get_folio(inode, pgoff, 0, &folio, SGP_NOALLOC);
+	if (WARN_ON_ONCE(!uffd_ops || !uffd_ops->uffd_get_folio))
+		return -EINVAL;
+
+	ret = uffd_ops->uffd_get_folio(inode, pgoff, &folio);
 	/* Our caller expects us to return -EFAULT if we failed to find folio */
 	if (ret == -ENOENT)
 		ret = -EFAULT;
@@ -504,18 +544,6 @@ static __always_inline ssize_t mfill_ato
 	u32 hash;
 	struct address_space *mapping;
 
-	/*
-	 * There is no default zero huge page for all huge page sizes as
-	 * supported by hugetlb.  A PMD_SIZE huge pages may exist as used
-	 * by THP.  Since we can not reliably insert a zero page, this
-	 * feature is not supported.
-	 */
-	if (uffd_flags_mode_is(flags, MFILL_ATOMIC_ZEROPAGE)) {
-		up_read(&ctx->map_changing_lock);
-		uffd_mfill_unlock(dst_vma);
-		return -EINVAL;
-	}
-
 	src_addr = src_start;
 	dst_addr = dst_start;
 	copied = 0;
@@ -686,14 +714,55 @@ static __always_inline ssize_t mfill_ato
 		err = mfill_atomic_pte_zeropage(dst_pmd,
 						 dst_vma, dst_addr);
 	} else {
-		err = shmem_mfill_atomic_pte(dst_pmd, dst_vma,
-					     dst_addr, src_addr,
-					     flags, foliop);
+		const vm_uffd_ops *uffd_ops = vma_get_uffd_ops(dst_vma);
+
+		if (WARN_ON_ONCE(!uffd_ops || !uffd_ops->uffd_copy)) {
+			err = -EINVAL;
+		} else {
+			err = uffd_ops->uffd_copy(dst_pmd, dst_vma,
+						  dst_addr, src_addr,
+						  flags, foliop);
+		}
 	}
 
 	return err;
 }
 
+static inline bool
+vma_uffd_ops_supported(struct vm_area_struct *vma, uffd_flags_t flags)
+{
+	enum mfill_atomic_mode mode = uffd_flags_get_mode(flags);
+	const vm_uffd_ops *uffd_ops;
+	unsigned long uffd_ioctls;
+
+	if ((flags & MFILL_ATOMIC_WP) && !(vma->vm_flags & VM_UFFD_WP))
+		return false;
+
+	/* Anonymous supports everything except CONTINUE */
+	if (vma_is_anonymous(vma))
+		return mode != MFILL_ATOMIC_CONTINUE;
+
+	uffd_ops = vma_get_uffd_ops(vma);
+	if (!uffd_ops)
+		return false;
+
+	uffd_ioctls = uffd_ops->uffd_ioctls;
+	switch (mode) {
+	case MFILL_ATOMIC_COPY:
+		return uffd_ioctls & BIT(_UFFDIO_COPY);
+	case MFILL_ATOMIC_ZEROPAGE:
+		return uffd_ioctls & BIT(_UFFDIO_ZEROPAGE);
+	case MFILL_ATOMIC_CONTINUE:
+		if (!(vma->vm_flags & VM_SHARED))
+			return false;
+		return uffd_ioctls & BIT(_UFFDIO_CONTINUE);
+	case MFILL_ATOMIC_POISON:
+		return uffd_ioctls & BIT(_UFFDIO_POISON);
+	default:
+		return false;
+	}
+}
+
 static __always_inline ssize_t mfill_atomic(struct userfaultfd_ctx *ctx,
 					    unsigned long dst_start,
 					    unsigned long src_start,
@@ -752,11 +821,7 @@ retry:
 	    dst_vma->vm_flags & VM_SHARED))
 		goto out_unlock;
 
-	/*
-	 * validate 'mode' now that we know the dst_vma: don't allow
-	 * a wrprotect copy if the userfaultfd didn't register as WP.
-	 */
-	if ((flags & MFILL_ATOMIC_WP) && !(dst_vma->vm_flags & VM_UFFD_WP))
+	if (!vma_uffd_ops_supported(dst_vma, flags))
 		goto out_unlock;
 
 	/*
@@ -766,12 +831,6 @@ retry:
 		return  mfill_atomic_hugetlb(ctx, dst_vma, dst_start,
 					     src_start, len, flags);
 
-	if (!vma_is_anonymous(dst_vma) && !vma_is_shmem(dst_vma))
-		goto out_unlock;
-	if (!vma_is_shmem(dst_vma) &&
-	    uffd_flags_mode_is(flags, MFILL_ATOMIC_CONTINUE))
-		goto out_unlock;
-
 	while (src_addr < src_start + len) {
 		pmd_t dst_pmdval;
 
_

Patches currently in -mm which might be from peterx@redhat.com are

selftests-mm-reduce-uffd-unit-test-poison-test-to-minimum.patch
selftests-mm-reduce-uffd-unit-test-poison-test-to-minimum-fix.patch
mm-hugetlb-remove-prepare_hugepage_range.patch
mm-deduplicate-mm_get_unmapped_area.patch
mm-introduce-vm_uffd_ops-api.patch
mm-shmem-support-vm_uffd_ops-api.patch
mm-hugetlb-support-vm_uffd_ops-api.patch
mm-apply-vm_uffd_ops-api-to-core-mm.patch