Date: Sun, 29 Jun 2025 11:55:27 +0300
From: Mike Rapoport <rppt@kernel.org>
To: Peter Xu
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Vlastimil Babka,
	Suren Baghdasaryan, Muchun Song, Lorenzo Stoakes, Hugh Dickins,
	Andrew Morton, James Houghton, Liam R. Howlett, Nikita Kalyazin,
	Michal Hocko, David Hildenbrand, Andrea Arcangeli, Oscar Salvador,
	Axel Rasmussen, Ujwal Kundur
Subject: Re: [PATCH v2 4/4] mm: Apply vm_uffd_ops API to core mm
References: <20250627154655.2085903-1-peterx@redhat.com>
	<20250627154655.2085903-5-peterx@redhat.com>
In-Reply-To: <20250627154655.2085903-5-peterx@redhat.com>

On Fri, Jun 27, 2025 at 11:46:55AM -0400, Peter Xu wrote:
> This patch completely moves the old userfaultfd core to use the new
> vm_uffd_ops API.  After this change, existing file systems will start to
> use the new API for userfault operations.

Maybe:

  Move the userfaultfd core to use the new vm_uffd_ops API. After this
  change, file systems that implement vm_operations_struct can start
  using the new API for userfaultfd operations.

> When at it, moving vma_can_userfault() into mm/userfaultfd.c instead,
> because it's getting too big.  It's only used in slow paths so it
> shouldn't be an issue.  Move the pte marker check before wp_async, which
> might be more intuitive because wp_async depends on pte markers.  That
> shouldn't cause any functional change though because only one check would
> take effect depending on whether pte marker was selected in config.
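
Just to make sure I read the API right: a file system that wants to
support userfaultfd would now wire things up roughly like the sketch
below. This is only an illustration pieced together from the hooks this
patch uses ("myfs" and its helpers are made up, and the exact
vm_uffd_ops layout is the one introduced earlier in the series):

	/* hypothetical file system adopting vm_uffd_ops */
	static const vm_uffd_ops myfs_uffd_ops = {
		/* uffd register modes the backing memory can handle */
		.uffd_features	= VM_UFFD_MISSING | VM_UFFD_WP |
				  VM_UFFD_MINOR,
		/* mfill ioctls actually implemented */
		.uffd_ioctls	= BIT(_UFFDIO_COPY) |
				  BIT(_UFFDIO_ZEROPAGE) |
				  BIT(_UFFDIO_CONTINUE),
		.uffd_get_folio	= myfs_uffd_get_folio,
		.uffd_copy	= myfs_mfill_atomic_pte,
	};

	static const struct vm_operations_struct myfs_vm_ops = {
		.fault		 = myfs_fault,
		.userfaultfd_ops = &myfs_uffd_ops,
	};

and the core then discovers everything through
vma->vm_ops->userfaultfd_ops, as the hunks below show.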
> 
> This will also remove quite some hard-coded checks for either shmem or
> hugetlbfs.  Now all the old checks should still work but with vm_uffd_ops.
> 
> Note that anonymous memory will still need to be processed separately
> because it doesn't have vm_ops at all.
> 
> Reviewed-by: James Houghton
> Signed-off-by: Peter Xu

Acked-by: Mike Rapoport (Microsoft)

> ---
>  include/linux/shmem_fs.h      |  14 -----
>  include/linux/userfaultfd_k.h |  46 ++++----------
>  mm/shmem.c                    |   2 +-
>  mm/userfaultfd.c              | 115 +++++++++++++++++++++++++---------
>  4 files changed, 101 insertions(+), 76 deletions(-)
> 
> diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
> index 6d0f9c599ff7..2f5b7b295cf6 100644
> --- a/include/linux/shmem_fs.h
> +++ b/include/linux/shmem_fs.h
> @@ -195,20 +195,6 @@ static inline pgoff_t shmem_fallocend(struct inode *inode, pgoff_t eof)
>  extern bool shmem_charge(struct inode *inode, long pages);
>  extern void shmem_uncharge(struct inode *inode, long pages);
>  
> -#ifdef CONFIG_USERFAULTFD
> -#ifdef CONFIG_SHMEM
> -extern int shmem_mfill_atomic_pte(pmd_t *dst_pmd,
> -				  struct vm_area_struct *dst_vma,
> -				  unsigned long dst_addr,
> -				  unsigned long src_addr,
> -				  uffd_flags_t flags,
> -				  struct folio **foliop);
> -#else /* !CONFIG_SHMEM */
> -#define shmem_mfill_atomic_pte(dst_pmd, dst_vma, dst_addr, \
> -			       src_addr, flags, foliop) ({ BUG(); 0; })
> -#endif /* CONFIG_SHMEM */
> -#endif /* CONFIG_USERFAULTFD */
> -
>  /*
>   * Used space is stored as unsigned 64-bit value in bytes but
>   * quota core supports only signed 64-bit values so use that
> diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
> index c9a093c4502b..1aa9b246fb84 100644
> --- a/include/linux/userfaultfd_k.h
> +++ b/include/linux/userfaultfd_k.h
> @@ -149,9 +149,14 @@ typedef struct vm_uffd_ops vm_uffd_ops;
>  #define MFILL_ATOMIC_FLAG(nr) ((__force uffd_flags_t) MFILL_ATOMIC_BIT(nr))
>  #define MFILL_ATOMIC_MODE_MASK ((__force uffd_flags_t) (MFILL_ATOMIC_BIT(0) - 1))
>  
> +static inline enum mfill_atomic_mode uffd_flags_get_mode(uffd_flags_t flags)
> +{
> +	return (__force enum mfill_atomic_mode)(flags & MFILL_ATOMIC_MODE_MASK);
> +}
> +
>  static inline bool uffd_flags_mode_is(uffd_flags_t flags, enum mfill_atomic_mode expected)
>  {
> -	return (flags & MFILL_ATOMIC_MODE_MASK) == ((__force uffd_flags_t) expected);
> +	return uffd_flags_get_mode(flags) == expected;
>  }
>  
>  static inline uffd_flags_t uffd_flags_set_mode(uffd_flags_t flags, enum mfill_atomic_mode mode)
> @@ -260,41 +265,16 @@ static inline bool userfaultfd_armed(struct vm_area_struct *vma)
>  	return vma->vm_flags & __VM_UFFD_FLAGS;
>  }
>  
> -static inline bool vma_can_userfault(struct vm_area_struct *vma,
> -				     vm_flags_t vm_flags,
> -				     bool wp_async)
> +static inline const vm_uffd_ops *vma_get_uffd_ops(struct vm_area_struct *vma)
>  {
> -	vm_flags &= __VM_UFFD_FLAGS;
> -
> -	if (vma->vm_flags & VM_DROPPABLE)
> -		return false;
> -
> -	if ((vm_flags & VM_UFFD_MINOR) &&
> -	    (!is_vm_hugetlb_page(vma) && !vma_is_shmem(vma)))
> -		return false;
> -
> -	/*
> -	 * If wp async enabled, and WP is the only mode enabled, allow any
> -	 * memory type.
> -	 */
> -	if (wp_async && (vm_flags == VM_UFFD_WP))
> -		return true;
> -
> -#ifndef CONFIG_PTE_MARKER_UFFD_WP
> -	/*
> -	 * If user requested uffd-wp but not enabled pte markers for
> -	 * uffd-wp, then shmem & hugetlbfs are not supported but only
> -	 * anonymous.
> -	 */
> -	if ((vm_flags & VM_UFFD_WP) && !vma_is_anonymous(vma))
> -		return false;
> -#endif
> -
> -	/* By default, allow any of anon|shmem|hugetlb */
> -	return vma_is_anonymous(vma) || is_vm_hugetlb_page(vma) ||
> -	       vma_is_shmem(vma);
> +	if (vma->vm_ops && vma->vm_ops->userfaultfd_ops)
> +		return vma->vm_ops->userfaultfd_ops;
> +	return NULL;
>  }
>  
> +bool vma_can_userfault(struct vm_area_struct *vma,
> +		       unsigned long vm_flags, bool wp_async);
> +
>  static inline bool vma_has_uffd_without_event_remap(struct vm_area_struct *vma)
>  {
>  	struct userfaultfd_ctx *uffd_ctx = vma->vm_userfaultfd_ctx.ctx;
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 9a8b8dd4709b..fc85002dd8af 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -3158,7 +3158,7 @@ static int shmem_uffd_get_folio(struct inode *inode, pgoff_t pgoff,
>  	return shmem_get_folio(inode, pgoff, 0, folio, SGP_NOALLOC);
>  }
>  
> -int shmem_mfill_atomic_pte(pmd_t *dst_pmd,
> +static int shmem_mfill_atomic_pte(pmd_t *dst_pmd,
>  			   struct vm_area_struct *dst_vma,
>  			   unsigned long dst_addr,
>  			   unsigned long src_addr,
> diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
> index cbed91b09640..52d7d5f144b8 100644
> --- a/mm/userfaultfd.c
> +++ b/mm/userfaultfd.c
> @@ -14,12 +14,48 @@
>  #include
>  #include
>  #include
> -#include
>  #include
>  #include
>  #include "internal.h"
>  #include "swap.h"
>  
> +bool vma_can_userfault(struct vm_area_struct *vma, vm_flags_t vm_flags,
> +		       bool wp_async)
> +{
> +	unsigned long supported;
> +
> +	if (vma->vm_flags & VM_DROPPABLE)
> +		return false;
> +
> +	vm_flags &= __VM_UFFD_FLAGS;
> +
> +#ifndef CONFIG_PTE_MARKER_UFFD_WP
> +	/*
> +	 * If user requested uffd-wp but not enabled pte markers for
> +	 * uffd-wp, then any file system (like shmem or hugetlbfs) are not
> +	 * supported but only anonymous.
> +	 */
> +	if ((vm_flags & VM_UFFD_WP) && !vma_is_anonymous(vma))
> +		return false;
> +#endif
> +	/*
> +	 * If wp async enabled, and WP is the only mode enabled, allow any
> +	 * memory type.
> +	 */
> +	if (wp_async && (vm_flags == VM_UFFD_WP))
> +		return true;
> +
> +	if (vma_is_anonymous(vma))
> +		/* Anonymous has no page cache, MINOR not supported */
> +		supported = VM_UFFD_MISSING | VM_UFFD_WP;
> +	else if (vma_get_uffd_ops(vma))
> +		supported = vma_get_uffd_ops(vma)->uffd_features;
> +	else
> +		return false;
> +
> +	return !(vm_flags & (~supported));
> +}
> +
>  static __always_inline
>  bool validate_dst_vma(struct vm_area_struct *dst_vma, unsigned long dst_end)
>  {
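
Collapsing all the per-memory-type special cases into one mask test is a
nice cleanup. Spelling the check out with made-up values: registering
MISSING|WP on a VMA whose uffd_features only advertises MISSING now
fails in the obvious way:

	vm_flags  = VM_UFFD_MISSING | VM_UFFD_WP;  /* requested modes */
	supported = VM_UFFD_MISSING;               /* ->uffd_features */

	(vm_flags & ~supported) == VM_UFFD_WP      /* non-zero -> rejected */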
> @@ -384,11 +420,15 @@ static int mfill_atomic_pte_continue(pmd_t *dst_pmd,
>  {
>  	struct inode *inode = file_inode(dst_vma->vm_file);
>  	pgoff_t pgoff = linear_page_index(dst_vma, dst_addr);
> +	const vm_uffd_ops *uffd_ops = vma_get_uffd_ops(dst_vma);
>  	struct folio *folio;
>  	struct page *page;
>  	int ret;
>  
> -	ret = shmem_get_folio(inode, pgoff, 0, &folio, SGP_NOALLOC);
> +	if (WARN_ON_ONCE(!uffd_ops || !uffd_ops->uffd_get_folio))
> +		return -EINVAL;
> +
> +	ret = uffd_ops->uffd_get_folio(inode, pgoff, &folio);
>  	/* Our caller expects us to return -EFAULT if we failed to find folio */
>  	if (ret == -ENOENT)
>  		ret = -EFAULT;
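
And with the CONTINUE path now going through ->uffd_get_folio(), a plain
page-cache-backed file system could presumably get away with something
as small as this (again a made-up "myfs", and I'm hand-waving the
locked/uptodate guarantees that shmem gets from SGP_NOALLOC):

	static int myfs_uffd_get_folio(struct inode *inode, pgoff_t pgoff,
				       struct folio **foliop)
	{
		/* lookup only, never allocate; folio comes back locked */
		struct folio *folio = __filemap_get_folio(inode->i_mapping,
							  pgoff, FGP_LOCK, 0);

		if (IS_ERR(folio))
			return PTR_ERR(folio);	/* -ENOENT -> -EFAULT above */

		*foliop = folio;
		return 0;
	}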
> @@ -504,18 +544,6 @@ static __always_inline ssize_t mfill_atomic_hugetlb(
>  	u32 hash;
>  	struct address_space *mapping;
>  
> -	/*
> -	 * There is no default zero huge page for all huge page sizes as
> -	 * supported by hugetlb.  A PMD_SIZE huge pages may exist as used
> -	 * by THP.  Since we can not reliably insert a zero page, this
> -	 * feature is not supported.
> -	 */
> -	if (uffd_flags_mode_is(flags, MFILL_ATOMIC_ZEROPAGE)) {
> -		up_read(&ctx->map_changing_lock);
> -		uffd_mfill_unlock(dst_vma);
> -		return -EINVAL;
> -	}
> -
>  	src_addr = src_start;
>  	dst_addr = dst_start;
>  	copied = 0;
> @@ -686,14 +714,55 @@ static __always_inline ssize_t mfill_atomic_pte(pmd_t *dst_pmd,
>  		err = mfill_atomic_pte_zeropage(dst_pmd,
>  						dst_vma, dst_addr);
>  	} else {
> -		err = shmem_mfill_atomic_pte(dst_pmd, dst_vma,
> -					     dst_addr, src_addr,
> -					     flags, foliop);
> +		const vm_uffd_ops *uffd_ops = vma_get_uffd_ops(dst_vma);
> +
> +		if (WARN_ON_ONCE(!uffd_ops || !uffd_ops->uffd_copy)) {
> +			err = -EINVAL;
> +		} else {
> +			err = uffd_ops->uffd_copy(dst_pmd, dst_vma,
> +						  dst_addr, src_addr,
> +						  flags, foliop);
> +		}
>  	}
>  
>  	return err;
>  }
>  
> +static inline bool
> +vma_uffd_ops_supported(struct vm_area_struct *vma, uffd_flags_t flags)
> +{
> +	enum mfill_atomic_mode mode = uffd_flags_get_mode(flags);
> +	const vm_uffd_ops *uffd_ops;
> +	unsigned long uffd_ioctls;
> +
> +	if ((flags & MFILL_ATOMIC_WP) && !(vma->vm_flags & VM_UFFD_WP))
> +		return false;
> +
> +	/* Anonymous supports everything except CONTINUE */
> +	if (vma_is_anonymous(vma))
> +		return mode != MFILL_ATOMIC_CONTINUE;
> +
> +	uffd_ops = vma_get_uffd_ops(vma);
> +	if (!uffd_ops)
> +		return false;
> +
> +	uffd_ioctls = uffd_ops->uffd_ioctls;
> +	switch (mode) {
> +	case MFILL_ATOMIC_COPY:
> +		return uffd_ioctls & BIT(_UFFDIO_COPY);
> +	case MFILL_ATOMIC_ZEROPAGE:
> +		return uffd_ioctls & BIT(_UFFDIO_ZEROPAGE);
> +	case MFILL_ATOMIC_CONTINUE:
> +		if (!(vma->vm_flags & VM_SHARED))
> +			return false;
> +		return uffd_ioctls & BIT(_UFFDIO_CONTINUE);
> +	case MFILL_ATOMIC_POISON:
> +		return uffd_ioctls & BIT(_UFFDIO_POISON);
> +	default:
> +		return false;
> +	}
> +}
> +
>  static __always_inline ssize_t mfill_atomic(struct userfaultfd_ctx *ctx,
>  					    unsigned long dst_start,
>  					    unsigned long src_start,
> @@ -752,11 +821,7 @@ static __always_inline ssize_t mfill_atomic(struct userfaultfd_ctx *ctx,
>  			 dst_vma->vm_flags & VM_SHARED))
>  		goto out_unlock;
>  
> -	/*
> -	 * validate 'mode' now that we know the dst_vma: don't allow
> -	 * a wrprotect copy if the userfaultfd didn't register as WP.
> -	 */
> -	if ((flags & MFILL_ATOMIC_WP) && !(dst_vma->vm_flags & VM_UFFD_WP))
> +	if (!vma_uffd_ops_supported(dst_vma, flags))
>  		goto out_unlock;
>  
>  	/*
> @@ -766,12 +831,6 @@ static __always_inline ssize_t mfill_atomic(struct userfaultfd_ctx *ctx,
>  		return mfill_atomic_hugetlb(ctx, dst_vma, dst_start,
>  					    src_start, len, flags);
>  
> -	if (!vma_is_anonymous(dst_vma) && !vma_is_shmem(dst_vma))
> -		goto out_unlock;
> -	if (!vma_is_shmem(dst_vma) &&
> -	    uffd_flags_mode_is(flags, MFILL_ATOMIC_CONTINUE))
> -		goto out_unlock;
> -
>  	while (src_addr < src_start + len) {
>  		pmd_t dst_pmdval;
>  
> -- 
> 2.49.0
> 

-- 
Sincerely yours,
Mike.