* [PATCH v2 1/3] xfs: Remove xfs_filemap_map_pages() wrapper
2023-03-27 17:45 [PATCH v2 0/3] Prevent ->map_pages from sleeping Matthew Wilcox (Oracle)
@ 2023-03-27 17:45 ` Matthew Wilcox (Oracle)
2023-03-27 22:57 ` Dave Chinner
2023-03-27 17:45 ` [PATCH v2 2/3] afs: Split afs_pagecache_valid() out of afs_validate() Matthew Wilcox (Oracle)
2023-03-27 17:45 ` [PATCH v2 3/3] mm: Hold the RCU read lock over calls to ->map_pages Matthew Wilcox (Oracle)
2 siblings, 1 reply; 7+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-03-27 17:45 UTC (permalink / raw)
To: Andrew Morton
Cc: Matthew Wilcox (Oracle), linux-xfs, linux-afs, linux-fsdevel,
linux-mm
XFS doesn't actually need to be holding the XFS_MMAPLOCK_SHARED to do
this. filemap_map_pages() cannot bring new folios into the page cache
and the folio lock is taken during filemap_map_pages() which provides
sufficient protection against a truncation or hole punch.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
fs/xfs/xfs_file.c | 17 +----------------
1 file changed, 1 insertion(+), 16 deletions(-)
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 863289aaa441..aede746541f8 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1389,25 +1389,10 @@ xfs_filemap_pfn_mkwrite(
return __xfs_filemap_fault(vmf, PE_SIZE_PTE, true);
}
-static vm_fault_t
-xfs_filemap_map_pages(
- struct vm_fault *vmf,
- pgoff_t start_pgoff,
- pgoff_t end_pgoff)
-{
- struct inode *inode = file_inode(vmf->vma->vm_file);
- vm_fault_t ret;
-
- xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
- ret = filemap_map_pages(vmf, start_pgoff, end_pgoff);
- xfs_iunlock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
- return ret;
-}
-
static const struct vm_operations_struct xfs_file_vm_ops = {
.fault = xfs_filemap_fault,
.huge_fault = xfs_filemap_huge_fault,
- .map_pages = xfs_filemap_map_pages,
+ .map_pages = filemap_map_pages,
.page_mkwrite = xfs_filemap_page_mkwrite,
.pfn_mkwrite = xfs_filemap_pfn_mkwrite,
};
--
2.39.2
* Re: [PATCH v2 1/3] xfs: Remove xfs_filemap_map_pages() wrapper
2023-03-27 17:45 ` [PATCH v2 1/3] xfs: Remove xfs_filemap_map_pages() wrapper Matthew Wilcox (Oracle)
@ 2023-03-27 22:57 ` Dave Chinner
0 siblings, 0 replies; 7+ messages in thread
From: Dave Chinner @ 2023-03-27 22:57 UTC (permalink / raw)
To: Matthew Wilcox (Oracle)
Cc: Andrew Morton, linux-xfs, linux-afs, linux-fsdevel, linux-mm
On Mon, Mar 27, 2023 at 06:45:13PM +0100, Matthew Wilcox (Oracle) wrote:
> XFS doesn't actually need to be holding the XFS_MMAPLOCK_SHARED to do
> this. filemap_map_pages() cannot bring new folios into the page cache
> and the folio lock is taken during filemap_map_pages() which provides
> sufficient protection against a truncation or hole punch.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
> fs/xfs/xfs_file.c | 17 +----------------
> 1 file changed, 1 insertion(+), 16 deletions(-)
>
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index 863289aaa441..aede746541f8 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -1389,25 +1389,10 @@ xfs_filemap_pfn_mkwrite(
> return __xfs_filemap_fault(vmf, PE_SIZE_PTE, true);
> }
>
> -static vm_fault_t
> -xfs_filemap_map_pages(
> - struct vm_fault *vmf,
> - pgoff_t start_pgoff,
> - pgoff_t end_pgoff)
> -{
> - struct inode *inode = file_inode(vmf->vma->vm_file);
> - vm_fault_t ret;
> -
> - xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
> - ret = filemap_map_pages(vmf, start_pgoff, end_pgoff);
> - xfs_iunlock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
> - return ret;
> -}
> -
> static const struct vm_operations_struct xfs_file_vm_ops = {
> .fault = xfs_filemap_fault,
> .huge_fault = xfs_filemap_huge_fault,
> - .map_pages = xfs_filemap_map_pages,
> + .map_pages = filemap_map_pages,
> .page_mkwrite = xfs_filemap_page_mkwrite,
> .pfn_mkwrite = xfs_filemap_pfn_mkwrite,
> };
> --
> 2.39.2
Looks fine.
Reviewed-by: Dave Chinner <dchinner@redhat.com>
--
Dave Chinner
david@fromorbit.com
* [PATCH v2 2/3] afs: Split afs_pagecache_valid() out of afs_validate()
2023-03-27 17:45 [PATCH v2 0/3] Prevent ->map_pages from sleeping Matthew Wilcox (Oracle)
2023-03-27 17:45 ` [PATCH v2 1/3] xfs: Remove xfs_filemap_map_pages() wrapper Matthew Wilcox (Oracle)
@ 2023-03-27 17:45 ` Matthew Wilcox (Oracle)
2023-03-27 17:45 ` [PATCH v2 3/3] mm: Hold the RCU read lock over calls to ->map_pages Matthew Wilcox (Oracle)
2 siblings, 0 replies; 7+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-03-27 17:45 UTC (permalink / raw)
To: Andrew Morton
Cc: Matthew Wilcox (Oracle), linux-xfs, linux-afs, linux-fsdevel,
linux-mm, David Howells
For the map_pages() method, we need a test that does not sleep. The page
fault handler will continue to call the fault() method, where we can
sleep and do the full revalidation.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: David Howells <dhowells@redhat.com>
Tested-by: David Howells <dhowells@redhat.com>
---
fs/afs/file.c | 14 ++------------
fs/afs/inode.c | 27 +++++++++++++++++++--------
fs/afs/internal.h | 1 +
3 files changed, 22 insertions(+), 20 deletions(-)
diff --git a/fs/afs/file.c b/fs/afs/file.c
index 68d6d5dc608d..719b31374879 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -569,20 +569,10 @@ static void afs_vm_close(struct vm_area_struct *vma)
static vm_fault_t afs_vm_map_pages(struct vm_fault *vmf, pgoff_t start_pgoff, pgoff_t end_pgoff)
{
struct afs_vnode *vnode = AFS_FS_I(file_inode(vmf->vma->vm_file));
- struct afs_file *af = vmf->vma->vm_file->private_data;
- switch (afs_validate(vnode, af->key)) {
- case 0:
+ if (afs_pagecache_valid(vnode))
return filemap_map_pages(vmf, start_pgoff, end_pgoff);
- case -ENOMEM:
- return VM_FAULT_OOM;
- case -EINTR:
- case -ERESTARTSYS:
- return VM_FAULT_RETRY;
- case -ESTALE:
- default:
- return VM_FAULT_SIGBUS;
- }
+ return 0;
}
static ssize_t afs_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
diff --git a/fs/afs/inode.c b/fs/afs/inode.c
index 0167e96e5198..b1bdffd5e888 100644
--- a/fs/afs/inode.c
+++ b/fs/afs/inode.c
@@ -667,6 +667,24 @@ bool afs_check_validity(struct afs_vnode *vnode)
return false;
}
+/*
+ * Returns true if the pagecache is still valid. Does not sleep.
+ */
+bool afs_pagecache_valid(struct afs_vnode *vnode)
+{
+ if (unlikely(test_bit(AFS_VNODE_DELETED, &vnode->flags))) {
+ if (vnode->netfs.inode.i_nlink)
+ clear_nlink(&vnode->netfs.inode);
+ return true;
+ }
+
+ if (test_bit(AFS_VNODE_CB_PROMISED, &vnode->flags) &&
+ afs_check_validity(vnode))
+ return true;
+
+ return false;
+}
+
/*
* validate a vnode/inode
* - there are several things we need to check
@@ -684,14 +702,7 @@ int afs_validate(struct afs_vnode *vnode, struct key *key)
vnode->fid.vid, vnode->fid.vnode, vnode->flags,
key_serial(key));
- if (unlikely(test_bit(AFS_VNODE_DELETED, &vnode->flags))) {
- if (vnode->netfs.inode.i_nlink)
- clear_nlink(&vnode->netfs.inode);
- goto valid;
- }
-
- if (test_bit(AFS_VNODE_CB_PROMISED, &vnode->flags) &&
- afs_check_validity(vnode))
+ if (afs_pagecache_valid(vnode))
goto valid;
down_write(&vnode->validate_lock);
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index ad8523d0d038..5c95df6621f9 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -1171,6 +1171,7 @@ extern struct inode *afs_iget(struct afs_operation *, struct afs_vnode_param *);
extern struct inode *afs_root_iget(struct super_block *, struct key *);
extern bool afs_check_validity(struct afs_vnode *);
extern int afs_validate(struct afs_vnode *, struct key *);
+bool afs_pagecache_valid(struct afs_vnode *);
extern int afs_getattr(struct mnt_idmap *idmap, const struct path *,
struct kstat *, u32, unsigned int);
extern int afs_setattr(struct mnt_idmap *idmap, struct dentry *, struct iattr *);
--
2.39.2
* [PATCH v2 3/3] mm: Hold the RCU read lock over calls to ->map_pages
2023-03-27 17:45 [PATCH v2 0/3] Prevent ->map_pages from sleeping Matthew Wilcox (Oracle)
2023-03-27 17:45 ` [PATCH v2 1/3] xfs: Remove xfs_filemap_map_pages() wrapper Matthew Wilcox (Oracle)
2023-03-27 17:45 ` [PATCH v2 2/3] afs: Split afs_pagecache_valid() out of afs_validate() Matthew Wilcox (Oracle)
@ 2023-03-27 17:45 ` Matthew Wilcox (Oracle)
2023-03-27 23:02 ` Dave Chinner
2 siblings, 1 reply; 7+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-03-27 17:45 UTC (permalink / raw)
To: Andrew Morton
Cc: Matthew Wilcox (Oracle), linux-xfs, linux-afs, linux-fsdevel,
linux-mm
Prevent filesystems from doing things which sleep in their map_pages
method. This is in preparation for a pagefault path protected only
by RCU.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
Documentation/filesystems/locking.rst | 4 ++--
mm/memory.c | 11 ++++++++---
2 files changed, 10 insertions(+), 5 deletions(-)
diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
index 922886fefb7f..8a80390446ba 100644
--- a/Documentation/filesystems/locking.rst
+++ b/Documentation/filesystems/locking.rst
@@ -645,7 +645,7 @@ ops mmap_lock PageLocked(page)
open: yes
close: yes
fault: yes can return with page locked
-map_pages: yes
+map_pages: read
page_mkwrite: yes can return with page locked
pfn_mkwrite: yes
access: yes
@@ -661,7 +661,7 @@ locked. The VM will unlock the page.
->map_pages() is called when VM asks to map easy accessible pages.
Filesystem should find and map pages associated with offsets from "start_pgoff"
-till "end_pgoff". ->map_pages() is called with page table locked and must
+till "end_pgoff". ->map_pages() is called with the RCU lock held and must
not block. If it's not possible to reach a page without blocking,
filesystem should skip it. Filesystem should use set_pte_range() to setup
page table entry. Pointer to entry associated with the page is passed in
diff --git a/mm/memory.c b/mm/memory.c
index 8071bb17abf2..a7edf6d714db 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4461,6 +4461,7 @@ static vm_fault_t do_fault_around(struct vm_fault *vmf)
/* The page offset of vmf->address within the VMA. */
pgoff_t vma_off = vmf->pgoff - vmf->vma->vm_pgoff;
pgoff_t from_pte, to_pte;
+ vm_fault_t ret;
/* The PTE offset of the start address, clamped to the VMA. */
from_pte = max(ALIGN_DOWN(pte_off, nr_pages),
@@ -4476,9 +4477,13 @@ static vm_fault_t do_fault_around(struct vm_fault *vmf)
return VM_FAULT_OOM;
}
- return vmf->vma->vm_ops->map_pages(vmf,
- vmf->pgoff + from_pte - pte_off,
- vmf->pgoff + to_pte - pte_off);
+ rcu_read_lock();
+ ret = vmf->vma->vm_ops->map_pages(vmf,
+ vmf->pgoff + from_pte - pte_off,
+ vmf->pgoff + to_pte - pte_off);
+ rcu_read_unlock();
+
+ return ret;
}
/* Return true if we should do read fault-around, false otherwise */
--
2.39.2
* Re: [PATCH v2 3/3] mm: Hold the RCU read lock over calls to ->map_pages
2023-03-27 17:45 ` [PATCH v2 3/3] mm: Hold the RCU read lock over calls to ->map_pages Matthew Wilcox (Oracle)
@ 2023-03-27 23:02 ` Dave Chinner
2023-03-28 15:45 ` Matthew Wilcox
0 siblings, 1 reply; 7+ messages in thread
From: Dave Chinner @ 2023-03-27 23:02 UTC (permalink / raw)
To: Matthew Wilcox (Oracle)
Cc: Andrew Morton, linux-xfs, linux-afs, linux-fsdevel, linux-mm
On Mon, Mar 27, 2023 at 06:45:15PM +0100, Matthew Wilcox (Oracle) wrote:
> Prevent filesystems from doing things which sleep in their map_pages
> method. This is in preparation for a pagefault path protected only
> by RCU.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
> Documentation/filesystems/locking.rst | 4 ++--
> mm/memory.c | 11 ++++++++---
> 2 files changed, 10 insertions(+), 5 deletions(-)
>
> diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
> index 922886fefb7f..8a80390446ba 100644
> --- a/Documentation/filesystems/locking.rst
> +++ b/Documentation/filesystems/locking.rst
> @@ -645,7 +645,7 @@ ops mmap_lock PageLocked(page)
> open: yes
> close: yes
> fault: yes can return with page locked
> -map_pages: yes
> +map_pages: read
> page_mkwrite: yes can return with page locked
> pfn_mkwrite: yes
> access: yes
> @@ -661,7 +661,7 @@ locked. The VM will unlock the page.
>
> ->map_pages() is called when VM asks to map easy accessible pages.
> Filesystem should find and map pages associated with offsets from "start_pgoff"
> -till "end_pgoff". ->map_pages() is called with page table locked and must
> +till "end_pgoff". ->map_pages() is called with the RCU lock held and must
> not block. If it's not possible to reach a page without blocking,
> filesystem should skip it. Filesystem should use set_pte_range() to setup
> page table entry. Pointer to entry associated with the page is passed in
> diff --git a/mm/memory.c b/mm/memory.c
> index 8071bb17abf2..a7edf6d714db 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4461,6 +4461,7 @@ static vm_fault_t do_fault_around(struct vm_fault *vmf)
> /* The page offset of vmf->address within the VMA. */
> pgoff_t vma_off = vmf->pgoff - vmf->vma->vm_pgoff;
> pgoff_t from_pte, to_pte;
> + vm_fault_t ret;
>
> /* The PTE offset of the start address, clamped to the VMA. */
> from_pte = max(ALIGN_DOWN(pte_off, nr_pages),
> @@ -4476,9 +4477,13 @@ static vm_fault_t do_fault_around(struct vm_fault *vmf)
> return VM_FAULT_OOM;
> }
>
> - return vmf->vma->vm_ops->map_pages(vmf,
> - vmf->pgoff + from_pte - pte_off,
> - vmf->pgoff + to_pte - pte_off);
> + rcu_read_lock();
> + ret = vmf->vma->vm_ops->map_pages(vmf,
> + vmf->pgoff + from_pte - pte_off,
> + vmf->pgoff + to_pte - pte_off);
> + rcu_read_unlock();
> +
> + return ret;
Doesn't this mean that the rcu_read_lock/unlock can be removed from
filemap_map_pages()? i.e. all callers are now already under
rcu_read_lock(). Maybe WARN_ON_ONCE(!rcu_read_lock_held()) could
be put in filemap_map_pages() if you are worried about callers not
holding it...
Otherwise it looks fine.
-Dave.
--
Dave Chinner
david@fromorbit.com
* Re: [PATCH v2 3/3] mm: Hold the RCU read lock over calls to ->map_pages
2023-03-27 23:02 ` Dave Chinner
@ 2023-03-28 15:45 ` Matthew Wilcox
0 siblings, 0 replies; 7+ messages in thread
From: Matthew Wilcox @ 2023-03-28 15:45 UTC (permalink / raw)
To: Dave Chinner; +Cc: Andrew Morton, linux-xfs, linux-afs, linux-fsdevel, linux-mm
On Tue, Mar 28, 2023 at 10:02:06AM +1100, Dave Chinner wrote:
> On Mon, Mar 27, 2023 at 06:45:15PM +0100, Matthew Wilcox (Oracle) wrote:
> > Prevent filesystems from doing things which sleep in their map_pages
> > method. This is in preparation for a pagefault path protected only
> > by RCU.
> > + rcu_read_lock();
> > + ret = vmf->vma->vm_ops->map_pages(vmf,
> > + vmf->pgoff + from_pte - pte_off,
> > + vmf->pgoff + to_pte - pte_off);
> > + rcu_read_unlock();
> > +
> > + return ret;
>
> Doesn't this mean that the rcu_read_lock/unlock can be removed from
> filemap_map_pages()? i.e. all callers are now already under
> rcu_read_lock(). Maybe WARN_ON_ONCE(!rcu_read_lock_held()) could
> be put in filemap_map_pages() if you are worried about callers not
> holding it...
Yes, it could now be removed. I wasn't too bothered because it's so
cheap (either a noop, or an inc/dec of a CPU-local variable). I don't
think we need the WARN because there's one embedded in the XArray
code (must be holding the spinlock or the RCU read lock to iterate
the XArray).