* [PATCH v2 1/3] xfs: Remove xfs_filemap_map_pages() wrapper
2023-03-27 17:45 [PATCH v2 0/3] Prevent ->map_pages from sleeping Matthew Wilcox (Oracle)
@ 2023-03-27 17:45 ` Matthew Wilcox (Oracle)
2023-03-27 22:57 ` Dave Chinner
2023-03-27 17:45 ` [PATCH v2 2/3] afs: Split afs_pagecache_valid() out of afs_validate() Matthew Wilcox (Oracle)
2023-03-27 17:45 ` [PATCH v2 3/3] mm: Hold the RCU read lock over calls to ->map_pages Matthew Wilcox (Oracle)
2 siblings, 1 reply; 7+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-03-27 17:45 UTC (permalink / raw)
To: Andrew Morton
Cc: Matthew Wilcox (Oracle), linux-xfs, linux-afs, linux-fsdevel,
linux-mm
XFS doesn't actually need to be holding the XFS_MMAPLOCK_SHARED to do
this. filemap_map_pages() cannot bring new folios into the page cache
and the folio lock is taken during filemap_map_pages() which provides
sufficient protection against a truncation or hole punch.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
fs/xfs/xfs_file.c | 17 +----------------
1 file changed, 1 insertion(+), 16 deletions(-)
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 863289aaa441..aede746541f8 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1389,25 +1389,10 @@ xfs_filemap_pfn_mkwrite(
return __xfs_filemap_fault(vmf, PE_SIZE_PTE, true);
}
-static vm_fault_t
-xfs_filemap_map_pages(
- struct vm_fault *vmf,
- pgoff_t start_pgoff,
- pgoff_t end_pgoff)
-{
- struct inode *inode = file_inode(vmf->vma->vm_file);
- vm_fault_t ret;
-
- xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
- ret = filemap_map_pages(vmf, start_pgoff, end_pgoff);
- xfs_iunlock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
- return ret;
-}
-
static const struct vm_operations_struct xfs_file_vm_ops = {
.fault = xfs_filemap_fault,
.huge_fault = xfs_filemap_huge_fault,
- .map_pages = xfs_filemap_map_pages,
+ .map_pages = filemap_map_pages,
.page_mkwrite = xfs_filemap_page_mkwrite,
.pfn_mkwrite = xfs_filemap_pfn_mkwrite,
};
--
2.39.2
* Re: [PATCH v2 1/3] xfs: Remove xfs_filemap_map_pages() wrapper
2023-03-27 17:45 ` [PATCH v2 1/3] xfs: Remove xfs_filemap_map_pages() wrapper Matthew Wilcox (Oracle)
@ 2023-03-27 22:57 ` Dave Chinner
0 siblings, 0 replies; 7+ messages in thread
From: Dave Chinner @ 2023-03-27 22:57 UTC (permalink / raw)
To: Matthew Wilcox (Oracle)
Cc: Andrew Morton, linux-xfs, linux-afs, linux-fsdevel, linux-mm
On Mon, Mar 27, 2023 at 06:45:13PM +0100, Matthew Wilcox (Oracle) wrote:
> XFS doesn't actually need to be holding the XFS_MMAPLOCK_SHARED to do
> this. filemap_map_pages() cannot bring new folios into the page cache
> and the folio lock is taken during filemap_map_pages() which provides
> sufficient protection against a truncation or hole punch.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
> fs/xfs/xfs_file.c | 17 +----------------
> 1 file changed, 1 insertion(+), 16 deletions(-)
>
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index 863289aaa441..aede746541f8 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -1389,25 +1389,10 @@ xfs_filemap_pfn_mkwrite(
> return __xfs_filemap_fault(vmf, PE_SIZE_PTE, true);
> }
>
> -static vm_fault_t
> -xfs_filemap_map_pages(
> - struct vm_fault *vmf,
> - pgoff_t start_pgoff,
> - pgoff_t end_pgoff)
> -{
> - struct inode *inode = file_inode(vmf->vma->vm_file);
> - vm_fault_t ret;
> -
> - xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
> - ret = filemap_map_pages(vmf, start_pgoff, end_pgoff);
> - xfs_iunlock(XFS_I(inode), XFS_MMAPLOCK_SHARED);
> - return ret;
> -}
> -
> static const struct vm_operations_struct xfs_file_vm_ops = {
> .fault = xfs_filemap_fault,
> .huge_fault = xfs_filemap_huge_fault,
> - .map_pages = xfs_filemap_map_pages,
> + .map_pages = filemap_map_pages,
> .page_mkwrite = xfs_filemap_page_mkwrite,
> .pfn_mkwrite = xfs_filemap_pfn_mkwrite,
> };
> --
> 2.39.2
Looks fine.
Reviewed-by: Dave Chinner <dchinner@redhat.com>
--
Dave Chinner
david@fromorbit.com
* [PATCH v2 2/3] afs: Split afs_pagecache_valid() out of afs_validate()
2023-03-27 17:45 [PATCH v2 0/3] Prevent ->map_pages from sleeping Matthew Wilcox (Oracle)
2023-03-27 17:45 ` [PATCH v2 1/3] xfs: Remove xfs_filemap_map_pages() wrapper Matthew Wilcox (Oracle)
@ 2023-03-27 17:45 ` Matthew Wilcox (Oracle)
2023-03-27 17:45 ` [PATCH v2 3/3] mm: Hold the RCU read lock over calls to ->map_pages Matthew Wilcox (Oracle)
2 siblings, 0 replies; 7+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-03-27 17:45 UTC (permalink / raw)
To: Andrew Morton
Cc: Matthew Wilcox (Oracle), linux-xfs, linux-afs, linux-fsdevel,
linux-mm, David Howells
For the map_pages() method, we need a test that does not sleep. The page
fault handler will continue to call the fault() method, where we can
sleep and do the full revalidation.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: David Howells <dhowells@redhat.com>
Tested-by: David Howells <dhowells@redhat.com>
---
fs/afs/file.c | 14 ++------------
fs/afs/inode.c | 27 +++++++++++++++++++--------
fs/afs/internal.h | 1 +
3 files changed, 22 insertions(+), 20 deletions(-)
diff --git a/fs/afs/file.c b/fs/afs/file.c
index 68d6d5dc608d..719b31374879 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -569,20 +569,10 @@ static void afs_vm_close(struct vm_area_struct *vma)
static vm_fault_t afs_vm_map_pages(struct vm_fault *vmf, pgoff_t start_pgoff, pgoff_t end_pgoff)
{
struct afs_vnode *vnode = AFS_FS_I(file_inode(vmf->vma->vm_file));
- struct afs_file *af = vmf->vma->vm_file->private_data;
- switch (afs_validate(vnode, af->key)) {
- case 0:
+ if (afs_pagecache_valid(vnode))
return filemap_map_pages(vmf, start_pgoff, end_pgoff);
- case -ENOMEM:
- return VM_FAULT_OOM;
- case -EINTR:
- case -ERESTARTSYS:
- return VM_FAULT_RETRY;
- case -ESTALE:
- default:
- return VM_FAULT_SIGBUS;
- }
+ return 0;
}
static ssize_t afs_file_read_iter(struct kiocb *iocb, struct iov_iter *iter)
diff --git a/fs/afs/inode.c b/fs/afs/inode.c
index 0167e96e5198..b1bdffd5e888 100644
--- a/fs/afs/inode.c
+++ b/fs/afs/inode.c
@@ -667,6 +667,24 @@ bool afs_check_validity(struct afs_vnode *vnode)
return false;
}
+/*
+ * Returns true if the pagecache is still valid. Does not sleep.
+ */
+bool afs_pagecache_valid(struct afs_vnode *vnode)
+{
+ if (unlikely(test_bit(AFS_VNODE_DELETED, &vnode->flags))) {
+ if (vnode->netfs.inode.i_nlink)
+ clear_nlink(&vnode->netfs.inode);
+ return true;
+ }
+
+ if (test_bit(AFS_VNODE_CB_PROMISED, &vnode->flags) &&
+ afs_check_validity(vnode))
+ return true;
+
+ return false;
+}
+
/*
* validate a vnode/inode
* - there are several things we need to check
@@ -684,14 +702,7 @@ int afs_validate(struct afs_vnode *vnode, struct key *key)
vnode->fid.vid, vnode->fid.vnode, vnode->flags,
key_serial(key));
- if (unlikely(test_bit(AFS_VNODE_DELETED, &vnode->flags))) {
- if (vnode->netfs.inode.i_nlink)
- clear_nlink(&vnode->netfs.inode);
- goto valid;
- }
-
- if (test_bit(AFS_VNODE_CB_PROMISED, &vnode->flags) &&
- afs_check_validity(vnode))
+ if (afs_pagecache_valid(vnode))
goto valid;
down_write(&vnode->validate_lock);
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index ad8523d0d038..5c95df6621f9 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -1171,6 +1171,7 @@ extern struct inode *afs_iget(struct afs_operation *, struct afs_vnode_param *);
extern struct inode *afs_root_iget(struct super_block *, struct key *);
extern bool afs_check_validity(struct afs_vnode *);
extern int afs_validate(struct afs_vnode *, struct key *);
+bool afs_pagecache_valid(struct afs_vnode *);
extern int afs_getattr(struct mnt_idmap *idmap, const struct path *,
struct kstat *, u32, unsigned int);
extern int afs_setattr(struct mnt_idmap *idmap, struct dentry *, struct iattr *);
--
2.39.2
* [PATCH v2 3/3] mm: Hold the RCU read lock over calls to ->map_pages
2023-03-27 17:45 [PATCH v2 0/3] Prevent ->map_pages from sleeping Matthew Wilcox (Oracle)
2023-03-27 17:45 ` [PATCH v2 1/3] xfs: Remove xfs_filemap_map_pages() wrapper Matthew Wilcox (Oracle)
2023-03-27 17:45 ` [PATCH v2 2/3] afs: Split afs_pagecache_valid() out of afs_validate() Matthew Wilcox (Oracle)
@ 2023-03-27 17:45 ` Matthew Wilcox (Oracle)
2023-03-27 23:02 ` Dave Chinner
2 siblings, 1 reply; 7+ messages in thread
From: Matthew Wilcox (Oracle) @ 2023-03-27 17:45 UTC (permalink / raw)
To: Andrew Morton
Cc: Matthew Wilcox (Oracle), linux-xfs, linux-afs, linux-fsdevel,
linux-mm
Prevent filesystems from doing things which sleep in their map_pages
method. This is in preparation for a pagefault path protected only
by RCU.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
Documentation/filesystems/locking.rst | 4 ++--
mm/memory.c | 11 ++++++++---
2 files changed, 10 insertions(+), 5 deletions(-)
diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
index 922886fefb7f..8a80390446ba 100644
--- a/Documentation/filesystems/locking.rst
+++ b/Documentation/filesystems/locking.rst
@@ -645,7 +645,7 @@ ops mmap_lock PageLocked(page)
open: yes
close: yes
fault: yes can return with page locked
-map_pages: yes
+map_pages: read
page_mkwrite: yes can return with page locked
pfn_mkwrite: yes
access: yes
@@ -661,7 +661,7 @@ locked. The VM will unlock the page.
->map_pages() is called when VM asks to map easy accessible pages.
Filesystem should find and map pages associated with offsets from "start_pgoff"
-till "end_pgoff". ->map_pages() is called with page table locked and must
+till "end_pgoff". ->map_pages() is called with the RCU lock held and must
not block. If it's not possible to reach a page without blocking,
filesystem should skip it. Filesystem should use set_pte_range() to setup
page table entry. Pointer to entry associated with the page is passed in
diff --git a/mm/memory.c b/mm/memory.c
index 8071bb17abf2..a7edf6d714db 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4461,6 +4461,7 @@ static vm_fault_t do_fault_around(struct vm_fault *vmf)
/* The page offset of vmf->address within the VMA. */
pgoff_t vma_off = vmf->pgoff - vmf->vma->vm_pgoff;
pgoff_t from_pte, to_pte;
+ vm_fault_t ret;
/* The PTE offset of the start address, clamped to the VMA. */
from_pte = max(ALIGN_DOWN(pte_off, nr_pages),
@@ -4476,9 +4477,13 @@ static vm_fault_t do_fault_around(struct vm_fault *vmf)
return VM_FAULT_OOM;
}
- return vmf->vma->vm_ops->map_pages(vmf,
- vmf->pgoff + from_pte - pte_off,
- vmf->pgoff + to_pte - pte_off);
+ rcu_read_lock();
+ ret = vmf->vma->vm_ops->map_pages(vmf,
+ vmf->pgoff + from_pte - pte_off,
+ vmf->pgoff + to_pte - pte_off);
+ rcu_read_unlock();
+
+ return ret;
}
/* Return true if we should do read fault-around, false otherwise */
--
2.39.2
* Re: [PATCH v2 3/3] mm: Hold the RCU read lock over calls to ->map_pages
2023-03-27 17:45 ` [PATCH v2 3/3] mm: Hold the RCU read lock over calls to ->map_pages Matthew Wilcox (Oracle)
@ 2023-03-27 23:02 ` Dave Chinner
2023-03-28 15:45 ` Matthew Wilcox
0 siblings, 1 reply; 7+ messages in thread
From: Dave Chinner @ 2023-03-27 23:02 UTC (permalink / raw)
To: Matthew Wilcox (Oracle)
Cc: Andrew Morton, linux-xfs, linux-afs, linux-fsdevel, linux-mm
On Mon, Mar 27, 2023 at 06:45:15PM +0100, Matthew Wilcox (Oracle) wrote:
> Prevent filesystems from doing things which sleep in their map_pages
> method. This is in preparation for a pagefault path protected only
> by RCU.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
> Documentation/filesystems/locking.rst | 4 ++--
> mm/memory.c | 11 ++++++++---
> 2 files changed, 10 insertions(+), 5 deletions(-)
>
> diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
> index 922886fefb7f..8a80390446ba 100644
> --- a/Documentation/filesystems/locking.rst
> +++ b/Documentation/filesystems/locking.rst
> @@ -645,7 +645,7 @@ ops mmap_lock PageLocked(page)
> open: yes
> close: yes
> fault: yes can return with page locked
> -map_pages: yes
> +map_pages: read
> page_mkwrite: yes can return with page locked
> pfn_mkwrite: yes
> access: yes
> @@ -661,7 +661,7 @@ locked. The VM will unlock the page.
>
> ->map_pages() is called when VM asks to map easy accessible pages.
> Filesystem should find and map pages associated with offsets from "start_pgoff"
> -till "end_pgoff". ->map_pages() is called with page table locked and must
> +till "end_pgoff". ->map_pages() is called with the RCU lock held and must
> not block. If it's not possible to reach a page without blocking,
> filesystem should skip it. Filesystem should use set_pte_range() to setup
> page table entry. Pointer to entry associated with the page is passed in
> diff --git a/mm/memory.c b/mm/memory.c
> index 8071bb17abf2..a7edf6d714db 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4461,6 +4461,7 @@ static vm_fault_t do_fault_around(struct vm_fault *vmf)
> /* The page offset of vmf->address within the VMA. */
> pgoff_t vma_off = vmf->pgoff - vmf->vma->vm_pgoff;
> pgoff_t from_pte, to_pte;
> + vm_fault_t ret;
>
> /* The PTE offset of the start address, clamped to the VMA. */
> from_pte = max(ALIGN_DOWN(pte_off, nr_pages),
> @@ -4476,9 +4477,13 @@ static vm_fault_t do_fault_around(struct vm_fault *vmf)
> return VM_FAULT_OOM;
> }
>
> - return vmf->vma->vm_ops->map_pages(vmf,
> - vmf->pgoff + from_pte - pte_off,
> - vmf->pgoff + to_pte - pte_off);
> + rcu_read_lock();
> + ret = vmf->vma->vm_ops->map_pages(vmf,
> + vmf->pgoff + from_pte - pte_off,
> + vmf->pgoff + to_pte - pte_off);
> + rcu_read_unlock();
> +
> + return ret;
Doesn't this mean that the rcu_read_lock/unlock can be removed from
filemap_map_pages()? i.e. all callers are now already under
rcu_read_lock(). Maybe WARN_ON_ONCE(!rcu_read_lock_held()) could
be put in filemap_map_pages() if you are worried about callers not
holding it...
Otherwise it looks fine.
-Dave.
--
Dave Chinner
david@fromorbit.com
* Re: [PATCH v2 3/3] mm: Hold the RCU read lock over calls to ->map_pages
2023-03-27 23:02 ` Dave Chinner
@ 2023-03-28 15:45 ` Matthew Wilcox
0 siblings, 0 replies; 7+ messages in thread
From: Matthew Wilcox @ 2023-03-28 15:45 UTC (permalink / raw)
To: Dave Chinner; +Cc: Andrew Morton, linux-xfs, linux-afs, linux-fsdevel, linux-mm
On Tue, Mar 28, 2023 at 10:02:06AM +1100, Dave Chinner wrote:
> On Mon, Mar 27, 2023 at 06:45:15PM +0100, Matthew Wilcox (Oracle) wrote:
> > Prevent filesystems from doing things which sleep in their map_pages
> > method. This is in preparation for a pagefault path protected only
> > by RCU.
> > + rcu_read_lock();
> > + ret = vmf->vma->vm_ops->map_pages(vmf,
> > + vmf->pgoff + from_pte - pte_off,
> > + vmf->pgoff + to_pte - pte_off);
> > + rcu_read_unlock();
> > +
> > + return ret;
>
> Doesn't this mean that the rcu_read_lock/unlock can be removed from
> filemap_map_pages()? i.e. all callers are now already under
> rcu_read_lock(). Maybe WARN_ON_ONCE(!rcu_read_lock_held()) could
> be put in filemap_map_pages() if you are worried about callers not
> holding it...
Yes, it could now be removed. I wasn't too bothered because it's so
cheap (either a noop, or an inc/dec of a CPU-local variable). I don't
think we need the WARN because there's one embedded in the XArray
code (must be holding the spinlock or the RCU read lock to iterate
the XArray).