* [PATCH 0/5] Assortment of I/O fixes for the NFS client
@ 2025-09-05 19:35 Trond Myklebust
2025-09-05 19:35 ` [PATCH 1/5] NFS: Protect against 'eof page pollution' Trond Myklebust
` (5 more replies)
0 siblings, 6 replies; 18+ messages in thread
From: Trond Myklebust @ 2025-09-05 19:35 UTC (permalink / raw)
To: linux-nfs
From: Trond Myklebust <trond.myklebust@hammerspace.com>
The following patch series addresses a series of issues that affect the
NFS client's I/O path. The first patch is needed to fix a corner case
when using mmap() to map the tail end of the file into memory, and then
later extending the file length (see recent xfstests generic/363).
The second set is used to ensure correct ordering of O_DIRECT operations
with truncate + fallocate.
Finally, there are fixes to ensure folio invalidation is correct w.r.t.
the VFS documentation, and a cleanup of the code that decides when a
folio can be marked as up to date.
Trond Myklebust (5):
NFS: Protect against 'eof page pollution'
NFS: Serialise O_DIRECT i/o and truncate()
NFSv4.2: Serialise O_DIRECT i/o and fallocate()
NFS: nfs_invalidate_folio() must observe the offset and size arguments
NFS: Fix the marking of the folio as up to date
fs/nfs/file.c | 40 +++++++++++++++++++++++++++++++---
fs/nfs/inode.c | 13 +++++++++---
fs/nfs/internal.h | 12 +++++++++++
fs/nfs/io.c | 13 ++----------
fs/nfs/nfs42proc.c | 15 ++++++++++---
fs/nfs/nfstrace.h | 1 +
fs/nfs/write.c | 53 ++++++----------------------------------------
7 files changed, 80 insertions(+), 67 deletions(-)
--
2.51.0
^ permalink raw reply [flat|nested] 18+ messages in thread* [PATCH 1/5] NFS: Protect against 'eof page pollution' 2025-09-05 19:35 [PATCH 0/5] Assortment of I/O fixes for the NFS client Trond Myklebust @ 2025-09-05 19:35 ` Trond Myklebust 2026-01-24 21:01 ` Lorenzo Stoakes 2025-09-05 19:35 ` [PATCH 2/5] NFS: Serialise O_DIRECT i/o and truncate() Trond Myklebust ` (4 subsequent siblings) 5 siblings, 1 reply; 18+ messages in thread From: Trond Myklebust @ 2025-09-05 19:35 UTC (permalink / raw) To: linux-nfs From: Trond Myklebust <trond.myklebust@hammerspace.com> This commit fixes the failing xfstest 'generic/363'. When the user mmaps() an area that extends beyond the end of file, and proceeds to write data into the folio that straddles that eof, we're required to discard that folio data if the user calls some function that extends the file length. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> --- fs/nfs/file.c | 33 +++++++++++++++++++++++++++++++++ fs/nfs/inode.c | 9 +++++++-- fs/nfs/internal.h | 2 ++ fs/nfs/nfs42proc.c | 14 +++++++++++--- fs/nfs/nfstrace.h | 1 + 5 files changed, 54 insertions(+), 5 deletions(-) diff --git a/fs/nfs/file.c b/fs/nfs/file.c index 86e36c630f09..af2fdbfcbbf6 100644 --- a/fs/nfs/file.c +++ b/fs/nfs/file.c @@ -28,6 +28,7 @@ #include <linux/mm.h> #include <linux/pagemap.h> #include <linux/gfp.h> +#include <linux/rmap.h> #include <linux/swap.h> #include <linux/compaction.h> @@ -280,6 +281,37 @@ nfs_file_fsync(struct file *file, loff_t start, loff_t end, int datasync) } EXPORT_SYMBOL_GPL(nfs_file_fsync); +void nfs_truncate_last_folio(struct address_space *mapping, loff_t from, + loff_t to) +{ + struct folio *folio; + + if (from >= to) + return; + + folio = filemap_lock_folio(mapping, from >> PAGE_SHIFT); + if (IS_ERR(folio)) + return; + + if (folio_mkclean(folio)) + folio_mark_dirty(folio); + + if (folio_test_uptodate(folio)) { + loff_t fpos = folio_pos(folio); + size_t offset = from - fpos; + size_t end = folio_size(folio); + + if (to - fpos < end) + end = to - fpos; + folio_zero_segment(folio, offset, end); + trace_nfs_size_truncate_folio(mapping->host, to); + } + + folio_unlock(folio); + folio_put(folio); +} +EXPORT_SYMBOL_GPL(nfs_truncate_last_folio); + /* * Decide whether a read/modify/write cycle may be more efficient * then a modify/write/read cycle when writing to a page in the @@ -356,6 +388,7 @@ static int nfs_write_begin(const struct kiocb *iocb, dfprintk(PAGECACHE, "NFS: write_begin(%pD2(%lu), %u@%lld)\n", file, mapping->host->i_ino, len, (long long) pos); + nfs_truncate_last_folio(mapping, i_size_read(mapping->host), pos + 1); fgp |= fgf_set_order(len); start: diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c index 338ef77ae423..0b141feacc52 100644 --- a/fs/nfs/inode.c +++ b/fs/nfs/inode.c @@ -716,6 +716,7 @@ nfs_setattr(struct mnt_idmap *idmap, struct dentry *dentry, { struct inode *inode = d_inode(dentry); struct nfs_fattr *fattr; + loff_t oldsize = i_size_read(inode); int error = 0; nfs_inc_stats(inode, NFSIOS_VFSSETATTR); @@ -731,7 +732,7 @@ nfs_setattr(struct mnt_idmap *idmap, struct dentry *dentry, if (error) return error; - if (attr->ia_size == i_size_read(inode)) + if (attr->ia_size == oldsize) attr->ia_valid &= ~ATTR_SIZE; } @@ -777,8 +778,12 @@ nfs_setattr(struct mnt_idmap *idmap, struct dentry *dentry, } error = NFS_PROTO(inode)->setattr(dentry, fattr, attr); - if (error == 0) + if (error == 0) { + if (attr->ia_valid & ATTR_SIZE) + nfs_truncate_last_folio(inode->i_mapping, oldsize, + attr->ia_size); error = nfs_refresh_inode(inode, fattr); + } nfs_free_fattr(fattr); out: trace_nfs_setattr_exit(inode, error); diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h index 74d712b58423..1433ae13dba0 100644 --- a/fs/nfs/internal.h +++ b/fs/nfs/internal.h @@ -437,6 +437,8 @@ int nfs_file_release(struct inode *, struct file *); int nfs_lock(struct file *, int, struct file_lock *); int nfs_flock(struct file *, int, struct file_lock *); int nfs_check_flags(int); +void nfs_truncate_last_folio(struct address_space *mapping, loff_t from, + loff_t to); /* inode.c */ extern struct workqueue_struct *nfsiod_workqueue; diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c index 01c01f45358b..4420b8740e2f 100644 --- a/fs/nfs/nfs42proc.c +++ b/fs/nfs/nfs42proc.c @@ -137,6 +137,7 @@ int nfs42_proc_allocate(struct file *filep, loff_t offset, loff_t len) .rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_ALLOCATE], }; struct inode *inode = file_inode(filep); + loff_t oldsize = i_size_read(inode); int err; if (!nfs_server_capable(inode, NFS_CAP_ALLOCATE)) @@ -145,7 +146,11 @@ int nfs42_proc_allocate(struct file *filep, loff_t offset, loff_t len) inode_lock(inode); err = nfs42_proc_fallocate(&msg, filep, offset, len); - if (err == -EOPNOTSUPP) + + if (err == 0) + nfs_truncate_last_folio(inode->i_mapping, oldsize, + offset + len); + else if (err == -EOPNOTSUPP) NFS_SERVER(inode)->caps &= ~(NFS_CAP_ALLOCATE | NFS_CAP_ZERO_RANGE); @@ -183,6 +188,7 @@ int nfs42_proc_zero_range(struct file *filep, loff_t offset, loff_t len) .rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_ZERO_RANGE], }; struct inode *inode = file_inode(filep); + loff_t oldsize = i_size_read(inode); int err; if (!nfs_server_capable(inode, NFS_CAP_ZERO_RANGE)) @@ -191,9 +197,11 @@ int nfs42_proc_zero_range(struct file *filep, loff_t offset, loff_t len) inode_lock(inode); err = nfs42_proc_fallocate(&msg, filep, offset, len); - if (err == 0) + if (err == 0) { + nfs_truncate_last_folio(inode->i_mapping, oldsize, + offset + len); truncate_pagecache_range(inode, offset, (offset + len) -1); - if (err == -EOPNOTSUPP) + } else if (err == -EOPNOTSUPP) NFS_SERVER(inode)->caps &= ~NFS_CAP_ZERO_RANGE; inode_unlock(inode); diff --git a/fs/nfs/nfstrace.h b/fs/nfs/nfstrace.h index 96b1323318c2..627115179795 100644 --- a/fs/nfs/nfstrace.h +++ b/fs/nfs/nfstrace.h @@ -272,6 +272,7 @@ DECLARE_EVENT_CLASS(nfs_update_size_class, TP_ARGS(inode, new_size)) DEFINE_NFS_UPDATE_SIZE_EVENT(truncate); +DEFINE_NFS_UPDATE_SIZE_EVENT(truncate_folio); DEFINE_NFS_UPDATE_SIZE_EVENT(wcc); DEFINE_NFS_UPDATE_SIZE_EVENT(update); DEFINE_NFS_UPDATE_SIZE_EVENT(grow); -- 2.51.0 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH 1/5] NFS: Protect against 'eof page pollution' 2025-09-05 19:35 ` [PATCH 1/5] NFS: Protect against 'eof page pollution' Trond Myklebust @ 2026-01-24 21:01 ` Lorenzo Stoakes 2026-01-24 22:20 ` Trond Myklebust 0 siblings, 1 reply; 18+ messages in thread From: Lorenzo Stoakes @ 2026-01-24 21:01 UTC (permalink / raw) To: Trond Myklebust; +Cc: linux-nfs On Fri, Sep 05, 2025 at 03:35:16PM -0400, Trond Myklebust wrote: > From: Trond Myklebust <trond.myklebust@hammerspace.com> > > This commit fixes the failing xfstest 'generic/363'. > > When the user mmaps() an area that extends beyond the end of file, and > proceeds to write data into the folio that straddles that eof, we're > required to discard that folio data if the user calls some function that > extends the file length. > > Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Hi Trond, Sorry to dig up an old patch but I wanted to ask about this change: > --- > fs/nfs/file.c | 33 +++++++++++++++++++++++++++++++++ > fs/nfs/inode.c | 9 +++++++-- > fs/nfs/internal.h | 2 ++ > fs/nfs/nfs42proc.c | 14 +++++++++++--- > fs/nfs/nfstrace.h | 1 + > 5 files changed, 54 insertions(+), 5 deletions(-) > > diff --git a/fs/nfs/file.c b/fs/nfs/file.c > index 86e36c630f09..af2fdbfcbbf6 100644 > --- a/fs/nfs/file.c > +++ b/fs/nfs/file.c > @@ -28,6 +28,7 @@ > #include <linux/mm.h> > #include <linux/pagemap.h> > #include <linux/gfp.h> > +#include <linux/rmap.h> > #include <linux/swap.h> > #include <linux/compaction.h> > > @@ -280,6 +281,37 @@ nfs_file_fsync(struct file *file, loff_t start, loff_t end, int datasync) > } > EXPORT_SYMBOL_GPL(nfs_file_fsync); > > +void nfs_truncate_last_folio(struct address_space *mapping, loff_t from, > + loff_t to) So this seems to be a slightly adjusted version of pagecache_isize_extend(), what was it about that that didn't work for you? It seems the main differences are - block size alignment (surely you still need that though?) switching folio_test_dirty() for folio_test_uptodate() (I'm not sure about this change though?) and adding the trace line. > +{ > + struct folio *folio; > + > + if (from >= to) > + return; > + > + folio = filemap_lock_folio(mapping, from >> PAGE_SHIFT); > + if (IS_ERR(folio)) > + return; > + > + if (folio_mkclean(folio)) > + folio_mark_dirty(folio); > + > + if (folio_test_uptodate(folio)) { Really I'm confused as to why you are using folio_test_uptodate() here rather than folio_test_dirty() as in pagecache_isize_extend()? Wouldn't writeback have already handled this if the folio is up to date, or if the folio was never dirty in the first place? Shouldn't we only need to do this if the folio is dirty? I may well be missing something here as I'm an mm developer not an fs one! So just curious to understand this change. Thanks, Lorenzo > + loff_t fpos = folio_pos(folio); > + size_t offset = from - fpos; > + size_t end = folio_size(folio); > + > + if (to - fpos < end) > + end = to - fpos; > + folio_zero_segment(folio, offset, end); > + trace_nfs_size_truncate_folio(mapping->host, to); > + } > + > + folio_unlock(folio); > + folio_put(folio); > +} > +EXPORT_SYMBOL_GPL(nfs_truncate_last_folio); > + > /* > * Decide whether a read/modify/write cycle may be more efficient > * then a modify/write/read cycle when writing to a page in the > @@ -356,6 +388,7 @@ static int nfs_write_begin(const struct kiocb *iocb, > > dfprintk(PAGECACHE, "NFS: write_begin(%pD2(%lu), %u@%lld)\n", > file, mapping->host->i_ino, len, (long long) pos); > + nfs_truncate_last_folio(mapping, i_size_read(mapping->host), pos + 1); > > fgp |= fgf_set_order(len); > start: > diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c > index 338ef77ae423..0b141feacc52 100644 > --- a/fs/nfs/inode.c > +++ b/fs/nfs/inode.c > @@ -716,6 +716,7 @@ nfs_setattr(struct mnt_idmap *idmap, struct dentry *dentry, > { > struct inode *inode = d_inode(dentry); > struct nfs_fattr *fattr; > + loff_t oldsize = i_size_read(inode); > int error = 0; > > nfs_inc_stats(inode, NFSIOS_VFSSETATTR); > @@ -731,7 +732,7 @@ nfs_setattr(struct mnt_idmap *idmap, struct dentry *dentry, > if (error) > return error; > > - if (attr->ia_size == i_size_read(inode)) > + if (attr->ia_size == oldsize) > attr->ia_valid &= ~ATTR_SIZE; > } > > @@ -777,8 +778,12 @@ nfs_setattr(struct mnt_idmap *idmap, struct dentry *dentry, > } > > error = NFS_PROTO(inode)->setattr(dentry, fattr, attr); > - if (error == 0) > + if (error == 0) { > + if (attr->ia_valid & ATTR_SIZE) > + nfs_truncate_last_folio(inode->i_mapping, oldsize, > + attr->ia_size); > error = nfs_refresh_inode(inode, fattr); > + } > nfs_free_fattr(fattr); > out: > trace_nfs_setattr_exit(inode, error); > diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h > index 74d712b58423..1433ae13dba0 100644 > --- a/fs/nfs/internal.h > +++ b/fs/nfs/internal.h > @@ -437,6 +437,8 @@ int nfs_file_release(struct inode *, struct file *); > int nfs_lock(struct file *, int, struct file_lock *); > int nfs_flock(struct file *, int, struct file_lock *); > int nfs_check_flags(int); > +void nfs_truncate_last_folio(struct address_space *mapping, loff_t from, > + loff_t to); > > /* inode.c */ > extern struct workqueue_struct *nfsiod_workqueue; > diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c > index 01c01f45358b..4420b8740e2f 100644 > --- a/fs/nfs/nfs42proc.c > +++ b/fs/nfs/nfs42proc.c > @@ -137,6 +137,7 @@ int nfs42_proc_allocate(struct file *filep, loff_t offset, loff_t len) > .rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_ALLOCATE], > }; > struct inode *inode = file_inode(filep); > + loff_t oldsize = i_size_read(inode); > int err; > > if (!nfs_server_capable(inode, NFS_CAP_ALLOCATE)) > @@ -145,7 +146,11 @@ int nfs42_proc_allocate(struct file *filep, loff_t offset, loff_t len) > inode_lock(inode); > > err = nfs42_proc_fallocate(&msg, filep, offset, len); > - if (err == -EOPNOTSUPP) > + > + if (err == 0) > + nfs_truncate_last_folio(inode->i_mapping, oldsize, > + offset + len); > + else if (err == -EOPNOTSUPP) > NFS_SERVER(inode)->caps &= ~(NFS_CAP_ALLOCATE | > NFS_CAP_ZERO_RANGE); > > @@ -183,6 +188,7 @@ int nfs42_proc_zero_range(struct file *filep, loff_t offset, loff_t len) > .rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_ZERO_RANGE], > }; > struct inode *inode = file_inode(filep); > + loff_t oldsize = i_size_read(inode); > int err; > > if (!nfs_server_capable(inode, NFS_CAP_ZERO_RANGE)) > @@ -191,9 +197,11 @@ int nfs42_proc_zero_range(struct file *filep, loff_t offset, loff_t len) > inode_lock(inode); > > err = nfs42_proc_fallocate(&msg, filep, offset, len); > - if (err == 0) > + if (err == 0) { > + nfs_truncate_last_folio(inode->i_mapping, oldsize, > + offset + len); > truncate_pagecache_range(inode, offset, (offset + len) -1); > - if (err == -EOPNOTSUPP) > + } else if (err == -EOPNOTSUPP) > NFS_SERVER(inode)->caps &= ~NFS_CAP_ZERO_RANGE; > > inode_unlock(inode); > diff --git a/fs/nfs/nfstrace.h b/fs/nfs/nfstrace.h > index 96b1323318c2..627115179795 100644 > --- a/fs/nfs/nfstrace.h > +++ b/fs/nfs/nfstrace.h > @@ -272,6 +272,7 @@ DECLARE_EVENT_CLASS(nfs_update_size_class, > TP_ARGS(inode, new_size)) > > DEFINE_NFS_UPDATE_SIZE_EVENT(truncate); > +DEFINE_NFS_UPDATE_SIZE_EVENT(truncate_folio); > DEFINE_NFS_UPDATE_SIZE_EVENT(wcc); > DEFINE_NFS_UPDATE_SIZE_EVENT(update); > DEFINE_NFS_UPDATE_SIZE_EVENT(grow); > -- > 2.51.0 > > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/5] NFS: Protect against 'eof page pollution' 2026-01-24 21:01 ` Lorenzo Stoakes @ 2026-01-24 22:20 ` Trond Myklebust 2026-01-26 13:03 ` Lorenzo Stoakes 0 siblings, 1 reply; 18+ messages in thread From: Trond Myklebust @ 2026-01-24 22:20 UTC (permalink / raw) To: Lorenzo Stoakes; +Cc: linux-nfs On Sat, 2026-01-24 at 21:01 +0000, Lorenzo Stoakes wrote: > On Fri, Sep 05, 2025 at 03:35:16PM -0400, Trond Myklebust wrote: > > From: Trond Myklebust <trond.myklebust@hammerspace.com> > > > > This commit fixes the failing xfstest 'generic/363'. > > > > When the user mmaps() an area that extends beyond the end of file, > > and > > proceeds to write data into the folio that straddles that eof, > > we're > > required to discard that folio data if the user calls some function > > that > > extends the file length. > > > > Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> > > Hi Trond, > > Sorry to dig up an old patch but I wanted to ask about this change: > > > --- > > fs/nfs/file.c | 33 +++++++++++++++++++++++++++++++++ > > fs/nfs/inode.c | 9 +++++++-- > > fs/nfs/internal.h | 2 ++ > > fs/nfs/nfs42proc.c | 14 +++++++++++--- > > fs/nfs/nfstrace.h | 1 + > > 5 files changed, 54 insertions(+), 5 deletions(-) > > > > diff --git a/fs/nfs/file.c b/fs/nfs/file.c > > index 86e36c630f09..af2fdbfcbbf6 100644 > > --- a/fs/nfs/file.c > > +++ b/fs/nfs/file.c > > @@ -28,6 +28,7 @@ > > #include <linux/mm.h> > > #include <linux/pagemap.h> > > #include <linux/gfp.h> > > +#include <linux/rmap.h> > > #include <linux/swap.h> > > #include <linux/compaction.h> > > > > @@ -280,6 +281,37 @@ nfs_file_fsync(struct file *file, loff_t > > start, loff_t end, int datasync) > > } > > EXPORT_SYMBOL_GPL(nfs_file_fsync); > > > > +void nfs_truncate_last_folio(struct address_space *mapping, loff_t > > from, > > + loff_t to) > > So this seems to be a slightly adjusted version of > pagecache_isize_extend(), > what was it about that that didn't work for you? > > It seems the main differences are - block size alignment (surely you > still need > that though?) switching folio_test_dirty() for folio_test_uptodate() > (I'm not > sure about this change though?) and adding the trace line. 1. NFS is not a block protocol. Reads and writes are byte aligned. 2. The test for bsize >= PAGE_SIZE is nonsense for a byte aligned filesystem. 3. NFS does care about using folio_mkclean() to fix races between an application that is writing to the folio, and any zeroing of the data that may result from the file truncation. 4. The existing folio dirty state isn't of interest here, since NFS won't extend existing writes to cover the part of the folio that lies after the old eof. 5. The folio uptodate state is of interest, since any future pagecache read needs to see zeroed bytes starting at the old eof and extending either to the offset at the end of the folio, or to the new eof (whichever of the two is smaller). -- Trond Myklebust Linux NFS client maintainer, Hammerspace trondmy@kernel.org, trond.myklebust@hammerspace.com ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 1/5] NFS: Protect against 'eof page pollution' 2026-01-24 22:20 ` Trond Myklebust @ 2026-01-26 13:03 ` Lorenzo Stoakes 0 siblings, 0 replies; 18+ messages in thread From: Lorenzo Stoakes @ 2026-01-26 13:03 UTC (permalink / raw) To: Trond Myklebust; +Cc: linux-nfs On Sat, Jan 24, 2026 at 05:20:19PM -0500, Trond Myklebust wrote: > > > diff --git a/fs/nfs/file.c b/fs/nfs/file.c > > > index 86e36c630f09..af2fdbfcbbf6 100644 > > > --- a/fs/nfs/file.c > > > +++ b/fs/nfs/file.c > > > @@ -28,6 +28,7 @@ > > > #include <linux/mm.h> > > > #include <linux/pagemap.h> > > > #include <linux/gfp.h> > > > +#include <linux/rmap.h> > > > #include <linux/swap.h> > > > #include <linux/compaction.h> > > > > > > @@ -280,6 +281,37 @@ nfs_file_fsync(struct file *file, loff_t > > > start, loff_t end, int datasync) > > > } > > > EXPORT_SYMBOL_GPL(nfs_file_fsync); > > > > > > +void nfs_truncate_last_folio(struct address_space *mapping, loff_t > > > from, > > > + loff_t to) > > > > So this seems to be a slightly adjusted version of > > pagecache_isize_extend(), > > what was it about that that didn't work for you? > > > > It seems the main differences are - block size alignment (surely you > > still need > > that though?) switching folio_test_dirty() for folio_test_uptodate() > > (I'm not > > sure about this change though?) and adding the trace line. > > 1. NFS is not a block protocol. Reads and writes are byte aligned. > 2. The test for bsize >= PAGE_SIZE is nonsense for a byte aligned > filesystem. > 3. NFS does care about using folio_mkclean() to fix races between an > application that is writing to the folio, and any zeroing of the > data that may result from the file truncation. > 4. The existing folio dirty state isn't of interest here, since NFS > won't extend existing writes to cover the part of the folio that > lies after the old eof. > 5. The folio uptodate state is of interest, since any future > pagecache read needs to see zeroed bytes starting at the old eof > and extending either to the offset at the end of the folio, or to > the new eof (whichever of the two is smaller). Thanks very much for the explanation, much appreciated :) Motivation is that generally it seems to me folio_mkclean() itself is an implementation detail we shouldn't be exporting, but I'm considering how we might abstract this while retaining functionality in existing file systems (nfs and ext4 are the only current users). So given what you've said, either we'd need an abstraction that provided an alternative to pagecache_isize_extend() which could account for your use case (perhaps breaking it into parts), or simply to have an abstraction around the folio_mkclean() / folio_mark_dirty() dance. The latter I was trying to avoid as it doesn't do all that much to un-export the implementation detail but of course ensuring in-tree filesystems work as they do now is a requirement for any such change. Obviously I will cc- you on anything that touches NFS in this regard! Cheers, Lorenzo ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 2/5] NFS: Serialise O_DIRECT i/o and truncate() 2025-09-05 19:35 [PATCH 0/5] Assortment of I/O fixes for the NFS client Trond Myklebust 2025-09-05 19:35 ` [PATCH 1/5] NFS: Protect against 'eof page pollution' Trond Myklebust @ 2025-09-05 19:35 ` Trond Myklebust 2025-09-05 19:35 ` [PATCH 3/5] NFSv4.2: Serialise O_DIRECT i/o and fallocate() Trond Myklebust ` (3 subsequent siblings) 5 siblings, 0 replies; 18+ messages in thread From: Trond Myklebust @ 2025-09-05 19:35 UTC (permalink / raw) To: linux-nfs From: Trond Myklebust <trond.myklebust@hammerspace.com> Ensure that all O_DIRECT reads and writes are complete, and prevent the initiation of new i/o until the setattr operation that will truncate the file is complete. Fixes: a5864c999de6 ("NFS: Do not serialise O_DIRECT reads and writes") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> --- fs/nfs/inode.c | 4 +++- fs/nfs/internal.h | 10 ++++++++++ fs/nfs/io.c | 13 ++----------- 3 files changed, 15 insertions(+), 12 deletions(-) diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c index 0b141feacc52..49df9debb1a6 100644 --- a/fs/nfs/inode.c +++ b/fs/nfs/inode.c @@ -768,8 +768,10 @@ nfs_setattr(struct mnt_idmap *idmap, struct dentry *dentry, trace_nfs_setattr_enter(inode); /* Write all dirty data */ - if (S_ISREG(inode->i_mode)) + if (S_ISREG(inode->i_mode)) { + nfs_file_block_o_direct(NFS_I(inode)); nfs_sync_inode(inode); + } fattr = nfs_alloc_fattr_with_label(NFS_SERVER(inode)); if (fattr == NULL) { diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h index 1433ae13dba0..c0a44f389f8f 100644 --- a/fs/nfs/internal.h +++ b/fs/nfs/internal.h @@ -532,6 +532,16 @@ static inline bool nfs_file_io_is_buffered(struct nfs_inode *nfsi) return test_bit(NFS_INO_ODIRECT, &nfsi->flags) == 0; } +/* Must be called with exclusively locked inode->i_rwsem */ +static inline void nfs_file_block_o_direct(struct nfs_inode *nfsi) +{ + if (test_bit(NFS_INO_ODIRECT, &nfsi->flags)) { + clear_bit(NFS_INO_ODIRECT, &nfsi->flags); + inode_dio_wait(&nfsi->vfs_inode); + } +} + + /* namespace.c */ #define NFS_PATH_CANONICAL 1 extern char *nfs_path(char **p, struct dentry *dentry, diff --git a/fs/nfs/io.c b/fs/nfs/io.c index 3388faf2acb9..d275b0a250bf 100644 --- a/fs/nfs/io.c +++ b/fs/nfs/io.c @@ -14,15 +14,6 @@ #include "internal.h" -/* Call with exclusively locked inode->i_rwsem */ -static void nfs_block_o_direct(struct nfs_inode *nfsi, struct inode *inode) -{ - if (test_bit(NFS_INO_ODIRECT, &nfsi->flags)) { - clear_bit(NFS_INO_ODIRECT, &nfsi->flags); - inode_dio_wait(inode); - } -} - /** * nfs_start_io_read - declare the file is being used for buffered reads * @inode: file inode @@ -57,7 +48,7 @@ nfs_start_io_read(struct inode *inode) err = down_write_killable(&inode->i_rwsem); if (err) return err; - nfs_block_o_direct(nfsi, inode); + nfs_file_block_o_direct(nfsi); downgrade_write(&inode->i_rwsem); return 0; @@ -90,7 +81,7 @@ nfs_start_io_write(struct inode *inode) err = down_write_killable(&inode->i_rwsem); if (!err) - nfs_block_o_direct(NFS_I(inode), inode); + nfs_file_block_o_direct(NFS_I(inode)); return err; } -- 2.51.0 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH 3/5] NFSv4.2: Serialise O_DIRECT i/o and fallocate() 2025-09-05 19:35 [PATCH 0/5] Assortment of I/O fixes for the NFS client Trond Myklebust 2025-09-05 19:35 ` [PATCH 1/5] NFS: Protect against 'eof page pollution' Trond Myklebust 2025-09-05 19:35 ` [PATCH 2/5] NFS: Serialise O_DIRECT i/o and truncate() Trond Myklebust @ 2025-09-05 19:35 ` Trond Myklebust 2025-09-05 19:35 ` [PATCH 4/5] NFS: nfs_invalidate_folio() must observe the offset and size arguments Trond Myklebust ` (2 subsequent siblings) 5 siblings, 0 replies; 18+ messages in thread From: Trond Myklebust @ 2025-09-05 19:35 UTC (permalink / raw) To: linux-nfs From: Trond Myklebust <trond.myklebust@hammerspace.com> Ensure that all O_DIRECT reads and writes complete before calling fallocate so that we don't race w.r.t. attribute updates. Fixes: 99f237832243 ("NFSv4.2: Always flush out writes in nfs42_proc_fallocate()") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> --- fs/nfs/nfs42proc.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c index 4420b8740e2f..bd45423668e2 100644 --- a/fs/nfs/nfs42proc.c +++ b/fs/nfs/nfs42proc.c @@ -114,6 +114,7 @@ static int nfs42_proc_fallocate(struct rpc_message *msg, struct file *filep, exception.inode = inode; exception.state = lock->open_context->state; + nfs_file_block_o_direct(NFS_I(inode)); err = nfs_sync_inode(inode); if (err) goto out; -- 2.51.0 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH 4/5] NFS: nfs_invalidate_folio() must observe the offset and size arguments 2025-09-05 19:35 [PATCH 0/5] Assortment of I/O fixes for the NFS client Trond Myklebust ` (2 preceding siblings ...) 2025-09-05 19:35 ` [PATCH 3/5] NFSv4.2: Serialise O_DIRECT i/o and fallocate() Trond Myklebust @ 2025-09-05 19:35 ` Trond Myklebust 2025-09-05 19:35 ` [PATCH 5/5] NFS: Fix the marking of the folio as up to date Trond Myklebust 2025-09-06 16:42 ` [PATCH v2 0/8] Assortment of I/O fixes for the NFS client Trond Myklebust 5 siblings, 0 replies; 18+ messages in thread From: Trond Myklebust @ 2025-09-05 19:35 UTC (permalink / raw) To: linux-nfs From: Trond Myklebust <trond.myklebust@hammerspace.com> If we're truncating part of the folio, then we need to write out the data on the part that is not covered by the cancellation. Fixes: d47992f86b30 ("mm: change invalidatepage prototype to accept length") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> --- fs/nfs/file.c | 7 ++++--- fs/nfs/write.c | 1 + 2 files changed, 5 insertions(+), 3 deletions(-) diff --git a/fs/nfs/file.c b/fs/nfs/file.c index af2fdbfcbbf6..7fa56af4d2cd 100644 --- a/fs/nfs/file.c +++ b/fs/nfs/file.c @@ -475,10 +475,11 @@ static void nfs_invalidate_folio(struct folio *folio, size_t offset, dfprintk(PAGECACHE, "NFS: invalidate_folio(%lu, %zu, %zu)\n", folio->index, offset, length); - if (offset != 0 || length < folio_size(folio)) - return; /* Cancel any unstarted writes on this page */ - nfs_wb_folio_cancel(inode, folio); + if (offset != 0 || length < folio_size(folio)) + nfs_wb_folio(inode, folio); + else + nfs_wb_folio_cancel(inode, folio); folio_wait_private_2(folio); /* [DEPRECATED] */ trace_nfs_invalidate_folio(inode, folio_pos(folio) + offset, length); } diff --git a/fs/nfs/write.c b/fs/nfs/write.c index 8b7c04737967..e359fbcdc8a0 100644 --- a/fs/nfs/write.c +++ b/fs/nfs/write.c @@ -2045,6 +2045,7 @@ int nfs_wb_folio_cancel(struct inode *inode, struct folio *folio) * release it */ nfs_inode_remove_request(req); nfs_unlock_and_release_request(req); + folio_cancel_dirty(folio); } return ret; -- 2.51.0 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH 5/5] NFS: Fix the marking of the folio as up to date 2025-09-05 19:35 [PATCH 0/5] Assortment of I/O fixes for the NFS client Trond Myklebust ` (3 preceding siblings ...) 2025-09-05 19:35 ` [PATCH 4/5] NFS: nfs_invalidate_folio() must observe the offset and size arguments Trond Myklebust @ 2025-09-05 19:35 ` Trond Myklebust 2025-09-06 16:42 ` [PATCH v2 0/8] Assortment of I/O fixes for the NFS client Trond Myklebust 5 siblings, 0 replies; 18+ messages in thread From: Trond Myklebust @ 2025-09-05 19:35 UTC (permalink / raw) To: linux-nfs From: Trond Myklebust <trond.myklebust@hammerspace.com> Since all callers of nfs_page_group_covers_page() have already ensured that there is only one group member, all that is required is to check that the entire folio contains dirty data. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> --- fs/nfs/write.c | 52 +++++--------------------------------------------- 1 file changed, 5 insertions(+), 47 deletions(-) diff --git a/fs/nfs/write.c b/fs/nfs/write.c index e359fbcdc8a0..647c53d1418a 100644 --- a/fs/nfs/write.c +++ b/fs/nfs/write.c @@ -237,59 +237,17 @@ static void nfs_mapping_set_error(struct folio *folio, int error) } /* - * nfs_page_group_search_locked - * @head - head request of page group - * @page_offset - offset into page + * nfs_page_covers_folio + * @req: struct nfs_page * - * Search page group with head @head to find a request that contains the - * page offset @page_offset. - * - * Returns a pointer to the first matching nfs request, or NULL if no - * match is found. - * - * Must be called with the page group lock held - */ -static struct nfs_page * -nfs_page_group_search_locked(struct nfs_page *head, unsigned int page_offset) -{ - struct nfs_page *req; - - req = head; - do { - if (page_offset >= req->wb_pgbase && - page_offset < (req->wb_pgbase + req->wb_bytes)) - return req; - - req = req->wb_this_page; - } while (req != head); - - return NULL; -} - -/* - * nfs_page_group_covers_page - * @head - head request of page group - * - * Return true if the page group with head @head covers the whole page, - * returns false otherwise + * Return true if the request covers the whole folio. + * Note that the caller should ensure all subrequests have been joined */ static bool nfs_page_group_covers_page(struct nfs_page *req) { unsigned int len = nfs_folio_length(nfs_page_to_folio(req)); - struct nfs_page *tmp; - unsigned int pos = 0; - - nfs_page_group_lock(req); - for (;;) { - tmp = nfs_page_group_search_locked(req->wb_head, pos); - if (!tmp) - break; - pos = tmp->wb_pgbase + tmp->wb_bytes; - } - - nfs_page_group_unlock(req); - return pos >= len; + return req->wb_pgbase == 0 && req->wb_bytes == len; } /* We can set the PG_uptodate flag if we see that a write request -- 2.51.0 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v2 0/8] Assortment of I/O fixes for the NFS client 2025-09-05 19:35 [PATCH 0/5] Assortment of I/O fixes for the NFS client Trond Myklebust ` (4 preceding siblings ...) 2025-09-05 19:35 ` [PATCH 5/5] NFS: Fix the marking of the folio as up to date Trond Myklebust @ 2025-09-06 16:42 ` Trond Myklebust 2025-09-06 16:42 ` [PATCH v2 1/8] NFS: Protect against 'eof page pollution' Trond Myklebust ` (7 more replies) 5 siblings, 8 replies; 18+ messages in thread From: Trond Myklebust @ 2025-09-06 16:42 UTC (permalink / raw) To: linux-nfs From: Trond Myklebust <trond.myklebust@hammerspace.com> The following patch series addresses a series of issues that affect the NFS client's I/O path. The first patch is needed to fix a corner case when using mmap() to map the tail end of the file into memory, and then later extending the file length (see recent xfstests generic/363). The second set is used to ensure correct ordering of O_DIRECT operations with truncate + fallocate. Finally, there are fixes to ensure folio invalidation is correct w.r.t. the VFS documentation, and a cleanup of the code that decides when a folio can be marked as up to date. v2: - Fix an off by one issue in nfs_write_begin() - Address eof page pollution on copy offload and clone range (Thanks, Olga!) - Address O_DIRECT ordering w.r.t. copy offload and clone range Trond Myklebust (8): NFS: Protect against 'eof page pollution' NFSv4.2: Protect copy offload and clone against 'eof page pollution' NFS: Serialise O_DIRECT i/o and truncate() NFSv4.2: Serialise O_DIRECT i/o and fallocate() NFSv4.2: Serialise O_DIRECT i/o and clone range NFSv4.2: Serialise O_DIRECT i/o and copy range NFS: nfs_invalidate_folio() must observe the offset and size arguments NFS: Fix the marking of the folio as up to date fs/nfs/file.c | 40 +++++++++++++++++++++++++++++++--- fs/nfs/inode.c | 13 +++++++++--- fs/nfs/internal.h | 12 +++++++++++ fs/nfs/io.c | 13 ++---------- fs/nfs/nfs42proc.c | 35 ++++++++++++++++++++++-------- fs/nfs/nfs4file.c | 2 ++ fs/nfs/nfstrace.h | 1 + fs/nfs/write.c | 53 ++++++---------------------------------------- 8 files changed, 96 insertions(+), 73 deletions(-) -- 2.51.0 ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v2 1/8] NFS: Protect against 'eof page pollution' 2025-09-06 16:42 ` [PATCH v2 0/8] Assortment of I/O fixes for the NFS client Trond Myklebust @ 2025-09-06 16:42 ` Trond Myklebust 2025-09-06 16:42 ` [PATCH v2 2/8] NFSv4.2: Protect copy offload and clone " Trond Myklebust ` (6 subsequent siblings) 7 siblings, 0 replies; 18+ messages in thread From: Trond Myklebust @ 2025-09-06 16:42 UTC (permalink / raw) To: linux-nfs From: Trond Myklebust <trond.myklebust@hammerspace.com> This commit fixes the failing xfstest 'generic/363'. When the user mmaps() an area that extends beyond the end of file, and proceeds to write data into the folio that straddles that eof, we're required to discard that folio data if the user calls some function that extends the file length. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> --- fs/nfs/file.c | 33 +++++++++++++++++++++++++++++++++ fs/nfs/inode.c | 9 +++++++-- fs/nfs/internal.h | 2 ++ fs/nfs/nfs42proc.c | 14 +++++++++++--- fs/nfs/nfstrace.h | 1 + 5 files changed, 54 insertions(+), 5 deletions(-) diff --git a/fs/nfs/file.c b/fs/nfs/file.c index 86e36c630f09..a3105f944a0e 100644 --- a/fs/nfs/file.c +++ b/fs/nfs/file.c @@ -28,6 +28,7 @@ #include <linux/mm.h> #include <linux/pagemap.h> #include <linux/gfp.h> +#include <linux/rmap.h> #include <linux/swap.h> #include <linux/compaction.h> @@ -280,6 +281,37 @@ nfs_file_fsync(struct file *file, loff_t start, loff_t end, int datasync) } EXPORT_SYMBOL_GPL(nfs_file_fsync); +void nfs_truncate_last_folio(struct address_space *mapping, loff_t from, + loff_t to) +{ + struct folio *folio; + + if (from >= to) + return; + + folio = filemap_lock_folio(mapping, from >> PAGE_SHIFT); + if (IS_ERR(folio)) + return; + + if (folio_mkclean(folio)) + folio_mark_dirty(folio); + + if (folio_test_uptodate(folio)) { + loff_t fpos = folio_pos(folio); + size_t offset = from - fpos; + size_t end = folio_size(folio); + + if (to - fpos < end) + end = to - fpos; + folio_zero_segment(folio, offset, end); + trace_nfs_size_truncate_folio(mapping->host, to); + } + + folio_unlock(folio); + folio_put(folio); +} +EXPORT_SYMBOL_GPL(nfs_truncate_last_folio); + /* * Decide whether a read/modify/write cycle may be more efficient * then a modify/write/read cycle when writing to a page in the @@ -356,6 +388,7 @@ static int nfs_write_begin(const struct kiocb *iocb, dfprintk(PAGECACHE, "NFS: write_begin(%pD2(%lu), %u@%lld)\n", file, mapping->host->i_ino, len, (long long) pos); + nfs_truncate_last_folio(mapping, i_size_read(mapping->host), pos); fgp |= fgf_set_order(len); start: diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c index 338ef77ae423..0b141feacc52 100644 --- a/fs/nfs/inode.c +++ b/fs/nfs/inode.c @@ -716,6 +716,7 @@ nfs_setattr(struct mnt_idmap *idmap, struct dentry *dentry, { struct inode *inode = d_inode(dentry); struct nfs_fattr *fattr; + loff_t oldsize = i_size_read(inode); int error = 0; nfs_inc_stats(inode, NFSIOS_VFSSETATTR); @@ -731,7 +732,7 @@ nfs_setattr(struct mnt_idmap *idmap, struct dentry *dentry, if (error) return error; - if (attr->ia_size == i_size_read(inode)) + if (attr->ia_size == oldsize) attr->ia_valid &= ~ATTR_SIZE; } @@ -777,8 +778,12 @@ nfs_setattr(struct mnt_idmap *idmap, struct dentry *dentry, } error = NFS_PROTO(inode)->setattr(dentry, fattr, attr); - if (error == 0) + if (error == 0) { + if (attr->ia_valid & ATTR_SIZE) + nfs_truncate_last_folio(inode->i_mapping, oldsize, + attr->ia_size); error = nfs_refresh_inode(inode, fattr); + } nfs_free_fattr(fattr); out: trace_nfs_setattr_exit(inode, error); diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h index 74d712b58423..1433ae13dba0 100644 --- a/fs/nfs/internal.h +++ b/fs/nfs/internal.h @@ -437,6 +437,8 @@ int nfs_file_release(struct inode *, struct file *); int nfs_lock(struct file *, int, struct file_lock *); int nfs_flock(struct file *, int, struct file_lock *); int nfs_check_flags(int); +void nfs_truncate_last_folio(struct address_space *mapping, loff_t from, + loff_t to); /* inode.c */ extern struct workqueue_struct *nfsiod_workqueue; diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c index 01c01f45358b..4420b8740e2f 100644 --- a/fs/nfs/nfs42proc.c +++ b/fs/nfs/nfs42proc.c @@ -137,6 +137,7 @@ int nfs42_proc_allocate(struct file *filep, loff_t offset, loff_t len) .rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_ALLOCATE], }; struct inode *inode = file_inode(filep); + loff_t oldsize = i_size_read(inode); int err; if (!nfs_server_capable(inode, NFS_CAP_ALLOCATE)) @@ -145,7 +146,11 @@ int nfs42_proc_allocate(struct file *filep, loff_t offset, loff_t len) inode_lock(inode); err = nfs42_proc_fallocate(&msg, filep, offset, len); - if (err == -EOPNOTSUPP) + + if (err == 0) + nfs_truncate_last_folio(inode->i_mapping, oldsize, + offset + len); + else if (err == -EOPNOTSUPP) NFS_SERVER(inode)->caps &= ~(NFS_CAP_ALLOCATE | NFS_CAP_ZERO_RANGE); @@ -183,6 +188,7 @@ int nfs42_proc_zero_range(struct file *filep, loff_t offset, loff_t len) .rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_ZERO_RANGE], }; struct inode *inode = file_inode(filep); + loff_t oldsize = i_size_read(inode); int err; if (!nfs_server_capable(inode, NFS_CAP_ZERO_RANGE)) @@ -191,9 +197,11 @@ int nfs42_proc_zero_range(struct file *filep, loff_t offset, loff_t len) inode_lock(inode); err = nfs42_proc_fallocate(&msg, filep, offset, len); - if (err == 0) + if (err == 0) { + nfs_truncate_last_folio(inode->i_mapping, oldsize, + offset + len); truncate_pagecache_range(inode, offset, (offset + len) -1); - if (err == -EOPNOTSUPP) + } else if (err == -EOPNOTSUPP) NFS_SERVER(inode)->caps &= ~NFS_CAP_ZERO_RANGE; inode_unlock(inode); diff --git a/fs/nfs/nfstrace.h b/fs/nfs/nfstrace.h index 96b1323318c2..627115179795 100644 --- a/fs/nfs/nfstrace.h +++ b/fs/nfs/nfstrace.h @@ -272,6 +272,7 @@ DECLARE_EVENT_CLASS(nfs_update_size_class, TP_ARGS(inode, new_size)) DEFINE_NFS_UPDATE_SIZE_EVENT(truncate); +DEFINE_NFS_UPDATE_SIZE_EVENT(truncate_folio); DEFINE_NFS_UPDATE_SIZE_EVENT(wcc); DEFINE_NFS_UPDATE_SIZE_EVENT(update); DEFINE_NFS_UPDATE_SIZE_EVENT(grow); -- 2.51.0 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v2 2/8] NFSv4.2: Protect copy offload and clone against 'eof page pollution' 2025-09-06 16:42 ` [PATCH v2 0/8] Assortment of I/O fixes for the NFS client Trond Myklebust 2025-09-06 16:42 ` [PATCH v2 1/8] NFS: Protect against 'eof page pollution' Trond Myklebust @ 2025-09-06 16:42 ` Trond Myklebust 2025-09-06 16:42 ` [PATCH v2 3/8] NFS: Serialise O_DIRECT i/o and truncate() Trond Myklebust ` (5 subsequent siblings) 7 siblings, 0 replies; 18+ messages in thread From: Trond Myklebust @ 2025-09-06 16:42 UTC (permalink / raw) To: linux-nfs From: Trond Myklebust <trond.myklebust@hammerspace.com> The NFSv4.2 copy offload and clone functions can also end up extending the size of the destination file, so they too need to call nfs_truncate_last_folio(). Reported-by: Olga Kornievskaia <okorniev@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> --- fs/nfs/nfs42proc.c | 19 +++++++++++++------ 1 file changed, 13 insertions(+), 6 deletions(-) diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c index 4420b8740e2f..e2fea37c5348 100644 --- a/fs/nfs/nfs42proc.c +++ b/fs/nfs/nfs42proc.c @@ -362,22 +362,27 @@ static int process_copy_commit(struct file *dst, loff_t pos_dst, /** * nfs42_copy_dest_done - perform inode cache updates after clone/copy offload - * @inode: pointer to destination inode + * @file: pointer to destination file * @pos: destination offset * @len: copy length + * @oldsize: length of the file prior to clone/copy * * Punch a hole in the inode page cache, so that the NFS client will * know to retrieve new data. * Update the file size if necessary, and then mark the inode as having * invalid cached values for change attribute, ctime, mtime and space used. */ -static void nfs42_copy_dest_done(struct inode *inode, loff_t pos, loff_t len) +static void nfs42_copy_dest_done(struct file *file, loff_t pos, loff_t len, + loff_t oldsize) { + struct inode *inode = file_inode(file); + struct address_space *mapping = file->f_mapping; loff_t newsize = pos + len; loff_t end = newsize - 1; - WARN_ON_ONCE(invalidate_inode_pages2_range(inode->i_mapping, - pos >> PAGE_SHIFT, end >> PAGE_SHIFT)); + nfs_truncate_last_folio(mapping, oldsize, pos); + WARN_ON_ONCE(invalidate_inode_pages2_range(mapping, pos >> PAGE_SHIFT, + end >> PAGE_SHIFT)); spin_lock(&inode->i_lock); if (newsize > i_size_read(inode)) @@ -410,6 +415,7 @@ static ssize_t _nfs42_proc_copy(struct file *src, struct nfs_server *src_server = NFS_SERVER(src_inode); loff_t pos_src = args->src_pos; loff_t pos_dst = args->dst_pos; + loff_t oldsize_dst = i_size_read(dst_inode); size_t count = args->count; ssize_t status; @@ -483,7 +489,7 @@ static ssize_t _nfs42_proc_copy(struct file *src, goto out; } - nfs42_copy_dest_done(dst_inode, pos_dst, res->write_res.count); + nfs42_copy_dest_done(dst, pos_dst, res->write_res.count, oldsize_dst); nfs_invalidate_atime(src_inode); status = res->write_res.count; out: @@ -1250,6 +1256,7 @@ static int _nfs42_proc_clone(struct rpc_message *msg, struct file *src_f, struct nfs42_clone_res res = { .server = server, }; + loff_t oldsize_dst = i_size_read(dst_inode); int status; msg->rpc_argp = &args; @@ -1284,7 +1291,7 @@ static int _nfs42_proc_clone(struct rpc_message *msg, struct file *src_f, /* a zero-length count means clone to EOF in src */ if (count == 0 && res.dst_fattr->valid & NFS_ATTR_FATTR_SIZE) count = nfs_size_to_loff_t(res.dst_fattr->size) - dst_offset; - nfs42_copy_dest_done(dst_inode, dst_offset, count); + nfs42_copy_dest_done(dst_f, dst_offset, count, oldsize_dst); status = nfs_post_op_update_inode(dst_inode, res.dst_fattr); } -- 2.51.0 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v2 3/8] NFS: Serialise O_DIRECT i/o and truncate() 2025-09-06 16:42 ` [PATCH v2 0/8] Assortment of I/O fixes for the NFS client Trond Myklebust 2025-09-06 16:42 ` [PATCH v2 1/8] NFS: Protect against 'eof page pollution' Trond Myklebust 2025-09-06 16:42 ` [PATCH v2 2/8] NFSv4.2: Protect copy offload and clone " Trond Myklebust @ 2025-09-06 16:42 ` Trond Myklebust 2025-09-06 16:42 ` [PATCH v2 4/8] NFSv4.2: Serialise O_DIRECT i/o and fallocate() Trond Myklebust ` (4 subsequent siblings) 7 siblings, 0 replies; 18+ messages in thread From: Trond Myklebust @ 2025-09-06 16:42 UTC (permalink / raw) To: linux-nfs From: Trond Myklebust <trond.myklebust@hammerspace.com> Ensure that all O_DIRECT reads and writes are complete, and prevent the initiation of new i/o until the setattr operation that will truncate the file is complete. Fixes: a5864c999de6 ("NFS: Do not serialise O_DIRECT reads and writes") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> --- fs/nfs/inode.c | 4 +++- fs/nfs/internal.h | 10 ++++++++++ fs/nfs/io.c | 13 ++----------- 3 files changed, 15 insertions(+), 12 deletions(-) diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c index 0b141feacc52..49df9debb1a6 100644 --- a/fs/nfs/inode.c +++ b/fs/nfs/inode.c @@ -768,8 +768,10 @@ nfs_setattr(struct mnt_idmap *idmap, struct dentry *dentry, trace_nfs_setattr_enter(inode); /* Write all dirty data */ - if (S_ISREG(inode->i_mode)) + if (S_ISREG(inode->i_mode)) { + nfs_file_block_o_direct(NFS_I(inode)); nfs_sync_inode(inode); + } fattr = nfs_alloc_fattr_with_label(NFS_SERVER(inode)); if (fattr == NULL) { diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h index 1433ae13dba0..c0a44f389f8f 100644 --- a/fs/nfs/internal.h +++ b/fs/nfs/internal.h @@ -532,6 +532,16 @@ static inline bool nfs_file_io_is_buffered(struct nfs_inode *nfsi) return test_bit(NFS_INO_ODIRECT, &nfsi->flags) == 0; } +/* Must be called with exclusively locked inode->i_rwsem */ +static inline void nfs_file_block_o_direct(struct nfs_inode *nfsi) +{ + if (test_bit(NFS_INO_ODIRECT, &nfsi->flags)) { + clear_bit(NFS_INO_ODIRECT, &nfsi->flags); + inode_dio_wait(&nfsi->vfs_inode); + } +} + + /* namespace.c */ #define NFS_PATH_CANONICAL 1 extern char *nfs_path(char **p, struct dentry *dentry, diff --git a/fs/nfs/io.c b/fs/nfs/io.c index 3388faf2acb9..d275b0a250bf 100644 --- a/fs/nfs/io.c +++ b/fs/nfs/io.c @@ -14,15 +14,6 @@ #include "internal.h" -/* Call with exclusively locked inode->i_rwsem */ -static void nfs_block_o_direct(struct nfs_inode *nfsi, struct inode *inode) -{ - if (test_bit(NFS_INO_ODIRECT, &nfsi->flags)) { - clear_bit(NFS_INO_ODIRECT, &nfsi->flags); - inode_dio_wait(inode); - } -} - /** * nfs_start_io_read - declare the file is being used for buffered reads * @inode: file inode @@ -57,7 +48,7 @@ nfs_start_io_read(struct inode *inode) err = down_write_killable(&inode->i_rwsem); if (err) return err; - nfs_block_o_direct(nfsi, inode); + nfs_file_block_o_direct(nfsi); downgrade_write(&inode->i_rwsem); return 0; @@ -90,7 +81,7 @@ nfs_start_io_write(struct inode *inode) err = down_write_killable(&inode->i_rwsem); if (!err) - nfs_block_o_direct(NFS_I(inode), inode); + nfs_file_block_o_direct(NFS_I(inode)); return err; } -- 2.51.0 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v2 4/8] NFSv4.2: Serialise O_DIRECT i/o and fallocate() 2025-09-06 16:42 ` [PATCH v2 0/8] Assortment of I/O fixes for the NFS client Trond Myklebust ` (2 preceding siblings ...) 2025-09-06 16:42 ` [PATCH v2 3/8] NFS: Serialise O_DIRECT i/o and truncate() Trond Myklebust @ 2025-09-06 16:42 ` Trond Myklebust 2025-09-06 16:42 ` [PATCH v2 5/8] NFSv4.2: Serialise O_DIRECT i/o and clone range Trond Myklebust ` (3 subsequent siblings) 7 siblings, 0 replies; 18+ messages in thread From: Trond Myklebust @ 2025-09-06 16:42 UTC (permalink / raw) To: linux-nfs From: Trond Myklebust <trond.myklebust@hammerspace.com> Ensure that all O_DIRECT reads and writes complete before calling fallocate so that we don't race w.r.t. attribute updates. Fixes: 99f237832243 ("NFSv4.2: Always flush out writes in nfs42_proc_fallocate()") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> --- fs/nfs/nfs42proc.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c index e2fea37c5348..1a169372ca16 100644 --- a/fs/nfs/nfs42proc.c +++ b/fs/nfs/nfs42proc.c @@ -114,6 +114,7 @@ static int nfs42_proc_fallocate(struct rpc_message *msg, struct file *filep, exception.inode = inode; exception.state = lock->open_context->state; + nfs_file_block_o_direct(NFS_I(inode)); err = nfs_sync_inode(inode); if (err) goto out; -- 2.51.0 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v2 5/8] NFSv4.2: Serialise O_DIRECT i/o and clone range 2025-09-06 16:42 ` [PATCH v2 0/8] Assortment of I/O fixes for the NFS client Trond Myklebust ` (3 preceding siblings ...) 2025-09-06 16:42 ` [PATCH v2 4/8] NFSv4.2: Serialise O_DIRECT i/o and fallocate() Trond Myklebust @ 2025-09-06 16:42 ` Trond Myklebust 2025-09-06 16:42 ` [PATCH v2 6/8] NFSv4.2: Serialise O_DIRECT i/o and copy range Trond Myklebust ` (2 subsequent siblings) 7 siblings, 0 replies; 18+ messages in thread From: Trond Myklebust @ 2025-09-06 16:42 UTC (permalink / raw) To: linux-nfs From: Trond Myklebust <trond.myklebust@hammerspace.com> Ensure that all O_DIRECT reads and writes complete before cloning a file range, so that both the source and destination are up to date. Fixes: a5864c999de6 ("NFS: Do not serialise O_DIRECT reads and writes") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> --- fs/nfs/nfs4file.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c index 1d6b5f4230c9..c9a0d1e420c6 100644 --- a/fs/nfs/nfs4file.c +++ b/fs/nfs/nfs4file.c @@ -278,9 +278,11 @@ static loff_t nfs42_remap_file_range(struct file *src_file, loff_t src_off, lock_two_nondirectories(src_inode, dst_inode); /* flush all pending writes on both src and dst so that server * has the latest data */ + nfs_file_block_o_direct(NFS_I(src_inode)); ret = nfs_sync_inode(src_inode); if (ret) goto out_unlock; + nfs_file_block_o_direct(NFS_I(dst_inode)); ret = nfs_sync_inode(dst_inode); if (ret) goto out_unlock; -- 2.51.0 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v2 6/8] NFSv4.2: Serialise O_DIRECT i/o and copy range 2025-09-06 16:42 ` [PATCH v2 0/8] Assortment of I/O fixes for the NFS client Trond Myklebust ` (4 preceding siblings ...) 2025-09-06 16:42 ` [PATCH v2 5/8] NFSv4.2: Serialise O_DIRECT i/o and clone range Trond Myklebust @ 2025-09-06 16:42 ` Trond Myklebust 2025-09-06 16:42 ` [PATCH v2 7/8] NFS: nfs_invalidate_folio() must observe the offset and size arguments Trond Myklebust 2025-09-06 16:42 ` [PATCH v2 8/8] NFS: Fix the marking of the folio as up to date Trond Myklebust 7 siblings, 0 replies; 18+ messages in thread From: Trond Myklebust @ 2025-09-06 16:42 UTC (permalink / raw) To: linux-nfs From: Trond Myklebust <trond.myklebust@hammerspace.com> Ensure that all O_DIRECT reads and writes complete before copying a file range, so that the destination is up to date. Fixes: a5864c999de6 ("NFS: Do not serialise O_DIRECT reads and writes") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> --- fs/nfs/nfs42proc.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c index 1a169372ca16..6a0b5871ba3b 100644 --- a/fs/nfs/nfs42proc.c +++ b/fs/nfs/nfs42proc.c @@ -445,6 +445,7 @@ static ssize_t _nfs42_proc_copy(struct file *src, return status; } + nfs_file_block_o_direct(NFS_I(dst_inode)); status = nfs_sync_inode(dst_inode); if (status) return status; -- 2.51.0 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v2 7/8] NFS: nfs_invalidate_folio() must observe the offset and size arguments 2025-09-06 16:42 ` [PATCH v2 0/8] Assortment of I/O fixes for the NFS client Trond Myklebust ` (5 preceding siblings ...) 2025-09-06 16:42 ` [PATCH v2 6/8] NFSv4.2: Serialise O_DIRECT i/o and copy range Trond Myklebust @ 2025-09-06 16:42 ` Trond Myklebust 2025-09-06 16:42 ` [PATCH v2 8/8] NFS: Fix the marking of the folio as up to date Trond Myklebust 7 siblings, 0 replies; 18+ messages in thread From: Trond Myklebust @ 2025-09-06 16:42 UTC (permalink / raw) To: linux-nfs From: Trond Myklebust <trond.myklebust@hammerspace.com> If we're truncating part of the folio, then we need to write out the data on the part that is not covered by the cancellation. Fixes: d47992f86b30 ("mm: change invalidatepage prototype to accept length") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> --- fs/nfs/file.c | 7 ++++--- fs/nfs/write.c | 1 + 2 files changed, 5 insertions(+), 3 deletions(-) diff --git a/fs/nfs/file.c b/fs/nfs/file.c index a3105f944a0e..8059ece82468 100644 --- a/fs/nfs/file.c +++ b/fs/nfs/file.c @@ -475,10 +475,11 @@ static void nfs_invalidate_folio(struct folio *folio, size_t offset, dfprintk(PAGECACHE, "NFS: invalidate_folio(%lu, %zu, %zu)\n", folio->index, offset, length); - if (offset != 0 || length < folio_size(folio)) - return; /* Cancel any unstarted writes on this page */ - nfs_wb_folio_cancel(inode, folio); + if (offset != 0 || length < folio_size(folio)) + nfs_wb_folio(inode, folio); + else + nfs_wb_folio_cancel(inode, folio); folio_wait_private_2(folio); /* [DEPRECATED] */ trace_nfs_invalidate_folio(inode, folio_pos(folio) + offset, length); } diff --git a/fs/nfs/write.c b/fs/nfs/write.c index 8b7c04737967..e359fbcdc8a0 100644 --- a/fs/nfs/write.c +++ b/fs/nfs/write.c @@ -2045,6 +2045,7 @@ int nfs_wb_folio_cancel(struct inode *inode, struct folio *folio) * release it */ nfs_inode_remove_request(req); nfs_unlock_and_release_request(req); + folio_cancel_dirty(folio); } return ret; -- 2.51.0 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v2 8/8] NFS: Fix the marking of the folio as up to date 2025-09-06 16:42 ` [PATCH v2 0/8] Assortment of I/O fixes for the NFS client Trond Myklebust ` (6 preceding siblings ...) 2025-09-06 16:42 ` [PATCH v2 7/8] NFS: nfs_invalidate_folio() must observe the offset and size arguments Trond Myklebust @ 2025-09-06 16:42 ` Trond Myklebust 7 siblings, 0 replies; 18+ messages in thread From: Trond Myklebust @ 2025-09-06 16:42 UTC (permalink / raw) To: linux-nfs From: Trond Myklebust <trond.myklebust@hammerspace.com> Since all callers of nfs_page_group_covers_page() have already ensured that there is only one group member, all that is required is to check that the entire folio contains dirty data. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> --- fs/nfs/write.c | 52 +++++--------------------------------------------- 1 file changed, 5 insertions(+), 47 deletions(-) diff --git a/fs/nfs/write.c b/fs/nfs/write.c index e359fbcdc8a0..647c53d1418a 100644 --- a/fs/nfs/write.c +++ b/fs/nfs/write.c @@ -237,59 +237,17 @@ static void nfs_mapping_set_error(struct folio *folio, int error) } /* - * nfs_page_group_search_locked - * @head - head request of page group - * @page_offset - offset into page + * nfs_page_covers_folio + * @req: struct nfs_page * - * Search page group with head @head to find a request that contains the - * page offset @page_offset. - * - * Returns a pointer to the first matching nfs request, or NULL if no - * match is found. - * - * Must be called with the page group lock held - */ -static struct nfs_page * -nfs_page_group_search_locked(struct nfs_page *head, unsigned int page_offset) -{ - struct nfs_page *req; - - req = head; - do { - if (page_offset >= req->wb_pgbase && - page_offset < (req->wb_pgbase + req->wb_bytes)) - return req; - - req = req->wb_this_page; - } while (req != head); - - return NULL; -} - -/* - * nfs_page_group_covers_page - * @head - head request of page group - * - * Return true if the page group with head @head covers the whole page, - * returns false otherwise + * Return true if the request covers the whole folio. + * Note that the caller should ensure all subrequests have been joined */ static bool nfs_page_group_covers_page(struct nfs_page *req) { unsigned int len = nfs_folio_length(nfs_page_to_folio(req)); - struct nfs_page *tmp; - unsigned int pos = 0; - - nfs_page_group_lock(req); - for (;;) { - tmp = nfs_page_group_search_locked(req->wb_head, pos); - if (!tmp) - break; - pos = tmp->wb_pgbase + tmp->wb_bytes; - } - - nfs_page_group_unlock(req); - return pos >= len; + return req->wb_pgbase == 0 && req->wb_bytes == len; } /* We can set the PG_uptodate flag if we see that a write request -- 2.51.0 ^ permalink raw reply related [flat|nested] 18+ messages in thread
end of thread, other threads:[~2026-01-26 13:03 UTC | newest] Thread overview: 18+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-09-05 19:35 [PATCH 0/5] Assortment of I/O fixes for the NFS client Trond Myklebust 2025-09-05 19:35 ` [PATCH 1/5] NFS: Protect against 'eof page pollution' Trond Myklebust 2026-01-24 21:01 ` Lorenzo Stoakes 2026-01-24 22:20 ` Trond Myklebust 2026-01-26 13:03 ` Lorenzo Stoakes 2025-09-05 19:35 ` [PATCH 2/5] NFS: Serialise O_DIRECT i/o and truncate() Trond Myklebust 2025-09-05 19:35 ` [PATCH 3/5] NFSv4.2: Serialise O_DIRECT i/o and fallocate() Trond Myklebust 2025-09-05 19:35 ` [PATCH 4/5] NFS: nfs_invalidate_folio() must observe the offset and size arguments Trond Myklebust 2025-09-05 19:35 ` [PATCH 5/5] NFS: Fix the marking of the folio as up to date Trond Myklebust 2025-09-06 16:42 ` [PATCH v2 0/8] Assortment of I/O fixes for the NFS client Trond Myklebust 2025-09-06 16:42 ` [PATCH v2 1/8] NFS: Protect against 'eof page pollution' Trond Myklebust 2025-09-06 16:42 ` [PATCH v2 2/8] NFSv4.2: Protect copy offload and clone " Trond Myklebust 2025-09-06 16:42 ` [PATCH v2 3/8] NFS: Serialise O_DIRECT i/o and truncate() Trond Myklebust 2025-09-06 16:42 ` [PATCH v2 4/8] NFSv4.2: Serialise O_DIRECT i/o and fallocate() Trond Myklebust 2025-09-06 16:42 ` [PATCH v2 5/8] NFSv4.2: Serialise O_DIRECT i/o and clone range Trond Myklebust 2025-09-06 16:42 ` [PATCH v2 6/8] NFSv4.2: Serialise O_DIRECT i/o and copy range Trond Myklebust 2025-09-06 16:42 ` [PATCH v2 7/8] NFS: nfs_invalidate_folio() must observe the offset and size arguments Trond Myklebust 2025-09-06 16:42 ` [PATCH v2 8/8] NFS: Fix the marking of the folio as up to date Trond Myklebust
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox