* [RFC PATCH 0/3] Initial NFS client support for RWF_DONTCACHE
@ 2025-04-23 4:25 trondmy
2025-04-23 4:25 ` [RFC PATCH 1/3] filemap: Add a helper for filesystems implementing dropbehind trondmy
` (5 more replies)
0 siblings, 6 replies; 14+ messages in thread
From: trondmy @ 2025-04-23 4:25 UTC (permalink / raw)
To: linux-nfs; +Cc: linux-mm, linux-fsdevel
From: Trond Myklebust <trond.myklebust@hammerspace.com>
The following patch set attempts to add support for the RWF_DONTCACHE
flag in preadv2() and pwritev2() on NFS filesystems.
The main issue is allowing support on 2 stage writes (i.e. unstable
WRITE followed by a COMMIT) since those don't follow the current
assumption that the 'dropbehind' flag can be fulfilled as soon as the
writeback lock is dropped.
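To make the two-stage problem concrete, here is a minimal userspace sketch in C. It is a toy model, not kernel code: `toy_folio`, `toy_request` and the `toy_*` functions are hypothetical names. Only the ordering is meant to mirror what patch 3/3 describes, where the dropbehind hint moves from the folio to a request-local flag when the unstable WRITE finishes and is acted on only once the data has reached stable storage.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins, NOT the kernel structures; every name here is hypothetical. */
struct toy_folio {
	bool cached;     /* still present in the toy page cache */
	bool dirty;
	bool writeback;
	bool dropbehind; /* hint set for RWF_DONTCACHE I/O */
};

struct toy_request {
	struct toy_folio *folio;
	bool dropbehind; /* request-local flag, in the spirit of PG_DROPBEHIND */
};

/* Drop the folio only if it is clean and not under writeback. */
void toy_end_dropbehind(struct toy_folio *folio)
{
	if (folio->cached && !folio->dirty && !folio->writeback)
		folio->cached = false;
}

/* Stage 1: the unstable WRITE completed; park the hint on the request. */
void toy_write_done(struct toy_request *req)
{
	assert(req->folio != NULL);
	if (req->folio->dropbehind) {
		req->folio->dropbehind = false;
		req->dropbehind = true;
	}
	req->folio->writeback = false; /* folio stays cached until COMMIT */
}

/* Stage 2: COMMIT completed; the data is on stable storage. */
void toy_commit_done(struct toy_request *req)
{
	assert(req->folio != NULL);
	if (req->dropbehind)
		toy_end_dropbehind(req->folio);
}
```

With a folio that starts out cached, under writeback and marked dropbehind, `toy_write_done()` leaves it cached, and `toy_commit_done()` is what finally drops it.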
Trond Myklebust (3):
filemap: Add a helper for filesystems implementing dropbehind
filemap: Mark folios as dropbehind in generic_perform_write()
NFS: Enable the RWF_DONTCACHE flag for the NFS client
fs/nfs/file.c | 2 ++
fs/nfs/nfs4file.c | 2 ++
fs/nfs/write.c | 12 +++++++++++-
include/linux/nfs_page.h | 1 +
include/linux/pagemap.h | 1 +
mm/filemap.c | 21 +++++++++++++++++++++
6 files changed, 38 insertions(+), 1 deletion(-)
--
2.49.0
* [RFC PATCH 1/3] filemap: Add a helper for filesystems implementing dropbehind
2025-04-23 4:25 [RFC PATCH 0/3] Initial NFS client support for RWF_DONTCACHE trondmy
@ 2025-04-23 4:25 ` trondmy
2025-04-24 21:30 ` Mike Snitzer
2025-04-23 4:25 ` [RFC PATCH 2/3] filemap: Mark folios as dropbehind in generic_perform_write() trondmy
` (4 subsequent siblings)
5 siblings, 1 reply; 14+ messages in thread
From: trondmy @ 2025-04-23 4:25 UTC (permalink / raw)
To: linux-nfs; +Cc: linux-mm, linux-fsdevel
From: Trond Myklebust <trond.myklebust@hammerspace.com>
Add a helper to allow filesystems to attempt to free the 'dropbehind'
folio.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
include/linux/pagemap.h | 1 +
mm/filemap.c | 16 ++++++++++++++++
2 files changed, 17 insertions(+)
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 26baa78f1ca7..63e2bee9f46b 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -1225,6 +1225,7 @@ void folio_wait_writeback(struct folio *folio);
int folio_wait_writeback_killable(struct folio *folio);
void end_page_writeback(struct page *page);
void folio_end_writeback(struct folio *folio);
+void folio_end_dropbehind(struct folio *folio);
void folio_wait_stable(struct folio *folio);
void __folio_mark_dirty(struct folio *folio, struct address_space *, int warn);
void folio_account_cleaned(struct folio *folio, struct bdi_writeback *wb);
diff --git a/mm/filemap.c b/mm/filemap.c
index b5e784f34d98..12f694880bb8 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1589,6 +1589,22 @@ int folio_wait_private_2_killable(struct folio *folio)
}
EXPORT_SYMBOL(folio_wait_private_2_killable);
+/*
+ * Helper for filesystems that want to implement dropbehind, and that
+ * need to keep the folio around after folio_end_writeback, e.g. due to
+ * the need to first commit NFS stable writes.
+ */
+void folio_end_dropbehind(struct folio *folio)
+{
+ if (folio_trylock(folio)) {
+ if (folio->mapping && !folio_test_dirty(folio) &&
+ !folio_test_writeback(folio))
+ folio_unmap_invalidate(folio->mapping, folio, 0);
+ folio_unlock(folio);
+ }
+}
+EXPORT_SYMBOL(folio_end_dropbehind);
+
/*
* If folio was marked as dropbehind, then pages should be dropped when writeback
* completes. Do that now. If we fail, it's likely because of a big folio -
--
2.49.0
* [RFC PATCH 2/3] filemap: Mark folios as dropbehind in generic_perform_write()
2025-04-23 4:25 [RFC PATCH 0/3] Initial NFS client support for RWF_DONTCACHE trondmy
2025-04-23 4:25 ` [RFC PATCH 1/3] filemap: Add a helper for filesystems implementing dropbehind trondmy
@ 2025-04-23 4:25 ` trondmy
2025-04-24 21:30 ` Mike Snitzer
2025-04-23 4:25 ` [RFC PATCH 3/3] NFS: Enable the RWF_DONTCACHE flag for the NFS client trondmy
` (3 subsequent siblings)
5 siblings, 1 reply; 14+ messages in thread
From: trondmy @ 2025-04-23 4:25 UTC (permalink / raw)
To: linux-nfs; +Cc: linux-mm, linux-fsdevel
From: Trond Myklebust <trond.myklebust@hammerspace.com>
The iocb flags are not passed down to the write_begin() callback that is
allocating the folio, so we need to set the dropbehind folio flag from
inside the generic_perform_write() function itself.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
mm/filemap.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/mm/filemap.c b/mm/filemap.c
index 12f694880bb8..4c383f29e828 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -4136,6 +4136,11 @@ ssize_t generic_perform_write(struct kiocb *iocb, struct iov_iter *i)
copied = copy_folio_from_iter_atomic(folio, offset, bytes, i);
flush_dcache_folio(folio);
+ if (iocb->ki_flags & IOCB_DONTCACHE)
+ folio_set_dropbehind(folio);
+ else if (folio_test_dropbehind(folio))
+ folio_clear_dropbehind(folio);
+
status = a_ops->write_end(file, mapping, pos, bytes, copied,
folio, fsdata);
if (unlikely(status != copied)) {
--
2.49.0
* [RFC PATCH 3/3] NFS: Enable the RWF_DONTCACHE flag for the NFS client
2025-04-23 4:25 [RFC PATCH 0/3] Initial NFS client support for RWF_DONTCACHE trondmy
2025-04-23 4:25 ` [RFC PATCH 1/3] filemap: Add a helper for filesystems implementing dropbehind trondmy
2025-04-23 4:25 ` [RFC PATCH 2/3] filemap: Mark folios as dropbehind in generic_perform_write() trondmy
@ 2025-04-23 4:25 ` trondmy
2025-04-24 21:31 ` Mike Snitzer
2025-04-23 14:38 ` [RFC PATCH 0/3] Initial NFS client support for RWF_DONTCACHE Chuck Lever
` (2 subsequent siblings)
5 siblings, 1 reply; 14+ messages in thread
From: trondmy @ 2025-04-23 4:25 UTC (permalink / raw)
To: linux-nfs; +Cc: linux-mm, linux-fsdevel
From: Trond Myklebust <trond.myklebust@hammerspace.com>
While the NFS readahead code has no problems using the generic code to
manage the dropbehind behaviour enabled by RWF_DONTCACHE, the write code
needs to deal with the fact that NFS writeback uses a 2 step process
(UNSTABLE write followed by COMMIT).
This commit replaces the use of the folio dropbehind flag with a local
NFS request flag that triggers the dropbehind behaviour once the data
has been written to stable storage.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
fs/nfs/file.c | 2 ++
fs/nfs/nfs4file.c | 2 ++
fs/nfs/write.c | 12 +++++++++++-
include/linux/nfs_page.h | 1 +
4 files changed, 16 insertions(+), 1 deletion(-)
diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 033feeab8c34..60d47f141acb 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -910,5 +910,7 @@ const struct file_operations nfs_file_operations = {
.splice_write = iter_file_splice_write,
.check_flags = nfs_check_flags,
.setlease = simple_nosetlease,
+
+ .fop_flags = FOP_DONTCACHE,
};
EXPORT_SYMBOL_GPL(nfs_file_operations);
diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c
index 1cd9652f3c28..e6726499c585 100644
--- a/fs/nfs/nfs4file.c
+++ b/fs/nfs/nfs4file.c
@@ -467,4 +467,6 @@ const struct file_operations nfs4_file_operations = {
#else
.llseek = nfs_file_llseek,
#endif
+
+ .fop_flags = FOP_DONTCACHE,
};
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 23df8b214474..e0ac439ab211 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -359,8 +359,12 @@ static void nfs_folio_end_writeback(struct folio *folio)
static void nfs_page_end_writeback(struct nfs_page *req)
{
if (nfs_page_group_sync_on_bit(req, PG_WB_END)) {
+ struct folio *folio = nfs_page_to_folio(req);
+
+ if (folio_test_clear_dropbehind(folio))
+ set_bit(PG_DROPBEHIND, &req->wb_flags);
nfs_unlock_request(req);
- nfs_folio_end_writeback(nfs_page_to_folio(req));
+ nfs_folio_end_writeback(folio);
} else
nfs_unlock_request(req);
}
@@ -813,6 +817,9 @@ static void nfs_inode_remove_request(struct nfs_page *req)
clear_bit(PG_MAPPED, &req->wb_head->wb_flags);
}
spin_unlock(&mapping->i_private_lock);
+
+ if (test_bit(PG_DROPBEHIND, &req->wb_flags))
+ folio_end_dropbehind(folio);
}
if (test_and_clear_bit(PG_INODE_REF, &req->wb_flags)) {
@@ -2093,6 +2100,7 @@ int nfs_wb_folio(struct inode *inode, struct folio *folio)
.range_start = range_start,
.range_end = range_start + len - 1,
};
+ bool dropbehind = folio_test_clear_dropbehind(folio);
int ret;
trace_nfs_writeback_folio(inode, range_start, len);
@@ -2113,6 +2121,8 @@ int nfs_wb_folio(struct inode *inode, struct folio *folio)
goto out_error;
}
out_error:
+ if (dropbehind)
+ folio_set_dropbehind(folio);
trace_nfs_writeback_folio_done(inode, range_start, len, ret);
return ret;
}
diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h
index 169b4ae30ff4..1a017b5b476f 100644
--- a/include/linux/nfs_page.h
+++ b/include/linux/nfs_page.h
@@ -37,6 +37,7 @@ enum {
PG_REMOVE, /* page group sync bit in write path */
PG_CONTENDED1, /* Is someone waiting for a lock? */
PG_CONTENDED2, /* Is someone waiting for a lock? */
+ PG_DROPBEHIND, /* Implement RWF_DONTCACHE */
};
struct nfs_inode;
--
2.49.0
* Re: [RFC PATCH 0/3] Initial NFS client support for RWF_DONTCACHE
2025-04-23 4:25 [RFC PATCH 0/3] Initial NFS client support for RWF_DONTCACHE trondmy
` (2 preceding siblings ...)
2025-04-23 4:25 ` [RFC PATCH 3/3] NFS: Enable the RWF_DONTCACHE flag for the NFS client trondmy
@ 2025-04-23 14:38 ` Chuck Lever
2025-04-23 15:22 ` Matthew Wilcox
2025-04-24 21:29 ` [PATCH 4/3] NFS: add RWF_DONTCACHE support to LOCALIO Mike Snitzer
2025-07-14 6:22 ` [RFC PATCH 0/3] Initial NFS client support for RWF_DONTCACHE Christoph Hellwig
5 siblings, 1 reply; 14+ messages in thread
From: Chuck Lever @ 2025-04-23 14:38 UTC (permalink / raw)
To: trondmy, linux-nfs; +Cc: linux-mm, linux-fsdevel
On 4/23/25 12:25 AM, trondmy@kernel.org wrote:
> From: Trond Myklebust <trond.myklebust@hammerspace.com>
>
> The following patch set attempts to add support for the RWF_DONTCACHE
> flag in preadv2() and pwritev2() on NFS filesystems.
Hi Trond-
"RFC" in the subject field noted.
The cover letter does not explain why one would want this facility, nor
does it quantify the performance implications.
I can understand not wanting to cache on an NFS server, but don't you
want to maintain a data cache as close to applications as possible?
> The main issue is allowing support on 2 stage writes (i.e. unstable
> WRITE followed by a COMMIT) since those don't follow the current
> assumption that the 'dropbehind' flag can be fulfilled as soon as the
> writeback lock is dropped.
>
> Trond Myklebust (3):
> filemap: Add a helper for filesystems implementing dropbehind
> filemap: Mark folios as dropbehind in generic_perform_write()
> NFS: Enable the RWF_DONTCACHE flag for the NFS client
>
> fs/nfs/file.c | 2 ++
> fs/nfs/nfs4file.c | 2 ++
> fs/nfs/write.c | 12 +++++++++++-
> include/linux/nfs_page.h | 1 +
> include/linux/pagemap.h | 1 +
> mm/filemap.c | 21 +++++++++++++++++++++
> 6 files changed, 38 insertions(+), 1 deletion(-)
>
--
Chuck Lever
* Re: [RFC PATCH 0/3] Initial NFS client support for RWF_DONTCACHE
2025-04-23 14:38 ` [RFC PATCH 0/3] Initial NFS client support for RWF_DONTCACHE Chuck Lever
@ 2025-04-23 15:22 ` Matthew Wilcox
2025-04-23 15:30 ` Chuck Lever
0 siblings, 1 reply; 14+ messages in thread
From: Matthew Wilcox @ 2025-04-23 15:22 UTC (permalink / raw)
To: Chuck Lever; +Cc: trondmy, linux-nfs, linux-mm, linux-fsdevel
On Wed, Apr 23, 2025 at 10:38:37AM -0400, Chuck Lever wrote:
> On 4/23/25 12:25 AM, trondmy@kernel.org wrote:
> > From: Trond Myklebust <trond.myklebust@hammerspace.com>
> >
> > The following patch set attempts to add support for the RWF_DONTCACHE
> > flag in preadv2() and pwritev2() on NFS filesystems.
>
> Hi Trond-
>
> "RFC" in the subject field noted.
>
> The cover letter does not explain why one would want this facility, nor
> does it quantify the performance implications.
>
> I can understand not wanting to cache on an NFS server, but don't you
> want to maintain a data cache as close to applications as possible?
If you look at the original work for RWF_DONTCACHE, you'll see this is
the application providing the hint that it's doing a streaming access.
It's only applied to folios which are created as a result of this
access, and other accesses to these folios while the folios are in use
clear the flag. So it's kind of like O_DIRECT access, except that it
does go through the page cache so there's none of this funky alignment
requirement on the userspace buffers.
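For readers who have not used the flag, a userspace call might look like the following C sketch. The helper names `write_dontcache()` and `demo_roundtrip()` and the scratch-file path are made up for illustration. The fallback definition of `RWF_DONTCACHE` assumes the value in the Linux uapi header `<linux/fs.h>`, and the retry assumes the kernel rejects an unsupported flag with EOPNOTSUPP.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef RWF_DONTCACHE
#define RWF_DONTCACHE 0x00000080 /* assumed to match <linux/fs.h> */
#endif

/*
 * Write len bytes from buf at offset off, hinting that the pages need not
 * stay in the page cache. Retries without the flag if it is rejected.
 * No alignment is required of buf, unlike O_DIRECT.
 */
ssize_t write_dontcache(int fd, const void *buf, size_t len, off_t off)
{
	struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
	ssize_t ret = pwritev2(fd, &iov, 1, off, RWF_DONTCACHE);

	if (ret < 0 && errno == EOPNOTSUPP)
		ret = pwritev2(fd, &iov, 1, off, 0);
	return ret;
}

/* Round-trip check on a scratch file; returns 0 on success, -1 on failure. */
int demo_roundtrip(void)
{
	char path[] = "/tmp/dontcache-demo-XXXXXX";
	const char msg[] = "streaming data";
	char back[sizeof(msg)] = { 0 };
	int fd = mkstemp(path);
	int rc = -1;

	if (fd < 0)
		return -1;
	if (write_dontcache(fd, msg, sizeof(msg), 0) == (ssize_t)sizeof(msg) &&
	    pread(fd, back, sizeof(back), 0) == (ssize_t)sizeof(back) &&
	    memcmp(msg, back, sizeof(msg)) == 0)
		rc = 0;
	close(fd);
	unlink(path);
	return rc;
}
```

The flag is passed per call, so an application can mix hinted and ordinary I/O on the same file descriptor.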
* Re: [RFC PATCH 0/3] Initial NFS client support for RWF_DONTCACHE
2025-04-23 15:22 ` Matthew Wilcox
@ 2025-04-23 15:30 ` Chuck Lever
2025-04-24 16:51 ` Mike Snitzer
0 siblings, 1 reply; 14+ messages in thread
From: Chuck Lever @ 2025-04-23 15:30 UTC (permalink / raw)
To: Matthew Wilcox; +Cc: trondmy, linux-nfs, linux-mm, linux-fsdevel
On 4/23/25 11:22 AM, Matthew Wilcox wrote:
> On Wed, Apr 23, 2025 at 10:38:37AM -0400, Chuck Lever wrote:
>> On 4/23/25 12:25 AM, trondmy@kernel.org wrote:
>>> From: Trond Myklebust <trond.myklebust@hammerspace.com>
>>>
>>> The following patch set attempts to add support for the RWF_DONTCACHE
>>> flag in preadv2() and pwritev2() on NFS filesystems.
>>
>> Hi Trond-
>>
>> "RFC" in the subject field noted.
>>
>> The cover letter does not explain why one would want this facility, nor
>> does it quantify the performance implications.
>>
>> I can understand not wanting to cache on an NFS server, but don't you
>> want to maintain a data cache as close to applications as possible?
>
> If you look at the original work for RWF_DONTCACHE, you'll see this is
> the application providing the hint that it's doing a streaming access.
> It's only applied to folios which are created as a result of this
> access, and other accesses to these folios while the folios are in use
> clear the flag. So it's kind of like O_DIRECT access, except that it
> does go through the page cache so there's none of this funky alignment
> requirement on the userspace buffers.
OK, was wondering whether this behavior was opt-in; sounds like it is.
Thanks for setting me straight.
--
Chuck Lever
* Re: [RFC PATCH 0/3] Initial NFS client support for RWF_DONTCACHE
2025-04-23 15:30 ` Chuck Lever
@ 2025-04-24 16:51 ` Mike Snitzer
2025-04-24 16:59 ` Chuck Lever
0 siblings, 1 reply; 14+ messages in thread
From: Mike Snitzer @ 2025-04-24 16:51 UTC (permalink / raw)
To: Chuck Lever; +Cc: Matthew Wilcox, trondmy, linux-nfs, linux-mm, linux-fsdevel
On Wed, Apr 23, 2025 at 11:30:21AM -0400, Chuck Lever wrote:
> On 4/23/25 11:22 AM, Matthew Wilcox wrote:
> > On Wed, Apr 23, 2025 at 10:38:37AM -0400, Chuck Lever wrote:
> >> On 4/23/25 12:25 AM, trondmy@kernel.org wrote:
> >>> From: Trond Myklebust <trond.myklebust@hammerspace.com>
> >>>
> >>> The following patch set attempts to add support for the RWF_DONTCACHE
> >>> flag in preadv2() and pwritev2() on NFS filesystems.
> >>
> >> Hi Trond-
> >>
> >> "RFC" in the subject field noted.
> >>
> >> The cover letter does not explain why one would want this facility, nor
> >> does it quantify the performance implications.
> >>
> >> I can understand not wanting to cache on an NFS server, but don't you
> >> want to maintain a data cache as close to applications as possible?
> >
> > If you look at the original work for RWF_DONTCACHE, you'll see this is
> > the application providing the hint that it's doing a streaming access.
> > It's only applied to folios which are created as a result of this
> > access, and other accesses to these folios while the folios are in use
> > clear the flag. So it's kind of like O_DIRECT access, except that it
> > does go through the page cache so there's none of this funky alignment
> > requirement on the userspace buffers.
>
> OK, was wondering whether this behavior was opt-in; sounds like it is.
> Thanks for setting me straight.
Yes, it's certainly opt-in (requires setting a flag for each use).
Jens added support in fio relatively recently, see:
https://git.kernel.dk/cgit/fio/commit/?id=43c67b9f3a8808274bc1e0a3b7b70c56bb8a007f
Looking ahead relative to NFSD, as you know we've discussed exposing
per-export config controls to enable use of DONTCACHE. Finer controls
(e.g. only large sequential IO) would be more desirable but I'm not
aware of a simple means to detect such workloads with NFSD.
Could it be that we'd do well to carry through large folio support in
NFSD and expose a configurable threshold that, if met or exceeded,
causes DONTCACHE to be used?
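As a rough illustration of the threshold idea, the check itself could be as small as the following C sketch. The `dontcache_policy` structure, its `threshold_bytes` field and `policy_wants_dontcache()` are hypothetical names invented here; nothing in this thread suggests such an interface exists in NFSD.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tunable; a threshold of 0 disables the policy. */
struct dontcache_policy {
	size_t threshold_bytes;
};

/* Return true when an I/O of io_bytes meets or exceeds the threshold. */
bool policy_wants_dontcache(const struct dontcache_policy *p, size_t io_bytes)
{
	assert(p != NULL);
	return p->threshold_bytes != 0 && io_bytes >= p->threshold_bytes;
}
```

A size check alone does not detect sequential access, which is the harder part of the question raised above.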
What is the status of large folio support in NFSD? Is anyone actively
working on it?
Thanks,
Mike
* Re: [RFC PATCH 0/3] Initial NFS client support for RWF_DONTCACHE
2025-04-24 16:51 ` Mike Snitzer
@ 2025-04-24 16:59 ` Chuck Lever
0 siblings, 0 replies; 14+ messages in thread
From: Chuck Lever @ 2025-04-24 16:59 UTC (permalink / raw)
To: Mike Snitzer; +Cc: Matthew Wilcox, trondmy, linux-nfs, linux-mm, linux-fsdevel
On 4/24/25 12:51 PM, Mike Snitzer wrote:
> On Wed, Apr 23, 2025 at 11:30:21AM -0400, Chuck Lever wrote:
>> On 4/23/25 11:22 AM, Matthew Wilcox wrote:
>>> On Wed, Apr 23, 2025 at 10:38:37AM -0400, Chuck Lever wrote:
>>>> On 4/23/25 12:25 AM, trondmy@kernel.org wrote:
>>>>> From: Trond Myklebust <trond.myklebust@hammerspace.com>
>>>>>
>>>>> The following patch set attempts to add support for the RWF_DONTCACHE
>>>>> flag in preadv2() and pwritev2() on NFS filesystems.
>>>>
>>>> Hi Trond-
>>>>
>>>> "RFC" in the subject field noted.
>>>>
>>>> The cover letter does not explain why one would want this facility, nor
>>>> does it quantify the performance implications.
>>>>
>>>> I can understand not wanting to cache on an NFS server, but don't you
>>>> want to maintain a data cache as close to applications as possible?
>>>
>>> If you look at the original work for RWF_DONTCACHE, you'll see this is
>>> the application providing the hint that it's doing a streaming access.
>>> It's only applied to folios which are created as a result of this
>>> access, and other accesses to these folios while the folios are in use
>>> clear the flag. So it's kind of like O_DIRECT access, except that it
>>> does go through the page cache so there's none of this funky alignment
>>> requirement on the userspace buffers.
>>
>> OK, was wondering whether this behavior was opt-in; sounds like it is.
>> Thanks for setting me straight.
>
> Yes, it's certainly opt-in (requires setting a flag for each use).
> Jens added support in fio relatively recently, see:
> https://git.kernel.dk/cgit/fio/commit/?id=43c67b9f3a8808274bc1e0a3b7b70c56bb8a007f
>
> Looking ahead relative to NFSD, as you know we've discussed exposing
> per-export config controls to enable use of DONTCACHE. Finer controls
> (e.g. only large sequential IO) would be more desirable but I'm not
> aware of a simple means to detect such workloads with NFSD.
>
> Could it be that we'd do well to carry through large folio support in
> NFSD and expose a configurable threshold that, if met or exceeded,
> causes DONTCACHE to be used?
>
> What is the status of large folio support in NFSD? Is anyone actively
> working on it?
The nfsd_splice_actor() is the current bottleneck that converts large
folios from the page cache into a pipe full of single pages. The plan
is to measure the differences between NFSD's splice read and vectored
read paths. Hopefully they are close enough that we can remove splice
read. Beepy has said he will look into that performance measurement.
Anna has mentioned some work on large folio support using xdr_buf, but
I haven't reviewed patches there.
And, we need to get DMA API support for bio_vec iov iters to make the
socket and RDMA transports operate roughly equivalently. Leon has met
some resistance from the DMA maintainers, but pretty much every direct
consumer of the DMA API is anxious to get this facility.
Once those pre-requisites are in place, large folio support in NFSD
should be straightforward.
--
Chuck Lever
* [PATCH 4/3] NFS: add RWF_DONTCACHE support to LOCALIO
2025-04-23 4:25 [RFC PATCH 0/3] Initial NFS client support for RWF_DONTCACHE trondmy
` (3 preceding siblings ...)
2025-04-23 14:38 ` [RFC PATCH 0/3] Initial NFS client support for RWF_DONTCACHE Chuck Lever
@ 2025-04-24 21:29 ` Mike Snitzer
2025-07-14 6:22 ` [RFC PATCH 0/3] Initial NFS client support for RWF_DONTCACHE Christoph Hellwig
5 siblings, 0 replies; 14+ messages in thread
From: Mike Snitzer @ 2025-04-24 21:29 UTC (permalink / raw)
To: trondmy; +Cc: linux-nfs, linux-mm, linux-fsdevel
If DONTCACHE is used by the NFS client, set NFS_IOHDR_DONTCACHE. Also
update LOCALIO so that, if NFS_IOHDR_DONTCACHE is set, it sets
IOCB_DONTCACHE and so uses DONTCACHE for the local I/O.
Tweak nfs_local_iocb_alloc() so that it uses kiocb_set_rw_flags().
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfs/localio.c | 27 +++++++++++++++++----------
fs/nfs/pagelist.c | 4 ++++
fs/nfs/read.c | 2 ++
fs/nfs/write.c | 2 ++
include/linux/nfs_page.h | 1 +
include/linux/nfs_xdr.h | 1 +
6 files changed, 27 insertions(+), 10 deletions(-)
diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index b1911d9b6be21..ee46eb3f65776 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -331,26 +331,30 @@ nfs_local_iocb_free(struct nfs_local_kiocb *iocb)
static struct nfs_local_kiocb *
nfs_local_iocb_alloc(struct nfs_pgio_header *hdr,
- struct file *file, gfp_t flags)
+ struct file *file, int type, gfp_t gfp)
{
+ rwf_t flags = 0;
struct nfs_local_kiocb *iocb;
- iocb = kmalloc(sizeof(*iocb), flags);
+ iocb = kmalloc(sizeof(*iocb), gfp);
if (iocb == NULL)
return NULL;
iocb->bvec = nfs_bvec_alloc_and_import_pagevec(hdr->page_array.pagevec,
- hdr->page_array.npages, flags);
- if (iocb->bvec == NULL) {
- kfree(iocb);
- return NULL;
- }
+ hdr->page_array.npages, gfp);
+ if (iocb->bvec == NULL)
+ goto out;
if (localio_O_DIRECT_semantics &&
test_bit(NFS_IOHDR_ODIRECT, &hdr->flags)) {
iocb->kiocb.ki_filp = file;
iocb->kiocb.ki_flags = IOCB_DIRECT;
- } else
+ } else {
init_sync_kiocb(&iocb->kiocb, file);
+ if (test_bit(NFS_IOHDR_DONTCACHE, &hdr->flags))
+ flags |= RWF_DONTCACHE;
+ if (flags && kiocb_set_rw_flags(&iocb->kiocb, flags, type))
+ goto out;
+ }
iocb->kiocb.ki_pos = hdr->args.offset;
iocb->hdr = hdr;
@@ -358,6 +362,9 @@ nfs_local_iocb_alloc(struct nfs_pgio_header *hdr,
iocb->aio_complete_work = NULL;
return iocb;
+out:
+ kfree(iocb);
+ return NULL;
}
static void
@@ -499,7 +506,7 @@ nfs_do_local_read(struct nfs_pgio_header *hdr,
dprintk("%s: vfs_read count=%u pos=%llu\n",
__func__, hdr->args.count, hdr->args.offset);
- iocb = nfs_local_iocb_alloc(hdr, file, GFP_KERNEL);
+ iocb = nfs_local_iocb_alloc(hdr, file, READ, GFP_KERNEL);
if (iocb == NULL)
return -ENOMEM;
iocb->localio = localio;
@@ -698,7 +705,7 @@ nfs_do_local_write(struct nfs_pgio_header *hdr,
__func__, hdr->args.count, hdr->args.offset,
(hdr->args.stable == NFS_UNSTABLE) ? "unstable" : "stable");
- iocb = nfs_local_iocb_alloc(hdr, file, GFP_NOIO);
+ iocb = nfs_local_iocb_alloc(hdr, file, WRITE, GFP_NOIO);
if (iocb == NULL)
return -ENOMEM;
iocb->localio = localio;
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index 11968dcb72431..eefda82c1ece8 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -824,6 +824,7 @@ void nfs_pageio_init(struct nfs_pageio_descriptor *desc,
int io_flags)
{
desc->pg_moreio = 0;
+ desc->pg_dontcache = 0;
desc->pg_inode = inode;
desc->pg_ops = pg_ops;
desc->pg_completion_ops = compl_ops;
@@ -932,6 +933,9 @@ int nfs_generic_pgio(struct nfs_pageio_descriptor *desc,
return desc->pg_error;
}
+ if (desc->pg_dontcache)
+ set_bit(NFS_IOHDR_DONTCACHE, &hdr->flags);
+
if ((desc->pg_ioflags & FLUSH_COND_STABLE) &&
(desc->pg_moreio || nfs_reqs_to_commit(&cinfo)))
desc->pg_ioflags &= ~FLUSH_COND_STABLE;
diff --git a/fs/nfs/read.c b/fs/nfs/read.c
index 81bd1b9aba176..51f4eaa1512bb 100644
--- a/fs/nfs/read.c
+++ b/fs/nfs/read.c
@@ -316,6 +316,8 @@ int nfs_read_add_folio(struct nfs_pageio_descriptor *pgio,
nfs_readpage_release(new, error);
goto out;
}
+ if (folio_test_dropbehind(folio))
+ pgio->pg_dontcache = 1;
return 0;
out:
return error;
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index e0ac439ab211b..88b7bd64c7864 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -684,6 +684,8 @@ static int nfs_page_async_flush(struct folio *folio,
static int nfs_do_writepage(struct folio *folio, struct writeback_control *wbc,
struct nfs_pageio_descriptor *pgio)
{
+ if (folio_test_dropbehind(folio))
+ pgio->pg_dontcache = 1;
nfs_pageio_cond_complete(pgio, folio->index);
return nfs_page_async_flush(folio, wbc, pgio);
}
diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h
index 1a017b5b476fa..44bd9141820c4 100644
--- a/include/linux/nfs_page.h
+++ b/include/linux/nfs_page.h
@@ -118,6 +118,7 @@ struct nfs_pageio_descriptor {
u32 pg_mirror_idx; /* current mirror */
unsigned short pg_maxretrans;
unsigned char pg_moreio : 1;
+ unsigned char pg_dontcache : 1;
};
/* arbitrarily selected limit to number of mirrors */
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 4f0d89893bcb8..3e82dea65c8c9 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -1634,6 +1634,7 @@ enum {
NFS_IOHDR_RESEND_MDS,
NFS_IOHDR_UNSTABLE_WRITES,
NFS_IOHDR_ODIRECT,
+ NFS_IOHDR_DONTCACHE,
};
struct nfs_io_completion;
--
2.44.0
* Re: [RFC PATCH 1/3] filemap: Add a helper for filesystems implementing dropbehind
2025-04-23 4:25 ` [RFC PATCH 1/3] filemap: Add a helper for filesystems implementing dropbehind trondmy
@ 2025-04-24 21:30 ` Mike Snitzer
0 siblings, 0 replies; 14+ messages in thread
From: Mike Snitzer @ 2025-04-24 21:30 UTC (permalink / raw)
To: trondmy; +Cc: linux-nfs, linux-mm, linux-fsdevel
On Wed, Apr 23, 2025 at 12:25:30AM -0400, trondmy@kernel.org wrote:
> From: Trond Myklebust <trond.myklebust@hammerspace.com>
>
> Add a helper to allow filesystems to attempt to free the 'dropbehind'
> folio.
>
> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Reviewed-by: Mike Snitzer <snitzer@kernel.org>
* Re: [RFC PATCH 2/3] filemap: Mark folios as dropbehind in generic_perform_write()
2025-04-23 4:25 ` [RFC PATCH 2/3] filemap: Mark folios as dropbehind in generic_perform_write() trondmy
@ 2025-04-24 21:30 ` Mike Snitzer
0 siblings, 0 replies; 14+ messages in thread
From: Mike Snitzer @ 2025-04-24 21:30 UTC (permalink / raw)
To: trondmy; +Cc: linux-nfs, linux-mm, linux-fsdevel
On Wed, Apr 23, 2025 at 12:25:31AM -0400, trondmy@kernel.org wrote:
> From: Trond Myklebust <trond.myklebust@hammerspace.com>
>
> The iocb flags are not passed down to the write_begin() callback that is
> allocating the folio, so we need to set the dropbehind folio flag from
> inside the generic_perform_write() function itself.
>
> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Reviewed-by: Mike Snitzer <snitzer@kernel.org>
* Re: [RFC PATCH 3/3] NFS: Enable the RWF_DONTCACHE flag for the NFS client
2025-04-23 4:25 ` [RFC PATCH 3/3] NFS: Enable the RWF_DONTCACHE flag for the NFS client trondmy
@ 2025-04-24 21:31 ` Mike Snitzer
0 siblings, 0 replies; 14+ messages in thread
From: Mike Snitzer @ 2025-04-24 21:31 UTC (permalink / raw)
To: trondmy; +Cc: linux-nfs, linux-mm, linux-fsdevel
On Wed, Apr 23, 2025 at 12:25:32AM -0400, trondmy@kernel.org wrote:
> From: Trond Myklebust <trond.myklebust@hammerspace.com>
>
> While the NFS readahead code has no problems using the generic code to
> manage the dropbehind behaviour enabled by RWF_DONTCACHE, the write code
> needs to deal with the fact that NFS writeback uses a 2 step process
> (UNSTABLE write followed by COMMIT).
> This commit replaces the use of the folio dropbehind flag with a local
> NFS request flag that triggers the dropbehind behaviour once the data
> has been written to stable storage.
>
> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Reviewed-by: Mike Snitzer <snitzer@kernel.org>
* Re: [RFC PATCH 0/3] Initial NFS client support for RWF_DONTCACHE
2025-04-23 4:25 [RFC PATCH 0/3] Initial NFS client support for RWF_DONTCACHE trondmy
` (4 preceding siblings ...)
2025-04-24 21:29 ` [PATCH 4/3] NFS: add RWF_DONTCACHE support to LOCALIO Mike Snitzer
@ 2025-07-14 6:22 ` Christoph Hellwig
5 siblings, 0 replies; 14+ messages in thread
From: Christoph Hellwig @ 2025-07-14 6:22 UTC (permalink / raw)
To: trondmy; +Cc: linux-nfs, linux-mm, linux-fsdevel
I just noticed this RFC series in the nfs testing tree.
Given that it is not in linux-next, I guess it really is just for testing,
but if you are interested in RWF_DONTCACHE support for NFS, please
integrate with the series at:
https://lore.kernel.org/linux-fsdevel/20250710101404.362146-1-chentaotao@didiglobal.com/#r
That adds full dontcache support for the write_begin based file systems.