From: Mike Snitzer <snitzer@kernel.org>
To: Chuck Lever <chuck.lever@oracle.com>, Jeff Layton <jlayton@kernel.org>
Cc: linux-nfs@vger.kernel.org
Subject: [PATCH v5 0/7] NFSD: add "NFSD DIRECT" and "NFSD DONTCACHE" IO modes
Date: Thu, 7 Aug 2025 12:25:37 -0400
Message-ID: <20250807162544.17191-1-snitzer@kernel.org>
Hi,
Some workloads benefit from NFSD avoiding the page cache, particularly
those with a working set that is significantly larger than available
system memory. This patchset introduces _optional_ support to
configure the use of O_DIRECT or DONTCACHE for NFSD's READ and WRITE
support. NFSD's default of using the page cache is left unchanged.
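As a rough illustration only (this is not code from the series), the
sketch below shows why misaligned IO needs extra care under O_DIRECT:
only the part of a request that starts and ends on the required
alignment can go direct, and any unaligned head or tail must be handled
some other way. The names dio_split and dio_split_range() are
hypothetical, and the alignment value is assumed to be a power of two
such as one reported via STATX_DIOALIGN.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper for illustration; not taken from the patch series. */
struct dio_split {
	uint64_t head;   /* bytes before the first aligned offset */
	uint64_t middle; /* aligned bytes that could be issued with O_DIRECT */
	uint64_t tail;   /* bytes after the last aligned offset */
};

/*
 * Split the byte range [offset, offset + len) into an unaligned head,
 * an aligned middle and an unaligned tail.
 * Assumption: align is a non-zero power of two.
 */
static struct dio_split dio_split_range(uint64_t offset, uint64_t len,
					uint64_t align)
{
	struct dio_split s = { 0, 0, 0 };
	uint64_t end = offset + len;
	uint64_t start_aligned, end_aligned;

	assert(align != 0 && (align & (align - 1)) == 0);

	/* Round the start up and the end down to the alignment. */
	start_aligned = (offset + align - 1) & ~(align - 1);
	end_aligned = end & ~(align - 1);

	if (start_aligned >= end_aligned) {
		/* No fully aligned block fits inside the range. */
		s.head = len;
		return s;
	}

	s.head = start_aligned - offset;
	s.middle = end_aligned - start_aligned;
	s.tail = end - end_aligned;
	return s;
}
```

For example, a 10000 byte request at offset 1000 with 4096 byte
alignment splits into a 3096 byte head, a 4096 byte aligned middle and
a 2808 byte tail.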
The performance win associated with using NFSD DIRECT was previously
summarized here:
https://lore.kernel.org/linux-nfs/aEslwqa9iMeZjjlV@kernel.org/
This picture offers a nice summary of performance gains:
https://original.art/NFSD_direct_vs_buffered_IO.jpg
This series builds on what has been staged in the nfsd-testing branch.
This code has proven to work well during my testing. Any suggestions
for further refinement are welcome.
Thanks,
Mike
Changes since v4:
- removed use of iov_iter_is_aligned() in earlier patches to avoid
any conflict with Keith's patchset, which ultimately removes the
iov_iter_is_aligned() interface.
- refactored the final "NFSD: issue WRITEs using O_DIRECT even if IO
is misaligned" patch to have the lightest touch possible on
nfsd_vfs_write() for the default buffered IO case.
- updated patch headers where needed.
- all patches have changed somewhat, so removed all Reviewed-by tags
from Jeff and Signed-off-by tags from Chuck.
- Series checked with checkpatch.pl, sparse and verified bisect safe.
Changes since v3:
- fixed nfsd_vfs_write() so work that only needs to happen once, after
IO is submitted, isn't performed each iteration of the for loop,
thanks for catching this Chuck.
- updated NFSD's misaligned READ and WRITE handling to not use
iov_iter_is_aligned() because it will soon be removed upstream.
- Chuck, provided both you and Jeff agree with patch 1's incremental
changes, patch 1 should be folded into what you already have in your
nfsd-testing branch (more justification in patch 1's header)
- dropped the NFSD misaligned DIO NFS reexport patch in favor of
adding STATX_DIOALIGN support to NFS (the patch to add support will
be provided in the next NFS DIRECT v7 patchset that I'll post soon).
Changes since v2:
- fixed patch 2 by moving redundant work out of nfsd_vfs_write()'s for
loop, thanks to Jeff's review.
- added Jeff's Reviewed-by to patches 1-3.
- Left patch 4 in the series because it is pragmatic, but feel free to
drop it if you'd prefer to see this cat skinned a different way.
Changes since v1:
- switched to using an EVENT_CLASS to create nfsd_analyze_{read,write}_dio
- added a 4th patch: if the user configured use of NFSD_IO_DIRECT then
NFS reexports should use it too (in future, per-export controls will
give us finer-grained control; until then we'd do well to offer
comprehensive use of NFSD_IO_DIRECT if it is enabled).
Mike Snitzer (7):
NFSD: filecache: add STATX_DIOALIGN and STATX_DIO_READ_ALIGN support
NFSD: pass nfsd_file to nfsd_iter_read()
NFSD: add io_cache_read controls to debugfs interface
NFSD: add io_cache_write controls to debugfs interface
NFSD: filecache: only get DIO alignment attrs if NFSD_IO_DIRECT enabled
NFSD: issue READs using O_DIRECT even if IO is misaligned
NFSD: issue WRITEs using O_DIRECT even if IO is misaligned
fs/nfsd/debugfs.c | 102 +++++++++++
fs/nfsd/filecache.c | 36 ++++
fs/nfsd/filecache.h | 4 +
fs/nfsd/nfs4xdr.c | 8 +-
fs/nfsd/nfsd.h | 10 +
fs/nfsd/nfsfh.c | 4 +
fs/nfsd/trace.h | 61 +++++++
fs/nfsd/vfs.c | 366 +++++++++++++++++++++++++++++++++++--
fs/nfsd/vfs.h | 2 +-
include/linux/sunrpc/svc.h | 5 +-
10 files changed, 575 insertions(+), 23 deletions(-)
--
2.44.0