From: Mike Snitzer <snitzer@kernel.org>
To: Chuck Lever <chuck.lever@oracle.com>, Jeff Layton <jlayton@kernel.org>
Cc: linux-nfs@vger.kernel.org
Subject: [PATCH v9 9/9] NFSD: use /end/ of rq_pages for misaligned DIO READ's start_extra page
Date: Wed, 3 Sep 2025 16:51:21 -0400
Message-ID: <20250903205121.41380-10-snitzer@kernel.org>
In-Reply-To: <20250903205121.41380-1-snitzer@kernel.org>
This commit works around what seems like a flexfiles+rpcrdma bug, and
Chuck Lever clarified that this shouldn't be needed:
"Yes, the extra page needs to come from rq_pages. But I don't see
why it should come from the /end/ of rq_pages."
However, when using NFSD DIRECT for READ with an NFS 4.2 client that
uses pNFS flexfiles over RDMA (where the client gets a layout that
uses a v3 DS), it is easy to see a data mismatch when NFSD handles a
misaligned DIO READ. If the same misaligned DIO READ is issued directly
to the v3 DS over RDMA (so flexfiles is _not_ used), then no data
mismatch occurs.
Therefore, until this bug can be found, NFSD must take the
'start_extra' page from the part of rq_pages that follows the READ
payload requested by the NFS client (RDMA memory) whenever expanding a
misaligned READ requires reading an extra partial page at the start of
the READ so that it is DIO-aligned. Otherwise, if the 'start_extra'
page is taken from the beginning of rq_pages, the pNFS flexfiles
client will see data mismatch corruption.
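The page ordering described above can be sketched in userspace C. This
is only a toy model, not the NFSD code: struct toy_rqst,
toy_assign_pages() and MAX_VECS are hypothetical names, a "page" is
represented by its index into an imagined rq_pages[] array, and the
payload loop is assumed to take one page per vector in order.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_VECS 16	/* hypothetical limit for this toy model */

/* Toy stand-in for the parts of struct svc_rqst that matter here. */
struct toy_rqst {
	size_t next_page;		/* models rqstp->rq_next_page */
	size_t bvec_page[MAX_VECS];	/* models rq_bvec[i].bv_page */
};

/*
 * Assign pages for a READ that has one leading start_extra vector
 * followed by payload_pages payload vectors. Returns the vector count.
 *
 * extra_from_end == 0: start_extra takes the first page (old behavior).
 * extra_from_end != 0: start_extra takes the page after the payload
 *                      pages (the workaround in this patch).
 */
static size_t toy_assign_pages(struct toy_rqst *rq, size_t payload_pages,
			       int extra_from_end)
{
	size_t v = 0, i;

	assert(payload_pages + 1 <= MAX_VECS);

	if (!extra_from_end)
		rq->bvec_page[v] = rq->next_page++;
	v++;	/* slot 0 is reserved for start_extra either way */

	for (i = 0; i < payload_pages; i++)
		rq->bvec_page[v++] = rq->next_page++;

	if (extra_from_end)
		rq->bvec_page[0] = rq->next_page++;	/* "set below" */

	return v;
}
```

With three payload pages, the old ordering yields pages {0, 1, 2, 3}
for vectors 0..3, while the workaround yields {3, 0, 1, 2}: the payload
now occupies the first pages of rq_pages and the start_extra page comes
after them.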
The data mismatch was found, and the fix of using the end of rq_pages
was then verified, using the 'dt' utility:
dt of=/mnt/share1/dt_a.test passes=1 bs=47008 count=2 \
iotype=sequential pattern=iot onerr=abort oncerr=abort
See: https://github.com/RobinTMiller/dt.git
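To illustrate why bs=47008 produces a misaligned READ, here is a
minimal userspace sketch of the range expansion. It is not the kernel
helper: struct dio_expansion and expand_read() are hypothetical names,
and the 4096-byte DIO alignment used in the example below is an
assumption (the real alignment depends on the underlying file).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result of expanding a READ to a DIO-aligned range. */
struct dio_expansion {
	uint64_t start;		/* aligned start offset */
	uint64_t end;		/* aligned end offset (exclusive) */
	uint32_t start_extra;	/* extra bytes read before the payload */
	uint32_t end_extra;	/* extra bytes read after the payload */
};

/* Round [offset, offset + len) outward to multiples of align. */
static struct dio_expansion expand_read(uint64_t offset, uint64_t len,
					uint32_t align)
{
	struct dio_expansion x;
	uint64_t mask = (uint64_t)align - 1;

	/* align must be a non-zero power of two for the masking below */
	assert(align != 0 && (align & (align - 1)) == 0);

	x.start = offset & ~mask;
	x.end = (offset + len + mask) & ~mask;
	x.start_extra = (uint32_t)(offset - x.start);
	x.end_extra = (uint32_t)(x.end - (offset + len));
	return x;
}
```

For a READ of 47008 bytes at offset 47008 (the second block of the dt
run) with the assumed 4096-byte alignment, expand_read(47008, 47008,
4096) gives start = 45056, start_extra = 1952, end = 94208 and
end_extra = 192, so a non-zero start_extra page is needed.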
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfsd/vfs.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 5b3c6072b6f5c..e9ddeec3c9a32 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1263,7 +1263,7 @@ __be32 nfsd_iter_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
if (read_dio.start_extra) {
len = read_dio.start_extra;
bvec_set_page(&rqstp->rq_bvec[v],
- *(rqstp->rq_next_page++),
+ NULL, /* set below */
len, PAGE_SIZE - len);
total -= len;
++v;
@@ -1288,6 +1288,11 @@ __be32 nfsd_iter_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
base = 0;
}
WARN_ON_ONCE(v > rqstp->rq_maxpages);
+ /* FIXME: having the start_extra page come from the end of
+ * rq_pages[] works around what seems to be a flexfiles+rpcrdma bug.
+ */
+ if ((kiocb.ki_flags & IOCB_DIRECT) && read_dio.start_extra)
+ rqstp->rq_bvec[0].bv_page = *(rqstp->rq_next_page++);
trace_nfsd_read_vector(rqstp, fhp, offset, in_count);
iov_iter_bvec(&iter, ITER_DEST, rqstp->rq_bvec, v, in_count);
--
2.44.0