From: Mike Snitzer <snitzer@kernel.org>
To: Anna Schumaker <anna.schumaker@oracle.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>,
linux-nfs@vger.kernel.org
Subject: [v6.18-rcX PATCH 5/3] nfs/localio: do not issue misaligned DIO out-of-order
Date: Wed, 29 Oct 2025 19:19:30 -0400
Message-ID: <aQKhAksYqPjOzUNv@kernel.org>
In-Reply-To: <aP-xXB_ht8F1i5YQ@kernel.org>
From https://lore.kernel.org/linux-nfs/aQHASIumLJyOoZGH@infradead.org/
On Wed, Oct 29, 2025 at 12:20:40AM -0700, Christoph Hellwig wrote:
> On Mon, Oct 27, 2025 at 12:18:30PM -0400, Mike Snitzer wrote:
> > LOCALIO's misaligned DIO will issue head/tail followed by O_DIRECT
> > middle (via AIO completion of that aligned middle). So out of order
> > relative to file offset.
>
> That's in general a really bad idea. It will obviously work, but
> both on SSDs and out of place write file systems it is a sure way
> to increase your garbage collection overhead a lot down the line.
Fix this by never issuing misaligned DIO out-of-order. This fix means
the DIO-aligned segment will only use AIO completion if there is no
misaligned end segment. Otherwise, all 3 segments of a misaligned DIO
will be issued without AIO completion to ensure file offset increases
properly for all partial READ or WRITE situations.
Fixes: c817248fc831 ("nfs/localio: add proper O_DIRECT support for READ and WRITE")
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfs/localio.c | 83 +++++++++++++++++-------------------------------
1 file changed, 29 insertions(+), 54 deletions(-)
Anna, apologies for stringing fixes together like this; and that this
same commit c817248fc831 has so many follow-on Fixes is not lost on
me. But the full series of commit c817248fc831 fixes is composed of:
[v6.18-rcX PATCH 1/3] nfs/localio: remove unecessary ENOTBLK handling in DIO WRITE support
[v6.18-rcX PATCH 2/3] nfs/localio: add refcounting for each iocb IO associated with NFS pgio header
[v6.18-rcX PATCH 3/3] nfs/localio: backfill missing partial read support for misaligned DIO
[v6.18-rcX PATCH 4/3] nfs/localio: Ensure DIO WRITE's IO on stable storage upon completion
[v6.18-rcX PATCH 5/3] nfs/localio: do not issue misaligned DIO out-of-order
NOTE: PATCH 4/3's use of IOCB_DSYNC|IOCB_SYNC _is_ conservative, but I
will audit and adjust this further (informed by NFSD Direct's ongoing
evolution for handling this same situation) for the v6.19 merge window.
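For reviewers who want the intended ordering spelled out, here is a
minimal userspace sketch of the rule this patch enforces: segments are
planned in increasing file offset (start, middle, end), and only the
last segment may use AIO completion, and only if it is the DIO-aligned
one. This is an illustration, not the kernel code: plan_segments(),
struct seg and MAX_SEGS are hypothetical names, the alignment math is
an assumption standing in for what local_dio already provides, and the
"no aligned middle" case is modeled as a single buffered segment rather
than the return-0 fallback in nfs_local_iters_setup_dio().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SEGS 3	/* hypothetical: start, middle, end */

/* Hypothetical model of one issued segment, not a kernel structure. */
struct seg {
	unsigned long long offset;	/* file offset of this segment */
	size_t len;			/* length in bytes */
	bool dio_aligned;		/* would be issued with IOCB_DIRECT */
	bool use_aio;			/* would set ki_complete (AIO completion) */
};

/*
 * Split [offset, offset + len) on an assumed DIO alignment and fill
 * segs[] in increasing file offset. Returns the number of segments.
 */
static int plan_segments(unsigned long long offset, size_t len,
			 size_t align, struct seg segs[MAX_SEGS])
{
	unsigned long long end = offset + len;
	unsigned long long mid_start, mid_end;
	int n = 0;

	assert(align > 0);
	mid_start = (offset + align - 1) / align * align;	/* round up */
	mid_end = end / align * align;				/* round down */

	if (mid_start >= mid_end) {
		/* No DIO-aligned middle: model as one buffered segment */
		segs[n++] = (struct seg){ offset, len, false, false };
		return n;
	}

	if (mid_start > offset)		/* misaligned start */
		segs[n++] = (struct seg){ offset, mid_start - offset,
					  false, false };

	/* DIO-aligned middle */
	segs[n++] = (struct seg){ mid_start, mid_end - mid_start,
				  true, false };

	if (end > mid_end)		/* misaligned end, issued after middle */
		segs[n++] = (struct seg){ mid_end, end - mid_end,
					  false, false };

	/* Only use AIO completion if the DIO-aligned segment is last */
	segs[n - 1].use_aio = segs[n - 1].dio_aligned;
	return n;
}
```

With an assumed 4096-byte alignment, a 10000-byte request at offset
1000 yields three segments at offsets 1000, 4096 and 8192, none of
which use AIO completion; a 7192-byte request at offset 1000 yields two
segments, and the aligned middle (now last) does use AIO completion.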
diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index ca9df8d09c2d..018fa332aae4 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -40,7 +40,6 @@ struct nfs_local_kiocb {
void (*aio_complete_work)(struct work_struct *);
struct nfsd_file *localio;
/* Begin mostly DIO-specific members */
- size_t end_len;
short int end_iter_index;
atomic_t n_iters;
bool iter_is_dio_aligned[NFSLOCAL_MAX_IOS];
@@ -411,27 +410,8 @@ nfs_local_iters_setup_dio(struct nfs_local_kiocb *iocb, int rw,
++n_iters;
}
- /* Setup misaligned end?
- * If so, the end is purposely setup to be issued using buffered IO
- * before the middle (which will use DIO, if DIO-aligned, with AIO).
- * This creates problems if/when the end results in short read or write.
- * So must save index and length of end to handle this corner case.
- */
- if (local_dio->end_len) {
- iov_iter_bvec(&iters[n_iters], rw, iocb->bvec, nvecs, len);
- iocb->offset[n_iters] = local_dio->end_offset;
- iov_iter_advance(&iters[n_iters],
- local_dio->start_len + local_dio->middle_len);
- iocb->iter_is_dio_aligned[n_iters] = false;
- /* Save index and length of end */
- iocb->end_iter_index = n_iters;
- iocb->end_len = local_dio->end_len;
- atomic_inc(&iocb->n_iters);
- ++n_iters;
- }
-
- /* Setup DIO-aligned middle to be issued last, to allow for
- * DIO with AIO completion (see nfs_local_call_{read,write}).
+ /* Setup DIO-aligned middle, if there is no misaligned end (below)
+ * then AIO completion is used, see nfs_local_call_{read,write}
*/
iov_iter_bvec(&iters[n_iters], rw, iocb->bvec, nvecs, len);
if (local_dio->start_len)
@@ -448,8 +428,21 @@ nfs_local_iters_setup_dio(struct nfs_local_kiocb *iocb, int rw,
iocb->hdr->args.offset, len, local_dio);
return 0; /* no DIO-aligned IO possible */
}
+ iocb->end_iter_index = n_iters;
++n_iters;
+ /* Setup misaligned end? */
+ if (local_dio->end_len) {
+ iov_iter_bvec(&iters[n_iters], rw, iocb->bvec, nvecs, len);
+ iocb->offset[n_iters] = local_dio->end_offset;
+ iov_iter_advance(&iters[n_iters],
+ local_dio->start_len + local_dio->middle_len);
+ iocb->iter_is_dio_aligned[n_iters] = false;
+ atomic_inc(&iocb->n_iters);
+ iocb->end_iter_index = n_iters;
+ ++n_iters;
+ }
+
return n_iters;
}
@@ -636,27 +629,18 @@ static void nfs_local_call_read(struct work_struct *work)
/* DIO-aligned middle is always issued last with AIO completion */
if (iocb->iter_is_dio_aligned[i]) {
iocb->kiocb.ki_flags |= IOCB_DIRECT;
- iocb->kiocb.ki_complete = nfs_local_read_aio_complete;
- iocb->aio_complete_work = nfs_local_read_aio_complete_work;
+ /* Only use AIO completion if DIO-aligned segment is last */
+ if (i == iocb->end_iter_index) {
+ iocb->kiocb.ki_complete = nfs_local_read_aio_complete;
+ iocb->aio_complete_work = nfs_local_read_aio_complete_work;
+ }
}
iocb->kiocb.ki_pos = iocb->offset[i];
status = filp->f_op->read_iter(&iocb->kiocb, &iocb->iters[i]);
if (status != -EIOCBQUEUED) {
- if (unlikely(status >= 0 && status < iocb->iters[i].count)) {
- /* partial read */
- if (i == iocb->end_iter_index) {
- /* Must not account DIO partial end, otherwise (due
- * to end being issued before middle): the partial
- * read accounting in nfs_local_read_done()
- * would incorrectly advance hdr->args.offset
- */
- status = 0;
- } else {
- /* Partial read at start or middle, force done */
- force_done = true;
- }
- }
+ if (unlikely(status >= 0 && status < iocb->iters[i].count))
+ force_done = true; /* Partial read */
if (nfs_local_pgio_done(iocb, status, force_done)) {
nfs_local_read_iocb_done(iocb);
break;
@@ -854,27 +838,18 @@ static void nfs_local_call_write(struct work_struct *work)
/* DIO-aligned middle is always issued last with AIO completion */
if (iocb->iter_is_dio_aligned[i]) {
iocb->kiocb.ki_flags |= IOCB_DIRECT;
- iocb->kiocb.ki_complete = nfs_local_write_aio_complete;
- iocb->aio_complete_work = nfs_local_write_aio_complete_work;
+ /* Only use AIO completion if DIO-aligned segment is last */
+ if (i == iocb->end_iter_index) {
+ iocb->kiocb.ki_complete = nfs_local_write_aio_complete;
+ iocb->aio_complete_work = nfs_local_write_aio_complete_work;
+ }
}
iocb->kiocb.ki_pos = iocb->offset[i];
status = filp->f_op->write_iter(&iocb->kiocb, &iocb->iters[i]);
if (status != -EIOCBQUEUED) {
- if (unlikely(status >= 0 && status < iocb->iters[i].count)) {
- /* partial write */
- if (i == iocb->end_iter_index) {
- /* Must not account DIO partial end, otherwise (due
- * to end being issued before middle): the partial
- * write accounting in nfs_local_write_done()
- * would incorrectly advance hdr->args.offset
- */
- status = 0;
- } else {
- /* Partial write at start or middle, force done */
- force_done = true;
- }
- }
+ if (unlikely(status >= 0 && status < iocb->iters[i].count))
+ force_done = true; /* Partial write */
if (nfs_local_pgio_done(iocb, status, force_done)) {
nfs_local_write_iocb_done(iocb);
break;
--
2.44.0
Thread overview: 18+ messages
2025-10-19 9:29 [Bug report] xfstests generic/323 over NFS hit BUG: KASAN: slab-use-after-free in nfs_local_call_read on 6.18.0-rc1 Yongcheng Yang
2025-10-19 15:18 ` Trond Myklebust
2025-10-19 16:26 ` Mike Snitzer
2025-10-20 18:24 ` Mike Snitzer
2025-10-27 13:08 ` [v6.18-rcX PATCH 0/3] nfs/localio: fixes for recent misaligned DIO changes Mike Snitzer
2025-10-27 13:08 ` [v6.18-rcX PATCH 1/3] nfs/localio: remove unecessary ENOTBLK handling in DIO WRITE support Mike Snitzer
2025-10-27 13:08 ` [v6.18-rcX PATCH 2/3] nfs/localio: add refcounting for each iocb IO associated with NFS pgio header Mike Snitzer
2025-10-27 13:19 ` Christoph Hellwig
2025-10-27 13:55 ` Mike Snitzer
2025-10-27 14:45 ` Christoph Hellwig
2025-10-27 13:08 ` [v6.18-rcX PATCH 3/3] nfs/localio: backfill missing partial read support for misaligned DIO Mike Snitzer
2025-10-27 17:52 ` [v6.18-rcX PATCH 4/3] nfs/localio: Ensure DIO WRITE's IO on stable storage upon completion Mike Snitzer
2025-10-29 23:19 ` Mike Snitzer [this message]
2025-10-31 1:50 ` [v6.18-rcX PATCH 5/3] nfs/localio: do not issue misaligned DIO out-of-order Mike Snitzer
2025-10-31 13:33 ` Anna Schumaker
2025-11-04 18:02 ` [v6.18-rcX PATCH v2] " Mike Snitzer
2025-11-06 2:50 ` Mike Snitzer
2025-11-06 3:03 ` [v6.18-rcX PATCH v3 5/3] " Mike Snitzer