linux-nfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Mike Snitzer <snitzer@kernel.org>
To: Anna Schumaker <anna.schumaker@oracle.com>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>,
	linux-nfs@vger.kernel.org
Subject: [v6.18-rcX PATCH 5/3] nfs/localio: do not issue misaligned DIO out-of-order
Date: Wed, 29 Oct 2025 19:19:30 -0400	[thread overview]
Message-ID: <aQKhAksYqPjOzUNv@kernel.org> (raw)
In-Reply-To: <aP-xXB_ht8F1i5YQ@kernel.org>

From https://lore.kernel.org/linux-nfs/aQHASIumLJyOoZGH@infradead.org/

On Wed, Oct 29, 2025 at 12:20:40AM -0700, Christoph Hellwig wrote:
> On Mon, Oct 27, 2025 at 12:18:30PM -0400, Mike Snitzer wrote:
> > LOCALIO's misaligned DIO will issue head/tail followed by O_DIRECT
> > middle (via AIO completion of that aligned middle). So out of order
> > relative to file offset.
>
> That's in general a really bad idea.  It will obviously work, but
> both on SSDs and out of place write file systems it is a sure way
> to increase your garbage collection overhead a lot down the line.

Fix this by never issuing misaligned DIO out-of-order. This fix means
the DIO-aligned segment will only use AIO completion if there is no
misaligned end segment. Otherwise, all 3 segments of a misaligned DIO
will be issued without AIO completion to ensure file offset increases
properly for all partial READ or WRITE situations.

Fixes: c817248fc831 ("nfs/localio: add proper O_DIRECT support for READ and WRITE")
Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
 fs/nfs/localio.c | 83 +++++++++++++++++-------------------------------
 1 file changed, 29 insertions(+), 54 deletions(-)

Anna, apologies for stringing fixes together like this; and that this
same commit c817248fc831 has so many follow-on Fixes is not lost on
me.  But the full series of commit c817248fc831 fixes is composed of:

[v6.18-rcX PATCH 1/3] nfs/localio: remove unecessary ENOTBLK handling in DIO WRITE support
[v6.18-rcX PATCH 2/3] nfs/localio: add refcounting for each iocb IO associated with NFS pgio header
[v6.18-rcX PATCH 3/3] nfs/localio: backfill missing partial read support for misaligned DIO
[v6.18-rcX PATCH 4/3] nfs/localio: Ensure DIO WRITE's IO on stable storage upon completion
[v6.18-rcX PATCH 5/3] nfs/localio: do not issue misaligned DIO out-of-order

NOTE: PATCH 4/3's use of IOCBD_DSYNC|IOCB_SYNC _is_ conservative, but I
will audit and adjust this further (informed by NFSD Direct's ongoing
evolution for handling this same situaiton) for the v6.19 merge window.

diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index ca9df8d09c2d..018fa332aae4 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -40,7 +40,6 @@ struct nfs_local_kiocb {
 	void (*aio_complete_work)(struct work_struct *);
 	struct nfsd_file	*localio;
 	/* Begin mostly DIO-specific members */
-	size_t                  end_len;
 	short int		end_iter_index;
 	atomic_t		n_iters;
 	bool			iter_is_dio_aligned[NFSLOCAL_MAX_IOS];
@@ -411,27 +410,8 @@ nfs_local_iters_setup_dio(struct nfs_local_kiocb *iocb, int rw,
 		++n_iters;
 	}
 
-	/* Setup misaligned end?
-	 * If so, the end is purposely setup to be issued using buffered IO
-	 * before the middle (which will use DIO, if DIO-aligned, with AIO).
-	 * This creates problems if/when the end results in short read or write.
-	 * So must save index and length of end to handle this corner case.
-	 */
-	if (local_dio->end_len) {
-		iov_iter_bvec(&iters[n_iters], rw, iocb->bvec, nvecs, len);
-		iocb->offset[n_iters] = local_dio->end_offset;
-		iov_iter_advance(&iters[n_iters],
-			local_dio->start_len + local_dio->middle_len);
-		iocb->iter_is_dio_aligned[n_iters] = false;
-		/* Save index and length of end */
-		iocb->end_iter_index = n_iters;
-		iocb->end_len = local_dio->end_len;
-		atomic_inc(&iocb->n_iters);
-		++n_iters;
-	}
-
-	/* Setup DIO-aligned middle to be issued last, to allow for
-	 * DIO with AIO completion (see nfs_local_call_{read,write}).
+	/* Setup DIO-aligned middle, if there is no misaligned end (below)
+	 * then AIO completion is used, see nfs_local_call_{read,write}
 	 */
 	iov_iter_bvec(&iters[n_iters], rw, iocb->bvec, nvecs, len);
 	if (local_dio->start_len)
@@ -448,8 +428,21 @@ nfs_local_iters_setup_dio(struct nfs_local_kiocb *iocb, int rw,
 			iocb->hdr->args.offset, len, local_dio);
 		return 0; /* no DIO-aligned IO possible */
 	}
+	iocb->end_iter_index = n_iters;
 	++n_iters;
 
+	/* Setup misaligned end? */
+	if (local_dio->end_len) {
+		iov_iter_bvec(&iters[n_iters], rw, iocb->bvec, nvecs, len);
+		iocb->offset[n_iters] = local_dio->end_offset;
+		iov_iter_advance(&iters[n_iters],
+			local_dio->start_len + local_dio->middle_len);
+		iocb->iter_is_dio_aligned[n_iters] = false;
+		atomic_inc(&iocb->n_iters);
+		iocb->end_iter_index = n_iters;
+		++n_iters;
+	}
+
 	return n_iters;
 }
 
@@ -636,27 +629,18 @@ static void nfs_local_call_read(struct work_struct *work)
 		/* DIO-aligned middle is always issued last with AIO completion */
 		if (iocb->iter_is_dio_aligned[i]) {
 			iocb->kiocb.ki_flags |= IOCB_DIRECT;
-			iocb->kiocb.ki_complete = nfs_local_read_aio_complete;
-			iocb->aio_complete_work = nfs_local_read_aio_complete_work;
+			/* Only use AIO completion if DIO-aligned segment is last */
+			if (i == iocb->end_iter_index) {
+				iocb->kiocb.ki_complete = nfs_local_read_aio_complete;
+				iocb->aio_complete_work = nfs_local_read_aio_complete_work;
+			}
 		}
 
 		iocb->kiocb.ki_pos = iocb->offset[i];
 		status = filp->f_op->read_iter(&iocb->kiocb, &iocb->iters[i]);
 		if (status != -EIOCBQUEUED) {
-			if (unlikely(status >= 0 && status < iocb->iters[i].count)) {
-				/* partial read */
-				if (i == iocb->end_iter_index) {
-					/* Must not account DIO partial end, otherwise (due
-					 * to end being issued before middle): the partial
-					 * read accounting in nfs_local_read_done()
-					 * would incorrectly advance hdr->args.offset
-					 */
-					status = 0;
-				} else {
-					/* Partial read at start or middle, force done */
-					force_done = true;
-				}
-			}
+			if (unlikely(status >= 0 && status < iocb->iters[i].count))
+				force_done = true; /* Partial read */
 			if (nfs_local_pgio_done(iocb, status, force_done)) {
 				nfs_local_read_iocb_done(iocb);
 				break;
@@ -854,27 +838,18 @@ static void nfs_local_call_write(struct work_struct *work)
 		/* DIO-aligned middle is always issued last with AIO completion */
 		if (iocb->iter_is_dio_aligned[i]) {
 			iocb->kiocb.ki_flags |= IOCB_DIRECT;
-			iocb->kiocb.ki_complete = nfs_local_write_aio_complete;
-			iocb->aio_complete_work = nfs_local_write_aio_complete_work;
+			/* Only use AIO completion if DIO-aligned segment is last */
+			if (i == iocb->end_iter_index) {
+				iocb->kiocb.ki_complete = nfs_local_write_aio_complete;
+				iocb->aio_complete_work = nfs_local_write_aio_complete_work;
+			}
 		}
 
 		iocb->kiocb.ki_pos = iocb->offset[i];
 		status = filp->f_op->write_iter(&iocb->kiocb, &iocb->iters[i]);
 		if (status != -EIOCBQUEUED) {
-			if (unlikely(status >= 0 && status < iocb->iters[i].count)) {
-				/* partial write */
-				if (i == iocb->end_iter_index) {
-					/* Must not account DIO partial end, otherwise (due
-					 * to end being issued before middle): the partial
-					 * write accounting in nfs_local_write_done()
-					 * would incorrectly advance hdr->args.offset
-					 */
-					status = 0;
-				} else {
-					/* Partial write at start or middle, force done */
-					force_done = true;
-				}
-			}
+			if (unlikely(status >= 0 && status < iocb->iters[i].count))
+				force_done = true; /* Partial write */
 			if (nfs_local_pgio_done(iocb, status, force_done)) {
 				nfs_local_write_iocb_done(iocb);
 				break;
-- 
2.44.0


  reply	other threads:[~2025-10-29 23:19 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-19  9:29 [Bug report] xfstests generic/323 over NFS hit BUG: KASAN: slab-use-after-free in nfs_local_call_read on 6.18.0-rc1 Yongcheng Yang
2025-10-19 15:18 ` Trond Myklebust
2025-10-19 16:26   ` Mike Snitzer
2025-10-20 18:24     ` Mike Snitzer
2025-10-27 13:08       ` [v6.18-rcX PATCH 0/3] nfs/localio: fixes for recent misaligned DIO changes Mike Snitzer
2025-10-27 13:08       ` [v6.18-rcX PATCH 1/3] nfs/localio: remove unecessary ENOTBLK handling in DIO WRITE support Mike Snitzer
2025-10-27 13:08       ` [v6.18-rcX PATCH 2/3] nfs/localio: add refcounting for each iocb IO associated with NFS pgio header Mike Snitzer
2025-10-27 13:19         ` Christoph Hellwig
2025-10-27 13:55           ` Mike Snitzer
2025-10-27 14:45             ` Christoph Hellwig
2025-10-27 13:08       ` [v6.18-rcX PATCH 3/3] nfs/localio: backfill missing partial read support for misaligned DIO Mike Snitzer
2025-10-27 17:52       ` [v6.18-rcX PATCH 4/3] nfs/localio: Ensure DIO WRITE's IO on stable storage upon completion Mike Snitzer
2025-10-29 23:19         ` Mike Snitzer [this message]
2025-10-31  1:50           ` [v6.18-rcX PATCH 5/3] nfs/localio: do not issue misaligned DIO out-of-order Mike Snitzer
2025-10-31 13:33             ` Anna Schumaker
2025-11-04 18:02               ` [v6.18-rcX PATCH v2] " Mike Snitzer
2025-11-06  2:50                 ` Mike Snitzer
2025-11-06  3:03                   ` [v6.18-rcX PATCH v3 5/3] " Mike Snitzer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aQKhAksYqPjOzUNv@kernel.org \
    --to=snitzer@kernel.org \
    --cc=anna.schumaker@oracle.com \
    --cc=linux-nfs@vger.kernel.org \
    --cc=trond.myklebust@hammerspace.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).