From: Mike Snitzer <snitzer@kernel.org>
To: Chuck Lever <chuck.lever@oracle.com>
Cc: Jeff Layton <jlayton@kernel.org>, linux-nfs@vger.kernel.org
Subject: Re: [PATCH 1/3] nfsd: avoid using DONTCACHE for misaligned DIO's buffered IO fallback
Date: Tue, 4 Nov 2025 12:35:16 -0500
Message-ID: <aQo5VOxRpr-HORf3@kernel.org>
In-Reply-To: <038374d0-6f09-4440-bd99-fbeef8f6d683@oracle.com>
On Tue, Nov 04, 2025 at 12:23:02PM -0500, Chuck Lever wrote:
> On 11/4/25 11:42 AM, Mike Snitzer wrote:
> > Also, use buffered IO (without DONTCACHE) if READ is less than 32K.
> > But do use DONTCACHE if an entire WRITE is misaligned; this preserves
> > the intent of NFSD_IO_DIRECT.
>
> These two changes need to be separate patches.
They are linked: if READ uses DONTCACHE for the small IO, it will kill
any benefit to RMW.
Unless I'm misunderstanding which two changes you're referring to?
The "But do use DONTCACHE if an entire WRITE is misaligned" part just
amounts to a comment tweak in nfsd_direct_write (last hunk below).
> > The misaligned ends of a misaligned DIO WRITE will use buffered IO
> > (without DONTCACHE) but the middle DIO-aligned segment will use direct
> > IO. This provides ideal performance for streaming misaligned DIO
> > (e.g. IO500's IOR_HARD) because buffered IO is used to benefit RMW.
> >
> > On one capable testbed, this commit improved IOR_HARD WRITE
> > performance from 0.3433GB/s to 1.26GB/s.
> >
> > Signed-off-by: Mike Snitzer <snitzer@hammerspace.com>
> > ---
> > fs/nfsd/vfs.c | 28 +++++++++++++++++++++++-----
> > 1 file changed, 23 insertions(+), 5 deletions(-)
> >
> > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> > index 701dd261c252..9403ec8bb2da 100644
> > --- a/fs/nfsd/vfs.c
> > +++ b/fs/nfsd/vfs.c
> > @@ -104,6 +104,7 @@ nfserrno (int errno)
> > { nfserr_perm, -ENOKEY },
> > { nfserr_no_grace, -ENOGRACE},
> > { nfserr_io, -EBADMSG },
> > + { nfserr_eagain, -ENOTBLK },
> > };
> > int i;
> >
> > @@ -1099,13 +1100,18 @@ nfsd_direct_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > size_t len;
> >
> > init_sync_kiocb(&kiocb, nf->nf_file);
> > - kiocb.ki_flags |= IOCB_DIRECT;
> >
> > /* Read a properly-aligned region of bytes into rq_bvec */
> > dio_start = round_down(offset, nf->nf_dio_read_offset_align);
> > dio_end = round_up((u64)offset + *count, nf->nf_dio_read_offset_align);
> >
> > + /* Don't use expanded DIO READ for IO less than 32K */
> > + if ((*count < (32 << 10)) &&
> > + (((offset - dio_start) > 0) || ((dio_end - (offset + *count)) > 0)))
> > + return nfserrno(-ENOTBLK); /* fallback to buffered */
>
> Why not just return a specific nfserr code here? No need to go through
> nfserrno.
I could. I tied it to ENOTBLK given its history elsewhere as the signal
for direct-to-buffered fallback. But yes, it could just as easily return
nfserr_eagain directly (or something better, if you have a suggestion).
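
For readers following the READ hunk above, the fallback condition can be
restated as a small user-space sketch. This is only an illustration, not
the NFSD code: align_down(), align_up(), read_should_fall_back() and
SMALL_READ_THRESHOLD are hypothetical names, and the alignment is assumed
to be a power of two (as the kernel's round_down()/round_up() require).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name for the patch's (32 << 10) cutoff. */
#define SMALL_READ_THRESHOLD (32u << 10)

/* Round x down to a multiple of align (align must be a power of two). */
uint64_t align_down(uint64_t x, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return x & ~(align - 1);
}

/* Round x up to a multiple of align (align must be a power of two). */
uint64_t align_up(uint64_t x, uint64_t align)
{
	return align_down(x + align - 1, align);
}

/*
 * Return true when a READ of count bytes at offset should skip the
 * expanded direct read and use buffered IO instead: the request is
 * smaller than the threshold and misaligned at either end.
 */
bool read_should_fall_back(uint64_t offset, uint64_t count, uint64_t align)
{
	uint64_t dio_start = align_down(offset, align);
	uint64_t dio_end = align_up(offset + count, align);
	bool misaligned = (offset != dio_start) || (dio_end != offset + count);

	return count < SMALL_READ_THRESHOLD && misaligned;
}
```

With a 4096-byte alignment, a 4096-byte READ at offset 100 falls back to
buffered IO, while the same READ at offset 4096 (fully aligned) or a
64K READ at offset 100 (at or above the threshold) stays on the direct
path.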
> > +
> > kiocb.ki_pos = dio_start;
> > + kiocb.ki_flags |= IOCB_DIRECT;
> >
> > v = 0;
> > total = dio_end - dio_start;
> > @@ -1184,10 +1190,13 @@ __be32 nfsd_iter_read(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > break;
> > case NFSD_IO_DIRECT:
> > /* When dio_read_offset_align is zero, dio is not supported */
> > - if (nf->nf_dio_read_offset_align && !rqstp->rq_res.page_len)
> > - return nfsd_direct_read(rqstp, fhp, nf, offset,
> > + if (nf->nf_dio_read_offset_align && !rqstp->rq_res.page_len) {
> > + __be32 nfserr = nfsd_direct_read(rqstp, fhp, nf, offset,
> > count, eof);
> > - fallthrough;
> > + if (nfserr != nfserr_eagain)
> > + return nfserr;
> > + }
> > + break; /* fallback to buffered */
> > case NFSD_IO_DONTCACHE:
> > if (file->f_op->fop_flags & FOP_DONTCACHE)
> > kiocb.ki_flags = IOCB_DONTCACHE;
> > @@ -1347,6 +1356,15 @@ nfsd_write_dio_iters_init(struct bio_vec *bvec, unsigned int nvecs,
> > ++args->nsegs;
> > }
> >
> > + /*
> > + * Don't use IOCB_DONTCACHE if misaligned DIO WRITE (args->nsegs > 1),
> > + * because it compromises unaligned segments' RMW IO being able to
> > + * benefit from buffered IO (especially important for streaming
> > + * misaligned DIO WRITE performance).
> > + */
> > + if (args->nsegs > 1 && (args->flags_buffered & IOCB_DONTCACHE))
> > + args->flags_buffered &= ~IOCB_DONTCACHE;
> > +
> > return;
> >
> > no_dio:
> > @@ -1400,7 +1418,7 @@ nfsd_direct_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
> >
> > /*
> > * IOCB_DONTCACHE preserves the intent of NFSD_IO_DIRECT when
> > - * writing unaligned segments or handling fallback I/O.
> > + * falling back to buffered IO if entire WRITE is unaligned.
> > */
> > args.flags_buffered = kiocb->ki_flags;
> > if (args.nf->nf_file->f_op->fop_flags & FOP_DONTCACHE)
>
>
> --
> Chuck Lever
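
The WRITE behaviour described in the quoted commit message (buffered
misaligned ends, direct aligned middle, DONTCACHE dropped only when the
WRITE is actually split) can be sketched in user space roughly as
follows. This is a simplified model under stated assumptions, not
nfsd_write_dio_iters_init(): struct write_plan, plan_write() and
SKETCH_DONTCACHE are hypothetical, only file-offset alignment is
considered (the real code has further checks), and the alignment is
assumed to be a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's IOCB_DONTCACHE bit. */
#define SKETCH_DONTCACHE 0x1u

struct write_plan {
	uint64_t head_len;           /* misaligned prefix, buffered IO */
	uint64_t middle_len;         /* aligned middle, direct IO */
	uint64_t tail_len;           /* misaligned suffix, buffered IO */
	unsigned int nsegs;          /* number of non-empty segments */
	unsigned int flags_buffered; /* flags for the buffered segments */
};

struct write_plan plan_write(uint64_t offset, uint64_t len, uint64_t align,
			     unsigned int flags_buffered)
{
	struct write_plan p = { 0, 0, 0, 0, flags_buffered };
	uint64_t end = offset + len;
	uint64_t mid_start, mid_end;

	assert(align != 0 && (align & (align - 1)) == 0);
	mid_start = (offset + align - 1) & ~(align - 1); /* round up */
	mid_end = end & ~(align - 1);                    /* round down */

	if (mid_start >= mid_end) {
		/*
		 * No aligned middle: the entire WRITE is buffered, and
		 * DONTCACHE (if requested) is kept.
		 */
		p.head_len = len;
		p.nsegs = len ? 1 : 0;
		return p;
	}

	p.head_len = mid_start - offset;
	p.middle_len = mid_end - mid_start;
	p.tail_len = end - mid_end;
	p.nsegs = (p.head_len != 0) + 1 + (p.tail_len != 0);

	/* A split WRITE has misaligned ends: let them stay in the page cache. */
	if (p.nsegs > 1)
		p.flags_buffered &= ~SKETCH_DONTCACHE;

	return p;
}
```

In this model, a 12288-byte WRITE at offset 100 with 4096-byte alignment
becomes three segments with DONTCACHE cleared, whereas a 1000-byte WRITE
at offset 100 has no aligned middle and keeps DONTCACHE for its single
buffered segment.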
Thread overview: 12+ messages
2025-11-04 16:42 [PATCH 0/3] NFSD: additional NFSD Direct changes Mike Snitzer
2025-11-04 16:42 ` [PATCH 1/3] nfsd: avoid using DONTCACHE for misaligned DIO's buffered IO fallback Mike Snitzer
2025-11-04 17:23 ` Chuck Lever
2025-11-04 17:35 ` Mike Snitzer [this message]
2025-11-04 19:33 ` Chuck Lever
2025-11-04 18:11 ` [PATCH v2 " Mike Snitzer
2025-11-05 6:19 ` [PATCH v3 1/3] NFSD: avoid DONTCACHE for misaligned ends of misaligned DIO WRITE Mike Snitzer
2025-11-05 14:58 ` Chuck Lever
2025-11-05 17:33 ` Mike Snitzer
2025-11-04 16:42 ` [PATCH 2/3] NFSD: add new NFSD_IO_DIRECT variants that may override stable_how Mike Snitzer
2025-11-04 16:42 ` [PATCH 3/3] NFSD: update Documentation/filesystems/nfs/nfsd-io-modes.rst Mike Snitzer
2025-11-04 17:25 ` Chuck Lever