All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mike Snitzer <snitzer@kernel.org>
To: Chuck Lever <cel@kernel.org>
Cc: NeilBrown <neil@brown.name>, Jeff Layton <jlayton@kernel.org>,
	Olga Kornievskaia <okorniev@redhat.com>,
	Dai Ngo <dai.ngo@oracle.com>, Tom Talpey <tom@talpey.com>,
	linux-nfs@vger.kernel.org, Chuck Lever <chuck.lever@oracle.com>
Subject: Re: [PATCH v6 0/5] NFSD: Implement NFSD_IO_DIRECT for NFS WRITE
Date: Wed, 22 Oct 2025 17:56:55 -0400	[thread overview]
Message-ID: <aPlTJ6oxreVOihPc@kernel.org> (raw)
In-Reply-To: <20251022192208.1682-1-cel@kernel.org>

On Wed, Oct 22, 2025 at 03:22:03PM -0400, Chuck Lever wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
> 
> Following on https://lore.kernel.org/linux-nfs/aPAci7O_XK1ljaum@kernel.org/
> this series includes the patches needed to make NFSD Direct WRITE
> work.
> 
> The "large" piece of work left to do:
> > > > Wouldn't it make sense to track the alignment when building the bio_vec
> > > > array instead of doing another walk here touching all cache lines?
> > > 
> > > Yes, that is the conventional wisdom that justified Keith removing
> > > iov_iter_aligned.  But for NFSD's WRITE payload the bio_vec is built
> > > outside of the fs/nfsd/vfs.c code.  Could be there is a natural way to
> > > clean this up (to make the sunrpc layer to conditionally care about
> > > alignment) but I didn't confront that yet.
> > 
> > Well, for the block code it's also build outside the layer consuming it.
> > But Keith showed that you can easily communicate that information and
> > avoid extra loops touching the cache lines.

I had started to reply to the other thread relative to this point, but
I'll cherry pick it here:

On Wed, Oct 22, 2025 at 10:37:35AM -0400, Chuck Lever wrote:
> Again, is there a reason to push this suggestion out to a later merge
> window? It seems reasonable to tackle it now.

If you'd be OK with us tackling it as an incremental change attributed
to whoever takes the lead, then it can happen now or whenever someone
can get to it.

IMO it is worthy of incremental effort because it is genuinely
destabilizing line of development -- any -EINVAL that could result
from misaligned DIO being issued to the underlying filesystem can
cause serious problems with IO retry stalls.. at least in my
experience with pNFS flexfiles (as implemented by Hammerspace).
And that is why I was logging -EINVAL is important; now removed in
this v6. So you've removed checking that'd save us from bugs growing
to be way more painful _before_ optimization work that might expose us
to -EINVAL. To be clear: I am perfectly fine with removing that
seemingly heavy -EINVAL handling from nfsd_issue_write_dio like you
did _because_ nfsd_iov_iter_aligned_bvec() does a comprehensive job of
avoiding the possibility of misaligned DIO being issued to underlying
filesystem.

As cliche as "perfection is the enemy of good" is, it does genuinely
govern how I engineer solutions.

That doesn't mean I'm not seeking perfection, but for me: perfection
is accomplished iteratively by honing good to great to perfect.

Bounding each step towards perfection helps reach it.

Apologies if all that caused your eyes to roll into the back of your
head.  I have this additional optimization work on my TODO, I just
haven't been able to circle back to it.

> (Perhaps to get us started, Christoph, do you know of specific code that
> shows how this code could be reorganized?)

It is at the time of the write payload's bvec creation where alignment
would need to be assessed.

Keith's changes in these commits provide context (albeit opaque):

fec2e705729d block: check for valid bio while splitting
743bf2e0c49c block: add size alignment to bio_iov_iter_get_pages
20a0e6276edb block: align the bio after building it
5ff3f74e145a block: simplify direct io validity check
7eac33186957 iomap: simplify direct io validity check
9eab1d4e0d15 block: remove bdev_iter_is_aligned
69d7ed5b9ef6 blk-integrity: use simpler alignment check
b475272f03ca iov_iter: remove iov_iter_is_aligned

If you look at the entirety of those commits' changes with:
git diff fec2e705729d^..b475272f03ca

You can see that Keith removed calls to iov_iter_is_aligned and
bdev_iter_is_aligned by doing checking earlier at the time of creation
of "the thing" (for our case that'd be the bio_vec entries sunrpc adds
to the iov_iter that gets passed as write payload to vfs.c?).

Mike

> Changes since v5:
> * Add a patch to make FILE_SYNC WRITEs persist timestamps
> * Address some of Christoph's review comments
> 
> Changes since v4:
> * Split out refactoring nfsd_buffered_write() into a separate patch
> * Expand patch description of 1/4
> * Don't set IOCB_SYNC flag
> 
> Changes since v3:
> * Address checkpatch.pl nits in 2/3
> * Add an untested patch to mark ingress RDMA Read chunks
> 
> Chuck Lever (3):
>   NFSD: Make FILE_SYNC WRITEs comply with spec
>   NFSD: Enable return of an updated stable_how to NFS clients
>   svcrdma: Mark Read chunks
> 
> Mike Snitzer (2):
>   NFSD: Refactor nfsd_vfs_write()
>   NFSD: Implement NFSD_IO_DIRECT for NFS WRITE
> 
>  fs/nfsd/debugfs.c                       |   1 +
>  fs/nfsd/nfs3proc.c                      |   2 +-
>  fs/nfsd/nfs4proc.c                      |   2 +-
>  fs/nfsd/nfsproc.c                       |   3 +-
>  fs/nfsd/trace.h                         |   1 +
>  fs/nfsd/vfs.c                           | 219 ++++++++++++++++++++++--
>  fs/nfsd/vfs.h                           |   6 +-
>  fs/nfsd/xdr3.h                          |   2 +-
>  net/sunrpc/xprtrdma/svc_rdma_recvfrom.c |   5 +
>  9 files changed, 224 insertions(+), 17 deletions(-)
> 
> -- 
> 2.51.0
> 
> 

      parent reply	other threads:[~2025-10-22 21:56 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-22 19:22 [PATCH v6 0/5] NFSD: Implement NFSD_IO_DIRECT for NFS WRITE Chuck Lever
2025-10-22 19:22 ` [PATCH v6 1/5] NFSD: Make FILE_SYNC WRITEs comply with spec Chuck Lever
2025-10-22 19:22 ` [PATCH v6 2/5] NFSD: Enable return of an updated stable_how to NFS clients Chuck Lever
2025-10-22 19:22 ` [PATCH v6 3/5] NFSD: Refactor nfsd_vfs_write() Chuck Lever
2025-10-22 19:22 ` [PATCH v6 4/5] NFSD: Implement NFSD_IO_DIRECT for NFS WRITE Chuck Lever
2025-10-22 19:27   ` Chuck Lever
2025-10-22 21:09     ` Mike Snitzer
2025-10-22 20:58   ` Mike Snitzer
2025-10-22 21:09     ` Chuck Lever
2025-10-23 16:59   ` kernel test robot
2025-10-23 19:37   ` Chuck Lever
2025-10-23 21:23     ` Mike Snitzer
2025-10-24 14:24       ` Chuck Lever
2025-10-24 18:30         ` Mike Snitzer
2025-10-22 19:22 ` [PATCH v6 5/5] svcrdma: Mark Read chunks Chuck Lever
2025-10-22 21:25   ` Mike Snitzer
2025-10-23 13:00     ` Chuck Lever
2025-10-22 21:56 ` Mike Snitzer [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aPlTJ6oxreVOihPc@kernel.org \
    --to=snitzer@kernel.org \
    --cc=cel@kernel.org \
    --cc=chuck.lever@oracle.com \
    --cc=dai.ngo@oracle.com \
    --cc=jlayton@kernel.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=neil@brown.name \
    --cc=okorniev@redhat.com \
    --cc=tom@talpey.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.