All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mike Snitzer <snitzer@kernel.org>
To: Chuck Lever <cel@kernel.org>
Cc: NeilBrown <neil@brown.name>, Jeff Layton <jlayton@kernel.org>,
	Olga Kornievskaia <okorniev@redhat.com>,
	Dai Ngo <dai.ngo@oracle.com>, Tom Talpey <tom@talpey.com>,
	linux-nfs@vger.kernel.org, Chuck Lever <chuck.lever@oracle.com>,
	Christoph Hellwig <hch@lst.de>
Subject: Re: [PATCH v7 14/14] NFSD: Initialize separate ki_flags
Date: Fri, 24 Oct 2025 19:56:59 -0400	[thread overview]
Message-ID: <aPwSS9NlfqPFqfn2@kernel.org> (raw)
In-Reply-To: <5017c8dc-9c14-4a92-a259-6e4cdc67d250@kernel.org>

On Fri, Oct 24, 2025 at 05:16:52PM -0400, Chuck Lever wrote:
> On 10/24/25 4:37 PM, Mike Snitzer wrote:
> > On Fri, Oct 24, 2025 at 03:34:00PM -0400, Chuck Lever wrote:
> >> On 10/24/25 2:13 PM, Mike Snitzer wrote:
> >>>>  	/*
> >>>> -	 * IOCB_SYNC + IOCB_DIRECT requests that iter_write should persist
> >>>> -	 * both written data and dirty time stamps.
> >>>> -	 *
> >>>> -	 * When falling back to buffered I/O or handling the unaligned
> >>>> -	 * first and last segments, the data and time stamps must be
> >>>> -	 * durable before nfsd_vfs_write() returns to its caller, matching
> >>>> -	 * the behavior of direct I/O.
> >>>> +	 * IOCB_SYNC + IOCB_DIRECT requests that iter_write should
> >>>> +	 * persist both written data and dirty time stamps.
> >>>>  	 */
> >>>> -	kiocb->ki_flags |= IOCB_SYNC | IOCB_DSYNC;
> >>>> +	args.flags_direct = kiocb->ki_flags | IOCB_SYNC | IOCB_DIRECT;
> >>> AFAIK we still need: IOCB_DIRECT | IOCB_DSYNC | IOCB_SYNC
> >>>
> >>> IOCB_DIRECT | IOCB_DSYNC was recently put under a microscope relative
> >>> to XFS performance and the resulting improvement was merged for 6.18
> >>> with commit c91d38b57f ("xfs: rework datasync tracking and execution")
> >>
> >> This looks like an xfs-specific fix. I'm reluctant to apply a fix for
> >> a specific file system implementation in what's supposed to be generic
> >> code.
> >>
> >> If (IOCB_DIRECT | IOCB_DSYNC | IOCB_SYNC) /is/ correct for all file
> >> systems, then it needs an explanatory code comment, which I'm not yet
> >> qualified to write. I don't see any textual material in previous
> >> incarnations of this code that might help get me started.
> > 
> > The XFS specific performance improvement isn't the point.  The point
> > is that applications (like I think DB2 is what started all this with
> > Jan Kara and the XFS filesystem) results in the use of
> > O_DIRECT+O_DSYNC.
> > 
> > It is a clear reality that other filesystems are catering to
> > O_DIRECT+O_DSYNC. And given our findings with Christoph that buffered
> > IO needs O_DSYNC+O_SYNC, I'd rather we not expose ourselves to not
> > having O_DSYNC set.
> > 
> > Particularly because any filesystem NFSD is writing to _can_ also
> > fallback to using buffered IO if O_DIRECT set (NFSD is doing exactly
> > that). Which we _know_ (from Christoph) that having O_DSYNC set is
> > important when we fallback to using buffered IO (like we do for the
> > misaligned head and/or tail).
> > 
> > Please let's not make the same mistake twice.
> 
> To be clear, I'm not refusing to add IOCB_DSYNC with IOCB_DIRECT, I'm
> just confused about why it is necessary.
> 
> Direct and buffered I/O in the direct write path now each have their own
> set of ki_flags. The ki_flags used for buffered writes has SYNC and
> DSYNC set. So, for fallback I/O and for the unaligned segments of the
> buffer, both flags are set. I think we are in agreement on that.
> 
> You might be referring above to this email from Christoph:
> 
> > > I think IOCB_SYNC would be needed with O_DIRECT to force timestamp
> > > updates. Otherwise, IOCB_SYNC is relevant only when the function is
> > > forced to fall back to some form of write through the page cache.
> >
> > Well, IOCB_SYNC is only needed to commit timestamps.  O_DSYNC is
> > always required if you want to commit to stable storage.  As said
> > above I don't really understand from the patch why we want to do
> > that, but IFF we want to do that, we need IOCB_DSYNC bother for
> > direct and buffered I/O.
> 
> He says "we need IOCB_DSYNC both... for direct and buffered I/O". Fair
> enough, but why does IOCB_DIRECT, which is essentially a synchronous
> write already, need to explicitly set IOCB_DSYNC? All I want is
> something I can distill into a code comment. "Force a FUA after each
> direct write" or something like that.

Christoph said here:
https://lore.kernel.org/linux-nfs/aPnChLocfNsu_UN7@infradead.org/

"Well, IOCB_SYNC is only needed to commit timestamps.  O_DSYNC is
always required if you want to commit to stable storage."

^ This should be all you need to say?

Why/how that is also the semantics of O_DIRECT is lost on me at the
moment...

Though this message from Christoph gets into why/how things got less
than ideal due to Linux's historic handling of these flags:
https://lore.kernel.org/linux-nfs/aPnBIGeFjrZLbxBG@infradead.org/

Christoph admits "this is all a bit odd".

> I'm really surprised that IOCB_DIRECT does not imply IOCB_DSYNC, and
> there doesn't seem to be any clear documentation about the semantics
> of these flags. Thus I believe a code comment here is warranted.

I'm with you, it is all "clear as mud" and unintuitive for me too.
But I'm just playing the cards dealt.

Really, even ignoring all the quirkiness of this: that O_DIRECT can
fallback to buffered, and we need IOCB_DSYNC|IOCB_SYNC for our use of
buffered IO when NFSD_IO_DIRECT configured to ensure data has hit
stable storage -- that's enough justification.  Bit circular but
compelling to prove the need.. albeit wordy and a lot to unpack.

Mike

  reply	other threads:[~2025-10-24 23:57 UTC|newest]

Thread overview: 87+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-24 14:42 [PATCH v7 00/14] NFSD: Implement NFSD_IO_DIRECT for NFS WRITE Chuck Lever
2025-10-24 14:42 ` [PATCH v7 01/14] NFSD: Make FILE_SYNC WRITEs comply with spec Chuck Lever
2025-10-24 15:21   ` Jeff Layton
2025-10-27  8:02   ` Christoph Hellwig
2025-10-24 14:42 ` [PATCH v7 02/14] NFSD: Enable return of an updated stable_how to NFS clients Chuck Lever
2025-10-27  8:03   ` Christoph Hellwig
2025-10-24 14:42 ` [PATCH v7 03/14] NFSD: Refactor nfsd_vfs_write() Chuck Lever
2025-10-27  8:04   ` Christoph Hellwig
2025-10-24 14:42 ` [PATCH v7 04/14] NFSD: Implement NFSD_IO_DIRECT for NFS WRITE Chuck Lever
2025-10-24 17:12   ` Mike Snitzer
2025-10-24 17:24     ` Chuck Lever
2025-10-26  0:03   ` kernel test robot
2025-10-26  1:16   ` kernel test robot
2025-10-24 14:42 ` [PATCH v7 05/14] NFSD: @stable for direct writes is always NFS_FILE_SYNC Chuck Lever
2025-10-24 15:22   ` Jeff Layton
2025-10-24 15:23     ` Chuck Lever
2025-10-27  8:05   ` Christoph Hellwig
2025-10-27 13:23     ` Chuck Lever
2025-10-27 13:27       ` Christoph Hellwig
2025-10-27 14:31         ` Mike Snitzer
2025-10-27 14:36           ` Christoph Hellwig
2025-10-27 14:58             ` Mike Snitzer
2025-10-27 15:04               ` Chuck Lever
2025-10-27 15:19                 ` Mike Snitzer
2025-10-27 15:05               ` Christoph Hellwig
2025-10-24 14:42 ` [PATCH v7 06/14] NFSD: Always set IOCB_SYNC in direct write path Chuck Lever
2025-10-24 15:22   ` Jeff Layton
2025-10-27  8:08   ` Christoph Hellwig
2025-10-27 10:38     ` Jeff Layton
2025-10-27 10:40       ` Christoph Hellwig
2025-10-24 14:42 ` [PATCH v7 07/14] NFSD: Remove specific error handling Chuck Lever
2025-10-24 15:22   ` Jeff Layton
2025-10-24 14:43 ` [PATCH v7 08/14] NFSD: Remove alignment size checking Chuck Lever
2025-10-24 15:22   ` Jeff Layton
2025-10-27  8:09   ` Christoph Hellwig
2025-10-27 13:25     ` Chuck Lever
2025-10-27 13:30       ` Christoph Hellwig
2025-10-24 14:43 ` [PATCH v7 09/14] NFSD: Remove the len_mask check Chuck Lever
2025-10-24 15:23   ` Jeff Layton
2025-10-24 17:16   ` Mike Snitzer
2025-10-24 17:22     ` Chuck Lever
2025-10-24 14:43 ` [PATCH v7 10/14] NFSD: Clean up synopsis of nfsd_iov_iter_aligned_bvec() Chuck Lever
2025-10-24 15:24   ` Jeff Layton
2025-10-24 14:43 ` [PATCH v7 11/14] NFSD: Clean up struct nfsd_write_dio Chuck Lever
2025-10-24 15:26   ` Jeff Layton
2025-10-24 17:20   ` Mike Snitzer
2025-10-24 14:43 ` [PATCH v7 12/14] NFSD: Introduce struct nfsd_write_dio_seg Chuck Lever
2025-10-24 15:30   ` Jeff Layton
2025-10-24 15:37     ` Chuck Lever
2025-10-24 17:57   ` Mike Snitzer
2025-10-24 14:43 ` [PATCH v7 13/14] NFSD: Clean up direct write fall back error flow Chuck Lever
2025-10-24 15:32   ` Jeff Layton
2025-10-24 18:01   ` Mike Snitzer
2025-10-24 14:43 ` [PATCH v7 14/14] NFSD: Initialize separate ki_flags Chuck Lever
2025-10-24 15:34   ` Jeff Layton
2025-10-24 18:13   ` Mike Snitzer
2025-10-24 19:34     ` Chuck Lever
2025-10-24 20:37       ` Mike Snitzer
2025-10-24 21:16         ` Chuck Lever
2025-10-24 23:56           ` Mike Snitzer [this message]
2025-10-27  8:15             ` Christoph Hellwig
2025-10-27 10:50               ` Jeff Layton
2025-10-27 10:55                 ` Christoph Hellwig
2025-10-27 13:48                 ` Chuck Lever
2025-10-27 13:49                   ` Christoph Hellwig
2025-10-27 16:18                   ` Mike Snitzer
2025-10-27 16:59                     ` Mike Snitzer
2025-10-29  7:20                     ` Christoph Hellwig
2025-10-27 16:05                 ` Mike Snitzer
2025-10-27 17:57                   ` Chuck Lever
2025-10-28  3:26                     ` Mike Snitzer
2025-10-28 15:37                       ` Chuck Lever
2025-10-28 16:04                         ` Mike Snitzer
2025-10-28 18:48                           ` Chuck Lever
2025-10-28 23:56                             ` Mike Snitzer
2025-10-29 15:22                               ` Chuck Lever
2025-10-29 16:54                                 ` Mike Snitzer
2025-10-29  7:37                         ` Christoph Hellwig
2025-10-29  7:32                       ` Christoph Hellwig
2025-10-29  7:25                     ` Christoph Hellwig
2025-10-27  8:14         ` Christoph Hellwig
2025-10-27  8:12       ` Christoph Hellwig
2025-10-27 13:27         ` Chuck Lever
2025-10-27 13:30           ` Chuck Lever
2025-10-27 13:31             ` Christoph Hellwig
2025-10-27 14:11         ` Chuck Lever
2025-10-27 14:45           ` Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aPwSS9NlfqPFqfn2@kernel.org \
    --to=snitzer@kernel.org \
    --cc=cel@kernel.org \
    --cc=chuck.lever@oracle.com \
    --cc=dai.ngo@oracle.com \
    --cc=hch@lst.de \
    --cc=jlayton@kernel.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=neil@brown.name \
    --cc=okorniev@redhat.com \
    --cc=tom@talpey.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.