Re: Increase in XFS journal flushes with (direct_write;fdatasync)+

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Dave Chinner <dgc@kernel.org>
To: "Pankaj Raghav (Samsung)" <pankaj.raghav@linux.dev>
Cc: Andres Freund <andres@anarazel.de>,
	linux-xfs@vger.kernel.org, Carlos Maiolino <cem@kernel.org>,
	Christoph Hellwig <hch@lst.de>,
	Christian Brauner <brauner@kernel.org>,
	gost.dev@samsung.com, p.raghav@samsung.com
Subject: Re: Increase in XFS journal flushes with (direct_write;fdatasync)+
Date: Tue, 12 May 2026 15:25:41 +1000	[thread overview]
Message-ID: <agK51XboX2fc2ll7@dread> (raw)
In-Reply-To: <paibiwpjxhdmfrjq3tn7gbdvgtarm4yvtid7qu5zchhyptdlhx@w7lwvjhevsj6>

On Thu, May 07, 2026 at 10:34:43PM +0200, Pankaj Raghav (Samsung) wrote:
> On Wed, May 06, 2026 at 09:26:25AM -0400, Andres Freund wrote:
> > Hi,
> > 
> > While looking at performance issues on Samsung client drives due to slow FUA,
> > I tried to reproduce older numbers on a recent kernel.  And couldn't, at first
> > - but not because the problem went away, but because the fdatasync numbers
> > (which shouldn't use FUA) got *much* worse.
> > 
> > These drives have FUA writes that are slower than full flushes, making
> > O_DIRECT|O_DSYNC writes perform poorly and fdatasync() comparatively better.
> > 
> > 
> > What I'm seeing is that with recent kernels the fdatasync() performance is
> > roughly as bad as the O_DSYNC, whereas previously it was > 2x as
> > fasts. blktrace showed that there are ongoing FUA writes during a workload
> > with just overwriting writes and an fdatasync after every write.
> > 
> > 
> > At first I thought it was a regression between 7.0..7.1-rc2, but that turned
> > out to be only because the 7.0 machine did not have lazytime enabled. After
> > fixing that discrepancy, the regression is also visible in 7.0.  I have
> > confirmed it's not visible in 6.18.
> 
> I was able to reproduce this issue. The commit causing the issue is
> indeed from nonblocking timestamps series as you indicated 
> (fs: add support for non-blocking timestamp updates).
> 
> In inode_update_cmtime, we have the following changes as a part of the
> series:
> ...
> mtime_changed = !timespec64_equal(&now, &mtime);
> 	if (mtime_changed || !timespec64_equal(&now, &ctime))
> 		dirty = inode_time_dirty_flag(inode);  // #1
> 
> 	/*
> 	 * Pure timestamp updates can be recorded in the inode without blocking
> 	 * by not dirtying the inode.  But when the file system requires
> 	 * i_version updates, the update of i_version can still block.
> 	 * Error out if we'd actually have to update i_version or don't support
> 	 * lazytime.
> 	 */
> 	if (IS_I_VERSION(inode)) {
> 		if (flags & IOCB_NOWAIT) {
> 			if (!(inode->i_sb->s_flags & SB_LAZYTIME) ||
> 			    inode_iversion_need_inc(inode))
> 				return -EAGAIN;
> 		} else {
> 			if (inode_maybe_inc_iversion(inode, !!dirty)) //#2
> 				dirty |= I_DIRTY_SYNC;
> 		}
> 	}
> 
> ...

Ugh. Given we don't support i_version as an externally visible
change_cookie any more (we have multigrained timestamps for that
now), why do we still set SB_I_VERSION and jump
through all these complex hoops to set maintain something we don't
actually need?

i.e. going back to ip->di_version++ whenever the inode is logged to
maintain the on-disk change version would make things so much
simpler here...

Cheers,

Dave.
-- 
Dave Chinner
dgc@kernel.org

     prev parent reply	other threads:[~2026-05-12  5:25 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <IS8F8EYS5pW4UU5a3jxOTy-f18EgkDa_2zAUswRgTm6NVtvmajAaQyu9CDxkTelDnfXfCl7L_692C77zRAxwFQ==@protonmail.internalid>
2026-05-06 13:26 ` Increase in XFS journal flushes with (direct_write;fdatasync)+ Andres Freund
2026-05-06 15:05   ` Carlos Maiolino
2026-05-07 20:34   ` Pankaj Raghav (Samsung)
2026-05-08  8:10     ` Christoph Hellwig
2026-05-08  8:29       ` Pankaj Raghav
2026-05-08  8:43         ` Christoph Hellwig
2026-05-08 11:42           ` Jeff Layton
2026-05-08 11:47             ` Pankaj Raghav
2026-05-11  8:56               ` Christoph Hellwig
2026-05-11 10:31                 ` Pankaj Raghav
2026-05-12  5:25     ` Dave Chinner [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=agK51XboX2fc2ll7@dread \
    --to=dgc@kernel.org \
    --cc=andres@anarazel.de \
    --cc=brauner@kernel.org \
    --cc=cem@kernel.org \
    --cc=gost.dev@samsung.com \
    --cc=hch@lst.de \
    --cc=linux-xfs@vger.kernel.org \
    --cc=p.raghav@samsung.com \
    --cc=pankaj.raghav@linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.