linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: Christoph Hellwig <hch@lst.de>
Cc: linux-xfs@vger.kernel.org, Jeff Layton <jlayton@redhat.com>
Subject: Re: [PATCH] xfs: rewrite the fdatasync optimization
Date: Fri, 9 Feb 2018 10:17:25 +1100	[thread overview]
Message-ID: <20180208231725.GH20266@dastard> (raw)
In-Reply-To: <20180206072356.GB1649@lst.de>

On Tue, Feb 06, 2018 at 08:23:56AM +0100, Christoph Hellwig wrote:
> On Tue, Feb 06, 2018 at 09:17:26AM +1100, Dave Chinner wrote:
> > > Currently we need to the ilock over the log force in xfs_fsync so that we
> > > can protect ili_fsync_fields against incorrect manipulation.
> > > 
> > > But if instead we add new XFS_ILOG_VERSION pseudo log area similar to the
> > > timestamp one we can use that to just record the last dirty / fdatasync
> > > dirty lsn as long as the inode is pinned, and clear it when unpinning to
> > > avoid holding the ilock over I/O.
> > 
> > I thought that NFS requires the iversion changes to be made stable
> > at the same time as the data changes they correspond to is made
> > stable? i.e. NFS requires us to stabilise the on disk iversion field
> > during fdatasync.
> 
> Yes,

So you're agreeing with me that if iversion is changed, then
fdatasync needs to force the log.

> and this patch doesn't change that behavior.

I think it does.

> The reason why the
> new XFS_ILOG_VERSION flag is needed is to be able to check if
> an fdatasync needs to flush out an inode log item.

If we don't change iversion because it's not been queried, then in
the current code then XFS_ILOG_CORE will not be set and so the inode
remains only timestamp dirty and does not get flushed by fdatasync.
If we change iversion, then XFS_ILOG_CORE gets set, and the next
fdatasync will correctly flush the inode log item.

That's exactly the behaviour we need to meet the above NFS
requirements - the iversion change is associated with the new data
being written, not the timestamp change we are making as a result of
the data change.

If we look at your code now, when we change iversion we'll set
XFS_ILOG_VERSION and that will now behave identically to the
timestamp code w.r.t. fdatasync. i.e. we will *never* flush pure
iversion/timestamp changes on fdatasync.

That's not the behaviour NFS requires - we'll stabilise data but we
will not stabilise the iversion change that goes along with that
data change. Hence if we crash after fdatasync, there'll be new data
on disk and an old iversion....

So, AFAICT, this patch does change behviour and it breaks the
iversion semantics that the NFS server requires for data
modification.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

  reply	other threads:[~2018-02-08 23:17 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-02-05  7:39 [PATCH] xfs: rewrite the fdatasync optimization Christoph Hellwig
2018-02-05 22:17 ` Dave Chinner
2018-02-06  7:23   ` Christoph Hellwig
2018-02-08 23:17     ` Dave Chinner [this message]
2018-02-13 15:04       ` Christoph Hellwig
2018-02-13 23:26         ` Dave Chinner
2018-02-14 15:53           ` Christoph Hellwig
2018-02-14 22:43             ` Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180208231725.GH20266@dastard \
    --to=david@fromorbit.com \
    --cc=hch@lst.de \
    --cc=jlayton@redhat.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).