From: Jeff Layton <jlayton@kernel.org>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
"Darrick J . Wong" <darrick.wong@oracle.com>,
Dave Chinner <david@fromorbit.com>
Subject: Re: [PATCH] xfs: fix i_version handling in xfs
Date: Tue, 16 Aug 2022 11:58:06 -0400
Message-ID: <e77fd4d19815fd661dbdb04ab27e687ff7e727eb.camel@kernel.org>
In-Reply-To: <Yvu7DHDWl4g1KsI5@magnolia>
On Tue, 2022-08-16 at 08:43 -0700, Darrick J. Wong wrote:
> On Tue, Aug 16, 2022 at 09:17:36AM -0400, Jeff Layton wrote:
> > The i_version in xfs_trans_log_inode is bumped for any inode update,
> > including atime-only updates due to reads. We don't want to record those
> > in the i_version, as they don't represent "real" changes. Remove that
> > callsite.
> >
> > In xfs_vn_update_time, if S_VERSION is flagged, then attempt to bump the
> > i_version and turn on XFS_ILOG_CORE if it happens. In
> > xfs_trans_ichgtime, update the i_version if the mtime or ctime are being
> > updated.
>
> What about operations that don't touch the mtime but change the file
> metadata anyway? There are a few of those, like the blockgc garbage
> collector, deduperange, and the defrag tool.
>
Do those change the c/mtime at all?

It's possible we're missing some places that should change the i_version
as well. We may need some more call sites.
> Zooming out a bit -- what does i_version signal, concretely? I thought
> it was used by nfs (and maybe ceph?) to signal to clients that the file
> on the server has moved on, and the client needs to invalidate its
> caches. I thought afs had a similar generation counter, though it's
> only used to cache file data, not metadata? Does an i_version change
> cause all of them to invalidate caches, or is there more behavior I
> don't know about?
>
For NFS, a change to the change attribute indicates that there has been
a change to the data or metadata for the file. atime changes due to
reads are specifically exempted from this, but we do bump the i_version
if someone (e.g.) changes the atime via utimes().

The NFS client will generally invalidate its caches for the inode when
it notices that the change attribute has changed.
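As a rough illustration of the client side, here is a minimal userspace
sketch. It is not the actual NFS client code: struct cached_inode and
revalidate() are made-up names, and the real client tracks far more
state. It only shows the comparison that makes a spurious bump costly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical client-side cache entry; not a real NFS client structure. */
struct cached_inode {
	uint64_t change_attr;	/* change attribute seen at last revalidation */
	bool	 data_valid;	/* are the cached data/attributes still trusted? */
};

/*
 * Compare the server's current change attribute with the cached one.
 * Returns true if the cache had to be invalidated.
 */
static bool revalidate(struct cached_inode *ci, uint64_t server_change_attr)
{
	assert(ci != NULL);

	if (ci->change_attr == server_change_attr)
		return false;	/* nothing observable changed; keep the cache */

	/* Any difference at all forces the client to drop what it cached. */
	ci->data_valid = false;
	ci->change_attr = server_change_attr;
	return true;
}
```

Under this model, a server that bumps the counter for an atime-only
update caused by a read makes every reader throw away its cache, even
though the data and the other metadata are unchanged.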
FWIW, AFS may not meet this standard since it doesn't generally
increment the counter on metadata changes. It may turn out that we don't
want to expose this to the AFS client due to that (or maybe come up with
some way to indicate this difference).
> Does that mean that we should bump i_version for any file data or
> attribute that could be queried or observed by userspace? In which case
> I suppose this change is still correct, even if it relaxes i_version
> updates from "any change to the inode whatsoever" to "any change that
> would bump mtime". Unless FIEMAP is part of "attributes observed by
> userspace".
>
> (The other downside I can see is that now we have to remember to bump
> timestamps for every new file operation we add, unlike the current code
> which is centrally located in xfs_trans_log_inode.)
>
The main reason for the change attribute in NFS was that NFSv3 is
plagued with cache-coherency problems due to coarse timestamp
granularity. It was conceived as a way to indicate that the inode had
changed without relying on timestamps.

In practice, we want to bump the i_version counter whenever the ctime or
mtime would be changed.
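To make the granularity problem concrete, here is a toy userspace model.
struct toy_inode and the helpers are hypothetical, and the one-second
timestamp is an assumption chosen for illustration. Two modifications
that land in the same timestamp tick look identical to a client that
only compares mtime, but not to one that compares a counter bumped
alongside it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy inode: models only the two fields of interest. */
struct toy_inode {
	int64_t  mtime_sec;	/* timestamp truncated to whole seconds */
	uint64_t version;	/* change counter, bumped whenever mtime is set */
};

/* Apply a modification at time now_sec: set mtime and bump the counter. */
static void toy_modify(struct toy_inode *ip, int64_t now_sec)
{
	assert(ip != NULL);
	ip->mtime_sec = now_sec;
	ip->version++;
}

/* What a timestamp-only client can detect. */
static bool changed_by_mtime(const struct toy_inode *old,
			     const struct toy_inode *cur)
{
	return old->mtime_sec != cur->mtime_sec;
}

/* What a client comparing the change counter can detect. */
static bool changed_by_version(const struct toy_inode *old,
			       const struct toy_inode *cur)
{
	return old->version != cur->version;
}
```

If a client snapshots the inode after a write at second 100 and a second
write also lands at second 100, changed_by_mtime() reports no change
while changed_by_version() does.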
> --D
>
> > Cc: Darrick J. Wong <darrick.wong@oracle.com>
> > Cc: Dave Chinner <david@fromorbit.com>
> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > ---
> > fs/xfs/libxfs/xfs_trans_inode.c | 17 +++--------------
> > fs/xfs/xfs_iops.c | 4 ++++
> > 2 files changed, 7 insertions(+), 14 deletions(-)
> >
> > diff --git a/fs/xfs/libxfs/xfs_trans_inode.c b/fs/xfs/libxfs/xfs_trans_inode.c
> > index 8b5547073379..78bf7f491462 100644
> > --- a/fs/xfs/libxfs/xfs_trans_inode.c
> > +++ b/fs/xfs/libxfs/xfs_trans_inode.c
> > @@ -71,6 +71,8 @@ xfs_trans_ichgtime(
> > inode->i_ctime = tv;
> > if (flags & XFS_ICHGTIME_CREATE)
> > ip->i_crtime = tv;
> > + if (flags & (XFS_ICHGTIME_MOD|XFS_ICHGTIME_CHG))
> > + inode_inc_iversion(inode);
> > }
> >
> > /*
> > @@ -116,20 +118,7 @@ xfs_trans_log_inode(
> > spin_unlock(&inode->i_lock);
> > }
> >
> > - /*
> > - * First time we log the inode in a transaction, bump the inode change
> > - * counter if it is configured for this to occur. While we have the
> > - * inode locked exclusively for metadata modification, we can usually
> > - * avoid setting XFS_ILOG_CORE if no one has queried the value since
> > - * the last time it was incremented. If we have XFS_ILOG_CORE already
> > - * set however, then go ahead and bump the i_version counter
> > - * unconditionally.
> > - */
> > - if (!test_and_set_bit(XFS_LI_DIRTY, &iip->ili_item.li_flags)) {
> > - if (IS_I_VERSION(inode) &&
> > - inode_maybe_inc_iversion(inode, flags & XFS_ILOG_CORE))
> > - iversion_flags = XFS_ILOG_CORE;
> > - }
> > + set_bit(XFS_LI_DIRTY, &iip->ili_item.li_flags);
> >
> > /*
> > * If we're updating the inode core or the timestamps and it's possible
> > diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> > index 45518b8c613c..162e044c7f56 100644
> > --- a/fs/xfs/xfs_iops.c
> > +++ b/fs/xfs/xfs_iops.c
> > @@ -718,6 +718,7 @@ xfs_setattr_nonsize(
> > }
> >
> > setattr_copy(mnt_userns, inode, iattr);
> > + inode_inc_iversion(inode);
> > xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
> >
> > XFS_STATS_INC(mp, xs_ig_attrchg);
> > @@ -943,6 +944,7 @@ xfs_setattr_size(
> >
> > ASSERT(!(iattr->ia_valid & (ATTR_UID | ATTR_GID)));
> > setattr_copy(mnt_userns, inode, iattr);
> > + inode_inc_iversion(inode);
> > xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
> >
> > XFS_STATS_INC(mp, xs_ig_attrchg);
> > @@ -1047,6 +1049,8 @@ xfs_vn_update_time(
> > inode->i_mtime = *now;
> > if (flags & S_ATIME)
> > inode->i_atime = *now;
> > + if ((flags & S_VERSION) && inode_maybe_inc_iversion(inode, false))
> > + log_flags |= XFS_ILOG_CORE;
> >
> > xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
> > xfs_trans_log_inode(tp, ip, log_flags);
> > --
> > 2.37.2
> >
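For readers unfamiliar with the inode_maybe_inc_iversion() call in the
xfs_vn_update_time hunk above, here is a userspace model of its
semantics as I understand them from include/linux/iversion.h. It is a
sketch, not the kernel code: model_inode, model_maybe_inc() and
model_query() are made-up names, and the memory ordering is simplified.
The low bit of the raw value records whether anyone has queried the
counter since the last bump; a non-forced increment is skipped when
nobody has looked.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_QUERIED	1ULL	/* low bit: value was observed since last bump */
#define MODEL_INCREMENT	2ULL	/* the counter lives in the upper 63 bits */

/* Hypothetical stand-in for the i_version field of struct inode. */
struct model_inode {
	_Atomic uint64_t raw;
};

/*
 * Bump the counter if forced, or if someone has queried it since the
 * last bump. Returns true if the counter was incremented.
 */
static bool model_maybe_inc(struct model_inode *ip, bool force)
{
	uint64_t cur, new;

	assert(ip != NULL);
	cur = atomic_load(&ip->raw);
	do {
		if (!force && !(cur & MODEL_QUERIED))
			return false;	/* nobody looked; skip the bump */
		/* Clear the flag and add 2 so the flag bit is untouched. */
		new = (cur & ~MODEL_QUERIED) + MODEL_INCREMENT;
		/* On failure, cur is reloaded with the current value. */
	} while (!atomic_compare_exchange_weak(&ip->raw, &cur, new));

	return true;
}

/* Read the counter and mark it as queried. */
static uint64_t model_query(struct model_inode *ip)
{
	assert(ip != NULL);
	return atomic_fetch_or(&ip->raw, MODEL_QUERIED) >> 1;
}
```

In this model, passing force == false (as the hunk does) means the inode
core only needs logging when the increment actually happened, which is
why the return value gates setting XFS_ILOG_CORE.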
--
Jeff Layton <jlayton@kernel.org>