From: Jeff Layton <jlayton@kernel.org>
To: Kent Overstreet <kent.overstreet@linux.dev>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>,
Christian Brauner <brauner@kernel.org>,
Chuck Lever <chuck.lever@oracle.com>, Neil Brown <neilb@suse.de>,
Olga Kornievskaia <kolga@netapp.com>,
Dai Ngo <Dai.Ngo@oracle.com>, Tom Talpey <tom@talpey.com>,
Chandan Babu R <chandan.babu@oracle.com>,
"Darrick J. Wong" <djwong@kernel.org>,
Dave Chinner <david@fromorbit.com>, Jan Kara <jack@suse.cz>,
Linus Torvalds <torvalds@linux-foundation.org>,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-nfs@vger.kernel.org, linux-xfs@vger.kernel.org
Subject: Re: [PATCH v8 1/5] fs: add infrastructure for multigrain timestamps
Date: Fri, 22 Sep 2023 14:22:52 -0400
Message-ID: <f4c7e8e58db56741ae38bef6909852b52cd3df5b.camel@kernel.org>
In-Reply-To: <20230922173136.qpodogsb26wq3ujj@moria.home.lan>

On Fri, 2023-09-22 at 13:31 -0400, Kent Overstreet wrote:
> On Fri, Sep 22, 2023 at 01:14:40PM -0400, Jeff Layton wrote:
> > The VFS always uses coarse-grained timestamps when updating the ctime
> > and mtime after a change. This has the benefit of allowing filesystems
> > to optimize away a lot of metadata updates, down to around 1 per jiffy,
> > even when a file is under heavy writes.
> >
> > Unfortunately, this has always been an issue when we're exporting via
> > NFS, which traditionally relied on timestamps to validate caches. A lot
> > of changes can happen in a jiffy, and that can lead to cache-coherency
> > issues between hosts.
> >
> > NFSv4 added a dedicated change attribute that must change value after
> > any change to an inode. Some filesystems (btrfs, ext4 and tmpfs) utilize
> > the i_version field for this, but the NFSv4 spec allows a server to
> > generate this value from the inode's ctime.
> >
> > What we need is a way to only use fine-grained timestamps when they are
> > being actively queried.
> >
> > POSIX generally mandates that when the mtime changes, the ctime must
> > also change. The kernel always stores normalized ctime values, so only
> > the first 30 bits of the tv_nsec field are ever used.
> >
> > Use the 31st bit of the ctime tv_nsec field to indicate that something
> > has queried the inode for the mtime or ctime. When this flag is set,
> > on the next mtime or ctime update, the kernel will fetch a fine-grained
> > timestamp instead of the usual coarse-grained one.
> >
> > Filesystems can opt into this behavior by setting the FS_MGTIME flag in
> > the fstype. Filesystems that don't set this flag will continue to use
> > coarse-grained timestamps.
>
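To make the quoted description concrete, here is a minimal userspace
sketch of the flag-in-tv_nsec idea. It is not the patch's code: the
struct, the function names and the CTIME_QUERIED name are hypothetical,
and the caller passes in both candidate timestamps instead of reading
kernel clocks. It also ignores the races a real implementation has to
handle between concurrent queries and updates.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC  1000000000u
/* Hypothetical flag name. Normalized nanoseconds are below 2^30, so
 * bit 30 (the "31st bit") is free to carry one bit of state. */
#define CTIME_QUERIED (UINT32_C(1) << 30)

/* Hypothetical stand-in for the ctime fields of an inode. */
struct sketch_inode {
    int64_t          ctime_sec;
    _Atomic uint32_t ctime_nsec; /* low 30 bits: nsec, bit 30: queried flag */
};

/* getattr-like path: report the nanoseconds and mark the ctime as queried. */
static uint32_t sketch_query_ctime_nsec(struct sketch_inode *inode)
{
    uint32_t old = atomic_fetch_or(&inode->ctime_nsec, CTIME_QUERIED);

    return old & ~CTIME_QUERIED;
}

/*
 * Update path: take the fine-grained time only if the previous value was
 * queried, otherwise the coarse one. Storing a normalized nsec value
 * clears the flag as a side effect. Returns true if the fine-grained
 * time was used. Simplification: load-then-store is not race-free.
 */
static bool sketch_update_ctime(struct sketch_inode *inode,
                                int64_t coarse_sec, uint32_t coarse_nsec,
                                int64_t fine_sec, uint32_t fine_nsec)
{
    bool queried;

    assert(coarse_nsec < NSEC_PER_SEC && fine_nsec < NSEC_PER_SEC);
    queried = (atomic_load(&inode->ctime_nsec) & CTIME_QUERIED) != 0;
    inode->ctime_sec = queried ? fine_sec : coarse_sec;
    atomic_store(&inode->ctime_nsec, queried ? fine_nsec : coarse_nsec);
    return queried;
}
```

In this sketch, an inode that nobody has looked at keeps getting cheap
coarse-grained stamps; only an update that follows a query pays for a
fine-grained one.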
> Interesting...
>
> So in bcachefs, for most inode fields the btree inode is the "master
> copy"; we do inode updates via btree transactions, and then on
> successful transaction commit we update the VFS inode to match.
>
> (exceptions: i_size, i_blocks)
>
> I'd been contemplating switching to that model for timestamp updates as
> well, since that would allow us to get rid of our
> super_operations.write_inode method - except we probably wouldn't want
> to do that since it would likely make timestamp updates too expensive.
>
> And now with your scheme of stashing extra state in timespec, I'm glad
> we didn't.
>
> Still, timestamp updates are a bit messier than I'd like, would be
> lovely to figure out a way to clean that up - right now we have an
> awkward mix of "sometimes timestamp updates happen in a btree
> transaction first, other times just the VFS inode is updated and marked
> dirty".
>
> xfs doesn't have .write_inode, so it's probably time to study what it
> does...

A few months ago, we talked briefly and I asked about an i_version
counter for bcachefs. You were going to look into it, and I wasn't sure
if you had implemented one. If you haven't, then this may be a simpler
alternative.

For now, multigrain timestamps aren't much good for anything other than
faking up a change attribute for NFSv4, but they should be fine for that
and you wouldn't need to grow your on-disk inode to accommodate them.
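As a rough illustration of what faking up a change attribute from the
ctime could look like, here is a small userspace sketch. The struct and
function names are hypothetical, and the packing (seconds in the upper
32 bits, nanoseconds in the lower 32) is an assumption chosen for the
example; all that matters for the purpose described here is that the
value changes whenever the ctime does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a normalized kernel timestamp. */
struct sketch_timespec {
    int64_t  tv_sec;
    uint32_t tv_nsec; /* normalized: always below 1,000,000,000 */
};

/*
 * Pack a ctime into a 64-bit value: seconds in the upper half,
 * nanoseconds in the lower half. Two distinct normalized ctimes always
 * yield distinct values, and a later ctime yields a larger value as
 * long as tv_sec is non-negative and fits in 32 bits.
 */
static uint64_t sketch_ctime_to_change_attr(struct sketch_timespec ts)
{
    assert(ts.tv_nsec < 1000000000u);
    return ((uint64_t)ts.tv_sec << 32) | ts.tv_nsec;
}
```

With coarse-grained stamps, two changes inside the same jiffy produce
the same ctime and therefore the same value here, which is exactly the
case that a fine-grained stamp after a query is meant to avoid.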
Cheers,
--
Jeff Layton <jlayton@kernel.org>