All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jeff Layton <jlayton@kernel.org>
To: Dave Chinner <david@fromorbit.com>,
	Linus Torvalds <torvalds@linux-foundation.org>
Cc: Amir Goldstein <amir73il@gmail.com>,
	Kent Overstreet <kent.overstreet@linux.dev>,
	Christian Brauner <brauner@kernel.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	John Stultz <jstultz@google.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Stephen Boyd <sboyd@kernel.org>,
	Chandan Babu R <chandan.babu@oracle.com>,
	"Darrick J. Wong" <djwong@kernel.org>,
	Theodore Ts'o <tytso@mit.edu>,
	Andreas Dilger <adilger.kernel@dilger.ca>,
	Chris Mason <clm@fb.com>, Josef Bacik <josef@toxicpanda.com>,
	David Sterba <dsterba@suse.com>, Hugh Dickins <hughd@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Jan Kara <jack@suse.de>, David Howells <dhowells@redhat.com>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-xfs@vger.kernel.org, linux-ext4@vger.kernel.org,
	linux-btrfs@vger.kernel.org, linux-mm@kvack.org,
	linux-nfs@vger.kernel.org
Subject: Re: [PATCH RFC 2/9] timekeeping: new interfaces for multigrain timestamp handing
Date: Tue, 31 Oct 2023 07:26:37 -0400	[thread overview]
Message-ID: <b4c04efdde3bc7d107d0bdc68e100a94942aca3c.camel@kernel.org> (raw)
In-Reply-To: <ZUBbj8XsA6uW8ZDK@dread.disaster.area>

On Tue, 2023-10-31 at 12:42 +1100, Dave Chinner wrote:
> On Mon, Oct 30, 2023 at 01:11:56PM -1000, Linus Torvalds wrote:
> > On Mon, 30 Oct 2023 at 12:37, Dave Chinner <david@fromorbit.com> wrote:
> > > 
> > > If XFS can ignore relatime or lazytime persistent updates for given
> > > situations, then *we don't need to make periodic on-disk updates of
> > > atime*. This makes the whole problem of "persistent atime update bumps
> > > i_version" go away because then we *aren't making persistent atime
> > > updates* except when some other persistent modification that bumps
> > > [cm]time occurs.
> > 
> > Well, I think this should be split into two independent questions:
> > 
> >  (a) are relatime or lazytime atime updates persistent if nothing else changes?
> 
> They only become persistent after 24 hours or, in the case of
> relatime, immediately persistent if mtime < atime (i.e. read after a
> modification). Those are the only times that the VFS triggers
> persistent writeback of atime, and it's the latter case (mtime <
> atime) that is the specific trigger that exposed the problem with
> atime bumping i_version in the first place.
> 
> >  (b) do atime updates _ever_ update i_version *regardless* of relatime
> > or lazytime?
> > 
> > and honestly, I think the best answer to (b) would be that "no,
> > i_version should simply not change for atime updates". And I think
> > that answer is what it is because no user of i_version seems to want
> > it.
> 
> As I keep repeating: Repeatedly stating that "atime should not bump
> i_version" does not address the questions I'm asking *at all*.
> 
> > Now, the reason it's a single question for you is that apparently for
> > XFS, the only thing that matters is "inode was written to disk" and
> > that "di_changecount" value is thus related to the persistence of
> > atime updates, but splitting di_changecount out to be a separate thing
> > from i_version seems to be on the table, so I think those two things
> > really could be independent issues.
> 
> Wrong way around - we'd have to split i_version out from
> di_changecount. It's i_version that has changed semantics, not
> di_changecount, and di_changecount behaviour must remain unchanged.
> 

I have to take issue with your characterization of this. The
requirements for NFS's change counter have not changed. Clearly there
was a breakdown in communications when it was first implemented in Linux
that caused atime updates to get counted in the i_version value, but
that was never intentional and never by design.

I'm simply trying to correct this historical mistake.

> What I really don't want to do is implement a new i_version field in
> the XFS on-disk format. What this redefinition of i_version
> semantics has made clear is that i_version is *user defined
> metadata*, not internal filesystem metadata that is defined by the
> filesystem on-disk format.
> 
> User defined persistent metadata *belongs in xattrs*, not in the
> core filesystem on-disk formats. If the VFS wants to define and
> manage i_version behaviour, smeantics and persistence independently
> of the filesystems that manage the persistent storage (as it clearly
> does!) then we should treat it just like any other VFS defined inode
> metadata (e.g. per inode objects like security constraints, ACLs,
> fsverity digests, fscrypt keys, etc). i.e. it should be in a named
> xattr, not directly implemented in the filesystem on-disk format
> deinfitions.
> 
> Then the application can change the meaning of the metadata whenever
> and however it likes. Then filesystem developers just don't need
> to care about it at all because the VFS specific persistent metadata
> is not part of the on-disk format we need to maintain
> cross-platform forwards and backwards compatibility for.
> 
> > > But I don't want to do this unconditionally - for systems not
> > > running anything that samples i_version we want relatime/lazytime
> > > to behave as they are supposed to and do periodic persistent updates
> > > as per normal. Principle of least surprise and all that jazz.
> > 
> > Well - see above: I think in a perfect world, we'd simply never change
> > i_version at all for any atime updates, and relatime/lazytime simply
> > wouldn't be an issue at all wrt i_version.
> 
> Right, that's what I'd like, especially as the new definition of
> i_version - "only change when [cm]time changes" - means that the VFS
> i_version is really now just a glorified timestamp.
> 
> > Wouldn't _that_ be the trule "least surprising" behavior? Considering
> > that nobody wants i_version to change for what are otherwise pure
> > reads (that's kind of the *definition* of atime, after all).
> 
> So, if you don't like the idea of us ignoring relatime/lazytime
> conditionally, are we allowed to simply ignore them *all the time*
> and do all timestamp updates in the manner that causes users the
> least amount of pain?
> 
> I mean, relatime only exists because atime updates cause users pain.
> lazytime only exists because relatime doesn't address the pain that
> timestamp updates cause mixed read/write or pure O_DSYNC overwrite
> workloads pain. noatime is a pain because it loses all atime
> updates.
> 
> There is no "one size is right for everyone", so why not just let
> filesystems do what is most efficient from an internal IO and
> persistence POV whilst still maintaining the majority of "expected"
> behaviours?
> 
> Keep in mind, though, that this is all moot if we can get rid of
> i_version entirely....
> 
> > Now, the annoyance here is that *both* (a) and (b) then have that
> > impact of "i_version no longer tracks di_changecount".
> 
> .... and what is annoying is that that the new i_version just a
> glorified ctime change counter. What we should be fixing is ctime -
> integrating this change counting into ctime would allow us to make
> i_version go away entirely. i.e. We don't need a persistent ctime
> change counter if the ctime has sufficient resolution or persistent
> encoding that it does not need an external persistent change
> counter.
> 
> That was reasoning behind the multi-grain timestamps. While the mgts
> implementation was flawed, the reasoning behind it certainly isn't.
> We should be trying to get rid of i_version by integrating it into
> ctime updates, not arguing how atime vs i_version should work.
> 
> > So I don't think the issue here is "i_version" per se. I think in a
> > vacuum, the best option of i_version is pretty obvious.  But if you
> > want i_version to track di_changecount, *then* you end up with that
> > situation where the persistence of atime matters, and i_version needs
> > to update whenever a (persistent) atime update happens.
> 
> Yet I don't want i_version to track di_changecount.
> 
> I want to *stop supporting i_version altogether* in XFS.
> 
> I want i_version as filesystem internal metadata to die completely.
> 
> I don't want to change the on disk format to add a new i_version
> field because we'll be straight back in this same siutation when the
> next i_version bug is found and semantics get changed yet again.
> 
> Hence if we can encode the necessary change attributes into ctime,
> we can drop VFS i_version support altogether.  Then the "atime bumps
> i_version" problem also goes away because then we *don't use
> i_version*.
> 
> But if we can't get the VFS to do this with ctime, at least we have
> the abstractions available to us (i.e. timestamp granularity and
> statx change cookie) to allow XFS to implement this sort of
> ctime-with-integrated-change-counter internally to the filesystem
> and be able to drop i_version support.... 
> 
> [....]

> > This really is all *entirely* an artifact of that "bi_changecount" vs
> > "i_version" being tied together. You did seem to imply that you'd be
> > ok with having "bi_changecount" be split from i_version, ie from an
> > earlier email in this thread:
> > 
> >  "Now that NFS is using a proper abstraction (i.e. vfs_statx()) to get
> >   the change cookie, we really don't need to expose di_changecount in
> >   i_version at all - we could simply copy an internal di_changecount
> >   value into the statx cookie field in xfs_vn_getattr() and there
> >   would be almost no change of behaviour from the perspective of NFS
> >   and IMA at all"
> 
> .... which is what I was talking about here.
> 
> i.e. I was not talking about splitting i_version from di_changecount
> - I was talking about being able to stop supporting the VFS
> i_version counter entirely and still having NFS and IMA work
> correctly.
> 
> Continually bring the argument back to "atime vs i_version" misses
> the bigger issues around this new i_version definition and
> implementation, and that the real solution should be to fix ctime
> updates to make i_version at the VFS level go away forever.
> 

Ok, so your proposal is to keep using coarse grained timestamps, but use
the "insignificant" bits in them to store a change counter?

That sounds complex and fraught with peril, but I'm willing to listen to
some specifics about how that would work.
-- 
Jeff Layton <jlayton@kernel.org>

  parent reply	other threads:[~2023-10-31 11:26 UTC|newest]

Thread overview: 70+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-18 17:41 [PATCH RFC 0/9] fs: multigrain timestamps (redux) Jeff Layton
2023-10-18 17:41 ` [PATCH RFC 1/9] fs: switch timespec64 fields in inode to discrete integers Jeff Layton
2023-10-18 17:41 ` [PATCH RFC 2/9] timekeeping: new interfaces for multigrain timestamp handing Jeff Layton
2023-10-18 19:18   ` Linus Torvalds
2023-10-18 20:47     ` Jeff Layton
2023-10-18 21:31       ` Linus Torvalds
2023-10-18 21:52         ` Jeff Layton
2023-10-19  9:29           ` Christian Brauner
2023-10-19 11:28             ` Jeff Layton
2023-10-19 22:02               ` Dave Chinner
2023-10-20 12:12                 ` Jeff Layton
2023-10-20 20:06                   ` Linus Torvalds
2023-10-20 20:20                     ` Linus Torvalds
2023-10-20 21:05                     ` Jeff Layton
2023-10-22 22:17                   ` Dave Chinner
2023-10-23 14:45                     ` Jeff Layton
2023-10-23 23:26                       ` Dave Chinner
2023-10-24  0:18                         ` Linus Torvalds
2023-10-24  3:40                           ` Dave Chinner
2023-10-24  4:10                             ` Linus Torvalds
2023-10-24  7:08                             ` Amir Goldstein
2023-10-24 18:40                               ` Jeff Layton
2023-10-25  8:05                                 ` Dave Chinner
2023-10-25 10:41                                   ` Amir Goldstein
2023-10-25 12:25                                   ` Jeff Layton
2023-10-26  2:20                                     ` Dave Chinner
2023-10-26  5:42                                       ` Amir Goldstein
2023-10-27 10:35                                       ` Jeff Layton
2023-10-30 22:37                                         ` Dave Chinner
2023-10-30 23:11                                           ` Linus Torvalds
2023-10-31  1:42                                             ` Dave Chinner
2023-10-31  7:03                                               ` Amir Goldstein
2023-10-31 10:30                                                 ` Christian Brauner
2023-10-31 11:29                                                 ` Jeff Layton
2023-10-31 21:57                                                   ` Dave Chinner
2023-10-31 23:02                                                     ` Darrick J. Wong
2023-10-31 23:47                                                       ` Dave Chinner
2023-11-01 10:16                                                     ` Jan Kara
2023-11-01 11:38                                                       ` Amir Goldstein
2023-11-02 10:17                                                         ` Jeff Layton
2023-11-01 20:10                                                       ` Linus Torvalds
2023-11-01 21:34                                                         ` Trond Myklebust
2023-11-01 22:23                                                           ` Linus Torvalds
2023-11-01 22:45                                                             ` Trond Myklebust
2023-11-01 23:29                                                           ` Dave Chinner
2023-11-02 10:29                                                             ` Jeff Layton
2023-11-02 10:15                                                         ` Jeff Layton
2023-10-31 23:12                                                 ` Darrick J. Wong
2023-11-01  8:08                                                   ` Amir Goldstein
2023-10-31 11:26                                               ` Jeff Layton [this message]
2023-10-31 19:43                                                 ` John Stoffel
2023-10-31 11:04                                           ` Jeff Layton
2023-10-31 12:22                                             ` Jan Kara
2023-10-31 12:55                                               ` Jeff Layton
2023-10-30 23:34                                         ` ronnie sahlberg
2023-10-24 14:24                             ` Jeff Layton
2023-10-24 19:06                           ` Jeff Layton
2023-10-24 19:40                             ` Linus Torvalds
2023-10-24 20:19                               ` Jeff Layton
2023-10-31 10:26               ` Christian Brauner
2023-10-31 13:55                 ` Jeff Layton
2023-10-19 22:00   ` Thomas Gleixner
2023-10-19 22:41     ` Jeff Layton
2023-10-18 17:41 ` [PATCH RFC 3/9] timekeeping: add new debugfs file to count multigrain timestamps Jeff Layton
2023-10-18 17:41 ` [PATCH RFC 4/9] fs: add infrastructure for " Jeff Layton
2023-10-18 17:41 ` [PATCH RFC 5/9] fs: have setattr_copy handle multigrain timestamps appropriately Jeff Layton
2023-10-18 17:41 ` [PATCH RFC 6/9] xfs: switch to multigrain timestamps Jeff Layton
2023-10-18 17:41 ` [PATCH RFC 7/9] ext4: " Jeff Layton
2023-10-18 17:41 ` [PATCH RFC 8/9] btrfs: convert " Jeff Layton
2023-10-18 17:41 ` [PATCH RFC 9/9] tmpfs: add support for " Jeff Layton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b4c04efdde3bc7d107d0bdc68e100a94942aca3c.camel@kernel.org \
    --to=jlayton@kernel.org \
    --cc=adilger.kernel@dilger.ca \
    --cc=akpm@linux-foundation.org \
    --cc=amir73il@gmail.com \
    --cc=brauner@kernel.org \
    --cc=chandan.babu@oracle.com \
    --cc=clm@fb.com \
    --cc=david@fromorbit.com \
    --cc=dhowells@redhat.com \
    --cc=djwong@kernel.org \
    --cc=dsterba@suse.com \
    --cc=hughd@google.com \
    --cc=jack@suse.de \
    --cc=josef@toxicpanda.com \
    --cc=jstultz@google.com \
    --cc=kent.overstreet@linux.dev \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=sboyd@kernel.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=tytso@mit.edu \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.