linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: ronnie sahlberg <ronniesahlberg@gmail.com>
To: Jeff Layton <jlayton@kernel.org>
Cc: Dave Chinner <david@fromorbit.com>,
	Amir Goldstein <amir73il@gmail.com>,
	 Linus Torvalds <torvalds@linux-foundation.org>,
	Kent Overstreet <kent.overstreet@linux.dev>,
	 Christian Brauner <brauner@kernel.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	 John Stultz <jstultz@google.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Stephen Boyd <sboyd@kernel.org>,
	 Chandan Babu R <chandan.babu@oracle.com>,
	"Darrick J. Wong" <djwong@kernel.org>,
	 "Theodore Ts'o" <tytso@mit.edu>,
	Andreas Dilger <adilger.kernel@dilger.ca>,
	Chris Mason <clm@fb.com>,  Josef Bacik <josef@toxicpanda.com>,
	David Sterba <dsterba@suse.com>, Hugh Dickins <hughd@google.com>,
	 Andrew Morton <akpm@linux-foundation.org>,
	Jan Kara <jack@suse.de>,  David Howells <dhowells@redhat.com>,
	linux-fsdevel@vger.kernel.org,  linux-kernel@vger.kernel.org,
	linux-xfs@vger.kernel.org,  linux-ext4@vger.kernel.org,
	linux-btrfs@vger.kernel.org, linux-mm@kvack.org,
	 linux-nfs@vger.kernel.org
Subject: Re: [PATCH RFC 2/9] timekeeping: new interfaces for multigrain timestamp handing
Date: Tue, 31 Oct 2023 09:34:17 +1000	[thread overview]
Message-ID: <CAN05THQZKe5e+KCWX4gbz6MH633q6wTbNbukLpeTPKcRSWFo-w@mail.gmail.com> (raw)
In-Reply-To: <6df5ea54463526a3d898ed2bd8a005166caa9381.camel@kernel.org>

On Fri, 27 Oct 2023 at 20:36, Jeff Layton <jlayton@kernel.org> wrote:
>
> On Thu, 2023-10-26 at 13:20 +1100, Dave Chinner wrote:
> > On Wed, Oct 25, 2023 at 08:25:35AM -0400, Jeff Layton wrote:
> > > On Wed, 2023-10-25 at 19:05 +1100, Dave Chinner wrote:
> > > > On Tue, Oct 24, 2023 at 02:40:06PM -0400, Jeff Layton wrote:
> > > > > On Tue, 2023-10-24 at 10:08 +0300, Amir Goldstein wrote:
> > > > > > On Tue, Oct 24, 2023 at 6:40 AM Dave Chinner <david@fromorbit.com> wrote:
> > > > > > >
> > > > > > > On Mon, Oct 23, 2023 at 02:18:12PM -1000, Linus Torvalds wrote:
> > > > > > > > On Mon, 23 Oct 2023 at 13:26, Dave Chinner <david@fromorbit.com> wrote:
> > > > > > Does xfs_repair guarantee that changes of atime, or any inode changes
> > > > > > for that matter, update i_version? No, it does not.
> > > > > > So IMO, "atime does not update i_version" is not an "on-disk format change",
> > > > > > it is a runtime behavior change, just like lazytime is.
> > > > >
> > > > > This would certainly be my preference. I don't want to break any
> > > > > existing users though.
> > > >
> > > > That's why I'm trying to get some kind of consensus on what
> > > > rules and/or atime configurations people are happy for me to break
> > > > to make it look to users like there's a viable working change
> > > > attribute being supplied by XFS without needing to change the on
> > > > disk format.
> > > >
> > >
> > > I agree that the only bone of contention is whether to count atime
> > > updates against the change attribute. I think we have consensus that all
> > > in-kernel users do _not_ want atime updates counted against the change
> > > attribute. The only real question is these "legacy" users of
> > > di_changecount.
> >
> > Please stop refering to "legacy users" of di_changecount. Whether
> > there are users or not is irrelevant - it is defined by the current
> > on-disk format specification, and as such there may be applications
> > we do not know about making use of the current behaviour.
> >
> > It's like a linux syscall - we can't remove them because there may
> > be some user we don't know about still using that old syscall. We
> > simply don't make changes that can potentially break user
> > applications like that.
> >
> > The on disk format is the same - there is software out that we don't
> > know about that expects a certain behaviour based on the
> > specification. We don't break the on disk format by making silent
> > behavioural changes - we require a feature flag to indicate
> > behaviour has changed so that applications can take appropriate
> > actions with stuff they don't understand.
> >
> > The example for this is the BIGTIME timestamp format change. The on
> > disk inode structure is physically unchanged, but the contents of
> > the timestamp fields are encoded very differently. Sure, the older
> > kernels can read the timestamp data without any sort of problem
> > occurring, except for the fact the timestamps now appear to be
> > completely corrupted.
> >
> > Changing the meaning of ithe contents of di_changecount is no
> > different. It might look OK and nothing crashes, but nothing can be
> > inferred from the value in the field because we don't know how it
> > has been modified.
> >
> > Hence we can't just change the meaning, encoding or behaviour of an
> > on disk field that would result in existing kernels and applications
> > doing the wrong thing with that field (either read or write) without
> > adding a feature flag to indicate what behaviour that field should
> > have.
> >
> > > > > Perhaps this ought to be a mkfs option? Existing XFS filesystems could
> > > > > still behave with the legacy behavior, but we could make mkfs.xfs build
> > > > > filesystems by default that work like NFS requires.
> > > >
> > > > If we require mkfs to set a flag to change behaviour, then we're
> > > > talking about making an explicit on-disk format change to select the
> > > > optional behaviour. That's precisely what I want to avoid.
> > > >
> > >
> > > Right. The on-disk di_changecount would have a (subtly) different
> > > meaning at that point.
> > >
> > > It's not a change that requires drastic retooling though. If we were to
> > > do this, we wouldn't need to grow the on-disk inode. Booting to an older
> > > kernel would cause the behavior to revert. That's sub-optimal, but not
> > > fatal.
> >
> > See above: redefining the contents, behaviour or encoding of an on
> > disk field is a change of the on-disk format specification.
> >
> > The rules for on disk format changes that we work to were set in
> > place long before I started working on XFS.  They are sane, well
> > thought out rules that have stood the test of time and massive new
> > feature introductions (CRCs, reflink, rmap, etc). And they only work
> > because we don't allow anyone to bend them for convenience, short
> > cuts or expediting their pet project.
> >
> > > What I don't quite understand is how these tools are accessing
> > > di_changecount?
> >
> > As I keep saying: this is largely irrelevant to the problem at hand.
> >
> > > XFS only accesses the di_changecount to propagate the value to and from
> > > the i_version,
> >
> > Yes.  XFS has a strong separation between on-disk structures and
> > in-memory values, and i_version is simply the in-memory field we use
> > to store the current di_changecount value.  We force bump i_version
> > every time we modify the inode core regardless of whether anyone has
> > queried i_version because that's what di_changecount requires. i.e.
> > the filesystem controls the contents of i_version, not the VFS.
> >
> > Now that NFS is using a proper abstraction (i.e. vfs_statx()) to get
> > the change cookie, we really don't need to expose di_changecount in
> > i_version at all - we could simply copy an internal di_changecount
> > value into the statx cookie field in xfs_vn_getattr() and there
> > would be almost no change of behaviour from the perspective of NFS
> > and IMA at all.
> >
> > > and there is nothing besides NFSD and IMA that queries
> > > the i_version value in-kernel. So, this must be done via some sort of
> > > userland tool that is directly accessing the block device (or some 3rd
> > > party kernel module).
> >
> > Yup, both of those sort of applications exist. e.g. the DMAPI kernel
> > module allows direct access to inode metadata through a custom
> > bulkstat formatter implementation - it returns different information
> > comapred to the standard XFS one in the upstream kernel.
> >
> > > In earlier discussions you alluded to some repair and/or analysis tools
> > > that depended on this counter.
> >
> > Yes, and one of those "tools" is *me*.
> >
> > I frequently look at the di_changecount when doing forensic and/or
> > failure analysis on filesystem corpses.  SOE analysis, relative
> > modification activity, etc all give insight into what happened to
> > the filesystem to get it into the state it is currently in, and
> > di_changecount provides information no other metadata in the inode
> > contains.
> >
> > > I took a quick look in xfsprogs, but I
> > > didn't see anything there. Is there a library or something that these
> > > tools use to get at this value?
> >
> > xfs_db is the tool I use for this, such as:
> >
> > $ sudo xfs_db -c "sb 0" -c "a rootino" -c "p v3.change_count" /dev/mapper/fast
> > v3.change_count = 35
> > $
> >
> > The root inode in this filesystem has a change count of 35. The root
> > inode has 32 dirents in it, which means that no entries have ever
> > been removed or renamed. This sort of insight into the past history
> > of inode metadata is largely impossible to get any other way, and
> > it's been the difference between understanding failure and having no
> > clue more than once.
> >
> > Most block device parsing applications simply write their own
> > decoder that walks the on-disk format. That's pretty trivial to do,
> > developers can get all the information needed to do this from the
> > on-disk format specification documentation we keep on kernel.org...
> >
>
> Fair enough. I'm not here to tell you that you guys that you need to
> change how di_changecount works. If it's too valuable to keep it
> counting atime-only updates, then so be it.
>
> If that's the case however, and given that the multigrain timestamp work
> is effectively dead, then I don't see an alternative to growing the on-
> disk inode. Do you?

Would a new mount option be a viable alternative?
A new option that would when used change the semantics of these fields
to what NFS needs?
With the caveat: using this mount option may break other special tools
that depend on the default
semantics.


> --
> Jeff Layton <jlayton@kernel.org>

  parent reply	other threads:[~2023-10-30 23:34 UTC|newest]

Thread overview: 70+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-18 17:41 [PATCH RFC 0/9] fs: multigrain timestamps (redux) Jeff Layton
2023-10-18 17:41 ` [PATCH RFC 1/9] fs: switch timespec64 fields in inode to discrete integers Jeff Layton
2023-10-18 17:41 ` [PATCH RFC 2/9] timekeeping: new interfaces for multigrain timestamp handing Jeff Layton
2023-10-18 19:18   ` Linus Torvalds
2023-10-18 20:47     ` Jeff Layton
2023-10-18 21:31       ` Linus Torvalds
2023-10-18 21:52         ` Jeff Layton
2023-10-19  9:29           ` Christian Brauner
2023-10-19 11:28             ` Jeff Layton
2023-10-19 22:02               ` Dave Chinner
2023-10-20 12:12                 ` Jeff Layton
2023-10-20 20:06                   ` Linus Torvalds
2023-10-20 20:20                     ` Linus Torvalds
2023-10-20 21:05                     ` Jeff Layton
2023-10-22 22:17                   ` Dave Chinner
2023-10-23 14:45                     ` Jeff Layton
2023-10-23 23:26                       ` Dave Chinner
2023-10-24  0:18                         ` Linus Torvalds
2023-10-24  3:40                           ` Dave Chinner
2023-10-24  4:10                             ` Linus Torvalds
2023-10-24  7:08                             ` Amir Goldstein
2023-10-24 18:40                               ` Jeff Layton
2023-10-25  8:05                                 ` Dave Chinner
2023-10-25 10:41                                   ` Amir Goldstein
2023-10-25 12:25                                   ` Jeff Layton
2023-10-26  2:20                                     ` Dave Chinner
2023-10-26  5:42                                       ` Amir Goldstein
2023-10-27 10:35                                       ` Jeff Layton
2023-10-30 22:37                                         ` Dave Chinner
2023-10-30 23:11                                           ` Linus Torvalds
2023-10-31  1:42                                             ` Dave Chinner
2023-10-31  7:03                                               ` Amir Goldstein
2023-10-31 10:30                                                 ` Christian Brauner
2023-10-31 11:29                                                 ` Jeff Layton
2023-10-31 21:57                                                   ` Dave Chinner
2023-10-31 23:02                                                     ` Darrick J. Wong
2023-10-31 23:47                                                       ` Dave Chinner
2023-11-01 10:16                                                     ` Jan Kara
2023-11-01 11:38                                                       ` Amir Goldstein
2023-11-02 10:17                                                         ` Jeff Layton
2023-11-01 20:10                                                       ` Linus Torvalds
2023-11-01 21:34                                                         ` Trond Myklebust
2023-11-01 22:23                                                           ` Linus Torvalds
2023-11-01 22:45                                                             ` Trond Myklebust
2023-11-01 23:29                                                           ` Dave Chinner
2023-11-02 10:29                                                             ` Jeff Layton
2023-11-02 10:15                                                         ` Jeff Layton
2023-10-31 23:12                                                 ` Darrick J. Wong
2023-11-01  8:08                                                   ` Amir Goldstein
2023-10-31 11:26                                               ` Jeff Layton
2023-10-31 19:43                                                 ` John Stoffel
2023-10-31 11:04                                           ` Jeff Layton
2023-10-31 12:22                                             ` Jan Kara
2023-10-31 12:55                                               ` Jeff Layton
2023-10-30 23:34                                         ` ronnie sahlberg [this message]
2023-10-24 14:24                             ` Jeff Layton
2023-10-24 19:06                           ` Jeff Layton
2023-10-24 19:40                             ` Linus Torvalds
2023-10-24 20:19                               ` Jeff Layton
2023-10-31 10:26               ` Christian Brauner
2023-10-31 13:55                 ` Jeff Layton
2023-10-19 22:00   ` Thomas Gleixner
2023-10-19 22:41     ` Jeff Layton
2023-10-18 17:41 ` [PATCH RFC 3/9] timekeeping: add new debugfs file to count multigrain timestamps Jeff Layton
2023-10-18 17:41 ` [PATCH RFC 4/9] fs: add infrastructure for " Jeff Layton
2023-10-18 17:41 ` [PATCH RFC 5/9] fs: have setattr_copy handle multigrain timestamps appropriately Jeff Layton
2023-10-18 17:41 ` [PATCH RFC 6/9] xfs: switch to multigrain timestamps Jeff Layton
2023-10-18 17:41 ` [PATCH RFC 7/9] ext4: " Jeff Layton
2023-10-18 17:41 ` [PATCH RFC 8/9] btrfs: convert " Jeff Layton
2023-10-18 17:41 ` [PATCH RFC 9/9] tmpfs: add support for " Jeff Layton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAN05THQZKe5e+KCWX4gbz6MH633q6wTbNbukLpeTPKcRSWFo-w@mail.gmail.com \
    --to=ronniesahlberg@gmail.com \
    --cc=adilger.kernel@dilger.ca \
    --cc=akpm@linux-foundation.org \
    --cc=amir73il@gmail.com \
    --cc=brauner@kernel.org \
    --cc=chandan.babu@oracle.com \
    --cc=clm@fb.com \
    --cc=david@fromorbit.com \
    --cc=dhowells@redhat.com \
    --cc=djwong@kernel.org \
    --cc=dsterba@suse.com \
    --cc=hughd@google.com \
    --cc=jack@suse.de \
    --cc=jlayton@kernel.org \
    --cc=josef@toxicpanda.com \
    --cc=jstultz@google.com \
    --cc=kent.overstreet@linux.dev \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=sboyd@kernel.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=tytso@mit.edu \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).