All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jeff Layton <jlayton@kernel.org>
To: Dave Chinner <david@fromorbit.com>
Cc: Amir Goldstein <amir73il@gmail.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Kent Overstreet <kent.overstreet@linux.dev>,
	Christian Brauner <brauner@kernel.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	John Stultz <jstultz@google.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Stephen Boyd <sboyd@kernel.org>,
	Chandan Babu R <chandan.babu@oracle.com>,
	"Darrick J. Wong" <djwong@kernel.org>,
	Theodore Ts'o <tytso@mit.edu>,
	Andreas Dilger <adilger.kernel@dilger.ca>,
	Chris Mason <clm@fb.com>, Josef Bacik <josef@toxicpanda.com>,
	David Sterba <dsterba@suse.com>, Hugh Dickins <hughd@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Jan Kara <jack@suse.de>, David Howells <dhowells@redhat.com>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-xfs@vger.kernel.org, linux-ext4@vger.kernel.org,
	linux-btrfs@vger.kernel.org, linux-mm@kvack.org,
	linux-nfs@vger.kernel.org
Subject: Re: [PATCH RFC 2/9] timekeeping: new interfaces for multigrain timestamp handing
Date: Fri, 27 Oct 2023 06:35:58 -0400	[thread overview]
Message-ID: <6df5ea54463526a3d898ed2bd8a005166caa9381.camel@kernel.org> (raw)
In-Reply-To: <ZTnNCytHLGoJY9ds@dread.disaster.area>

On Thu, 2023-10-26 at 13:20 +1100, Dave Chinner wrote:
> On Wed, Oct 25, 2023 at 08:25:35AM -0400, Jeff Layton wrote:
> > On Wed, 2023-10-25 at 19:05 +1100, Dave Chinner wrote:
> > > On Tue, Oct 24, 2023 at 02:40:06PM -0400, Jeff Layton wrote:
> > > > On Tue, 2023-10-24 at 10:08 +0300, Amir Goldstein wrote:
> > > > > On Tue, Oct 24, 2023 at 6:40 AM Dave Chinner <david@fromorbit.com> wrote:
> > > > > > 
> > > > > > On Mon, Oct 23, 2023 at 02:18:12PM -1000, Linus Torvalds wrote:
> > > > > > > On Mon, 23 Oct 2023 at 13:26, Dave Chinner <david@fromorbit.com> wrote:
> > > > > Does xfs_repair guarantee that changes of atime, or any inode changes
> > > > > for that matter, update i_version? No, it does not.
> > > > > So IMO, "atime does not update i_version" is not an "on-disk format change",
> > > > > it is a runtime behavior change, just like lazytime is.
> > > > 
> > > > This would certainly be my preference. I don't want to break any
> > > > existing users though.
> > > 
> > > That's why I'm trying to get some kind of consensus on what
> > > rules and/or atime configurations people are happy for me to break
> > > to make it look to users like there's a viable working change
> > > attribute being supplied by XFS without needing to change the on
> > > disk format.
> > > 
> > 
> > I agree that the only bone of contention is whether to count atime
> > updates against the change attribute. I think we have consensus that all
> > in-kernel users do _not_ want atime updates counted against the change
> > attribute. The only real question is these "legacy" users of
> > di_changecount.
> 
> Please stop refering to "legacy users" of di_changecount. Whether
> there are users or not is irrelevant - it is defined by the current
> on-disk format specification, and as such there may be applications
> we do not know about making use of the current behaviour.
> 
> It's like a linux syscall - we can't remove them because there may
> be some user we don't know about still using that old syscall. We
> simply don't make changes that can potentially break user
> applications like that.
> 
> The on disk format is the same - there is software out that we don't
> know about that expects a certain behaviour based on the
> specification. We don't break the on disk format by making silent
> behavioural changes - we require a feature flag to indicate
> behaviour has changed so that applications can take appropriate
> actions with stuff they don't understand.
> 
> The example for this is the BIGTIME timestamp format change. The on
> disk inode structure is physically unchanged, but the contents of
> the timestamp fields are encoded very differently. Sure, the older
> kernels can read the timestamp data without any sort of problem
> occurring, except for the fact the timestamps now appear to be
> completely corrupted.
> 
> Changing the meaning of ithe contents of di_changecount is no
> different. It might look OK and nothing crashes, but nothing can be
> inferred from the value in the field because we don't know how it
> has been modified.
> 
> Hence we can't just change the meaning, encoding or behaviour of an
> on disk field that would result in existing kernels and applications
> doing the wrong thing with that field (either read or write) without
> adding a feature flag to indicate what behaviour that field should
> have.
> 
> > > > Perhaps this ought to be a mkfs option? Existing XFS filesystems could
> > > > still behave with the legacy behavior, but we could make mkfs.xfs build
> > > > filesystems by default that work like NFS requires.
> > > 
> > > If we require mkfs to set a flag to change behaviour, then we're
> > > talking about making an explicit on-disk format change to select the
> > > optional behaviour. That's precisely what I want to avoid.
> > > 
> > 
> > Right. The on-disk di_changecount would have a (subtly) different
> > meaning at that point.
> > 
> > It's not a change that requires drastic retooling though. If we were to
> > do this, we wouldn't need to grow the on-disk inode. Booting to an older
> > kernel would cause the behavior to revert. That's sub-optimal, but not
> > fatal.
> 
> See above: redefining the contents, behaviour or encoding of an on
> disk field is a change of the on-disk format specification.
> 
> The rules for on disk format changes that we work to were set in
> place long before I started working on XFS.  They are sane, well
> thought out rules that have stood the test of time and massive new
> feature introductions (CRCs, reflink, rmap, etc). And they only work
> because we don't allow anyone to bend them for convenience, short
> cuts or expediting their pet project.
> 
> > What I don't quite understand is how these tools are accessing
> > di_changecount?
> 
> As I keep saying: this is largely irrelevant to the problem at hand.
> 
> > XFS only accesses the di_changecount to propagate the value to and from
> > the i_version,
> 
> Yes.  XFS has a strong separation between on-disk structures and
> in-memory values, and i_version is simply the in-memory field we use
> to store the current di_changecount value.  We force bump i_version
> every time we modify the inode core regardless of whether anyone has
> queried i_version because that's what di_changecount requires. i.e.
> the filesystem controls the contents of i_version, not the VFS.
> 
> Now that NFS is using a proper abstraction (i.e. vfs_statx()) to get
> the change cookie, we really don't need to expose di_changecount in
> i_version at all - we could simply copy an internal di_changecount
> value into the statx cookie field in xfs_vn_getattr() and there
> would be almost no change of behaviour from the perspective of NFS
> and IMA at all.
> 
> > and there is nothing besides NFSD and IMA that queries
> > the i_version value in-kernel. So, this must be done via some sort of
> > userland tool that is directly accessing the block device (or some 3rd
> > party kernel module).
> 
> Yup, both of those sort of applications exist. e.g. the DMAPI kernel
> module allows direct access to inode metadata through a custom
> bulkstat formatter implementation - it returns different information
> comapred to the standard XFS one in the upstream kernel.
> 
> > In earlier discussions you alluded to some repair and/or analysis tools
> > that depended on this counter.
> 
> Yes, and one of those "tools" is *me*.
> 
> I frequently look at the di_changecount when doing forensic and/or
> failure analysis on filesystem corpses.  SOE analysis, relative
> modification activity, etc all give insight into what happened to
> the filesystem to get it into the state it is currently in, and
> di_changecount provides information no other metadata in the inode
> contains.
> 
> > I took a quick look in xfsprogs, but I
> > didn't see anything there. Is there a library or something that these
> > tools use to get at this value?
> 
> xfs_db is the tool I use for this, such as:
> 
> $ sudo xfs_db -c "sb 0" -c "a rootino" -c "p v3.change_count" /dev/mapper/fast
> v3.change_count = 35
> $
> 
> The root inode in this filesystem has a change count of 35. The root
> inode has 32 dirents in it, which means that no entries have ever
> been removed or renamed. This sort of insight into the past history
> of inode metadata is largely impossible to get any other way, and
> it's been the difference between understanding failure and having no
> clue more than once.
> 
> Most block device parsing applications simply write their own
> decoder that walks the on-disk format. That's pretty trivial to do,
> developers can get all the information needed to do this from the
> on-disk format specification documentation we keep on kernel.org...
> 

Fair enough. I'm not here to tell you that you guys that you need to
change how di_changecount works. If it's too valuable to keep it
counting atime-only updates, then so be it.

If that's the case however, and given that the multigrain timestamp work
is effectively dead, then I don't see an alternative to growing the on-
disk inode. Do you?
-- 
Jeff Layton <jlayton@kernel.org>

  parent reply	other threads:[~2023-10-27 10:36 UTC|newest]

Thread overview: 70+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-18 17:41 [PATCH RFC 0/9] fs: multigrain timestamps (redux) Jeff Layton
2023-10-18 17:41 ` [PATCH RFC 1/9] fs: switch timespec64 fields in inode to discrete integers Jeff Layton
2023-10-18 17:41 ` [PATCH RFC 2/9] timekeeping: new interfaces for multigrain timestamp handing Jeff Layton
2023-10-18 19:18   ` Linus Torvalds
2023-10-18 20:47     ` Jeff Layton
2023-10-18 21:31       ` Linus Torvalds
2023-10-18 21:52         ` Jeff Layton
2023-10-19  9:29           ` Christian Brauner
2023-10-19 11:28             ` Jeff Layton
2023-10-19 22:02               ` Dave Chinner
2023-10-20 12:12                 ` Jeff Layton
2023-10-20 20:06                   ` Linus Torvalds
2023-10-20 20:20                     ` Linus Torvalds
2023-10-20 21:05                     ` Jeff Layton
2023-10-22 22:17                   ` Dave Chinner
2023-10-23 14:45                     ` Jeff Layton
2023-10-23 23:26                       ` Dave Chinner
2023-10-24  0:18                         ` Linus Torvalds
2023-10-24  3:40                           ` Dave Chinner
2023-10-24  4:10                             ` Linus Torvalds
2023-10-24  7:08                             ` Amir Goldstein
2023-10-24 18:40                               ` Jeff Layton
2023-10-25  8:05                                 ` Dave Chinner
2023-10-25 10:41                                   ` Amir Goldstein
2023-10-25 12:25                                   ` Jeff Layton
2023-10-26  2:20                                     ` Dave Chinner
2023-10-26  5:42                                       ` Amir Goldstein
2023-10-27 10:35                                       ` Jeff Layton [this message]
2023-10-30 22:37                                         ` Dave Chinner
2023-10-30 23:11                                           ` Linus Torvalds
2023-10-31  1:42                                             ` Dave Chinner
2023-10-31  7:03                                               ` Amir Goldstein
2023-10-31 10:30                                                 ` Christian Brauner
2023-10-31 11:29                                                 ` Jeff Layton
2023-10-31 21:57                                                   ` Dave Chinner
2023-10-31 23:02                                                     ` Darrick J. Wong
2023-10-31 23:47                                                       ` Dave Chinner
2023-11-01 10:16                                                     ` Jan Kara
2023-11-01 11:38                                                       ` Amir Goldstein
2023-11-02 10:17                                                         ` Jeff Layton
2023-11-01 20:10                                                       ` Linus Torvalds
2023-11-01 21:34                                                         ` Trond Myklebust
2023-11-01 22:23                                                           ` Linus Torvalds
2023-11-01 22:45                                                             ` Trond Myklebust
2023-11-01 23:29                                                           ` Dave Chinner
2023-11-02 10:29                                                             ` Jeff Layton
2023-11-02 10:15                                                         ` Jeff Layton
2023-10-31 23:12                                                 ` Darrick J. Wong
2023-11-01  8:08                                                   ` Amir Goldstein
2023-10-31 11:26                                               ` Jeff Layton
2023-10-31 19:43                                                 ` John Stoffel
2023-10-31 11:04                                           ` Jeff Layton
2023-10-31 12:22                                             ` Jan Kara
2023-10-31 12:55                                               ` Jeff Layton
2023-10-30 23:34                                         ` ronnie sahlberg
2023-10-24 14:24                             ` Jeff Layton
2023-10-24 19:06                           ` Jeff Layton
2023-10-24 19:40                             ` Linus Torvalds
2023-10-24 20:19                               ` Jeff Layton
2023-10-31 10:26               ` Christian Brauner
2023-10-31 13:55                 ` Jeff Layton
2023-10-19 22:00   ` Thomas Gleixner
2023-10-19 22:41     ` Jeff Layton
2023-10-18 17:41 ` [PATCH RFC 3/9] timekeeping: add new debugfs file to count multigrain timestamps Jeff Layton
2023-10-18 17:41 ` [PATCH RFC 4/9] fs: add infrastructure for " Jeff Layton
2023-10-18 17:41 ` [PATCH RFC 5/9] fs: have setattr_copy handle multigrain timestamps appropriately Jeff Layton
2023-10-18 17:41 ` [PATCH RFC 6/9] xfs: switch to multigrain timestamps Jeff Layton
2023-10-18 17:41 ` [PATCH RFC 7/9] ext4: " Jeff Layton
2023-10-18 17:41 ` [PATCH RFC 8/9] btrfs: convert " Jeff Layton
2023-10-18 17:41 ` [PATCH RFC 9/9] tmpfs: add support for " Jeff Layton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6df5ea54463526a3d898ed2bd8a005166caa9381.camel@kernel.org \
    --to=jlayton@kernel.org \
    --cc=adilger.kernel@dilger.ca \
    --cc=akpm@linux-foundation.org \
    --cc=amir73il@gmail.com \
    --cc=brauner@kernel.org \
    --cc=chandan.babu@oracle.com \
    --cc=clm@fb.com \
    --cc=david@fromorbit.com \
    --cc=dhowells@redhat.com \
    --cc=djwong@kernel.org \
    --cc=dsterba@suse.com \
    --cc=hughd@google.com \
    --cc=jack@suse.de \
    --cc=josef@toxicpanda.com \
    --cc=jstultz@google.com \
    --cc=kent.overstreet@linux.dev \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=sboyd@kernel.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=tytso@mit.edu \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.