From: bfields@fieldses.org (J. Bruce Fields)
To: Jeff Layton <jlayton@kernel.org>
Cc: Theodore Ts'o <tytso@mit.edu>, Jan Kara <jack@suse.cz>,
NeilBrown <neilb@suse.de>,
adilger.kernel@dilger.ca, djwong@kernel.org, david@fromorbit.com,
trondmy@hammerspace.com, viro@zeniv.linux.org.uk,
zohar@linux.ibm.com, xiubli@redhat.com, chuck.lever@oracle.com,
lczerner@redhat.com, brauner@kernel.org, fweimer@redhat.com,
linux-man@vger.kernel.org, linux-api@vger.kernel.org,
linux-btrfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-kernel@vger.kernel.org, ceph-devel@vger.kernel.org,
linux-ext4@vger.kernel.org, linux-nfs@vger.kernel.org,
linux-xfs@vger.kernel.org
Subject: Re: [man-pages RFC PATCH v4] statx, inode: document the new STATX_INO_VERSION field
Date: Thu, 8 Sep 2022 11:56:05 -0400
Message-ID: <20220908155605.GD8951@fieldses.org>
In-Reply-To: <02928a8c5718590bea5739b13d6b6ebe66cac577.camel@kernel.org>
On Thu, Sep 08, 2022 at 11:44:33AM -0400, Jeff Layton wrote:
> On Thu, 2022-09-08 at 11:21 -0400, Theodore Ts'o wrote:
> > On Thu, Sep 08, 2022 at 10:33:26AM +0200, Jan Kara wrote:
> > > It boils down to the fact that we don't want to call mark_inode_dirty()
> > > from the IOCB_NOWAIT path, because for lots of filesystems that means a
> > > journal operation, and there is a high chance that it may block.
> > >
> > > Presumably we could treat inode dirtying after i_version change similarly
> > > to how we handle timestamp updates with lazytime mount option (i.e., not
> > > dirty the inode immediately but only with a delay) but then the time window
> > > for i_version inconsistencies due to a crash would be much larger.
> >
> > Perhaps this is a radical suggestion, but a lot of the problems seem
> > to be due to the concern "what if the file system crashes" (and so
> > we need to worry about making sure that every increment to i_version
> > MUST be persisted as soon as it happens).
> >
> > Well, if we assume that unclean shutdowns are rare, then perhaps we
> > shouldn't be optimizing for that case. So.... what if a file system
> > had a counter which got incremented each time its journal is replayed
> > representing an unclean shutdown. That shouldn't happen often, but if
> > it does, there might be any number of i_version updates that may have
> > gotten lost. So in that case, the NFS client should invalidate all of
> > its caches.
> >
> > If the i_version field was large enough, we could just prefix the
> > existing i_version number with the "unclean shutdown counter" when it
> > is sent over the NFS protocol to the client. But if that field is too
> > small, and if (as I understand things) NFS just needs to know when
> > i_version is different, we could simply hash the "unclean
> > shutdown counter" with the inode's "i_version counter", and let that
> > be the version which is sent from the NFS server to the client.
> >
> > If we could do that, then it doesn't become critical that every single
> > i_version bump has to be persisted to disk, and we could treat it like
> > a lazytime update; it's guaranteed to be updated when we do a clean
> > unmount of the file system (and when the file system is frozen). On
> > a crash, there is no guarantee that all i_version bumps will be
> > persisted, but we do have this "unclean shutdown" counter to deal with
> > that case.
> >
> > Would this make life easier for folks?
> >
> > - Ted
>
> Thanks for chiming in, Ted. That's part of the problem, but we're
> actually not too worried about that case:
>
> nfsd mixes the ctime in with i_version, so you'd have to crash+clock
> jump backward by juuuust enough to allow you to get the i_version and
> ctime into the state they were in before the crash, but with different
> data. We're assuming that that is difficult to achieve in practice.

But a change in the clock could still cause our returned change
attribute to go backwards (even without a crash). Not sure how to
evaluate the risk, but it was enough that Trond hasn't been comfortable
with nfsd advertising NFS4_CHANGE_TYPE_IS_MONOTONIC.

Ted's idea would be sufficient to allow us to turn that flag on, which I
think allows some client-side optimizations.

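To make the clock concern concrete, here is a rough userspace sketch,
modeled loosely on the way nfsd folds ctime into the change attribute.
The struct, the helper names, and the exact 30-bit shift are
illustrative assumptions, not a copy of the kernel code:

```c
#include <stdint.h>

/* Hypothetical stand-in for the inode fields involved. */
struct inode_model {
	int64_t  ctime_sec;   /* ctime, seconds */
	uint32_t ctime_nsec;  /* ctime, nanoseconds (< 1e9, fits in 30 bits) */
	uint64_t i_version;   /* per-inode change counter */
};

/* Mix ctime into the counter: seconds above bit 30, then add nsec and i_version. */
static uint64_t change_attr(const struct inode_model *ino)
{
	uint64_t chattr = (uint64_t)ino->ctime_sec;

	chattr <<= 30;
	chattr += ino->ctime_nsec;
	chattr += ino->i_version;
	return chattr;
}

/* Apply one change: bump the counter and stamp ctime from the given clock. */
static void touch(struct inode_model *ino, int64_t now_sec, uint32_t now_nsec)
{
	ino->i_version++;
	ino->ctime_sec = now_sec;
	ino->ctime_nsec = now_nsec;
}
```

In this model, calling touch() with a clock that has been stepped back
(say from 1001 to 900 seconds) yields a change_attr() smaller than the
previous one, even though i_version itself only ever increased. No
crash is needed for that to happen.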
> The issue with a reboot counter (or similar) is that on an unclean crash
> the NFS client would end up invalidating every inode in the cache, as
> all of the i_versions would change. That's probably excessive.

But if we use the crash counter on write instead of read, we don't
invalidate caches unnecessarily. And I think the monotonicity would
still be close enough for our purposes?

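A minimal sketch of the difference, assuming a 64-bit value with the
crash counter placed above bit 48. The split, the macro, and both
function names are made up for illustration, and this is only one way
to read "on write"; nothing here is settled:

```c
#include <stdint.h>

#define CRASH_SHIFT 48  /* assumed: low 48 bits count changes */

/* Read-time mixing: the crash count is folded in whenever the value is reported. */
static uint64_t report_read_time(uint64_t i_version, uint64_t crash_count)
{
	uint64_t low_mask = (UINT64_C(1) << CRASH_SHIFT) - 1;

	return (crash_count << CRASH_SHIFT) | (i_version & low_mask);
}

/*
 * Write-time mixing: the crash count only matters when the inode changes.
 * The stored i_version is reported as-is; a change after a crash jumps it
 * ahead to the base of the current crash epoch.
 */
static uint64_t bump_write_time(uint64_t i_version, uint64_t crash_count)
{
	uint64_t epoch_base = crash_count << CRASH_SHIFT;
	uint64_t next = i_version + 1;

	return next > epoch_base ? next : epoch_base;
}
```

With read-time mixing, an inode that was never touched reports a new
value as soon as the crash counter goes from 0 to 1, so every cached
copy is invalidated. With write-time mixing, that inode keeps its
stored value, and only an inode changed after the crash jumps ahead,
landing past anything handed out before the crash as long as fewer
than 2^48 increments were lost.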
> The bigger issue (at the moment) is atomicity: when we fetch an
> i_version, the natural inclination is to associate that with the state
> of the inode at some point in time, so we need this to be updated
> atomically with certain other attributes of the inode. That's the part
> I'm trying to sort through at the moment.

That may be, but I still suspect the crash counter would help.

--b.