From: Jeff Layton <jlayton@redhat.com>
To: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Jan Kara <jack@suse.cz>, Christoph Hellwig <hch@infradead.org>,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-nfs@vger.kernel.org, linux-ext4@vger.kernel.org,
linux-btrfs@vger.kernel.org, linux-xfs@vger.kernel.org
Subject: Re: [RFC PATCH v1 00/30] fs: inode->i_version rework and optimization
Date: Thu, 30 Mar 2017 14:35:32 -0400 [thread overview]
Message-ID: <1490898932.2667.1.camel@redhat.com> (raw)
In-Reply-To: <20170330161231.GA9824@fieldses.org>
On Thu, 2017-03-30 at 12:12 -0400, J. Bruce Fields wrote:
> On Thu, Mar 30, 2017 at 07:11:48AM -0400, Jeff Layton wrote:
> > On Thu, 2017-03-30 at 08:47 +0200, Jan Kara wrote:
> > > Hum, so are we fine if i_version just changes (increases) for all inodes
> > > after a server crash? If I understand its use right, it would mean
> > > invalidation of all client's caches but that is not such a big deal given
> > > how frequent server crashes should be, right?
>
> Even if it's rare, it may be really painful when all your clients are
> forced to throw out and repopulate their caches after a crash. But,
> yes, maybe we can live with it.
>
Yeah, assuming that normal reboots wouldn't cause this, then I don't see
it as being too bad.
> > > Because if above is acceptable we could make reported i_version to be a sum
> > > of "superblock crash counter" and "inode i_version". We increment
> > > "superblock crash counter" whenever we detect unclean filesystem shutdown.
> > > That way after a crash we are guaranteed each inode will report new
> > > i_version (the sum would probably have to look like "superblock crash
> > > counter" * 65536 + "inode i_version" so that we avoid reusing possible
> > > i_version numbers we gave away but did not write to disk but still...).
> > > Thoughts?
>
> How hard is this for filesystems to support? Do they need an on-disk
> format change to keep track of the crash counter? Maybe not, maybe the
> high bits of the i_version counters are all they need.
>
Yeah, I imagine we'd need a on-disk change for this unless there's
something already present that we could use in place of a crash counter.
> > That does sound like a good idea. This is a 64 bit value, so we should
> > be able to carve out some upper bits for a crash counter without risking
> > wrapping.
> >
> > The other constraint here is that we'd like any later version of the
> > counter to be larger than any earlier value that was handed out. I think
> > this idea would still satisfy that.
>
> I guess we just want to have some back-of-the-envelope estimates of
> maximum number of i_version increments possible between crashes and
> maximum number of crashes possible over lifetime of a filesystem, to
> decide how to split up the bits.
>
> I wonder if we could get away with using the new crash counter only for
> *new* values of the i_version? After a crash, use the on disk i_version
> as is, and put off using the new crash counter until the next time the
> file's modified.
>
That sounds difficult to get right. Suppose I have an inode that has not
been updated in a long time. Someone writes to it and then queries the
i_version. How do I know whether there were crashes since the last time
I updated it? Or am I misunderstanding what you're proposing here?
> That would still eliminate the risk of accidental reuse of an old
> i_version value. It still leaves some cases where the client could fail
> to notice an update indefinitely. All these cases I think have to
> assume that a writer made some changes that it failed to ever sync, so
> as long as we care only about close-to-open semantics perhaps those
> cases don't matter.
>
> I wonder if repeated crashes can lead to any odd corner cases.
>
--
Jeff Layton <jlayton@redhat.com>
next prev parent reply other threads:[~2017-03-30 18:35 UTC|newest]
Thread overview: 99+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-12-21 17:03 [RFC PATCH v1 00/30] fs: inode->i_version rework and optimization Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 01/30] lustre: don't set f_version in ll_readdir Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 02/30] ecryptfs: remove unnecessary i_version bump Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 03/30] ceph: remove the bump of i_version Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 04/30] f2fs: don't bother setting i_version Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 05/30] hpfs: don't bother with the i_version counter Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 06/30] jfs: remove initialization of " Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 07/30] nilfs2: remove inode->i_version initialization Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 08/30] orangefs: remove initialization of i_version Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 09/30] reiserfs: remove unneeded i_version bump Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 10/30] ntfs: remove i_version handling Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 11/30] fs: new API for handling i_version Jeff Layton
2017-03-03 22:36 ` J. Bruce Fields
2017-03-04 0:09 ` Jeff Layton
2017-03-03 23:55 ` NeilBrown
2017-03-04 1:58 ` Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 12/30] fat: convert to new i_version API Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 13/30] affs: " Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 14/30] afs: " Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 15/30] btrfs: " Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 16/30] exofs: switch " Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 17/30] ext2: convert " Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 18/30] ext4: " Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 19/30] nfs: " Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 20/30] nfsd: " Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 21/30] ocfs2: " Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 22/30] ufs: use " Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 23/30] xfs: convert to " Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 24/30] IMA: switch IMA over " Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 25/30] fs: add a "force" parameter to inode_inc_iversion Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 26/30] fs: only set S_VERSION when updating times if it has been queried Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 27/30] xfs: avoid setting XFS_ILOG_CORE if i_version doesn't need incrementing Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 28/30] btrfs: only dirty the inode in btrfs_update_time if something was changed Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 29/30] fs: track whether the i_version has been queried with an i_state flag Jeff Layton
2017-03-04 0:03 ` NeilBrown
2017-03-04 0:43 ` Jeff Layton
2016-12-21 17:03 ` [RFC PATCH v1 30/30] fs: convert i_version counter over to an atomic64_t Jeff Layton
2016-12-21 17:03 ` Jeff Layton
2016-12-22 8:38 ` Amir Goldstein
2016-12-22 13:27 ` Jeff Layton
2017-03-04 0:00 ` NeilBrown
2017-03-04 0:00 ` NeilBrown
2016-12-22 8:45 ` [RFC PATCH v1 00/30] fs: inode->i_version rework and optimization Christoph Hellwig
2016-12-22 14:42 ` Jeff Layton
2017-03-20 21:43 ` J. Bruce Fields
2017-03-21 13:45 ` Christoph Hellwig
2017-03-21 16:30 ` J. Bruce Fields
2017-03-21 17:23 ` Jeff Layton
2017-03-21 17:37 ` J. Bruce Fields
2017-03-21 17:51 ` J. Bruce Fields
2017-03-21 18:30 ` J. Bruce Fields
2017-03-21 18:30 ` J. Bruce Fields
2017-03-21 18:46 ` Jeff Layton
2017-03-21 19:13 ` J. Bruce Fields
2017-03-21 21:54 ` Jeff Layton
2017-03-21 21:54 ` Jeff Layton
2017-03-29 11:15 ` Jan Kara
2017-03-29 17:54 ` Jeff Layton
2017-03-29 17:54 ` Jeff Layton
2017-03-29 23:41 ` Dave Chinner
2017-03-30 11:24 ` Jeff Layton
2017-04-04 18:38 ` J. Bruce Fields
2017-03-30 6:47 ` Jan Kara
2017-03-30 11:11 ` Jeff Layton
2017-03-30 16:12 ` J. Bruce Fields
2017-03-30 18:35 ` Jeff Layton [this message]
2017-03-30 21:11 ` Boaz Harrosh
2017-03-30 21:11 ` Boaz Harrosh
2017-04-04 18:31 ` J. Bruce Fields
2017-04-04 18:31 ` J. Bruce Fields
2017-04-05 1:43 ` NeilBrown
2017-04-05 8:05 ` Jan Kara
2017-04-05 18:14 ` J. Bruce Fields
2017-05-11 18:59 ` J. Bruce Fields
2017-05-11 22:22 ` NeilBrown
2017-05-12 16:21 ` J. Bruce Fields
2017-05-12 16:21 ` J. Bruce Fields
2017-10-30 13:21 ` Jeff Layton
2017-05-12 8:27 ` Jan Kara
2017-05-12 15:56 ` J. Bruce Fields
2017-05-12 11:01 ` Jeff Layton
2017-05-12 15:57 ` J. Bruce Fields
2017-04-06 1:12 ` NeilBrown
2017-04-06 1:12 ` NeilBrown
2017-04-06 1:12 ` NeilBrown
2017-04-06 7:22 ` Jan Kara
2017-04-05 17:26 ` J. Bruce Fields
2017-04-01 23:05 ` Dave Chinner
2017-04-03 14:00 ` Jan Kara
2017-04-04 12:34 ` Dave Chinner
2017-04-04 17:53 ` J. Bruce Fields
2017-04-04 17:53 ` J. Bruce Fields
2017-04-05 1:26 ` NeilBrown
2017-03-21 21:45 ` Dave Chinner
2017-03-22 19:53 ` Jeff Layton
2017-03-03 23:00 ` J. Bruce Fields
2017-03-03 23:00 ` J. Bruce Fields
2017-03-04 0:53 ` Jeff Layton
2017-03-08 17:29 ` J. Bruce Fields
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1490898932.2667.1.camel@redhat.com \
--to=jlayton@redhat.com \
--cc=bfields@fieldses.org \
--cc=hch@infradead.org \
--cc=jack@suse.cz \
--cc=linux-btrfs@vger.kernel.org \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-nfs@vger.kernel.org \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.