From: Dave Chinner <david@fromorbit.com>
To: Jan Kara <jack@suse.cz>
Cc: "J. Bruce Fields" <bfields@fieldses.org>,
Jeff Layton <jlayton@redhat.com>,
Christoph Hellwig <hch@infradead.org>,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-nfs@vger.kernel.org, linux-ext4@vger.kernel.org,
linux-btrfs@vger.kernel.org, linux-xfs@vger.kernel.org
Subject: Re: [RFC PATCH v1 00/30] fs: inode->i_version rework and optimization
Date: Tue, 4 Apr 2017 22:34:14 +1000
Message-ID: <20170404123414.GA23007@dastard>
In-Reply-To: <20170403140055.GF15168@quack2.suse.cz>
On Mon, Apr 03, 2017 at 04:00:55PM +0200, Jan Kara wrote:
> On Sun 02-04-17 09:05:26, Dave Chinner wrote:
> > On Thu, Mar 30, 2017 at 12:12:31PM -0400, J. Bruce Fields wrote:
> > > On Thu, Mar 30, 2017 at 07:11:48AM -0400, Jeff Layton wrote:
> > > > On Thu, 2017-03-30 at 08:47 +0200, Jan Kara wrote:
> > > > > Because if the above is acceptable, we could make the reported i_version
> > > > > a sum of "superblock crash counter" and "inode i_version". We would
> > > > > increment "superblock crash counter" whenever we detect an unclean
> > > > > filesystem shutdown. That way, after a crash, we are guaranteed that
> > > > > each inode will report a new i_version (the sum would probably have to
> > > > > look like "superblock crash counter" * 65536 + "inode i_version" so that
> > > > > we avoid reusing i_version numbers we gave away but did not write to
> > > > > disk, but still...).
> > > > > Thoughts?
> > >
> > > How hard is this for filesystems to support? Do they need an on-disk
> > > format change to keep track of the crash counter?
> >
> > Yes. We'll need a version counter in the superblock, and we'll need to
> > know what the increment semantics are.
> >
> > The big question is how do we know there was a crash? The only thing
> > a journalling filesystem knows at mount time is whether it is clean
> > or requires recovery. Filesystems can require recovery for many
> > reasons that don't involve a crash (e.g. root fs is never unmounted
> > cleanly, so always requires recovery). Further, some filesystems may
> > not even know there was a crash at mount time because their
> > architecture always leaves a consistent filesystem on disk (e.g. COW
> > filesystems)....
>
> What filesystems can or cannot easily do obviously differs. Ext4 has a
> recovery flag set in superblock on RW mount/remount and cleared on
> umount/RO remount.
Even this doesn't help. A bug was recently reported to the XFS
list: it turns out that systemd can't remount-ro the root
filesystem successfully on shutdown, because there are open write
fds on the root filesystem when it attempts the remount. So it just
reboots without a remount-ro. This uncovered a bug in grub, in
that it (still!) thinks sync(1) is sufficient to get all the
metadata that points to a kernel image onto disk in places it can
read. XFS, like ext4, leaves that metadata in the journal, so the
system then fails to boot: systemd didn't remount-ro the root fs,
hence the journal was never flushed before reboot, so grub can't
find the kernel and everything fails....
So on such systems the recovery flag will be set at the next mount
even though nothing crashed.
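The recovery-flag lifecycle Jan describes (set on RW mount/remount, cleared on umount/RO remount), and why it cannot tell a crash from a reboot that skipped remount-ro, can be modelled in a few lines. This is a minimal sketch under stated assumptions: the `sb_model` struct, the `shutdown_kind` enum and all function names are hypothetical and illustrate the lifecycle only; they are not ext4's real identifiers or code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of an ext4-style "needs recovery" superblock flag.
 * The names here are illustrative, not ext4's real identifiers. */
struct sb_model {
	bool needs_recovery;
};

/* How a read-write mount session ends. */
enum shutdown_kind {
	CLEAN_UMOUNT,              /* umount or remount-ro completed */
	CRASH,                     /* power loss, panic, ... */
	REBOOT_WITHOUT_REMOUNT_RO, /* remount-ro failed (open write fds), rebooted anyway */
};

/* Set on RW mount/remount... */
static void mount_rw(struct sb_model *sb)
{
	sb->needs_recovery = true;
}

/* ...and cleared only on umount or RO remount. */
static void umount_or_remount_ro(struct sb_model *sb)
{
	sb->needs_recovery = false;
}

/* Returns the flag value the next mount finds after one RW session
 * that ended in the given way. */
static bool next_mount_sees_recovery_flag(enum shutdown_kind kind)
{
	struct sb_model sb = { .needs_recovery = false };

	assert(kind == CLEAN_UMOUNT || kind == CRASH ||
	       kind == REBOOT_WITHOUT_REMOUNT_RO);

	mount_rw(&sb);
	if (kind == CLEAN_UMOUNT)
		umount_or_remount_ro(&sb);
	/* CRASH and REBOOT_WITHOUT_REMOUNT_RO both skip the clearing step,
	 * so the next mount cannot tell them apart. */
	return sb.needs_recovery;
}
```

In this model `next_mount_sees_recovery_flag(CRASH)` and `next_mount_sees_recovery_flag(REBOOT_WITHOUT_REMOUNT_RO)` both return true, while `CLEAN_UMOUNT` returns false, so a crash counter driven by this flag would also be bumped after the systemd reboot case.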
> This flag being set on mount would imply incrementing the crash
> counter. It should be pretty easy for each filesystem to implement
> such a flag and the counter, but I agree it requires an on-disk
> format change.
Yup, anything we want that is persistent and consistent across
filesystems will need on-disk format changes. Hence we need a solid
specification first, not to mention tests to validate correct
behaviour across all filesystems in xfstests...
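Jan's proposal (reported i_version = "superblock crash counter" * 65536 + "inode i_version", with the counter bumped when the recovery flag is found set at mount) can be sketched as below, together with the kind of invariant a test would need to check. This is a minimal C model under stated assumptions: the `sb_state` struct, the function names and the invariant check are hypothetical; only the 65536 multiplier and the bump-on-recovery-flag rule come from the quoted proposal. It assumes fewer than 65536 i_version increments per inode are lost in a crash.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Multiplier from the proposal: leaves room for up to 65535 i_version
 * increments that were handed out but never written to disk. */
#define CRASH_COUNTER_SCALE 65536ULL

/* Hypothetical persistent superblock state; storing the counter would
 * need an on-disk format change. */
struct sb_state {
	uint64_t crash_counter;
	bool needs_recovery;
};

/* Mount-time rule: a set recovery flag bumps the counter. The flag can
 * also be set without a real crash (e.g. a reboot that skipped
 * remount-ro); that only costs an extra bump. */
static void on_mount(struct sb_state *sb)
{
	if (sb->needs_recovery)
		sb->crash_counter++;
	sb->needs_recovery = true; /* a RW mount sets the flag again */
}

/* The i_version value reported to users such as nfsd. */
static uint64_t reported_iversion(const struct sb_state *sb,
				  uint64_t inode_iversion)
{
	return sb->crash_counter * CRASH_COUNTER_SCALE + inode_iversion;
}

/* Invariant a test could check: after a crash, an inode's reported value
 * exceeds every value handed out before the crash. 'on_disk' is the
 * persisted inode i_version; 'lost_increments' were only in memory. */
static bool post_crash_value_is_new(uint64_t counter, uint64_t on_disk,
				    uint64_t lost_increments)
{
	/* needs_recovery is true: the RW session was live when it crashed */
	struct sb_state sb = { .crash_counter = counter, .needs_recovery = true };
	uint64_t before, after;

	assert(lost_increments < CRASH_COUNTER_SCALE);
	before = reported_iversion(&sb, on_disk + lost_increments);
	on_mount(&sb);                           /* flag set: counter bumped */
	after = reported_iversion(&sb, on_disk); /* in-memory increments gone */
	return after > before;
}
```

The design relies on (counter + 1) * 65536 + v always exceeding counter * 65536 + v + k whenever k < 65536; if more increments than that can be lost, the multiplier would have to grow accordingly.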
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com