From: Dave Chinner <david@fromorbit.com>
To: Jan Kara <jack@suse.cz>
Cc: linux-fsdevel@vger.kernel.org, Christian Brauner <brauner@kernel.org>
Subject: Re: [PATCH RFC 0/13] fs: generic filesystem shutdown handling
Date: Wed, 14 Aug 2024 10:09:52 +1000 [thread overview]
Message-ID: <Zrv10AMSDwkPHl1U@dread.disaster.area> (raw)
In-Reply-To: <20240808143222.4m56qw5jujorqrfv@quack3>
On Thu, Aug 08, 2024 at 04:32:22PM +0200, Jan Kara wrote:
> On Thu 08-08-24 09:18:51, Dave Chinner wrote:
> > On Wed, Aug 07, 2024 at 08:29:45PM +0200, Jan Kara wrote:
> > > Hello,
> > >
> > > this patch series implements generic handling of filesystem shutdown. The idea
> > > is very simple: Have a superblock flag, which when set, will make VFS refuse
> > > modifications to the filesystem. The patch series consists of several parts.
> > > Patches 1-6 cleanup handling of SB_I_ flags which is currently messy (different
> > > flags seem to have different locks protecting them although they are modified
> > > by plain stores). Patches 7-12 gradually convert code to be able to handle
> > > errors from sb_start_write() / sb_start_pagefault(). Patch 13 then shows how
> > > filesystems can use this generic flag. Additionally, we could remove some
> > > shutdown checks from within ext4 code and rely on checks in VFS but I didn't
> > > want to complicate the series with ext4 specific things.
> >
> > Overall this looks good. Two things that I noticed that we should
> > nail down before anything else:
> >
> > 1. The original definition of a 'shutdown filesystem' (i.e. from the
> > XFS origins) is that a shutdown filesystem must *never* do -physical
> > IO- after the shutdown is initiated. This is a protection mechanism
> > for the underlying storage to prevent potential propagation of
> > problems in the storage media once a serious issue has been
> > detected. (e.g. suspect physical media can be made worse by
> > continually trying to read it.) It also allows the block device to
> > go away and we won't try to access issue new IO to it once the
> > ->shutdown call has been complete.
> >
> > IOWs, XFS implements a "no new IO after shutdown" architecture, and
> > this is also largely what ext4 implements as well.
>
> Thanks for sharing this. I wasn't aware that "no new IO after shutdown" is
> the goal. I knew this is required for modifications but I wasn't sure how
> strict this was for writes.
>
> > However, this isn't what this generic shutdown infrastructure
> > implements. It only prevents new user modifications from being
> > started - it is effectively a "instant RO" mechanism rather than an
> > "instant no more IO" architecture.
> >
> > Hence we have an impedence mismatch between existing shutdown
> > implementations that currently return -EIO on shutdown for all
> > operations (both read and write) and this generic implementation
> > which returns -EROFS only for write operations.
> >
> > Hence the proposed generic shutdown model doesn't really solve the
> > inconsistent shutdown behaviour problem across filesystems - it just
> > adds a new inconsistency between existing filesystem shutdown
> > implementations and the generic infrastructure.
>
> OK, understood. I also agree it would be good to keep this no-IO semantics
> when implementing the generic solution. I'm just pondering how to achieve
> that in a maintainable way. For the write path what I've done looks like
> the least painful way. For the read path the simplest is probably to still
> return whatever is in cache and just do the check + error return somewhere
> down in the call stack just before calling into filesystem. It is easy
> enough to stop things like ->read_folio, ->readahead, or ->lookup. But how
> about things like ->evict_inode or ->release?
If the filesystem is shut down, inode eviction or releasing a FD
should not be doing any IO at all - they should just be releasing
in-memory resources and freeing the objects being released. e.g. we
don't process unlinked inodes when the filesystem is shut down; they
remain as unlinked on disk and recovery gets to clean up the mess.
i.e. we process all inodes as if they were clean, linked inodes and
just tear down the in-memory structures attached to the inode.
i.e. shutdown isn't concerned about keeping anything consistent
either in memory or on disk - it's concerned only about releasing
in-memory resources such that the filesystem can be unmounted
without doing any IO at all. e.g. ext4_evict_inode() needs to treat
all unlinked inodes as if they are bad when the filesystem is shut
down. XFS does this (see the shutdown check in
xfs_inode_needs_inactive()) and every filesystem that does unlinked
inode processing in inode eviction will need similar modifications.
Yes, this means a "shutdown means no IO" model cannot be exclusively
implemented at the VFS - it will need things like filesystems with
customised inode eviction callouts to handle these cases themselves.
> They can trigger IO but
> allowing inode reclaim on shutdown fs is desirable I'd say. Similarly for
> things like ->remount_fs or ->put_super. So avoiding IO from operations
> like these would rely on fs implementation anyway.
remounts need to follow the fundamental rule of shutdowns: you can't
change the state of a shutdown filesystem -at all- because any
operation on a shutdown filesystem should be immediately failed. The
only thing you can reliably do once a filesystem is shut down is
unmount it. IOWs, the VFS should return -EIO when a remount is
requested on a shutdown filesystem, and the filesystem code then
doesn't have to care.
As for ->put_super(), this should act as if the filesystem is clean
when the filesystem is shut down as everything that is dirty in
memory will never get cleaned. IOWs, once shutdown has been set,
dirty state should be completely ignored by everything and so object
release/eviction should tear everything down regardless of it's
state.
Supporting a "no-IO shutdown" model properly will require filesystem
specific changes to handle, but that's really implementation details
more than anything else. What we need to do first is define
and document exactly what shutdown means and how the VFS and
filesystems need to operate when that bit is set. Then we have a
clear framework from which we can consistently answer "what should
filesystem X do in this situation" issues that arise...
-Dave.
--
Dave Chinner
david@fromorbit.com
next prev parent reply other threads:[~2024-08-14 0:09 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-08-07 18:29 [PATCH RFC 0/13] fs: generic filesystem shutdown handling Jan Kara
2024-08-07 18:29 ` [PATCH 01/13] fs: Define bit numbers for SB_I_ flags Jan Kara
2024-08-07 18:29 ` [PATCH 02/13] fs: Convert fs_context use of SB_I_ flags to new constants Jan Kara
2024-08-07 18:29 ` [PATCH 03/13] fs: Convert mount_too_revealing() to new s_iflags handling functions Jan Kara
2024-08-07 18:29 ` [PATCH 04/13] fs: Convert remaining usage of SB_I_ flags Jan Kara
2024-08-07 18:29 ` [PATCH 05/13] fs: Drop old SB_I_ constants Jan Kara
2024-08-07 18:29 ` [PATCH 06/13] fs: Drop unnecessary underscore from _SB_I_ constants Jan Kara
2024-08-08 11:47 ` Amir Goldstein
2024-08-08 14:35 ` Darrick J. Wong
2024-08-08 14:50 ` Christian Brauner
2024-08-08 17:34 ` Jan Kara
2024-08-07 18:29 ` [PATCH 07/13] overlayfs: Make ovl_start_write() return error Jan Kara
2024-08-08 12:01 ` Amir Goldstein
2024-08-07 18:29 ` [PATCH 08/13] fs: Teach callers of kiocb_start_write() to handle errors Jan Kara
2024-08-07 18:29 ` [PATCH 09/13] fs: Teach callers of file_start_write() " Jan Kara
2024-08-07 18:29 ` [PATCH 10/13] fs: Add __must_check annotations to sb_start_write_trylock() and similar Jan Kara
2024-08-07 18:29 ` [PATCH 11/13] fs: Make sb_start_write() return error on shutdown filesystem Jan Kara
2024-08-07 18:29 ` [PATCH 12/13] fs: Make sb_start_pagefault() " Jan Kara
2024-08-07 18:29 ` [PATCH 13/13] ext4: Replace EXT4_FLAGS_SHUTDOWN flag with a generic SB_I_SHUTDOWN Jan Kara
2024-08-07 23:18 ` [PATCH RFC 0/13] fs: generic filesystem shutdown handling Dave Chinner
2024-08-08 14:32 ` Jan Kara
2024-08-13 12:46 ` Christian Brauner
2024-08-14 0:09 ` Dave Chinner [this message]
2024-08-08 14:51 ` Darrick J. Wong
2024-08-09 2:30 ` Dave Chinner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Zrv10AMSDwkPHl1U@dread.disaster.area \
--to=david@fromorbit.com \
--cc=brauner@kernel.org \
--cc=jack@suse.cz \
--cc=linux-fsdevel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).