linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <djwong@kernel.org>
To: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.cz>,
	linux-fsdevel@vger.kernel.org,
	Christian Brauner <brauner@kernel.org>
Subject: Re: [PATCH RFC 0/13] fs: generic filesystem shutdown handling
Date: Thu, 8 Aug 2024 07:51:41 -0700	[thread overview]
Message-ID: <20240808145141.GC6043@frogsfrogsfrogs> (raw)
In-Reply-To: <ZrQA2/fkHdSReAcv@dread.disaster.area>

On Thu, Aug 08, 2024 at 09:18:51AM +1000, Dave Chinner wrote:
> On Wed, Aug 07, 2024 at 08:29:45PM +0200, Jan Kara wrote:
> > Hello,
> > 
> > this patch series implements generic handling of filesystem shutdown. The idea
> > is very simple: Have a superblock flag, which when set, will make VFS refuse
> > modifications to the filesystem. The patch series consists of several parts.
> > Patches 1-6 cleanup handling of SB_I_ flags which is currently messy (different
> > flags seem to have different locks protecting them although they are modified
> > by plain stores). Patches 7-12 gradually convert code to be able to handle
> > errors from sb_start_write() / sb_start_pagefault(). Patch 13 then shows how
> > filesystems can use this generic flag. Additionally, we could remove some
> > shutdown checks from within ext4 code and rely on checks in VFS but I didn't
> > want to complicate the series with ext4 specific things.
> 
> Overall this looks good. Two things that I noticed that we should
> nail down before anything else:
> 
> 1. The original definition of a 'shutdown filesystem' (i.e. from the
> XFS origins) is that a shutdown filesystem must *never* do -physical
> IO- after the shutdown is initiated. This is a protection mechanism
> for the underlying storage to prevent potential propagation of
> problems in the storage media once a serious issue has been
> detected. (e.g. suspect physical media can be made worse by
> continually trying to read it.) It also allows the block device to
> go away and we won't try to access issue new IO to it once the
> ->shutdown call has been complete.
> 
> IOWs, XFS implements a "no new IO after shutdown" architecture, and
> this is also largely what ext4 implements as well.

I don't think it quite does -- for EXT4_GOING_FLAGS_DEFAULT, it sets the
shutdown flag, but it doesn't actually abort the journal.  I think
that's an implementation bug since XFS /does/ shut down the log.

But looking at XFS_FSOP_GOING_FLAGS_DEFAULT, I also notice that if the
bdev_freeze fails, it returns 0 and the fs isn't shut down.  ext4, otoh,
actually does pass bdev_freeze's errno along.  I think ext4's behavior
is the correct one, right?

> However, this isn't what this generic shutdown infrastructure
> implements. It only prevents new user modifications from being
> started - it is effectively a "instant RO" mechanism rather than an
> "instant no more IO" architecture.

I thought pagefaults are still allowed on a shutdown xfs?  Curiously I
don't see a prohibition on write faults, but iirc we still allowed read
faults so that a shutdown on the rootfs doesn't immediately crash the
whole machine?

> Hence we have an impedence mismatch between existing shutdown
> implementations that currently return -EIO on shutdown for all
> operations (both read and write) and this generic implementation
> which returns -EROFS only for write operations.
> 
> Hence the proposed generic shutdown model doesn't really solve the
> inconsistent shutdown behaviour problem across filesystems - it just
> adds a new inconsistency between existing filesystem shutdown
> implementations and the generic infrastructure.
> 
> 2. On shutdown, this patchset returns -EROFS.
> 
> As per #1, returning -EROFS on shutdown will be a significant change
> of behaviour for some filesystems as they currently return -EIO when
> the filesystem is shut down.
> 
> I don't think -EROFS is right, because existing shutdown behaviour
> also impacts read-only operations and will return -EIO for them,
> too.
> 
> I think the error returned by a shutdown filesystem should always be
> consistent and that really means -EIO needs to be returned rather
> than -EROFS.
> 
> However, given this is new generic infrastructure, we can define a
> new error like -ESHUTDOWN (to reuse an existing errno) or even a
> new errno like -EFSSHUTDOWN for this, document it man pages and then
> convert all the existing filesystem shutdown checks to return this
> error instead of -EIO...

Agree.

> > Also, as Dave suggested, we can lift *_IOC_{SHUTDOWN|GOINGDOWN} ioctl handling
> > to VFS (currently in 5 filesystems) and just call new ->shutdown op for
> > the filesystem abort handling itself. But that is kind of independent thing
> > and this series is long enough as is.
> 
> Agreed - that can be done separately once we've sorted out the
> little details of what a shutdown filesystem actually means and how
> that gets reported consistently to userspace...

I would define it as:

No more writes to the filesystem or its underlying storage; file IO
and stat* calls return ESHUTDOWN; read faults still allowed.

I like the idea of hoisting this to the vfs and defining how one figures
out if the fs is shut down; thank you for working on this, Jan.

--D

> -Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> 

  parent reply	other threads:[~2024-08-08 14:51 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-08-07 18:29 [PATCH RFC 0/13] fs: generic filesystem shutdown handling Jan Kara
2024-08-07 18:29 ` [PATCH 01/13] fs: Define bit numbers for SB_I_ flags Jan Kara
2024-08-07 18:29 ` [PATCH 02/13] fs: Convert fs_context use of SB_I_ flags to new constants Jan Kara
2024-08-07 18:29 ` [PATCH 03/13] fs: Convert mount_too_revealing() to new s_iflags handling functions Jan Kara
2024-08-07 18:29 ` [PATCH 04/13] fs: Convert remaining usage of SB_I_ flags Jan Kara
2024-08-07 18:29 ` [PATCH 05/13] fs: Drop old SB_I_ constants Jan Kara
2024-08-07 18:29 ` [PATCH 06/13] fs: Drop unnecessary underscore from _SB_I_ constants Jan Kara
2024-08-08 11:47   ` Amir Goldstein
2024-08-08 14:35     ` Darrick J. Wong
2024-08-08 14:50       ` Christian Brauner
2024-08-08 17:34         ` Jan Kara
2024-08-07 18:29 ` [PATCH 07/13] overlayfs: Make ovl_start_write() return error Jan Kara
2024-08-08 12:01   ` Amir Goldstein
2024-08-07 18:29 ` [PATCH 08/13] fs: Teach callers of kiocb_start_write() to handle errors Jan Kara
2024-08-07 18:29 ` [PATCH 09/13] fs: Teach callers of file_start_write() " Jan Kara
2024-08-07 18:29 ` [PATCH 10/13] fs: Add __must_check annotations to sb_start_write_trylock() and similar Jan Kara
2024-08-07 18:29 ` [PATCH 11/13] fs: Make sb_start_write() return error on shutdown filesystem Jan Kara
2024-08-07 18:29 ` [PATCH 12/13] fs: Make sb_start_pagefault() " Jan Kara
2024-08-07 18:29 ` [PATCH 13/13] ext4: Replace EXT4_FLAGS_SHUTDOWN flag with a generic SB_I_SHUTDOWN Jan Kara
2024-08-07 23:18 ` [PATCH RFC 0/13] fs: generic filesystem shutdown handling Dave Chinner
2024-08-08 14:32   ` Jan Kara
2024-08-13 12:46     ` Christian Brauner
2024-08-14  0:09     ` Dave Chinner
2024-08-08 14:51   ` Darrick J. Wong [this message]
2024-08-09  2:30     ` Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240808145141.GC6043@frogsfrogsfrogs \
    --to=djwong@kernel.org \
    --cc=brauner@kernel.org \
    --cc=david@fromorbit.com \
    --cc=jack@suse.cz \
    --cc=linux-fsdevel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).