public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Lukas Czerner <lczerner@redhat.com>
To: Theodore Ts'o <tytso@mit.edu>
Cc: "Jan Kara" <jack@suse.cz>, "Darrick J. Wong" <djwong@kernel.org>,
	"Luís Henriques" <lhenriques@suse.de>,
	"Andreas Dilger" <adilger.kernel@dilger.ca>,
	linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org,
	"Jeroen van Wolffelaar" <jeroen@wolffelaar.nl>
Subject: Re: [PATCH v2] ext4: set csum seed in tmp inode while migrating to extents
Date: Fri, 17 Dec 2021 10:35:34 +0100	[thread overview]
Message-ID: <20211217093534.2ug6e5cm37md2c3u@work> (raw)
In-Reply-To: <YbuGLsQy6TSM2xOl@mit.edu>

On Thu, Dec 16, 2021 at 01:32:14PM -0500, Theodore Ts'o wrote:
> On Wed, Dec 15, 2021 at 03:12:37PM +0100, Lukas Czerner wrote:
> > > Run fsck of course! And then recover from backups :) I know this is sad but
> > > the situation is that our migration code just is not crash-safe (if we
> > > crash we are going to free blocks that are still used by the migrated
> > > inode) and Luis makes it work in case we do not crash (which should be
> > > hopefully more common) and documents it does not work in case we crash.
> > > So overall I'd call it a win.
> > > 
> > > But maybe we should just remove this online-migration functionality
> > > completely from the kernel? That would be also a fine solution for me. I
> > > was thinking whether we could somehow make the inode migration crash-safe
> > > but I didn't think of anything which would not require on-disk format
> > > change...
> > 
> > Since this is not something that anyone can honestly recommend doing
> > without a prior backup and a word of warning I personaly would be in favor
> > of removing it.
> 
> So there are a couple options that we could pursue:
> 
> 1) We could change the migrate code to stop putting the orphan inode
> on the orphan list.  If we do this, an crash in the middle of the
> migrate will, in the worst case (when the migration isn't completed
> within a single jbd2 transaction) result in a leaked inode.  That's
> not ideal, but it won't lead user data loss, and e2fsck will recover
> the situation by cloning the blocks, and leaving the inode in
> lost+found.
> 
> 2) We could try to ensure migration happens all within a single
> transaction, if they all fit inside a the inode structure, we allocate
> a tmp inode for all of the indirect blocks, attach the blocks to the
> tmp inode, place the tmp inode on the orphan list, and put all of that
> on a single handle, and then in a second handle, truncate the tmp
> inode to release the indirect blocks.  If we need to allocate extent
> tree blocks, then all of that would need to fit in a single
> transaction, and it's a bit more complicated, but it is doable.
> 
> 3) We can simply remove the inode migration feature by removing
> EXT4_EXTENTS_FL from EXT4_FL_USER_MODIFIABLE, and changing the
> implementation of the EXT4_IOC_MIGRATE ioctl to return EOPNOTSUPP, and
> then cleaning up the code paths that are now unreachable.
> 
> The migration feature is clearly less compelling than it was ten years
> ago, when ext4 was first introduced --- and most enterprise distros
> have never supported the feature even when it has existed.  Also on
> the plus side, we've never shipped a program to globally migrate a
> file system by using ioctl interface.
> 
> On the other hand, there may have been user shell scripts that have
> done something like "find /mntpt -type f -print0 | xargs -0 chattr +e {} \;"
> And so option #3 could be construed as "breaking userspace", especially
> without a deprecation window.
> 
> Furthermore, Option #1 is pretty simple to implement, and chances of a
> migration getting spread across two jbd2 commits is not actually
> pretty low.  And if it does happen, there would only be a single inode
> that would get its blocks cloned and attached to lost+found.
> 
> Thats being said, if we *did* option #1, in the long run we'd want to
> land a complete solution, which would either be something like option
> #2, or allocating a flag to give a hint to e2fsprogs that if it does
> find an leaked inode with with the flag set on the on-disk inode, that
> all it needs to do is to zero out the inode and be done with it.
> 
> So the question is, is it worth it to continue supporting the migrate
> feature, or should we just delete all of the migration code, and risk
> users complaining that we've broken their use case?  The chances of
> that happening is admittedly low, and Linus's rule that "it's only
> breaking userspace if a user complains" means we might very well get
> away with it.  :-)

That's a very good summary Ted, thanks.

Our rationale behind not supporting the migration was always the fact
that we felt that backup was absolutely necessary before operation like
this. When you already have up-to-date backup available you might as
well create a fresh ext4 file system with all the advantages it brings
and recover data from said backup. I think this is still a very
reasonable approach.

I have no doubt it was useful featrure in the early days of ext4, but I
think we're well past that. Any attempt to rework or fix the feature
assumes it's still useful and has it's users today and into the future.
I very much doubt that, so let's test it. Let's start a year long
deprecation window.

-Lukas

> 
> 						- Ted
> 


  reply	other threads:[~2021-12-17  9:35 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-14 17:50 [PATCH v2] ext4: set csum seed in tmp inode while migrating to extents Luís Henriques
2021-12-15  0:49 ` Darrick J. Wong
2021-12-15 10:46   ` Luís Henriques
2021-12-15 11:28   ` Jan Kara
2021-12-15 14:12     ` Lukas Czerner
2021-12-15 15:37       ` Luís Henriques
2021-12-16 11:23       ` Luís Henriques
2021-12-16 18:32       ` Theodore Ts'o
2021-12-17  9:35         ` Lukas Czerner [this message]
2021-12-28 22:40           ` Pavel Machek
2021-12-30  6:56             ` Theodore Ts'o
2021-12-17 15:09         ` Jeroen van Wolffelaar
2022-01-06  4:41 ` Theodore Ts'o

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20211217093534.2ug6e5cm37md2c3u@work \
    --to=lczerner@redhat.com \
    --cc=adilger.kernel@dilger.ca \
    --cc=djwong@kernel.org \
    --cc=jack@suse.cz \
    --cc=jeroen@wolffelaar.nl \
    --cc=lhenriques@suse.de \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox