From: Boris Burkov <boris@bur.io>
To: David Sterba <dsterba@suse.cz>
Cc: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH] btrfs: do not clear read-only when adding sprout device
Date: Mon, 21 Oct 2024 12:29:31 -0700 [thread overview]
Message-ID: <20241021192931.GA2861189@zen.localdomain> (raw)
In-Reply-To: <20241021185651.GC24631@suse.cz>
On Mon, Oct 21, 2024 at 08:56:51PM +0200, David Sterba wrote:
> On Thu, Oct 17, 2024 at 09:41:59AM -0700, Boris Burkov wrote:
> > On Thu, Oct 17, 2024 at 04:01:12PM +0200, David Sterba wrote:
> > > On Tue, Oct 15, 2024 at 02:38:34PM -0700, Boris Burkov wrote:
> > > > If you follow the seed/sprout wiki, it suggests the following workflow:
> > > >
> > > > btrfstune -S 1 seed_dev
> > > > mount seed_dev mnt
> > > > btrfs device add sprout_dev
> > > > mount -o remount,rw mnt
> > > >
> > > > The first mount mounts the FS readonly, which results in not setting
> > > > BTRFS_FS_OPEN, and setting the readonly bit on the sb. The device add
> > > > somewhat surprisingly clears the readonly bit on the sb (though the
> > > > mount is still practically readonly, from the users perspective...).
> > > > Finally, the remount checks the readonly bit on the sb against the flag
> > > > and sees no change, so it does not run the code intended to run on
> > > > ro->rw transitions, leaving BTRFS_FS_OPEN unset.
> > > >
> > > > As a result, when the cleaner_kthread runs, it sees no BTRFS_FS_OPEN and
> > > > does no work. This results in leaking deleted snapshots until we run out
> > > > of space.
> > > >
> > > > I propose fixing it at the first departure from what feels reasonable:
> > > > when we clear the readonly bit on the sb during device add.
> > > >
> > > > A new fstest I have written reproduces the bug and confirms the fix.
> > > >
> > > > Signed-off-by: Boris Burkov <boris@bur.io>
> > > > ---
> > > > Note that this is a resend of an old unmerged fix:
> > > > https://lore.kernel.org/linux-btrfs/16c05d39566858bb8bc1e03bd19947cf2b601b98.1647906815.git.boris@bur.io/T/#u
> > > > Some other ideas for fixing it by modifying how we set BTRFS_FS_OPEN
> > > > were also explored but not merged around that time:
> > > > https://lore.kernel.org/linux-btrfs/cover.1654216941.git.anand.jain@oracle.com/
> > > >
> > > > I don't have a strong preference, but I would really like to see this
> > > > trivial bug fixed. For what it is worth, we have been carrying this
> > > > patch internally at Meta since I first sent it with no incident.
> > >
> > > We have an unresolved dispute about the fix and now the patch got to
> > > for-next within a few days because it got two reviews but not mine.
> > > The way you use it in Meta works for you but the fix is changing
> > > behaviour so we can either ignore everybody else relying on the old
> > > way or say that seeding is so niche that we don't care and see what we
> > > can do once we get some report.
> >
> > Please feel free to remove it, and I am happy to review whatever other
> > fix you think is best. Sorry for rushing, I just wanted to get it done
> > and out of my head so I could move on to other issues.
>
> I'm concerned because change like that needs an announcement,
> documentation changes or eventually an optional change so both use cases
> are available. I haven't merged it since the last time you or somebody
> posted it because I don't see a way how to fix it without fixing the bug
> and not breaking the use case.
>
That's fair. However, I assume we agree we need some fix, somewhere.
Anyone who is really currently relying on this and not cycle mounting
after adding the device doesn't get a cleaner thread, which includes
empty block group cleanup, autorelocation, deleting snapshots, and defrag.
I think subvolume cleanup and block group cleanup are probably the worst
to lose.
So let's step back from me impulsively stuffing this in to for-next and
do the right thing together :)
Thinking of our options, in no particular order:
1a. just land the fix
It's not great if a lot of people rely on the fs "working" after device
add. Some of them might quickly find the proper steps in the docs, but
this might confuse/upset people.
1b. land the fix but properly socialize it
We can also make btrfs device add notice that it is seed sprout and add
a loud message that you need to remount rw. If this requires skipping a
merge or two while we socialize it to obvious or google-able users, that
seems reasonable.
1c. land the fix and make btrfs-progs do the remount after the device add
This is kind of weird, but does preserve the expected behavior. We could
also add some seed sprout specific flags or commands as the "main" way to
do it.
2. have the device add detect the seed sprout case and really make the
fs read-write
If we are committed to the behavior, and don't believe in a userspace
only fix, then it could be possible to make the current path which
simply clears the ro bit also invoke some or all of the remount rw
logic.
3. do nothing (remove the fix from for-next)
I keep carrying an internal patch forever, upstream stays in danger of
running an fs without a cleaner. If no one runs a long-lived fs off a live
cd for long enough to care, maybe it is ok, too. But stuff like this is
out there in google, for example:
https://blog.elastocloud.org/2022/11/btrfs-seed-devices-for-ab-system-updates.html
That's all I can think of, but I'm definitely open to other ideas!
Thanks,
Boris
> > I assume your concern is that before this fix, the filesystem is actually
> > read-write after the device add, which this patch breaks?
> >
> > My only argument against this is that the documentation has always said
> > you needed to remount/cycle-mount after adding the sprout, so there is
> > no fair reason to assume this would work. (In fact, it *doesn't*, the fs
> > is once again in a unexpected, degraded, state...)
> >
> > But if existing LiveCD users are relying on this undocumented behavior,
> > then I think you are right and it's a bad idea to break them.
>
> The problem here is that we don't know how the feature is used, the
> documentation came much later than the feature. So I take it as that
> people rely on code, not documenation, even if there's an undocumented
> behaviour.
next prev parent reply other threads:[~2024-10-21 19:30 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-10-15 21:38 [PATCH] btrfs: do not clear read-only when adding sprout device Boris Burkov
2024-10-15 22:00 ` Qu Wenruo
2024-10-15 22:12 ` Qu Wenruo
2024-10-15 23:23 ` Boris Burkov
2024-10-16 17:14 ` Anand Jain
2024-10-16 17:24 ` Boris Burkov
2024-10-17 20:47 ` Qu Wenruo
2024-10-18 11:54 ` Anand Jain
2024-10-17 14:01 ` David Sterba
2024-10-17 16:41 ` Boris Burkov
2024-10-21 18:56 ` David Sterba
2024-10-21 19:29 ` Boris Burkov [this message]
-- strict thread matches above, loose matches on Subject: below --
2022-03-21 23:56 Boris Burkov
2022-03-22 21:46 ` Josef Bacik
2022-03-23 0:52 ` Naohiro Aota
2022-03-23 18:16 ` Boris Burkov
2022-03-28 11:11 ` Anand Jain
2022-03-29 4:33 ` Naohiro Aota
2022-03-29 19:45 ` Boris Burkov
2022-03-23 10:44 ` Anand Jain
2022-03-23 18:25 ` Boris Burkov
2022-03-24 11:16 ` Anand Jain
2022-03-23 20:17 ` Josef Bacik
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20241021192931.GA2861189@zen.localdomain \
--to=boris@bur.io \
--cc=dsterba@suse.cz \
--cc=kernel-team@fb.com \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox