public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Boris Burkov <boris@bur.io>
To: Anand Jain <anand.jain@oracle.com>
Cc: linux-btrfs@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH] btrfs: do not clear read-only when adding sprout device
Date: Wed, 16 Oct 2024 10:24:04 -0700	[thread overview]
Message-ID: <20241016172404.GA2521545@zen.localdomain> (raw)
In-Reply-To: <694552b3-5f70-48d2-a62f-4c2b8caf10fd@oracle.com>

On Thu, Oct 17, 2024 at 01:14:16AM +0800, Anand Jain wrote:
> On 16/10/24 05:38, Boris Burkov wrote:
> > If you follow the seed/sprout wiki, it suggests the following workflow:
> > 
> > btrfstune -S 1 seed_dev
> > mount seed_dev mnt
> > btrfs device add sprout_dev
> > mount -o remount,rw mnt
> > 
> 
> 
> 
> > The first mount mounts the FS readonly, which results in not setting
> > BTRFS_FS_OPEN, and setting the readonly bit on the sb. The device add
> > somewhat surprisingly clears the readonly bit on the sb (though the
> > mount is still practically readonly, from the users perspective...).
> > Finally, the remount checks the readonly bit on the sb against the flag
> > and sees no change, so it does not run the code intended to run on
> > ro->rw transitions, leaving BTRFS_FS_OPEN unset.
> > 
> > As a result, when the cleaner_kthread runs, it sees no BTRFS_FS_OPEN and
> > does no work. This results in leaking deleted snapshots until we run out
> > of space.
> > 
> > I propose fixing it at the first departure from what feels reasonable:
> > when we clear the readonly bit on the sb during device add.
> > 
> > A new fstest I have written reproduces the bug and confirms the fix.
> > 
> > Signed-off-by: Boris Burkov <boris@bur.io>
> > ---
> > Note that this is a resend of an old unmerged fix:
> > https://lore.kernel.org/linux-btrfs/16c05d39566858bb8bc1e03bd19947cf2b601b98.1647906815.git.boris@bur.io/T/#u
> > Some other ideas for fixing it by modifying how we set BTRFS_FS_OPEN
> > were also explored but not merged around that time:
> > https://lore.kernel.org/linux-btrfs/cover.1654216941.git.anand.jain@oracle.com/
> > 
> > I don't have a strong preference, but I would really like to see this
> > trivial bug fixed. For what it is worth, we have been carrying this
> > patch internally at Meta since I first sent it with no incident.
> > ---
> 
> 
> I remember fixing this before. I tested on 5.15, and the bug isn't
> there, but it’s back in 6.10, so something broke in between.
> We need to track it down.

Thanks for weighing in again, and for re-testing on 5.15. That's
interesting that it broke again. And sorry if I didn't follow the rdonly
check patches properly and those did end up getting merged. Poor code
archaeology on my part :)

At any rate, I have pushed this patch into for-next for now, as I
think it constitutes an improvement without breaking any documented
behavior. If you look into what happened between 5.15 and 6.11 and want
to back it out with a different fix, I will not be offended.

If we also land the fstest I submitted, then hopefully future kernels
will *not* be breaking this again!

Thanks,
Boris

> 
> The original design (kernel 4.x and below) makes the filesystem switch
> to read-write mode after adding a sprout because:
> 
>    You can’t add a device to a normal read-only filesystem
>  so with seed read-only mount is different.
>    With a seed device, adding a writable device transforms
>  it into a new read-write filesystem with a _new_ FSID and
>  fs_devices. Logically, read-write at this stage makes sense,
>  but I’m okay without it and in fact we had fixed this before,
>  but a patch somewhere seems to have broken it again.
> 
> 
> (Demo below. :<x> is the return code from the 'run' command at
>  https://github.com/asj/run.git)
> 
> 
> ----- 5.15.0-208.159.3.2.el9uek.x86_64 ----
> $ mkfs.btrfs -fq /dev/loop0 :0
> $ btrfstune -S1 /dev/loop0 :0
> $ mount /dev/loop0 /btrfs :0
> mount: /btrfs: WARNING: source write-protected, mounted read-only.
> 
> $ cat /proc/self/mounts | grep btrfs :0
> /dev/loop0 /btrfs btrfs ro,relatime,space_cache=v2,subvolid=5,subvol=/ 0 0
> 
> $ findmnt -o SOURCE,UUID /btrfs :0
> SOURCE     UUID
> /dev/loop0 64f21b87-4e4c-4786-b2cd-c09a5ccd2afa
> 
> $ btrfs fi show -m :0
> Label: none  uuid: 64f21b87-4e4c-4786-b2cd-c09a5ccd2afa
> 	Total devices 1 FS bytes used 144.00KiB
> 	devid    1 size 3.00GiB used 536.00MiB path /dev/loop0
> 
> $ ls /sys/fs/btrfs :0
> 64f21b87-4e4c-4786-b2cd-c09a5ccd2afa
> features
> 
> $ btrfs dev add -f /dev/loop1 /btrfs :0
> 
> # After adding the device, the path and UUID are different,
> # so it’s a new filesystem. (But, as I said, I’m fine with
> # keeping it read-only and needing remount,rw.
> 
> $ cat /proc/self/mounts | grep btrfs :0
> /dev/loop1 /btrfs btrfs ro,relatime,space_cache=v2,subvolid=5,subvol=/ 0 0
> 
> $ findmnt -o SOURCE,UUID /btrfs :0
> SOURCE     UUID
> /dev/loop1 948cea35-18db-45da-9ec8-3d46cb5f0413
> 
> $ btrfs fi show -m :0
> Label: none  uuid: 948cea35-18db-45da-9ec8-3d46cb5f0413
> 	Total devices 2 FS bytes used 144.00KiB
> 	devid    1 size 3.00GiB used 520.00MiB path /dev/loop0
> 	devid    2 size 3.00GiB used 576.00MiB path /dev/loop1
> 
> 
> $ ls /sys/fs/btrfs :0
> 948cea35-18db-45da-9ec8-3d46cb5f0413
> features
> ---------
> 
> 
> Thanks, Anand
> 
> >   fs/btrfs/volumes.c | 4 ----
> >   1 file changed, 4 deletions(-)
> > 
> > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> > index dc9f54849f39..84e861dcb350 100644
> > --- a/fs/btrfs/volumes.c
> > +++ b/fs/btrfs/volumes.c
> > @@ -2841,8 +2841,6 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
> >   	set_blocksize(device->bdev_file, BTRFS_BDEV_BLOCKSIZE);
> >   	if (seeding_dev) {
> > -		btrfs_clear_sb_rdonly(sb);
> > -
> >   		/* GFP_KERNEL allocation must not be under device_list_mutex */
> >   		seed_devices = btrfs_init_sprout(fs_info);
> >   		if (IS_ERR(seed_devices)) {
> > @@ -2985,8 +2983,6 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
> >   	mutex_unlock(&fs_info->chunk_mutex);
> >   	mutex_unlock(&fs_info->fs_devices->device_list_mutex);
> >   error_trans:
> > -	if (seeding_dev)
> > -		btrfs_set_sb_rdonly(sb);
> >   	if (trans)
> >   		btrfs_end_transaction(trans);
> >   error_free_zone:
> 

  reply	other threads:[~2024-10-16 17:24 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-10-15 21:38 [PATCH] btrfs: do not clear read-only when adding sprout device Boris Burkov
2024-10-15 22:00 ` Qu Wenruo
2024-10-15 22:12   ` Qu Wenruo
2024-10-15 23:23     ` Boris Burkov
2024-10-16 17:14 ` Anand Jain
2024-10-16 17:24   ` Boris Burkov [this message]
2024-10-17 20:47   ` Qu Wenruo
2024-10-18 11:54     ` Anand Jain
2024-10-17 14:01 ` David Sterba
2024-10-17 16:41   ` Boris Burkov
2024-10-21 18:56     ` David Sterba
2024-10-21 19:29       ` Boris Burkov
  -- strict thread matches above, loose matches on Subject: below --
2022-03-21 23:56 Boris Burkov
2022-03-22 21:46 ` Josef Bacik
2022-03-23  0:52 ` Naohiro Aota
2022-03-23 18:16   ` Boris Burkov
2022-03-28 11:11     ` Anand Jain
2022-03-29  4:33       ` Naohiro Aota
2022-03-29 19:45         ` Boris Burkov
2022-03-23 10:44 ` Anand Jain
2022-03-23 18:25   ` Boris Burkov
2022-03-24 11:16     ` Anand Jain
2022-03-23 20:17   ` Josef Bacik

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20241016172404.GA2521545@zen.localdomain \
    --to=boris@bur.io \
    --cc=anand.jain@oracle.com \
    --cc=kernel-team@fb.com \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox