* [PATCH] btrfs: do not clear read-only when adding sprout device
@ 2022-03-21 23:56 Boris Burkov
2022-03-22 21:46 ` Josef Bacik
` (2 more replies)
0 siblings, 3 replies; 23+ messages in thread
From: Boris Burkov @ 2022-03-21 23:56 UTC (permalink / raw)
To: linux-btrfs, kernel-team
If you follow the seed/sprout wiki, it suggests the following workflow:
btrfstune -S 1 seed_dev
mount seed_dev mnt
btrfs device add sprout_dev
mount -o remount,rw mnt
The first mount mounts the FS readonly, which results in not setting
BTRFS_FS_OPEN, and setting the readonly bit on the sb. The device add
somewhat surprisingly clears the readonly bit on the sb (though the
mount is still practically readonly, from the users perspective...).
Finally, the remount checks the readonly bit on the sb against the flag
and sees no change, so it does not run the code intended to run on
ro->rw transitions, leaving BTRFS_FS_OPEN unset.
As a result, when the cleaner_kthread runs, it sees no BTRFS_FS_OPEN and
does no work. This results in leaking deleted snapshots until we run out
of space.
I propose fixing it at the first departure from what feels reasonable:
when we clear the readonly bit on the sb during device add. I have a
reproducer of the issue here:
https://raw.githubusercontent.com/boryas/scripts/main/sh/seed/mkseed.sh
and confirm that this patch fixes it, and seems to work OK, otherwise. I
will admit that I couldn't dig up the original rationale for clearing
the bit here (it dates back to the original seed/sprout commit without
explicit explanation) so it's hard to imagine all the ramifications of
the change.
Signed-off-by: Boris Burkov <boris@bur.io>
---
fs/btrfs/volumes.c | 4 ----
1 file changed, 4 deletions(-)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 3fd17e87815a..75d7eeb26fe6 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2675,8 +2675,6 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
set_blocksize(device->bdev, BTRFS_BDEV_BLOCKSIZE);
if (seeding_dev) {
- btrfs_clear_sb_rdonly(sb);
-
/* GFP_KERNEL allocation must not be under device_list_mutex */
seed_devices = btrfs_init_sprout(fs_info);
if (IS_ERR(seed_devices)) {
@@ -2819,8 +2817,6 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
mutex_unlock(&fs_info->chunk_mutex);
mutex_unlock(&fs_info->fs_devices->device_list_mutex);
error_trans:
- if (seeding_dev)
- btrfs_set_sb_rdonly(sb);
if (trans)
btrfs_end_transaction(trans);
error_free_zone:
--
2.30.2
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH] btrfs: do not clear read-only when adding sprout device
2022-03-21 23:56 Boris Burkov
@ 2022-03-22 21:46 ` Josef Bacik
2022-03-23 0:52 ` Naohiro Aota
2022-03-23 10:44 ` Anand Jain
2 siblings, 0 replies; 23+ messages in thread
From: Josef Bacik @ 2022-03-22 21:46 UTC (permalink / raw)
To: Boris Burkov; +Cc: linux-btrfs, kernel-team
On Mon, Mar 21, 2022 at 04:56:17PM -0700, Boris Burkov wrote:
> If you follow the seed/sprout wiki, it suggests the following workflow:
>
> btrfstune -S 1 seed_dev
> mount seed_dev mnt
> btrfs device add sprout_dev
> mount -o remount,rw mnt
>
> The first mount mounts the FS readonly, which results in not setting
> BTRFS_FS_OPEN, and setting the readonly bit on the sb. The device add
> somewhat surprisingly clears the readonly bit on the sb (though the
> mount is still practically readonly, from the users perspective...).
> Finally, the remount checks the readonly bit on the sb against the flag
> and sees no change, so it does not run the code intended to run on
> ro->rw transitions, leaving BTRFS_FS_OPEN unset.
>
> As a result, when the cleaner_kthread runs, it sees no BTRFS_FS_OPEN and
> does no work. This results in leaking deleted snapshots until we run out
> of space.
>
> I propose fixing it at the first departure from what feels reasonable:
> when we clear the readonly bit on the sb during device add. I have a
> reproducer of the issue here:
> https://raw.githubusercontent.com/boryas/scripts/main/sh/seed/mkseed.sh
> and confirm that this patch fixes it, and seems to work OK, otherwise. I
> will admit that I couldn't dig up the original rationale for clearing
> the bit here (it dates back to the original seed/sprout commit without
> explicit explanation) so it's hard to imagine all the ramifications of
> the change.
>
> Signed-off-by: Boris Burkov <boris@bur.io>
Can we get this turned into a fstest please?
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Thanks,
Josef
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] btrfs: do not clear read-only when adding sprout device
2022-03-21 23:56 Boris Burkov
2022-03-22 21:46 ` Josef Bacik
@ 2022-03-23 0:52 ` Naohiro Aota
2022-03-23 18:16 ` Boris Burkov
2022-03-23 10:44 ` Anand Jain
2 siblings, 1 reply; 23+ messages in thread
From: Naohiro Aota @ 2022-03-23 0:52 UTC (permalink / raw)
To: Boris Burkov; +Cc: linux-btrfs@vger.kernel.org, kernel-team@fb.com
On Mon, Mar 21, 2022 at 04:56:17PM -0700, Boris Burkov wrote:
> If you follow the seed/sprout wiki, it suggests the following workflow:
>
> btrfstune -S 1 seed_dev
> mount seed_dev mnt
> btrfs device add sprout_dev
> mount -o remount,rw mnt
>
> The first mount mounts the FS readonly, which results in not setting
> BTRFS_FS_OPEN, and setting the readonly bit on the sb. The device add
> somewhat surprisingly clears the readonly bit on the sb (though the
> mount is still practically readonly, from the users perspective...).
> Finally, the remount checks the readonly bit on the sb against the flag
> and sees no change, so it does not run the code intended to run on
> ro->rw transitions, leaving BTRFS_FS_OPEN unset.
>
> As a result, when the cleaner_kthread runs, it sees no BTRFS_FS_OPEN and
> does no work. This results in leaking deleted snapshots until we run out
> of space.
>
> I propose fixing it at the first departure from what feels reasonable:
> when we clear the readonly bit on the sb during device add. I have a
> reproducer of the issue here:
> https://raw.githubusercontent.com/boryas/scripts/main/sh/seed/mkseed.sh
> and confirm that this patch fixes it, and seems to work OK, otherwise. I
> will admit that I couldn't dig up the original rationale for clearing
> the bit here (it dates back to the original seed/sprout commit without
> explicit explanation) so it's hard to imagine all the ramifications of
> the change.
>
> Signed-off-by: Boris Burkov <boris@bur.io>
> ---
> fs/btrfs/volumes.c | 4 ----
> 1 file changed, 4 deletions(-)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 3fd17e87815a..75d7eeb26fe6 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -2675,8 +2675,6 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
> set_blocksize(device->bdev, BTRFS_BDEV_BLOCKSIZE);
>
> if (seeding_dev) {
> - btrfs_clear_sb_rdonly(sb);
> -
After this line, it updates the metadata e.g, with
init_first_rw_device() and writes them out with
btrfs_commit_transaction(). Is that OK to do so with the SB_RDONLY
flag set?
> /* GFP_KERNEL allocation must not be under device_list_mutex */
> seed_devices = btrfs_init_sprout(fs_info);
> if (IS_ERR(seed_devices)) {
> @@ -2819,8 +2817,6 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
> mutex_unlock(&fs_info->chunk_mutex);
> mutex_unlock(&fs_info->fs_devices->device_list_mutex);
> error_trans:
> - if (seeding_dev)
> - btrfs_set_sb_rdonly(sb);
> if (trans)
> btrfs_end_transaction(trans);
> error_free_zone:
> --
> 2.30.2
>
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] btrfs: do not clear read-only when adding sprout device
2022-03-21 23:56 Boris Burkov
2022-03-22 21:46 ` Josef Bacik
2022-03-23 0:52 ` Naohiro Aota
@ 2022-03-23 10:44 ` Anand Jain
2022-03-23 18:25 ` Boris Burkov
2022-03-23 20:17 ` Josef Bacik
2 siblings, 2 replies; 23+ messages in thread
From: Anand Jain @ 2022-03-23 10:44 UTC (permalink / raw)
To: Boris Burkov, linux-btrfs, kernel-team
On 22/03/2022 07:56, Boris Burkov wrote:
> If you follow the seed/sprout wiki, it suggests the following workflow:
>
> btrfstune -S 1 seed_dev
> mount seed_dev mnt
> btrfs device add sprout_dev
> mount -o remount,rw mnt
or
umount mnt
mount sprout mnt
> The first mount mounts the FS readonly, which results in not setting
> BTRFS_FS_OPEN, and setting the readonly bit on the sb.
Why not set the BTRFS_FS_OPEN?
@@ -3904,8 +3904,11 @@ int __cold open_ctree(struct super_block *sb,
struct btrfs_fs_devices *fs_device
goto fail_qgroup;
}
- if (sb_rdonly(sb))
+ if (sb_rdonly(sb)) {
+ btrfs_set_sb_rdonly(sb);
+ set_bit(BTRFS_FS_OPEN, &fs_info->flags);
goto clear_oneshot;
+ }
ret = btrfs_start_pre_rw_mount(fs_info);
if (ret) {
> The device add
> somewhat surprisingly clears the readonly bit on the sb (though the
> mount is still practically readonly, from the users perspective...).
> Finally, the remount checks the readonly bit on the sb against the flag
> and sees no change, so it does not run the code intended to run on
> ro->rw transitions, leaving BTRFS_FS_OPEN unset.
Originally, the step 'btrfs device add sprout_dev' provided seed
fs writeable without a remount.
I think the btrfs_clear_sb_rdonly(sb) in btrfs_init_new_device()
was part of it.
Removing it doesn't seem to affect the seed-sprout functionality
(did I miss anything?) either the -o remount,rw
or mount recycle will get it writeable.
> As a result, when the cleaner_kthread runs, it sees no BTRFS_FS_OPEN and
> does no work. This results in leaking deleted snapshots until we run out
> of space.
> I propose fixing it at the first departure from what feels reasonable:
> when we clear the readonly bit on the sb during device add. I have a
> reproducer of the issue here:
> https://raw.githubusercontent.com/boryas/scripts/main/sh/seed/mkseed.sh
> and confirm that this patch fixes it, and seems to work OK, otherwise. I
> will admit that I couldn't dig up the original rationale for clearing
> the bit here (it dates back to the original seed/sprout commit without
> explicit explanation) so it's hard to imagine all the ramifications of
> the change.
We got fstests -g seed to test the seed-sprout stuff. Your test case
here fits in it. IMO.
Thanks, Anand
> Signed-off-by: Boris Burkov <boris@bur.io>
> ---
> fs/btrfs/volumes.c | 4 ----
> 1 file changed, 4 deletions(-)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 3fd17e87815a..75d7eeb26fe6 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -2675,8 +2675,6 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
> set_blocksize(device->bdev, BTRFS_BDEV_BLOCKSIZE);
>
> if (seeding_dev) {
> - btrfs_clear_sb_rdonly(sb);
> -
> /* GFP_KERNEL allocation must not be under device_list_mutex */
> seed_devices = btrfs_init_sprout(fs_info);
> if (IS_ERR(seed_devices)) {
> @@ -2819,8 +2817,6 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
> mutex_unlock(&fs_info->chunk_mutex);
> mutex_unlock(&fs_info->fs_devices->device_list_mutex);
> error_trans:
> - if (seeding_dev)
> - btrfs_set_sb_rdonly(sb);
> if (trans)
> btrfs_end_transaction(trans);
> error_free_zone:
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] btrfs: do not clear read-only when adding sprout device
2022-03-23 0:52 ` Naohiro Aota
@ 2022-03-23 18:16 ` Boris Burkov
2022-03-28 11:11 ` Anand Jain
0 siblings, 1 reply; 23+ messages in thread
From: Boris Burkov @ 2022-03-23 18:16 UTC (permalink / raw)
To: Naohiro Aota; +Cc: linux-btrfs@vger.kernel.org, kernel-team@fb.com
On Wed, Mar 23, 2022 at 12:52:15AM +0000, Naohiro Aota wrote:
> On Mon, Mar 21, 2022 at 04:56:17PM -0700, Boris Burkov wrote:
> > If you follow the seed/sprout wiki, it suggests the following workflow:
> >
> > btrfstune -S 1 seed_dev > > mount seed_dev mnt
> > btrfs device add sprout_dev
> > mount -o remount,rw mnt
> >
> > The first mount mounts the FS readonly, which results in not setting
> > BTRFS_FS_OPEN, and setting the readonly bit on the sb. The device add
> > somewhat surprisingly clears the readonly bit on the sb (though the
> > mount is still practically readonly, from the users perspective...).
> > Finally, the remount checks the readonly bit on the sb against the flag
> > and sees no change, so it does not run the code intended to run on
> > ro->rw transitions, leaving BTRFS_FS_OPEN unset.
> >
> > As a result, when the cleaner_kthread runs, it sees no BTRFS_FS_OPEN and
> > does no work. This results in leaking deleted snapshots until we run out
> > of space.
> >
> > I propose fixing it at the first departure from what feels reasonable:
> > when we clear the readonly bit on the sb during device add. I have a
> > reproducer of the issue here:
> > https://raw.githubusercontent.com/boryas/scripts/main/sh/seed/mkseed.sh
> > and confirm that this patch fixes it, and seems to work OK, otherwise. I
> > will admit that I couldn't dig up the original rationale for clearing
> > the bit here (it dates back to the original seed/sprout commit without
> > explicit explanation) so it's hard to imagine all the ramifications of
> > the change.
> >
> > Signed-off-by: Boris Burkov <boris@bur.io>
> > ---
> > fs/btrfs/volumes.c | 4 ----
> > 1 file changed, 4 deletions(-)
> >
> > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> > index 3fd17e87815a..75d7eeb26fe6 100644
> > --- a/fs/btrfs/volumes.c
> > +++ b/fs/btrfs/volumes.c
> > @@ -2675,8 +2675,6 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
> > set_blocksize(device->bdev, BTRFS_BDEV_BLOCKSIZE);
> >
> > if (seeding_dev) {
> > - btrfs_clear_sb_rdonly(sb);
> > -
>
> After this line, it updates the metadata e.g, with
> init_first_rw_device() and writes them out with
> btrfs_commit_transaction(). Is that OK to do so with the SB_RDONLY
> flag set?
Good question. As far as I can tell, the functions don't explicitly
check sb_rdonly, though that could be because they expect that to be
checked before you ever try to commit a transaction, for example..
If there is an issue, it's probably somewhat subtle, because the basic
behavior does work.
>
> > /* GFP_KERNEL allocation must not be under device_list_mutex */
> > seed_devices = btrfs_init_sprout(fs_info);
> > if (IS_ERR(seed_devices)) {
> > @@ -2819,8 +2817,6 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
> > mutex_unlock(&fs_info->chunk_mutex);
> > mutex_unlock(&fs_info->fs_devices->device_list_mutex);
> > error_trans:
> > - if (seeding_dev)
> > - btrfs_set_sb_rdonly(sb);
> > if (trans)
> > btrfs_end_transaction(trans);
> > error_free_zone:
> > --
> > 2.30.2
> >
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] btrfs: do not clear read-only when adding sprout device
2022-03-23 10:44 ` Anand Jain
@ 2022-03-23 18:25 ` Boris Burkov
2022-03-24 11:16 ` Anand Jain
2022-03-23 20:17 ` Josef Bacik
1 sibling, 1 reply; 23+ messages in thread
From: Boris Burkov @ 2022-03-23 18:25 UTC (permalink / raw)
To: Anand Jain; +Cc: linux-btrfs, kernel-team
On Wed, Mar 23, 2022 at 06:44:42PM +0800, Anand Jain wrote:
> On 22/03/2022 07:56, Boris Burkov wrote:
> > If you follow the seed/sprout wiki, it suggests the following workflow:
> >
> > btrfstune -S 1 seed_dev
> > mount seed_dev mnt
> > btrfs device add sprout_dev
>
> > mount -o remount,rw mnt
> or
> umount mnt
> mount sprout mnt
Agreed. FWIW, I tested that umount/mount is not vulnerable to the bug.
However, a user might be using one of the documented workflows anyway,
and would need it for an fs they add a sprout to while it is in use.
>
> > The first mount mounts the FS readonly, which results in not setting
> > BTRFS_FS_OPEN, and setting the readonly bit on the sb.
>
> Why not set the BTRFS_FS_OPEN?
>
One reason is that there is other logic that runs when transitioning
from ro->rw in remount besides just setting BTRFS_FS_OPEN. By not
improperly clearing ro, we let that logic do its thing naturally.
> @@ -3904,8 +3904,11 @@ int __cold open_ctree(struct super_block *sb, struct
> btrfs_fs_devices *fs_device
> goto fail_qgroup;
> }
>
> - if (sb_rdonly(sb))
> + if (sb_rdonly(sb)) {
> + btrfs_set_sb_rdonly(sb);
> + set_bit(BTRFS_FS_OPEN, &fs_info->flags);
> goto clear_oneshot;
> + }
>
> ret = btrfs_start_pre_rw_mount(fs_info);
> if (ret) {
>
> > The device add
> > somewhat surprisingly clears the readonly bit on the sb (though the
> > mount is still practically readonly, from the users perspective...).
> > Finally, the remount checks the readonly bit on the sb against the flag
> > and sees no change, so it does not run the code intended to run on
> > ro->rw transitions, leaving BTRFS_FS_OPEN unset.
>
> Originally, the step 'btrfs device add sprout_dev' provided seed
> fs writeable without a remount.
That is very interesting. Do you remember why we changed this behavior
to the current behavior of leaving the seed readonly until the
remount/mount cycle?
>
> I think the btrfs_clear_sb_rdonly(sb) in btrfs_init_new_device()
> was part of it.
>
> Removing it doesn't seem to affect the seed-sprout functionality
> (did I miss anything?) either the -o remount,rw
> or mount recycle will get it writeable.
My current understanding (probably flawed..):
before this patch: we clear the rdonly bit, but the fs is still readonly
until remount (should figure out exactly how)
after this patch: behavior is the same, except the rdonly bit gets
cleared along with the mounting, not the device add.
>
> > As a result, when the cleaner_kthread runs, it sees no BTRFS_FS_OPEN and
> > does no work. This results in leaking deleted snapshots until we run out
> > of space.
>
>
> > I propose fixing it at the first departure from what feels reasonable:
> > when we clear the readonly bit on the sb during device add. I have a
> > reproducer of the issue here:
> > https://raw.githubusercontent.com/boryas/scripts/main/sh/seed/mkseed.sh
> > and confirm that this patch fixes it, and seems to work OK, otherwise. I
> > will admit that I couldn't dig up the original rationale for clearing
> > the bit here (it dates back to the original seed/sprout commit without
> > explicit explanation) so it's hard to imagine all the ramifications of
> > the change.
>
> We got fstests -g seed to test the seed-sprout stuff. Your test case
> here fits in it. IMO.
Thanks for the tip, I'll add it there, regardless of how we fix it.
>
> Thanks, Anand
>
>
> > Signed-off-by: Boris Burkov <boris@bur.io>
> > ---
> > fs/btrfs/volumes.c | 4 ----
> > 1 file changed, 4 deletions(-)
> >
> > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> > index 3fd17e87815a..75d7eeb26fe6 100644
> > --- a/fs/btrfs/volumes.c
> > +++ b/fs/btrfs/volumes.c
> > @@ -2675,8 +2675,6 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
> > set_blocksize(device->bdev, BTRFS_BDEV_BLOCKSIZE);
> > if (seeding_dev) {
> > - btrfs_clear_sb_rdonly(sb);
> > -
> > /* GFP_KERNEL allocation must not be under device_list_mutex */
> > seed_devices = btrfs_init_sprout(fs_info);
> > if (IS_ERR(seed_devices)) {
> > @@ -2819,8 +2817,6 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
> > mutex_unlock(&fs_info->chunk_mutex);
> > mutex_unlock(&fs_info->fs_devices->device_list_mutex);
> > error_trans:
> > - if (seeding_dev)
> > - btrfs_set_sb_rdonly(sb);
> > if (trans)
> > btrfs_end_transaction(trans);
> > error_free_zone:
>
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] btrfs: do not clear read-only when adding sprout device
2022-03-23 10:44 ` Anand Jain
2022-03-23 18:25 ` Boris Burkov
@ 2022-03-23 20:17 ` Josef Bacik
1 sibling, 0 replies; 23+ messages in thread
From: Josef Bacik @ 2022-03-23 20:17 UTC (permalink / raw)
To: Anand Jain; +Cc: Boris Burkov, linux-btrfs, kernel-team
On Wed, Mar 23, 2022 at 06:44:42PM +0800, Anand Jain wrote:
> On 22/03/2022 07:56, Boris Burkov wrote:
> > If you follow the seed/sprout wiki, it suggests the following workflow:
> >
> > btrfstune -S 1 seed_dev
> > mount seed_dev mnt
> > btrfs device add sprout_dev
>
> > mount -o remount,rw mnt
> or
> umount mnt
> mount sprout mnt
>
> > The first mount mounts the FS readonly, which results in not setting
> > BTRFS_FS_OPEN, and setting the readonly bit on the sb.
>
> Why not set the BTRFS_FS_OPEN?
>
> @@ -3904,8 +3904,11 @@ int __cold open_ctree(struct super_block *sb, struct
> btrfs_fs_devices *fs_device
> goto fail_qgroup;
> }
>
> - if (sb_rdonly(sb))
> + if (sb_rdonly(sb)) {
> + btrfs_set_sb_rdonly(sb);
> + set_bit(BTRFS_FS_OPEN, &fs_info->flags);
> goto clear_oneshot;
> + }
>
> ret = btrfs_start_pre_rw_mount(fs_info);
> if (ret) {
>
> > The device add
> > somewhat surprisingly clears the readonly bit on the sb (though the
> > mount is still practically readonly, from the users perspective...).
> > Finally, the remount checks the readonly bit on the sb against the flag
> > and sees no change, so it does not run the code intended to run on
> > ro->rw transitions, leaving BTRFS_FS_OPEN unset.
>
> Originally, the step 'btrfs device add sprout_dev' provided seed
> fs writeable without a remount.
>
> I think the btrfs_clear_sb_rdonly(sb) in btrfs_init_new_device()
> was part of it.
>
Yeah this was a bad idea, I don't want to randomly flip a mount from ro->rw
without the user telling us to. Thanks,
Josef
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] btrfs: do not clear read-only when adding sprout device
2022-03-23 18:25 ` Boris Burkov
@ 2022-03-24 11:16 ` Anand Jain
0 siblings, 0 replies; 23+ messages in thread
From: Anand Jain @ 2022-03-24 11:16 UTC (permalink / raw)
To: Boris Burkov; +Cc: linux-btrfs, kernel-team
On 24/03/2022 02:25, Boris Burkov wrote:
> On Wed, Mar 23, 2022 at 06:44:42PM +0800, Anand Jain wrote:
>> On 22/03/2022 07:56, Boris Burkov wrote:
>>> If you follow the seed/sprout wiki, it suggests the following workflow:
>>>
>>> btrfstune -S 1 seed_dev
>>> mount seed_dev mnt
>>> btrfs device add sprout_dev
>>
>>> mount -o remount,rw mnt
>> or
>> umount mnt
>> mount sprout mnt
>
> Agreed. FWIW, I tested that umount/mount is not vulnerable to the bug.
> However, a user might be using one of the documented workflows anyway,
> and would need it for an fs they add a sprout to while it is in use.
Yep.
>>
>>> The first mount mounts the FS readonly, which results in not setting
>>> BTRFS_FS_OPEN, and setting the readonly bit on the sb.
>>
>> Why not set the BTRFS_FS_OPEN?
>>
>
> One reason is that there is other logic that runs when transitioning
> from ro->rw in remount besides just setting BTRFS_FS_OPEN. By not
> improperly clearing ro, we let that logic do its thing naturally.
>
>> @@ -3904,8 +3904,11 @@ int __cold open_ctree(struct super_block *sb, struct
>> btrfs_fs_devices *fs_device
>> goto fail_qgroup;
>> }
>>
>> - if (sb_rdonly(sb))
>> + if (sb_rdonly(sb)) {
>> + btrfs_set_sb_rdonly(sb);
>> + set_bit(BTRFS_FS_OPEN, &fs_info->flags);
Looked more deep. No. I misunderstood BTRFS_FS_OPEN, it implies
open_ctree() is complete and read-writeable. So this won't work.
>> goto clear_oneshot;
>> + }
>>
>> ret = btrfs_start_pre_rw_mount(fs_info);
>> if (ret) {
>>
>>> The device add
>>> somewhat surprisingly clears the readonly bit on the sb (though the
>>> mount is still practically readonly, from the users perspective...).
>>> Finally, the remount checks the readonly bit on the sb against the flag
>>> and sees no change, so it does not run the code intended to run on
>>> ro->rw transitions, leaving BTRFS_FS_OPEN unset.
>>
>> Originally, the step 'btrfs device add sprout_dev' provided seed
>> fs writeable without a remount.
>
> That is very interesting. Do you remember why we changed this behavior
> to the current behavior of leaving the seed readonly until the
> remount/mount cycle?
So far, I couldn't find the commit that changed it.
Perhaps Josef may have an idea?
More below.
>> I think the btrfs_clear_sb_rdonly(sb) in btrfs_init_new_device()
>> was part of it.
>>
>> Removing it doesn't seem to affect the seed-sprout functionality
>> (did I miss anything?) either the -o remount,rw
>> or mount recycle will get it writeable.
>
> My current understanding (probably flawed..):
> before this patch: we clear the rdonly bit, but the fs is still readonly
> until remount (should figure out exactly how)
> after this patch: behavior is the same, except the rdonly bit gets
> cleared along with the mounting, not the device add.
>
>>
>>> As a result, when the cleaner_kthread runs, it sees no BTRFS_FS_OPEN and
>>> does no work. This results in leaking deleted snapshots until we run out
>>> of space.
>>
>>
>>> I propose fixing it at the first departure from what feels reasonable:
>>> when we clear the readonly bit on the sb during device add. I have a
>>> reproducer of the issue here:
>>> https://raw.githubusercontent.com/boryas/scripts/main/sh/seed/mkseed.sh
>>> and confirm that this patch fixes it, and seems to work OK, otherwise. I
>>> will admit that I couldn't dig up the original rationale for clearing
>>> the bit here (it dates back to the original seed/sprout commit without
>>> explicit explanation) so it's hard to imagine all the ramifications of
>>> the change.
>>
>> We got fstests -g seed to test the seed-sprout stuff. Your test case
>> here fits in it. IMO.
>
> Thanks for the tip, I'll add it there, regardless of how we fix it.
>
>>
>> Thanks, Anand
>>
>>
>>> Signed-off-by: Boris Burkov <boris@bur.io>
>>> ---
>>> fs/btrfs/volumes.c | 4 ----
>>> 1 file changed, 4 deletions(-)
>>>
>>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>>> index 3fd17e87815a..75d7eeb26fe6 100644
>>> --- a/fs/btrfs/volumes.c
>>> +++ b/fs/btrfs/volumes.c
>>> @@ -2675,8 +2675,6 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
>>> set_blocksize(device->bdev, BTRFS_BDEV_BLOCKSIZE);
>>> if (seeding_dev) {
>>> - btrfs_clear_sb_rdonly(sb);
>>> -
>>> /* GFP_KERNEL allocation must not be under device_list_mutex */
>>> seed_devices = btrfs_init_sprout(fs_info);
>>> if (IS_ERR(seed_devices)) {
>>> @@ -2819,8 +2817,6 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
>>> mutex_unlock(&fs_info->chunk_mutex);
>>> mutex_unlock(&fs_info->fs_devices->device_list_mutex);
>>> error_trans:
>>> - if (seeding_dev)
>>> - btrfs_set_sb_rdonly(sb);
>>> if (trans)
>>> btrfs_end_transaction(trans);
>>> error_free_zone:
>>
This looks good to me. You may add.
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Thanks, Anand
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] btrfs: do not clear read-only when adding sprout device
2022-03-23 18:16 ` Boris Burkov
@ 2022-03-28 11:11 ` Anand Jain
2022-03-29 4:33 ` Naohiro Aota
0 siblings, 1 reply; 23+ messages in thread
From: Anand Jain @ 2022-03-28 11:11 UTC (permalink / raw)
To: Boris Burkov, Naohiro Aota
Cc: linux-btrfs@vger.kernel.org, kernel-team@fb.com
On 24/03/2022 02:16, Boris Burkov wrote:
> On Wed, Mar 23, 2022 at 12:52:15AM +0000, Naohiro Aota wrote:
>> On Mon, Mar 21, 2022 at 04:56:17PM -0700, Boris Burkov wrote:
>>> If you follow the seed/sprout wiki, it suggests the following workflow:
>>>
>>> btrfstune -S 1 seed_dev > > mount seed_dev mnt
>>> btrfs device add sprout_dev
>>> mount -o remount,rw mnt
>>>
>>> The first mount mounts the FS readonly, which results in not setting
>>> BTRFS_FS_OPEN, and setting the readonly bit on the sb. The device add
>>> somewhat surprisingly clears the readonly bit on the sb (though the
>>> mount is still practically readonly, from the users perspective...).
>>> Finally, the remount checks the readonly bit on the sb against the flag
>>> and sees no change, so it does not run the code intended to run on
>>> ro->rw transitions, leaving BTRFS_FS_OPEN unset.
>>>
>>> As a result, when the cleaner_kthread runs, it sees no BTRFS_FS_OPEN and
>>> does no work. This results in leaking deleted snapshots until we run out
>>> of space.
>>>
>>> I propose fixing it at the first departure from what feels reasonable:
>>> when we clear the readonly bit on the sb during device add. I have a
>>> reproducer of the issue here:
>>> https://raw.githubusercontent.com/boryas/scripts/main/sh/seed/mkseed.sh
>>> and confirm that this patch fixes it, and seems to work OK, otherwise. I
>>> will admit that I couldn't dig up the original rationale for clearing
>>> the bit here (it dates back to the original seed/sprout commit without
>>> explicit explanation) so it's hard to imagine all the ramifications of
>>> the change.
>>>
>>> Signed-off-by: Boris Burkov <boris@bur.io>
>>> ---
>>> fs/btrfs/volumes.c | 4 ----
>>> 1 file changed, 4 deletions(-)
>>>
>>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>>> index 3fd17e87815a..75d7eeb26fe6 100644
>>> --- a/fs/btrfs/volumes.c
>>> +++ b/fs/btrfs/volumes.c
>>> @@ -2675,8 +2675,6 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
>>> set_blocksize(device->bdev, BTRFS_BDEV_BLOCKSIZE);
>>>
>>> if (seeding_dev) {
>>> - btrfs_clear_sb_rdonly(sb);
>>> -
>>
>> After this line, it updates the metadata e.g, with
>> init_first_rw_device() and writes them out with
>> btrfs_commit_transaction(). Is that OK to do so with the SB_RDONLY
>> flag set?
>
It is ok as the device-add step creates a _new_ sprout filesystem which
is RW-able. btrfs_setup_sprout() resets the seeding flag.
super_flags = btrfs_super_flags(disk_super) &
~BTRFS_SUPER_FLAG_SEEDING;
btrfs_set_super_flags(disk_super, super_flags);
Thanks, Anand
> Good question. As far as I can tell, the functions don't explicitly
> check sb_rdonly, though that could be because they expect that to be
> checked before you ever try to commit a transaction, for example..
>
> If there is an issue, it's probably somewhat subtle, because the basic
> behavior does work.
>
>>
>>> /* GFP_KERNEL allocation must not be under device_list_mutex */
>>> seed_devices = btrfs_init_sprout(fs_info);
>>> if (IS_ERR(seed_devices)) {
>>> @@ -2819,8 +2817,6 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
>>> mutex_unlock(&fs_info->chunk_mutex);
>>> mutex_unlock(&fs_info->fs_devices->device_list_mutex);
>>> error_trans:
>>> - if (seeding_dev)
>>> - btrfs_set_sb_rdonly(sb);
>>> if (trans)
>>> btrfs_end_transaction(trans);
>>> error_free_zone:
>>> --
>>> 2.30.2
>>>
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] btrfs: do not clear read-only when adding sprout device
2022-03-28 11:11 ` Anand Jain
@ 2022-03-29 4:33 ` Naohiro Aota
2022-03-29 19:45 ` Boris Burkov
0 siblings, 1 reply; 23+ messages in thread
From: Naohiro Aota @ 2022-03-29 4:33 UTC (permalink / raw)
To: Anand Jain; +Cc: Boris Burkov, linux-btrfs@vger.kernel.org, kernel-team@fb.com
On Mon, Mar 28, 2022 at 07:11:39PM +0800, Anand Jain wrote:
> On 24/03/2022 02:16, Boris Burkov wrote:
> > On Wed, Mar 23, 2022 at 12:52:15AM +0000, Naohiro Aota wrote:
> > > On Mon, Mar 21, 2022 at 04:56:17PM -0700, Boris Burkov wrote:
> > > > If you follow the seed/sprout wiki, it suggests the following workflow:
> > > >
> > > > btrfstune -S 1 seed_dev > > mount seed_dev mnt
> > > > btrfs device add sprout_dev
> > > > mount -o remount,rw mnt
> > > >
> > > > The first mount mounts the FS readonly, which results in not setting
> > > > BTRFS_FS_OPEN, and setting the readonly bit on the sb. The device add
> > > > somewhat surprisingly clears the readonly bit on the sb (though the
> > > > mount is still practically readonly, from the users perspective...).
> > > > Finally, the remount checks the readonly bit on the sb against the flag
> > > > and sees no change, so it does not run the code intended to run on
> > > > ro->rw transitions, leaving BTRFS_FS_OPEN unset.
> > > >
> > > > As a result, when the cleaner_kthread runs, it sees no BTRFS_FS_OPEN and
> > > > does no work. This results in leaking deleted snapshots until we run out
> > > > of space.
> > > >
> > > > I propose fixing it at the first departure from what feels reasonable:
> > > > when we clear the readonly bit on the sb during device add. I have a
> > > > reproducer of the issue here:
> > > > https://raw.githubusercontent.com/boryas/scripts/main/sh/seed/mkseed.sh
> > > > and confirm that this patch fixes it, and seems to work OK, otherwise. I
> > > > will admit that I couldn't dig up the original rationale for clearing
> > > > the bit here (it dates back to the original seed/sprout commit without
> > > > explicit explanation) so it's hard to imagine all the ramifications of
> > > > the change.
> > > >
> > > > Signed-off-by: Boris Burkov <boris@bur.io>
> > > > ---
> > > > fs/btrfs/volumes.c | 4 ----
> > > > 1 file changed, 4 deletions(-)
> > > >
> > > > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> > > > index 3fd17e87815a..75d7eeb26fe6 100644
> > > > --- a/fs/btrfs/volumes.c
> > > > +++ b/fs/btrfs/volumes.c
> > > > @@ -2675,8 +2675,6 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
> > > > set_blocksize(device->bdev, BTRFS_BDEV_BLOCKSIZE);
> > > > if (seeding_dev) {
> > > > - btrfs_clear_sb_rdonly(sb);
> > > > -
> > >
> > > After this line, it updates the metadata e.g, with
> > > init_first_rw_device() and writes them out with
> > > btrfs_commit_transaction(). Is that OK to do so with the SB_RDONLY
> > > flag set?
> >
>
> It is ok as the device-add step creates a _new_ sprout filesystem which is
> RW-able. btrfs_setup_sprout() resets the seeding flag.
>
> super_flags = btrfs_super_flags(disk_super) &
> ~BTRFS_SUPER_FLAG_SEEDING;
> btrfs_set_super_flags(disk_super, super_flags);
>
> Thanks, Anand
Yeah, I see that point. I'm concerned about an interaction with the
VFS code. With this patch applied, it is going to do file-system
internal writes (updating the trees and transaction commit) with the
SB_RDONLY flag set. Doesn't it break the current and future assumptions
of the VFS side?
But, it's just a concern. It might not be a bid deal.
> > Good question. As far as I can tell, the functions don't explicitly
> > check sb_rdonly, though that could be because they expect that to be
> > checked before you ever try to commit a transaction, for example..
> >
> > If there is an issue, it's probably somewhat subtle, because the basic
> > behavior does work.
> >
> > >
> > > > /* GFP_KERNEL allocation must not be under device_list_mutex */
> > > > seed_devices = btrfs_init_sprout(fs_info);
> > > > if (IS_ERR(seed_devices)) {
> > > > @@ -2819,8 +2817,6 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
> > > > mutex_unlock(&fs_info->chunk_mutex);
> > > > mutex_unlock(&fs_info->fs_devices->device_list_mutex);
> > > > error_trans:
> > > > - if (seeding_dev)
> > > > - btrfs_set_sb_rdonly(sb);
> > > > if (trans)
> > > > btrfs_end_transaction(trans);
> > > > error_free_zone:
> > > > --
> > > > 2.30.2
> > > >
>
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] btrfs: do not clear read-only when adding sprout device
2022-03-29 4:33 ` Naohiro Aota
@ 2022-03-29 19:45 ` Boris Burkov
0 siblings, 0 replies; 23+ messages in thread
From: Boris Burkov @ 2022-03-29 19:45 UTC (permalink / raw)
To: Naohiro Aota; +Cc: Anand Jain, linux-btrfs@vger.kernel.org, kernel-team@fb.com
On Tue, Mar 29, 2022 at 04:33:21AM +0000, Naohiro Aota wrote:
> On Mon, Mar 28, 2022 at 07:11:39PM +0800, Anand Jain wrote:
> > On 24/03/2022 02:16, Boris Burkov wrote:
> > > On Wed, Mar 23, 2022 at 12:52:15AM +0000, Naohiro Aota wrote:
> > > > On Mon, Mar 21, 2022 at 04:56:17PM -0700, Boris Burkov wrote:
> > > > > If you follow the seed/sprout wiki, it suggests the following workflow:
> > > > >
> > > > > btrfstune -S 1 seed_dev > > mount seed_dev mnt
> > > > > btrfs device add sprout_dev
> > > > > mount -o remount,rw mnt
> > > > >
> > > > > The first mount mounts the FS readonly, which results in not setting
> > > > > BTRFS_FS_OPEN, and setting the readonly bit on the sb. The device add
> > > > > somewhat surprisingly clears the readonly bit on the sb (though the
> > > > > mount is still practically readonly, from the users perspective...).
> > > > > Finally, the remount checks the readonly bit on the sb against the flag
> > > > > and sees no change, so it does not run the code intended to run on
> > > > > ro->rw transitions, leaving BTRFS_FS_OPEN unset.
> > > > >
> > > > > As a result, when the cleaner_kthread runs, it sees no BTRFS_FS_OPEN and
> > > > > does no work. This results in leaking deleted snapshots until we run out
> > > > > of space.
> > > > >
> > > > > I propose fixing it at the first departure from what feels reasonable:
> > > > > when we clear the readonly bit on the sb during device add. I have a
> > > > > reproducer of the issue here:
> > > > > https://raw.githubusercontent.com/boryas/scripts/main/sh/seed/mkseed.sh
> > > > > and confirm that this patch fixes it, and seems to work OK, otherwise. I
> > > > > will admit that I couldn't dig up the original rationale for clearing
> > > > > the bit here (it dates back to the original seed/sprout commit without
> > > > > explicit explanation) so it's hard to imagine all the ramifications of
> > > > > the change.
> > > > >
> > > > > Signed-off-by: Boris Burkov <boris@bur.io>
> > > > > ---
> > > > > fs/btrfs/volumes.c | 4 ----
> > > > > 1 file changed, 4 deletions(-)
> > > > >
> > > > > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> > > > > index 3fd17e87815a..75d7eeb26fe6 100644
> > > > > --- a/fs/btrfs/volumes.c
> > > > > +++ b/fs/btrfs/volumes.c
> > > > > @@ -2675,8 +2675,6 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
> > > > > set_blocksize(device->bdev, BTRFS_BDEV_BLOCKSIZE);
> > > > > if (seeding_dev) {
> > > > > - btrfs_clear_sb_rdonly(sb);
> > > > > -
> > > >
> > > > After this line, it updates the metadata e.g, with
> > > > init_first_rw_device() and writes them out with
> > > > btrfs_commit_transaction(). Is that OK to do so with the SB_RDONLY
> > > > flag set?
> > >
> >
> > It is ok as the device-add step creates a _new_ sprout filesystem which is
> > RW-able. btrfs_setup_sprout() resets the seeding flag.
> >
> > super_flags = btrfs_super_flags(disk_super) &
> > ~BTRFS_SUPER_FLAG_SEEDING;
> > btrfs_set_super_flags(disk_super, super_flags);
> >
> > Thanks, Anand
>
> Yeah, I see that point. I'm concerned about an interaction with the
> VFS code. With this patch applied, it is going to do file-system
> internal writes (updating the trees and transaction commit) with the
> SB_RDONLY flag set. Doesn't it break the current and future assumptions
> of the VFS side?
>
> But, it's just a concern. It might not be a bid deal.
I discussed this with Chris and he pointed out an existing case:
log-replay. Log replay runs in mount before the sb_rdonly check and
checks rw_devices, but not the fs readonly bit. It uses transactions and
such to replay the log. This seems similar enough here: the device is rw,
but the fs is ro.
>
> > > Good question. As far as I can tell, the functions don't explicitly
> > > check sb_rdonly, though that could be because they expect that to be
> > > checked before you ever try to commit a transaction, for example..
> > >
> > > If there is an issue, it's probably somewhat subtle, because the basic
> > > behavior does work.
> > >
> > > >
> > > > > /* GFP_KERNEL allocation must not be under device_list_mutex */
> > > > > seed_devices = btrfs_init_sprout(fs_info);
> > > > > if (IS_ERR(seed_devices)) {
> > > > > @@ -2819,8 +2817,6 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
> > > > > mutex_unlock(&fs_info->chunk_mutex);
> > > > > mutex_unlock(&fs_info->fs_devices->device_list_mutex);
> > > > > error_trans:
> > > > > - if (seeding_dev)
> > > > > - btrfs_set_sb_rdonly(sb);
> > > > > if (trans)
> > > > > btrfs_end_transaction(trans);
> > > > > error_free_zone:
> > > > > --
> > > > > 2.30.2
> > > > >
> >
^ permalink raw reply [flat|nested] 23+ messages in thread
* [PATCH] btrfs: do not clear read-only when adding sprout device
@ 2024-10-15 21:38 Boris Burkov
2024-10-15 22:00 ` Qu Wenruo
` (2 more replies)
0 siblings, 3 replies; 23+ messages in thread
From: Boris Burkov @ 2024-10-15 21:38 UTC (permalink / raw)
To: linux-btrfs, kernel-team
If you follow the seed/sprout wiki, it suggests the following workflow:
btrfstune -S 1 seed_dev
mount seed_dev mnt
btrfs device add sprout_dev
mount -o remount,rw mnt
The first mount mounts the FS readonly, which results in not setting
BTRFS_FS_OPEN, and setting the readonly bit on the sb. The device add
somewhat surprisingly clears the readonly bit on the sb (though the
mount is still practically readonly, from the users perspective...).
Finally, the remount checks the readonly bit on the sb against the flag
and sees no change, so it does not run the code intended to run on
ro->rw transitions, leaving BTRFS_FS_OPEN unset.
As a result, when the cleaner_kthread runs, it sees no BTRFS_FS_OPEN and
does no work. This results in leaking deleted snapshots until we run out
of space.
I propose fixing it at the first departure from what feels reasonable:
when we clear the readonly bit on the sb during device add.
A new fstest I have written reproduces the bug and confirms the fix.
Signed-off-by: Boris Burkov <boris@bur.io>
---
Note that this is a resend of an old unmerged fix:
https://lore.kernel.org/linux-btrfs/16c05d39566858bb8bc1e03bd19947cf2b601b98.1647906815.git.boris@bur.io/T/#u
Some other ideas for fixing it by modifying how we set BTRFS_FS_OPEN
were also explored but not merged around that time:
https://lore.kernel.org/linux-btrfs/cover.1654216941.git.anand.jain@oracle.com/
I don't have a strong preference, but I would really like to see this
trivial bug fixed. For what it is worth, we have been carrying this
patch internally at Meta since I first sent it with no incident.
---
fs/btrfs/volumes.c | 4 ----
1 file changed, 4 deletions(-)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index dc9f54849f39..84e861dcb350 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -2841,8 +2841,6 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
set_blocksize(device->bdev_file, BTRFS_BDEV_BLOCKSIZE);
if (seeding_dev) {
- btrfs_clear_sb_rdonly(sb);
-
/* GFP_KERNEL allocation must not be under device_list_mutex */
seed_devices = btrfs_init_sprout(fs_info);
if (IS_ERR(seed_devices)) {
@@ -2985,8 +2983,6 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
mutex_unlock(&fs_info->chunk_mutex);
mutex_unlock(&fs_info->fs_devices->device_list_mutex);
error_trans:
- if (seeding_dev)
- btrfs_set_sb_rdonly(sb);
if (trans)
btrfs_end_transaction(trans);
error_free_zone:
--
2.46.2
^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [PATCH] btrfs: do not clear read-only when adding sprout device
2024-10-15 21:38 [PATCH] btrfs: do not clear read-only when adding sprout device Boris Burkov
@ 2024-10-15 22:00 ` Qu Wenruo
2024-10-15 22:12 ` Qu Wenruo
2024-10-16 17:14 ` Anand Jain
2024-10-17 14:01 ` David Sterba
2 siblings, 1 reply; 23+ messages in thread
From: Qu Wenruo @ 2024-10-15 22:00 UTC (permalink / raw)
To: Boris Burkov, linux-btrfs, kernel-team
在 2024/10/16 08:08, Boris Burkov 写道:
> If you follow the seed/sprout wiki, it suggests the following workflow:
>
> btrfstune -S 1 seed_dev
> mount seed_dev mnt
> btrfs device add sprout_dev
> mount -o remount,rw mnt
>
> The first mount mounts the FS readonly, which results in not setting
> BTRFS_FS_OPEN, and setting the readonly bit on the sb. The device add
> somewhat surprisingly clears the readonly bit on the sb (though the
> mount is still practically readonly, from the users perspective...).
> Finally, the remount checks the readonly bit on the sb against the flag
> and sees no change, so it does not run the code intended to run on
> ro->rw transitions, leaving BTRFS_FS_OPEN unset.
>
> As a result, when the cleaner_kthread runs, it sees no BTRFS_FS_OPEN and
> does no work. This results in leaking deleted snapshots until we run out
> of space.
>
> I propose fixing it at the first departure from what feels reasonable:
> when we clear the readonly bit on the sb during device add.
>
> A new fstest I have written reproduces the bug and confirms the fix.
>
> Signed-off-by: Boris Burkov <boris@bur.io>
The fix looks good to me, small and keeps the super block ro flag
consistent.
IIRC the old behavior of sprout is, adding device will immediately mark
the fs RW, which is a big surprise changing the RO/RW status.
So the extra Rw remount requirement looks very reasonable to me.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Thanks,
Qu
> ---
> Note that this is a resend of an old unmerged fix:
> https://lore.kernel.org/linux-btrfs/16c05d39566858bb8bc1e03bd19947cf2b601b98.1647906815.git.boris@bur.io/T/#u
> Some other ideas for fixing it by modifying how we set BTRFS_FS_OPEN
> were also explored but not merged around that time:
> https://lore.kernel.org/linux-btrfs/cover.1654216941.git.anand.jain@oracle.com/
>
> I don't have a strong preference, but I would really like to see this
> trivial bug fixed. For what it is worth, we have been carrying this
> patch internally at Meta since I first sent it with no incident.
> ---
> fs/btrfs/volumes.c | 4 ----
> 1 file changed, 4 deletions(-)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index dc9f54849f39..84e861dcb350 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -2841,8 +2841,6 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
> set_blocksize(device->bdev_file, BTRFS_BDEV_BLOCKSIZE);
>
> if (seeding_dev) {
> - btrfs_clear_sb_rdonly(sb);
> -
> /* GFP_KERNEL allocation must not be under device_list_mutex */
> seed_devices = btrfs_init_sprout(fs_info);
> if (IS_ERR(seed_devices)) {
> @@ -2985,8 +2983,6 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
> mutex_unlock(&fs_info->chunk_mutex);
> mutex_unlock(&fs_info->fs_devices->device_list_mutex);
> error_trans:
> - if (seeding_dev)
> - btrfs_set_sb_rdonly(sb);
> if (trans)
> btrfs_end_transaction(trans);
> error_free_zone:
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] btrfs: do not clear read-only when adding sprout device
2024-10-15 22:00 ` Qu Wenruo
@ 2024-10-15 22:12 ` Qu Wenruo
2024-10-15 23:23 ` Boris Burkov
0 siblings, 1 reply; 23+ messages in thread
From: Qu Wenruo @ 2024-10-15 22:12 UTC (permalink / raw)
To: Boris Burkov, linux-btrfs, kernel-team
在 2024/10/16 08:30, Qu Wenruo 写道:
>
>
> 在 2024/10/16 08:08, Boris Burkov 写道:
>> If you follow the seed/sprout wiki, it suggests the following workflow:
>>
>> btrfstune -S 1 seed_dev
>> mount seed_dev mnt
>> btrfs device add sprout_dev
>> mount -o remount,rw mnt
>>
>> The first mount mounts the FS readonly, which results in not setting
>> BTRFS_FS_OPEN, and setting the readonly bit on the sb. The device add
>> somewhat surprisingly clears the readonly bit on the sb (though the
>> mount is still practically readonly, from the users perspective...).
>> Finally, the remount checks the readonly bit on the sb against the flag
>> and sees no change, so it does not run the code intended to run on
>> ro->rw transitions, leaving BTRFS_FS_OPEN unset.
>>
>> As a result, when the cleaner_kthread runs, it sees no BTRFS_FS_OPEN and
>> does no work. This results in leaking deleted snapshots until we run out
>> of space.
>>
>> I propose fixing it at the first departure from what feels reasonable:
>> when we clear the readonly bit on the sb during device add.
>>
>> A new fstest I have written reproduces the bug and confirms the fix.
>>
>> Signed-off-by: Boris Burkov <boris@bur.io>
>
> The fix looks good to me, small and keeps the super block ro flag
> consistent.
>
> IIRC the old behavior of sprout is, adding device will immediately mark
> the fs RW, which is a big surprise changing the RO/RW status.
>
> So the extra Rw remount requirement looks very reasonable to me.
Forgot to mention, although it's a trivial change in the behavior, if we
are determined to go this path, the man page of the "SEEDING DEVICE"
chapter also need to be updated.
Thanks,
Qu
>
> Reviewed-by: Qu Wenruo <wqu@suse.com>
>
> Thanks,
> Qu
>
>> ---
>> Note that this is a resend of an old unmerged fix:
>> https://lore.kernel.org/linux-
>> btrfs/16c05d39566858bb8bc1e03bd19947cf2b601b98.1647906815.git.boris@bur.io/T/#u
>> Some other ideas for fixing it by modifying how we set BTRFS_FS_OPEN
>> were also explored but not merged around that time:
>> https://lore.kernel.org/linux-btrfs/
>> cover.1654216941.git.anand.jain@oracle.com/
>>
>> I don't have a strong preference, but I would really like to see this
>> trivial bug fixed. For what it is worth, we have been carrying this
>> patch internally at Meta since I first sent it with no incident.
>> ---
>> fs/btrfs/volumes.c | 4 ----
>> 1 file changed, 4 deletions(-)
>>
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index dc9f54849f39..84e861dcb350 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -2841,8 +2841,6 @@ int btrfs_init_new_device(struct btrfs_fs_info
>> *fs_info, const char *device_path
>> set_blocksize(device->bdev_file, BTRFS_BDEV_BLOCKSIZE);
>>
>> if (seeding_dev) {
>> - btrfs_clear_sb_rdonly(sb);
>> -
>> /* GFP_KERNEL allocation must not be under device_list_mutex */
>> seed_devices = btrfs_init_sprout(fs_info);
>> if (IS_ERR(seed_devices)) {
>> @@ -2985,8 +2983,6 @@ int btrfs_init_new_device(struct btrfs_fs_info
>> *fs_info, const char *device_path
>> mutex_unlock(&fs_info->chunk_mutex);
>> mutex_unlock(&fs_info->fs_devices->device_list_mutex);
>> error_trans:
>> - if (seeding_dev)
>> - btrfs_set_sb_rdonly(sb);
>> if (trans)
>> btrfs_end_transaction(trans);
>> error_free_zone:
>
>
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] btrfs: do not clear read-only when adding sprout device
2024-10-15 22:12 ` Qu Wenruo
@ 2024-10-15 23:23 ` Boris Burkov
0 siblings, 0 replies; 23+ messages in thread
From: Boris Burkov @ 2024-10-15 23:23 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs, kernel-team
On Wed, Oct 16, 2024 at 08:42:14AM +1030, Qu Wenruo wrote:
>
>
> 在 2024/10/16 08:30, Qu Wenruo 写道:
> >
> >
> > 在 2024/10/16 08:08, Boris Burkov 写道:
> > > If you follow the seed/sprout wiki, it suggests the following workflow:
> > >
> > > btrfstune -S 1 seed_dev
> > > mount seed_dev mnt
> > > btrfs device add sprout_dev
> > > mount -o remount,rw mnt
> > >
> > > The first mount mounts the FS readonly, which results in not setting
> > > BTRFS_FS_OPEN, and setting the readonly bit on the sb. The device add
> > > somewhat surprisingly clears the readonly bit on the sb (though the
> > > mount is still practically readonly, from the users perspective...).
> > > Finally, the remount checks the readonly bit on the sb against the flag
> > > and sees no change, so it does not run the code intended to run on
> > > ro->rw transitions, leaving BTRFS_FS_OPEN unset.
> > >
> > > As a result, when the cleaner_kthread runs, it sees no BTRFS_FS_OPEN and
> > > does no work. This results in leaking deleted snapshots until we run out
> > > of space.
> > >
> > > I propose fixing it at the first departure from what feels reasonable:
> > > when we clear the readonly bit on the sb during device add.
> > >
> > > A new fstest I have written reproduces the bug and confirms the fix.
> > >
> > > Signed-off-by: Boris Burkov <boris@bur.io>
> >
> > The fix looks good to me, small and keeps the super block ro flag
> > consistent.
> >
> > IIRC the old behavior of sprout is, adding device will immediately mark
> > the fs RW, which is a big surprise changing the RO/RW status.
> >
> > So the extra Rw remount requirement looks very reasonable to me.
>
> Forgot to mention, although it's a trivial change in the behavior, if we
> are determined to go this path, the man page of the "SEEDING DEVICE"
> chapter also need to be updated.
I just checked the man page and everything there looks correct, at least
to me. It quite clearly states that after the 'device add' the fs is
ready to be remounted read-write (via cycle mount or remount).
Please let me know if there is some specific update you want me to make
that I missed, though!
BTW, this patch does change the rdonly flag behavior, so I will update
the test to look at that, as you suggested.
>
> Thanks,
> Qu
> >
> > Reviewed-by: Qu Wenruo <wqu@suse.com>
> >
> > Thanks,
> > Qu
> >
> > > ---
> > > Note that this is a resend of an old unmerged fix:
> > > https://lore.kernel.org/linux-
> > > btrfs/16c05d39566858bb8bc1e03bd19947cf2b601b98.1647906815.git.boris@bur.io/T/#u
> > > Some other ideas for fixing it by modifying how we set BTRFS_FS_OPEN
> > > were also explored but not merged around that time:
> > > https://lore.kernel.org/linux-btrfs/
> > > cover.1654216941.git.anand.jain@oracle.com/
> > >
> > > I don't have a strong preference, but I would really like to see this
> > > trivial bug fixed. For what it is worth, we have been carrying this
> > > patch internally at Meta since I first sent it with no incident.
> > > ---
> > > fs/btrfs/volumes.c | 4 ----
> > > 1 file changed, 4 deletions(-)
> > >
> > > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> > > index dc9f54849f39..84e861dcb350 100644
> > > --- a/fs/btrfs/volumes.c
> > > +++ b/fs/btrfs/volumes.c
> > > @@ -2841,8 +2841,6 @@ int btrfs_init_new_device(struct btrfs_fs_info
> > > *fs_info, const char *device_path
> > > set_blocksize(device->bdev_file, BTRFS_BDEV_BLOCKSIZE);
> > >
> > > if (seeding_dev) {
> > > - btrfs_clear_sb_rdonly(sb);
> > > -
> > > /* GFP_KERNEL allocation must not be under device_list_mutex */
> > > seed_devices = btrfs_init_sprout(fs_info);
> > > if (IS_ERR(seed_devices)) {
> > > @@ -2985,8 +2983,6 @@ int btrfs_init_new_device(struct btrfs_fs_info
> > > *fs_info, const char *device_path
> > > mutex_unlock(&fs_info->chunk_mutex);
> > > mutex_unlock(&fs_info->fs_devices->device_list_mutex);
> > > error_trans:
> > > - if (seeding_dev)
> > > - btrfs_set_sb_rdonly(sb);
> > > if (trans)
> > > btrfs_end_transaction(trans);
> > > error_free_zone:
> >
> >
>
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] btrfs: do not clear read-only when adding sprout device
2024-10-15 21:38 [PATCH] btrfs: do not clear read-only when adding sprout device Boris Burkov
2024-10-15 22:00 ` Qu Wenruo
@ 2024-10-16 17:14 ` Anand Jain
2024-10-16 17:24 ` Boris Burkov
2024-10-17 20:47 ` Qu Wenruo
2024-10-17 14:01 ` David Sterba
2 siblings, 2 replies; 23+ messages in thread
From: Anand Jain @ 2024-10-16 17:14 UTC (permalink / raw)
To: Boris Burkov, linux-btrfs, kernel-team
On 16/10/24 05:38, Boris Burkov wrote:
> If you follow the seed/sprout wiki, it suggests the following workflow:
>
> btrfstune -S 1 seed_dev
> mount seed_dev mnt
> btrfs device add sprout_dev
> mount -o remount,rw mnt
>
> The first mount mounts the FS readonly, which results in not setting
> BTRFS_FS_OPEN, and setting the readonly bit on the sb. The device add
> somewhat surprisingly clears the readonly bit on the sb (though the
> mount is still practically readonly, from the users perspective...).
> Finally, the remount checks the readonly bit on the sb against the flag
> and sees no change, so it does not run the code intended to run on
> ro->rw transitions, leaving BTRFS_FS_OPEN unset.
>
> As a result, when the cleaner_kthread runs, it sees no BTRFS_FS_OPEN and
> does no work. This results in leaking deleted snapshots until we run out
> of space.
>
> I propose fixing it at the first departure from what feels reasonable:
> when we clear the readonly bit on the sb during device add.
>
> A new fstest I have written reproduces the bug and confirms the fix.
>
> Signed-off-by: Boris Burkov <boris@bur.io>
> ---
> Note that this is a resend of an old unmerged fix:
> https://lore.kernel.org/linux-btrfs/16c05d39566858bb8bc1e03bd19947cf2b601b98.1647906815.git.boris@bur.io/T/#u
> Some other ideas for fixing it by modifying how we set BTRFS_FS_OPEN
> were also explored but not merged around that time:
> https://lore.kernel.org/linux-btrfs/cover.1654216941.git.anand.jain@oracle.com/
>
> I don't have a strong preference, but I would really like to see this
> trivial bug fixed. For what it is worth, we have been carrying this
> patch internally at Meta since I first sent it with no incident.
> ---
I remember fixing this before. I tested on 5.15, and the bug isn't
there, but it’s back in 6.10, so something broke in between.
We need to track it down.
The original design (kernel 4.x and below) makes the filesystem switch
to read-write mode after adding a sprout because:
You can’t add a device to a normal read-only filesystem
so with seed read-only mount is different.
With a seed device, adding a writable device transforms
it into a new read-write filesystem with a _new_ FSID and
fs_devices. Logically, read-write at this stage makes sense,
but I’m okay without it and in fact we had fixed this before,
but a patch somewhere seems to have broken it again.
(Demo below. :<x> is the return code from the 'run' command at
https://github.com/asj/run.git)
----- 5.15.0-208.159.3.2.el9uek.x86_64 ----
$ mkfs.btrfs -fq /dev/loop0 :0
$ btrfstune -S1 /dev/loop0 :0
$ mount /dev/loop0 /btrfs :0
mount: /btrfs: WARNING: source write-protected, mounted read-only.
$ cat /proc/self/mounts | grep btrfs :0
/dev/loop0 /btrfs btrfs ro,relatime,space_cache=v2,subvolid=5,subvol=/ 0 0
$ findmnt -o SOURCE,UUID /btrfs :0
SOURCE UUID
/dev/loop0 64f21b87-4e4c-4786-b2cd-c09a5ccd2afa
$ btrfs fi show -m :0
Label: none uuid: 64f21b87-4e4c-4786-b2cd-c09a5ccd2afa
Total devices 1 FS bytes used 144.00KiB
devid 1 size 3.00GiB used 536.00MiB path /dev/loop0
$ ls /sys/fs/btrfs :0
64f21b87-4e4c-4786-b2cd-c09a5ccd2afa
features
$ btrfs dev add -f /dev/loop1 /btrfs :0
# After adding the device, the path and UUID are different,
# so it’s a new filesystem. (But, as I said, I’m fine with
# keeping it read-only and needing remount,rw.
$ cat /proc/self/mounts | grep btrfs :0
/dev/loop1 /btrfs btrfs ro,relatime,space_cache=v2,subvolid=5,subvol=/ 0 0
$ findmnt -o SOURCE,UUID /btrfs :0
SOURCE UUID
/dev/loop1 948cea35-18db-45da-9ec8-3d46cb5f0413
$ btrfs fi show -m :0
Label: none uuid: 948cea35-18db-45da-9ec8-3d46cb5f0413
Total devices 2 FS bytes used 144.00KiB
devid 1 size 3.00GiB used 520.00MiB path /dev/loop0
devid 2 size 3.00GiB used 576.00MiB path /dev/loop1
$ ls /sys/fs/btrfs :0
948cea35-18db-45da-9ec8-3d46cb5f0413
features
---------
Thanks, Anand
> fs/btrfs/volumes.c | 4 ----
> 1 file changed, 4 deletions(-)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index dc9f54849f39..84e861dcb350 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -2841,8 +2841,6 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
> set_blocksize(device->bdev_file, BTRFS_BDEV_BLOCKSIZE);
>
> if (seeding_dev) {
> - btrfs_clear_sb_rdonly(sb);
> -
> /* GFP_KERNEL allocation must not be under device_list_mutex */
> seed_devices = btrfs_init_sprout(fs_info);
> if (IS_ERR(seed_devices)) {
> @@ -2985,8 +2983,6 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
> mutex_unlock(&fs_info->chunk_mutex);
> mutex_unlock(&fs_info->fs_devices->device_list_mutex);
> error_trans:
> - if (seeding_dev)
> - btrfs_set_sb_rdonly(sb);
> if (trans)
> btrfs_end_transaction(trans);
> error_free_zone:
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] btrfs: do not clear read-only when adding sprout device
2024-10-16 17:14 ` Anand Jain
@ 2024-10-16 17:24 ` Boris Burkov
2024-10-17 20:47 ` Qu Wenruo
1 sibling, 0 replies; 23+ messages in thread
From: Boris Burkov @ 2024-10-16 17:24 UTC (permalink / raw)
To: Anand Jain; +Cc: linux-btrfs, kernel-team
On Thu, Oct 17, 2024 at 01:14:16AM +0800, Anand Jain wrote:
> On 16/10/24 05:38, Boris Burkov wrote:
> > If you follow the seed/sprout wiki, it suggests the following workflow:
> >
> > btrfstune -S 1 seed_dev
> > mount seed_dev mnt
> > btrfs device add sprout_dev
> > mount -o remount,rw mnt
> >
>
>
>
> > The first mount mounts the FS readonly, which results in not setting
> > BTRFS_FS_OPEN, and setting the readonly bit on the sb. The device add
> > somewhat surprisingly clears the readonly bit on the sb (though the
> > mount is still practically readonly, from the users perspective...).
> > Finally, the remount checks the readonly bit on the sb against the flag
> > and sees no change, so it does not run the code intended to run on
> > ro->rw transitions, leaving BTRFS_FS_OPEN unset.
> >
> > As a result, when the cleaner_kthread runs, it sees no BTRFS_FS_OPEN and
> > does no work. This results in leaking deleted snapshots until we run out
> > of space.
> >
> > I propose fixing it at the first departure from what feels reasonable:
> > when we clear the readonly bit on the sb during device add.
> >
> > A new fstest I have written reproduces the bug and confirms the fix.
> >
> > Signed-off-by: Boris Burkov <boris@bur.io>
> > ---
> > Note that this is a resend of an old unmerged fix:
> > https://lore.kernel.org/linux-btrfs/16c05d39566858bb8bc1e03bd19947cf2b601b98.1647906815.git.boris@bur.io/T/#u
> > Some other ideas for fixing it by modifying how we set BTRFS_FS_OPEN
> > were also explored but not merged around that time:
> > https://lore.kernel.org/linux-btrfs/cover.1654216941.git.anand.jain@oracle.com/
> >
> > I don't have a strong preference, but I would really like to see this
> > trivial bug fixed. For what it is worth, we have been carrying this
> > patch internally at Meta since I first sent it with no incident.
> > ---
>
>
> I remember fixing this before. I tested on 5.15, and the bug isn't
> there, but it’s back in 6.10, so something broke in between.
> We need to track it down.
Thanks for weighing in again, and for re-testing on 5.15. That's
interesting that it broke again. And sorry if I didn't follow the rdonly
check patches properly and those did end up getting merged. Poor code
archaeology on my part :)
At any rate, I have pushed this patch into for-next for now, as I
think it constitutes an improvement without breaking any documented
behavior. If you look into what happened between 5.15 and 6.11 and want
to back it out with a different fix, I will not be offended.
If we also land the fstest I submitted, then hopefully future kernels
will *not* be breaking this again!
Thanks,
Boris
>
> The original design (kernel 4.x and below) makes the filesystem switch
> to read-write mode after adding a sprout because:
>
> You can’t add a device to a normal read-only filesystem
> so with seed read-only mount is different.
> With a seed device, adding a writable device transforms
> it into a new read-write filesystem with a _new_ FSID and
> fs_devices. Logically, read-write at this stage makes sense,
> but I’m okay without it and in fact we had fixed this before,
> but a patch somewhere seems to have broken it again.
>
>
> (Demo below. :<x> is the return code from the 'run' command at
> https://github.com/asj/run.git)
>
>
> ----- 5.15.0-208.159.3.2.el9uek.x86_64 ----
> $ mkfs.btrfs -fq /dev/loop0 :0
> $ btrfstune -S1 /dev/loop0 :0
> $ mount /dev/loop0 /btrfs :0
> mount: /btrfs: WARNING: source write-protected, mounted read-only.
>
> $ cat /proc/self/mounts | grep btrfs :0
> /dev/loop0 /btrfs btrfs ro,relatime,space_cache=v2,subvolid=5,subvol=/ 0 0
>
> $ findmnt -o SOURCE,UUID /btrfs :0
> SOURCE UUID
> /dev/loop0 64f21b87-4e4c-4786-b2cd-c09a5ccd2afa
>
> $ btrfs fi show -m :0
> Label: none uuid: 64f21b87-4e4c-4786-b2cd-c09a5ccd2afa
> Total devices 1 FS bytes used 144.00KiB
> devid 1 size 3.00GiB used 536.00MiB path /dev/loop0
>
> $ ls /sys/fs/btrfs :0
> 64f21b87-4e4c-4786-b2cd-c09a5ccd2afa
> features
>
> $ btrfs dev add -f /dev/loop1 /btrfs :0
>
> # After adding the device, the path and UUID are different,
> # so it’s a new filesystem. (But, as I said, I’m fine with
> # keeping it read-only and needing remount,rw.
>
> $ cat /proc/self/mounts | grep btrfs :0
> /dev/loop1 /btrfs btrfs ro,relatime,space_cache=v2,subvolid=5,subvol=/ 0 0
>
> $ findmnt -o SOURCE,UUID /btrfs :0
> SOURCE UUID
> /dev/loop1 948cea35-18db-45da-9ec8-3d46cb5f0413
>
> $ btrfs fi show -m :0
> Label: none uuid: 948cea35-18db-45da-9ec8-3d46cb5f0413
> Total devices 2 FS bytes used 144.00KiB
> devid 1 size 3.00GiB used 520.00MiB path /dev/loop0
> devid 2 size 3.00GiB used 576.00MiB path /dev/loop1
>
>
> $ ls /sys/fs/btrfs :0
> 948cea35-18db-45da-9ec8-3d46cb5f0413
> features
> ---------
>
>
> Thanks, Anand
>
> > fs/btrfs/volumes.c | 4 ----
> > 1 file changed, 4 deletions(-)
> >
> > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> > index dc9f54849f39..84e861dcb350 100644
> > --- a/fs/btrfs/volumes.c
> > +++ b/fs/btrfs/volumes.c
> > @@ -2841,8 +2841,6 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
> > set_blocksize(device->bdev_file, BTRFS_BDEV_BLOCKSIZE);
> > if (seeding_dev) {
> > - btrfs_clear_sb_rdonly(sb);
> > -
> > /* GFP_KERNEL allocation must not be under device_list_mutex */
> > seed_devices = btrfs_init_sprout(fs_info);
> > if (IS_ERR(seed_devices)) {
> > @@ -2985,8 +2983,6 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path
> > mutex_unlock(&fs_info->chunk_mutex);
> > mutex_unlock(&fs_info->fs_devices->device_list_mutex);
> > error_trans:
> > - if (seeding_dev)
> > - btrfs_set_sb_rdonly(sb);
> > if (trans)
> > btrfs_end_transaction(trans);
> > error_free_zone:
>
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] btrfs: do not clear read-only when adding sprout device
2024-10-15 21:38 [PATCH] btrfs: do not clear read-only when adding sprout device Boris Burkov
2024-10-15 22:00 ` Qu Wenruo
2024-10-16 17:14 ` Anand Jain
@ 2024-10-17 14:01 ` David Sterba
2024-10-17 16:41 ` Boris Burkov
2 siblings, 1 reply; 23+ messages in thread
From: David Sterba @ 2024-10-17 14:01 UTC (permalink / raw)
To: Boris Burkov; +Cc: linux-btrfs, kernel-team
On Tue, Oct 15, 2024 at 02:38:34PM -0700, Boris Burkov wrote:
> If you follow the seed/sprout wiki, it suggests the following workflow:
>
> btrfstune -S 1 seed_dev
> mount seed_dev mnt
> btrfs device add sprout_dev
> mount -o remount,rw mnt
>
> The first mount mounts the FS readonly, which results in not setting
> BTRFS_FS_OPEN, and setting the readonly bit on the sb. The device add
> somewhat surprisingly clears the readonly bit on the sb (though the
> mount is still practically readonly, from the users perspective...).
> Finally, the remount checks the readonly bit on the sb against the flag
> and sees no change, so it does not run the code intended to run on
> ro->rw transitions, leaving BTRFS_FS_OPEN unset.
>
> As a result, when the cleaner_kthread runs, it sees no BTRFS_FS_OPEN and
> does no work. This results in leaking deleted snapshots until we run out
> of space.
>
> I propose fixing it at the first departure from what feels reasonable:
> when we clear the readonly bit on the sb during device add.
>
> A new fstest I have written reproduces the bug and confirms the fix.
>
> Signed-off-by: Boris Burkov <boris@bur.io>
> ---
> Note that this is a resend of an old unmerged fix:
> https://lore.kernel.org/linux-btrfs/16c05d39566858bb8bc1e03bd19947cf2b601b98.1647906815.git.boris@bur.io/T/#u
> Some other ideas for fixing it by modifying how we set BTRFS_FS_OPEN
> were also explored but not merged around that time:
> https://lore.kernel.org/linux-btrfs/cover.1654216941.git.anand.jain@oracle.com/
>
> I don't have a strong preference, but I would really like to see this
> trivial bug fixed. For what it is worth, we have been carrying this
> patch internally at Meta since I first sent it with no incident.
We have an unresolved dispute about the fix and now the patch got to
for-next within a few days because it got two reviews but not mine.
The way you use it in Meta works for you but the fix is changing
behaviour so we can either ignore everybody else relying on the old
way or say that seeding is so niche that we don't care and see what we
can do once we get some report.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] btrfs: do not clear read-only when adding sprout device
2024-10-17 14:01 ` David Sterba
@ 2024-10-17 16:41 ` Boris Burkov
2024-10-21 18:56 ` David Sterba
0 siblings, 1 reply; 23+ messages in thread
From: Boris Burkov @ 2024-10-17 16:41 UTC (permalink / raw)
To: David Sterba; +Cc: linux-btrfs, kernel-team
On Thu, Oct 17, 2024 at 04:01:12PM +0200, David Sterba wrote:
> On Tue, Oct 15, 2024 at 02:38:34PM -0700, Boris Burkov wrote:
> > If you follow the seed/sprout wiki, it suggests the following workflow:
> >
> > btrfstune -S 1 seed_dev
> > mount seed_dev mnt
> > btrfs device add sprout_dev
> > mount -o remount,rw mnt
> >
> > The first mount mounts the FS readonly, which results in not setting
> > BTRFS_FS_OPEN, and setting the readonly bit on the sb. The device add
> > somewhat surprisingly clears the readonly bit on the sb (though the
> > mount is still practically readonly, from the users perspective...).
> > Finally, the remount checks the readonly bit on the sb against the flag
> > and sees no change, so it does not run the code intended to run on
> > ro->rw transitions, leaving BTRFS_FS_OPEN unset.
> >
> > As a result, when the cleaner_kthread runs, it sees no BTRFS_FS_OPEN and
> > does no work. This results in leaking deleted snapshots until we run out
> > of space.
> >
> > I propose fixing it at the first departure from what feels reasonable:
> > when we clear the readonly bit on the sb during device add.
> >
> > A new fstest I have written reproduces the bug and confirms the fix.
> >
> > Signed-off-by: Boris Burkov <boris@bur.io>
> > ---
> > Note that this is a resend of an old unmerged fix:
> > https://lore.kernel.org/linux-btrfs/16c05d39566858bb8bc1e03bd19947cf2b601b98.1647906815.git.boris@bur.io/T/#u
> > Some other ideas for fixing it by modifying how we set BTRFS_FS_OPEN
> > were also explored but not merged around that time:
> > https://lore.kernel.org/linux-btrfs/cover.1654216941.git.anand.jain@oracle.com/
> >
> > I don't have a strong preference, but I would really like to see this
> > trivial bug fixed. For what it is worth, we have been carrying this
> > patch internally at Meta since I first sent it with no incident.
>
> We have an unresolved dispute about the fix and now the patch got to
> for-next within a few days because it got two reviews but not mine.
> The way you use it in Meta works for you but the fix is changing
> behaviour so we can either ignore everybody else relying on the old
> way or say that seeding is so niche that we don't care and see what we
> can do once we get some report.
Please feel free to remove it, and I am happy to review whatever other
fix you think is best. Sorry for rushing, I just wanted to get it done
and out of my head so I could move on to other issues.
I assume your concern is that before this fix, the filesystem is actually
read-write after the device add, which this patch breaks?
My only argument against this is that the documentation has always said
you needed to remount/cycle-mount after adding the sprout, so there is
no fair reason to assume this would work. (In fact, it *doesn't*, the fs
is once again in a unexpected, degraded, state...)
But if existing LiveCD users are relying on this undocumented behavior,
then I think you are right and it's a bad idea to break them.
As long as my test (and presumably some fix) goes in and I don't have to
carry this patch internally for two more years, then I am happy.
Thanks,
Boris
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] btrfs: do not clear read-only when adding sprout device
2024-10-16 17:14 ` Anand Jain
2024-10-16 17:24 ` Boris Burkov
@ 2024-10-17 20:47 ` Qu Wenruo
2024-10-18 11:54 ` Anand Jain
1 sibling, 1 reply; 23+ messages in thread
From: Qu Wenruo @ 2024-10-17 20:47 UTC (permalink / raw)
To: Anand Jain, Boris Burkov, linux-btrfs, kernel-team
在 2024/10/17 03:44, Anand Jain 写道:
> On 16/10/24 05:38, Boris Burkov wrote:
>> If you follow the seed/sprout wiki, it suggests the following workflow:
>>
>> btrfstune -S 1 seed_dev
>> mount seed_dev mnt
>> btrfs device add sprout_dev
>> mount -o remount,rw mnt
>>
>
>
>
>> The first mount mounts the FS readonly, which results in not setting
>> BTRFS_FS_OPEN, and setting the readonly bit on the sb. The device add
>> somewhat surprisingly clears the readonly bit on the sb (though the
>> mount is still practically readonly, from the users perspective...).
>> Finally, the remount checks the readonly bit on the sb against the flag
>> and sees no change, so it does not run the code intended to run on
>> ro->rw transitions, leaving BTRFS_FS_OPEN unset.
>>
>> As a result, when the cleaner_kthread runs, it sees no BTRFS_FS_OPEN and
>> does no work. This results in leaking deleted snapshots until we run out
>> of space.
>>
>> I propose fixing it at the first departure from what feels reasonable:
>> when we clear the readonly bit on the sb during device add.
>>
>> A new fstest I have written reproduces the bug and confirms the fix.
>>
>> Signed-off-by: Boris Burkov <boris@bur.io>
>> ---
>> Note that this is a resend of an old unmerged fix:
>> https://lore.kernel.org/linux-
>> btrfs/16c05d39566858bb8bc1e03bd19947cf2b601b98.1647906815.git.boris@bur.io/T/#u
>> Some other ideas for fixing it by modifying how we set BTRFS_FS_OPEN
>> were also explored but not merged around that time:
>> https://lore.kernel.org/linux-btrfs/
>> cover.1654216941.git.anand.jain@oracle.com/
>>
>> I don't have a strong preference, but I would really like to see this
>> trivial bug fixed. For what it is worth, we have been carrying this
>> patch internally at Meta since I first sent it with no incident.
>> ---
>
>
> I remember fixing this before. I tested on 5.15, and the bug isn't
> there, but it’s back in 6.10, so something broke in between.
> We need to track it down.
>
> The original design (kernel 4.x and below) makes the filesystem switch
> to read-write mode after adding a sprout because:
>
> You can’t add a device to a normal read-only filesystem
> so with seed read-only mount is different.
> With a seed device, adding a writable device transforms
> it into a new read-write filesystem with a _new_ FSID and
> fs_devices. Logically, read-write at this stage makes sense,
> but I’m okay without it and in fact we had fixed this before,
> but a patch somewhere seems to have broken it again.
>
>
> (Demo below. :<x> is the return code from the 'run' command at
> https://github.com/asj/run.git)
>
>
> ----- 5.15.0-208.159.3.2.el9uek.x86_64 ----
I also tried it on upstream kernel v5.15.94, the behavior is still the
old changed to RW immediately after device add:
[adam@btrfs-vm ~]$ uname -a
Linux btrfs-vm 5.15.94-1-lts #1 SMP Wed, 15 Feb 2023 07:09:02 +0000
x86_64 GNU/Linux
[adam@btrfs-vm ~]$ sudo mkfs.btrfs -f /dev/test/scratch1 > /dev/null
[adam@btrfs-vm ~]$ sudo btrfstune -S 1 /dev/test/scratch1
[adam@btrfs-vm ~]$ sudo mount /dev/test/scratch1 /mnt/btrfs/
mount: /mnt/btrfs: WARNING: source write-protected, mounted read-only.
[adam@btrfs-vm ~]$ sudo btrfs device add -f /dev/test/scratch2 /mnt/btrfs/
Performing full device TRIM /dev/test/scratch2 (10.00GiB) ...
[adam@btrfs-vm ~]$ sudo touch /mnt/btrfs/file
[adam@btrfs-vm ~]$ mount | grep mnt/btrfs
/dev/mapper/test-scratch2 on /mnt/btrfs type btrfs
(rw,relatime,space_cache=v2,subvolid=5,subvol=/)
So it looks like it's some extra backports causing the behavior change.
But I still strongly prefer to keep it RO.
Even if it's a different fs under the hood, it still suddenly changes
the RO/RW status of a mount point without letting the user to know.
Thanks,
Qu
> $ mkfs.btrfs -fq /dev/loop0 :0
> $ btrfstune -S1 /dev/loop0 :0
> $ mount /dev/loop0 /btrfs :0
> mount: /btrfs: WARNING: source write-protected, mounted read-only.
>
> $ cat /proc/self/mounts | grep btrfs :0
> /dev/loop0 /btrfs btrfs ro,relatime,space_cache=v2,subvolid=5,subvol=/ 0 0
>
> $ findmnt -o SOURCE,UUID /btrfs :0
> SOURCE UUID
> /dev/loop0 64f21b87-4e4c-4786-b2cd-c09a5ccd2afa
>
> $ btrfs fi show -m :0
> Label: none uuid: 64f21b87-4e4c-4786-b2cd-c09a5ccd2afa
> Total devices 1 FS bytes used 144.00KiB
> devid 1 size 3.00GiB used 536.00MiB path /dev/loop0
>
> $ ls /sys/fs/btrfs :0
> 64f21b87-4e4c-4786-b2cd-c09a5ccd2afa
> features
>
> $ btrfs dev add -f /dev/loop1 /btrfs :0
>
> # After adding the device, the path and UUID are different,
> # so it’s a new filesystem. (But, as I said, I’m fine with
> # keeping it read-only and needing remount,rw.
>
> $ cat /proc/self/mounts | grep btrfs :0
> /dev/loop1 /btrfs btrfs ro,relatime,space_cache=v2,subvolid=5,subvol=/ 0 0
>
> $ findmnt -o SOURCE,UUID /btrfs :0
> SOURCE UUID
> /dev/loop1 948cea35-18db-45da-9ec8-3d46cb5f0413
>
> $ btrfs fi show -m :0
> Label: none uuid: 948cea35-18db-45da-9ec8-3d46cb5f0413
> Total devices 2 FS bytes used 144.00KiB
> devid 1 size 3.00GiB used 520.00MiB path /dev/loop0
> devid 2 size 3.00GiB used 576.00MiB path /dev/loop1
>
>
> $ ls /sys/fs/btrfs :0
> 948cea35-18db-45da-9ec8-3d46cb5f0413
> features
> ---------
>
>
> Thanks, Anand
>
>> fs/btrfs/volumes.c | 4 ----
>> 1 file changed, 4 deletions(-)
>>
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index dc9f54849f39..84e861dcb350 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -2841,8 +2841,6 @@ int btrfs_init_new_device(struct btrfs_fs_info
>> *fs_info, const char *device_path
>> set_blocksize(device->bdev_file, BTRFS_BDEV_BLOCKSIZE);
>> if (seeding_dev) {
>> - btrfs_clear_sb_rdonly(sb);
>> -
>> /* GFP_KERNEL allocation must not be under device_list_mutex */
>> seed_devices = btrfs_init_sprout(fs_info);
>> if (IS_ERR(seed_devices)) {
>> @@ -2985,8 +2983,6 @@ int btrfs_init_new_device(struct btrfs_fs_info
>> *fs_info, const char *device_path
>> mutex_unlock(&fs_info->chunk_mutex);
>> mutex_unlock(&fs_info->fs_devices->device_list_mutex);
>> error_trans:
>> - if (seeding_dev)
>> - btrfs_set_sb_rdonly(sb);
>> if (trans)
>> btrfs_end_transaction(trans);
>> error_free_zone:
>
>
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] btrfs: do not clear read-only when adding sprout device
2024-10-17 20:47 ` Qu Wenruo
@ 2024-10-18 11:54 ` Anand Jain
0 siblings, 0 replies; 23+ messages in thread
From: Anand Jain @ 2024-10-18 11:54 UTC (permalink / raw)
To: Qu Wenruo, Boris Burkov, linux-btrfs, kernel-team
On 18/10/24 04:47, Qu Wenruo wrote:
>
>
> 在 2024/10/17 03:44, Anand Jain 写道:
>> On 16/10/24 05:38, Boris Burkov wrote:
>>> If you follow the seed/sprout wiki, it suggests the following workflow:
>>>
>>> btrfstune -S 1 seed_dev
>>> mount seed_dev mnt
>>> btrfs device add sprout_dev
>>> mount -o remount,rw mnt
>>>
>>
>>
>>
>>> The first mount mounts the FS readonly, which results in not setting
>>> BTRFS_FS_OPEN, and setting the readonly bit on the sb. The device add
>>> somewhat surprisingly clears the readonly bit on the sb (though the
>>> mount is still practically readonly, from the users perspective...).
>>> Finally, the remount checks the readonly bit on the sb against the flag
>>> and sees no change, so it does not run the code intended to run on
>>> ro->rw transitions, leaving BTRFS_FS_OPEN unset.
>>>
>>> As a result, when the cleaner_kthread runs, it sees no BTRFS_FS_OPEN and
>>> does no work. This results in leaking deleted snapshots until we run out
>>> of space.
>>>
>>> I propose fixing it at the first departure from what feels reasonable:
>>> when we clear the readonly bit on the sb during device add.
>>>
>>> A new fstest I have written reproduces the bug and confirms the fix.
>>>
>>> Signed-off-by: Boris Burkov <boris@bur.io>
>>> ---
>>> Note that this is a resend of an old unmerged fix:
>>> https://lore.kernel.org/linux-
>>> btrfs/16c05d39566858bb8bc1e03bd19947cf2b601b98.1647906815.git.boris@bur.io/T/#u
>>> Some other ideas for fixing it by modifying how we set BTRFS_FS_OPEN
>>> were also explored but not merged around that time:
>>> https://lore.kernel.org/linux-btrfs/
>>> cover.1654216941.git.anand.jain@oracle.com/
>>>
>>> I don't have a strong preference, but I would really like to see this
>>> trivial bug fixed. For what it is worth, we have been carrying this
>>> patch internally at Meta since I first sent it with no incident.
>>> ---
>>
>>
>> I remember fixing this before. I tested on 5.15, and the bug isn't
>> there, but it’s back in 6.10, so something broke in between.
>> We need to track it down.
>>
>> The original design (kernel 4.x and below) makes the filesystem switch
>> to read-write mode after adding a sprout because:
>>
>> You can’t add a device to a normal read-only filesystem
>> so with seed read-only mount is different.
>> With a seed device, adding a writable device transforms
>> it into a new read-write filesystem with a _new_ FSID and
>> fs_devices. Logically, read-write at this stage makes sense,
>> but I’m okay without it and in fact we had fixed this before,
>> but a patch somewhere seems to have broken it again.
>>
>>
>> (Demo below. :<x> is the return code from the 'run' command at
>> https://github.com/asj/run.git)
>>
>>
>> ----- 5.15.0-208.159.3.2.el9uek.x86_64 ----
>
> I also tried it on upstream kernel v5.15.94, the behavior is still the
> old changed to RW immediately after device add:
>
> [adam@btrfs-vm ~]$ uname -a
> Linux btrfs-vm 5.15.94-1-lts #1 SMP Wed, 15 Feb 2023 07:09:02 +0000
> x86_64 GNU/Linux
> [adam@btrfs-vm ~]$ sudo mkfs.btrfs -f /dev/test/scratch1 > /dev/null
> [adam@btrfs-vm ~]$ sudo btrfstune -S 1 /dev/test/scratch1
> [adam@btrfs-vm ~]$ sudo mount /dev/test/scratch1 /mnt/btrfs/
> mount: /mnt/btrfs: WARNING: source write-protected, mounted read-only.
> [adam@btrfs-vm ~]$ sudo btrfs device add -f /dev/test/scratch2 /mnt/btrfs/
> Performing full device TRIM /dev/test/scratch2 (10.00GiB) ...
> [adam@btrfs-vm ~]$ sudo touch /mnt/btrfs/file
> [adam@btrfs-vm ~]$ mount | grep mnt/btrfs
> /dev/mapper/test-scratch2 on /mnt/btrfs type btrfs
> (rw,relatime,space_cache=v2,subvolid=5,subvol=/)
>
> So it looks like it's some extra backports causing the behavior change.
>
Actually, it is caused by util-linux, specifically the libmount.
v2.38 is good, but v2.39-rc1 is bad with the same kernel without
the fix.
Unfortunately, a bunch of libmount commits between these
versions are not bisectable.
So I have no specific commit but commits from 2b1db0951b9d
to f07412a04ca8.
> But I still strongly prefer to keep it RO.
> Even if it's a different fs under the hood, it still suddenly changes
> the RO/RW status of a mount point without letting the user to know.
Just my perspective—typically, we add an RW device to make a seed
filesystem writable. If one command can do it, that's great from
ease of use pov. But I’m fine with RO; it’s cleaner.
Thanks, Anand
>
> Thanks,
> Qu
>
>> $ mkfs.btrfs -fq /dev/loop0 :0
>> $ btrfstune -S1 /dev/loop0 :0
>> $ mount /dev/loop0 /btrfs :0
>> mount: /btrfs: WARNING: source write-protected, mounted read-only.
>>
>> $ cat /proc/self/mounts | grep btrfs :0
>> /dev/loop0 /btrfs btrfs ro,relatime,space_cache=v2,subvolid=5,subvol=/
>> 0 0
>>
>> $ findmnt -o SOURCE,UUID /btrfs :0
>> SOURCE UUID
>> /dev/loop0 64f21b87-4e4c-4786-b2cd-c09a5ccd2afa
>>
>> $ btrfs fi show -m :0
>> Label: none uuid: 64f21b87-4e4c-4786-b2cd-c09a5ccd2afa
>> Total devices 1 FS bytes used 144.00KiB
>> devid 1 size 3.00GiB used 536.00MiB path /dev/loop0
>>
>> $ ls /sys/fs/btrfs :0
>> 64f21b87-4e4c-4786-b2cd-c09a5ccd2afa
>> features
>>
>> $ btrfs dev add -f /dev/loop1 /btrfs :0
>>
>> # After adding the device, the path and UUID are different,
>> # so it’s a new filesystem. (But, as I said, I’m fine with
>> # keeping it read-only and needing remount,rw.
>>
>> $ cat /proc/self/mounts | grep btrfs :0
>> /dev/loop1 /btrfs btrfs ro,relatime,space_cache=v2,subvolid=5,subvol=/
>> 0 0
>>
>> $ findmnt -o SOURCE,UUID /btrfs :0
>> SOURCE UUID
>> /dev/loop1 948cea35-18db-45da-9ec8-3d46cb5f0413
>>
>> $ btrfs fi show -m :0
>> Label: none uuid: 948cea35-18db-45da-9ec8-3d46cb5f0413
>> Total devices 2 FS bytes used 144.00KiB
>> devid 1 size 3.00GiB used 520.00MiB path /dev/loop0
>> devid 2 size 3.00GiB used 576.00MiB path /dev/loop1
>>
>>
>> $ ls /sys/fs/btrfs :0
>> 948cea35-18db-45da-9ec8-3d46cb5f0413
>> features
>> ---------
>>
>>
>> Thanks, Anand
>>
>>> fs/btrfs/volumes.c | 4 ----
>>> 1 file changed, 4 deletions(-)
>>>
>>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>>> index dc9f54849f39..84e861dcb350 100644
>>> --- a/fs/btrfs/volumes.c
>>> +++ b/fs/btrfs/volumes.c
>>> @@ -2841,8 +2841,6 @@ int btrfs_init_new_device(struct btrfs_fs_info
>>> *fs_info, const char *device_path
>>> set_blocksize(device->bdev_file, BTRFS_BDEV_BLOCKSIZE);
>>> if (seeding_dev) {
>>> - btrfs_clear_sb_rdonly(sb);
>>> -
>>> /* GFP_KERNEL allocation must not be under
>>> device_list_mutex */
>>> seed_devices = btrfs_init_sprout(fs_info);
>>> if (IS_ERR(seed_devices)) {
>>> @@ -2985,8 +2983,6 @@ int btrfs_init_new_device(struct btrfs_fs_info
>>> *fs_info, const char *device_path
>>> mutex_unlock(&fs_info->chunk_mutex);
>>> mutex_unlock(&fs_info->fs_devices->device_list_mutex);
>>> error_trans:
>>> - if (seeding_dev)
>>> - btrfs_set_sb_rdonly(sb);
>>> if (trans)
>>> btrfs_end_transaction(trans);
>>> error_free_zone:
>>
>>
>
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] btrfs: do not clear read-only when adding sprout device
2024-10-17 16:41 ` Boris Burkov
@ 2024-10-21 18:56 ` David Sterba
2024-10-21 19:29 ` Boris Burkov
0 siblings, 1 reply; 23+ messages in thread
From: David Sterba @ 2024-10-21 18:56 UTC (permalink / raw)
To: Boris Burkov; +Cc: linux-btrfs, kernel-team
On Thu, Oct 17, 2024 at 09:41:59AM -0700, Boris Burkov wrote:
> On Thu, Oct 17, 2024 at 04:01:12PM +0200, David Sterba wrote:
> > On Tue, Oct 15, 2024 at 02:38:34PM -0700, Boris Burkov wrote:
> > > If you follow the seed/sprout wiki, it suggests the following workflow:
> > >
> > > btrfstune -S 1 seed_dev
> > > mount seed_dev mnt
> > > btrfs device add sprout_dev
> > > mount -o remount,rw mnt
> > >
> > > The first mount mounts the FS readonly, which results in not setting
> > > BTRFS_FS_OPEN, and setting the readonly bit on the sb. The device add
> > > somewhat surprisingly clears the readonly bit on the sb (though the
> > > mount is still practically readonly, from the users perspective...).
> > > Finally, the remount checks the readonly bit on the sb against the flag
> > > and sees no change, so it does not run the code intended to run on
> > > ro->rw transitions, leaving BTRFS_FS_OPEN unset.
> > >
> > > As a result, when the cleaner_kthread runs, it sees no BTRFS_FS_OPEN and
> > > does no work. This results in leaking deleted snapshots until we run out
> > > of space.
> > >
> > > I propose fixing it at the first departure from what feels reasonable:
> > > when we clear the readonly bit on the sb during device add.
> > >
> > > A new fstest I have written reproduces the bug and confirms the fix.
> > >
> > > Signed-off-by: Boris Burkov <boris@bur.io>
> > > ---
> > > Note that this is a resend of an old unmerged fix:
> > > https://lore.kernel.org/linux-btrfs/16c05d39566858bb8bc1e03bd19947cf2b601b98.1647906815.git.boris@bur.io/T/#u
> > > Some other ideas for fixing it by modifying how we set BTRFS_FS_OPEN
> > > were also explored but not merged around that time:
> > > https://lore.kernel.org/linux-btrfs/cover.1654216941.git.anand.jain@oracle.com/
> > >
> > > I don't have a strong preference, but I would really like to see this
> > > trivial bug fixed. For what it is worth, we have been carrying this
> > > patch internally at Meta since I first sent it with no incident.
> >
> > We have an unresolved dispute about the fix and now the patch got to
> > for-next within a few days because it got two reviews but not mine.
> > The way you use it in Meta works for you but the fix is changing
> > behaviour so we can either ignore everybody else relying on the old
> > way or say that seeding is so niche that we don't care and see what we
> > can do once we get some report.
>
> Please feel free to remove it, and I am happy to review whatever other
> fix you think is best. Sorry for rushing, I just wanted to get it done
> and out of my head so I could move on to other issues.
I'm concerned because change like that needs an announcement,
documentation changes or eventually an optional change so both use cases
are available. I haven't merged it since the last time you or somebody
posted it because I don't see a way how to fix it without fixing the bug
and not breaking the use case.
> I assume your concern is that before this fix, the filesystem is actually
> read-write after the device add, which this patch breaks?
>
> My only argument against this is that the documentation has always said
> you needed to remount/cycle-mount after adding the sprout, so there is
> no fair reason to assume this would work. (In fact, it *doesn't*, the fs
> is once again in a unexpected, degraded, state...)
>
> But if existing LiveCD users are relying on this undocumented behavior,
> then I think you are right and it's a bad idea to break them.
The problem here is that we don't know how the feature is used, the
documentation came much later than the feature. So I take it as that
people rely on code, not documenation, even if there's an undocumented
behaviour.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] btrfs: do not clear read-only when adding sprout device
2024-10-21 18:56 ` David Sterba
@ 2024-10-21 19:29 ` Boris Burkov
0 siblings, 0 replies; 23+ messages in thread
From: Boris Burkov @ 2024-10-21 19:29 UTC (permalink / raw)
To: David Sterba; +Cc: linux-btrfs, kernel-team
On Mon, Oct 21, 2024 at 08:56:51PM +0200, David Sterba wrote:
> On Thu, Oct 17, 2024 at 09:41:59AM -0700, Boris Burkov wrote:
> > On Thu, Oct 17, 2024 at 04:01:12PM +0200, David Sterba wrote:
> > > On Tue, Oct 15, 2024 at 02:38:34PM -0700, Boris Burkov wrote:
> > > > If you follow the seed/sprout wiki, it suggests the following workflow:
> > > >
> > > > btrfstune -S 1 seed_dev
> > > > mount seed_dev mnt
> > > > btrfs device add sprout_dev
> > > > mount -o remount,rw mnt
> > > >
> > > > The first mount mounts the FS readonly, which results in not setting
> > > > BTRFS_FS_OPEN, and setting the readonly bit on the sb. The device add
> > > > somewhat surprisingly clears the readonly bit on the sb (though the
> > > > mount is still practically readonly, from the users perspective...).
> > > > Finally, the remount checks the readonly bit on the sb against the flag
> > > > and sees no change, so it does not run the code intended to run on
> > > > ro->rw transitions, leaving BTRFS_FS_OPEN unset.
> > > >
> > > > As a result, when the cleaner_kthread runs, it sees no BTRFS_FS_OPEN and
> > > > does no work. This results in leaking deleted snapshots until we run out
> > > > of space.
> > > >
> > > > I propose fixing it at the first departure from what feels reasonable:
> > > > when we clear the readonly bit on the sb during device add.
> > > >
> > > > A new fstest I have written reproduces the bug and confirms the fix.
> > > >
> > > > Signed-off-by: Boris Burkov <boris@bur.io>
> > > > ---
> > > > Note that this is a resend of an old unmerged fix:
> > > > https://lore.kernel.org/linux-btrfs/16c05d39566858bb8bc1e03bd19947cf2b601b98.1647906815.git.boris@bur.io/T/#u
> > > > Some other ideas for fixing it by modifying how we set BTRFS_FS_OPEN
> > > > were also explored but not merged around that time:
> > > > https://lore.kernel.org/linux-btrfs/cover.1654216941.git.anand.jain@oracle.com/
> > > >
> > > > I don't have a strong preference, but I would really like to see this
> > > > trivial bug fixed. For what it is worth, we have been carrying this
> > > > patch internally at Meta since I first sent it with no incident.
> > >
> > > We have an unresolved dispute about the fix and now the patch got to
> > > for-next within a few days because it got two reviews but not mine.
> > > The way you use it in Meta works for you but the fix is changing
> > > behaviour so we can either ignore everybody else relying on the old
> > > way or say that seeding is so niche that we don't care and see what we
> > > can do once we get some report.
> >
> > Please feel free to remove it, and I am happy to review whatever other
> > fix you think is best. Sorry for rushing, I just wanted to get it done
> > and out of my head so I could move on to other issues.
>
> I'm concerned because change like that needs an announcement,
> documentation changes or eventually an optional change so both use cases
> are available. I haven't merged it since the last time you or somebody
> posted it because I don't see a way how to fix it without fixing the bug
> and not breaking the use case.
>
That's fair. However, I assume we agree we need some fix, somewhere.
Anyone who is really currently relying on this and not cycle mounting
after adding the device doesn't get a cleaner thread, which includes
empty block group cleanup, autorelocation, deleting snapshots, and defrag.
I think subvolume cleanup and block group cleanup are probably the worst
to lose.
So let's step back from me impulsively stuffing this in to for-next and
do the right thing together :)
Thinking of our options, in no particular order:
1a. just land the fix
It's not great if a lot of people rely on the fs "working" after device
add. Some of them might quickly find the proper steps in the docs, but
this might confuse/upset people.
1b. land the fix but properly socialize it
We can also make btrfs device add notice that it is seed sprout and add
a loud message that you need to remount rw. If this requires skipping a
merge or two while we socialize it to obvious or google-able users, that
seems reasonable.
1c. land the fix and make btrfs-progs do the remount after the device add
This is kind of weird, but does preserve the expected behavior. We could
also add some seed sprout specific flags or commands as the "main" way to
do it.
2. have the device add detect the seed sprout case and really make the
fs read-write
If we are committed to the behavior, and don't believe in a userspace
only fix, then it could be possible to make the current path which
simply clears the ro bit also invoke some or all of the remount rw
logic.
3. do nothing (remove the fix from for-next)
I keep carrying an internal patch forever, upstream stays in danger of
running an fs without a cleaner. If no one runs a long-lived fs off a live
cd for long enough to care, maybe it is ok, too. But stuff like this is
out there in google, for example:
https://blog.elastocloud.org/2022/11/btrfs-seed-devices-for-ab-system-updates.html
That's all I can think of, but I'm definitely open to other ideas!
Thanks,
Boris
> > I assume your concern is that before this fix, the filesystem is actually
> > read-write after the device add, which this patch breaks?
> >
> > My only argument against this is that the documentation has always said
> > you needed to remount/cycle-mount after adding the sprout, so there is
> > no fair reason to assume this would work. (In fact, it *doesn't*, the fs
> > is once again in a unexpected, degraded, state...)
> >
> > But if existing LiveCD users are relying on this undocumented behavior,
> > then I think you are right and it's a bad idea to break them.
>
> The problem here is that we don't know how the feature is used, the
> documentation came much later than the feature. So I take it as that
> people rely on code, not documenation, even if there's an undocumented
> behaviour.
^ permalink raw reply [flat|nested] 23+ messages in thread
end of thread, other threads:[~2024-10-21 19:30 UTC | newest]
Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-15 21:38 [PATCH] btrfs: do not clear read-only when adding sprout device Boris Burkov
2024-10-15 22:00 ` Qu Wenruo
2024-10-15 22:12 ` Qu Wenruo
2024-10-15 23:23 ` Boris Burkov
2024-10-16 17:14 ` Anand Jain
2024-10-16 17:24 ` Boris Burkov
2024-10-17 20:47 ` Qu Wenruo
2024-10-18 11:54 ` Anand Jain
2024-10-17 14:01 ` David Sterba
2024-10-17 16:41 ` Boris Burkov
2024-10-21 18:56 ` David Sterba
2024-10-21 19:29 ` Boris Burkov
-- strict thread matches above, loose matches on Subject: below --
2022-03-21 23:56 Boris Burkov
2022-03-22 21:46 ` Josef Bacik
2022-03-23 0:52 ` Naohiro Aota
2022-03-23 18:16 ` Boris Burkov
2022-03-28 11:11 ` Anand Jain
2022-03-29 4:33 ` Naohiro Aota
2022-03-29 19:45 ` Boris Burkov
2022-03-23 10:44 ` Anand Jain
2022-03-23 18:25 ` Boris Burkov
2022-03-24 11:16 ` Anand Jain
2022-03-23 20:17 ` Josef Bacik
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox