* [PATCH 0/2] btrfs: fixes for the new fd-based API
@ 2024-10-30 0:55 Qu Wenruo
2024-10-30 0:55 ` [PATCH 1/2] btrfs: fix per-subvolume RO/RW flags with new mount API Qu Wenruo
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: Qu Wenruo @ 2024-10-30 0:55 UTC (permalink / raw)
To: linux-btrfs
There are two critical bugs, all introduced by the same new fd-based
mount API migration.
The first one will break the per-subvolume RW/RO mount, if using the
new fd-based mount API.
This is caused by the untrue promise that the new API will not set the
RO flag for the fsconfig() call (which sets the super flags), but only
set RO attribute for the mount point.
Based on the bad promise, we skip the RO->RW reconfiguration for the new
API based mount call. But since the new fd-based mount is already
rolling out to the end users, and it still sets RO flags for both super
and mount point.
So we should still do the same RO->RW reconfiguration, no matter the
API.
The second one is a long existing problem related the RO->RW
reconfiguration itself, where we retry without any lock, resulting race
between reconfiguration and RO/RW mounts.
This will leads to a failure to mount, exposed by SLE-micro bug report.
Those two bugs are all small but critical to the core function of btrfs,
thus should be queued for -rc kernels.
Qu Wenruo (2):
btrfs: fix per-subvolume RO/RW flags with new mount API
btrfs: fix mount failure due to remount races
fs/btrfs/super.c | 88 +++++++++++++++++-------------------------------
1 file changed, 31 insertions(+), 57 deletions(-)
--
2.47.0
^ permalink raw reply [flat|nested] 4+ messages in thread
* [PATCH 1/2] btrfs: fix per-subvolume RO/RW flags with new mount API
2024-10-30 0:55 [PATCH 0/2] btrfs: fixes for the new fd-based API Qu Wenruo
@ 2024-10-30 0:55 ` Qu Wenruo
2024-10-30 0:55 ` [PATCH 2/2] btrfs: fix mount failure due to remount races Qu Wenruo
2024-11-26 15:50 ` [PATCH 0/2] btrfs: fixes for the new fd-based API David Sterba
2 siblings, 0 replies; 4+ messages in thread
From: Qu Wenruo @ 2024-10-30 0:55 UTC (permalink / raw)
To: linux-btrfs
[BUG]
With util-linux 2.40.2, the mount is already utilizing the new mount
API. e.g:
# strace mount -o subvol=subv1,ro /dev/test/scratch1 /mnt/test/
...
fsconfig(3, FSCONFIG_SET_STRING, "source", "/dev/mapper/test-scratch1", 0) = 0
fsconfig(3, FSCONFIG_SET_STRING, "subvol", "subv1", 0) = 0
fsconfig(3, FSCONFIG_SET_FLAG, "ro", NULL, 0) = 0
fsconfig(3, FSCONFIG_CMD_CREATE, NULL, NULL, 0) = 0
fsmount(3, FSMOUNT_CLOEXEC, 0) = 4
mount_setattr(4, "", AT_EMPTY_PATH, {attr_set=MOUNT_ATTR_RDONLY, attr_clr=0, propagation=0 /* MS_??? */, userns_fd=0}, 32) = 0
move_mount(4, "", AT_FDCWD, "/mnt/test", MOVE_MOUNT_F_EMPTY_PATH) = 0
But this leads to a new problem, that per-subvolume RO/RW mount no
longer works, if the initial mount is RO:
# mount -o subvol=subv1,ro /dev/test/scratch1 /mnt/test
# mount -o rw,subvol=subv2 /dev/test/scratch1 /mnt/scratch
# mount | grep mnt
/dev/mapper/test-scratch1 on /mnt/test type btrfs (ro,relatime,discard=async,space_cache=v2,subvolid=256,subvol=/subv1)
/dev/mapper/test-scratch1 on /mnt/scratch type btrfs (ro,relatime,discard=async,space_cache=v2,subvolid=257,subvol=/subv2)
# touch /mnt/scratch/foobar
touch: cannot touch '/mnt/scratch/foobar': Read-only file system
[CAUSE]
We have the remount "hack" to handle the RO->RW change, but if the mount
is using the new mount API, we do not do the hack, and rely on the mount
tool NOT to set the ro flag.
But that's not how the mount tool is doing for the new API:
fsconfig(3, FSCONFIG_SET_STRING, "source", "/dev/mapper/test-scratch1", 0) = 0
fsconfig(3, FSCONFIG_SET_STRING, "subvol", "subv1", 0) = 0
fsconfig(3, FSCONFIG_SET_FLAG, "ro", NULL, 0) = 0 <<<< Setting RO flag for super block
fsconfig(3, FSCONFIG_CMD_CREATE, NULL, NULL, 0) = 0
fsmount(3, FSMOUNT_CLOEXEC, 0) = 4
mount_setattr(4, "", AT_EMPTY_PATH, {attr_set=MOUNT_ATTR_RDONLY, attr_clr=0, propagation=0 /* MS_??? */, userns_fd=0}, 32) = 0
move_mount(4, "", AT_FDCWD, "/mnt/test", MOVE_MOUNT_F_EMPTY_PATH) = 0
This means we will set the super block RO at the first mount.
Later RW mount will not try to reconfigure the fs to RW because the
mount tool is already using the new API.
This totally breaks the per-subvolume RO/RW mount behavior.
[FIX]
Do not skip the reconfiguration even using the new API.
The old comments are just expecting the mount tool to properly skip RO
flag set even we specify "ro", which is not the reality.
And remove the comments pushing all the responsibility to mount command,
replace them with ones matching the reality instead.
Fixes: f044b318675f ("btrfs: handle the ro->rw transition for mounting different subvolumes")
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
fs/btrfs/super.c | 24 +++++-------------------
1 file changed, 5 insertions(+), 19 deletions(-)
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 6cc9291c4552..d77cce8d633e 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1977,25 +1977,11 @@ static int btrfs_get_tree_super(struct fs_context *fc)
* fsconfig(FSCONFIG_SET_FLAG, "ro"). This option is seen by the filesystem
* in fc->sb_flags.
*
- * This disambiguation has rather positive consequences. Mounting a subvolume
- * ro will not also turn the superblock ro. Only the mount for the subvolume
- * will become ro.
+ * But in reality, even the newer util-linux mount command, which is already
+ * utilizing the new mount API, is still setting fsconfig(FSCONFIG_SET_FLAG, "ro")
+ * no matter if it's a btrfs or not, setting the whole super block RO.
*
- * So, if the superblock creation request comes from the new mount API the
- * caller must have explicitly done:
- *
- * fsconfig(FSCONFIG_SET_FLAG, "ro")
- * fsmount/mount_setattr(MOUNT_ATTR_RDONLY)
- *
- * IOW, at some point the caller must have explicitly turned the whole
- * superblock ro and we shouldn't just undo it like we did for the old mount
- * API. In any case, it lets us avoid the hack in the new mount API.
- *
- * Consequently, the remounting hack must only be used for requests originating
- * from the old mount API and should be marked for full deprecation so it can be
- * turned off in a couple of years.
- *
- * The new mount API has no reason to support this hack.
+ * So here we always needs the remount hack to support per-subvolume RO/RW flags.
*/
static struct vfsmount *btrfs_reconfigure_for_mount(struct fs_context *fc)
{
@@ -2017,7 +2003,7 @@ static struct vfsmount *btrfs_reconfigure_for_mount(struct fs_context *fc)
if (IS_ERR(mnt))
return mnt;
- if (!fc->oldapi || !ro2rw)
+ if (!ro2rw)
return mnt;
/* We need to convert to rw, call reconfigure. */
--
2.47.0
^ permalink raw reply related [flat|nested] 4+ messages in thread
* [PATCH 2/2] btrfs: fix mount failure due to remount races
2024-10-30 0:55 [PATCH 0/2] btrfs: fixes for the new fd-based API Qu Wenruo
2024-10-30 0:55 ` [PATCH 1/2] btrfs: fix per-subvolume RO/RW flags with new mount API Qu Wenruo
@ 2024-10-30 0:55 ` Qu Wenruo
2024-11-26 15:50 ` [PATCH 0/2] btrfs: fixes for the new fd-based API David Sterba
2 siblings, 0 replies; 4+ messages in thread
From: Qu Wenruo @ 2024-10-30 0:55 UTC (permalink / raw)
To: linux-btrfs; +Cc: Enno Gotthold, Fabian Vogt
[BUG]
The following reproducer can cause btrfs mount to fail:
dev="/dev/test/scratch1"
mnt1="/mnt/test"
mnt2="/mnt/scratch"
mkfs.btrfs -f $dev
mount $dev $mnt1
btrfs subvolume create $mnt1/subvol1
btrfs subvolume create $mnt1/subvol2
umount $mnt1
mount $dev $mnt1 -o subvol=subvol1
while mount -o remount,ro $mnt1; do mount -o remount,rw $mnt1; done &
bg=$!
while mount $dev $mnt2 -o subvol=subvol2; do umount $mnt2; done
kill $bg
wait
umount -R $mnt1
umount -R $mnt2
The script will fail with the following error:
mount: /mnt/scratch: /dev/mapper/test-scratch1 already mounted on /mnt/test.
dmesg(1) may have more information after failed mount system call.
umount: /mnt/test: target is busy.
umount: /mnt/scratch/: not mounted
And there is no kernel error message.
[CAUSE]
During the btrfs mount, to support mounting different subvolumes with
different RO/RW flags, we have a small hack during the mount:
Retry with matching RO flags if the initial mount fail with -EBUSY.
The problem is, during that retry we do not hold any super block lock
(s_umount), this meanings there can be a remount process changing the RO
flags of the original fs super block.
If so, we can have an EBUSY error during retry.
And this time we treat any failure as an error, without any retry and
cause the above EBUSY mount failure.
[FIX]
The current retry behavior is racy because we do not have a super block
thus no way to hold s_umount to prevent the race with remount.
Solve the root problem by allowing fc->sb_flags to mismatch from the
sb->s_flags at btrfs_get_tree_super().
Then at the re-entry point btrfs_get_tree_subvol(), manually check the
fc->s_flags against sb->s_flags, if it's a RO->RW mismatch, then
reconfigure with s_umount lock hold.
Fixes: f044b318675f ("btrfs: handle the ro->rw transition for mounting different subvolumes")
Reported-by: Enno Gotthold <egotthold@suse.com>
Reported-by: Fabian Vogt <fvogt@suse.com>
[ Special thanks for the reproducer and early analyze pointing to btrfs ]
Link: https://bugzilla.suse.com/show_bug.cgi?id=1231836
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
Changelog:
v2:
- Instead of brute force retry-until-success-or-failure, allow sb flags
mismatch during btrfs_get_tree_super()
So that we can utilize sb->s_umount to prevent race.
- Rebased after patch "btrfs: fix per-subvolume RO/RW flags with new mount API"
Fortunately or unfortunately my distro is already utilizing the new
mount API and I found the per-subvolume RO/RW support is broken again.
Unlike the original expectation, the new API is not the new hope, but
allows the old bug to strike back.
So this patch can only be applied after that fix.
---
fs/btrfs/super.c | 66 ++++++++++++++++++++----------------------------
1 file changed, 27 insertions(+), 39 deletions(-)
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index d77cce8d633e..d137ce2b5038 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1885,18 +1885,21 @@ static int btrfs_get_tree_super(struct fs_context *fc)
if (sb->s_root) {
btrfs_close_devices(fs_devices);
- if ((fc->sb_flags ^ sb->s_flags) & SB_RDONLY)
- ret = -EBUSY;
+ /*
+ * At this stage we may have RO flag mismatch between
+ * fc->sb_flags and sb->s_flags.
+ * Caller should detect such mismatch and reconfigure
+ * with sb->s_umount rwsem hold if needed.
+ */
} else {
snprintf(sb->s_id, sizeof(sb->s_id), "%pg", bdev);
shrinker_debugfs_rename(sb->s_shrink, "sb-btrfs:%s", sb->s_id);
btrfs_sb(sb)->bdev_holder = &btrfs_fs_type;
ret = btrfs_fill_super(sb, fs_devices);
- }
-
- if (ret) {
- deactivate_locked_super(sb);
- return ret;
+ if (ret) {
+ deactivate_locked_super(sb);
+ return ret;
+ }
}
btrfs_clear_oneshot_options(fs_info);
@@ -1983,39 +1986,18 @@ static int btrfs_get_tree_super(struct fs_context *fc)
*
* So here we always needs the remount hack to support per-subvolume RO/RW flags.
*/
-static struct vfsmount *btrfs_reconfigure_for_mount(struct fs_context *fc)
+static int btrfs_reconfigure_for_mount(struct fs_context *fc, struct vfsmount *mnt)
{
- struct vfsmount *mnt;
- int ret;
- const bool ro2rw = !(fc->sb_flags & SB_RDONLY);
+ int ret = 0;
- /*
- * We got an EBUSY because our SB_RDONLY flag didn't match the existing
- * super block, so invert our setting here and retry the mount so we
- * can get our vfsmount.
- */
- if (ro2rw)
- fc->sb_flags |= SB_RDONLY;
- else
- fc->sb_flags &= ~SB_RDONLY;
+ if (fc->sb_flags & SB_RDONLY)
+ return ret;
- mnt = fc_mount(fc);
- if (IS_ERR(mnt))
- return mnt;
-
- if (!ro2rw)
- return mnt;
-
- /* We need to convert to rw, call reconfigure. */
- fc->sb_flags &= ~SB_RDONLY;
down_write(&mnt->mnt_sb->s_umount);
- ret = btrfs_reconfigure(fc);
+ if (!(fc->sb_flags & SB_RDONLY) && (mnt->mnt_sb->s_flags & SB_RDONLY))
+ ret = btrfs_reconfigure(fc);
up_write(&mnt->mnt_sb->s_umount);
- if (ret) {
- mntput(mnt);
- return ERR_PTR(ret);
- }
- return mnt;
+ return ret;
}
static int btrfs_get_tree_subvol(struct fs_context *fc)
@@ -2025,6 +2007,7 @@ static int btrfs_get_tree_subvol(struct fs_context *fc)
struct fs_context *dup_fc;
struct dentry *dentry;
struct vfsmount *mnt;
+ int ret = 0;
/*
* Setup a dummy root and fs_info for test/set super. This is because
@@ -2067,11 +2050,16 @@ static int btrfs_get_tree_subvol(struct fs_context *fc)
fc->security = NULL;
mnt = fc_mount(dup_fc);
- if (PTR_ERR_OR_ZERO(mnt) == -EBUSY)
- mnt = btrfs_reconfigure_for_mount(dup_fc);
- put_fs_context(dup_fc);
- if (IS_ERR(mnt))
+ if (IS_ERR(mnt)) {
+ put_fs_context(dup_fc);
return PTR_ERR(mnt);
+ }
+ ret = btrfs_reconfigure_for_mount(dup_fc, mnt);
+ put_fs_context(dup_fc);
+ if (ret) {
+ mntput(mnt);
+ return ret;
+ }
/*
* This free's ->subvol_name, because if it isn't set we have to
--
2.47.0
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH 0/2] btrfs: fixes for the new fd-based API
2024-10-30 0:55 [PATCH 0/2] btrfs: fixes for the new fd-based API Qu Wenruo
2024-10-30 0:55 ` [PATCH 1/2] btrfs: fix per-subvolume RO/RW flags with new mount API Qu Wenruo
2024-10-30 0:55 ` [PATCH 2/2] btrfs: fix mount failure due to remount races Qu Wenruo
@ 2024-11-26 15:50 ` David Sterba
2 siblings, 0 replies; 4+ messages in thread
From: David Sterba @ 2024-11-26 15:50 UTC (permalink / raw)
To: Qu Wenruo; +Cc: linux-btrfs
On Wed, Oct 30, 2024 at 11:25:46AM +1030, Qu Wenruo wrote:
> There are two critical bugs, all introduced by the same new fd-based
> mount API migration.
>
> The first one will break the per-subvolume RW/RO mount, if using the
> new fd-based mount API.
> This is caused by the untrue promise that the new API will not set the
> RO flag for the fsconfig() call (which sets the super flags), but only
> set RO attribute for the mount point.
>
> Based on the bad promise, we skip the RO->RW reconfiguration for the new
> API based mount call. But since the new fd-based mount is already
> rolling out to the end users, and it still sets RO flags for both super
> and mount point.
>
> So we should still do the same RO->RW reconfiguration, no matter the
> API.
>
>
> The second one is a long existing problem related the RO->RW
> reconfiguration itself, where we retry without any lock, resulting race
> between reconfiguration and RO/RW mounts.
>
> This will leads to a failure to mount, exposed by SLE-micro bug report.
>
>
> Those two bugs are all small but critical to the core function of btrfs,
> thus should be queued for -rc kernels.
>
> Qu Wenruo (2):
> btrfs: fix per-subvolume RO/RW flags with new mount API
> btrfs: fix mount failure due to remount races
For the record, the first patch got merged to 6.12 and I've not added
the second one to for-next, targeting some 6.13-rc.
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2024-11-26 15:50 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-30 0:55 [PATCH 0/2] btrfs: fixes for the new fd-based API Qu Wenruo
2024-10-30 0:55 ` [PATCH 1/2] btrfs: fix per-subvolume RO/RW flags with new mount API Qu Wenruo
2024-10-30 0:55 ` [PATCH 2/2] btrfs: fix mount failure due to remount races Qu Wenruo
2024-11-26 15:50 ` [PATCH 0/2] btrfs: fixes for the new fd-based API David Sterba
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox