From: Christian Brauner <brauner@kernel.org>
To: Josef Bacik <josef@toxicpanda.com>
Cc: linux-btrfs@vger.kernel.org, kernel-team@fb.com,
linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 13/18] btrfs: handle the ro->rw transition for mounting different subovls
Date: Wed, 8 Nov 2023 09:41:33 +0100 [thread overview]
Message-ID: <20231108-hallen-heimisch-d6bdb9e23cb7@brauner> (raw)
In-Reply-To: <0a73915edbb8d05e30d167351ea8c709a9bfe447.1699308010.git.josef@toxicpanda.com>
On Mon, Nov 06, 2023 at 05:08:21PM -0500, Josef Bacik wrote:
> This is an oddity that we've carried around since 0723a0473fb4 ("btrfs:
> allow mounting btrfs subvolumes with different ro/rw options") where
> we'll under the covers flip the file system to RW if you're mixing and
> matching ro/rw options with different subvol mounts. The first mount is
> what the super gets setup as, so we'd handle this by remount the super
> as rw under the covers to facilitate this behavior.
>
> With the new mount API we can't really allow this, because user space
> has the ability to specify the super block settings, and the mount
> settings. So if the user explicitly set the super block as read only,
> and then tried to mount a rw mount with the super block we'll reject
> this. However the old API was less descriptive and thus we allowed this
> kind of behavior.
>
> This patch preserves this behavior for the old api calls. This is
> inspired by Christians work, and includes one of his comments, and thus
> is included in the link below.
>
> Link: https://lore.kernel.org/all/20230626-fs-btrfs-mount-api-v1-2-045e9735a00b@kernel.org/
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
Looks good to me,
Reviewed-by: Christian Brauner <brauner@kernel.org>
Just note that all capitalization was removed from the comment
preceeding btrfs_reconfigure_for_mount() by accident. You might want to
fix that up/recopy that comment.
> fs/btrfs/super.c | 133 ++++++++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 132 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index 4ace42e08bff..e2ac0801211d 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -2513,13 +2513,15 @@ static int btrfs_reconfigure(struct fs_context *fc)
> struct btrfs_fs_context *ctx = fc->fs_private;
> struct btrfs_fs_context old_ctx;
> int ret = 0;
> + bool mount_reconfigure = (fc->s_fs_info != NULL);
>
> btrfs_info_to_ctx(fs_info, &old_ctx);
>
> sync_filesystem(sb);
> set_bit(BTRFS_FS_STATE_REMOUNTING, &fs_info->fs_state);
>
> - if (!check_options(fs_info, &ctx->mount_opt, fc->sb_flags))
> + if (!mount_reconfigure &&
> + !check_options(fs_info, &ctx->mount_opt, fc->sb_flags))
> return -EINVAL;
>
> ret = btrfs_check_features(fs_info, !(fc->sb_flags & SB_RDONLY));
> @@ -2922,6 +2924,133 @@ static int btrfs_get_tree_super(struct fs_context *fc)
> return ret;
> }
>
> +/*
> + * Christian wrote this long comment about what we're doing here, preserving it
> + * so the history of this change is preserved.
> + *
> + * ever since commit 0723a0473fb4 ("btrfs: allow * mounting btrfs subvolumes
> + * with different ro/rw * options") the following works:
> + *
> + * (i) mount /dev/sda3 -o subvol=foo,ro /mnt/foo
> + * (ii) mount /dev/sda3 -o subvol=bar,rw /mnt/bar
> + *
> + * which looks nice and innocent but is actually pretty * intricate and
> + * deserves a long comment.
> + *
> + * on another filesystem a subvolume mount is close to * something like:
> + *
> + * (iii) # create rw superblock + initial mount
> + * mount -t xfs /dev/sdb /opt/
> + *
> + * # create ro bind mount
> + * mount --bind -o ro /opt/foo /mnt/foo
> + *
> + * # unmount initial mount
> + * umount /opt
> + *
> + * of course, there's some special subvolume sauce and there's the fact that the
> + * sb->s_root dentry is really swapped after mount_subtree(). but conceptually
> + * it's very close and will help us understand the issue.
> + *
> + * the old mount api didn't cleanly distinguish between a mount being made ro
> + * and a superblock being made ro. the only way to change the ro state of
> + * either object was by passing ms_rdonly. if a new mount was created via
> + * mount(2) such as:
> + *
> + * mount("/dev/sdb", "/mnt", "xfs", ms_rdonly, null);
> + *
> + * the ms_rdonly flag being specified had two effects:
> + *
> + * (1) mnt_readonly was raised -> the resulting mount got
> + * @mnt->mnt_flags |= mnt_readonly raised.
> + *
> + * (2) ms_rdonly was passed to the filesystem's mount method and the filesystems
> + * made the superblock ro. note, how sb_rdonly has the same value as
> + * ms_rdonly and is raised whenever ms_rdonly is passed through mount(2).
> + *
> + * creating a subtree mount via (iii) ends up leaving a rw superblock with a
> + * subtree mounted ro.
> + *
> + * but consider the effect on the old mount api on btrfs subvolume mounting
> + * which combines the distinct step in (iii) into a a single step.
> + *
> + * by issuing (i) both the mount and the superblock are turned ro. now when (ii)
> + * is issued the superblock is ro and thus even if the mount created for (ii) is
> + * rw it wouldn't help. hence, btrfs needed to transition the superblock from ro
> + * to rw for (ii) which it did using an internal remount call (a bold
> + * choice...).
> + *
> + * iow, subvolume mounting was inherently messy due to the ambiguity of
> + * ms_rdonly in mount(2). note, this ambiguity has mount(8) always translate
> + * "ro" to ms_rdonly. iow, in both (i) and (ii) "ro" becomes ms_rdonly when
> + * passed by mount(8) to mount(2).
> + *
> + * enter the new mount api. the new mount api disambiguates making a mount ro
> + * and making a superblock ro.
> + *
> + * (3) to turn a mount ro the mount_attr_rdonly flag can be used with either
> + * fsmount() or mount_setattr() this is a pure vfs level change for a
> + * specific mount or mount tree that is never seen by the filesystem itself.
> + *
> + * (4) to turn a superblock ro the "ro" flag must be used with
> + * fsconfig(fsconfig_set_flag, "ro"). this option is seen by the filesytem
> + * in fc->sb_flags.
> + *
> + * this disambiguation has rather positive consequences. mounting a subvolume
> + * ro will not also turn the superblock ro. only the mount for the subvolume
> + * will become ro.
> + *
> + * so, if the superblock creation request comes from the new mount api the
> + * caller must've explicitly done:
> + *
> + * fsconfig(fsconfig_set_flag, "ro")
> + * fsmount/mount_setattr(mount_attr_rdonly)
> + *
> + * iow, at some point the caller must have explicitly turned the whole
> + * superblock ro and we shouldn't just undo it like we did for the old mount
> + * api. in any case, it lets us avoid this nasty hack in the new mount api.
> + *
> + * Consequently, the remounting hack must only be used for requests originating
> + * from the old mount api and should be marked for full deprecation so it can be
> + * turned off in a couple of years.
> + *
> + * The new mount api has no reason to support this hack.
> + */
> +static struct vfsmount *btrfs_reconfigure_for_mount(struct fs_context *fc)
> +{
> + struct vfsmount *mnt;
> + int ret;
> + bool ro2rw = !(fc->sb_flags & SB_RDONLY);
> +
> + /*
> + * We got an EBUSY because our SB_RDONLY flag didn't match the existing
> + * super block, so invert our setting here and re-try the mount so we
> + * can get our vfsmount.
> + */
> + if (ro2rw)
> + fc->sb_flags |= SB_RDONLY;
> + else
> + fc->sb_flags &= ~SB_RDONLY;
> +
> + mnt = fc_mount(fc);
> + if (IS_ERR(mnt))
> + return mnt;
> +
> + if (!fc->oldapi || !ro2rw)
> + return mnt;
> +
> + /* We need to convert to rw, call reconfigure */
> + fc->sb_flags &= ~SB_RDONLY;
> + down_write(&mnt->mnt_sb->s_umount);
> + ret = btrfs_reconfigure(fc);
> + up_write(&mnt->mnt_sb->s_umount);
> + if (ret) {
> + mntput(mnt);
> + return ERR_PTR(ret);
> + }
> + return mnt;
> +}
> +
> static int btrfs_get_tree_subvol(struct fs_context *fc)
> {
> struct btrfs_fs_info *fs_info = NULL;
> @@ -2971,6 +3100,8 @@ static int btrfs_get_tree_subvol(struct fs_context *fc)
> fc->security = NULL;
>
> mnt = fc_mount(dup_fc);
> + if (PTR_ERR_OR_ZERO(mnt) == -EBUSY)
> + mnt = btrfs_reconfigure_for_mount(dup_fc);
> put_fs_context(dup_fc);
> if (IS_ERR(mnt))
> return PTR_ERR(mnt);
> --
> 2.41.0
>
next prev parent reply other threads:[~2023-11-08 8:41 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-11-06 22:08 [PATCH 00/18] btrfs: convert to the new mount API Josef Bacik
2023-11-06 22:08 ` [PATCH 01/18] fs: indicate request originates from old mount api Josef Bacik
2023-11-08 7:59 ` Christoph Hellwig
2023-11-06 22:08 ` [PATCH 02/18] btrfs: split out the mount option validation code into its own helper Josef Bacik
2023-11-06 22:08 ` [PATCH 03/18] btrfs: set default compress type at btrfs_init_fs_info time Josef Bacik
2023-11-06 22:08 ` [PATCH 04/18] btrfs: move space cache settings into open_ctree Josef Bacik
2023-11-07 15:14 ` Johannes Thumshirn
2023-11-06 22:08 ` [PATCH 05/18] btrfs: do not allow free space tree rebuild on extent tree v2 Josef Bacik
2023-11-06 22:08 ` [PATCH 06/18] btrfs: split out ro->rw and rw->ro helpers into their own functions Josef Bacik
2023-11-07 15:16 ` Johannes Thumshirn
2023-11-08 15:52 ` Josef Bacik
2023-11-06 22:08 ` [PATCH 07/18] btrfs: add a NOSPACECACHE mount option flag Josef Bacik
2023-11-06 22:08 ` [PATCH 08/18] btrfs: add fs_parameter definitions Josef Bacik
2023-11-06 22:08 ` [PATCH 09/18] btrfs: add parse_param callback for the new mount api Josef Bacik
2023-11-06 22:08 ` [PATCH 10/18] btrfs: add fs context handling functions Josef Bacik
2023-11-08 8:46 ` Christian Brauner
2023-11-06 22:08 ` [PATCH 11/18] btrfs: add reconfigure callback for fs_context Josef Bacik
2023-11-06 22:08 ` [PATCH 12/18] btrfs: add get_tree callback for new mount API Josef Bacik
2023-11-08 9:00 ` Christian Brauner
2023-11-06 22:08 ` [PATCH 13/18] btrfs: handle the ro->rw transition for mounting different subovls Josef Bacik
2023-11-08 8:41 ` Christian Brauner [this message]
2023-11-08 15:53 ` Josef Bacik
2023-11-06 22:08 ` [PATCH 14/18] btrfs: switch to the new mount API Josef Bacik
2023-11-08 9:03 ` Christian Brauner
2023-11-06 22:08 ` [PATCH 15/18] btrfs: move the device specific mount options to super.c Josef Bacik
2023-11-06 22:08 ` [PATCH 16/18] btrfs: remove old mount API code Josef Bacik
2023-11-06 22:08 ` [PATCH 17/18] btrfs: move one shot mount option clearing to super.c Josef Bacik
2023-11-06 22:08 ` [PATCH 18/18] btrfs: set clear_cache if we use usebackuproot Josef Bacik
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20231108-hallen-heimisch-d6bdb9e23cb7@brauner \
--to=brauner@kernel.org \
--cc=josef@toxicpanda.com \
--cc=kernel-team@fb.com \
--cc=linux-btrfs@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox