From: Christian Brauner <brauner@kernel.org>
To: Josef Bacik <josef@toxicpanda.com>
Cc: Qu Wenruo <quwenruo.btrfs@gmx.com>,
Christoph Hellwig <hch@infradead.org>,
Amir Goldstein <amir73il@gmail.com>, Jan Kara <jack@suse.cz>,
Chris Mason <clm@fb.com>, David Sterba <dsterba@suse.com>,
linux-btrfs@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 0/3] fanotify support for btrfs sub-volumes
Date: Thu, 2 Nov 2023 10:48:35 +0100
Message-ID: <20231102-ankurbeln-eingearbeitet-cbeb018bfedc@brauner>
In-Reply-To: <20231102051349.GA3292886@perftesting>
> We'll be converted to the new mount API tho, so I suppose that's something.
> Thanks,
Just in case you forgot about it: I did send a series to convert btrfs to
the new mount api in June:
https://lore.kernel.org/all/20230626-fs-btrfs-mount-api-v1-0-045e9735a00b@kernel.org
Can I ask you to please, please copy just two things from that series:
(1) Please get rid of the second filesystem type.
(2) Please fix the silent remount behavior when mounting a subvolume.
For (2) you might need the first patch from that series.
+static int btrfs_get_tree_common(struct fs_context *fc)
+{
+ struct vfsmount *root_mnt = NULL;
+ struct fs_context *root_fc;
+ struct dentry *root_dentry;
+ struct btrfs_fs_context *ctx = fc->fs_private;
+ int ret;
+
+ if (WARN_ON(ctx->phase != BTRFS_FS_CONTEXT_PREPARE))
+ return -EINVAL;
+
+ root_fc = vfs_dup_fs_context(fc);
+ if (IS_ERR(root_fc))
+ return PTR_ERR(root_fc);
+
+ /*
+ * We've duplicated the security mount options above and we only
+ * need them to be set when we really create a new superblock.
+ * They're irrelevant when we mount the subvolume as the
+ * superblock does already exist at that point. So free the
+ * security blob here.
+ */
+ security_free_mnt_opts(&fc->security);
+ fc->security = NULL;
+
+ /* Create the superblock so we can mount a subtree later. */
+ ctx->phase = BTRFS_FS_CONTEXT_SUPER;
+
+ root_mnt = fc_mount(root_fc);
+ if (PTR_ERR_OR_ZERO(root_mnt) == -EBUSY) {
+ bool ro2rw = !(root_fc->sb_flags & SB_RDONLY);
+
+ if (ro2rw)
+ root_fc->sb_flags |= SB_RDONLY;
+ else
+ root_fc->sb_flags &= ~SB_RDONLY;
+
+ root_mnt = fc_mount(root_fc);
+ if (IS_ERR(root_mnt)) {
+ put_fs_context(root_fc);
+ return PTR_ERR(root_mnt);
+ }
+ ctx->root_mnt = root_mnt;
+
+ /*
+ * Ever since commit 0723a0473fb4 ("btrfs: allow
+ * mounting btrfs subvolumes with different ro/rw
+ * options") the following works:
+ *
+ * (i) mount /dev/sda3 -o subvol=foo,ro /mnt/foo
+ * (ii) mount /dev/sda3 -o subvol=bar,rw /mnt/bar
+ *
+ * which looks nice and innocent but is actually pretty
+ * intricate and deserves a long comment.
+ *
+ * On another filesystem a subvolume mount is close to
+ * something like:
+ *
+ * (iii) # create rw superblock + initial mount
+ * mount -t xfs /dev/sdb /opt/
+ *
+ * # create ro bind mount
+ * mount --bind -o ro /opt/foo /mnt/foo
+ *
+ * # unmount initial mount
+ * umount /opt
+ *
+ * Of course, there's some special subvolume sauce and
+ * there's the fact that the sb->s_root dentry is really
+ * swapped after mount_subtree(). But conceptually it's
+ * very close and will help us understand the issue.
+ *
+ * The old mount api didn't cleanly distinguish between
+ * a mount being made ro and a superblock being made ro.
+ * The only way to change the ro state of either object
+ * was by passing MS_RDONLY. If a new mount was created
+ * via mount(2) such as:
+ *
+ * mount("/dev/sdb", "/mnt", "xfs", MS_RDONLY, NULL);
+ *
+ * the MS_RDONLY flag being specified had two effects:
+ *
+ * (1) MNT_READONLY was raised -> the resulting mount
+ * got @mnt->mnt_flags |= MNT_READONLY raised.
+ *
+ * (2) MS_RDONLY was passed to the filesystem's mount
+ * method and the filesystem made the superblock
+ * ro. Note how SB_RDONLY has the same value as
+ * MS_RDONLY and is raised whenever MS_RDONLY is
+ * passed through mount(2).
+ *
+ * Creating a subtree mount via (iii) ends up leaving a
+ * rw superblock with a subtree mounted ro.
+ *
+ * But consider the effect of the old mount api on btrfs
+ * subvolume mounting, which combines the distinct steps
+ * in (iii) into a single step.
+ *
+ * By issuing (i) both the mount and the superblock are
+ * turned ro. Now when (ii) is issued the superblock is
+ * ro and thus even if the mount created for (ii) is rw
+ * it wouldn't help. Hence, btrfs needed to transition
+ * the superblock from ro to rw for (ii) which it did
+ * using an internal remount call (a bold choice...).
+ *
+ * IOW, subvolume mounting was inherently messy due to
+ * the ambiguity of MS_RDONLY in mount(2). Note, because
+ * of this ambiguity mount(8) always translates "ro" to
+ * MS_RDONLY. IOW, in both (i) and (ii) "ro" becomes
+ * MS_RDONLY when passed by mount(8) to mount(2).
+ *
+ * Enter the new mount api. The new mount api
+ * disambiguates making a mount ro and making a
+ * superblock ro.
+ *
+ * (3) To turn a mount ro the MOUNT_ATTR_RDONLY flag can
+ * be used with either fsmount() or mount_setattr().
+ * This is a pure VFS level change for a specific
+ * mount or mount tree that is never seen by the
+ * filesystem itself.
+ *
+ * (4) To turn a superblock ro the "ro" flag must be
+ * used with fsconfig(FSCONFIG_SET_FLAG, "ro"). This
+ * option is seen by the filesystem in fc->sb_flags.
+ *
+ * This disambiguation has rather positive consequences.
+ * Mounting a subvolume ro will not also turn the
+ * superblock ro. Only the mount for the subvolume will
+ * become ro.
+ *
+ * So, if the superblock creation request comes from the
+ * new mount api the caller must've explicitly done:
+ *
+ * fsconfig(FSCONFIG_SET_FLAG, "ro")
+ * fsmount/mount_setattr(MOUNT_ATTR_RDONLY)
+ *
+ * IOW, at some point the caller must have explicitly
+ * turned the whole superblock ro and we shouldn't just
+ * undo it like we did for the old mount api. In any
+ * case, it lets us avoid this nasty hack in the new
+ * mount api.
+ *
+ * Consequently, the remounting hack must only be used
+ * for requests originating from the old mount api and
+ * should be marked for full deprecation so it can be
+ * turned off in a couple of years.
+ *
+ * The new mount api has no reason to support this hack.
+ */
+ if (root_fc->oldapi && ro2rw) {
+ /*
+ * This magic internal remount is a pretty bold
+ * move as the VFS reserves the right to protect
+ * ro->rw transitions on the VFS layer similar
+ * to how it protects rw->ro transitions.
+ */
+ ret = btrfs_legacy_reconfigure(root_fc);
+ if (ret)
+ root_mnt = ERR_PTR(ret);
+ }
+ }
+ put_fs_context(root_fc);
+ if (IS_ERR(root_mnt))
+ return PTR_ERR(root_mnt);
+ ctx->root_mnt = root_mnt;
+
+ root_dentry = mount_subvol(fc);
+ if (IS_ERR(root_dentry))
+ return PTR_ERR(root_dentry);
+
+ fc->root = root_dentry;
+ return 0;
+}
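
To make the reasoning in the long comment easier to follow, here is a toy
user-space model of the two read-only states it talks about. It is only a
sketch and not kernel code: every name in it (toy_sb, toy_mnt, toy_writable,
toy_mount_old, toy_mount_new) is made up for illustration, and it assumes
the simplified rule that a write succeeds only when neither the mount nor
the superblock is ro. The old-api helper models the internal ro->rw remount;
the new-api helper keeps the two requests separate and never touches an
existing superblock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy user-space model; none of these names exist in the kernel. */
struct toy_sb {
	bool exists;	/* has a superblock been created yet? */
	bool ro;	/* models SB_RDONLY */
};

struct toy_mnt {
	struct toy_sb *sb;
	bool ro;	/* models MNT_READONLY */
};

/* Assumed rule: a write needs both the mount and the superblock to be rw. */
bool toy_writable(const struct toy_mnt *mnt)
{
	assert(mnt != NULL && mnt->sb != NULL);
	return !mnt->ro && !mnt->sb->ro;
}

/*
 * Old mount api: a single MS_RDONLY-style flag drives both objects.
 * If the superblock already exists and is ro while the request is rw,
 * model the internal ro->rw remount described in the comment.
 */
struct toy_mnt toy_mount_old(struct toy_sb *sb, bool rdonly)
{
	struct toy_mnt mnt = { .sb = sb, .ro = rdonly };

	if (!sb->exists) {
		sb->exists = true;
		sb->ro = rdonly;
	} else if (sb->ro && !rdonly) {
		sb->ro = false;		/* the legacy remount hack */
	}
	return mnt;
}

/*
 * New mount api: superblock ro (fsconfig "ro") and mount ro
 * (MOUNT_ATTR_RDONLY) are separate requests. An existing superblock
 * keeps its state; nothing is remounted behind the caller's back.
 */
struct toy_mnt toy_mount_new(struct toy_sb *sb, bool sb_rdonly,
			     bool mnt_rdonly)
{
	struct toy_mnt mnt = { .sb = sb, .ro = mnt_rdonly };

	if (!sb->exists) {
		sb->exists = true;
		sb->ro = sb_rdonly;
	}
	return mnt;
}
```

In this model, replaying (i) and then (ii) through toy_mount_old() flips the
shared superblock from ro to rw as a side effect of the second mount, whereas
toy_mount_new() with sb_rdonly=false and mnt_rdonly=true for the first
subvolume leaves the superblock rw from the start, so the second, rw mount
needs no remount at all.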