From: Anand Jain <anand.jain@oracle.com>
To: "Guilherme G. Piccoli" <gpiccoli@igalia.com>,
linux-btrfs@vger.kernel.org
Cc: clm@fb.com, josef@toxicpanda.com, dsterba@suse.com,
linux-fsdevel@vger.kernel.org, kernel@gpiccoli.net,
kernel-dev@igalia.com, david@fromorbit.com, kreijack@libero.it,
johns@valvesoftware.com, ludovico.denittis@collabora.com,
quwenruo.btrfs@gmx.com, wqu@suse.com, vivek@collabora.com
Subject: Re: [PATCH 2/3] btrfs: Introduce the single-dev feature
Date: Thu, 24 Aug 2023 00:31:39 +0800 [thread overview]
Message-ID: <9dae9ca5-be94-af95-e7c3-0cb1d04731f2@oracle.com> (raw)
In-Reply-To: <20230803154453.1488248-3-gpiccoli@igalia.com>
On 8/3/23 23:43, Guilherme G. Piccoli wrote:
> Btrfs doesn't currently support to mount 2 different devices holding the
> same filesystem - the fsid is used as a unique identifier in the driver.
fsid is for the external frontend, systemd, and udev stuff;
metadata_uuid pertains to the actual btrfs on-disk.
> This case is supported though in some other common filesystems, like
> ext4;
> one of the reasons for which is not trivial supporting this case
> on btrfs is due to its multi-device filesystem nature, native RAID, etc.
How is it related to the multi-device aspect? The main limitation is
that a disk image can be cloned without maintaining a unique device-
uuid.
> Supporting the same-fsid mounts has the advantage of allowing btrfs to
> be used in A/B partitioned devices, like mobile phones or the Steam Deck
> for example.
> Without this support, it's not safe for users to keep the
> same "image version" in both A and B partitions, a setup that is quite
> common for development, for example. Also, as a big bonus, it allows fs
> integrity check based on block devices for RO devices (whereas currently
> it is required that both have different fsid, breaking the block device
> hash comparison).
Does it apply to smaller disk images? Otherwise, it will be very
time-consuming. Just curious, how is the checksum verified for the entire
disk? (Btrfs might provide a checksum tree-based solution at
some point.)
> Such same-fsid mounting is hereby added through the usage of the
> filesystem feature "single-dev" - when such feature is used, btrfs
> generates a random fsid for the filesystem and leverages the long-term
> present metadata_uuid infrastructure to enable the usage of this
> secondary virtual fsid, effectively requiring few non-invasive changes
> to the code and no new potential corner cases.
>
> In order to prevent more code complexity and corner cases, given
> the nature of this mechanism (single-devices), the single-dev feature
> is not allowed when the metadata_uuid flag is already present on the
> fs, or if the device is on fsid-change state. Device removal/replace
> is also disabled for devices presenting the single-dev feature.
>
> Suggested-by: John Schoenick <johns@valvesoftware.com>
> Suggested-by: Qu Wenruo <wqu@suse.com>
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
> ---
> fs/btrfs/disk-io.c | 19 +++++++-
> fs/btrfs/fs.h | 3 +-
> fs/btrfs/ioctl.c | 18 +++++++
> fs/btrfs/super.c | 8 ++--
> fs/btrfs/volumes.c | 97 ++++++++++++++++++++++++++++++--------
> fs/btrfs/volumes.h | 3 +-
> include/uapi/linux/btrfs.h | 7 +++
> 7 files changed, 127 insertions(+), 28 deletions(-)
>
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 669b10355091..455fa4949c98 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -320,7 +320,7 @@ static bool check_tree_block_fsid(struct extent_buffer *eb)
> /*
> * alloc_fs_devices() copies the fsid into metadata_uuid if the
> * metadata_uuid is unset in the superblock, including for a seed device.
> - * So, we can use fs_devices->metadata_uuid.
> + * So, we can use fs_devices->metadata_uuid; same for SINGLE_DEV devices.
> */
> if (!memcmp(fsid, fs_info->fs_devices->metadata_uuid, BTRFS_FSID_SIZE))
> return false;
> @@ -2288,6 +2288,7 @@ int btrfs_validate_super(struct btrfs_fs_info *fs_info,
> {
> u64 nodesize = btrfs_super_nodesize(sb);
> u64 sectorsize = btrfs_super_sectorsize(sb);
> + u8 *fsid;
> int ret = 0;
>
> if (btrfs_super_magic(sb) != BTRFS_MAGIC) {
> @@ -2368,7 +2369,21 @@ int btrfs_validate_super(struct btrfs_fs_info *fs_info,
> ret = -EINVAL;
> }
>
> - if (memcmp(fs_info->fs_devices->fsid, sb->fsid, BTRFS_FSID_SIZE)) {
> + /*
> + * For SINGLE_DEV devices, btrfs creates a random fsid and makes
> + * use of the metadata_uuid infrastructure in order to allow, for
> + * example, two devices with same fsid getting mounted at the same
> + * time. But notice no changes happen at the disk level, so the
> + * random generated fsid is a driver abstraction, not to be written
> + * in the disk. That's the reason we're required here to compare the
> + * fsid with the metadata_uuid for such devices.
> + */
> + if (btrfs_fs_compat_ro(fs_info, SINGLE_DEV))
> + fsid = fs_info->fs_devices->metadata_uuid;
> + else
> + fsid = fs_info->fs_devices->fsid;
Below alloc_fs_device(), fsid is still being kept equal to metadata_uuid
in memory for single_dev. So, this distinction is unnecessary.
> +
> + if (memcmp(fsid, sb->fsid, BTRFS_FSID_SIZE)) {
David prefers memcmp to be either compared to == or != to 0
depending on the requirement.
> btrfs_err(fs_info,
> "superblock fsid doesn't match fsid of fs_devices: %pU != %pU",
> sb->fsid, fs_info->fs_devices->fsid);
> diff --git a/fs/btrfs/fs.h b/fs/btrfs/fs.h
> index 203d2a267828..c6d124973361 100644
> --- a/fs/btrfs/fs.h
> +++ b/fs/btrfs/fs.h
> @@ -200,7 +200,8 @@ enum {
> (BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE | \
> BTRFS_FEATURE_COMPAT_RO_FREE_SPACE_TREE_VALID | \
> BTRFS_FEATURE_COMPAT_RO_VERITY | \
> - BTRFS_FEATURE_COMPAT_RO_BLOCK_GROUP_TREE)
> + BTRFS_FEATURE_COMPAT_RO_BLOCK_GROUP_TREE | \
> + BTRFS_FEATURE_COMPAT_RO_SINGLE_DEV)
>
> #define BTRFS_FEATURE_COMPAT_RO_SAFE_SET 0ULL
> #define BTRFS_FEATURE_COMPAT_RO_SAFE_CLEAR 0ULL
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index a895d105464b..56703d87def9 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -2678,6 +2678,12 @@ static long btrfs_ioctl_rm_dev_v2(struct file *file, void __user *arg)
> if (!capable(CAP_SYS_ADMIN))
> return -EPERM;
>
> + if (btrfs_fs_compat_ro(fs_info, SINGLE_DEV)) {
> + btrfs_err(fs_info,
> + "device removal is unsupported on SINGLE_DEV devices\n");
> + return -EINVAL;
> + }
> +
> vol_args = memdup_user(arg, sizeof(*vol_args));
> if (IS_ERR(vol_args))
> return PTR_ERR(vol_args);
> @@ -2744,6 +2750,12 @@ static long btrfs_ioctl_rm_dev(struct file *file, void __user *arg)
> if (!capable(CAP_SYS_ADMIN))
> return -EPERM;
>
> + if (btrfs_fs_compat_ro(fs_info, SINGLE_DEV)) {
> + btrfs_err(fs_info,
> + "device removal is unsupported on SINGLE_DEV devices\n");
> + return -EINVAL;
> + }
> +
> vol_args = memdup_user(arg, sizeof(*vol_args));
> if (IS_ERR(vol_args))
> return PTR_ERR(vol_args);
> @@ -3268,6 +3280,12 @@ static long btrfs_ioctl_dev_replace(struct btrfs_fs_info *fs_info,
> if (!capable(CAP_SYS_ADMIN))
> return -EPERM;
>
> + if (btrfs_fs_compat_ro(fs_info, SINGLE_DEV)) {
> + btrfs_err(fs_info,
> + "device removal is unsupported on SINGLE_DEV devices\n");
> + return -EINVAL;
> + }
> +
> if (btrfs_fs_incompat(fs_info, EXTENT_TREE_V2)) {
> btrfs_err(fs_info, "device replace not supported on extent tree v2 yet");
> return -EINVAL;
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index f1dd172d8d5b..ee87189b1ccd 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -883,7 +883,7 @@ static int btrfs_parse_device_options(const char *options, blk_mode_t flags)
> error = -ENOMEM;
> goto out;
> }
> - device = btrfs_scan_one_device(device_name, flags);
> + device = btrfs_scan_one_device(device_name, flags, true);
> kfree(device_name);
> if (IS_ERR(device)) {
> error = PTR_ERR(device);
> @@ -1478,7 +1478,7 @@ static struct dentry *btrfs_mount_root(struct file_system_type *fs_type,
> goto error_fs_info;
> }
>
> - device = btrfs_scan_one_device(device_name, mode);
> + device = btrfs_scan_one_device(device_name, mode, true);
> if (IS_ERR(device)) {
> mutex_unlock(&uuid_mutex);
> error = PTR_ERR(device);
> @@ -2190,7 +2190,7 @@ static long btrfs_control_ioctl(struct file *file, unsigned int cmd,
> switch (cmd) {
> case BTRFS_IOC_SCAN_DEV:
> mutex_lock(&uuid_mutex);
> - device = btrfs_scan_one_device(vol->name, BLK_OPEN_READ);
> + device = btrfs_scan_one_device(vol->name, BLK_OPEN_READ, false);
> ret = PTR_ERR_OR_ZERO(device);
> mutex_unlock(&uuid_mutex);
> break;
> @@ -2204,7 +2204,7 @@ static long btrfs_control_ioctl(struct file *file, unsigned int cmd,
> break;
> case BTRFS_IOC_DEVICES_READY:
> mutex_lock(&uuid_mutex);
> - device = btrfs_scan_one_device(vol->name, BLK_OPEN_READ);
> + device = btrfs_scan_one_device(vol->name, BLK_OPEN_READ, false);
> if (IS_ERR(device)) {
> mutex_unlock(&uuid_mutex);
> ret = PTR_ERR(device);
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 73753dae111a..433a490f2de8 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -681,12 +681,14 @@ static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices,
> return -EINVAL;
> }
>
> -static u8 *btrfs_sb_metadata_uuid_or_null(struct btrfs_super_block *sb)
> +static u8 *btrfs_sb_metadata_uuid_single_dev(struct btrfs_super_block *sb,
> + bool has_metadata_uuid,
> + bool single_dev)
> {
> - bool has_metadata_uuid = (btrfs_super_incompat_flags(sb) &
> - BTRFS_FEATURE_INCOMPAT_METADATA_UUID);
> + if (has_metadata_uuid || single_dev)
> + return sb->metadata_uuid;
>
> - return has_metadata_uuid ? sb->metadata_uuid : NULL;
> + return NULL;
> }
>
> u8 *btrfs_sb_fsid_ptr(struct btrfs_super_block *sb)
You can rebase the code onto the latest misc-next branch.
This is because we have dropped the function
btrfs_sb_metadata_uuid_or_null() in the final integration.
> @@ -775,8 +777,36 @@ static struct btrfs_fs_devices *find_fsid_reverted_metadata(
>
> return NULL;
> }
> +
> +static void prepare_virtual_fsid(struct btrfs_super_block *disk_super,
> + const char *path)
> +{
> + struct btrfs_fs_devices *fs_devices;
> + u8 vfsid[BTRFS_FSID_SIZE];
> + bool dup_fsid = true;
> +
> + while (dup_fsid) {
> + dup_fsid = false;
> + generate_random_uuid(vfsid);
> +
> + list_for_each_entry(fs_devices, &fs_uuids, fs_list) {
> + if (!memcmp(vfsid, fs_devices->fsid, BTRFS_FSID_SIZE) ||
> + !memcmp(vfsid, fs_devices->metadata_uuid,
> + BTRFS_FSID_SIZE))
> + dup_fsid = true;
> + }
> + }
> +
> + memcpy(disk_super->metadata_uuid, disk_super->fsid, BTRFS_FSID_SIZE);
> + memcpy(disk_super->fsid, vfsid, BTRFS_FSID_SIZE);
> +
> + pr_info("BTRFS: virtual fsid (%pU) set for SINGLE_DEV device %s (real fsid %pU)\n",
> + disk_super->fsid, path, disk_super->metadata_uuid);
> +}
> +
> /*
> - * Add new device to list of registered devices
> + * Add new device to list of registered devices, or in case of a SINGLE_DEV
> + * device, also creates a virtual fsid to cope with same-fsid cases.
> *
> * Returns:
> * device pointer which was just added or updated when successful
> @@ -784,7 +814,7 @@ static struct btrfs_fs_devices *find_fsid_reverted_metadata(
> */
> static noinline struct btrfs_device *device_list_add(const char *path,
> struct btrfs_super_block *disk_super,
> - bool *new_device_added)
> + bool *new_device_added, bool single_dev)
> {
> struct btrfs_device *device;
> struct btrfs_fs_devices *fs_devices = NULL;
> @@ -805,23 +835,32 @@ static noinline struct btrfs_device *device_list_add(const char *path,
> return ERR_PTR(error);
> }
>
> - if (fsid_change_in_progress) {
> - if (!has_metadata_uuid)
> - fs_devices = find_fsid_inprogress(disk_super);
> - else
> - fs_devices = find_fsid_changed(disk_super);
> - } else if (has_metadata_uuid) {
> - fs_devices = find_fsid_with_metadata_uuid(disk_super);
> + if (single_dev) {
> + if (has_metadata_uuid || fsid_change_in_progress) {
> + btrfs_err(NULL,
> + "SINGLE_DEV devices don't support the metadata_uuid feature\n");
> + return ERR_PTR(-EINVAL);
> + }
> + prepare_virtual_fsid(disk_super, path);
> } else {
> - fs_devices = find_fsid_reverted_metadata(disk_super);
> - if (!fs_devices)
> - fs_devices = find_fsid(disk_super->fsid, NULL);
> + if (fsid_change_in_progress) {
> + if (!has_metadata_uuid)
> + fs_devices = find_fsid_inprogress(disk_super);
> + else
> + fs_devices = find_fsid_changed(disk_super);
> + } else if (has_metadata_uuid) {
> + fs_devices = find_fsid_with_metadata_uuid(disk_super);
> + } else {
> + fs_devices = find_fsid_reverted_metadata(disk_super);
> + if (!fs_devices)
> + fs_devices = find_fsid(disk_super->fsid, NULL);
> + }
> }
>
> -
> if (!fs_devices) {
> fs_devices = alloc_fs_devices(disk_super->fsid,
> - btrfs_sb_metadata_uuid_or_null(disk_super));
> + btrfs_sb_metadata_uuid_single_dev(disk_super,
> + has_metadata_uuid, single_dev));
I think it is a good idea to rebase on latest misc-next and add the
below patch, as the arguments of alloc_fs_device() have been simplified.
[PATCH resend] btrfs: simplify alloc_fs_devices() remove arg2
Thanks, Anand
> if (IS_ERR(fs_devices))
> return ERR_CAST(fs_devices);
>
> @@ -1365,13 +1404,15 @@ int btrfs_forget_devices(dev_t devt)
> * and we are not allowed to call set_blocksize during the scan. The superblock
> * is read via pagecache
> */
> -struct btrfs_device *btrfs_scan_one_device(const char *path, blk_mode_t flags)
> +struct btrfs_device *btrfs_scan_one_device(const char *path, blk_mode_t flags,
> + bool mounting)
> {
> struct btrfs_super_block *disk_super;
> bool new_device_added = false;
> struct btrfs_device *device = NULL;
> struct block_device *bdev;
> u64 bytenr, bytenr_orig;
> + bool single_dev;
> int ret;
>
> lockdep_assert_held(&uuid_mutex);
> @@ -1410,7 +1451,17 @@ struct btrfs_device *btrfs_scan_one_device(const char *path, blk_mode_t flags)
> goto error_bdev_put;
> }
>
> - device = device_list_add(path, disk_super, &new_device_added);
> + single_dev = btrfs_super_compat_ro_flags(disk_super) &
> + BTRFS_FEATURE_COMPAT_RO_SINGLE_DEV;
> +
> + if (!mounting && single_dev) {
> + pr_info("BTRFS: skipped non-mount scan on SINGLE_DEV device %s\n",
> + path);
> + btrfs_release_disk_super(disk_super);
> + return ERR_PTR(-EINVAL);
> + }
> +
> + device = device_list_add(path, disk_super, &new_device_added, single_dev);
> if (!IS_ERR(device) && new_device_added)
> btrfs_free_stale_devices(device->devt, device);
>
> @@ -2406,6 +2457,12 @@ int btrfs_get_dev_args_from_path(struct btrfs_fs_info *fs_info,
>
> args->devid = btrfs_stack_device_id(&disk_super->dev_item);
> memcpy(args->uuid, disk_super->dev_item.uuid, BTRFS_UUID_SIZE);
> +
> + /*
> + * Note that SINGLE_DEV devices are not handled in a special way here;
> + * device removal/replace is instead forbidden when such feature is
> + * present, this note is for future users/readers of this function.
> + */
> if (btrfs_fs_incompat(fs_info, METADATA_UUID))
> memcpy(args->fsid, disk_super->metadata_uuid, BTRFS_FSID_SIZE);
> else
> diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
> index 0f87057bb575..b9856c801567 100644
> --- a/fs/btrfs/volumes.h
> +++ b/fs/btrfs/volumes.h
> @@ -611,7 +611,8 @@ struct btrfs_block_group *btrfs_create_chunk(struct btrfs_trans_handle *trans,
> void btrfs_mapping_tree_free(struct extent_map_tree *tree);
> int btrfs_open_devices(struct btrfs_fs_devices *fs_devices,
> blk_mode_t flags, void *holder);
> -struct btrfs_device *btrfs_scan_one_device(const char *path, blk_mode_t flags);
> +struct btrfs_device *btrfs_scan_one_device(const char *path, blk_mode_t flags,
> + bool mounting);
> int btrfs_forget_devices(dev_t devt);
> void btrfs_close_devices(struct btrfs_fs_devices *fs_devices);
> void btrfs_free_extra_devids(struct btrfs_fs_devices *fs_devices);
> diff --git a/include/uapi/linux/btrfs.h b/include/uapi/linux/btrfs.h
> index dbb8b96da50d..cb7a7cfe1ea9 100644
> --- a/include/uapi/linux/btrfs.h
> +++ b/include/uapi/linux/btrfs.h
> @@ -313,6 +313,13 @@ struct btrfs_ioctl_fs_info_args {
> */
> #define BTRFS_FEATURE_COMPAT_RO_BLOCK_GROUP_TREE (1ULL << 3)
>
> +/*
> + * Single devices (as flagged by the corresponding compat_ro flag) only
> + * gets scanned during mount time; also, a random fsid is generated for
> + * them, in order to cope with same-fsid filesystem mounts.
> + */
> +#define BTRFS_FEATURE_COMPAT_RO_SINGLE_DEV (1ULL << 4)
> +
> #define BTRFS_FEATURE_INCOMPAT_MIXED_BACKREF (1ULL << 0)
> #define BTRFS_FEATURE_INCOMPAT_DEFAULT_SUBVOL (1ULL << 1)
> #define BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS (1ULL << 2)
next prev parent reply other threads:[~2023-08-23 16:32 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-08-03 15:43 [PATCH V2 0/3] Supporting same fsid mounting through a compat_ro feature Guilherme G. Piccoli
2023-08-03 15:43 ` [PATCH 1/3] btrfs-progs: Add the single-dev feature (to both mkfs/tune) Guilherme G. Piccoli
2023-08-17 15:46 ` Josef Bacik
2023-08-17 16:16 ` Guilherme G. Piccoli
2023-08-03 15:43 ` [PATCH 2/3] btrfs: Introduce the single-dev feature Guilherme G. Piccoli
2023-08-04 8:27 ` Qu Wenruo
2023-08-04 11:38 ` Guilherme G. Piccoli
2023-08-17 15:41 ` Josef Bacik
2023-08-17 16:20 ` Guilherme G. Piccoli
2023-08-17 16:58 ` Josef Bacik
2023-08-17 17:09 ` Guilherme G. Piccoli
2023-08-23 16:31 ` Anand Jain [this message]
2023-08-24 20:55 ` Guilherme G. Piccoli
2023-08-29 20:28 ` Guilherme G. Piccoli
2023-08-30 7:11 ` Anand Jain
2023-08-30 12:00 ` Guilherme G. Piccoli
2023-08-03 15:43 ` [PATCH 3/3] btrfs: Add parameter to force devices behave as single-dev ones Guilherme G. Piccoli
2023-08-17 15:44 ` Josef Bacik
2023-08-20 18:16 ` Guilherme G. Piccoli
2023-08-17 13:56 ` [PATCH V2 0/3] Supporting same fsid mounting through a compat_ro feature Guilherme G. Piccoli
2023-08-17 14:19 ` Josef Bacik
2023-08-17 14:23 ` Guilherme G. Piccoli
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=9dae9ca5-be94-af95-e7c3-0cb1d04731f2@oracle.com \
--to=anand.jain@oracle.com \
--cc=clm@fb.com \
--cc=david@fromorbit.com \
--cc=dsterba@suse.com \
--cc=gpiccoli@igalia.com \
--cc=johns@valvesoftware.com \
--cc=josef@toxicpanda.com \
--cc=kernel-dev@igalia.com \
--cc=kernel@gpiccoli.net \
--cc=kreijack@libero.it \
--cc=linux-btrfs@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=ludovico.denittis@collabora.com \
--cc=quwenruo.btrfs@gmx.com \
--cc=vivek@collabora.com \
--cc=wqu@suse.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).