From: Anand Jain <anand.jain@oracle.com>
To: dsterba@suse.cz
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH RFC] btrfs: pick device with lowest devt for show_devname
Date: Mon, 27 Nov 2023 19:48:34 +0800 [thread overview]
Message-ID: <589d8650-26e8-4c0e-a602-bdb5ce427ed5@oracle.com> (raw)
In-Reply-To: <36171811-ed49-4427-a647-e052ec70faa0@oracle.com>
On 11/25/23 09:09, Anand Jain wrote:
>
>
> On 11/25/23 00:19, David Sterba wrote:
>> On Thu, Nov 02, 2023 at 07:10:48PM +0800, Anand Jain wrote:
>>> In a non-single-device Btrfs filesystem, if Btrfs is already mounted and
>>> if you run the command 'mount -a,' it will fail and the command
>>> 'umount <device>' also fails. As below:
>>>
>>> ----------------
>>> $ cat /etc/fstab | grep btrfs
>>> UUID=12345678-1234-1234-1234-123456789abc /btrfs btrfs
>>> defaults,nofail 0 0
>>>
>>> $ mkfs.btrfs -qf --uuid 12345678-1234-1234-1234-123456789abc
>>> /dev/sda2 /dev/sda1
>>> $ mount --verbose -a
>>> / : ignored
>>> /btrfs : successfully mounted
>>>
>>> $ ls -l /dev/disk/by-uuid | grep 12345678-1234-1234-1234-123456789abc
>>> lrwxrwxrwx 1 root root 10 Nov 2 17:43
>>> 12345678-1234-1234-1234-123456789abc -> ../../sda1
>>>
>>> $ cat /proc/self/mounts | grep btrfs
>>> /dev/sda2 /btrfs btrfs rw,relatime,space_cache=v2,subvolid=5,subvol=/
>>> 0 0
>>>
>>> $ findmnt --df /btrfs
>>> SOURCE FSTYPE SIZE USED AVAIL USE% TARGET
>>> /dev/sda2 btrfs 2G 5.8M 1.8G 0% /btrfs
>>>
>>> $ mount --verbose -a
>>> / : ignored
>>> mount: /btrfs: /dev/sda1 already mounted or mount point busy.
>>> $echo $?
>>> 32
>>>
>>> $ umount /dev/sda1
>>> umount: /dev/sda1: not mounted.
>>> $ echo $?
>>> 32
>>> ----------------
>>>
>>> I assume (RFC) this is because '/dev/disk/by-uuid,' '/proc/self/mounts,'
>>> and 'findmnt' do not all reference the same device, resulting in the
>>> 'mount -a' and 'umount' failures. However, an empirically found solution
>>> is to align them using a rule, such as the disk with the lowest 'devt,'
>>> for a multi-device Btrfs filesystem.
>>>
>>> I'm not yet sure (RFC) how to create a udev rule to point to the disk
>>> with
>>> the lowest 'devt,' as this kernel patch does, and I believe it is
>>> possible.
>>>
>>> And this would ensure that '/proc/self/mounts,' 'findmnt,' and
>>> '/dev/disk/by-uuid' all reference the same device.
>>>
>>> After applying this patch, the above test passes. Unfortunately,
>>> /dev/disk/by-uuid also points to the lowest 'devt' by chance, even
>>> though
>>> no rule has been set as of now. As shown below.
>>>
>>> ----------------
>>> $ mkfs.btrfs -qf --uuid 12345678-1234-1234-1234-123456789abc
>>> /dev/sda2 /dev/sda1
>>>
>>> $ mount --verbose -a
>>> / : ignored
>>> /btrfs : successfully mounted
>>>
>>> $ ls -l /dev/disk/by-uuid | grep 12345678-1234-1234-1234-123456789abc
>>> lrwxrwxrwx 1 root root 10 Nov 2 17:53
>>> 12345678-1234-1234-1234-123456789abc -> ../../sda1
>>>
>>> $ cat /proc/self/mounts | grep btrfs
>>> /dev/sda1 /btrfs btrfs rw,relatime,space_cache=v2,subvolid=5,subvol=/
>>> 0 0
>>>
>>> $ findmnt --df /btrfs
>>> SOURCE FSTYPE SIZE USED AVAIL USE% TARGET
>>> /dev/sda1 btrfs 2G 5.8M 1.8G 0% /btrfs
>>>
>>> $ mount --verbose -a
>>> / : ignored
>>> /btrfs : already mounted
>>> $echo $?
>>> 0
>>>
>>> $ umount /dev/sda1
>>> $echo $?
>>> 0
>>> ----------------
>>>
>>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>>> ---
>>> fs/btrfs/super.c | 15 +++++++++++++--
>>> 1 file changed, 13 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
>>> index 66bdb6fd83bd..d768917cc5cc 100644
>>> --- a/fs/btrfs/super.c
>>> +++ b/fs/btrfs/super.c
>>> @@ -2301,7 +2301,18 @@ static int btrfs_unfreeze(struct super_block *sb)
>>> static int btrfs_show_devname(struct seq_file *m, struct dentry *root)
>>> {
>>> - struct btrfs_fs_info *fs_info = btrfs_sb(root->d_sb);
>>> + struct btrfs_fs_devices *fs_devices =
>>> btrfs_sb(root->d_sb)->fs_devices;
>>> + struct btrfs_device *device;
>>> + struct btrfs_device *first_device = NULL;
>>> +
>>> + list_for_each_entry(device, &fs_devices->devices, dev_list) {
>>> + if (first_device == NULL) {
>>> + first_device = device;
>>> + continue;
>>> + }
>>> + if (first_device->devt > device->devt)
>>> + first_device = device;
>>> + }
>>
>> I think we agree in principle that the devt can be used to determine the
>> device to print in show_devname, however I'd like to avoid iterating the
>> device list here. We used to have it, first with mutex protection, then
>> RCU. That we can simply print the latest_dev is quite convenient.
>>
>> I suggest to either add a separate fs_info variable to set the desired
>> device and only print it here, or reuse latest_dev for that.
>>
>
> I got it. Thx. However, I am still thinking about the fix. more below.
>
>> Adding a separate variable for that should be safer as latest_dev is
>> used in various way and I can't say it's always clear.
>
>
> Libmount and libblkid use the device path from the mount table and udev,
> respectively. These output gets string-matched during 'umount <dev>',
> 'findmnt', and 'mount -a'. However, in the case of Btrfs, there may be
> more than one device per FSID, and there is neither a master device nor
> a consolidating pseudo device for public relations, similar to LVM.
>
> A rule, such as selecting the device with the lowest maj:min, works if
> it is implemented in both systemd and btrfs.ko. But, this approach does
> not resolve the 'umount' problem, such as
> 'umount <device-which-mount-table-did-not-show>'.
>
> I am skeptical about whether we have a strong case to create a single
> pseudo device per multi-device Btrfs filesystem, such as, for example
> '/dev/btrfs/<fsid>-<random>/rootid=5' and which means pseudo device
> will carry the btrfs-magic and the actual blk devices something else.
>
> OR for now, regarding the umount issue mentioned above, we just can
> document it for the users to be aware of.
>
> Any feedback is greatly appreciated.
>
How about if we display the devices list in the options, so that
user-land libs have something in the mount-table that tells all
the devices part of the fsid?
For example:
$ cat /proc/self/mounts | grep btrfs
/dev/sda1 /btrfs btrfs
rw,relatime,space_cache=v2,subvolid=5,subvol=/,device=/dev/sda2,device=/dev/sdb3
0 0
Thanks.
next prev parent reply other threads:[~2023-11-27 11:48 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-11-02 11:10 [PATCH RFC] btrfs: pick device with lowest devt for show_devname Anand Jain
2023-11-02 20:26 ` David Sterba
2023-11-02 22:55 ` Anand Jain
2023-11-20 14:42 ` David Sterba
2023-11-24 16:19 ` David Sterba
2023-11-25 1:09 ` Anand Jain
2023-11-27 11:48 ` Anand Jain [this message]
2023-11-28 8:00 ` Goffredo Baroncelli
2023-11-28 23:28 ` Anand Jain
2023-11-29 13:38 ` Andrei Borzenkov
2023-11-29 20:54 ` Goffredo Baroncelli
2023-12-05 17:44 ` David Sterba
2023-12-06 19:52 ` Goffredo Baroncelli
2023-12-05 17:43 ` David Sterba
2023-11-29 21:20 ` Goffredo Baroncelli
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=589d8650-26e8-4c0e-a602-bdb5ce427ed5@oracle.com \
--to=anand.jain@oracle.com \
--cc=dsterba@suse.cz \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox