public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Su Yue <Damenly_Su@gmx.com>
To: Anand Jain <anand.jain@oracle.com>
Cc: Nikolay Borisov <nborisov@suse.com>,
	damenly.su@gmail.com, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 1/6] btrfs: metadata_uuid: fix failed assertion due to unsuccessful device scan (reformatted)
Date: Fri, 13 Dec 2019 15:15:51 +0800	[thread overview]
Message-ID: <d972b667-83a6-34a1-0b91-e7c6c7a80bad@gmx.com> (raw)
In-Reply-To: <d0da81b4-801d-cb90-6e15-66905f190930@oracle.com>

On 2019/12/13 1:36 PM, Anand Jain wrote:
> 
> 
>   metadata_uuid code is too confusing, a lot of if and if-nots
>   it should be have been better.
> 

Agreed. It costed much brain power of mine.

>   more below.
> 

Willing to answer from my understanding.
If something wrong, please point at.

> On 13/12/19 10:46 AM, Su Yue wrote:
>> On 2019/12/12 10:15 PM, Nikolay Borisov wrote:
>>>
>>>
>>> On 12.12.19 г. 13:01 ч., damenly.su@gmail.com wrote:
>>>
>>> <snip>
>>>
>>>> Acutally, there are two devices in the fs. Device 2 with
>>>> FSID_CHANGING_V2 allocated a fs_devices. But, device 1 found the
>>>> fs_devices but failed to be added into since fs_devices->opened (
>>>
>>> It's not clear why device 1 wasn't able to be added to the fs_devices
>>> allocated by dev 2. Please elaborate?
>>>
>>>
>> Sure, of course.
>>
>> For example.
>>
>> $cat test.sh
>> ====================================================================
>> img1="/tmp/test1.img"
>> img2="/tmp/test2.img"
>>
>> [ -f "$img1" ] || fallocate -l 300M "$img1"
>> [ -f "$img2" ] || fallocate -l 300M "$img2"
>>
>> mkfs.btrfs -f $img1 $img2 2>&1 >/dev/null|| exit 1
>> losetup -D
>>
>> dmesg -C
>> rmmod btrfs
>> modprobe btrfs
>>
>> loop1=$(losetup --find --show "$img1")
>> loop2=$(losetup --find --show "$img2")
> 
>   Can you explicitly show what devices should be scanned to make the
>   device mount (below) successful. Fist you can cleanup the
>   device list using
> 
>     btrfs device --forget
> 

Thanks for the tip.
The purpose of simple script is to show that there
may be uncompleted/unsuccessful device(s) scanning due to
fs_devices->opened. Is the issue already known?

>> mount $loop1 /mnt || exit 1
>> umount /mnt
>> ====================================================================
>>
>> $dmesg
>> ====================================================================
>> [  395.205221] BTRFS: device fsid 5090db22-5e48-4767-8fb7-d037c619c1ee
>> devid 1 transid 5 /dev/loop0 scanned by systemd-udevd (13620)
>> [  395.210773] !!!!!!!!fs_device opened
>> [  395.213875] BTRFS info (device loop0): disk space caching is enabled
>> [  395.214994] BTRFS info (device loop0): has skinny extents
>> [  395.215891] BTRFS info (device loop0): flagging fs with big metadata
>> feature
>> [  395.222639] BTRFS error (device loop0): devid 2 uuid
>> adcc8454-695f-4e1d-bde8-94041b7bf761 is missing
>> [  395.224147] BTRFS error (device loop0): failed to read the system
>> array: -2
>> [  395.246163] !!!!!!!!fs_device opened
>> [  395.338219] BTRFS error (device loop0): open_ctree failed
>> =====================================================================
>>
>> The line "!!!!!!!!fs_device opened" is handy added by me in debug 
>> purpose.
>>
>> =====================================================================
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -794,6 +794,7 @@ static noinline struct btrfs_device
>> *device_list_add(const char *path,
>>
>>          if (!device) {
>>                  if (fs_devices->opened) {
>> +                       pr_info("!!!!!!!!fs_device opened\n");
>>                          mutex_unlock(&fs_devices->device_list_mutex);
>>                          return ERR_PTR(-EBUSY);
>>                  }
>> =====================================================================
>>
>> To make it more clear. The following is in metadata_uuid situation.
>> Device 1 is without FSID_CHANGING_V2 but has IMCOMPAT_METADATA_UUID.
>> (newer transid).
>>
>> Device 2 is with FSID_CHANGING_V2 and IMCOMPAT_METADATA_UUID.(Older
>> transid).
> 
> How were you able to set FSID_CHANGING_V2
> and BTRFS_FEATURE_INCOMPAT_METADATA_UUID on only devid 2 ?
> 
> 
The device2 is simulated to be the device failed to sync due
to some expected reason (power loss).

mkfs on two devices, use v5.4 progs/btrfstune -m $device.
Then both two devices both have the BTRFS_FEATURE_INCOMPAT_METADATA_UUID.
Play some tricks in btrfstune.c to avoid final super block
write on one deivce. Like the ugly code to delete the device:
========================================================================
diff --git a/btrfstune.c b/btrfstune.c
index afa3aae3..f678b978 100644
--- a/btrfstune.c
+++ b/btrfstune.c
@@ -101,12 +101,14 @@ static int set_metadata_uuid(struct btrfs_root 
*root, const char *uuid_string)
         struct btrfs_super_block *disk_super;
         uuid_t new_fsid, unused1, unused2;
         struct btrfs_trans_handle *trans;
+       struct btrfs_device *dev, *next;
         bool new_uuid = true;
         u64 incompat_flags;
         bool uuid_changed;
         u64 super_flags;
         int ret;

         disk_super = root->fs_info->super_copy;
         super_flags = btrfs_super_flags(disk_super);
         incompat_flags = btrfs_super_incompat_flags(disk_super);
@@ -170,6 +172,14 @@ static int set_metadata_uuid(struct btrfs_root 
*root, const char *uuid_string)
                 return 0;
         }

+       list_for_each_entry_safe(dev, next, 
&root->fs_info->fs_devices->devices,
+                                dev_list) {
+               if (dev->devid == 2) {
+                       fsync(dev->fd);
+                       list_del_init(&dev->dev_list);
+               }
+       }
+==================================================================


Compile again. call btrfstune -m again.
Then we get a device with
BTRFS_FEATURE_INCOMPAT_METADATA_UUID and FSID_CHANGING_V2.

>> The workflow in misc-tests/034 is
>>
>> loop1=$(losetup --find --show "$device2")
>> loop2=$(losetup --find --show "$device1")
>>
>> mount $loop1 /mnt ---> fails here
>>
>> Assume the fs_devices was allocated by systemd-udevd through
>> btrfs_control_ioctl() path after finish of scanning of device2.
>>
>> Then:
>>
> 
> In the two threads which are in race (below), the mount thread can't be 
> successful unless -o degraded is used, if it does it means the devid 1 

Right.. The dmesg reports the device 1 is missing.

> is already scanned and for that btrfs_device to be in the
> btrfs_fs_devices list the fsid has to match (does not matter 
> metadata_uuid).

Sorry, I doesn't make much clear what you mean. In similar but no 
metadata_uuid situation, mount will fail too but the assertion won't
fail of course. The device1 was scanned but not added into the
fs_devices(already found) since the fs_devices was opened by the
mounting thread.

> 
>> Thread *mounting device2*            Thread *scanning device1*
>>
>>
>> btrfs_mount_root                     btrfs_control_ioctl
>>
>>    mutex_lock(&uuid_mutex);
>>
>>      btrfs_read_disk_super
>>      btrfs_scan_one_device
>>      --> there is only device2
>>      in the fs_devices
>>
>>      btrfs_open_devices
>>        fs_devices->opened = 1
>>        fs_devices->latest_bdev = device2
>>
>>      mutex_unlock(&uuid_mutex);
>>
>>                                        mutex_lock(&uuid_mutex);
>>                                        btrfs_scan_one_device
>>                                          btrfs_read_disk_super
>>
>>                                          device_list_add
>>                                            found fs_devices
>>                                              device = btrfs_find_device
>>
>>                                              rewrite fs_deivces->fsid if
>>                                              scanned device1 is newer
>>                                               --> Change fs_devices->fsi
>>                                                    d to device1->fsid
>>
>>                                            if (!device)
>>                                               if(fs_devices->opened)
>>                           return -EBUSY
>>                                               --> the device1 adding
>>                                                   aborts since
>>                                                   fs_devices was opened
>>                                        mutex_unlock(&uuid_mutex);
>>    btrfs_fill_super
>>      open_ctree
>>         btrfs_read_dev_super(
>>         fs_devices->latest_bdev)
>>         --> the latest_bdev is device2
>>
>>         assert fs_devices->fsid equals
>>         device2's fsid.
>>         --> fs_device->fsid was rewritten by
>>             the scanning thread
>>
>> The result is fs_device->fsid is from device1 but super->fsid is from
>> the lastest device2.
>>
> 
>   Oops that's not good. However still not able to image various devices
>   and its fsid to achieve that condition. Is it possible to write a test
>   case? It would help.
> 
It did happened in my test environment.. You can try misc-tests/034 
about 20 times on v5.4 progs and v5.5-rc1 kernel. As for the test case,
will give a try.

Thanks
> Thanks, Anand
> 
>>>> the thread is doing mount device 1). But device 1's fsid was copied
>>>> to fs_devices->fsid then the assertion failed.
>>>
>>>
>>> dev 1 fsid should be copied iff its transid is newer.
>>>
>>
>> Even it was failed to be added into the fs_devices?
>>
>>>>
>>>> The solution is that only if a new device was added into a existing
>>>> fs_device, then the fs_devices->fsid is allowed to be rewritten.
>>>
>>> fs_devices->fsid must be re-written by any device which is _newer_ w.r.t
>>> to the transid.
>>>
>>
>> Then the assertion failed in above scenario. Just do not update the
>> fs_devices->fsid, let later btrfs_read_sys_array() report the device
>> missing then reject to mount.
>>
>> Thanks
>>
>>>>
>>>> Fixes: 7a62d0f07377 ("btrfs: Handle one more split-brain scenario 
>>>> during fsid change")
>>>> Signed-off-by: Su Yue <Damenly_Su@gmx.com>
>>>> ---
>>>>   fs/btrfs/volumes.c | 36 +++++++++++++++++++++---------------
>>>>   1 file changed, 21 insertions(+), 15 deletions(-)
>>>>
>>>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>>>> index d8e5560db285..9efa4123c335 100644
>>>> --- a/fs/btrfs/volumes.c
>>>> +++ b/fs/btrfs/volumes.c
>>>> @@ -732,6 +732,9 @@ static noinline struct btrfs_device 
>>>> *device_list_add(const char *path,
>>>>           BTRFS_FEATURE_INCOMPAT_METADATA_UUID);
>>>>       bool fsid_change_in_progress = (btrfs_super_flags(disk_super) &
>>>>                       BTRFS_SUPER_FLAG_CHANGING_FSID_V2);
>>>> +    bool fs_devices_found = false;
>>>> +
>>>> +    *new_device_added = false;
>>>>
>>>>       if (fsid_change_in_progress) {
>>>>           if (!has_metadata_uuid) {
>>>> @@ -772,24 +775,11 @@ static noinline struct btrfs_device 
>>>> *device_list_add(const char *path,
>>>>
>>>>           device = NULL;
>>>>       } else {
>>>> +        fs_devices_found = true;
>>>> +
>>>>           mutex_lock(&fs_devices->device_list_mutex);
>>>>           device = btrfs_find_device(fs_devices, devid,
>>>>                   disk_super->dev_item.uuid, NULL, false);
>>>> -
>>>> -        /*
>>>> -         * If this disk has been pulled into an fs devices created by
>>>> -         * a device which had the CHANGING_FSID_V2 flag then 
>>>> replace the
>>>> -         * metadata_uuid/fsid values of the fs_devices.
>>>> -         */
>>>> -        if (has_metadata_uuid && fs_devices->fsid_change &&
>>>> -            found_transid > fs_devices->latest_generation) {
>>>> -            memcpy(fs_devices->fsid, disk_super->fsid,
>>>> -                    BTRFS_FSID_SIZE);
>>>> -            memcpy(fs_devices->metadata_uuid,
>>>> -                    disk_super->metadata_uuid, BTRFS_FSID_SIZE);
>>>> -
>>>> -            fs_devices->fsid_change = false;
>>>> -        }
>>>>       }
>>>>
>>>>       if (!device) {
>>>> @@ -912,6 +902,22 @@ static noinline struct btrfs_device 
>>>> *device_list_add(const char *path,
>>>>           }
>>>>       }
>>>>
>>>> +    /*
>>>> +     * If the new added disk has been pulled into an fs devices 
>>>> created by
>>>> +     * a device which had the CHANGING_FSID_V2 flag then replace the
>>>> +     * metadata_uuid/fsid values of the fs_devices.
>>>> +     */
>>>> +    if (*new_device_added && fs_devices_found &&
>>>> +        has_metadata_uuid && fs_devices->fsid_change &&
>>>> +        found_transid > fs_devices->latest_generation) {
>>>> +        memcpy(fs_devices->fsid, disk_super->fsid,
>>>> +               BTRFS_FSID_SIZE);
>>>> +        memcpy(fs_devices->metadata_uuid,
>>>> +               disk_super->metadata_uuid, BTRFS_FSID_SIZE);
>>>> +
>>>> +        fs_devices->fsid_change = false;
>>>> +    }
>>>> +
>>>>       /*
>>>>        * Unmount does not free the btrfs_device struct but would zero
>>>>        * generation along with most of the other members. So just 
>>>> update
>>>>
>>
> 


  reply	other threads:[~2019-12-13  7:16 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-12-12 11:01 [PATCH 0/6] btrfs: metadata uuid fixes and enhancements damenly.su
2019-12-12 11:01 ` [PATCH 1/6] btrfs: metadata_uuid: fix failed assertion due to unsuccessful device scan damenly.su
2019-12-12 14:15   ` Nikolay Borisov
2019-12-13  2:30     ` Su Yue
2019-12-13  2:46     ` [PATCH 1/6] btrfs: metadata_uuid: fix failed assertion due to unsuccessful device scan (reformatted) Su Yue
2019-12-13  5:36       ` Anand Jain
2019-12-13  7:15         ` Su Yue [this message]
2019-12-13  8:51           ` Anand Jain
2019-12-13 10:10             ` Su Yue
2019-12-12 11:01 ` [PATCH 2/6] btrfs: metadata_uuid: move split-brain handling from fs_id() to new function damenly.su
2019-12-12 13:05   ` Nikolay Borisov
2019-12-12 13:32     ` Su Yue
2019-12-12 11:01 ` [PATCH 3/6] btrfs: split-brain case for scanned changing device with INCOMPAT_METADATA_UUID damenly.su
2019-12-12 13:24   ` Su Yue
2019-12-12 13:34   ` Nikolay Borisov
2019-12-12 14:19     ` Su Yue
2019-12-12 11:01 ` [PATCH 4/6] btrfs: split-brain case for scanned changed device without INCOMPAT_METADATA_UUID damenly.su
2019-12-12 11:01 ` [PATCH 5/6] btrfs: copy fsid and metadata_uuid for pulled disk " damenly.su
2020-01-06 15:12   ` Nikolay Borisov
2020-01-07  1:31     ` Su Yue
2020-01-07  7:18       ` Nikolay Borisov
2020-01-07  7:34         ` Su Yue
2019-12-12 11:01 ` [PATCH 6/6] btrfs: metadata_uuid: move partly logic into find_fsid_inprogress() damenly.su
2019-12-12 13:37   ` Nikolay Borisov
2019-12-13  8:03 ` [PATCH 0/6] btrfs: metadata uuid fixes and enhancements Nikolay Borisov
2019-12-16  0:49   ` Su Yue

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d972b667-83a6-34a1-0b91-e7c6c7a80bad@gmx.com \
    --to=damenly_su@gmx.com \
    --cc=anand.jain@oracle.com \
    --cc=damenly.su@gmail.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=nborisov@suse.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox