From: Anand Jain <anand.jain@oracle.com>
To: Nikolay Borisov <nborisov@suse.com>, linux-btrfs@vger.kernel.org
Cc: dsterba@suse.com, josef@toxicpanda.com
Subject: Re: [PATCH v3 REBASED 0/3] btrfs: fix issues due to alien device
Date: Fri, 1 May 2020 01:54:41 +0800
Message-ID: <36da56d2-2384-87fb-8003-814e9c72ddbb@oracle.com>
In-Reply-To: <55a5fd3a-dddb-df52-7f22-01e3407c0325@suse.com>
On 30/4/20 2:05 pm, Nikolay Borisov wrote:
>
>
> On 28.04.20 г. 18:22 ч., Anand Jain wrote:
>> v3 REBASED: Based on the latest misc-next. for for-5.8.
>> Dropped the following patches as there were concerns about the usage
>> of error code -EUCLEAN
>> btrfs: remove identified alien device in open_fs_devices
>> btrfs: remove identified alien btrfs device in open_fs_devices
>>
>> Remaining 3 patches here have obtained reviewed-by. With this patchset
>> the relevant fstests btrfs/197 and btrfs/198 (which test 3 bugs)
>> would pass as the patch 2/3 fixed a bug and 3/3 fixed the trigger
>> of 2 other bugs (patch 1/3 is just a cleanup). Further at the moment
>> I am not sure if there is any other trigger where it could again leave
>> an alien device in the fs_devices leading to the same/similar bugs.
>>
>> ==== original email ====
>> v3: Fix alien device is due to wipefs in Patch4.
>> Fix a nit in Patch3.
>> Patches are reordered.
>>
>> Alien device is a device in fs_devices list having a different fsid than
>> the expected fsid or no btrfs_magic. This patch set fixes issues found due
>> to the same.
>>
>> Patch1: is a cleanup patch, not related.
>> Patch2: fixes failing to mount a degraded RAIDs (RAID1/5/6/10), by
>> hardening the function btrfs_free_extra_devids().
>> Patch3: fixes the missing device (due to alien btrfs-device) not missing in
>> the userland, by hardening the function btrfs_open_one_device().
>> Patch4: fixes the missing device (due to alien device) not missing in
>> the userland, by returning EUCLEAN in btrfs_read_dev_one_super().
>> Patch5: eliminates the source of the alien device in the fs_devices.
>>
>> PS: Fundamentally it's a wrong approach that btrfs-progs deduces the device
>> missing state in the userland instead of obtaining it from the kernel.
>> I remember objecting on the btrfs-progs patch which did that, but still
>> it got merged, bugs in p3 and p4 are its side effects. I wrote
>> patches to read device_state from the kernel using ioctl, procfs and
>> sysfs but it didn't get the due attention till a merger.
>>
>> Anand Jain (3):
>> btrfs: drop useless goto in open_fs_devices
>> btrfs: include non-missing as a qualifier for the latest_bdev
>> btrfs: free alien device due to device add
>>
>> fs/btrfs/volumes.c | 30 ++++++++++++++++++++++--------
>> 1 file changed, 22 insertions(+), 8 deletions(-)
>>
>
>
> One thing I'm not clear is how can we get into a situation of an alien
> device. I.e. devices should be in fs_devices iff they are part of the
> filesystem, no?
>
I think the point you are missing is that when the devices of a
raid1/raid5/raid6 filesystem are unmounted, we don't free any of their
fs_devices::device entries. So if one of those devices is then added
to another fsid (using btrfs device add) or wiped using wipefs -a, we
still don't free the device's former fs_devices::device entry in the
kernel, and it acts as an alien device among its former partners when
they are mounted.

Now when the former partner(s) try to mount in degraded mode, the
mount thread reads this device's SB (which is now an alien) and gets
a new fsid+devid. The function btrfs_open_one_device() sees the
non-matching fsid+devid, just ignores the device, and still does not
free the fs_devices::device. If it had freed it, add_missing_dev()
could have allocated a fresh fs_devices::device using the data in the
chunk-tree, instead of read_one_dev() just setting the existing
device's state to missing.

Please look at fstests btrfs/197 and btrfs/198; they test these
scenarios.
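
To make the two alien cases concrete, here is a minimal userspace
sketch of the comparison described above. It is not the kernel code:
the struct layouts, the sketch_* names and the verdict enum are all
made up for illustration. Only the magic value and the 16-byte fsid
size mirror the real BTRFS_MAGIC and BTRFS_FSID_SIZE.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_FSID_SIZE 16
/* Same value as the kernel's BTRFS_MAGIC ("_BHRfS_M"). */
#define SKETCH_BTRFS_MAGIC 0x4D5F53665248425FULL

/* Hypothetical: what is read from the on-disk superblock at mount time. */
struct sketch_super {
	uint64_t magic;
	uint8_t fsid[SKETCH_FSID_SIZE];
	uint64_t devid;
};

/* Hypothetical: what the stale fs_devices::device entry still remembers. */
struct sketch_cached_dev {
	uint8_t fsid[SKETCH_FSID_SIZE];
	uint64_t devid;
};

enum sketch_verdict {
	SKETCH_MEMBER,         /* superblock matches the cached entry */
	SKETCH_ALIEN_BTRFS,    /* btrfs magic present, fsid+devid differ */
	SKETCH_ALIEN_NO_MAGIC, /* no btrfs magic at all, e.g. after wipefs -a */
};

/* Compare the cached entry against what is on disk right now. */
static enum sketch_verdict
sketch_classify(const struct sketch_cached_dev *cached,
		const struct sketch_super *sb)
{
	if (sb->magic != SKETCH_BTRFS_MAGIC)
		return SKETCH_ALIEN_NO_MAGIC;
	if (memcmp(sb->fsid, cached->fsid, SKETCH_FSID_SIZE) != 0 ||
	    sb->devid != cached->devid)
		return SKETCH_ALIEN_BTRFS;
	return SKETCH_MEMBER;
}

/* Usage: one cached entry checked against three on-disk states. */
static void sketch_demo(void)
{
	struct sketch_cached_dev cached = { .fsid = { 0xaa }, .devid = 2 };
	struct sketch_super intact = { SKETCH_BTRFS_MAGIC, { 0xaa }, 2 };
	struct sketch_super re_added = { SKETCH_BTRFS_MAGIC, { 0xbb }, 3 };
	struct sketch_super wiped = { 0, { 0 }, 0 };

	assert(sketch_classify(&cached, &intact) == SKETCH_MEMBER);
	assert(sketch_classify(&cached, &re_added) == SKETCH_ALIEN_BTRFS);
	assert(sketch_classify(&cached, &wiped) == SKETCH_ALIEN_NO_MAGIC);
}
```

In these terms, the complaint is that both alien verdicts are detected
today but the stale entry is kept around instead of being freed.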

There are three bugs.

1. raid1 mount -o degraded fails (patch 2/3 fixes it). This bug is
   not related to the alien device issue, but we need this patch
   to continue testing.

2. btrfs fi show -m does not show the missing device after mounting
   the degraded raid1.

   a. Triggered by btrfs device add. Fixed by the dropped patch
        [patch] btrfs: remove identified alien btrfs device in
        open_fs_devices
      The root cause is that btrfs_open_one_device() does not free
      the known alien device, which contains the btrfs_magic but
      does not contain the required fsid+devid.

   b. Triggered by wipefs -a. Fixed by the dropped patch
        [patch] btrfs: remove identified alien device in open_fs_devices
      The root cause is that the mount thread does not free the
      identified alien device (which does not even contain the
      btrfs_magic).

3. Device add ioctl trigger (patch 3/3 fixes it).

We still need those two dropped patches. They are stuck on the
-EUCLEAN usage. That can be discussed separately, and they can be
sent as separate patches.
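
For bug 1, the idea behind patch 2/3 (going by its title, "include
non-missing as a qualifier for the latest_bdev") can be sketched in
userspace roughly as below. This is an illustration under assumptions,
not the patch itself: struct sketch_dev, its fields and
sketch_pick_latest() are hypothetical names, and the real selection
happens in btrfs_free_extra_devids().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-device state the kernel tracks. */
struct sketch_dev {
	uint64_t devid;
	uint64_t generation;
	bool missing;
};

/*
 * Pick the device with the highest generation, skipping any device
 * marked missing. Returns NULL if no usable device exists.
 */
static const struct sketch_dev *
sketch_pick_latest(const struct sketch_dev *devs, size_t n)
{
	const struct sketch_dev *latest = NULL;

	for (size_t i = 0; i < n; i++) {
		if (devs[i].missing)
			continue;
		if (!latest || devs[i].generation > latest->generation)
			latest = &devs[i];
	}
	return latest;
}

/* Usage: the missing device is never chosen, whatever its generation. */
static void sketch_latest_demo(void)
{
	const struct sketch_dev devs[] = {
		{ .devid = 1, .generation = 10, .missing = false },
		{ .devid = 2, .generation = 12, .missing = true },
	};
	const struct sketch_dev *latest = sketch_pick_latest(devs, 2);

	assert(latest != NULL && latest->devid == 1);
}
```

Without the missing check, devid 2 would win on generation alone in
this made-up example, even though it cannot be opened.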