From: Anand Jain <anand.jain@oracle.com>
To: Yauhen Kharuzhy <yauhen.kharuzhy@zavadatar.com>
Cc: linux-btrfs@vger.kernel.org, dsterba@suse.cz
Subject: Re: [PATCH v4 00/13] Introduce device state 'failed', spare device and auto replace
Date: Thu, 14 Apr 2016 17:57:40 +0800
Message-ID: <570F6994.9030605@oracle.com>
In-Reply-To: <20160414092246.GB17024@jeknote.loshitsa1.net>
On 04/14/2016 05:22 PM, Yauhen Kharuzhy wrote:
> On Thu, Apr 14, 2016 at 04:45:11PM +0800, Anand Jain wrote:
>>
>>
>>
>> Thanks for the report! More below.
>>
>>
>> You may use the simpler devmgt tool, https://github.com/asj/devmgt
>
> Thanks, will try.
>
>>
>> You are failing the replace target, presumably while the replace is
>> still running. Note, however, that this patch set does not fail the
>> replace target on errors (as of now I have no idea how to do that
>> without leading to a messy situation), so it follows the original
>> code path, as it would without this patch set.
>> Next, originally, without this patch set, we don't close any device
>> on errors. So when you delete the device at the block layer and
>> re-attach (scan) it, you most probably get a newer device path
>> for the block device (which kind of defeats the idea of testing
>> an intermittently disappearing device). So I doubt whether the test
>> case is reliable, whether the above panic is btrfs related, and
>> whether it is related to this patch set.
>
> No, it is fixed by my latest patch (about the s_bdev field in the
> superblock). The actual sequence which leads to the oops is:
> 1) FS is mounted, s_bdev is NULL
> 2) the failed device is closed, s_bdev is untouched
> 3) the missing device is replaced, s_bdev is set to non-NULL: the bdev
> of the replaced device
> 4) at the second device closing, s_bdev is "changed" to the first device
> from the device list, but it is... some device, because the closed dev
> still has not been deleted from the list!
> 5) after the device closing, s_bdev points to an invalid bdev.
> 6) umount -> sync_filesystem() -> sync_blockdev(s_bdev) -> OOPS.
>
This is wrong. It should be the other way around. That is, s_bdev
should continue to be NULL. And if s_bdev continues to be NULL,
the sync thread will fail safe.
The diff sent in the other thread will fix this.
Thanks, Anand