From: Anand Jain <anand.jain@oracle.com>
To: Yauhen Kharuzhy <yauhen.kharuzhy@zavadatar.com>
Cc: linux-btrfs@vger.kernel.org, dsterba@suse.cz
Subject: Re: [PATCH v4 00/13] Introduce device state 'failed', spare device and auto replace
Date: Thu, 14 Apr 2016 17:57:40 +0800
Message-ID: <570F6994.9030605@oracle.com>
In-Reply-To: <20160414092246.GB17024@jeknote.loshitsa1.net>
On 04/14/2016 05:22 PM, Yauhen Kharuzhy wrote:
> On Thu, Apr 14, 2016 at 04:45:11PM +0800, Anand Jain wrote:
>>
>>
>>
>> Thanks for the report! More below.
>>
>>
>> You may use the simpler devmgt tool, https://github.com/asj/devmgt
>
> Thanks, will try.
>
>>
>> You are failing the replace target, presumably while the replace is
>> still running. Note, however, that this patch set does not fail the
>> replace target on errors (as of now I have no idea how to do that
>> without leading to a messy situation), so it follows the original
>> code path, as it would without this patch set.
>> Next, originally, without this patch set, we won't close any device
>> on errors. So when you delete the device at the block layer and
>> re-attach (scan) it, you most probably get a new device path
>> to the block device (which kind of defeats the idea of testing
>> an intermittently disappearing device). So I doubt whether the test
>> case is reliable, whether the above panic is btrfs related, and
>> whether it is related to this patch set.
>
> No, it is fixed by my latest patch (about the s_bdev field in the
> superblock). The actual sequence which leads to the oops is:
> 1) FS is mounted, s_bdev is NULL
> 2) failed device is closed, s_bdev untouched
> 3) missing device is replaced, s_bdev is set to non-NULL – bdev of
> the replaced device
> 4) at the second device closing, s_bdev is "changed" to the first device
> from the device list, but it is... some device, because the closed dev
> still hasn't been deleted from the list!
> 5) after device closing, s_bdev points to an invalid bdev.
> 6) umount -> sync_filesystem() -> sync_blockdev(s_bdev) -> OOPS.
>
This is wrong. It should be the other way around. That is, s_bdev
should continue to be NULL. And if s_bdev continues to be NULL,
the sync thread will fail safe.
The diff sent in the other thread will fix this.
Thanks, Anand
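
For readers following the six-step sequence quoted above, here is a minimal user-space sketch of the two behaviours under discussion. It is an illustrative model only, not kernel code: the model_bdev and model_fs structures and the model_sync_filesystem() and model_close_device() helpers are hypothetical names invented for this example, and the "repoint s_bdev to the first list entry" rule is an assumption taken from step 4 of the quoted description.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a block device; not the kernel's struct. */
struct model_bdev {
    int open;        /* 1 while the device is open, 0 once closed */
    int sync_count;  /* how many times this device was flushed */
};

/* Hypothetical stand-in for the filesystem plus its superblock field. */
struct model_fs {
    struct model_bdev *devs[2];  /* device list; closed entries stay listed */
    struct model_bdev *s_bdev;   /* NULL right after mount (step 1) */
};

/*
 * Model of the sync done at umount time.
 * Returns 0 when the flush succeeds or when s_bdev is NULL (fail-safe),
 * and -1 when s_bdev points at a closed device, which stands in for
 * the oops in step 6.
 */
int model_sync_filesystem(struct model_fs *fs)
{
    assert(fs != NULL);
    if (fs->s_bdev == NULL)
        return 0;
    if (!fs->s_bdev->open)
        return -1;
    fs->s_bdev->sync_count++;
    return 0;
}

/*
 * Model of closing a device. If s_bdev pointed at the device being
 * closed, it is repointed to the first list entry (step 4), even if
 * that entry was itself closed earlier and never removed from the list.
 */
void model_close_device(struct model_fs *fs, struct model_bdev *dev)
{
    assert(fs != NULL && dev != NULL);
    dev->open = 0;
    if (fs->s_bdev == dev)
        fs->s_bdev = fs->devs[0];
}
```

In this model, setting s_bdev during replace (step 3) and then closing that device leaves s_bdev pointing at an already closed list entry, so the sync reports the failure. If s_bdev is left NULL throughout, the sync returns early and nothing is dereferenced.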