From: Anand Jain
To: linux-btrfs@vger.kernel.org
Cc: bo.li.liu@oracle.com
Subject: [PATCH v8 0/2] [RFC] Introduce device state 'failed'
Date: Tue, 3 Oct 2017 23:59:18 +0800
Message-Id: <20171003155920.24925-1-anand.jain@oracle.com>

When a device fails, it has to be closed and marked as failed. Further,
we need a sysfs (or similar) interface to report the complete device and
volume status from the kernel to user land. Next, when the disappeared
device reappears, we need to resilver/resync it, which has to be handled
in a RAID-profile-specific way.

The effort here is to fix the above three missing items.

To begin with, this patch set brings a device with write/flush failures
into the failed state. Bringing the device back onto the alloc list,
verifying its consistency, and kicking off the resilvering are still WIP,
and feedback helps. For RAID1, converting the chunks that were written
with the single profile back to RAID1 will help. For RAID56 I am banking
on Liubo's recent RAID56 write hole work; I have yet to look deeper into
that.

Next, for RAID1 there can be a split-brain scenario where each of the
devices was mounted independently. To fix this, I plan to set a (new)
incompat flag on a device that is written without the other. When the
devices are brought back together, the flag should be present on only
one of them; if it is set on both devices, it is a split-brain scenario
and user intervention will be required. (Rough sketches of both ideas
follow after the diffstat.)

On the sysfs part, there are patches on the ML which were sent earlier;
I shall be reviving those as well.

Thanks, Anand

Anand Jain (2):
  btrfs: introduce device dynamic state transition to failed
  btrfs: check device for critical errors and mark failed

 fs/btrfs/ctree.h   |   2 +
 fs/btrfs/disk-io.c |  78 ++++++++++++++++++++++++++++++++++++++-
 fs/btrfs/volumes.c | 105 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/btrfs/volumes.h |  19 +++++++++-
 4 files changed, 202 insertions(+), 2 deletions(-)

-- 
2.7.0
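
Below is a minimal, self-contained user-space sketch of the write/flush
error-threshold idea described above. It is only an illustration, not the
patch code: the names (dev_check_critical_errs, DEV_STATE_FAILED) and the
thresholds are assumptions made for the example.

/*
 * Illustrative model of the "failed" state transition: once the
 * accumulated write or flush errors cross a threshold, the device is
 * moved to a failed state and no further IO should be sent to it.
 */
#include <stdio.h>

enum dev_state {
	DEV_STATE_OK,
	DEV_STATE_FAILED,	/* device is closed, no further IO */
};

struct dev {
	const char	*name;
	unsigned int	write_errs;
	unsigned int	flush_errs;
	enum dev_state	state;
};

#define MAX_WRITE_ERRS	16	/* example thresholds, not from the patch */
#define MAX_FLUSH_ERRS	16

/* Called after each write/flush completion; returns 1 if the device
 * has just been transitioned to failed. */
static int dev_check_critical_errs(struct dev *dev)
{
	if (dev->state == DEV_STATE_FAILED)
		return 0;

	if (dev->write_errs > MAX_WRITE_ERRS ||
	    dev->flush_errs > MAX_FLUSH_ERRS) {
		dev->state = DEV_STATE_FAILED;
		printf("device %s marked failed (write_errs=%u flush_errs=%u)\n",
		       dev->name, dev->write_errs, dev->flush_errs);
		return 1;
	}
	return 0;
}

int main(void)
{
	struct dev d = { .name = "/dev/sdb", .state = DEV_STATE_OK };
	int i;

	/* Simulate a run of flush failures until the device is failed. */
	for (i = 0; i < 32; i++) {
		d.flush_errs++;
		if (dev_check_critical_errs(&d))
			break;
	}
	return 0;
}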
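
And a similarly hypothetical sketch of the RAID1 split-brain check on the
proposed incompat flag; the field name written_degraded just stands in for
whatever on-disk bit the final patch ends up using.

/*
 * Illustrative model of the split-brain check: the flag is set on a
 * device that was written while its mirror was absent.  If, when the
 * devices are reunited, the flag is set on both, neither copy can be
 * trusted automatically and the user must intervene.
 */
#include <stdio.h>
#include <stdbool.h>

struct mirror_dev {
	const char *name;
	bool written_degraded;	/* hypothetical "written without peer" flag */
};

/* Returns true when automatic resync is safe (flag on at most one side). */
static bool can_auto_resync(const struct mirror_dev *a,
			    const struct mirror_dev *b)
{
	if (a->written_degraded && b->written_degraded) {
		printf("split brain between %s and %s: user intervention required\n",
		       a->name, b->name);
		return false;
	}
	return true;
}

int main(void)
{
	struct mirror_dev a = { "/dev/sdb", true };
	struct mirror_dev b = { "/dev/sdc", true };

	if (can_auto_resync(&a, &b))
		printf("resync from the device that has the flag set\n");
	return 0;
}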