From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mx2.suse.de ([195.135.220.15]:38022 "EHLO mx2.suse.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1752363AbcD2Qhm (ORCPT ); Fri, 29 Apr 2016 12:37:42 -0400
Date: Fri, 29 Apr 2016 18:37:27 +0200
From: David Sterba
To: Anand Jain
Cc: linux-btrfs@vger.kernel.org, clm@fb.com
Subject: Re: [PATCH 0/2] [RFC] btrfs: create degraded-RAID1 chunks
Message-ID: <20160429163727.GD29353@suse.cz>
Reply-To: dsterba@suse.cz
References: <1461812780-538-1-git-send-email-anand.jain@oracle.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
In-Reply-To: <1461812780-538-1-git-send-email-anand.jain@oracle.com>
Sender: linux-btrfs-owner@vger.kernel.org
List-ID:

On Thu, Apr 28, 2016 at 11:06:18AM +0800, Anand Jain wrote:
> From the comments that commit[1] deleted
>
> - /*
> - * we add in the count of missing devices because we want
> - * to make sure that any RAID levels on a degraded FS
> - * continue to be honored.
> - *
>
> appear to me that automatic reduced-chunk-allocation
> when RAID1 is degraded wasn't in the original design.
>
> which also introduced unpleasant things like automatically
> allocating single chunks when RAID1 is mounted in degraded
> mode, which will hinder further RAID1 mount in degraded
> mode.

Agreed. As the automatic conversion cannot be turned off, it causes some
surprises. We've opposed such things in the past, so I'm for not doing
the 'single' allocations. Independently, I got feedback from a user who
liked the proposed change.

> And now to fix the original issue that is - chunk allocation
> fails when RAID1 is degraded, The reason for the problem
> seems to be that we had the devs_min attribute for RAID1
> set wrongly. Correcting this also means that its time to
> fix the RAID1 fixmes in the functions __btrfs_alloc_chunk()
> patch [2] does that, and is for review.

This means we'd allow full writes to a degraded raid1 filesystem.
This can bring surprises as well. The question is what to do if the
device pops out, some writes happen, and then the device is added back.

One option is to set a bit in the degraded filesystem recording that
degraded writes happened. After that, mounting the whole filesystem
would recommend running scrub before dropping the bit.

Forcing a read-only mount here would be similar to a read-only degraded
mount, so I guess we'd have to somehow deal with the missing writes. I
haven't thought through all the details. The raid1 auto-repair can
handle corrupted data; I think missing metadata should be handled and
repaired as well.
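
To make the "degraded writes happened" bit a bit more concrete, here is a
rough userspace sketch of the state handling only. This is not btrfs
code: the flag name, the struct and all three functions are made up for
illustration, and the assumption that the bit is dropped only after a
scrub finishes with zero unrepaired errors is mine, not something that
has been agreed on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical persistent flag; NOT an existing btrfs superblock flag. */
#define SKETCH_FLAG_DEGRADED_WRITES (1ULL << 0)

/* Stand-in for the filesystem state; names are illustrative only. */
struct sketch_fs {
	uint64_t super_flags;	/* flags that would live in the superblock */
	bool degraded;		/* currently mounted with a device missing */
};

/* Write path: remember that a write happened while a device was missing. */
static void sketch_note_write(struct sketch_fs *fs)
{
	assert(fs != NULL);
	if (fs->degraded)
		fs->super_flags |= SKETCH_FLAG_DEGRADED_WRITES;
}

/*
 * Full (non-degraded) mount: returns true if a scrub should be
 * recommended because degraded writes were recorded earlier.
 */
static bool sketch_mount_full(struct sketch_fs *fs)
{
	assert(fs != NULL);
	fs->degraded = false;
	return (fs->super_flags & SKETCH_FLAG_DEGRADED_WRITES) != 0;
}

/*
 * Scrub completion: drop the bit only when all devices are present and
 * the scrub left nothing unrepaired (assumed policy).
 */
static void sketch_scrub_done(struct sketch_fs *fs, uint64_t unrepaired_errors)
{
	assert(fs != NULL);
	if (!fs->degraded && unrepaired_errors == 0)
		fs->super_flags &= ~SKETCH_FLAG_DEGRADED_WRITES;
}
```

The point of keeping the bit persistent is that the recommendation
survives any number of remounts until a clean scrub has actually run.
What the real mount path would do when the bit is set (warn, refuse,
force read-only) is the open question from above and is left out here.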