From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: 
Received: from smtp-16.italiaonline.it ([212.48.25.144]:48999 "EHLO libero.it"
	rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1751432AbcK1Spv
	(ORCPT ); Mon, 28 Nov 2016 13:45:51 -0500
Reply-To: kreijack@inwind.it
Subject: Re: [PATCH] btrfs: raid56: Use correct stolen pages to calculate P/Q
References: <20161121085016.7148-1-quwenruo@cn.fujitsu.com>
 <94606bda-dab0-e7c9-7fc6-1af9069b64fc@inwind.it>
 <20161125043119.GG8685@hungrycats.org>
 <20161126185402.GK8685@hungrycats.org>
 <59e0b1c7-51a9-ede4-6571-fa0b20394145@inwind.it>
 <451f26a9-2880-9887-c17e-97c2690a2baa@cn.fujitsu.com>
To: Qu Wenruo, Zygo Blaxell
Cc: linux-btrfs@vger.kernel.org
From: Goffredo Baroncelli
Message-ID: 
Date: Mon, 28 Nov 2016 19:45:49 +0100
MIME-Version: 1.0
In-Reply-To: <451f26a9-2880-9887-c17e-97c2690a2baa@cn.fujitsu.com>
Content-Type: text/plain; charset=utf-8
Sender: linux-btrfs-owner@vger.kernel.org
List-ID: 

On 2016-11-28 01:40, Qu Wenruo wrote:
> 
> At 11/27/2016 07:16 AM, Goffredo Baroncelli wrote:
>> On 2016-11-26 19:54, Zygo Blaxell wrote:
>>> On Sat, Nov 26, 2016 at 02:12:56PM +0100, Goffredo Baroncelli wrote:
>>>> On 2016-11-25 05:31, Zygo Blaxell wrote:
>> [...]
>>>>
>>>> BTW Btrfs in RAID1 mode corrects the data even in the read case. So
>>>
>>> Have you tested this? I think you'll find that it doesn't.
>>
>> Yes I tested it; and it does the rebuild automatically.
>> I corrupted a disk of the mirror, then I read the related file. The log says:
>>
>> [ 59.287748] BTRFS warning (device vdb): csum failed ino 257 off 0 csum 12813760 expected csum 3114703128
>> [ 59.291542] BTRFS warning (device vdb): csum failed ino 257 off 0 csum 12813760 expected csum 3114703128
>> [ 59.294950] BTRFS info (device vdb): read error corrected: ino 257 off 0 (dev /dev/vdb sector 2154496)
>>                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>
>> IIRC in the RAID5/6 case the last line is missing. In both cases the data returned is good, but in RAID1 the data is also corrected on the disk.
>>
>> Where did you read that the data is not rebuilt automatically?
>>
>> In fact I was surprised that RAID5/6 behaves differently....
>>
> 
> Yes, I also tried that and realized that RAID1 is recovering corrupted data at *READ* time.
> 
> The main difference between RAID1 and RAID56 seems to be the complexity.
> 
> For RAID56, we have different read/write behavior; for read, we use the flag BTRFS_RBIO_READ_REBUILD, which will only rebuild the data but not write it back to disk.
> And I'm a little concerned about the race between a read-time fix and a write.
> 
> I assume it's possible to change the behavior to follow RAID1, but I'd like to do it in the following steps:
> 1) Fix known RAID56 bugs
>    With the v3 patch and the previous 2 patches, it seems OK now.
> 2) Full fstests test cases, with all possible corruption combinations
>    (WIP)
> 3) Rework the current RAID56 code into a cleaner and more readable state
>    (long term)
> 4) Add support to fix things at read time.
> 
> So the behavior change is not something we will see in the short term.

+1

My understanding is that the RAID5/6 code is in bad enough shape that we first need to fix the most critical bugs and then extend the tests to guard against regressions.
On point 3, I don't know the code well enough to comment; the code is very complex.
I see point 4 as the least urgent.

Let me make a request: I would like to know your opinion about my email "RFC: raid with a variable stripe size", which started a little thread.
I am asking this because you now have your hands on this code: would my suggestion (use different BGs, with different stripe sizes, to avoid RMW cycles) or Zygo's one (don't fill a stripe if you don't need to, to avoid RMW cycles) be difficult to implement?
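
To make the question concrete, here is a rough user-space sketch (my own illustration, not btrfs code; the 2+1 geometry, the helper name and the 64KiB/32KiB stripe lengths are just example numbers) of when a stripe-aligned RAID5 write can skip the RMW cycle:

/* Toy illustration only, not btrfs code. */
#include <stdio.h>

/*
 * A full stripe holds (ndata * stripe_len) bytes of data plus parity.
 * A stripe-aligned write that covers whole full stripes can compute the
 * parity from the new data alone; anything smaller forces a
 * read-modify-write of the stripe.
 */
static int needs_rmw(unsigned long len, int ndata, unsigned long stripe_len)
{
	unsigned long full_stripe = (unsigned long)ndata * stripe_len;

	return len == 0 || (len % full_stripe) != 0;
}

int main(void)
{
	int ndata = 2;			/* 3-disk RAID5: 2 data + 1 parity */
	unsigned long write_len = 64 * 1024;

	/* today: one stripe length for every block group */
	printf("64KiB write, 64KiB stripe: %s\n",
	       needs_rmw(write_len, ndata, 64 * 1024) ? "RMW" : "no RMW");

	/*
	 * hypothetical BG with a smaller stripe length, as in the RFC:
	 * the same 64KiB write now covers a whole stripe.
	 */
	printf("64KiB write, 32KiB stripe: %s\n",
	       needs_rmw(write_len, ndata, 32 * 1024) ? "RMW" : "no RMW");
	return 0;
}

Zygo's variant attacks the same condition from the other side: instead of shrinking the stripe, it avoids going back later to fill the unused tail of a stripe, so the parity never has to be recomputed against old data.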
BR
G.Baroncelli

> 
> Thanks,
> Qu

-- 
gpg @keyserver.linux.it: Goffredo Baroncelli
Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5