From: Goffredo Baroncelli <kreijack@inwind.it>
To: Qu Wenruo <quwenruo@cn.fujitsu.com>,
Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH] btrfs: raid56: Use correct stolen pages to calculate P/Q
Date: Mon, 28 Nov 2016 19:45:49 +0100
Message-ID: <ad8f8dd8-123f-bbf3-04fb-bb6b64a050fd@inwind.it>
In-Reply-To: <451f26a9-2880-9887-c17e-97c2690a2baa@cn.fujitsu.com>
On 2016-11-28 01:40, Qu Wenruo wrote:
>
> At 11/27/2016 07:16 AM, Goffredo Baroncelli wrote:
>> On 2016-11-26 19:54, Zygo Blaxell wrote:
>>> On Sat, Nov 26, 2016 at 02:12:56PM +0100, Goffredo Baroncelli wrote:
>>>> On 2016-11-25 05:31, Zygo Blaxell wrote:
>> [...]
>>>>
>>>> BTW Btrfs in RAID1 mode corrects the data even in the read case. So
>>>
>>> Have you tested this? I think you'll find that it doesn't.
>>
>> Yes, I tested it, and it does the rebuild automatically.
>> I corrupted one disk of the mirror, then read the affected file. The log says:
>>
>> [ 59.287748] BTRFS warning (device vdb): csum failed ino 257 off 0 csum 12813760 expected csum 3114703128
>> [ 59.291542] BTRFS warning (device vdb): csum failed ino 257 off 0 csum 12813760 expected csum 3114703128
>> [ 59.294950] BTRFS info (device vdb): read error corrected: ino 257 off 0 (dev /dev/vdb sector 2154496)
>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>
>> IIRC, in the RAID5/6 case the last line is missing. In both cases the data returned is good, but with RAID1 the data is also corrected on the disk.
>>
>> Where did you read that the data is not rebuilt automatically?
>>
>> In fact I was surprised that RAID5/6 behaves differently....
>>
>
> Yes, I also tried that and realized that RAID1 is recovering corrupted data at *READ* time.
>
> The main difference between RAID1 and RAID56 seems to be the complexity.
>
> For RAID56, reads and writes behave differently. For reads we use the flag BTRFS_RBIO_READ_REBUILD, which only rebuilds the data and does not write it back to disk.
> And I'm a little concerned about the race between a read-time fix and a write.
>
> I assume it's possible to change the behavior to follow RAID1, but I'd like to do it in the following steps:
> 1) Fix known RAID56 bugs
>    With the v3 patch and previous 2 patches, it seems OK now.
> 2) Full fstests test case, with all possible corruption combinations
>    (WIP)
> 3) Rework current RAID56 code to a cleaner and more readable state
>    (long term)
> 4) Add support for fixing things at read time.
>
> So the behavior change is not something we will see in the short term.
+1
As I understand it, the RAID5/6 code is in such a bad state that we first need to fix the most critical bugs and then extend the tests to guard against regressions.
On point 3, I don't know the code well enough to comment; it is very complex.
I see point 4 as the least urgent.
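To make the difference behind point 4 concrete, here is a toy sketch in Python (not kernel code). It assumes in-memory bytearrays standing in for disks, zlib.crc32 as a stand-in for the real checksum, and single-parity XOR only. The function names are made up for illustration. The first function repairs the bad mirror at read time, as the RAID1 log above suggests; the second only rebuilds the data for the caller, which is how the BTRFS_RBIO_READ_REBUILD behavior was described.

```python
import zlib


def csum(block):
    # Stand-in checksum for the sketch; not the checksum btrfs actually uses.
    return zlib.crc32(bytes(block))


def raid1_read(mirrors, expected):
    """Return a good copy and overwrite every mirror that fails the checksum."""
    good = next((bytes(m) for m in mirrors if csum(m) == expected), None)
    if good is None:
        raise IOError("no mirror matches the expected checksum")
    for m in mirrors:
        if csum(m) != expected:
            m[:] = good  # read-time repair: the bad copy is fixed "on disk"
    return good


def xor_blocks(blocks):
    """XOR equally sized blocks together (single parity, RAID5 style)."""
    out = bytearray(len(blocks[0]))
    for b in blocks:
        for i, byte in enumerate(b):
            out[i] ^= byte
    return bytes(out)


def raid5_read_rebuild(data, parity, index, expected):
    """Return good data for data[index]; never write anything back."""
    block = bytes(data[index])
    if csum(block) == expected:
        return block
    others = [bytes(d) for i, d in enumerate(data) if i != index]
    rebuilt = xor_blocks(others + [bytes(parity)])
    if csum(rebuilt) != expected:
        raise IOError("rebuild from parity failed")
    return rebuilt  # the corrupted block stays corrupted "on disk"
```

In both cases the caller gets good data; only the first variant leaves the "disk" healthy afterwards. The sketch ignores the race between a read-time fix and a concurrent write, which is exactly the hard part.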
Let me make a request: I would like to know your opinion about my email "RFC: raid with a variable stripe size", which started a small thread. I am asking because you are now hands-on with this code: is my suggestion (use different BGs with different stripe sizes to avoid RMW cycles) or Zygo's (don't fill a stripe if you don't need to, so as to avoid RMW cycles) difficult to implement?
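As a rough illustration of the arithmetic behind my suggestion, the Python sketch below checks whether a write needs an RMW cycle and greedily splits a write into full stripes across block groups of different widths. It assumes a 64 KiB per-disk stripe element and a hypothetical set of BGs with 4, 2 and 1 data disks. The function names are invented for this example, and nothing here reflects how the allocator actually works.

```python
STRIPE_ELEMENT = 64 * 1024  # assumed per-disk stripe element size, in bytes


def needs_rmw(offset, length, data_disks):
    """True if the write does not start and end on full-stripe boundaries."""
    full_stripe = STRIPE_ELEMENT * data_disks
    return offset % full_stripe != 0 or length % full_stripe != 0


def split_into_full_stripes(length, widths):
    """Greedily cover `length` bytes with full stripes from BGs of the given
    data-disk counts. Returns ({data_disks: full_stripe_count}, leftover_bytes).
    A non-zero leftover would still need a partial-stripe (RMW) write.
    """
    plan = {}
    remaining = length
    for w in sorted(widths, reverse=True):
        full_stripe = STRIPE_ELEMENT * w
        count = remaining // full_stripe
        if count:
            plan[w] = count
            remaining -= count * full_stripe
    return plan, remaining


# A 448 KiB write on a single BG with 4 data disks is not stripe aligned,
plan, leftover = split_into_full_stripes(448 * 1024, [4, 2, 1])
# but it splits into one full stripe in each of the 4-, 2- and 1-disk BGs.
```

With only the 4-disk BG, `needs_rmw(0, 448 * 1024, 4)` is true; with the three BGs the same write is covered entirely by full stripes, as long as the length is a multiple of the stripe element.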
BR
G.Baroncelli
>
> Thanks,
> Qu
--
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5
Thread overview: 28+ messages
2016-11-21 8:50 [PATCH] btrfs: raid56: Use correct stolen pages to calculate P/Q Qu Wenruo
2016-11-21 18:48 ` Goffredo Baroncelli
2016-11-22 0:28 ` Qu Wenruo
2016-11-22 18:02 ` Goffredo Baroncelli
2016-11-25 4:31 ` Zygo Blaxell
2016-11-25 4:40 ` Gareth Pye
2016-11-25 5:07 ` Zygo Blaxell
2016-11-26 13:12 ` Goffredo Baroncelli
2016-11-26 18:54 ` Zygo Blaxell
2016-11-26 23:16 ` Goffredo Baroncelli
2016-11-27 16:53 ` Zygo Blaxell
2016-11-28 0:40 ` Qu Wenruo
2016-11-28 18:45 ` Goffredo Baroncelli [this message]
2016-11-28 19:01 ` Christoph Anton Mitterer
2016-11-28 19:39 ` Austin S. Hemmelgarn
2016-11-28 3:37 ` Christoph Anton Mitterer
2016-11-28 3:53 ` Andrei Borzenkov
2016-11-28 4:01 ` Christoph Anton Mitterer
2016-11-28 18:32 ` Goffredo Baroncelli
2016-11-28 19:00 ` Christoph Anton Mitterer
2016-11-28 21:48 ` Zygo Blaxell
2016-11-29 1:52 ` Christoph Anton Mitterer
2016-11-29 3:19 ` Zygo Blaxell
2016-11-29 7:35 ` Adam Borowski
2016-11-29 14:24 ` Christoph Anton Mitterer
2016-11-22 18:58 ` Chris Mason
2016-11-23 0:26 ` Qu Wenruo
2016-11-26 17:18 ` Chris Mason