From: MRK <mrk@shiftmail.org>
To: Janos Haar <janos.haar@netcenter.hu>
Cc: linux-raid@vger.kernel.org, Neil Brown <neilb@suse.de>
Subject: Re: Suggestion needed for fixing RAID6
Date: Wed, 28 Apr 2010 01:02:14 +0200
Message-ID: <4BD76CF6.5020804@shiftmail.org>
In-Reply-To: <80a201cae621$684daa30$0400a8c0@dcccs>
On 04/27/2010 05:50 PM, Janos Haar wrote:
>
> ----- Original Message ----- From: "MRK" <mrk@shiftmail.org>
> To: "Janos Haar" <janos.haar@netcenter.hu>
> Cc: <linux-raid@vger.kernel.org>
> Sent: Monday, April 26, 2010 6:53 PM
> Subject: Re: Suggestion needed for fixing RAID6
>
>
>> On 04/26/2010 02:52 PM, Janos Haar wrote:
>>>
>>> Oops, you are right!
>>> It was my mistake.
>>> Sorry, i will try it again, to support 2 drives with dm-cow.
>>> I will try it.
>>
>> Great! post here the results... the dmesg in particular.
>> The dmesg should contain multiple lines like this "raid5:md3: read
>> error corrected ....."
>> then you know it worked.
>
> I am affraid i am still right about that....
>
> ...
> end_request: I/O error, dev sdh, sector 1667152256
> raid5:md3: read error not correctable (sector 1662188168 on dm-1).
> raid5: Disk failure on dm-1, disabling device.
> raid5: Operation continuing on 10 devices.
I think I can see a problem here:
You had 11 active devices out of 12 when you received the read error.
With 11 of 12 devices the array is singly-degraded, which should still be
enough for raid6 to recompute the block from parity and rewrite it,
correcting the read error. Instead MD declared the error impossible to
correct and dropped one more device (going to doubly-degraded).
I think this is an MD bug, and I think I know where it is:
--- linux-2.6.33-vanilla/drivers/md/raid5.c 2010-02-24 19:52:17.000000000 +0100
+++ linux-2.6.33/drivers/md/raid5.c 2010-04-27 23:58:31.000000000 +0200
@@ -1526,7 +1526,7 @@ static void raid5_end_read_request(struc
 		clear_bit(R5_UPTODATE, &sh->dev[i].flags);
 		atomic_inc(&rdev->read_errors);
-		if (conf->mddev->degraded)
+		if (conf->mddev->degraded == conf->max_degraded)
 			printk_rl(KERN_WARNING
 				  "raid5:%s: read error not correctable "
 				  "(sector %llu on %s).\n",
------------------------------------------------------
(This is only compile-tested, so try it at your own risk.)
I'd like to hear what Neil thinks of this...
The problem here (apart from the erroneous error message) is that once
execution enters that "if" clause, it eventually reaches the
md_error() call some 30 lines further down, which drops one more device.
That worsens the situation instead of recovering from it, which as far as
I understand is not the correct behaviour in this case.
As it stands, raid6 behaves as if it were a raid5, effectively
supporting only one failed disk.
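
To make the difference between the two conditions concrete, here is a
small standalone sketch in userspace C. It is not kernel code: the struct
and the function names are made up for illustration, and only the two
fields "degraded" and "max_degraded" are taken from the patch above. It
assumes max_degraded is 2 for raid6 and 1 for raid5.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two values the patch compares. */
struct array_state {
    int degraded;     /* members currently missing or failed */
    int max_degraded; /* assumed: 2 for raid6, 1 for raid5 */
};

/* Vanilla 2.6.33 test: any degradation is treated as "not correctable". */
static bool uncorrectable_vanilla(const struct array_state *s)
{
    return s->degraded != 0;
}

/* Test proposed in the patch: give up only when no redundancy is left. */
static bool uncorrectable_patched(const struct array_state *s)
{
    return s->degraded == s->max_degraded;
}

/* Walks through the cases discussed in this thread. */
static void check_examples(void)
{
    /* raid6 with 11 of 12 members: the case from the dmesg above. */
    struct array_state raid6_one_missing = { .degraded = 1, .max_degraded = 2 };
    assert(uncorrectable_vanilla(&raid6_one_missing));  /* gives up */
    assert(!uncorrectable_patched(&raid6_one_missing)); /* tries to correct */

    /* raid6 with 10 of 12 members: no redundancy left, both give up. */
    struct array_state raid6_two_missing = { .degraded = 2, .max_degraded = 2 };
    assert(uncorrectable_vanilla(&raid6_two_missing));
    assert(uncorrectable_patched(&raid6_two_missing));

    /* raid5 with one member missing: both conditions agree. */
    struct array_state raid5_one_missing = { .degraded = 1, .max_degraded = 1 };
    assert(uncorrectable_vanilla(&raid5_one_missing));
    assert(uncorrectable_patched(&raid5_one_missing));

    /* Fully healthy raid6: both conditions allow the correction. */
    struct array_state raid6_healthy = { .degraded = 0, .max_degraded = 2 };
    assert(!uncorrectable_vanilla(&raid6_healthy));
    assert(!uncorrectable_patched(&raid6_healthy));
}
```

Calling check_examples() from a main() runs all four cases. The only case
where the two conditions disagree is the singly-degraded raid6, which is
exactly the situation in the dmesg quoted above.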