From: Neil Brown <neilb@suse.de>
To: MRK <mrk@shiftmail.org>
Cc: Janos Haar <janos.haar@netcenter.hu>, linux-raid@vger.kernel.org
Subject: Re: Suggestion needed for fixing RAID6
Date: Tue, 4 May 2010 07:04:32 +1000
Message-ID: <20100504070432.371a9df9@notabene.brown>
In-Reply-To: <4BDEA394.6010502@shiftmail.org>
On Mon, 03 May 2010 12:21:08 +0200
MRK <mrk@shiftmail.org> wrote:
> On 05/03/2010 12:04 PM, MRK wrote:
> > On 05/03/2010 04:17 AM, Neil Brown wrote:
> >> On Sat, 1 May 2010 23:44:04 +0200
> >> "Janos Haar"<janos.haar@netcenter.hu> wrote:
> >>
> >>> The general problem is, I have one single-degraded RAID6 + 2
> >>> badblock disks inside, which have bad blocks in different locations.
> >>> The big question is how to keep the integrity, or how to do the
> >>> rebuild in 2 steps instead of one continuous one?
> >> Once you have the fix that has already been discussed in this thread,
> >> the only other problem I can see with this situation is if attempts to
> >> write good data over the read-errors results in a write-error which
> >> causes the device to be evicted from the array.
> >>
> >> And I think you have reported getting write errors.
> >
> > His dmesg AFAIR has never reported any error of the kind "raid5:%s:
> > read error NOT corrected!! " (the error message you get on failed
> > rewrite AFAIU)
> > Up to now (after my patch) he only tried with MD above DM-COW and DM
> > was dropping the drive on read error so I think MD didn't get any
> > opportunity to rewrite.
> >
> > It is not clear to me what kind of error MD got from DM:
> >
> > Apr 29 09:50:29 Clarus-gl2k10-2 kernel: device-mapper: snapshots:
> > Invalidating snapshot: Error reading/writing.
> > Apr 29 09:50:29 Clarus-gl2k10-2 kernel: ata8: EH complete
> > Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5: Disk failure on dm-1,
> > disabling device.
> >
> > I don't understand from what place the md_error() is called...
> > [CUT]
>
> Oh and there is another issue I wanted to expose:
>
> His last dmesg:
> http://download.netcenter.hu/bughunt/20100430/messages
>
> Much after the line:
> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5: Disk failure on dm-1,
> disabling device.
>
> there are many lines like this:
> Apr 29 09:50:29 Clarus-gl2k10-2 kernel: raid5:md3: read error not
> correctable (sector 1662189872 on dm-1).
>
> How come MD still wants to read from a device it has disabled?
> looks like a problem to me...
There are often many IO requests in flight at the same time. When one
returns with an error we might fail the device but there are still lots more
that have not yet completed. As they complete we might write messages about
them - even after we have reported the device as 'failed'. But we never
initiate an IO after the device has been marked 'faulty'.
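
To illustrate the ordering described above, here is a toy model in Python.
It is not the md code: the names Device, simulate and queue_depth are made
up for this sketch, and the message strings only imitate the dmesg lines
quoted earlier. Requests already in flight still complete and get logged
after the device is marked faulty, but nothing new is submitted once the
flag is set.

```python
from collections import deque


class Device:
    """Hypothetical stand-in for a member device; not an md structure."""

    def __init__(self, name, bad_sectors):
        self.name = name
        self.bad_sectors = set(bad_sectors)
        self.faulty = False


def simulate(device, sectors, queue_depth):
    """Issue reads with up to queue_depth requests in flight at once.

    Returns (log, submitted): the messages written and the sectors for
    which an IO was actually initiated.
    """
    log = []
    pending = deque(sectors)
    in_flight = deque()
    submitted = []

    def submit():
        # New IO is only initiated while the device is not faulty.
        while pending and len(in_flight) < queue_depth and not device.faulty:
            sector = pending.popleft()
            in_flight.append(sector)
            submitted.append(sector)

    submit()
    while in_flight:
        # Requests issued earlier complete one by one, even after failure.
        sector = in_flight.popleft()
        if sector in device.bad_sectors:
            log.append(f"read error (sector {sector} on {device.name})")
            if not device.faulty:
                device.faulty = True
                log.append(f"Disk failure on {device.name}, disabling device.")
        submit()
    return log, submitted


if __name__ == "__main__":
    dev = Device("dm-1", bad_sectors={2, 3, 5})
    log, submitted = simulate(dev, sectors=range(8), queue_depth=4)
    for line in log:
        print(line)
    print("submitted:", submitted)
```

With bad sectors 2, 3 and 5 and four requests in flight, the errors for
sectors 3 and 5 are logged after the "Disk failure" line because those
reads were issued before the failure, while sectors 6 and 7 are never
submitted at all.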
NeilBrown
> MD also scrubs failed devices during check?