From: NeilBrown <neilb@suse.de>
To: Gavin Flower <gavinflower@yahoo.com>
Cc: linux-raid@vger.kernel.org
Subject: Re: RAID6 data-check took almost 2 hours, clicking sounds, system unresponsive
Date: Fri, 8 Apr 2011 21:50:00 +1000
Message-ID: <20110408215000.15c881bb@notabene.brown>
In-Reply-To: <1215.67697.qm@web65110.mail.ac2.yahoo.com>
On Fri, 8 Apr 2011 02:59:52 -0700 (PDT) Gavin Flower <gavinflower@yahoo.com>
wrote:
>
> --- On Fri, 8/4/11, NeilBrown <neilb@suse.de> wrote:
>
> > From: NeilBrown <neilb@suse.de>
> > Subject: Re: RAID6 data-check took almost 2 hours, clicking sounds, system unresponsive
> > To: "Gavin Flower" <gavinflower@yahoo.com>
> > Cc: linux-raid@vger.kernel.org
> > Date: Friday, 8 April, 2011, 21:34
> > On Thu, 7 Apr 2011 18:32:04 -0700 (PDT) Gavin Flower <gavinflower@yahoo.com>
> > wrote:
> >
> > > Hi Neil,
> > >
> > > My original email may have been eaten, as it did not appear on the list,
> > > nor did I get an error message back. So perhaps there was a problem with
> > > the attached files.
> > >
> > > I will resend the attachments one at a time in separate emails.
> > >
> > >
> > > Cheers,
> > > Gavin
> > >
> > > [begin original]
> > > Hi Neil,
> > >
> > > Your help (or anybody else's) would be greatly appreciated, yet again.
> >
> > Hi Gavin,
> > it isn't clear to me what help you want.
> >
> > Obviously there is some sort of hardware issue - possibly a drive,
> > possibly a bus problem - I really don't know.
> >
> > Apart from that things look normal.
> >
> > What exactly did you want explained?
> >
> > NeilBrown
>
> I guess I was surprised that the RAID system appeared normal and that it did not register any errors. I was hoping to get an idea as to which drive was problematic.
sdc2 was reporting read errors. md/raid6 computed the data from the other
devices and wrote it back to sdc2. This appeared to work, so md/raid6 assumed
everything was fine again. It reported this:
Apr 7 08:42:08 saturn kernel: [210414.109880] md/raid:md1: read error corrected (8 sectors at 17195840 on sdc2)
but didn't mark any device as failed.
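To see at a glance which member devices md has had to correct, the "read error corrected" messages can be tallied from the kernel log. This is a minimal sketch, not part of md or mdadm: the function name md_corrected_by_device is hypothetical, and it assumes the message format quoted above and a log file such as /var/log/messages (the path varies by distribution).

```shell
# Hypothetical helper: count md "read error corrected" messages per member
# device. Assumes the kernel message format quoted above, e.g.
#   md/raid:md1: read error corrected (8 sectors at 17195840 on sdc2)
# Reads a log file given as $1, or stdin if no argument is given.
md_corrected_by_device() {
    grep -o 'read error corrected ([0-9]* sectors at [0-9]* on [^)]*)' "${1:--}" |
        sed 's/.* on \([^)]*\))$/\1/' |   # keep only the device name, e.g. sdc2
        sort | uniq -c |                  # one line per device with a count
        awk '{print $1, $2}'              # "<count> <device>"
}
```

For example, md_corrected_by_device /var/log/messages, or dmesg | md_corrected_by_device. Each output line is a count followed by a device name such as sdc2.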
>
> I get the feeling, from your reply, that this is not specifically a RAID problem, that it just happens to affect a RAID array.
No, it was clearly a disk-drive problem.
e.g.
Apr 7 14:42:12 saturn kernel: [231957.756023] ata3.00: failed command: READ FPDMA QUEUED
A READ command sent to an 'ata' device failed, i.e. a disk error.
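Similarly, libata "failed command" messages can be counted per ATA device to see which port keeps erroring. Again a hypothetical sketch (ata_failed_by_port is an illustrative name, not an existing tool), assuming the message format shown above:

```shell
# Hypothetical helper: count libata "failed command" messages per ATA
# device (e.g. "ata3.00"). Assumes the format quoted above, e.g.
#   ata3.00: failed command: READ FPDMA QUEUED
# Reads a log file given as $1, or stdin if no argument is given.
ata_failed_by_port() {
    grep -o 'ata[0-9][0-9]*\.[0-9][0-9]*: failed command' "${1:--}" |
        cut -d: -f1 |                     # keep "ataN.MM"
        sort | uniq -c |                  # one line per ATA device with a count
        awk '{print $1, $2}'              # "<count> <ata device>"
}
```

To relate a port such as ata3 to a disk, on many libata systems the resolved sysfs path of the disk contains the port name (e.g. readlink -f /sys/block/sdc); confirm that mapping on your own kernel before pulling a drive.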
>
> I had thought that the RAID system should have been able to give me better diagnostics, but possibly I am being (inadvertently) unreasonable!
Well... it did tell you that it got a read error and corrected it.
>
> Not sure what the significance of this mismatch is, and what I should do about it.
> # cat /sys/block/md2/md/mismatch_cnt
> 28904
> #
I'm not sure whether read errors end up counting as mismatches. They seem to
for raid1. The raid6 code is more complex and I don't feel like decoding it
right now.
In terms of "what to do about it" - the first thing must be to fix sdc.
Maybe there is a loose or broken cable. Maybe the device needs to be
replaced.
Once you have resolved that and are fairly sure your drives are all working, run
echo check > /sys/block/md2/md/sync_action
Once that finishes, mismatch_cnt should ideally be zero. If it isn't, try
echo repair > /sys/block/md2/md/sync_action
but only do that if you are confident that your devices are good.
The repair pass will report the same mismatch_cnt; however, a subsequent
'check' should then show zero.
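The check step above can be scripted. This is a minimal sketch under stated assumptions: the helper names are hypothetical, it polls sync_action until it reads "idle" again, it defaults to md2 as in the commands above, and writing to sysfs requires root. It deliberately never writes "repair"; as advised above, do that by hand and only once the drives are known to be good.

```shell
#!/bin/sh
# Hypothetical helpers (not part of md or mdadm): run an md "check" pass and
# report mismatch_cnt. Writing to sysfs requires root. These helpers never
# write "repair"; do that by hand, only once every member drive is known good.

# Print the array's current sync action ("idle", "check", "repair", ...).
md_sync_action() { cat "$1/sync_action"; }

# Print the array's mismatch count from the last check/repair pass.
md_mismatch_cnt() { cat "$1/mismatch_cnt"; }

# Write an action to sync_action, then poll every $3 seconds (default 60)
# until the array reports "idle" again.
md_run_and_wait() {
    dir=$1 action=$2 interval=${3:-60}
    echo "$action" > "$dir/sync_action" || return 1
    sleep "$interval"
    while [ "$(md_sync_action "$dir")" != "idle" ]; do
        sleep "$interval"
    done
}

# Run a check on the array whose md sysfs directory is $1 (default: md2,
# as in the commands above), print mismatch_cnt, and return non-zero if
# it is not zero.
md_check_report() {
    md_dir=${1:-/sys/block/md2/md}
    md_run_and_wait "$md_dir" check "${2:-60}" || return 1
    cnt=$(md_mismatch_cnt "$md_dir")
    echo "mismatch_cnt=$cnt"
    [ "$cnt" -eq 0 ]
}
```

Run it as md_check_report, or md_check_report /sys/block/md1/md for another array. A non-zero exit status means mismatch_cnt was not zero after the check, which is the point at which a manual repair followed by another check would be considered.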
NeilBrown