Re: read errors corrected


From: Neil Brown <neilb@suse.de>
To: James <jtp@nc.rr.com>
Cc: linux-raid@vger.kernel.org
Subject: Re: read errors corrected
Date: Fri, 31 Dec 2010 10:12:43 +1100
Message-ID: <20101231101243.666e0f9e@notabene.brown>
In-Reply-To: <AANLkTi=vBWKjQStwpWZ3b=YL93LTDBfGk7b5jKUHLcO7@mail.gmail.com>

On Thu, 30 Dec 2010 11:35:59 -0500 James <jtp@nc.rr.com> wrote:

> Sorry Neil, I meant to reply-all.
> 
> -james
> 
> On Thu, Dec 30, 2010 at 11:35, James <jtp@nc.rr.com> wrote:
> > Inline.
> >
> > On Thu, Dec 30, 2010 at 04:15, Neil Brown <neilb@suse.de> wrote:
> >> On Thu, 30 Dec 2010 03:20:48 +0000 James <jtp@nc.rr.com> wrote:
> >>
> >>> All,
> >>>
> >>> I'm looking for a bit of guidance here. I have a RAID 6 set up on my
> >>> system and am seeing some errors in my logs as follows:
> >>>
> >>> # cat messages | grep "read erro"
> >>> Dec 15 15:40:34 nuova kernel: md/raid:md4: read error corrected (8
> >>> sectors at 974262528 on sda4)
> >>> Dec 15 15:40:34 nuova kernel: md/raid:md4: read error corrected (8
> >>> sectors at 974262536 on sda4)
> >> .....
> >>
> >>>
> >>> I've Google'd the heck out of this error message but am not seeing a
> >>> clear and concise message: is this benign? What would cause these
> >>> errors? Should I be concerned?
> >>>
> >>> There is an error message (read error corrected) on each of the drives
> >>> in the array. They all seem to be functioning properly. The I/O on the
> >>> drives is pretty heavy for some parts of the day.
> >>>
> >>> # cat /proc/mdstat
> >>> Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5]
> >>> [raid4] [multipath]
> >>> md1 : active raid6 sdb1[1] sda1[0] sdd1[3] sdc1[2]
> >>>       497792 blocks level 6, 64k chunk, algorithm 2 [4/4] [UUUU]
> >>>
> >>> md2 : active raid6 sdb2[1] sda2[0] sdd2[3] sdc2[2]
> >>>       4000000 blocks level 6, 64k chunk, algorithm 2 [4/4] [UUUU]
> >>>
> >>> md3 : active raid6 sdb3[1] sda3[0] sdd3[3] sdc3[2]
> >>>       25992960 blocks level 6, 64k chunk, algorithm 2 [4/4] [UUUU]
> >>>
> >>> md4 : active raid6 sdb4[1] sda4[0] sdd4[3] sdc4[2]
> >>>       2899780480 blocks level 6, 64k chunk, algorithm 2 [4/4] [UUUU]
> >>>
> >>> unused devices: <none>
> >>>
> >>> I have a really hard time believing there's something wrong with all
> >>> of the drives in the array, although admittedly they're the same model
> >>> from the same manufacturer.
> >>>
> >>> Can someone point me in the right direction?
> >>> (a) what causes these errors precisely?
> >>
> > >> When md/raid6 tries to read from a device and gets a read error, it tries to
> > >> read from the other devices.  When that succeeds, it computes the data that
> > >> it had tried to read and then writes it back to the original drive.  If this
> > >> succeeds, it assumes that the read error has been corrected by the write, and
> > >> prints the message that you see.
> >>
> >>
> >>> (b) is the error benign? How can I determine if it is *likely* a
> >>> hardware problem? (I imagine it's probably impossible to tell if it's
> >>> HW until it's too late)
> >>
> > >> A few occasional messages like this are fairly benign.  They could be a sign
> > >> that the drive surface is degrading.  If you see lots of these messages, then
> > >> you should seriously consider replacing the drive.
> >
> > Wow, this is hard for me to believe considering this is happening on
> > all the drives. It's not impossible, however, since the drives are
> > likely from the same batch.
> >
> > >> As you are seeing these messages across all devices, it is possible that the
> > >> problem is with the SATA controller rather than the disks.  To know which, you
> > >> should check the errors that are reported in dmesg.  If you don't understand
> > >> these messages, then post them to the list - feel free to post several hundred
> > >> lines of logs - too much is much, much better than not enough.
> >
> > I posted a few errors in my response to the thread a bit ago -- here's
> > another snippet:
> >
> > Dec 29 01:55:03 nuova kernel: sd 1:0:0:0: [sdc] Unhandled error code
> > Dec 29 01:55:03 nuova kernel: sd 1:0:0:0: [sdc] Result: hostbyte=0x00
> > driverbyte=0x06
> > Dec 29 01:55:03 nuova kernel: sd 1:0:0:0: [sdc] CDB: cdb[0]=0x28: 28
> > 00 25 a2 a0 6a 00 00 80 00
> > Dec 29 01:55:03 nuova kernel: end_request: I/O error, dev sdc, sector 631414890
> > Dec 29 01:55:03 nuova kernel: sd 0:0:1:0: [sdb] Unhandled error code
> > Dec 29 01:55:03 nuova kernel: sd 0:0:1:0: [sdb] Result: hostbyte=0x00
> > driverbyte=0x06

"Unhandled error code" sounds like it could be a driver problem...

Try googling that error message...

http://us.generation-nt.com/answer/2-6-33-libata-issues-via-sata-pata-controller-help-197123882.html


"Also, please try the latest 2.6.34-rc kernel, as that has several fixes
for both pata_via and sata_via which did not make 2.6.33."

What kernel are you running???

NeilBrown
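
For reference, the CDB bytes in the quoted sdc and sdb errors can be decoded
by hand to confirm which command failed and where. This is only a minimal
Python sketch, assuming the usual 10-byte READ(10) layout (opcode 0x28,
big-endian LBA in bytes 2-5, transfer length in bytes 7-8). The function name
decode_read10 is made up for illustration and is not part of any tool
mentioned in this thread.

```python
def decode_read10(cdb_hex: str) -> dict:
    """Decode a SCSI READ(10) CDB given as space-separated hex bytes.

    Assumes the standard 10-byte layout: byte 0 is the opcode (0x28),
    bytes 2-5 hold the big-endian LBA, bytes 7-8 the transfer length.
    """
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a 10-byte READ(10) CDB")
    return {
        "lba": int.from_bytes(cdb[2:6], "big"),
        "blocks": int.from_bytes(cdb[7:9], "big"),
    }


# CDB bytes copied from the quoted sdc and sdb log lines.
print(decode_read10("28 00 25 a2 a0 6a 00 00 80 00"))
print(decode_read10("28 00 25 a2 a0 ea 00 00 38 00"))
```

The decoded LBAs come out as 631414890 and 631415018, which are the sectors
named in the two end_request lines, so both failing commands were plain reads.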




> > Dec 29 01:55:03 nuova kernel: sd 0:0:1:0: [sdb] CDB: cdb[0]=0x28: 28
> > 00 25 a2 a0 ea 00 00 38 00
> > Dec 29 01:55:03 nuova kernel: end_request: I/O error, dev sdb, sector 631415018
> > Dec 29 01:58:23 nuova kernel: md/raid:md4: read error corrected (8
> > sectors at 600923648 on sdb4)
> > Dec 29 01:58:23 nuova kernel: md/raid:md4: read error corrected (8
> > sectors at 600923656 on sdb4)
> > Dec 29 01:58:23 nuova kernel: md/raid:md4: read error corrected (8
> > sectors at 600923664 on sdb4)
> > Dec 29 01:58:23 nuova kernel: md/raid:md4: read error corrected (8
> > sectors at 600923672 on sdb4)
> > Dec 29 01:58:23 nuova kernel: md/raid:md4: read error corrected (8
> > sectors at 600923680 on sdb4)
> > Dec 29 01:58:23 nuova kernel: md/raid:md4: read error corrected (8
> > sectors at 600923688 on sdb4)
> > Dec 29 01:58:23 nuova kernel: md/raid:md4: read error corrected (8
> > sectors at 600923696 on sdb4)
> > Dec 29 01:58:23 nuova kernel: md/raid:md4: read error corrected (8
> > sectors at 600923520 on sdc4)
> > Dec 29 01:58:23 nuova kernel: md/raid:md4: read error corrected (8
> > sectors at 600923528 on sdc4)
> > Dec 29 01:58:23 nuova kernel: md/raid:md4: read error corrected (8
> > sectors at 600923536 on sdc4)
> >
> > Is there a good way to determine if the issue is with the motherboard
> > (where the SATA controller is), or with the drives themselves?
> >
> >> NeilBrown
> >>
> >>
> >>
> >>> (c) are these errors expected in a RAID array that is heavily used?
> >>> (d) what kind of errors should I see regarding "read errors" that
> >>> *would* indicate an imminent hardware failure?
> >>>
> >>> Thoughts and ideas would be welcomed. I'm sure a thread where some
> >>> hefty discussion is thrown at this topic will help future Googlers
> >>> like me. :)
> >>>
> >>> -james
> >>> --
> >>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> >>> the body of a message to majordomo@vger.kernel.org
> >>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>
> >>
> >


