Re: Raid 6 - TLER/CCTL/ERC

linux-raid.vger.kernel.org archive mirror

From: John Robinson <john.robinson@anonymous.org.uk>
To: Peter Zieba <pzieba@networkmayhem.com>
Cc: linux-raid@vger.kernel.org
Subject: Re: Raid 6 - TLER/CCTL/ERC
Date: Wed, 06 Oct 2010 21:24:41 +0100
Message-ID: <4CACDB09.20808@anonymous.org.uk>
In-Reply-To: <556404795.674.1286344296358.JavaMail.root@mail.networkmayhem.com>

On 06/10/2010 06:51, Peter Zieba wrote:
> Hey all,
>
> I have a question regarding Linux raid and degraded arrays.
>
> My configuration involves:
>   - 8x Samsung HD103UJ 1TB drives (terrible consumer-grade)

I have some of these drives too. I wouldn't go so far as to call them 
terrible, though 2 out of 3 did manage to get to a couple of pending 
sectors, which went away when I ran badblocks and haven't reappeared.

>   - AOC-USAS-L8i Controller
>   - CentOS 5.5 2.6.18-194.11.1.el5xen (64-bit)
>   - Each drive has one maximum-sized partition.
>   - 8-drives are configured in a raid 6.
>
> My understanding is that with a raid 6, if a disk cannot return a given sector, it should still be possible to reconstruct what that disk should have returned from the other disks. My understanding is also that if this succeeds, the reconstructed data should be written back to the disk that originally failed to read the sector. I'm assuming that's what a message such as this indicates:
> Sep 17 04:01:12 doorstop kernel: raid5:md0: read error corrected (8 sectors at 1647989048 on sde1)
>
> I was hoping to confirm my suspicion on the meaning of that message.

Yup.
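
To make the reconstruction concrete, here is a minimal sketch of RAID 6 parity, assuming the construction described in H. Peter Anvin's "The mathematics of RAID-6" (GF(2^8) with polynomial 0x11d and generator 2). Note that a rebuilt chunk draws on all the surviving members of the stripe, not just two disks. The sketch ignores md's parity rotation, stripe cache and rewrite path; the function names are illustrative only.

```python
# Sketch of RAID 6 P/Q parity over GF(2^8), polynomial 0x11d, generator 2.
# Parity rotation, stripe caching and md's rewrite path are NOT modelled.
from typing import List, Optional, Tuple

POLY = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1

def gf_mul2(a: int) -> int:
    """Multiply a field element by the generator {02}, reducing by POLY."""
    a <<= 1
    if a & 0x100:
        a ^= POLY
    return a

def gf_mul(a: int, b: int) -> int:
    """Multiply two GF(2^8) elements (shift-and-add)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = gf_mul2(a)
        b >>= 1
    return result

def gf_pow(a: int, n: int) -> int:
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result

def gf_inv(a: int) -> int:
    """Inverse via a^254, since every non-zero element satisfies a^255 == 1."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse in GF(2^8)")
    return gf_pow(a, 254)

def syndromes(blocks: List[bytes]) -> Tuple[bytes, bytes]:
    """P = XOR of all data blocks; Q = sum of g^i * D_i, computed bytewise."""
    p = bytearray(len(blocks[0]))
    q = bytearray(len(blocks[0]))
    for i, block in enumerate(blocks):
        coeff = gf_pow(2, i)
        for j, byte in enumerate(block):
            p[j] ^= byte
            q[j] ^= gf_mul(coeff, byte)
    return bytes(p), bytes(q)

def recover_from_p(blocks: List[Optional[bytes]], missing: int, p: bytes) -> bytes:
    """Rebuild data block `missing`: XOR P with every surviving data block."""
    out = bytearray(p)
    for i, block in enumerate(blocks):
        if i != missing:
            for j, byte in enumerate(block):
                out[j] ^= byte
    return bytes(out)

def recover_from_q(blocks: List[Optional[bytes]], missing: int, q: bytes) -> bytes:
    """Rebuild data block `missing` from Q alone (e.g. when P is unreadable too)."""
    partial = bytearray(q)  # ends up holding g^missing * D_missing
    for i, block in enumerate(blocks):
        if i != missing:
            coeff = gf_pow(2, i)
            for j, byte in enumerate(block):
                partial[j] ^= gf_mul(coeff, byte)
    inv = gf_inv(gf_pow(2, missing))
    return bytes(gf_mul(inv, b) for b in partial)

if __name__ == "__main__":
    # One stripe of an 8-drive RAID 6: six data chunks plus P and Q.
    data = [bytes((i * 17 + k) % 256 for k in range(8)) for i in range(6)]
    p, q = syndromes(data)
    damaged = list(data)
    damaged[3] = None  # pretend member 3 returned a read error
    assert recover_from_p(damaged, 3, p) == data[3]
    assert recover_from_q(damaged, 3, q) == data[3]
    print("chunk 3 rebuilt from P and, independently, from Q")
```

Having two independent syndromes is what lets RAID 6 still rebuild a bad sector when one member is already missing; writing the rebuilt chunk back is the step the "read error corrected" message reports.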

> On occasion, I'll also see this:
> Oct  1 01:50:53 doorstop kernel: raid5:md0: read error not correctable (sector 1647369400 on sdh1).
>
> This seems to involve the drive being kicked from the array, even though the drive is still readable for the most part (save for a few sectors).

The above indicates that a write failed. The drive should probably be 
replaced, though if you're seeing a lot of these I'd start suspecting 
cabling, drive chassis and/or SATA controller problems.
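
One way to tell a single bad drive from a cabling or controller problem is to see whether errors cluster on one member or spread across many. A minimal sketch, assuming syslog lines in exactly the two formats quoted in this thread (other kernels word these messages differently); the helper names are hypothetical.

```python
# Hypothetical helper: tally the two md messages quoted in this thread by
# member device. Patterns match the wording quoted above; other kernel
# versions may need different patterns.
import re
from collections import Counter
from typing import Iterable, NamedTuple, Optional

CORRECTED = re.compile(
    r"(?P<md>md\d+): read error corrected "
    r"\((?P<count>\d+) sectors at (?P<sector>\d+) on (?P<dev>\w+)\)"
)
NOT_CORRECTABLE = re.compile(
    r"(?P<md>md\d+): read error not correctable "
    r"\(sector (?P<sector>\d+) on (?P<dev>\w+)\)"
)

class MdReadEvent(NamedTuple):
    kind: str     # "corrected" or "not_correctable"
    array: str    # e.g. "md0"
    device: str   # member partition, e.g. "sde1"
    sector: int   # sector number as printed in the message

def classify(line: str) -> Optional[MdReadEvent]:
    """Return the md read-error event found in a syslog line, or None."""
    for kind, pattern in (("corrected", CORRECTED), ("not_correctable", NOT_CORRECTABLE)):
        m = pattern.search(line)
        if m:
            return MdReadEvent(kind, m["md"], m["dev"], int(m["sector"]))
    return None

def summarize(lines: Iterable[str]) -> Counter:
    """Count events per (device, kind)."""
    counts: Counter = Counter()
    for line in lines:
        event = classify(line)
        if event is not None:
            counts[(event.device, event.kind)] += 1
    return counts
```

For example, `summarize(open("/var/log/messages"))` would tally a CentOS syslog (the path is illustrative). Many members showing errors points towards shared hardware; one member accumulating them points towards that drive.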

Hmm, is yours the SATA controller that doesn't like SMART commands? Or 
at least didn't in older kernels? Do you run smartd? Try without it for 
a bit... If that helps, look on Red Hat Bugzilla and perhaps post a bug 
report.

> What exactly is the criteria for a disk being kicked out of an array?
>
> Furthermore, if an 8-disk raid 6 is running on the bare-minimum 6 disks, why on earth would it kick any more disks out? At this point, doesn't it make sense to simply return an error to whatever tried to read from that part of the array instead of killing the array?

Because a RAID array isn't supposed to return bad data; a bare drive is allowed to.

[...]
> Finally, why do the kernel messages all say "raid5:" when it is clearly a raid 6?

RAIDs 4, 5 and 6 are handled by the raid5 kernel module. Again I think 
the message has been changed in more recent kernels.
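
Whatever prefix the log messages use, /proc/mdstat reports the personality actually in use, along with how many members are working. A minimal sketch that parses it; the TOLERANCE table is the textbook redundancy of each level (an assumption of this sketch, not md's rule for failing a member), and the sample text is illustrative rather than taken from the poster's machine.

```python
# Sketch: read the RAID level and degradation from /proc/mdstat text.
# TOLERANCE is the textbook redundancy per level; it does not describe
# when md decides to fail a member.
import re
from typing import List, NamedTuple

TOLERANCE = {"raid4": 1, "raid5": 1, "raid6": 2}

HEADER = re.compile(r"^(?P<name>md\d+) : active (?P<level>raid\d+) (?P<members>.+)$")
COUNTS = re.compile(r"\[(?P<total>\d+)/(?P<active>\d+)\]")

class ArrayStatus(NamedTuple):
    name: str
    level: str          # personality reported by md, e.g. "raid6"
    total: int          # members the array was built with
    active: int         # members currently working
    failed: List[str]   # members flagged "(F)"

def parse_mdstat(text: str) -> List[ArrayStatus]:
    lines = text.splitlines()
    arrays = []
    for i, line in enumerate(lines):
        header = HEADER.match(line)
        if header is None or i + 1 >= len(lines):
            continue
        counts = COUNTS.search(lines[i + 1])  # "[total/active]" is on the next line
        if counts is None:
            continue
        failed = [tok.split("[")[0] for tok in header["members"].split()
                  if tok.endswith("(F)")]
        arrays.append(ArrayStatus(header["name"], header["level"],
                                  int(counts["total"]), int(counts["active"]), failed))
    return arrays

def remaining_redundancy(status: ArrayStatus) -> int:
    """Further member losses the level can absorb (0 = running on the bare minimum)."""
    return TOLERANCE[status.level] - (status.total - status.active)

# Illustrative sample only.
SAMPLE = (
    "Personalities : [raid6] [raid5] [raid4]\n"
    "md0 : active raid6 sdh1[7](F) sdg1[6] sdf1[5] sde1[4] sdd1[3] sdc1[2] sdb1[1] sda1[0]\n"
    "      5860574208 blocks level 6, 64k chunk, algorithm 2 [8/7] [UUUUUUU_]\n"
    "unused devices: <none>\n"
)
```

On a live system, `parse_mdstat(open("/proc/mdstat").read())` does the same for the real arrays. An 8-disk raid 6 at `[8/6]` gives a remaining redundancy of 0, which is the "bare-minimum 6 disks" case from the question.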

[...]
> Finally, I should mention that I have tried the smartctl erc commands:
> http://www.csc.liv.ac.uk/~greg/projects/erc/
>
> I could not pass them through the controller I was using, but I was able to connect the drives to the controller on the motherboard, set the ERC values, and still had drives dropping out.

Those settings don't stick across power cycles and presumably you 
powered the drives down to change which controller they were connected 
to, so your setting will have been lost.
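
Since the setting is lost on a power cycle, it has to be re-applied at every boot, before the array is assembled. A minimal sketch, assuming the `-l scterc,READ,WRITE` syntax of later smartmontools releases (the patched smartctl linked above may differ) and a 7-second timeout, a commonly chosen value; the device names and helper names are examples only. As Peter found, the command may not pass through every controller.

```python
# Sketch: re-apply SCT ERC timeouts at boot, because drives forget the
# setting when power-cycled. Uses smartctl's "-l scterc,READ,WRITE" option
# from later smartmontools releases. Device names are examples only.
import subprocess
from typing import Iterable, List

def scterc_command(device: str, read_ds: int = 70, write_ds: int = 70) -> List[str]:
    """Build the smartctl call; timeouts are in tenths of a second (70 = 7.0 s)."""
    if read_ds < 0 or write_ds < 0:
        raise ValueError("ERC timeouts must be non-negative")
    return ["smartctl", "-l", f"scterc,{read_ds},{write_ds}", device]

def apply_erc(devices: Iterable[str], read_ds: int = 70, write_ds: int = 70,
              dry_run: bool = True) -> List[List[str]]:
    """Return the commands; run them (as root) only when dry_run is False."""
    commands = [scterc_command(d, read_ds, write_ds) for d in devices]
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError on a non-zero exit status
            subprocess.run(cmd, check=True)
    return commands
```

Keeping `dry_run` on by default lets the command list be reviewed before anything touches the drives.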

Hope this helps.

Cheers,

John.


Thread overview: 10+ messages
     [not found] <904330941.660.1286340548064.JavaMail.root@mail.networkmayhem.com>
2010-10-06  5:51 ` Raid 6 - TLER/CCTL/ERC Peter Zieba
2010-10-06 11:57   ` Phil Turmel
2010-10-06 20:14   ` Richard Scobie
2010-10-06 20:24   ` John Robinson [this message]
2010-10-07  0:45   ` Michael Sallaway
     [not found] <30914146.21286374217265.JavaMail.SYSTEM@ninja>
2010-10-06 14:12 ` Lemur Kryptering
2010-10-06 21:22   ` Stefan /*St0fF*/ Hübner
     [not found] <6391773.361286404984328.JavaMail.SYSTEM@ninja>
2010-10-06 22:51 ` Lemur Kryptering
     [not found] <8469417.401286406060921.JavaMail.SYSTEM@ninja>
2010-10-06 23:11 ` Lemur Kryptering
2010-10-08  5:47   ` Stefan /*St0fF*/ Hübner
