* Raid 6 - TLER/CCTL/ERC
  [not found] <904330941.660.1286340548064.JavaMail.root@mail.networkmayhem.com>
@ 2010-10-06  5:51 ` Peter Zieba
  2010-10-06 11:57 ` Phil Turmel
                    ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Peter Zieba @ 2010-10-06  5:51 UTC (permalink / raw)
  To: linux-raid

Hey all,

I have a question regarding Linux raid and degraded arrays.

My configuration involves:
- 8x Samsung HD103UJ 1TB drives (terrible consumer-grade)
- AOC-USAS-L8i Controller
- CentOS 5.5 2.6.18-194.11.1.el5xen (64-bit)
- Each drive has one maximum-sized partition.
- 8 drives are configured in a raid 6.

My understanding is that with a raid 6, if a disk cannot return a given sector, it should still be possible to reconstruct what that disk should have returned from two other disks. My understanding is also that if this is successful, the result should be written back to the disk that originally failed to read the given sector. I'm assuming that's what a message such as this indicates:

Sep 17 04:01:12 doorstop kernel: raid5:md0: read error corrected (8 sectors at 1647989048 on sde1)

I was hoping to confirm my suspicion on the meaning of that message.

On occasion, I'll also see this:

Oct  1 01:50:53 doorstop kernel: raid5:md0: read error not correctable (sector 1647369400 on sdh1).

This seems to involve the drive being kicked from the array, even though the drive is still readable for the most part (save for a few sectors).

What exactly are the criteria for a disk being kicked out of an array?

Furthermore, if an 8-disk raid 6 is running on the bare-minimum 6 disks, why on earth would it kick any more disks out? At this point, doesn't it make sense to simply return an error to whatever tried to read from that part of the array instead of killing the array? In other words, I would rather be able to read from a degraded raid 6 using something like dd with "conv=sync,noerror" (as I would expect with a single disk with some bad sectors) than have it kick out the last drive it can possibly run on and die completely. Is there a good reason for this behavior?

Finally, why do the kernel messages all say "raid5:" when it is clearly a raid 6?

<snip>
[root@doorstop log]# cat /proc/mdstat
Personalities : [raid0] [raid6] [raid5] [raid4]
md0 : active raid6 sdc1[8](F) sdf1[7] sde1[6] sdd1[5] sda1[3] sdb1[1]
      5860559616 blocks level 6, 64k chunk, algorithm 2 [8/5] [_U_U_UUU]

unused devices: <none>
</snip>

As for intimate details about the behavior of the drives themselves, I've noticed the following:
- Over time, each disk develops a slowly increasing number of "Current_Pending_Sector" (ID 197).
- The pending sector count returns to zero if a disk is removed from the array and filled with /dev/zero or random data.
- Interestingly, on some occasions the pending sector count did not return to zero after wiping only the partition, i.e. /dev/sda1.
- It did, however, return to zero when wiping the entire disk (/dev/sda).
- I had a feeling that this was the result of the drive "reading ahead" into the small area of unusable space between the end of the first partition and the end of the disk, and then making note of this in SMART, without causing a noticeable problem, as the sector was never actually requested by the kernel.
- I dd'd just that part of the drive, and the pending sectors went away in those cases.
- I have on rare occasion had these drives go completely bad before (i.e. there were non-zero values for "Reallocated_Event_Count", "Reallocated_Sector_Ct", or "Offline_Uncorrectable" (#196, #5, #198, respectively), and the drive seemed unwilling to read any sectors). These were RMA'd.
- As for the other drives, again, pending sectors do crop up, and they always disappear when written to.

I do not consider these drives bad. Flaky, sure. Slow to respond on error? Almost undoubtedly.

Finally, I should mention that I have tried the smartctl erc commands:
http://www.csc.liv.ac.uk/~greg/projects/erc/

I could not pass them through the controller I was using, but was able to connect the drives to the controller on the motherboard, set the erc values, and still have drives dropping out.

As a terrible band-aid, if I make sure to remove a drive when I see pending sectors, nuke it with random data (or /dev/zero), and resync the array, the drive's pending sector count returns to zero and the array is happy. Once I have too many drives with pending sectors, however, a resync is almost guaranteed to fail, and I end up having to copy my data off and rebuild the array.

Instead of scripting the above (which, sadly, I have done), is there any hope of saving the investment in disks? I have a feeling that this is simply something hitting a timeout, and likely causing problems for many more than just myself.

I greatly appreciate the time taken to read this, and any feedback provided.

Thank you,

Peter Zieba
312-285-3794

^ permalink raw reply	[flat|nested] 10+ messages in thread
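For reference, the ERC settings mentioned above are exposed through smartmontools' SCT ERC interface; a minimal sketch, assuming a smartctl recent enough to support it (device names are examples only, and both the drive and the controller have to pass the command through):

  # Show the drive's current SCT error recovery control settings:
  smartctl -l scterc /dev/sda

  # Limit read and write error recovery to 7.0 seconds each
  # (values are in tenths of a second; lost on power cycle):
  smartctl -l scterc,70,70 /dev/sda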
* Re: Raid 6 - TLER/CCTL/ERC 2010-10-06 5:51 ` Raid 6 - TLER/CCTL/ERC Peter Zieba @ 2010-10-06 11:57 ` Phil Turmel 2010-10-06 20:14 ` Richard Scobie ` (2 subsequent siblings) 3 siblings, 0 replies; 10+ messages in thread From: Phil Turmel @ 2010-10-06 11:57 UTC (permalink / raw) To: Peter Zieba; +Cc: linux-raid On 10/06/2010 01:51 AM, Peter Zieba wrote: > Hey all, > > I have a question regarding Linux raid and degraded arrays. > > My configuration involves: > - 8x Samsung HD103UJ 1TB drives (terrible consumer-grade) > - AOC-USAS-L8i Controller > - CentOS 5.5 2.6.18-194.11.1.el5xen (64-bit) > - Each drive has one maximum-sized partition. > - 8-drives are configured in a raid 6. > > My understanding is that with a raid 6, if a disk cannot return a given sector, it should still be possible to get what should have been returned from the first disk, from two other disks. My understanding is also that if this is successful, this should be written back to the disk that originally failed to read the given sector. I'm assuming that's what a message such as this indicates: > Sep 17 04:01:12 doorstop kernel: raid5:md0: read error corrected (8 sectors at 1647989048 on sde1) > > I was hoping to confirm my suspicion on the meaning of that message. > > On occasion, I'll also see this: > Oct 1 01:50:53 doorstop kernel: raid5:md0: read error not correctable (sector 1647369400 on sdh1). > > This seems to involved the drive being kicked from the array, even though the drive is still readable for the most part (save for a few sectors). [snip /] Hi Peter, For read errors that aren't permanent (gone after writing to the affected sectors), a "repair" action is your friend. I used to deal with occasional kicked-out drives in my arrays until I started running the following script in a weekly cron job: #!/bin/bash # for x in /sys/block/md*/md/sync_action ; do echo repair >$x done HTH, Phil ^ permalink raw reply [flat|nested] 10+ messages in thread
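The effect of such a repair pass can be followed through /proc/mdstat and the md sysfs files; a small companion sketch (md0 is an example name):

  # Overall progress of the running check/repair:
  cat /proc/mdstat

  # Current action for the array (idle, check, repair, resync, ...):
  cat /sys/block/md0/md/sync_action

  # Count (in sectors) of mismatches found by the last check/repair:
  cat /sys/block/md0/md/mismatch_cnt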
* Re: Raid 6 - TLER/CCTL/ERC
  2010-10-06  5:51 ` Raid 6 - TLER/CCTL/ERC Peter Zieba
  2010-10-06 11:57 ` Phil Turmel
@ 2010-10-06 20:14 ` Richard Scobie
  2010-10-06 20:24 ` John Robinson
  2010-10-07  0:45 ` Michael Sallaway
  3 siblings, 0 replies; 10+ messages in thread
From: Richard Scobie @ 2010-10-06 20:14 UTC (permalink / raw)
  To: 'Linux RAID'

Peter Zieba wrote:

> - AOC-USAS-L8i Controller

> I could not pass them through the controller I was using, but was able to connect the drives to the controller on the motherboard, set the erc values, and still have drives dropping out.

This controller uses the LSI 1068 controller chip and, up until kernel 2.6.36, is likely to offline attached drives if smartctl or smartd is used.

If you update to this kernel or later, or apply the one-line patch to the LSI driver in earlier ones, you will be able to safely use these monitoring utilities.

Patch as outlined by the author on the bug list and subsequently accepted by LSI:

"It seems the mptsas driver could use blk_queue_dma_alignment() to advertise a stricter alignment requirement. If it does, sd does the right thing and bounces misaligned buffers (see block/blk-map.c line 57). The following patch to 2.6.34-rc5 makes my symptoms go away. I'm sure this is the wrong place for this code, but it gets my idea across."

diff --git a/drivers/message/fusion/mptscsih.c b/drivers/message/fusion/mptscsih.c
index 6796597..1e034ad 100644
--- a/drivers/message/fusion/mptscsih.c
+++ b/drivers/message/fusion/mptscsih.c
@@ -2450,6 +2450,8 @@ mptscsih_slave_configure(struct scsi_device *sdev)
 		ioc->name,sdev->tagged_supported, sdev->simple_tags,
 		sdev->ordered_tags));
+	blk_queue_dma_alignment (sdev->request_queue, 512 - 1);
+
 	return 0;
 }

Regards,

Richard

^ permalink raw reply related	[flat|nested] 10+ messages in thread
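Before relying on that patch, it may be worth confirming that the affected driver and kernel combination is actually in use; a small sketch of the checks (not exhaustive):

  # Kernel version (the fix is reported above to be in 2.6.36 and later):
  uname -r

  # Is the LSI Fusion MPT SAS driver handling these disks?
  lsmod | grep -i mpt
  modinfo mptsas | grep -i version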
* Re: Raid 6 - TLER/CCTL/ERC 2010-10-06 5:51 ` Raid 6 - TLER/CCTL/ERC Peter Zieba 2010-10-06 11:57 ` Phil Turmel 2010-10-06 20:14 ` Richard Scobie @ 2010-10-06 20:24 ` John Robinson 2010-10-07 0:45 ` Michael Sallaway 3 siblings, 0 replies; 10+ messages in thread From: John Robinson @ 2010-10-06 20:24 UTC (permalink / raw) To: Peter Zieba; +Cc: linux-raid On 06/10/2010 06:51, Peter Zieba wrote: > Hey all, > > I have a question regarding Linux raid and degraded arrays. > > My configuration involves: > - 8x Samsung HD103UJ 1TB drives (terrible consumer-grade) I have some of these drives too. I wouldn't go so far as to call them terrible, though 2 out of 3 did manage to get to a couple of pending sectors, which went away when I ran badblocks and haven't reappeared. > - AOC-USAS-L8i Controller > - CentOS 5.5 2.6.18-194.11.1.el5xen (64-bit) > - Each drive has one maximum-sized partition. > - 8-drives are configured in a raid 6. > > My understanding is that with a raid 6, if a disk cannot return a given sector, it should still be possible to get what should have been returned from the first disk, from two other disks. My understanding is also that if this is successful, this should be written back to the disk that originally failed to read the given sector. I'm assuming that's what a message such as this indicates: > Sep 17 04:01:12 doorstop kernel: raid5:md0: read error corrected (8 sectors at 1647989048 on sde1) > > I was hoping to confirm my suspicion on the meaning of that message. Yup. > On occasion, I'll also see this: > Oct 1 01:50:53 doorstop kernel: raid5:md0: read error not correctable (sector 1647369400 on sdh1). > > This seems to involved the drive being kicked from the array, even though the drive is still readable for the most part (save for a few sectors). The above indicates that a write failed. The drive should probably be replaced, though if you're seeing a lot of these I'd start suspecting cabling, drive chassis and/or SATA controller problems. Hmm, is yours the SATA controller that doesn't like SMART commands? Or at least didn't in older kernels? Do you run smartd? Try without it for a bit... If that helps, look on Red Hat bugzilla and perhaps post a bug report. > What exactly is the criteria for a disk being kicked out of an array? > > Furthermore, if an 8-disk raid 6 is running on the bare-minimum 6-disks, why on earth would it kick any more disks out? At this point, doesn't it makes sense to simply return an error to whatever tried to read from that part of the array instead of killing the array? Because RAID isn't supposed to return bad data while bare drives are. [...] > Finally, why do the kernel messages that all say "raid5:" when it is clearly a raid 6?: RAIDs 4, 5 and 6 are handled by the raid5 kernel module. Again I think the message has been changed in more recent kernels. [...] > Finally, I should mention that I have tried the smartctl erc commands: > http://www.csc.liv.ac.uk/~greg/projects/erc/ > > I could not pass them through the controller I was using, but was able to connect the drives to the controller on the motherboard, set the erc values, and still have drives dropping out. Those settings don't stick across power cycles and presumably you powered the drives down to change which controller they were connected to, so your setting will have been lost. Hope this helps. Cheers, John. ^ permalink raw reply [flat|nested] 10+ messages in thread
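A read-only recheck of a suspect member, along the lines of the badblocks run mentioned above, might look like this (sdb is an example device; without -w or -n, badblocks stays in its default read-only mode):

  # Non-destructive surface scan of one member disk:
  badblocks -sv /dev/sdb

  # Re-check the relevant SMART counters afterwards:
  smartctl -A /dev/sdb | egrep -i 'Current_Pending_Sector|Reallocated|Offline_Uncorrectable'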
* Re: Raid 6 - TLER/CCTL/ERC 2010-10-06 5:51 ` Raid 6 - TLER/CCTL/ERC Peter Zieba ` (2 preceding siblings ...) 2010-10-06 20:24 ` John Robinson @ 2010-10-07 0:45 ` Michael Sallaway 3 siblings, 0 replies; 10+ messages in thread From: Michael Sallaway @ 2010-10-07 0:45 UTC (permalink / raw) To: Peter Zieba; +Cc: linux-raid On 6/10/2010 3:51 PM, Peter Zieba wrote: > I have a question regarding Linux raid and degraded arrays. > > My configuration involves: > - 8x Samsung HD103UJ 1TB drives (terrible consumer-grade) > - AOC-USAS-L8i Controller > - CentOS 5.5 2.6.18-194.11.1.el5xen (64-bit) > - Each drive has one maximum-sized partition. > - 8-drives are configured in a raid 6. > > My understanding is that with a raid 6, if a disk cannot return a given sector, it should still be possible to get what should have been returned from the first disk, from two other disks. My understanding is also that if this is successful, this should be written back to the disk that originally failed to read the given sector. I'm assuming that's what a message such as this indicates: > Sep 17 04:01:12 doorstop kernel: raid5:md0: read error corrected (8 sectors at 1647989048 on sde1) > > I was hoping to confirm my suspicion on the meaning of that message. > > On occasion, I'll also see this: > Oct 1 01:50:53 doorstop kernel: raid5:md0: read error not correctable (sector 1647369400 on sdh1). > > This seems to involved the drive being kicked from the array, even though the drive is still readable for the most part (save for a few sectors). Hi Peter, I've just been in the *exact* same situation recently, so I can probably answer some of your questions (only as another end-user, though!). I'm using similar samsung drives (the consumer 1.5TB drives), the AOC-USASLP-L8i, and ubuntu kernels. First off, I don't think the LSI1068E really works properly in any non-recent kernel; I was using 2.6.32 (stock Ubuntu 10.04 kernel), and having all sorts of problems with the card (read errors, bus errors, timeouts, etc.). I ended up going back to my old controller for a while. However, I've recently changed kernel (to 2.6.35) for other reasons (described below), and now the card is working fine. So I'm not sure how different it will be in CentOS, but you may want to consider trying a newer kernel in case the card is causing problems. As for the read errors/kicking drives from the array, I'm not sure why it gets kicked reading some sectors and not others, however I know there were changes to the md stuff which handled that more gracefully earlier this year. I had the same problem -- on my 2.6.32 kernel, a rebuild of one drive would hit a bad sector on another and drop the drive, then hit another bad sector on a different drive and drop it as well, making the array unusable. However, with a 2.6.35 kernel it recovers gracefully and keeps going with the rebuild. (I can't find the exact patch, but Neil had it in an earlier email to me on the list; maybe a month or two ago?) So again, I'd suggest trying a newer kernel if you're having trouble. Mind you, this is only as another end-user, not a developer, so I'm sure I've probably got something wrong in all that. :-) But that's what worked for me. Hope that helps, Michael ^ permalink raw reply [flat|nested] 10+ messages in thread
[parent not found: <30914146.21286374217265.JavaMail.SYSTEM@ninja>]
* Re: Raid 6 - TLER/CCTL/ERC
  [not found] <30914146.21286374217265.JavaMail.SYSTEM@ninja>
@ 2010-10-06 14:12 ` Lemur Kryptering
  2010-10-06 21:22 ` Stefan /*St0fF*/ Hübner
  0 siblings, 1 reply; 10+ messages in thread
From: Lemur Kryptering @ 2010-10-06 14:12 UTC (permalink / raw)
  To: Phil Turmel; +Cc: linux-raid

I'll definitely give that a shot when I rebuild this thing.

In the meantime, is there anything that I can do to convince md not to kick the last disk (running on 6 out of 8 disks) when reading a bad spot? I've tried setting the array to read-only, but this didn't seem to help.

All I'm really trying to do is dd data off of it using "conv=sync,noerror". When it hits the unreadable spot, it simply kicks the drive from the array, leaving 4/8 disks active, taking down the array.

Again, I don't understand why md would take this action. It would make a lot more sense if it simply reported an IO error to whatever made the request.

Peter Zieba
312-285-3794

----- Original Message -----
From: "Phil Turmel" <philip@turmel.org>
To: "Peter Zieba" <pzieba@networkmayhem.com>
Cc: linux-raid@vger.kernel.org
Sent: Wednesday, October 6, 2010 6:57:58 AM GMT -06:00 US/Canada Central
Subject: Re: Raid 6 - TLER/CCTL/ERC

[...]

^ permalink raw reply	[flat|nested] 10+ messages in thread
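One possible way to image a failing degraded array, sketched here on the assumption that GNU ddrescue is available — note that, as described above, md may still kick a drive when a read fails, so the main gain over plain dd is an error-tolerant, resumable copy:

  # Keep md from issuing any writes to the array:
  mdadm --readonly /dev/md0

  # Copy the array to an image, retrying bad areas a few times and
  # keeping a map file so the copy can be stopped and resumed:
  ddrescue -r3 /dev/md0 /mnt/backup/md0.img /mnt/backup/md0.map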
* Re: Raid 6 - TLER/CCTL/ERC
  2010-10-06 14:12 ` Lemur Kryptering
@ 2010-10-06 21:22 ` Stefan /*St0fF*/ Hübner
  0 siblings, 0 replies; 10+ messages in thread
From: Stefan /*St0fF*/ Hübner @ 2010-10-06 21:22 UTC (permalink / raw)
  To: Lemur Kryptering, Linux RAID, philip

Hi,

it has been discussed many times before on the list ...

Am 06.10.2010 16:12, schrieb Lemur Kryptering:
> I'll definitely give that a shot when I rebuild this thing.
>
> In the meantime, is there anything that I can do to convince md not to kick the last disk (running on 6 out of 8 disks) when reading a bad spot? I've tried setting the array to read-only, but this didn't seem to help.

You can set the ERC values of your drives. Then they'll stop processing their internal error recovery procedure after the timeout and continue to react. Without an ERC timeout, the drive tries to correct the error on its own (not reacting to any requests), mdraid assumes an error after a while and tries to rewrite the "missing" sector (assembled from the other disks). But the drive will still not react to the write request, as it is still doing its internal recovery procedure. Now mdraid assumes the disk to be bad and kicks it.

There's nothing you can do about this vicious circle except either enabling ERC or using Raid-Edition disks (which have ERC enabled by default).

Stefan

[...]

^ permalink raw reply	[flat|nested] 10+ messages in thread
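A related knob, not specific to the advice above, is the SCSI layer's per-device command timeout; raising it gives a slow-recovering drive more time before the kernel gives up on a command. A sketch (sdh is an example; whether this alone keeps a drive in the array depends on the kernel and controller):

  # The default per-command timeout is usually 30 seconds:
  cat /sys/block/sdh/device/timeout

  # Give the drive considerably longer to finish its internal
  # error recovery before the kernel gives up on the command:
  echo 180 > /sys/block/sdh/device/timeout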
[parent not found: <6391773.361286404984328.JavaMail.SYSTEM@ninja>]
* Re: Raid 6 - TLER/CCTL/ERC
  [not found] <6391773.361286404984328.JavaMail.SYSTEM@ninja>
@ 2010-10-06 22:51 ` Lemur Kryptering
  0 siblings, 0 replies; 10+ messages in thread
From: Lemur Kryptering @ 2010-10-06 22:51 UTC (permalink / raw)
  To: John Robinson; +Cc: linux-raid

----- "John Robinson" <john.robinson@anonymous.org.uk> wrote:
> On 06/10/2010 06:51, Peter Zieba wrote:
> > Hey all,
> >
> > I have a question regarding Linux raid and degraded arrays.
> >
> > My configuration involves:
> > - 8x Samsung HD103UJ 1TB drives (terrible consumer-grade)
>
> I have some of these drives too. I wouldn't go so far as to call them terrible, though 2 out of 3 did manage to get to a couple of pending sectors, which went away when I ran badblocks and haven't reappeared.

Someone else suggested I echo "repair" into "sync_action" under /sys on a weekly basis. I know CentOS already ships a similar cron job somewhere; I will take a closer look at this.

> > - AOC-USAS-L8i Controller
> > - CentOS 5.5 2.6.18-194.11.1.el5xen (64-bit)
> > - Each drive has one maximum-sized partition.
> > - 8 drives are configured in a raid 6.
> >
> > My understanding is that with a raid 6, if a disk cannot return a given sector, it should still be possible to reconstruct what that disk should have returned from two other disks. My understanding is also that if this is successful, the result should be written back to the disk that originally failed to read the given sector. I'm assuming that's what a message such as this indicates:
> > Sep 17 04:01:12 doorstop kernel: raid5:md0: read error corrected (8 sectors at 1647989048 on sde1)
> >
> > I was hoping to confirm my suspicion on the meaning of that message.
>
> Yup.

Thanks! It's a simple message but I wanted to make sure I got the meaning right. I appreciate it.

> > On occasion, I'll also see this:
> > Oct  1 01:50:53 doorstop kernel: raid5:md0: read error not correctable (sector 1647369400 on sdh1).
> >
> > This seems to involve the drive being kicked from the array, even though the drive is still readable for the most part (save for a few sectors).
>
> The above indicates that a write failed. The drive should probably be replaced, though if you're seeing a lot of these I'd start suspecting cabling, drive chassis and/or SATA controller problems.
>
> Hmm, is yours the SATA controller that doesn't like SMART commands? Or at least didn't in older kernels? Do you run smartd? Try without it for a bit... If that helps, look on Red Hat bugzilla and perhaps post a bug report.

Yes, it does seem that my controller is indeed the one that has the SMART issues. I'm fairly certain that I'm not actually experiencing any of the SMART-related issues, however, as I've had the exact same problems cropping up while the disks were connected to the motherboard. It seems that this particular problem is exacerbated by running SMART commands excessively (which I can do without seeing these errors). I will be looking into this a bit deeper to make sure, however.

> > What exactly are the criteria for a disk being kicked out of an array?
> >
> > Furthermore, if an 8-disk raid 6 is running on the bare-minimum 6 disks, why on earth would it kick any more disks out? At this point, doesn't it make sense to simply return an error to whatever tried to read from that part of the array instead of killing the array?
>
> Because RAID isn't supposed to return bad data while bare drives are.

If it has no choice, however, it seems like this behavior would be preferable to dying completely: it could mean the difference between one file being inaccessible and an entire machine going down. I'm starting to wonder what it would take to change this functionality...

> [...]
> > Finally, why do the kernel messages all say "raid5:" when it is clearly a raid 6?
>
> RAIDs 4, 5 and 6 are handled by the raid5 kernel module. Again I think the message has been changed in more recent kernels.

Thanks! I figured it was something simple like that, but feel better knowing for sure.

> [...]
> > Finally, I should mention that I have tried the smartctl erc commands:
> > http://www.csc.liv.ac.uk/~greg/projects/erc/
> >
> > I could not pass them through the controller I was using, but was able to connect the drives to the controller on the motherboard, set the erc values, and still have drives dropping out.
>
> Those settings don't stick across power cycles and presumably you powered the drives down to change which controller they were connected to, so your setting will have been lost.

I'm aware the values don't stick across a power cycle. I had the array running off of the motherboard.

> Hope this helps.
>
> Cheers,
>
> John.

Thanks! I appreciate your feedback!

Peter Zieba
312-285-3794

^ permalink raw reply	[flat|nested] 10+ messages in thread
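To rule the smartd/LSI interaction in or out on CentOS 5, smartd can be stopped temporarily with the stock init tools; a small sketch (run as root):

  # Stop smartd now and keep it from starting on the next boot:
  service smartd stop
  chkconfig smartd off

  # Re-enable it later once the controller behaviour is understood:
  chkconfig smartd on
  service smartd start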
[parent not found: <8469417.401286406060921.JavaMail.SYSTEM@ninja>]
* Re: Raid 6 - TLER/CCTL/ERC
  [not found] <8469417.401286406060921.JavaMail.SYSTEM@ninja>
@ 2010-10-06 23:11 ` Lemur Kryptering
  2010-10-08  5:47 ` Stefan /*St0fF*/ Hübner
  0 siblings, 1 reply; 10+ messages in thread
From: Lemur Kryptering @ 2010-10-06 23:11 UTC (permalink / raw)
  To: stefan huebner; +Cc: Linux RAID, philip

----- "Stefan /*St0fF*/ Hübner" <stefan.huebner@stud.tu-ilmenau.de> wrote:
> Hi,
>
> it has been discussed many times before on the list ...

My apologies. I browsed a little into the past, but obviously not far enough.

> Am 06.10.2010 16:12, schrieb Lemur Kryptering:
> > I'll definitely give that a shot when I rebuild this thing.
> >
> > In the meantime, is there anything that I can do to convince md not to kick the last disk (running on 6 out of 8 disks) when reading a bad spot? I've tried setting the array to read-only, but this didn't seem to help.
>
> You can set the ERC values of your drives. Then they'll stop processing their internal error recovery procedure after the timeout and continue to react. Without an ERC timeout, the drive tries to correct the error on its own (not reacting to any requests), mdraid assumes an error after a while and tries to rewrite the "missing" sector (assembled from the other disks). But the drive will still not react to the write request, as it is still doing its internal recovery procedure. Now mdraid assumes the disk to be bad and kicks it.

That sounds exactly like what I'm seeing in the logs -- the sector initially reported as bad is indeed unreadable via dd. All of the subsequent problems reported on other sectors aren't actually problems when I check on them at a later point. Couldn't this be worked around by exposing whatever timeouts there are in mdraid as something that could be adjusted in /sys?

> There's nothing you can do about this vicious circle except either enabling ERC or using Raid-Edition disks (which have ERC enabled by default).

I tried connecting the drives directly to my motherboard (my controller didn't seem to want to let me pass the smartctl ERC commands through to the drives). The ERC commands took, insofar as I was able to read them back with what I set them to. This didn't seem to help much with the issues I was having, however.

Lesson learned on the non-raid-edition disks. I would have spent the extra to avoid all this headache, but am now stuck with these things. I realize that not fixing the problem at the core (the drives themselves) essentially puts the burden on mdraid, which would be forced to block for a ridiculous amount of time waiting for the drive instead of just kicking it; in my particular case, however, this sort of delay would not be a cause for concern.

Would someone be able to nudge me in the right direction as far as where the logic that handles this is located?

> Stefan

[...]

^ permalink raw reply	[flat|nested] 10+ messages in thread
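On where that logic lives: the kernel messages quoted earlier in the thread are printed by the raid5/raid456 driver, so grepping a kernel source tree for them is a quick way in (a sketch; the surrounding function names vary between kernel versions):

  cd /usr/src/linux
  grep -n "read error corrected" drivers/md/raid5.c
  grep -n "read error not correctable" drivers/md/raid5.c
  # md_error() in drivers/md/md.c is what ultimately marks a member faulty.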
* Re: Raid 6 - TLER/CCTL/ERC
  2010-10-06 23:11 ` Lemur Kryptering
@ 2010-10-08  5:47 ` Stefan /*St0fF*/ Hübner
  0 siblings, 0 replies; 10+ messages in thread
From: Stefan /*St0fF*/ Hübner @ 2010-10-08  5:47 UTC (permalink / raw)
  To: Lemur Kryptering; +Cc: Linux RAID, philip

Am 07.10.2010 01:11, schrieb Lemur Kryptering:
> [...]
>
> That sounds exactly like what I'm seeing in the logs -- the sector initially reported as bad is indeed unreadable via dd. All of the subsequent problems reported on other sectors aren't actually problems when I check on them at a later point. Couldn't this be worked around by exposing whatever timeouts there are in mdraid as something that could be adjusted in /sys?
>
>> There's nothing you can do about this vicious circle except either enabling ERC or using Raid-Edition disks (which have ERC enabled by default).

I must say yesterday we had our first Hitachi UltraStar drives - which are supposed to be Raid-Edition. They didn't have ERC enabled. I'll inquire with Hitachi about that today.

> I tried connecting the drives directly to my motherboard (my controller didn't seem to want to let me pass the smartctl ERC commands through to the drives). The ERC commands took, insofar as I was able to read them back with what I set them to. This didn't seem to help much with the issues I was having, however.

Which wouldn't work, as the SCT ERC settings are volatile, i.e. they're gone after a power cycle.

> Lesson learned on the non-raid-edition disks. I would have spent the extra to avoid all this headache, but am now stuck with these things. I realize that not fixing the problem at the core (the drives themselves) essentially puts the burden on mdraid, which would be forced to block for a ridiculous amount of time waiting for the drive instead of just kicking it; in my particular case, however, this sort of delay would not be a cause for concern.
>
> Would someone be able to nudge me in the right direction as far as where the logic that handles this is located?
>
>> [...]
>>> #!/bin/bash
>>> #
>>> for x in /sys/block/md*/md/sync_action ; do
>>> echo repair >$x
>>> done
>>> [...]

That is probably the only thing you can try, as it does indeed try to reconstruct the sector from the redundancy. But I'd try it with ERC enabled. Maybe you'll find a way where this works. (i.e. move the whole Raid to the other computer...)

Stefan

^ permalink raw reply	[flat|nested] 10+ messages in thread
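Since the SCT ERC settings are volatile, one workaround is to re-apply them on every boot — sketched here via rc.local, and assuming the controller in use actually passes the commands through to the drives (which was part of the original problem in this thread):

  # /etc/rc.d/rc.local -- re-apply 7 second ERC limits at every boot
  for d in /dev/sd[a-h] ; do
      smartctl -q silent -l scterc,70,70 $d
  done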