linux-raid.vger.kernel.org archive mirror
* Advice requested re: hard drive setup for RAID arrays
@ 2015-11-04 12:02 o1bigtenor
  2015-11-04 13:13 ` Phil Turmel
From: o1bigtenor @ 2015-11-04 12:02 UTC
  To: Brad Campbell; +Cc: Phil Turmel, Linux-RAID

On Tue, Nov 3, 2015 at 10:31 PM, Brad Campbell
<lists2009@fnarfbargle.com> wrote:
> On 04/11/15 12:05, o1bigtenor wrote:
>>
>> On Tue, Nov 3, 2015 at 10:01 PM, o1bigtenor <o1bigtenor@gmail.com> wrote:
>>>
>>>
>>>
>>>
>>> On Tue, Nov 3, 2015 at 10:08 AM, Phil Turmel <philip@turmel.org> wrote:
>>>>
>>>> One caveat -- don't do this part until you've corrected your timeout
>>>> mismatch
>>>> problem, or any latent UREs will break your array again.
>>>
>>>
>>>
>>>
>>> Read through the references.
>>>
>>> How do I do what you suggest?
>
>
> Here's how I do it. This script is run on every bootup.
>
> It iterates through all the drives and uses smartctl to try and set erc
> timeouts. If that fails it assumes the drive does not support it and it sets
> the timeout value to 180 seconds.
>
> #!/bin/bash
> for i in /dev/sd? ; do
>         if smartctl -l scterc,70,70 $i > /dev/null ; then
>                 echo -n $i " is good "
>         else
>                 echo 180 > /sys/block/${i/\/dev\/}/device/timeout
>                 echo -n $i " is  bad "
>         fi;
>         smartctl -i $i | egrep "(Device Model|Product:)"
>         blockdev --setra 1024 $i
> done
>
> I have a mix of 15k SAS drives, WD green & red and some left over bits and
> pieces. This ensures the timeouts all match the drives' capabilities.
>
ran the script

root@debianbase:/# !/bin/bash
bash: !/bin/bash: event not found
root@debianbase:/# for i in /dev/sd? ; do
>         if smartctl -l scterc,70,70 $i > /dev/null ; then
>                 echo -n $i " is good "
>         else
>                 echo 180 > /sys/block/${i/\/dev\/}/device/timeout
>                 echo -n $i " is  bad "
>         fi;
>         smartctl -i $i | egrep "(Device Model|Product:)"
>         blockdev --setra 1024 $i
> done
/dev/sda  is  bad Device Model:     ST1000DM003-1ER162
/dev/sdb  is good Device Model:     ST31000524AS
/dev/sdc  is  bad Device Model:     ST1000DM003-1ER162
/dev/sdd  is  bad Device Model:     Corsair Force 3 SSD
/dev/sde  is good Device Model:     ST31000524AS
/dev/sdf  is good Device Model:     ST31000524AS
/dev/sdg  is  bad /dev/sdh  is  bad root@debianbase:/#


As sdh is supposed to be a NAS drive I'm now confused.

Is there anything that can be done to the drives already owned?

How does one find applicable hard drives?
Only buy Enterprise class drives?



> Regards,
> Brad


* Re: Advice requested re: hard drive setup for RAID arrays
  2015-11-04 12:02 Advice requested re: hard drive setup for RAID arrays o1bigtenor
@ 2015-11-04 13:13 ` Phil Turmel
  2015-11-04 14:30   ` o1bigtenor
  2015-11-04 19:36   ` Edward Kuns
From: Phil Turmel @ 2015-11-04 13:13 UTC
  To: o1bigtenor, Brad Campbell; +Cc: Linux-RAID

Good morning Dee, Brad,

On 11/04/2015 07:02 AM, o1bigtenor wrote:
> On Tue, Nov 3, 2015 at 10:31 PM, Brad Campbell

>> Here's how I do it. This script is run on every bootup.

A few notes here for Dee:

Running this script (or something similar) needs to be automatic.  In
older systems, that means including it in /etc/rc.local.  That file is
deprecated in some modern systems, and alternates vary by distro.  I
don't know what you should use in Debian 8.  {It still exists and works
in Ubuntu Server 14.04.}
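
On systemd-based systems, a oneshot unit is the usual substitute.  A
minimal sketch, untested here, assuming Brad's script has been saved
executable as /usr/local/bin/set_sct (the path and unit name are just
placeholders):

cat > /etc/systemd/system/set-erc.service <<'EOF'
[Unit]
Description=Set SCT ERC / kernel timeouts on all drives
After=local-fs.target

[Service]
Type=oneshot
ExecStart=/usr/local/bin/set_sct

[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable set-erc.service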

>> It iterates through all the drives and uses smartctl to try and set erc
>> timeouts. If that fails it assumes the drive does not support it and it sets
>> the timeout value to 180 seconds.
>>
>> #!/bin/bash
>> for i in /dev/sd? ; do

This iterates through all /dev/sd? devices -- SATA, SAS, and USB-attached
alike -- whether they're part of a raid array or not.

>>         if smartctl -l scterc,70,70 $i > /dev/null ; then
>>                 echo -n $i " is good "

"Good" clearly means the device has ERC support and the default timeout
is OK.
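
(You can read the current setting without changing anything by omitting
the values, e.g.:

smartctl -l scterc /dev/sdb

which prints the drive's read and write ERC timers if it supports SCT ERC.)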

>>         else
>>                 echo 180 > /sys/block/${i/\/dev\/}/device/timeout
>>                 echo -n $i " is  bad "

"Bad" means it doesn't support ERC, so the timeout is set to the
work-around 180 seconds.  That's the best you can do for such drives.
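
(For reference, the kernel's default SCSI command timer is 30 seconds.
You can inspect what a given drive is currently set to with, e.g.:

cat /sys/block/sda/device/timeout

substituting the right device name.)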

>>         fi;
>>         smartctl -i $i | egrep "(Device Model|Product:)"

Your output was scrambled a bit at the end because a couple of devices
didn't report a model or product string, which Brad relied on for an
end-of-line character.
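
If you want to guard against that, one small tweak (untested) would be to
fall back to a bare newline when egrep matches nothing:

smartctl -i $i | egrep "(Device Model|Product:)" || echo

so each device still ends up on its own line.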

>>         blockdev --setra 1024 $i
>> done
>>
>> I have a mix of 15k SAS drives, WD green & red and some left over bits and
>> pieces. This ensures the timeouts all match the drives' capabilities.

Looks pretty good to me.

> ran the script

> /dev/sda  is  bad Device Model:     ST1000DM003-1ER162
> /dev/sdb  is good Device Model:     ST31000524AS
> /dev/sdc  is  bad Device Model:     ST1000DM003-1ER162
> /dev/sdd  is  bad Device Model:     Corsair Force 3 SSD
> /dev/sde  is good Device Model:     ST31000524AS
> /dev/sdf  is good Device Model:     ST31000524AS
> /dev/sdg  is  bad /dev/sdh  is  bad

> As sdh is supposed to be a NAS drive I'm now confused.

The script doesn't care what the drives are used for -- it just picked
out all that start with 'sd'.

> Is there anything that can be done to the drives already owned?

Already done by the script.  Not ideal, but not catastrophic.

> How does one find applicable hard drives?
> Only buy Enterprise class drives?

{ This was in your reading assignments.  You may need to re-read them. I
suggest you subscribe to this list -- let the normal flow of questions
and answers help teach you the concepts underneath all of this advice.
Anyways, }

Buy drives that clearly indicate 'raid' or 'enterprise' support, or have
OEM datasheets that explicitly show ERC support.

My latest purchases have been Western Digital "Red" drives.  They are
marketed to home & small business NAS applications.  I'm sure other
brands are now targeting that market, too.

Phil



* Re: Advice requested re: hard drive setup for RAID arrays
  2015-11-04 13:13 ` Phil Turmel
@ 2015-11-04 14:30   ` o1bigtenor
  2015-11-04 15:05     ` Phil Turmel
  2015-11-05  6:02     ` Brad Campbell
  2015-11-04 19:36   ` Edward Kuns
From: o1bigtenor @ 2015-11-04 14:30 UTC
  To: Phil Turmel; +Cc: Brad Campbell, Linux-RAID

Good morning Phil and whomever

On Wed, Nov 4, 2015 at 7:13 AM, Phil Turmel <philip@turmel.org> wrote:
> Good morning Dee, Brad,
>
> On 11/04/2015 07:02 AM, o1bigtenor wrote:
>> On Tue, Nov 3, 2015 at 10:31 PM, Brad Campbell
>
>>> Here's how I do it. This script is run on every bootup.
>
> A few notes here for Dee:
>
> Running this script (or something similar) needs to be automatic.  In
> older systems, that means including it in /etc/rc.local.  That file is
> deprecated in some modern systems, and alternates vary by distro.  I
> don't know what you should use in Debian 8.  {It still exists and works
> in Ubuntu Server 14.04.}

I used to have to mount my raid array manually on every reboot, so I can
handle running the script on every reboot too. Reboots happen no more than
weekly, and at least every two weeks, because Firefox doesn't know how to
use AND release RAM.

>
>>> It iterates through all the drives and uses smartctl to try and set erc
>>> timeouts. If that fails it assumes the drive does not support it and it sets
>>> the timeout value to 180 seconds.
>>>
>>> #!/bin/bash
>>> for i in /dev/sd? ; do
>
> This iterates through all /dev/sd? devices -- SATA, SAS, and USB-attached
> alike -- whether they're part of a raid array or not.
>
>>>         if smartctl -l scterc,70,70 $i > /dev/null ; then
>>>                 echo -n $i " is good "
>
> "Good" clearly means the device has ERC support and the default timeout
> is OK.
>
>>>         else
>>>                 echo 180 > /sys/block/${i/\/dev\/}/device/timeout
>>>                 echo -n $i " is  bad "
>
> "Bad" means it doesn't support ERC, so the timeout is set to the
> work-around 180 seconds.  That's the best you can do for such drives.
>
>>>         fi;
>>>         smartctl -i $i | egrep "(Device Model|Product:)"
>
> Your output was scrambled a bit at the end because a couple of devices
> didn't report a model or product string, which Brad relied on for an
> end-of-line character.

Those 2 drives are in a separate USB connected tray.
When I ran the smartctl* command that you had me run to check the status
and configuration of the drives it borked on that drive telling me that it was
connected using USB and I needed to add some other command.
>
>>>         blockdev --setra 1024 $i
>>> done
>>>
>>> I have a mix of 15k SAS drives, WD green & red and some left over bits and
>>> pieces. This ensures the timeouts all match the drives' capabilities.
>
> Looks pretty good to me.

Those are the drives Brad uses.
>
>> ran the script
>
>> /dev/sda  is  bad Device Model:     ST1000DM003-1ER162
>> /dev/sdb  is good Device Model:     ST31000524AS
>> /dev/sdc  is  bad Device Model:     ST1000DM003-1ER162
>> /dev/sdd  is  bad Device Model:     Corsair Force 3 SSD
>> /dev/sde  is good Device Model:     ST31000524AS
>> /dev/sdf  is good Device Model:     ST31000524AS
>> /dev/sdg  is  bad /dev/sdh  is  bad
>
>> As sdh is supposed to be a NAS drive I'm now confused.
>
> The script doesn't care what the drives are used for -- it just picked
> out all that start with 'sd'.
>
>> Is there anything that can be done to the drives already owned?
>
> Already done by the script.  Not ideal, but not catastrophic.
>
>> How does one find applicable hard drives?
>> Only buy Enterprise class drives?
>
> { This was in your reading assignments.  You may need to re-read them. I
> suggest you subscribe to this list -- let the normal flow of questions
> and answers help teach you the concepts underneath all of this advice.
> Anyways, }

I read the files listed.
There wasn't anything specific as to what to look for.

There is a lot of traffic which isn't really of interest as I hope that I don't
have to develop kernel skills. Was hoping to just occasionally browse
something that would be of interest.

Have looked for recommendations as to what is a good layout for an
array and have found little guidance. At present I am thinking that RAID
10 on 4 drives offers me the best security (if the drives don't act stupid)
for the amount of storage I foresee needing.

Any suggestions for which level of raid to use for ???    ?

>
> Buy drives that clearly indicate 'raid' or 'enterprise' support, or have
> OEM datasheets that explicitly show ERC support.


>
> My latest purchases have been Western Digital "Red" drives.  They are
> marketed to home & small business NAS applications.  I'm sure other
> brands are now targeting that market, too.

The last drives I bought, for use in a NAS box, are WD Red drives.

sdh is one of those drives - - - that's why I was asking if I had to buy
Enterprise drives. The tariff there is substantial - - - maybe your group can
come up with something to return to the idea of a Redundant Array of
*Inexpensive* drives.

Dee


* Re: Advice requested re: hard drive setup for RAID arrays
  2015-11-04 14:30   ` o1bigtenor
@ 2015-11-04 15:05     ` Phil Turmel
       [not found]       ` <CAPpdf5_-3TOiCKq_dDTYGPcJMeEDMRD+xTAjkm-enmCnZPdtzg@mail.gmail.com>
  2015-11-05  6:02     ` Brad Campbell
From: Phil Turmel @ 2015-11-04 15:05 UTC
  To: o1bigtenor; +Cc: Brad Campbell, Linux-RAID

On 11/04/2015 09:30 AM, o1bigtenor wrote:

> I used to have to mount my raid array manually on every reboot, so I can
> handle running the script on every reboot too. Reboots happen no more than
> weekly, and at least every two weeks, because Firefox doesn't know how to
> use AND release RAM.

Put an entry in fstab to mount your array at boot.  Put Brad's script in
/etc/rc.local.  Try "man fstab".
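
A sketch of such an fstab entry -- device, mount point and filesystem type
here are placeholders, use your own:

/dev/md0   /mnt/array   ext4   defaults   0   2

Using UUID=... instead of /dev/md0 is more robust if device names move
around; "blkid" will show the UUID.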

You don't need to reboot to clean up after Firefox.  At most, logout and
re-login will do.  Or just "killall firefox".

> Those are the drives Brad uses.

No. The script is generic.  It attempts to fix anyone's drives.  Clearly
falls down on your USB drives.  Oh, well.  Specify the drives you want
instead of using a wildcard.
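
For example, change the loop header to list just your array members
(hypothetical device names -- substitute your own):

for i in /dev/sda /dev/sdb /dev/sdc /dev/sde ; do

or, more robustly, pick them by ID so they survive reordering (again, an
illustrative pattern):

for i in /dev/disk/by-id/ata-ST31000524AS_* ; do

One caveat with the by-id form: the sysfs timeout line in Brad's script
strips a leading /dev/ from $i, so you'd need to resolve the symlink back
to an sd name first (e.g. i=$(readlink -f $i)).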

> I read the files listed.
> There wasn't anything specific as to what to look for.

The following was in your reading list.  The options A-D at the end
couldn't be more clear.

http://marc.info/?l=linux-raid&m=135811522817345&w=1

> There is a lot of traffic which isn't really of interest as I hope that I don't
> have to develop kernel skills. Was hoping to just occasionally browse
> something that would be of interest.

This list is mostly administration skills.  You need those if you are
going to be running raid arrays.  I'm sorry if *reading* inconveniences
you, but the whining won't change anything.  Except to make me less
enthusiastic about helping you.

Phil


* Re: Advice requested re: hard drive setup for RAID arrays
       [not found]       ` <CAPpdf5_-3TOiCKq_dDTYGPcJMeEDMRD+xTAjkm-enmCnZPdtzg@mail.gmail.com>
@ 2015-11-04 16:43         ` Phil Turmel
  2015-11-04 17:27           ` Rudy Zijlstra
From: Phil Turmel @ 2015-11-04 16:43 UTC
  To: o1bigtenor, Linux-RAID

On 11/04/2015 10:43 AM, o1bigtenor wrote:
> Greetings
> 
> I'd like to make this response a reply-to-all but your comments are directed
> to me in particular and therefore don't really apply to 'all'.

You didn't reply-to-all -- I've added the list back.  It is important
that these conversations occur in public.

[trim /]

> I'm sorry - - - my computer is a tool, my computers are tools (I have more
> than one and am looking at setting up quite a few more to organize and
> manage any one of my 4 businesses).
> 
> Reading does not inconvenience me but I fail to see why I need to become
> an expert in every tool used by my business to use that tool. That is not
> normally part of the system nor my goal. Most of the businessmen that
> I have met know far less about their systems than even I do yet you are
> suggesting that I'm 'whining'.
> 
> Your phrasing  is suggesting that I am 'less than responsible' - - - rather
> I think that you have made computers to be a major item in your day and
> I would prefer that computers are tools. If tools do not work then I change them
> as there just aren't enough hours in the day as it is with what I have to do.
> 
> I do appreciate your assistance but if you have a hard time agreeing to disagree
> about the extent of knowledge needed to use the computers I have collected
> to run my business then I would rather not ask any more questions here.
> 
> Trusting that that will be acceptable to you - - -

No, it's not acceptable.

I too am a business owner -- over a decade now.  As an engineer and
consultant, I have and use many tools, both physical and virtual.  The
ones I use myself, I learn enough to use to the professional standards
needed.  Where I cannot, I pay for others to do the necessary work, or I
pay to switch tools.

I've benefited from the fact that linux is free, and provides many free
tools, and benefited from the many ways to get free advice and
assistance for it.  I take time away from my own business to offer free
advice and assistance on this list as a volunteer, as partial payback
for what I've received.

I tolerate a certain amount of neediness in newcomers, as skill varies
greatly.  But when my advice is repeatedly rejected and denigrated while
I'm trying to save someone's digital life, I have to draw a line.  I'm
no longer surprised your off-list assistance faded away.  Linux
distributors like Canonical and Red Hat offer paid tech support for
people with your requirements.

You've implied that your business has been hostage to this crisis for
the past two months.  If that's not irresponsible, I don't know what is.

Good Day,

Phil Turmel


* Re: Advice requested re: hard drive setup for RAID arrays
  2015-11-04 16:43         ` Phil Turmel
@ 2015-11-04 17:27           ` Rudy Zijlstra
From: Rudy Zijlstra @ 2015-11-04 17:27 UTC
  To: Phil Turmel, o1bigtenor, Linux-RAID



On 04/11/15 17:43, Phil Turmel wrote:
> You've implied that your business has been hostage to this crisis for 
> the past two months. If that's not irresponsible, I don't know what is. 

+1

Cheers


Rudy (another consultant)


* Re: Advice requested re: hard drive setup for RAID arrays
  2015-11-04 13:13 ` Phil Turmel
  2015-11-04 14:30   ` o1bigtenor
@ 2015-11-04 19:36   ` Edward Kuns
  2015-11-04 19:42     ` Wols Lists
  2015-11-04 20:09     ` Phil Turmel
From: Edward Kuns @ 2015-11-04 19:36 UTC
  To: Phil Turmel; +Cc: o1bigtenor, Brad Campbell, Linux-RAID

On Wed, Nov 4, 2015 at 7:13 AM, Phil Turmel <philip@turmel.org> wrote:
> >>         if smartctl -l scterc,70,70 $i > /dev/null ; then
> >>                 echo -n $i " is good "
>
> "Good" clearly means the device has ERC support and the default timeout
> is OK.

To be technical, it means that smartctl was able to *set* the two
timeouts to 70 deciseconds aka 7.0 seconds.  Rather than query and
check the setting, that script just forces the setting and detects
whether or not it was set successfully.  I get this:

/dev/sda  is good Device Model:     Samsung SSD 840 PRO Series
/dev/sdb  is  bad Device Model:     SAMSUNG SSD 830 Series
/dev/sdc  is good Device Model:     HGST HDN724040ALE640
/dev/sdd  is good Device Model:     HGST HDN724040ALE640

From looking at smartctl information from before doing this, on all 3
of my "good" drives the feature was disabled initially.  Ouch.  That
explains so much.  Now I understand why one specific drive (no longer
in my system) would sometimes fall out of the array even though it
wasn't bad.  I now have this script in rc.local, still supported in my
Fedora version.  I checked.

> "Bad" means it doesn't support ERC, so the timeout is set to the
> work-around 180 seconds.  That's the best you can do for such drives.

Is there a reasonable way of finding out if a shorter setting is
appropriate for any specific drive?  Or would you say in general it's
not worth the effort of trying to find out?  Would you expect this
behavior to be any different for an SSD?

On computers being a tool.... I choose to look at it this way:  My car
is a tool.  It's on me to make sure I understand what maintenance is
required to keep it functioning properly if I care about uptime.
Linux distributions could maybe do a better job here for the
uninitiated: when you configure MD at install time, the install screens
could include a couple of pointers to let you know there are certain
things you really must do, as is discussed here regularly.  It's so
easy to install modern Linux distributions that it's really pretty
easy not to realize that you're skipping *mandatory* maintenance.

I experienced a drive failure some weeks ago and got very lucky.  I've
been watching this list since then and have learned of some mandatory
maintenance I wasn't doing.  I'm correcting that error, step by step.
:)

                  Eddie


* Re: Advice requested re: hard drive setup for RAID arrays
  2015-11-04 19:36   ` Edward Kuns
@ 2015-11-04 19:42     ` Wols Lists
  2015-11-04 20:09     ` Phil Turmel
From: Wols Lists @ 2015-11-04 19:42 UTC
  To: Edward Kuns; +Cc: Linux-RAID

On 04/11/15 19:36, Edward Kuns wrote:
> Is there a reasonable way of finding out if a shorter setting is
> appropriate for any specific drive?  Or would you say in general it's
> not worth the effort of trying to find out?  Would you expect this
> behavior to be any different for an SSD?

AIUI, what matters is that the Linux-level timeout is longer than the
disk timeout. Given that, the length of the Linux timeout is irrelevant
-- once the disk times out and reports the error, Linux will handle the
problem instead of resetting a drive that is still busy retrying.

SSDs - I wouldn't have a clue ...

Cheers,
Wol


* Re: Advice requested re: hard drive setup for RAID arrays
  2015-11-04 19:36   ` Edward Kuns
  2015-11-04 19:42     ` Wols Lists
@ 2015-11-04 20:09     ` Phil Turmel
From: Phil Turmel @ 2015-11-04 20:09 UTC
  To: Edward Kuns; +Cc: Brad Campbell, Linux-RAID

Hi Edward,

On 11/04/2015 02:36 PM, Edward Kuns wrote:

> Is there a reasonable way of finding out if a shorter setting is 
> appropriate for any specific drive?

When I first learned all this (the hard way) with Seagate drives, 120
seconds was enough.  You'll find that in old archives, 2011-ish.  I
don't remember who, but someone had a drive that took longer and
suggested 180 seconds.

> Or would you say in general it's not worth the effort of trying to
> find out?

Not worth the effort.  It's a work-around for unsuitable devices until
such time as you can retire them.

> Would you expect this behavior to be any different for an SSD?

I'd expect an SSD's worst case error recovery to be much shorter --
there's no positioning to wait for, nor any mechanical effects that'll
make retries meaningful.

Phil


* Re: Advice requested re: hard drive setup for RAID arrays
  2015-11-04 14:30   ` o1bigtenor
  2015-11-04 15:05     ` Phil Turmel
@ 2015-11-05  6:02     ` Brad Campbell
From: Brad Campbell @ 2015-11-05  6:02 UTC
  To: o1bigtenor, Phil Turmel; +Cc: Linux-RAID

On 04/11/15 22:30, o1bigtenor wrote:
>
> Those 2 drives are in a separate USB connected tray.
> When I ran the smartctl* command that you had me run to check the status
> and configuration of the drives it borked on that drive telling me that it was
> connected using USB and I needed to add some other command.

Right. USB is a bit of a strange animal for storage. Different USB-SATA 
interfaces pass or block different commands and it's very vendor 
specific. I'd not have expected that script to work on USB drives 
because it was written to solve *my* particular issue and I don't use 
USB connected drives for anything vaguely critical.

You might be able to get it to work on USB drives, but I'd be very 
surprised if their error recovery behaviour was suitable or deterministic.
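
If you do want to poke at them, smartctl can often talk through USB-SATA
bridges when given an explicit device type, e.g. the generic SAT
passthrough:

smartctl -d sat -i /dev/sdg

Some bridges need a different -d type (see the smartctl man page), and
whether scterc actually makes it through the bridge is another question
entirely.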

This is what it looks like on my system.
root@srv:~# bin/set_sct
/dev/sda  is good Device Model:     SAMSUNG SSD 830 Series
/dev/sdb  is good Device Model:     SAMSUNG SSD 830 Series
/dev/sdc  is good Device Model:     WDC WD20EFRX-68AX9N0
/dev/sdd  is good Device Model:     SAMSUNG SSD 830 Series
/dev/sde  is good Device Model:     WDC WD20EFRX-68EUZN0
/dev/sdf  is good Product:              ST3300655SS
/dev/sdg  is good Product:              ST3300655SS
/dev/sdh  is good Product:              ST3300655SS
/dev/sdi  is good Product:              ST3300655SS
/dev/sdj  is good Device Model:     WDC WD20EARS-60MVWB0
/dev/sdk  is good Device Model:     WDC WD20EARS-60MVWB0
/dev/sdl  is good Device Model:     WDC WD20EARS-60MVWB0
/dev/sdm  is good Device Model:     WDC WD20EARS-60MVWB0
/dev/sdn  is good Device Model:     WDC WD20EARS-60MVWB0
/dev/sdo  is good Device Model:     WDC WD20EARS-60MVWB0
/dev/sdp  is good Device Model:     WDC WD20EFRX-68AX9N0
/dev/sdq  is good Device Model:     WDC WD20EARS-60MVWB0
/dev/sdr  is good Device Model:     WDC WD20EARS-60MVWB0
/dev/sds  is good Device Model:     WDC WD20EFRX-68AX9N0
/dev/sdt  is good Device Model:     WDC WD20EARS-60MVWB0
/dev/sdu  is  bad Device Model:     INTEL SSDSC2CT240A3
/dev/sdv  is  bad Device Model:     INTEL SSDSC2CT240A3
/dev/sdw  is  bad Device Model:     INTEL SSDSC2CT240A3
/dev/sdx  is good Device Model:     WDC WD20EFRX-68AX9N0

Notice all my WD Green drives support ERC. I must have got the very last 
of the drives before they knobbled the firmware.

I iterate *every* drive in the system because every drive in the system
is part of an array. Again, it was written to scratch my itch and
possibly to serve as a (perhaps bad) example to others.

There are 3 arrays on that system: 6x SSD in a RAID10, 4x 15k SAS in a
RAID10 and 14x 2TB SATA in a RAID6. I've experienced catastrophic data
loss with RAID twice in 10 years. The first was due to a bad IDE
controller dropping multiple drives and md not being as robust about
recovery as it is these days (and it was a RAID5), and the second due to
a SIL PCIe controller silently corrupting writes, which gently sprinkled
corruption across 16TB over a long period. Both times my backups were
inadequate because I believed RAID==Backup. I know better now.

You need to pay close attention to the whole storage stack to get a 
reliable system. I'd be doing something to replace those USB connections 
with something more suitable, but that's just me.



