* Raid Degradation best practices
From: Andrew Dunn @ 2009-11-06 12:47 UTC
To: linux-raid, nfbrown
This morning my array lost a drive, then another. This was not due to
drive failures or hardware issues.
I did not have any data on the array so I wiped the drives and
re-created it. So far everything seems fine.
I would like to know what some of your practices are for the scenario
where you lose a drive or more on a mission-critical array. I chose the
newbish route because I didn't have enough time or expertise to diagnose
the problems further. Typically this means it will happen again, and I
will still be unprepared.
Would appreciate your input, thanks.
--
Andrew Dunn
http://agdunn.net
* Re: Raid Degradation best practices
From: Justin Piszcz @ 2009-11-07 11:51 UTC
To: Andrew Dunn; +Cc: linux-raid, nfbrown
On Fri, 6 Nov 2009, Andrew Dunn wrote:
> This morning my array lost a drive, then another. This was not due to
> drive failures or hardware issues.
>
> I did not have any data on the array so I wiped the drives and
> re-created it. So far everything seems fine.
>
> I would like to know what some of your practices are for the scenario
> where you lose a drive or more on a mission-critical array. I chose the
> newbish route because I didn't have enough time or expertise to diagnose
> the problems further. Typically this means it will happen again, and I
> will still be unprepared.
>
> Would appreciate your input, thanks.
How did you lose the drives then?
If it is a mission-critical array, what RAID type are you using?
Are you backing up the data regularly, since it is mission-critical?
Justin.
* Re: Raid Degradation best practices
From: Andrew Dunn @ 2009-11-07 13:12 UTC
To: Justin Piszcz; +Cc: linux-raid, nfbrown
I am using RAID6, on 9 WD1001FALS drives.
The VERY important data is backed up to multiple external drives and
stored at a separate location.
I figured out my issue last night. The array was doing the silly
/dev/md_d0 thing, so I stopped it and started the new one with
'--assume-clean'. When I then started copying my information back to
the array, multiple devices dropped out. Their SMART information passes
just fine, so it must have been that the array was not clean.
This was my mistake, but I am still curious how you approach a real
drive failure when one happens.
Thanks,
--
Andrew Dunn
http://agdunn.net
* Re: Raid Degradation best practices
From: Goswin von Brederlow @ 2009-11-07 15:02 UTC
To: Andrew Dunn; +Cc: Justin Piszcz, linux-raid, nfbrown
Andrew Dunn <andrew.g.dunn@gmail.com> writes:
> I am using RAID6, on 9 WD1001FALS drives.
>
> The VERY important data is backed up to multiple external drives and
> stored at a separate location.
>
> I figured out my issue last night. The array was doing the silly
> /dev/md_d0 thing, so I stopped it and started the new one with
> '--assume-clean'. When I then started copying my information back to
> the array, multiple devices dropped out. Their SMART information passes
> just fine, so it must have been that the array was not clean.
--assume-clean just skips the resync. If the array is actually not
clean, you just get an increased mismatch_cnt when you run a check, and
bad data when a disk fails. It never causes a disk to drop out.
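A minimal sketch of how the counter mentioned above could be inspected, assuming the usual Linux md sysfs layout (/sys/block/<md>/md/sync_action and /sys/block/<md>/md/mismatch_cnt). The function names and the sysfs_root parameter are made up for illustration, and writing to sync_action needs root on a real system.

```python
from pathlib import Path


def md_attr_path(md_name, attr, sysfs_root="/sys/block"):
    """Build the path to an md sysfs attribute, e.g. /sys/block/md0/md/mismatch_cnt."""
    return Path(sysfs_root) / md_name / "md" / attr


def start_check(md_name, sysfs_root="/sys/block"):
    """Ask md to start a consistency check (counts mismatches, does not repair them)."""
    md_attr_path(md_name, "sync_action", sysfs_root).write_text("check\n")


def read_mismatch_cnt(md_name, sysfs_root="/sys/block"):
    """Return the mismatch counter md reported for the most recent check."""
    text = md_attr_path(md_name, "mismatch_cnt", sysfs_root).read_text()
    return int(text.strip())
```

On a live system one would call start_check("md0"), wait until sync_action reads "idle" again, and then look at read_mismatch_cnt("md0"). A non-zero value after '--assume-clean' on dirty disks would be expected and says nothing about why members dropped out.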
> This was my mistake, but I am still curious how you approach a real
> drive failure when one happens.
Having bitmaps helps: when a disk temporarily drops out and you add it
back, you only need to resync the bits that have changed. But that only
reduces the window where another (or a third for raid6) disk failure is
critical. If you get 3 failed disks, temporary or not, at the same time,
then your raid6 breaks and you need to put the pieces back together
yourself. Depending on each component's state you might have data loss
or corruption.
The best thing to do is to make sure your hardware is fit and does not
just drop out for a minute.
MfG
Goswin