* making raid5 more robust after a crash?
@ 2006-03-17 13:02 Chris Allen
From: Chris Allen @ 2006-03-17 13:02 UTC (permalink / raw)
To: linux-raid
Dear All,
We have a number of machines running 4TB raid5 arrays.
Occasionally one of these machines will lock up solid and
will need power cycling. Often when this happens, the
array will refuse to restart with 'cannot start dirty
degraded array'. Usually mdadm --assemble --force will
get the thing going again - although it will then do
a complete resync.
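For reference, the recovery looks roughly like this (a sketch only: /dev/md0
and the sd?1 device list are just how this particular box is laid out, per the
syslog below, and will differ per machine):

  mdadm --stop /dev/md0                            # clear any half-assembled state
  mdadm --assemble --force /dev/md0 /dev/sd[a-h]1  # force-assemble from all members

after which md does the complete resync mentioned above.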
My question is: Is there any way I can make the array
more robust? I don't mind it losing a single drive and
having to resync when we get a lockup - but having to
do a forced assemble always makes me nervous, and means
that this sort of crash has to be escalated to a senior
engineer.
Is there any way of making the array so that there is
never more than one drive out of sync? I don't mind
if it slows things down *lots* - I'd just much prefer
robustness over performance.
Thanks,
Chris Allen.
---------------------------------
Typical syslog:
Mar 17 10:45:24 snap27 kernel: md: Autodetecting RAID arrays.
Mar 17 10:45:24 snap27 kernel: md: autorun ...
Mar 17 10:45:24 snap27 kernel: md: considering sdh1 ...
Mar 17 10:45:24 snap27 kernel: md: adding sdh1 ...
Mar 17 10:45:24 snap27 kernel: md: adding sdg1 ...
Mar 17 10:45:24 snap27 kernel: md: adding sdf1 ...
Mar 17 10:45:24 snap27 kernel: md: adding sde1 ...
Mar 17 10:45:24 snap27 kernel: md: adding sdd1 ...
Mar 17 10:45:24 snap27 kernel: md: adding sdc1 ...
Mar 17 10:45:24 snap27 kernel: md: adding sda1 ...
Mar 17 10:45:24 snap27 kernel: md: created md0
Mar 17 10:45:24 snap27 kernel: md: bind<sda1>
Mar 17 10:45:24 snap27 kernel: md: bind<sdc1>
Mar 17 10:45:24 snap27 kernel: md: bind<sdd1>
Mar 17 10:45:24 snap27 kernel: md: bind<sde1>
Mar 17 10:45:24 snap27 kernel: md: bind<sdf1>
Mar 17 10:45:24 snap27 kernel: md: bind<sdg1>
Mar 17 10:45:24 snap27 kernel: md: bind<sdh1>
Mar 17 10:45:24 snap27 kernel: md: running: <sdh1><sdg1><sdf1><sde1><sdd1><sdc1><sda1>
Mar 17 10:45:24 snap27 kernel: md: md0: raid array is not clean -- starting background reconstruction
Mar 17 10:45:24 snap27 kernel: raid5: device sdh1 operational as raid disk 4
Mar 17 10:45:24 snap27 kernel: raid5: device sdg1 operational as raid disk 5
Mar 17 10:45:24 snap27 kernel: raid5: device sdf1 operational as raid disk 6
Mar 17 10:45:24 snap27 kernel: raid5: device sde1 operational as raid disk 7
Mar 17 10:45:24 snap27 kernel: raid5: device sdd1 operational as raid disk 3
Mar 17 10:45:24 snap27 kernel: raid5: device sdc1 operational as raid disk 2
Mar 17 10:45:24 snap27 kernel: raid5: device sda1 operational as raid disk 0
Mar 17 10:45:24 snap27 kernel: raid5: cannot start dirty degraded array for md0
Mar 17 10:45:24 snap27 kernel: RAID5 conf printout:
Mar 17 10:45:24 snap27 kernel: --- rd:8 wd:7 fd:1
Mar 17 10:45:24 snap27 kernel: disk 0, o:1, dev:sda1
Mar 17 10:45:24 snap27 kernel: disk 2, o:1, dev:sdc1
Mar 17 10:45:24 snap27 kernel: disk 3, o:1, dev:sdd1
Mar 17 10:45:24 snap27 kernel: disk 4, o:1, dev:sdh1
Mar 17 10:45:24 snap27 kernel: disk 5, o:1, dev:sdg1
Mar 17 10:45:24 snap27 kernel: disk 6, o:1, dev:sdf1
Mar 17 10:45:24 snap27 kernel: disk 7, o:1, dev:sde1
Mar 17 10:45:24 snap27 kernel: raid5: failed to run raid set md0
Mar 17 10:45:24 snap27 kernel: md: pers->run() failed ...
Mar 17 10:45:24 snap27 kernel: md: do_md_run() returned -22
Mar 17 10:45:24 snap27 kernel: md: md0 stopped.
* Re: making raid5 more robust after a crash?
From: Neil Brown @ 2006-03-17 21:13 UTC (permalink / raw)
To: Chris Allen; +Cc: linux-raid
On Friday March 17, chris@cjx.com wrote:
> Dear All,
>
> We have a number of machines running 4TB raid5 arrays.
> Occasionally one of these machines will lock up solid and
> will need power cycling. Often when this happens, the
> array will refuse to restart with 'cannot start dirty
> degraded array'. Usually mdadm --assemble --force will
> get the thing going again - although it will then do
> a complete resync.
>
>
> My question is: Is there any way I can make the array
> more robust? I don't mind it losing a single drive and
> having to resync when we get a lockup - but having to
> do a forced assemble always makes me nervous, and means
> that this sort of crash has to be escalated to a senior
> engineer.
Why is the array degraded?
Having a crash while the array is degraded can cause undetectable data
loss. That is why md won't assemble the array itself: you need to
know there could be a problem.
But a crash with a degraded array should be fairly unusual. If it is
happening a lot, then there must be something wrong with your config:
either you are running degraded a lot (which is not safe, don't do
it), or md cannot find all the devices to assemble.
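A quick way to tell which of those it is, next time this happens, is to
compare what the kernel has assembled with what the member superblocks say
(a sketch; device names taken from your log below):

  cat /proc/mdstat                 # what the kernel currently has assembled
  mdadm --examine /dev/sd[a-h]1    # per-member superblock: slot, state, event count

If one member never shows a superblock in --examine, md was never going to be
able to use it.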
>
>
> Typical syslog:
>
>
> Mar 17 10:45:24 snap27 kernel: md: Autodetecting RAID arrays.
> Mar 17 10:45:24 snap27 kernel: md: autorun ...
> Mar 17 10:45:24 snap27 kernel: md: considering sdh1 ...
> Mar 17 10:45:24 snap27 kernel: md: adding sdh1 ...
> Mar 17 10:45:24 snap27 kernel: md: adding sdg1 ...
> Mar 17 10:45:24 snap27 kernel: md: adding sdf1 ...
> Mar 17 10:45:24 snap27 kernel: md: adding sde1 ...
> Mar 17 10:45:24 snap27 kernel: md: adding sdd1 ...
> Mar 17 10:45:24 snap27 kernel: md: adding sdc1 ...
> Mar 17 10:45:24 snap27 kernel: md: adding sda1 ...
> Mar 17 10:45:24 snap27 kernel: md: created md0
> Mar 17 10:45:24 snap27 kernel: md: bind<sda1>
> Mar 17 10:45:24 snap27 kernel: md: bind<sdc1>
> Mar 17 10:45:24 snap27 kernel: md: bind<sdd1>
> Mar 17 10:45:24 snap27 kernel: md: bind<sde1>
> Mar 17 10:45:24 snap27 kernel: md: bind<sdf1>
> Mar 17 10:45:24 snap27 kernel: md: bind<sdg1>
> Mar 17 10:45:24 snap27 kernel: md: bind<sdh1>
> Mar 17 10:45:24 snap27 kernel: md: running: <sdh1><sdg1><sdf1><sde1><sdd1><sdc1><sda1>
> Mar 17 10:45:24 snap27 kernel: md: md0: raid array is not clean -- starting background reconstruction
> Mar 17 10:45:24 snap27 kernel: raid5: device sdh1 operational as raid disk 4
> Mar 17 10:45:24 snap27 kernel: raid5: device sdg1 operational as raid disk 5
> Mar 17 10:45:24 snap27 kernel: raid5: device sdf1 operational as raid disk 6
> Mar 17 10:45:24 snap27 kernel: raid5: device sde1 operational as raid disk 7
> Mar 17 10:45:24 snap27 kernel: raid5: device sdd1 operational as raid disk 3
> Mar 17 10:45:24 snap27 kernel: raid5: device sdc1 operational as raid disk 2
> Mar 17 10:45:24 snap27 kernel: raid5: device sda1 operational as raid disk 0
> Mar 17 10:45:24 snap27 kernel: raid5: cannot start dirty degraded
> array for md0
So where is 'disk 1'?? Presumably it should be 'sdb1'. Does that
drive exist? Is it marked for auto-detect like the others?
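In-kernel autodetection only picks up partitions whose type is set to 0xfd
(Linux raid autodetect), so a quick check would be something like:

  fdisk -l /dev/sdb           # the type column should show 'fd  Linux raid autodetect'
  mdadm --examine /dev/sdb1   # does it carry an md superblock at all?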
NeilBrown
* Re: making raid5 more robust after a crash?
From: Martin Cracauer @ 2006-03-20 17:41 UTC (permalink / raw)
To: Neil Brown; +Cc: Chris Allen, linux-raid
Neil Brown wrote on Sat, Mar 18, 2006 at 08:13:48AM +1100:
> On Friday March 17, chris@cjx.com wrote:
> > Dear All,
> >
> > We have a number of machines running 4TB raid5 arrays.
> > Occasionally one of these machines will lock up solid and
> > will need power cycling. Often when this happens, the
> > array will refuse to restart with 'cannot start dirty
> > degraded array'. Usually mdadm --assemble --force will
> > get the thing going again - although it will then do
> > a complete resync.
First of all, you need to make sure you can see the kernel messages
from this. If /var/log/messages lives on the affected array, you won't
see the messages explaining what happened even if the kernel printed them.
What you see here is probably similar to a problem I just had: with
software RAID you are exposed to errors below the RAID level that are
not disk errors. In my case, a BIOS problem on my board made the SATA
driver run out of space on requests for two of the disks in my RAID-5
simultaneously. The driver had to report an error upstream, and the
RAID software on top of it cannot tell such a non-disk error from a
disk error: it treats everything as a disk error, and because it saw
errors on requests for two disks it dropped both of them out of the
array.
I have more info on my accident here:
http://forums.2cpu.com/showthread.php?t=73705
As I said, you need to have a logfile on a disk not in the array, or
(better) you need to be able to watch kernel messages on the console
when this happens.
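A cheap way to get the former is to forward kernel messages to a remote
syslog host, so the evidence survives even when the local array is unusable.
A sketch, with 'loghost' standing in for whatever machine keeps your logs
(the receiving syslogd must be started with -r to accept remote messages):

  # on the machine with the array
  printf 'kern.*\t@loghost\n' >> /etc/syslog.conf
  kill -HUP $(pidof syslogd)    # make syslogd re-read its configuration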
It sounds to me like you have a problem similar to mine: a software
error above the disks but below the RAID level.
> >
> >
> > My question is: Is there any way I can make the array
> > more robust? I don't mind it losing a single drive and
> > having to resync when we get a lockup - but having to
> > do a forced assemble always makes me nervous, and means
> > that this sort of crash has to be escalated to a senior
> > engineer.
The re-sync itself is a big problem, because physically losing a drive
during the re-sync will kill your array (unless it is the disk being
re-synced).
Martin
--
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Martin Cracauer <cracauer@cons.org> http://www.cons.org/cracauer/
FreeBSD - where you want to go, today. http://www.freebsd.org/
* Re: making raid5 more robust after a crash?
From: Chris Allen @ 2006-03-29 13:19 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-raid
On Sat, Mar 18, 2006 at 08:13:48AM +1100, Neil Brown wrote:
> On Friday March 17, chris@cjx.com wrote:
> > Dear All,
> >
> > We have a number of machines running 4TB raid5 arrays.
> > Occasionally one of these machines will lock up solid and
> > will need power cycling. Often when this happens, the
> > array will refuse to restart with 'cannot start dirty
> > degraded array'. Usually mdadm --assemble --force will
> > get the thing going again - although it will then do
> > a complete resync.
> >
> >
> > My question is: Is there any way I can make the array
> > more robust? I don't mind it losing a single drive and
> > having to resync when we get a lockup - but having to
> > do a forced assemble always makes me nervous, and means
> > that this sort of crash has to be escalated to a senior
> > engineer.
>
> Why is the array degraded?
>
> Having a crash while the array is degraded can cause undetectable data
> loss. That is why md won't assemble the array itself: you need to
> know there could be a problem.
>
> But a crash with a degraded array should be fairly unusual. If it is
> happening a lot, then there must be something wrong with your config:
> either you are running degraded a lot (which is not safe, don't do
> it), or md cannot find all the devices to assemble.
Thanks for your reply. As you guessed, this was a problem
with our hardware/config and nothing to do with the raid software.
After much investigation we found that we had two separate problems.
The first of these was a SATA driver problem. This would occasionally
return hard errors for a drive in the array, after which it would
get kicked. The second was XFS over NFS using up too much kernel
stack and hanging the machine. If both happened before we noticed
(say during the night), the result would be one drive dirty because
of the SATA driver and one dirty because of the lockup.
The real sting in the tail is that (for some reason) the drive lost
through the SATA problem would not be marked as dirty - so if the
array was force-rebuilt, it would be used in place of the more
recently failed drive, causing horrible synchronisation problems.
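For what it's worth, comparing the member superblocks directly does show
which drive is genuinely stale before forcing anything (a sketch; adjust the
device names):

  mdadm --examine /dev/sd[a-h]1 | grep -E '/dev|Update Time|Events'

The member with the oldest update time and lowest event count should be the
one that was kicked first, whatever its dirty flag says.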
Can anybody point me to the syntax I could use for saying: "force
rebuild the array using drives ABCD but not E, even though E looks
fresh and D doesn't"?
> >
> > Typical syslog:
> >
> >
> > Mar 17 10:45:24 snap27 kernel: md: Autodetecting RAID arrays.
> > Mar 17 10:45:24 snap27 kernel: raid5: cannot start dirty degraded
> > array for md0
>
> So where is 'disk 1'?? Presumably it should be 'sdb1'. Does that
> drive exist? Is it marked for auto-detect like the others?
Ok, this syslog was a complete red herring for the above problem -
and you hit the nail right on the head: in this particular case I had
installed a new sdb1 and forgotten to set the autodetect flag :-)
Chris.
* Re: making raid5 more robust after a crash?
From: Neil Brown @ 2006-03-29 22:17 UTC (permalink / raw)
To: Chris Allen; +Cc: linux-raid
On Wednesday March 29, chris@cjx.com wrote:
>
> Thanks for your reply. As you guessed, this was a problem
> with our hardware/config and nothing to do with the raid software.
I'm glad you have found your problem!
>
> Can anybody point me to the syntax I could use for saying:
>
> "force rebuild the array using drives ABCD but not E, even though
> E looks fresh and D doesn't".
mdadm -Af /dev/mdX A B C D
i.e. don't even tell mdadm about E.
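So for the layout in the earlier syslog, with /dev/sde1 standing in for the
member to leave out, that would look something like:

  mdadm --assemble --force /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sdf1 /dev/sdg1 /dev/sdh1

mdadm then brings the listed members' event counts into agreement and starts
the array without the one you left off the command line.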
NeilBrown