making raid5 more robust after a crash?

From: Chris Allen <chris@cjx.com>
To: linux-raid@vger.kernel.org
Subject: making raid5 more robust after a crash?
Date: Fri, 17 Mar 2006 13:02:47 +0000
Message-ID: <20060317130247.GA19878@cjx.com>

Dear All,

We have a number of machines running 4TB raid5 arrays.
Occasionally one of these machines will lock up solid and
will need power cycling. Often when this happens, the
array will refuse to restart with 'cannot start dirty
degraded array'. Usually mdadm --assemble --force will
get the thing going again - although it will then do
a complete resync.

My question is: Is there any way I can make the array
more robust? I don't mind it losing a single drive and
having to resync when we get a lockup - but having to
do a forced assemble always makes me nervous, and means
that this sort of crash has to be escalated to a senior
engineer.

Is there any way of making the array so that there is
never more than one drive out of sync? I don't mind
if it slows things down *lots* - I'd just much prefer
robustness over performance.

Thanks,

Chris Allen.

---------------------------------

Typical syslog:

Mar 17 10:45:24 snap27 kernel: md: Autodetecting RAID arrays.
Mar 17 10:45:24 snap27 kernel: md: autorun ...
Mar 17 10:45:24 snap27 kernel: md: considering sdh1 ...
Mar 17 10:45:24 snap27 kernel: md:  adding sdh1 ...
Mar 17 10:45:24 snap27 kernel: md:  adding sdg1 ...
Mar 17 10:45:24 snap27 kernel: md:  adding sdf1 ...
Mar 17 10:45:24 snap27 kernel: md:  adding sde1 ...
Mar 17 10:45:24 snap27 kernel: md:  adding sdd1 ...
Mar 17 10:45:24 snap27 kernel: md:  adding sdc1 ...
Mar 17 10:45:24 snap27 kernel: md:  adding sda1 ...
Mar 17 10:45:24 snap27 kernel: md: created md0
Mar 17 10:45:24 snap27 kernel: md: bind<sda1>
Mar 17 10:45:24 snap27 kernel: md: bind<sdc1>
Mar 17 10:45:24 snap27 kernel: md: bind<sdd1>
Mar 17 10:45:24 snap27 kernel: md: bind<sde1>
Mar 17 10:45:24 snap27 kernel: md: bind<sdf1>
Mar 17 10:45:24 snap27 kernel: md: bind<sdg1>
Mar 17 10:45:24 snap27 kernel: md: bind<sdh1>
Mar 17 10:45:24 snap27 kernel: md: running: <sdh1><sdg1><sdf1><sde1><sdd1><sdc1><sda1>
Mar 17 10:45:24 snap27 kernel: md: md0: raid array is not clean -- starting background reconstruction
Mar 17 10:45:24 snap27 kernel: raid5: device sdh1 operational as raid disk 4
Mar 17 10:45:24 snap27 kernel: raid5: device sdg1 operational as raid disk 5
Mar 17 10:45:24 snap27 kernel: raid5: device sdf1 operational as raid disk 6
Mar 17 10:45:24 snap27 kernel: raid5: device sde1 operational as raid disk 7
Mar 17 10:45:24 snap27 kernel: raid5: device sdd1 operational as raid disk 3
Mar 17 10:45:24 snap27 kernel: raid5: device sdc1 operational as raid disk 2
Mar 17 10:45:24 snap27 kernel: raid5: device sda1 operational as raid disk 0
Mar 17 10:45:24 snap27 kernel: raid5: cannot start dirty degraded array for md0
Mar 17 10:45:24 snap27 kernel: RAID5 conf printout:
Mar 17 10:45:24 snap27 kernel:  --- rd:8 wd:7 fd:1
Mar 17 10:45:24 snap27 kernel:  disk 0, o:1, dev:sda1
Mar 17 10:45:24 snap27 kernel:  disk 2, o:1, dev:sdc1
Mar 17 10:45:24 snap27 kernel:  disk 3, o:1, dev:sdd1
Mar 17 10:45:24 snap27 kernel:  disk 4, o:1, dev:sdh1
Mar 17 10:45:24 snap27 kernel:  disk 5, o:1, dev:sdg1
Mar 17 10:45:24 snap27 kernel:  disk 6, o:1, dev:sdf1
Mar 17 10:45:24 snap27 kernel:  disk 7, o:1, dev:sde1
Mar 17 10:45:24 snap27 kernel: raid5: failed to run raid set md0
Mar 17 10:45:24 snap27 kernel: md: pers->run() failed ...
Mar 17 10:45:24 snap27 kernel: md: do_md_run() returned -22
Mar 17 10:45:24 snap27 kernel: md: md0 stopped.
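
For reading a log like the one above, here is a minimal Python sketch that
pulls the "RAID5 conf printout" apart and reports which raid slot has no
device. It assumes that rd, wd and fd are the total, working and failed
disk counts and that o:1 marks an operational device, which is what the
numbers in this log suggest. The function name summarize_conf_printout and
the SAMPLE list are made up for illustration; they are not part of mdadm
or the kernel.

```python
import re

# Summary line of the printout, e.g. " --- rd:8 wd:7 fd:1".
# Assumption: rd = raid disks, wd = working disks, fd = failed disks.
SUMMARY_RE = re.compile(r"rd:(\d+) wd:(\d+) fd:(\d+)")
# One line per populated slot, e.g. " disk 0, o:1, dev:sda1".
DISK_RE = re.compile(r"disk (\d+), o:(\d+), dev:(\S+)")


def summarize_conf_printout(lines):
    """Return the disk counts and the slots with no operational device."""
    counts = None
    present = {}
    for line in lines:
        m = SUMMARY_RE.search(line)
        if m:
            counts = tuple(int(g) for g in m.groups())
            continue
        m = DISK_RE.search(line)
        if m and m.group(2) == "1":
            # Slot number -> device name, only for operational devices.
            present[int(m.group(1))] = m.group(3)
    if counts is None:
        raise ValueError("no 'rd:N wd:N fd:N' summary line found")
    raid_disks, working, failed = counts
    missing = [slot for slot in range(raid_disks) if slot not in present]
    return {
        "raid_disks": raid_disks,
        "working": working,
        "failed": failed,
        "present": present,
        "missing_slots": missing,
    }


# Lines copied from the syslog excerpt above (timestamp prefix dropped).
SAMPLE = [
    "kernel: RAID5 conf printout:",
    "kernel:  --- rd:8 wd:7 fd:1",
    "kernel:  disk 0, o:1, dev:sda1",
    "kernel:  disk 2, o:1, dev:sdc1",
    "kernel:  disk 3, o:1, dev:sdd1",
    "kernel:  disk 4, o:1, dev:sdh1",
    "kernel:  disk 5, o:1, dev:sdg1",
    "kernel:  disk 6, o:1, dev:sdf1",
    "kernel:  disk 7, o:1, dev:sde1",
]

result = summarize_conf_printout(SAMPLE)
print(result["missing_slots"])  # slots with no operational device
```

On the excerpt above this reports slot 1 as the only one missing, which
matches wd:7 fd:1. A check like this only describes what the kernel
printed; it does not say whether a forced assemble is safe.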
