linux-raid.vger.kernel.org archive mirror
* Why does MD overwrite the superblock upon temporary disconnect?
@ 2010-09-21  2:49 Jim Schatzman
  2010-09-21  3:34 ` Richard Scobie
  2010-09-21  4:09 ` Neil Brown
  0 siblings, 2 replies; 5+ messages in thread
From: Jim Schatzman @ 2010-09-21  2:49 UTC (permalink / raw)
  To: linux-raid

I have seen quite a number of people writing about what happens to their SATA RAID5 arrays when several drives get accidentally unplugged. This can happen rather easily, especially with port multipliers and external drive boxes connected by eSATA cables.

In my case, this happened to an 8-drive RAID6 array. Even though the array was not in active use at the time, MD immediately marked the 4 temporarily disconnected drives as "spare". No combination of "assemble" options seems able to fix this: 4 drives still have "active slot N" status, while the other 4 are "spare", so their slot metadata has been wiped out. Apparently, you have to use "create" to recreate the metadata, marking two slots as "missing" (for RAID6); check the resulting RAID data; then add the "missing" drives back in. Presumably, you should write down the slot numbers beforehand, or this may be difficult.

This procedure works, but for a 12 TB array the resyncs take a long time: 1 second of cable disconnect costs 2 days of resync.
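
The recreate step described above can be sketched roughly as follows. This is a minimal illustration, not a tested recovery tool: the helper name `build_create_command`, the device paths, and the slot map are all hypothetical. It only builds the `mdadm --create` argument list, with the literal word `missing` in every slot whose device is unknown, so the command can be reviewed before anyone runs it. Since `--create` writes new superblocks, the level, device count, device order, chunk size, and metadata version presumably all have to match the original array.

```python
# Sketch only: builds (does not run) an `mdadm --create` command line.
# Helper name, device paths, and slot numbers are hypothetical.

# Number of members each RAID level can lose and still be started.
PARITY_DEVICES = {5: 1, 6: 2}


def build_create_command(md_device, level, raid_devices, slots,
                         metadata=None, chunk_kib=None):
    """Return the argv list for re-creating an array in its original order.

    `slots` maps a recorded slot number to its device path; any slot
    not present in the map is filled with the keyword "missing".
    """
    for slot in slots:
        if not 0 <= slot < raid_devices:
            raise ValueError(f"slot {slot} is outside 0..{raid_devices - 1}")

    members = [slots.get(i, "missing") for i in range(raid_devices)]
    missing = members.count("missing")
    if missing > PARITY_DEVICES[level]:
        raise ValueError(
            f"RAID{level} cannot start with {missing} missing members")

    cmd = ["mdadm", "--create", md_device,
           f"--level={level}", f"--raid-devices={raid_devices}"]
    if metadata is not None:
        cmd.append(f"--metadata={metadata}")
    if chunk_kib is not None:
        cmd.append(f"--chunk={chunk_kib}")
    return cmd + members


if __name__ == "__main__":
    # Hypothetical 8-drive RAID6 where slots 2 and 5 are left out.
    recorded = {0: "/dev/sdb1", 1: "/dev/sdc1", 3: "/dev/sde1",
                4: "/dev/sdf1", 6: "/dev/sdh1", 7: "/dev/sdi1"}
    print(" ".join(build_create_command("/dev/md0", 6, 8, recorded)))
```

Printing the command instead of executing it is deliberate: a wrong device order is easier to spot on paper than to undo on disk.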

I have some questions:

1) When MD detects that so many drives are offline that the array cannot function, why not just put the array in a "stopped" state and avoid changing any metadata? Yes, I know that there is a risk of data corruption, but isn't minor data corruption often better than total data loss?

2) Couldn't there be a way to put the RAID back together tentatively (with all 8 drives), check the parity, and go with the result if the parity is OK? That would save some time compared to resyncing two disks from scratch.

3) In my case, I am more concerned about catastrophic data loss than about 100% uptime. I want to have the arrays assembled with "--no-degraded". Is it possible to tell the kernel to do this, or would it be better to specify "AUTO -1.x" in mdadm.conf and use a cron.reboot script to start the RAID?

4) Also, would "--no-degraded" help prevent MD from overwriting the metadata (switching drives to "spare" state)?
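
For reference, the userspace half of questions 2 and 3 might look something like the sketch below. It does not answer whether the kernel can be told to do this, and it does not provide a tentative assembly mode. It assumes the array is listed in mdadm.conf so that `--scan` can find it, that the array is named `md0`, and that the function names are free to choose (they are hypothetical). The parity check uses the md sysfs files `sync_action` and `mismatch_cnt`, and only works on an array that is already assembled and running.

```python
# Sketch only: all-or-nothing assembly plus a read-only parity check.
# Function names and the array name "md0" are assumptions.
import subprocess
from pathlib import Path


def assemble_no_degraded(run=subprocess.run):
    """Run `mdadm --assemble --scan --no-degraded`; return its exit status.

    With --no-degraded, mdadm does not start an array unless all of its
    expected drives are present.
    """
    cmd = ["mdadm", "--assemble", "--scan", "--no-degraded"]
    return run(cmd, check=False).returncode


def _md_dir(array, sysfs_root="/sys"):
    # e.g. /sys/block/md0/md
    return Path(sysfs_root) / "block" / array / "md"


def start_parity_check(array, sysfs_root="/sys"):
    """Ask md to verify parity without rewriting it ("check", not "repair")."""
    (_md_dir(array, sysfs_root) / "sync_action").write_text("check\n")


def parity_check_result(array, sysfs_root="/sys"):
    """Return mismatch_cnt once the array is idle, or None while still busy."""
    md = _md_dir(array, sysfs_root)
    if (md / "sync_action").read_text().strip() != "idle":
        return None
    return int((md / "mismatch_cnt").read_text().strip())


if __name__ == "__main__":
    # Needs root and a real array; shown for shape only.
    if assemble_no_degraded() == 0:
        start_parity_check("md0")
```

A script like this could be started at boot from cron; whether that is preferable to kernel-side handling is exactly the open question.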

Thanks!

Jim