linux-raid.vger.kernel.org archive mirror
* Irritating RAID problem (kept spare, kicked data disks due to timestamp)
@ 2005-01-10 16:53 Scott Laird
From: Scott Laird @ 2005-01-10 16:53 UTC (permalink / raw)
  To: linux-raid

I found an interesting problem with software RAID 5 in 2.6.10:

I have a RAID 5 array, recently created with mdadm.  It consists of 4 
160 GB drives plus a spare.  All 4 drives were active and fully synced 
when the box locked up due to some sort of hardware problem.  When I 
rebooted, the kernel refused to start the array because all 4 drives 
had an older timestamp than the spare.  So the RAID code kicked them 
out, one after another, until it was left with just a single spare 
disk.  Since it can't start an array with 0/4 disks, it failed.  I was 
able to repeat this with 2.6.10 and 2.6.2 (the only other kernel I had 
handy).  Pulling the spare disk and rebooting fixed everything.

I don't have a record of the logs during this period--the box was in 
single-user mode with disk problems, and I didn't want to write 
anything to the disk.

Logically, it seems like the kernel's RAID recovery code shouldn't look 
for the newest disk; it should look for a quorum, even if that 
means kicking out disks with newer timestamps.  *Especially* when the 
disk with the newer timestamp is the spare.
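
To make the difference concrete, here is a rough sketch in Python.  It is
not the md driver's actual logic; the function names, the device names and
the integer timestamps are all made up for illustration.  It only contrasts
"keep whatever matches the newest timestamp" with "keep the largest group of
data disks that agree, and never let a spare vote".

```python
from collections import Counter
from typing import Dict, List, Set


def newest_wins(stamps: Dict[str, int]) -> List[str]:
    """Keep only the devices whose timestamp equals the newest one seen."""
    newest = max(stamps.values())
    return sorted(dev for dev, ts in stamps.items() if ts == newest)


def quorum_wins(stamps: Dict[str, int], spares: Set[str]) -> List[str]:
    """Keep the largest group of data disks that agree on a timestamp.

    Spares carry no data, so they are excluded from the vote and from
    the result.  Ties between equally large groups go to the newer stamp.
    """
    votes = Counter(ts for dev, ts in stamps.items() if dev not in spares)
    if not votes:
        return []
    agreed = max(votes, key=lambda ts: (votes[ts], ts))
    return sorted(
        dev for dev, ts in stamps.items()
        if ts == agreed and dev not in spares
    )


# Hypothetical state after the lockup: four data disks agree,
# the spare (sde) somehow carries a newer timestamp.
stamps = {"sda": 100, "sdb": 100, "sdc": 100, "sdd": 100, "sde": 101}

print(newest_wins(stamps))                  # only the spare survives
print(quorum_wins(stamps, spares={"sde"}))  # all four data disks survive
```

With the first rule the array is left with nothing but the spare, which is
what I saw; with the second it would have come up with 4/4 disks.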


Scott



* Re: Irritating RAID problem (kept spare, kicked data disks due to timestamp)
@ 2005-01-10 17:16 ` maarten
From: maarten @ 2005-01-10 17:16 UTC (permalink / raw)
  To: linux-raid

On Monday 10 January 2005 17:53, Scott Laird wrote:
> I found an interesting problem with software RAID 5 in 2.6.10:
>
> I have a RAID 5 array, recently created with mdadm.  It consists of 4
> 160 GB drives plus a spare.  All 4 drives were active and fully synced
> when the box locked up due to some sort of hardware problem.  When I
> rebooted, the kernel refused to start the array because all 4 drives
> had an older timestamp than the spare.  So the RAID code kicked them
> out, one after another, until it was left with just a single spare
> disk.  Since it can't start an array with 0/4 disks, it failed.  I was
> able to repeat this with 2.6.10 and 2.6.2 (the only other kernel I had
> handy).  Pulling the spare disk and rebooting fixed everything.
>
> Logically, it seems like the kernel's RAID recovery code shouldn't look
> for the newest disk; it should look for a quorum, even if that
> means kicking out disks with newer timestamps.  *Especially* when the
> disk with the newer timestamp is the spare.

Or rather, find out how the spare can have a newer timestamp, since this is 
not something that should ever happen, AFAIK.  This might be a bug?

Maarten



* Re: Irritating RAID problem (kept spare, kicked data disks due to timestamp)
@ 2005-01-10 17:35   ` Scott Laird
From: Scott Laird @ 2005-01-10 17:35 UTC (permalink / raw)
  To: maarten; +Cc: linux-raid


On Jan 10, 2005, at 9:16 AM, maarten wrote:
> Or rather, find out how the spare can have a newer timestamp, since
> this is not something that should ever happen, AFAIK.  This might be
> a bug?

Well, my disk controller locked up right before this happened, so it's 
quite possible that the timestamp writes to the other drives were 
lost.  I've gone through disk hell over the past week reconstructing 
multiple failed RAID arrays (disk failure, my screwup, nothing 
mysterious, just irritating), and this is just the most recent problem.  
I wouldn't necessarily read too much into how the timestamp weirdness 
happened.


Scott

