* Irritating RAID problem (kept spare, kicked data disks due to timestamp)
From: Scott Laird @ 2005-01-10 16:53 UTC
To: linux-raid
I found an interesting problem with software RAID 5 in 2.6.10:
I have a RAID 5 array, recently created with mdadm. It consists of 4
160 GB drives plus a spare. All 4 drives were active and fully synced
when the box locked up due to some sort of hardware problem. When I
rebooted, the kernel refused to start the array because all 4 drives
had an older timestamp than the spare. So the RAID code kicked them
out, one after another, until it was left with just a single spare
disk. Since it can't start an array with 0/4 disks, it failed. I was
able to repeat this with 2.6.10 and 2.6.2 (the only other kernel I had
handy). Pulling the spare disk and rebooting fixed everything.
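To make the failure mode concrete, here is a simplified Python model of a "freshest superblock wins" assembly policy. It is a sketch under stated assumptions, not the md driver's code. The `Member` class, the `role` labels, and the integer `events` field (a stand-in for each disk's superblock timestamp or update count) are hypothetical, and the start condition assumes RAID 5 can run with at most one missing member.

```python
from dataclasses import dataclass


@dataclass
class Member:
    name: str
    role: str    # hypothetical label: "active" or "spare"
    events: int  # stand-in for the per-disk superblock timestamp/update count


def assemble_freshest_wins(members, raid_disks):
    """Simplified model: trust the newest superblock, kick everything older."""
    freshest = max(m.events for m in members)
    kept = [m for m in members if m.events == freshest]
    kicked = [m.name for m in members if m.events < freshest]
    # Only active members hold data; a surviving spare does not count.
    active = sum(1 for m in kept if m.role == "active")
    # Assumption: RAID 5 can start degraded with one missing member, not more.
    can_start = active >= raid_disks - 1
    return kept, kicked, can_start
```

With four active disks at `events=100` and a spare at `events=101`, this model keeps only the spare, kicks `sda` through `sdd`, and refuses to start, which matches the 0/4 situation described above.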
I don't have a record of the logs from this period: the box was in
single-user mode with disk problems, and I didn't want to write
anything to the disk.
Logically, it seems like the kernel's RAID recovery code shouldn't look
for the newest disk; it should really look for a quorum, even if that
means kicking out newer timestamps. *Especially* when the newer
timestamp is the spare disk.
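One way to read the quorum idea is sketched below in Python. This is a hypothetical policy, not something the kernel is known to implement: only active members vote, the newest superblock generation that still has enough active voters to start a RAID 5 array wins, and any member at a different generation (including a newer spare) is kicked. `Member`, `role`, `events`, and `assemble_by_quorum` are illustrative names, and `events` again stands in for the superblock timestamp.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Member:
    name: str
    role: str    # hypothetical label: "active" or "spare"
    events: int  # stand-in for the per-disk superblock timestamp/update count


def assemble_by_quorum(members, raid_disks):
    """Hypothetical policy: pick the newest generation with enough active voters."""
    # Spares hold no data, so they get no vote.
    votes = Counter(m.events for m in members if m.role == "active")
    needed = raid_disks - 1  # assumption: RAID 5 tolerates one missing member
    for events in sorted(votes, reverse=True):
        if votes[events] >= needed:
            kept = [m for m in members if m.events == events]
            kicked = [m.name for m in members if m.events != events]
            return kept, kicked, True
    # No generation has enough active voters: refuse to start.
    return [], [m.name for m in members], False
```

On the same four-active-plus-spare scenario, this keeps `sda` through `sdd` and kicks only the spare, which could then be added back as a spare once the array is running. The design choice is simply that a disk with no data to contribute should never outvote the disks that do.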
Scott
* Re: Irritating RAID problem (kept spare, kicked data disks due to timestamp)
From: maarten @ 2005-01-10 17:16 UTC
To: linux-raid
On Monday 10 January 2005 17:53, Scott Laird wrote:
> I found an interesting problem with software RAID 5 in 2.6.10:
>
> I have a RAID 5 array, recently created with mdadm. It consists of 4
> 160 GB drives plus a spare. All 4 drives were active and fully synced
> when the box locked up due to some sort of hardware problem. When I
> rebooted, the kernel refused to start the array because all 4 drives
> had an older timestamp than the spare. So the RAID code kicked them
> out, one after another, until it was left with just a single spare
> disk. Since it can't start an array with 0/4 disks, it failed. I was
> able to repeat this with 2.6.10 and 2.6.2 (the only other kernel I had
> handy). Pulling the spare disk and rebooting fixed everything.
>
> Logically, it seems like the kernel's RAID recovery code shouldn't look
> for the newest disk; it should really look for a quorum, even if that
> means kicking out newer timestamps. *Especially* when the newer
> timestamp is the spare disk.
Or rather, find out how the spare can have a newer timestamp, since this is
not something that should ever happen, AFAIK. This might be a bug?
Maarten
* Re: Irritating RAID problem (kept spare, kicked data disks due to timestamp)
From: Scott Laird @ 2005-01-10 17:35 UTC
To: maarten; Cc: linux-raid
On Jan 10, 2005, at 9:16 AM, maarten wrote:
> Or rather, find out how the spare can have a newer timestamp, since
> this is not something that should ever happen, AFAIK. This might be a bug?
Well, my disk controller locked up right before this happened, so it's
possible that the timestamp writes to the other drives were
lost. I've been through disk hell over the past week reconstructing
multiple failed RAID arrays (a disk failure, my own screwup; nothing
mysterious, just irritating), and this is just the most recent problem.
I wouldn't necessarily read too much into how the timestamp weirdness
happened.
Scott