* Re: Bizarre RAID "failure"
From: Måns Rullgård @ 2004-02-19 23:38 UTC
To: linux-raid
Tom Maddox <tmaddox@thereinc.com> writes:
> If the system goes down unexpectedly (e.g., because of a power failure),
> the RAID array comes back up dirty and begins to rebuild itself, which
> is odd enough on its own.
This is supposedly much better in 2.6 kernels.
--
Måns Rullgård
mru@kth.se
* Re: Bizarre RAID "failure"
From: Kanoa Withington @ 2004-02-20 0:39 UTC
To: Tom Maddox; +Cc: linux-raid
It might not be related, but I've seen odd behaviour with that
kernel/XFS combination when xfs_repair or XFS log recovery is run
while the array is resyncing.
If you have the opportunity to try some tests, you might try booting
the system without the volume mounted to avoid the automatic XFS
checking. When the resync is complete, then try mounting the volume.
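Something along these lines, as a rough sketch (/dev/md0 and the mount
point are only placeholders, substitute your own):

    # keep the filesystem out of the automatic mount pass for the test,
    # e.g. mark it noauto in /etc/fstab:
    #   /dev/md0   /data   xfs   noauto   0 0

    # after boot, watch the resync until it finishes:
    watch cat /proc/mdstat

    # only once the resync is done, mount the volume (XFS log recovery
    # runs at mount time):
    mount -t xfs /dev/md0 /data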
-Kanoa
On Thu, 19 Feb 2004, Tom Maddox wrote:
> Hi, all,
>
> I'm encountering a bizarre problem with software RAID 5 under Linux that
> I'm hoping someone on this list can help me solve or at least
> understand.
>
> I've got a box running Red Hat 7.3 with SGI's 2.4.18 XFS 1.1 kernel.
> It's using three FastTrak TX 2000 (PDC20271) cards in non-RAID mode with
> three Western Digital 200 GB drives. I'm using those controllers
> because they were handy and they support large drives. The drives are
> in an XFS-formatted RAID 5 array using md, which has never given me
> problems before. In this case, however, I'm running into some seriously
> anomalous behavior.
>
> If the system goes down unexpectedly (e.g., because of a power failure),
> the RAID array comes back up dirty and begins to rebuild itself, which
> is odd enough on its own. What's worse is that, whenever this happens,
> the rebuild hangs at about 2.4%. When it reaches that point, the array
> becomes totally nonresponsive--I can't even query its status with mdadm
> or any other tool, although I can use "cat /proc/mdstat" to see the
> status of the rebuild. Any command that attempts to access the RAID
> drive hangs.
>
> My assumption would normally be that there's a hardware failure
> somewhere, but I've swapped out each component individually (including
> cables!) and the same problem keeps happening.
>
> Has anyone seen this behavior before, and can you recommend a solution?
>
> Thanks,
>
> Tom
* Re: Bizarre RAID "failure"
From: Tom Maddox @ 2004-02-20 0:44 UTC
To: Kanoa Withington; +Cc: linux-raid
On Thu, 2004-02-19 at 16:39, Kanoa Withington wrote:
> It might not be related, but I've seen odd behaviour with that
> kernel/XFS combination when xfs_repair or XFS log recovery is run
> while the array is resyncing.
>
> If you have the opportunity to try some tests, you might try booting
> the system without the volume mounted to avoid the automatic XFS
> checking. When the resync is complete, then try mounting the volume.
>
> -Kanoa
Yep, tried that. I got identical results, unfortunately.
Thanks,
Tom
* Re: Bizarre RAID "failure"
From: Nathan Hunsperger @ 2004-02-20 8:33 UTC
To: Tom Maddox; +Cc: linux-raid
On Thu, Feb 19, 2004 at 02:44:52PM -0800, Tom Maddox wrote:
<SNIP>
> If the system goes down unexpectedly (e.g., because of a power failure),
> the RAID array comes back up dirty and begins to rebuild itself, which
> is odd enough on its own. What's worse is that, whenever this happens,
> the rebuild hangs at about 2.4%. When it reaches that point, the array
> becomes totally nonresponsive--I can't even query its status with mdadm
> or any other tool, although I can use "cat /proc/mdstat" to see the
> status of the rebuild. Any command that attempts to access the RAID
> drive hangs.
<SNIP>
> Has anyone seen this behavior before, and can you recommend a solution?
Tom,
I have had problems very similar to this before. I was running 14 fibre
channel disks on a QLA2100 HBA with various 2.4 kernels. What I found
was that after a period of heavy I/O, all access to the disks stopped
and the rebuild would hang. Additionally, any command that required
access to any filesystem data that wasn't cached (on any filesystem)
would hang. By switching between the three or so available QLA drivers,
I could change how long it took after a reboot for the stall to occur,
but never avoid it. I knew the hardware was fine, as it worked
flawlessly under Solaris. In the end, I had to upgrade the HBA to a
QLA2200, at which point I had no more problems. Because the hardware
works under other OSes, I have to believe that my problem was an
incompatibility between the QLA2100 and the drivers (even though they
claimed to support it).
I hope that at least gives you some possible insight.
- Nathan
* Re: Bizarre RAID "failure"
From: Corey McGuire @ 2004-03-02 16:56 UTC
To: Tom Maddox; +Cc: linux-raid
I don't know if this will help, but I was having lots of trouble with
my Promise controllers until 2.4.23; before that, they were locking up
drives all the time. At one point a controller dropped out entirely,
taking both of its drives with it, and I had to rebuild a RAID 5 with
two dead drives.
Once I upgraded, I stopped having such trouble.
One more thing I might add: I too had three Promise controllers, and
that was hard to manage as well. I moved two drives to my onboard
controller, which not only made things a bit less flaky (33% less
flaky, I am assuming) but also seemed to speed things up: later VIA
chipsets take the HDD controller off of the PCI bus, and three PCI HDD
controllers can easily saturate it.
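For a rough sense of the numbers (assuming somewhere around 40-50 MB/s
sustained per drive, which should be in the right ballpark for those
200 GB WD disks):

    shared 32-bit/33 MHz PCI:  33 MHz x 4 bytes = ~133 MB/s theoretical
    three drives resyncing:    3 x ~45 MB/s     = ~135 MB/s

so a full-speed rebuild across three PCI controllers is already pushing
the limit of the bus before any other traffic is counted.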
On Thursday 19 February 2004 02:44 pm, Tom Maddox wrote:
> Hi, all,
>
> I'm encountering a bizarre problem with software RAID 5 under Linux that
> I'm hoping someone on this list can help me solve or at least
> understand.
>
> I've got a box running Red Hat 7.3 with SGI's 2.4.18 XFS 1.1 kernel.
> It's using three FastTrak TX 2000 (PDC20271) cards in non-RAID mode with
> three Western Digital 200 GB drives. I'm using those controllers
> because they were handy and they support large drives. The drives are
> in an XFS-formatted RAID 5 array using md, which has never given me
> problems before. In this case, however, I'm running into some seriously
> anomalous behavior.
>
> If the system goes down unexpectedly (e.g., because of a power failure),
> the RAID array comes back up dirty and begins to rebuild itself, which
> is odd enough on its own. What's worse is that, whenever this happens,
> the rebuild hangs at about 2.4%. When it reaches that point, the array
> becomes totally nonresponsive--I can't even query its status with mdadm
> or any other tool, although I can use "cat /proc/mdstat" to see the
> status of the rebuild. Any command that attempts to access the RAID
> drive hangs.
>
> My assumption would normally be that there's a hardware failure
> somewhere, but I've swapped out each component individually (including
> cables!) and the same problem keeps happening.
>
> Has anyone seen this behavior before, and can you recommend a solution?
>
> Thanks,
>
> Tom