* Problems with RAID1 using SATA disks.
@ 2004-05-12 16:50 Gustavo Franco
2004-05-12 17:09 ` Christoph Hellwig
[not found] ` <1084381339.26186.5.camel@localhost>
0 siblings, 2 replies; 5+ messages in thread
From: Gustavo Franco @ 2004-05-12 16:50 UTC (permalink / raw)
To: linux-raid
Hello list,
[ Please Cc: me on replies; I'm not subscribed. ]
I already reported[0] my problem on Apr 12, with a few details, on
lkml. Since then, only Steve Lord (xfs) has replied, writing that
Christoph was looking into it. I'll explain my problem here again, since
my new tests show clearly that it isn't an xfs problem. I just don't
know whether it's a hardware, software raid or libata problem.
I have 4 SATA drives, initially in two arrays: a mirror for the root
partition, which works without any problem, and a linear one to store the
backup. My plan is to set up a second linear array and put a mirror on top
of both linears, but I haven't assembled it yet because I'm having problems
when I put heavy I/O on the existing linear array.
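For reference, the layout described above can be sketched as a set of mdadm
invocations. This is only an illustration: the helper name mdadm_create and
every device name (/dev/md0, /dev/sda1, and so on) are assumptions, since the
post does not list them. The sketch only builds the command lines and does not
run them.

```python
from typing import List


def mdadm_create(md_device: str, level: str, members: List[str]) -> List[str]:
    """Build (but do not execute) an `mdadm --create` command line."""
    return [
        "mdadm", "--create", md_device,
        f"--level={level}",
        f"--raid-devices={len(members)}",
        *members,
    ]


# Hypothetical device names; the original report does not give them.
root_mirror = mdadm_create("/dev/md0", "1", ["/dev/sda1", "/dev/sdb1"])
backup_linear = mdadm_create("/dev/md1", "linear", ["/dev/sda2", "/dev/sdb2"])

# Planned but not yet assembled: a second linear array, then a mirror
# layered on top of both linear arrays.
second_linear = mdadm_create("/dev/md2", "linear", ["/dev/sdc1", "/dev/sdd1"])
mirror_of_linears = mdadm_create("/dev/md3", "1", ["/dev/md1", "/dev/md2"])

for cmd in (root_mirror, backup_linear, second_linear, mirror_of_linears):
    print(" ".join(cmd))
```

Printing the commands instead of running them keeps the sketch safe to try on
any machine; the real partitioning would have to come from the mdadm output
linked at [0].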
I've tested the setup described above with both the 2.6.5 and 2.6.6 kernels.
You can read the kernel configuration, mdadm output, lspci, dmesg and the
df+mount output in my post to lkml[0]. Note that since then I've migrated
the contents from xfs to ext3 and still see the same problem, although the
call trace now obviously shows ext3-related functions.
Today I migrated the linear array's content (ext3) to a partition outside
the array (xfs), also on a SATA disk. It's just a workaround, and a test to
check whether this is a software raid problem (or a configuration mistake)
or a libata bug.
Am I missing something, or is it a strange bug?
[0] = http://www.uwsg.iu.edu/hypermail/linux/kernel/0404.1/0706.html
Thanks in advance,
Gustavo Franco
* Re: Problems with RAID1 using SATA disks.
2004-05-12 16:50 Problems with RAID1 using SATA disks Gustavo Franco
@ 2004-05-12 17:09 ` Christoph Hellwig
[not found] ` <1084381339.26186.5.camel@localhost>
1 sibling, 0 replies; 5+ messages in thread
From: Christoph Hellwig @ 2004-05-12 17:09 UTC (permalink / raw)
To: Gustavo Franco; +Cc: linux-raid, linux-xfs
On Wed, May 12, 2004 at 01:50:29PM -0300, Gustavo Franco wrote:
> I already reported[0] my problem on Apr 12, with a few details, on
> lkml. Since then, only Steve Lord (xfs) has replied, writing that
> Christoph was looking into it. I'll explain my problem here again, since
> my new tests show clearly that it isn't an xfs problem. I just don't
> know whether it's a hardware, software raid or libata problem.
The thing at [0] certainly is not a raid problem. Most likely it's an
XFS problem, although there's a small but non-zero chance that either
the VFS or nfsd is at fault. I currently have a bunch of more important
issues on my plate, but for now the patch at:
http://oss.sgi.com/bugzilla/attachment.cgi?id=108&action=view
should fix it at the cost of slower deletions.
* Re: Problems with RAID1 using SATA disks.
[not found] ` <1084381339.26186.5.camel@localhost>
@ 2004-05-12 23:51 ` Gustavo Franco
2004-05-13 2:31 ` John Lange
0 siblings, 1 reply; 5+ messages in thread
From: Gustavo Franco @ 2004-05-12 23:51 UTC (permalink / raw)
To: linux-raid
John Lange wrote:
>On Wed, 2004-05-12 at 11:50, Gustavo Franco wrote:
>>linears, but I haven't assembled it yet; I'm having problems when I put
>>heavy I/O on the linear array cited before.
>
>You don't say what the problems are. Can you provide more details?
>
>Is it ONLY on the linear array?
Hi,
Sorry, I didn't make that clear in my post to linux-raid, only on lkml[0].
I've seen two oopses and the kernel BUG (posted to lkml, and already
commented on here by Christoph), only when I run the rsync clients against
the rsync server running on this machine, which stores files on a linear
array. I can try to reproduce it to post the oopses here (one using xfs and
the later one with ext3), but I was hoping that someone had already hit this.
By the way, Christoph, can you give more details about that patch? Has it
been merged into an already released kernel? I haven't checked. Isn't it
strange that a similar problem can be seen using ext3? FYI, at the moment
I'm running on a partition without raid linear, only xfs, and in two days
I'll report whether the heavy I/O load breaks it again or not.
It seems that I've hit more than one bug through this process.
[0] = http://www.uwsg.iu.edu/hypermail/linux/kernel/0404.1/0706.html
Thanks again,
Gustavo Franco
* Re: Problems with RAID1 using SATA disks.
2004-05-12 23:51 ` Gustavo Franco
@ 2004-05-13 2:31 ` John Lange
2004-05-14 0:37 ` Gustavo Franco
0 siblings, 1 reply; 5+ messages in thread
From: John Lange @ 2004-05-13 2:31 UTC (permalink / raw)
To: Gustavo Franco; +Cc: LinuxRaid
My original hunch was that you have a hardware problem of some kind. You
mentioned that you had a "crash" of some kind related to hardware before
and this further reinforces my feeling that it's a hardware failure.
Your recent tests with dd seem to confirm this. Now it's a process of
elimination. The easiest thing to try first is a memory test, so put
memtest on a bootable CD and try that. I don't think it's a RAM problem
because the times I've had bad RAM it caused a kernel panic, not a
hard-lock.
If your RAM checks out I'd remove the RAID card and try the drives
without the card. I don't suspect the drives themselves because you said
it locked up on all drives.
If you still get hard locks during any of these tests then it could be
the motherboard or the CPU. Could the CPU be overheating? The one other
thing that comes to mind is perhaps your power supply is not strong
enough to power everything? And finally, it's a long shot but it could be
a bad network or video card. Just keep swapping things until the problem
goes away.
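A load test along the lines of the dd runs mentioned above can be sketched
as follows, so that the same check can be repeated after each hardware swap.
This is a minimal sketch under stated assumptions: the function name
write_and_verify, the sizes, and the target directory are all illustrative
and not taken from this thread. Note that the read-back may be served from
the page cache, so it mostly stresses the write path; use a size well above
the machine's RAM to force real reads from the disk.

```python
import hashlib
import os
import tempfile


def write_and_verify(directory: str, size_mb: int = 64, chunk_mb: int = 1) -> bool:
    """Write random data under `directory`, read it back, and compare SHA-256 digests."""
    chunk_size = chunk_mb * 1024 * 1024
    written = hashlib.sha256()
    fd, path = tempfile.mkstemp(dir=directory, prefix="iotest-")
    try:
        with os.fdopen(fd, "wb") as f:
            for _ in range(size_mb // chunk_mb):
                block = os.urandom(chunk_size)
                written.update(block)
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # push the data out to the device

        read_back = hashlib.sha256()
        with open(path, "rb") as f:
            for block in iter(lambda: f.read(chunk_size), b""):
                read_back.update(block)
        return written.hexdigest() == read_back.hexdigest()
    finally:
        os.remove(path)  # always clean up the test file


if __name__ == "__main__":
    # Hypothetical target: replace with a directory on the array under test.
    print(write_and_verify(tempfile.gettempdir(), size_mb=4))
```

A False result points at corruption somewhere between RAM and the platters;
a hard lock during the run narrows the problem down to whatever is currently
plugged in.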
John
On Wed, 2004-05-12 at 18:51, Gustavo Franco wrote:
> John Lange wrote:
> >On Wed, 2004-05-12 at 11:50, Gustavo Franco wrote:
> >>linears, but I haven't assembled it yet; I'm having problems when I put
> >>heavy I/O on the linear array cited before.
> >
> >You don't say what the problems are. Can you provide more details?
> >
> >Is it ONLY on the linear array?
>
> Hi,
>
> Sorry, I didn't make that clear in my post to linux-raid, only on lkml[0].
> I've seen two oopses and the kernel BUG (posted to lkml, and already
> commented on here by Christoph), only when I run the rsync clients against
> the rsync server running on this machine, which stores files on a linear
> array. I can try to reproduce it to post the oopses here (one using xfs and
> the later one with ext3), but I was hoping that someone had already hit this.
>
> By the way, Christoph, can you give more details about that patch? Has it
> been merged into an already released kernel? I haven't checked. Isn't it
> strange that a similar problem can be seen using ext3? FYI, at the moment
> I'm running on a partition without raid linear, only xfs, and in two days
> I'll report whether the heavy I/O load breaks it again or not.
> It seems that I've hit more than one bug through this process.
>
> [0] = http://www.uwsg.iu.edu/hypermail/linux/kernel/0404.1/0706.html
>
> Thanks again,
> Gustavo Franco
--
John Lange
BigHostBox.com
(204) 885 0872
* Re: Problems with RAID1 using SATA disks.
2004-05-13 2:31 ` John Lange
@ 2004-05-14 0:37 ` Gustavo Franco
0 siblings, 0 replies; 5+ messages in thread
From: Gustavo Franco @ 2004-05-14 0:37 UTC (permalink / raw)
To: LinuxRaid
John Lange wrote:
>My original hunch was that you have a hardware problem of some kind. You
>mentioned that you had a "crash" of some kind related to hardware before
>and this further reinforces my feeling that it's a hardware failure.
>
>Your recent tests with dd seem to confirm this. Now it's a process of
>elimination. The easiest thing to try first is a memory test, so put
>memtest on a bootable CD and try that. I don't think it's a RAM problem
>because the times I've had bad RAM it caused a kernel panic, not a
>hard-lock.
>
>If your RAM checks out I'd remove the RAID card and try the drives
>without the card. I don't suspect the drives themselves because you said
>it locked up on all drives.
>
>If you still get hard locks during any of these tests then it could be
>the motherboard or the CPU. Could the CPU be overheating? The one other
>thing that comes to mind is perhaps your power supply is not strong
>enough to power everything? And finally, it's a long shot but it could be
>a bad network or video card. Just keep swapping things until the problem
>goes away.
John, thank you for all your suggestions. I had already run memtest for
many hours, and I ran a new test today that seems to be the end of my posts
here. :)
I ran the same process against a partition "released" from that linear
array and the machine still freezes. I can't say whether it was a BUG(), an
oops, or anything like that, because I can't go to the data center to check
today. I'll look into the patch described by Christoph, because this is
either a random and strange hardware failure (maybe the controller) or a
libata bug, and not only an xfs bug (read my previous post). I'll try to get
this machine back to the lab to run all the necessary tests, and I'll report
to lkml if it isn't a hardware failure.
Thank you,
Gustavo Franco