linux-raid.vger.kernel.org archive mirror
* Problems with RAID1 using SATA disks.
@ 2004-05-12 16:50 Gustavo Franco
  2004-05-12 17:09 ` Christoph Hellwig
       [not found] ` <1084381339.26186.5.camel@localhost>
  0 siblings, 2 replies; 5+ messages in thread
From: Gustavo Franco @ 2004-05-12 16:50 UTC (permalink / raw)
  To: linux-raid

Hello list,

[ Please Cc: me on your replies; I'm not subscribed. ]

I already reported[0] my problem to lkml on Apr 12, with a few details.
Since then, only Steve Lord (XFS) has replied, saying that Christoph was
looking into it. I'll explain my problem here again, since my new tests
show clearly that it isn't an XFS problem. I just don't know whether it's
a hardware, software RAID, or libata problem.

I have 4 SATA drives, initially used in two arrays: a mirror for the root
partition, which works without any problem, and a linear array to store
backups. My plan is to create a second linear array and put a mirror on
top of both linear arrays, but I haven't assembled it yet, because I'm
having problems when I put heavy I/O on the existing linear array.
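The planned layout (a RAID1 mirror built on top of two linear arrays) can be modelled roughly as below. This is a hypothetical Python sketch for illustration only, not md's actual code: it assumes a linear array simply concatenates its members end to end and that the mirror's usable size is that of its smaller component. Chunk rounding and superblock space are ignored, and the member sizes are made up.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class LinearArray:
    """Hypothetical model of an md "linear" array: members concatenated end to end."""
    member_sizes: list[int]  # member sizes in sectors (illustrative values only)

    @property
    def size(self) -> int:
        return sum(self.member_sizes)

    def locate(self, sector: int) -> tuple[int, int]:
        """Map a logical sector to (member index, offset within that member)."""
        if sector < 0:
            raise ValueError("negative sector")
        base = 0
        for index, member_size in enumerate(self.member_sizes):
            if sector < base + member_size:
                return index, sector - base
            base += member_size
        raise ValueError("sector beyond end of linear array")


@dataclass
class Mirror:
    """Hypothetical model of RAID1: every write goes to each component."""
    components: list[LinearArray]

    @property
    def size(self) -> int:
        # Usable size is limited by the smallest component (metadata ignored).
        return min(c.size for c in self.components)

    def write_targets(self, sector: int) -> list[tuple[int, tuple[int, int]]]:
        """For each component, return where a write to `sector` lands."""
        if not 0 <= sector < self.size:
            raise ValueError("sector outside mirror")
        return [(i, c.locate(sector)) for i, c in enumerate(self.components)]


if __name__ == "__main__":
    existing = LinearArray([100, 200])  # hypothetical member sizes
    planned = LinearArray([150, 150])
    mirror = Mirror([existing, planned])
    print(mirror.size)                # 300
    print(mirror.write_targets(120))  # [(0, (1, 20)), (1, (0, 120))]
```

The point of the model is that with this layering every write under heavy load hits both linear arrays, so a fault in the existing linear array (or the disks and controller under it) would also affect the mirror.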

I've tested the setup described above with both 2.6.5 and 2.6.6 kernels;
the kernel configuration, mdadm output, lspci, dmesg, and df+mount output
are in my post to lkml[0]. Since then, I've migrated the contents from XFS
to ext3 and the problem persists, except that the call trace now obviously
shows ext3-related functions.

Today, I migrated the linear array's content (ext3) to an external
partition (XFS), also on a SATA disk. This is just a workaround and a test
to check whether it's a software RAID problem (or a configuration mistake)
or a libata bug.

Am I missing something, or is this a strange bug?

[0] = http://www.uwsg.iu.edu/hypermail/linux/kernel/0404.1/0706.html


Thanks in advance,
Gustavo Franco



* Re: Problems with RAID1 using SATA disks.
  2004-05-12 16:50 Problems with RAID1 using SATA disks Gustavo Franco
@ 2004-05-12 17:09 ` Christoph Hellwig
       [not found] ` <1084381339.26186.5.camel@localhost>
  1 sibling, 0 replies; 5+ messages in thread
From: Christoph Hellwig @ 2004-05-12 17:09 UTC (permalink / raw)
  To: Gustavo Franco; +Cc: linux-raid, linux-xfs

On Wed, May 12, 2004 at 01:50:29PM -0300, Gustavo Franco wrote:
> I already reported[0] my problem to lkml on Apr 12, with a few details.
> Since then, only Steve Lord (XFS) has replied, saying that Christoph was
> looking into it. I'll explain my problem here again, since my new tests
> show clearly that it isn't an XFS problem. I just don't know whether it's
> a hardware, software RAID, or libata problem.

The thing at [0] certainly is not a RAID problem.  Most likely it's an
XFS problem, although there's a small but non-zero chance that either
the VFS or nfsd is at fault.  I currently have a bunch of more important
issues on my plate, but for now the patch at:

	http://oss.sgi.com/bugzilla/attachment.cgi?id=108&action=view

should fix it at the cost of slower deletions.



* Re: Problems with RAID1 using SATA disks.
       [not found] ` <1084381339.26186.5.camel@localhost>
@ 2004-05-12 23:51   ` Gustavo Franco
  2004-05-13  2:31     ` John Lange
  0 siblings, 1 reply; 5+ messages in thread
From: Gustavo Franco @ 2004-05-12 23:51 UTC (permalink / raw)
  To: linux-raid

John Lange wrote:

>On Wed, 2004-05-12 at 11:50, Gustavo Franco wrote:
>>linear arrays, but I haven't assembled it yet, because I'm having problems
>>when I put heavy I/O on the existing linear array.
>
>You don't say what the problems are. Can you provide more details?
>
>Is it ONLY on the linear array?

Hi,

Sorry, I didn't explain this in my post to linux-raid, only in my post to
lkml[0]. I've seen two oopses and the kernel BUG (posted to lkml, and
already commented on here by Christoph), but only when I run the rsync
clients against the rsync server running on this machine, storing files
on a linear array. I can try to reproduce it again and post the oopses
here (one with XFS and the later one with ext3), but I was hoping that
someone had already hit this.

By the way... Christoph, can you give more details about that patch? Was
it merged into an already released kernel? I haven't checked. Isn't it
strange that a similar problem can be seen with ext3? FYI, at the moment
I'm running on a partition without RAID linear, only XFS, and in two days
I'll report whether the heavy I/O load breaks it again or not. It seems
that I've hit more than one bug through this process.

[0] = http://www.uwsg.iu.edu/hypermail/linux/kernel/0404.1/0706.html

Thanks again,
Gustavo Franco




* Re: Problems with RAID1 using SATA disks.
  2004-05-12 23:51   ` Gustavo Franco
@ 2004-05-13  2:31     ` John Lange
  2004-05-14  0:37       ` Gustavo Franco
  0 siblings, 1 reply; 5+ messages in thread
From: John Lange @ 2004-05-13  2:31 UTC (permalink / raw)
  To: Gustavo Franco; +Cc: LinuxRaid

My original hunch was that you have a hardware problem of some kind. You
mentioned that you had a "crash" of some kind related to hardware before,
and this further reinforces my feeling that it's a hardware failure.

Your recent tests with dd seem to confirm this. Now it's a process of
elimination. The easiest thing to try first is a memory test, so put
memtest on a bootable CD and try that. I don't think it's a RAM problem,
though, because the times I've had bad RAM it caused a kernel panic, not
a hard lock.

If your RAM checks out, I'd remove the RAID card and try the drives
without the card. I don't suspect the drives themselves, because you said
it locked up on all drives.

If you still get hard locks during any of these tests, then it could be
the motherboard or the CPU. Could the CPU be overheating? One other thing
that comes to mind is that perhaps your power supply is not strong enough
to power everything. And finally, it's a long shot, but it could be a bad
network or video card. Just keep swapping things until the problem goes
away.

John

-- 
John Lange
BigHostBox.com
(204) 885 0872



* Re: Problems with RAID1 using SATA disks.
  2004-05-13  2:31     ` John Lange
@ 2004-05-14  0:37       ` Gustavo Franco
  0 siblings, 0 replies; 5+ messages in thread
From: Gustavo Franco @ 2004-05-14  0:37 UTC (permalink / raw)
  To: LinuxRaid

John Lange wrote:

>My original hunch was that you have a hardware problem of some kind. You
>mentioned that you had a "crash" of some kind related to hardware before,
>and this further reinforces my feeling that it's a hardware failure.
>
>Your recent tests with dd seem to confirm this. Now it's a process of
>elimination. The easiest thing to try first is a memory test, so put
>memtest on a bootable CD and try that. I don't think it's a RAM problem,
>though, because the times I've had bad RAM it caused a kernel panic, not
>a hard lock.
>
>If your RAM checks out, I'd remove the RAID card and try the drives
>without the card. I don't suspect the drives themselves, because you said
>it locked up on all drives.
>
>If you still get hard locks during any of these tests, then it could be
>the motherboard or the CPU. Could the CPU be overheating? One other thing
>that comes to mind is that perhaps your power supply is not strong enough
>to power everything. And finally, it's a long shot, but it could be a bad
>network or video card. Just keep swapping things until the problem goes
>away.

John, thank you for all your suggestions. I've already run memtest for
many hours, and I did a new test today that seems to be the end of my
posts here. :)

I tested the same process against a partition "released" from that
linear array, and the machine still freezes. I can't say whether it was a
BUG(), an oops, or anything like that, because I can't go to the data
center to check today. I'll look into the patch described by Christoph,
although this looks like a random and strange hardware failure (maybe the
controller) or a libata bug, and not only an XFS bug (see my previous
post). I'll try to get this machine back to the lab to do all the
necessary tests, and I'll report to lkml if it isn't a hardware failure.

Thank you,
Gustavo Franco


