* 2.5.49: Severe PIIX4/ATA filesystem corruption
@ 2002-11-26 21:07 H. Peter Anvin
  2002-11-27  0:32 ` Alan Cox
  0 siblings, 1 reply; 6+ messages in thread
From: H. Peter Anvin @ 2002-11-26 21:07 UTC (permalink / raw)
  To: linux-kernel

So, I finally braved it and tried running 2.5.49 on my workstation to
test out my RAID-6 patches.  There were no patches outside the md
area, and the ordinary filesystems aren't on md drives.

The two SCSI drives (SymBIOS controller) work just fine, but I have
gotten repeated, severe data corruption on the one ATA drive in the
system after only a few hours of operation.

Just thought I'd warn people...

	-hpa
-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt	<amsp@zytor.com>


* Re: 2.5.49: Severe PIIX4/ATA filesystem corruption
  2002-11-27  0:32 ` Alan Cox
@ 2002-11-27  0:13   ` H. Peter Anvin
  2002-11-27  1:03     ` Alan Cox
  0 siblings, 1 reply; 6+ messages in thread
From: H. Peter Anvin @ 2002-11-27  0:13 UTC (permalink / raw)
  To: Alan Cox; +Cc: Linux Kernel Mailing List

Alan Cox wrote:
> On Tue, 2002-11-26 at 21:07, H. Peter Anvin wrote:
> 
>>So, I finally braved it and tried running 2.5.49 on my workstation to
>>test out my RAID-6 patches.  There were no patches outside the md
>>area, and the ordinary filesystems aren't on md drives.
>>
>>The two SCSI drives (SymBIOS controller) work just fine, but I have
>>gotten repeated, severe data corruption on the one ATA drive in the
>>system after only a few hours of operation.
> 
> 
> If you mash the innards of the page cache you'll get corruption
> everywhere; it's one of the charms of testing out that area of the code
> on Linux. You might want to debug using 2.5.49 user mode linux rather
> than on raw disks. It's so much easier to use "cp" to generate a
> replacement root_fs 8)
> 

Yes, that's true.  However, the two heavily used SCSI disks saw no
corruption whatsoever, whereas the single, lightly used ATA disk saw
heavy corruption; if it were due to unrelated experimental code, one would
have expected corruption everywhere.  This does not mean that it is not
my fault (as far as UML is concerned, I tried building it quite a few
times before giving up), but given the severity of the corruption I was
seeing I thought I'd raise a red flag.

	-hpa




* Re: 2.5.49: Severe PIIX4/ATA filesystem corruption
  2002-11-26 21:07 2.5.49: Severe PIIX4/ATA filesystem corruption H. Peter Anvin
@ 2002-11-27  0:32 ` Alan Cox
  2002-11-27  0:13   ` H. Peter Anvin
  0 siblings, 1 reply; 6+ messages in thread
From: Alan Cox @ 2002-11-27  0:32 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Linux Kernel Mailing List

On Tue, 2002-11-26 at 21:07, H. Peter Anvin wrote:
> So, I finally braved it and tried running 2.5.49 on my workstation to
> test out my RAID-6 patches.  There were no patches outside the md
> area, and the ordinary filesystems aren't on md drives.
> 
> The two SCSI drives (SymBIOS controller) work just fine, but I have
> gotten repeated, severe data corruption on the one ATA drive in the
> system after only a few hours of operation.

If you mash the innards of the page cache you'll get corruption
everywhere; it's one of the charms of testing out that area of the code
on Linux. You might want to debug using 2.5.49 user mode linux rather
than on raw disks. It's so much easier to use "cp" to generate a
replacement root_fs 8)


Alan



* Re: 2.5.49: Severe PIIX4/ATA filesystem corruption
  2002-11-27  1:03     ` Alan Cox
@ 2002-11-27  0:36       ` H. Peter Anvin
  2002-12-17 20:41       ` H. Peter Anvin
  1 sibling, 0 replies; 6+ messages in thread
From: H. Peter Anvin @ 2002-11-27  0:36 UTC (permalink / raw)
  To: Alan Cox; +Cc: Linux Kernel Mailing List

Alan Cox wrote:
> 
> The base 2.5.47/8/9 Linus tree PIIX code has had no corruption reports
> (except someone whose box failed memtest86) and it's about the most
> tested IDE controller.
> 
> I would be interested to know what happens if you boot a base 2.5.49
> without raid6 adulteration and stress it on your hw there, just to be
> sure.
>

I will try it once the system gets put back together, which will be in 
about two weeks (we're moving, and today was computer teardown day.) 
All I have ATM is my laptop, which I won't be running experimental stuff 
on :-/

	-hpa




* Re: 2.5.49: Severe PIIX4/ATA filesystem corruption
  2002-11-27  0:13   ` H. Peter Anvin
@ 2002-11-27  1:03     ` Alan Cox
  2002-11-27  0:36       ` H. Peter Anvin
  2002-12-17 20:41       ` H. Peter Anvin
  0 siblings, 2 replies; 6+ messages in thread
From: Alan Cox @ 2002-11-27  1:03 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Linux Kernel Mailing List

On Wed, 2002-11-27 at 00:13, H. Peter Anvin wrote:
> Yes, that's true.  However, the two heavily used SCSI disks saw no
> corruption whatsoever, whereas the single, lightly used ATA disk saw
> heavy corruption; if it were due to unrelated experimental code, one would
> have expected corruption everywhere.  This does not mean that it is not

IDE does handle the I/O quite differently, so I'm not sure about that.
Accidentally fiddling with an in-flight request tends to do horrible
things on IDE but not on SCSI, for example.

The base 2.5.47/8/9 Linus tree PIIX code has had no corruption reports
(except someone whose box failed memtest86) and it's about the most
tested IDE controller.

I would be interested to know what happens if you boot a base 2.5.49
without raid6 adulteration and stress it on your hw there, just to be
sure.
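
A stress run of this kind can be as simple as writing files with known
checksums to the suspect disk and reading them back later.  The following
is only a rough sketch in Python, not something taken from this thread:
the function names, the file naming scheme and the STRESS_DIR environment
variable are all illustrative assumptions.

```python
import hashlib
import os
import tempfile


def write_pattern_files(directory, count=8, size=1 << 20):
    """Write `count` files of random data; return {path: sha256 hex digest}."""
    digests = {}
    for i in range(count):
        data = os.urandom(size)
        path = os.path.join(directory, f"stress-{i:04d}.bin")
        with open(path, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the data to the device
        digests[path] = hashlib.sha256(data).hexdigest()
    return digests


def verify_files(digests):
    """Re-read every file and return the paths whose contents no longer match."""
    bad = []
    for path, expected in digests.items():
        with open(path, "rb") as f:
            actual = hashlib.sha256(f.read()).hexdigest()
        if actual != expected:
            bad.append(path)
    return bad


if __name__ == "__main__":
    # STRESS_DIR is a hypothetical knob: point it at a scratch directory on
    # the disk under test.  Without it, a temporary directory is used.
    target = os.environ.get("STRESS_DIR") or tempfile.mkdtemp()
    digests = write_pattern_files(target)
    print("corrupted files:", verify_files(digests))
```

Note that an immediate read-back may be served from the page cache rather
than from the disk, so a meaningful run would need a data set larger than
RAM, or an unmount and remount between the write pass and the verify pass.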



* Re: 2.5.49: Severe PIIX4/ATA filesystem corruption
  2002-11-27  1:03     ` Alan Cox
  2002-11-27  0:36       ` H. Peter Anvin
@ 2002-12-17 20:41       ` H. Peter Anvin
  1 sibling, 0 replies; 6+ messages in thread
From: H. Peter Anvin @ 2002-12-17 20:41 UTC (permalink / raw)
  To: linux-kernel

Followup to:  <1038359021.3267.110.camel@irongate.swansea.linux.org.uk>
By author:    Alan Cox <alan@lxorguk.ukuu.org.uk>
In newsgroup: linux.dev.kernel
> 
> I would be interested to know what happens if you boot a base 2.5.49
> without raid6 adulteration and stress it on your hw there, just to be
> sure.
> 

Well, I finally got the system up and running again, after moving, and
ran it without loading any of the md modules (thus nothing modified by
the raid6 code.)  Leaving it running overnight at the shell prompt, with
cron jobs running -- including the one that backs up the SCSI drives
onto the IDE drive -- left me with tons of ext3fs error messages in
the morning on the IDE drive in question.

This is unfortunately all the information I have right at the moment.

	-hpa
-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt	<amsp@zytor.com>

