public inbox for linux-kernel@vger.kernel.org
* Kernel 2.4.18 + ext3 = filesystem corruption
From: Silvan @ 2002-05-10  1:56 UTC
  To: linux-kernel

I posted this to the newsgroup linux.kernel before I realized it was a one-way 
gateway.  I'm finally getting around to forwarding this to the list proper.

I've just heard from a fellow in New Zealand running a completely different 
distro (Debian Woody) with a completely different motherboard (some Abit, and 
he's running an SMP kernel, so I assume he has an SMP box), yet experiencing 
the same 2.4.18+ext3 corruption relationship.  I've sent him a message 
offering to compare hardware, but have not heard back yet.  If I find a 
common thread, I'll be sure to let you know.  Thus far I don't see anything 
we share, but I suspect that if I _do_ find common hardware that will narrow 
your focus in addressing this bug.  My hearing about his plight has prompted 
me to make a further effort to bring this to your attention, as it 
strengthens my opinion that this is a bona fide bug at work.

The following is the post I made to the newsgroup:

,--------------- Forwarded message (begin)

 Subject: Kernel 2.4.18 + ext3 = filesystem corruption
 From: Silvan <silvan@windows-sucks.com>
 Date: Fri, 26 Apr 2002 11:08:10 -0400

 I had a filesystem explosion (across the board corruption on all ext3 
 partitions, brought to my attention by a rather nasty series of EXT3_fs 
 errors and an immediate crash) about a month back.  I lost some data, and 
 there were numerous errors that had to be repaired.  I realized that I 
 couldn't ever remember seeing a fsck since moving to ext3, so I 
 experimented with tune2fs and dumpe2fs.  All of the partitions were set 
 with a random, large negative number for the maximum mount count.  I have 
 no idea whether that was true before the crash or not.
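
 For anyone wanting to check their own partitions the same way, here is a
 minimal sketch (not part of the original report) that reads the
 mount-count fields out of `dumpe2fs -h` style text.  The sample text and
 the function names are illustrative assumptions.  The tune2fs man page
 documents a maximum mount count of 0 or -1 as disabling the mount-count
 check; treating every other negative value the same way is an assumption
 here.

```python
# Hypothetical helper names; the sample text below is illustrative, not
# captured from the machine described in this thread.
SAMPLE_HEADER = """\
Filesystem state:         clean
Mount count:              12
Maximum mount count:      -1
"""


def parse_mount_counts(header_text):
    """Return the mount-count fields found in `dumpe2fs -h` style text."""
    wanted = ("Mount count", "Maximum mount count")
    fields = {}
    for line in header_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() in wanted:
            fields[key.strip()] = int(value.strip())
    return fields


def periodic_fsck_disabled(fields):
    """True when the maximum mount count is zero or negative.

    Assumption: any non-positive value means mount-count-triggered checks
    never fire (documented for 0 and -1 only).
    """
    max_count = fields.get("Maximum mount count")
    return max_count is not None and max_count <= 0


if __name__ == "__main__":
    counts = parse_mount_counts(SAMPLE_HEADER)
    print(counts, periodic_fsck_disabled(counts))
```

 If that assumption holds, a negative maximum mount count would be
 consistent with never seeing a boot-time fsck, whatever caused the value
 to be set.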
 
 I began to eliminate suspects.  I checked RAM, running memtest86 for 24 
 hours with no errors.  I reverted to kernel 2.4.16, removed my journals and 
 mounted ext2, and switched to conservative settings all around (lower PCI bus 
 speed, conservative hdparm options, etc.).
 
 I've been gradually bringing things back to their previous state, fscking my 
 partitions regularly (daily at first, then gradually moving to a 5 mounts or 
 7 days interval) as I've gone.  I had no filesystem errors of any kind 
 since the last disaster, and then I decided to move back to 2.4.18.  No 
 problems for a few days, so I installed new journals and moved back to 
 ext3.
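
 The "5 mounts or 7 days" schedule mentioned above maps onto the tune2fs
 options -c (maximum mount count) and -i (interval between checks).  As a
 rough sketch, assuming a hypothetical helper name and using /dev/hde6
 only as an example device, the argument list could be built like this
 without running anything:

```python
def tune2fs_check_args(device, max_mounts, interval_days):
    """Build a tune2fs argument list for a forced-check schedule.

    Hypothetical helper: it only constructs the command and does not run it.
    -c sets the maximum mount count; -i sets the check interval in days.
    """
    if max_mounts < 1 or interval_days < 1:
        raise ValueError("max_mounts and interval_days must be positive")
    return [
        "tune2fs",
        "-c", str(max_mounts),
        "-i", "{}d".format(interval_days),
        device,
    ]


if __name__ == "__main__":
    # Example device name taken from the partitions mentioned in this post.
    print(" ".join(tune2fs_check_args("/dev/hde6", 5, 7)))
```

 The resulting list could be handed to subprocess.run by root on an
 unmounted or read-only filesystem; whether the original schedule was set
 with exactly these options is not stated in the post.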
 
 Within four days, I had another filesystem explosion, though it was less 
 severe than the last, affecting only hde6 and hde9.  Files weren't there, 
 entire chunks of directories were missing, and so on, and everything was 
 pretty well hosed in that state.  Upon recovery, while fscking during the 
 subsequent boot, there were numerous, numerous errors in the filesystem, 
 and I've lost more data.
 
 Both crashes were seemingly spontaneous, triggered by everyday activities, 
 probably brought on when I finally accessed a file that had a hole in it 
 and brought the whole house of cards tumbling down.  I have no record of 
 anything that happened, as nothing useful made its way into any of my logs.
 
 I'm running Mandrake 8.1 with a generic kernel 2.4.18.  No patches.  2.4.16 
 and 2.4.18 were compiled with the same config, and I don't think there is 
 anything noteworthy there.  Everything works perfectly, except that I 
 sometimes experience video display corruption when switching virtual 
 terminals.  More a problem with 2.4.16, but it happens with both kernels.  
 (A separate issue, and not the source of my concern.  I'll take video 
 corruption over data loss any day.)
 
 My hardware:
 
 AMD K7-1000 on ASUS A7V (VIA Apollo KT133a chipset, integrated Promise 
 ATA-100 controller), 256 MB RAM, Linksys 10/100E NIC, USR PCI Performance 
 Pro modem, SB PCI 128, Riva TNT2 AGP video (running at 4X in BIOS and in X), 
 CREATIVE CD-RW RW8439E, CD-950E/TKU, Maxtor 94610H6, generic PS/2 mouse, 
 generic FD, and a 104-key keyboard.
 
 Kernel boots (LILO) with:  append=" devfs=nomount hda=ide-scsi 
 mem=nopentium"
 
 Partitions are all ext3.
 
 I can think of nothing further to add.  I've done my best to isolate this, 
 but I haven't been able to repeat this in a controlled way so that I can 
 attribute a precise event as triggering the corruption.  It seems that the 
 corruption is on-going over time while 2.4.18 is running, and eventually I 
 stumble into a hole.  I personally suspect the journaling code, but that's 
 speculation born of almost pure ignorance.
 
`--------------- Forwarded message (end)

-- 
Michael McIntyre  zone 6b in SW VA
Silvan Pagan
umount /mnt/windows;mke2fs /dev/hde1;tune2fs -j /dev/hde1
www.geocities.com/Paris/Rue/5407/index.html


* Re: Kernel 2.4.18 + ext3 = filesystem corruption
From: Andrew Morton @ 2002-05-10  3:14 UTC
  To: Silvan; +Cc: linux-kernel

Interesting domain name.

Silvan wrote:
> 
> ...
>  I had a filesystem explosion (across the board corruption on all ext3
>  partitions, brought to my attention by a rather nasty series of EXT3_fs
>  errors and an immediate crash) about a month back.
> ...

I've just re-reviewed the 2.4.16 -> 2.4.18 diffs.  There's really
nothing there which could explain this.  We have:

- lots of s/bread/sb_bread/etc.  Which is rather unfortunate because
  it complicates any attempt to back out to 2.4.16's ext3.

- A bug fix for locking journal buffers (the infamous "request_list
  destroyed" bug)

- Some error-path-only code which remounts the fs readonly rather than taking
  down the machine when the unexpected happens.

> 
>  My hardware:
> 
>  AMD K7-1000 on ASUS A7V (VIA Apollo KT133a chipset, integrated Promise
>  ATA-100 controller), 256 MB RAM, Linksys 10/100E NIC, USR PCI Performance
>  Pro modem, SB PCI 128, Riva TNT2 AGP video (running at 4X in BIOS and in X),
>  CREATIVE CD-RW RW8439E, CD-950E/TKU, Maxtor 94610H6, generic PS/2 mouse,
>  generic FD, and a 104-key keyboard.

I'd be suspecting this, frankly.   Might be an IDE failure.

