public inbox for linux-kernel@vger.kernel.org
* mkfs.ext2 triggered RAM corruption
@ 2007-05-04 14:59 Bernd Schubert
  2007-05-04 18:49 ` Theodore Tso
  2007-05-04 20:39 ` Jan-Benedict Glaw
  0 siblings, 2 replies; 10+ messages in thread
From: Bernd Schubert @ 2007-05-04 14:59 UTC
  To: linux-kernel

Hi,

I'm presently rather puzzled. If this is really a kernel bug, it's a big one.

Summary: The system ramdisk (initrd) gets corrupted while running mkfs.ext2 on
a local SATA disk partition.

Reproduced on kernel versions: vanilla 2.6.16 - 2.6.20 (kernels older than
2.6.16 don't run on any of the systems I can test with).
Please note: I could reproduce this on several systems. All of them use ECC
memory, and on most of them the memory is monitored using EDAC.

Details:

1.) Our systems boot from an initrd, and all system services run from the
initrd/ramdisk.

2.) While setting up a Lustre metadata storage server, Lustre runs
mkfs.ext2 -j -b 4096 -F -i 4096 -J size=400 -I 512 /dev/sda4
(Please note: I first observed this while using a Lustre-patched kernel, but I
could reproduce it with vanilla kernels.)
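
To give a feel for what the mkfs.ext2 options above ask for, here is a rough
back-of-the-envelope sketch. It only uses the documented meaning of the
mke2fs flags (-b block size, -i bytes per inode, -I inode size, -J size in
megabytes). The partition size is a made-up example, not the real size of
/dev/sda4, and the estimate ignores the per-group rounding that mke2fs
actually applies.

```python
# Sketch: rough on-disk cost of the mke2fs options used in the report.
# All results are estimates; mke2fs rounds inode counts per block group.

BLOCK_SIZE = 4096        # -b 4096: filesystem block size in bytes
BYTES_PER_INODE = 4096   # -i 4096: one inode per 4096 bytes of space
INODE_SIZE = 512         # -I 512: on-disk size of each inode in bytes
JOURNAL_MIB = 400        # -J size=400: journal size in megabytes


def estimate(partition_bytes):
    """Return (inode_count, inode_table_bytes, journal_bytes) estimates."""
    inode_count = partition_bytes // BYTES_PER_INODE
    inode_table_bytes = inode_count * INODE_SIZE
    journal_bytes = JOURNAL_MIB * 1024 * 1024
    return inode_count, inode_table_bytes, journal_bytes


if __name__ == "__main__":
    # Hypothetical 100 GiB partition, chosen only for illustration.
    partition_bytes = 100 * 1024**3
    inodes, table, journal = estimate(partition_bytes)
    print(f"inodes:      {inodes}")
    print(f"inode table: {table / 1024**3:.1f} GiB")
    print(f"journal:     {journal / 1024**2:.0f} MiB")
```

With one 512-byte inode per 4096 bytes, the inode tables alone account for
about one eighth of the partition, independent of the partition size.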


While this mkfs.ext2 command was running, commands such as ps, top and ls
suddenly started to fail with segmentation faults.

To see what's going on, I copied the entire / (i.e. the initrd) into a tmpfs
root, chrooted into it, bind-mounted the original / into this chroot as
/oldroot, and repeatedly compared the chroot's /bin with the bind-mounted
/oldroot/bin while the mkfs.ext2 command was running.

beo-05:/# diff -r /bin /oldroot/bin/
beo-05:/# diff -r /bin /oldroot/bin/
beo-05:/# diff -r /bin /oldroot/bin/
Binary files /bin/sleep and /oldroot/bin/sleep differ
beo-05:/# diff -r /bin /oldroot/bin/
Binary files /bin/bsd-csh and /oldroot/bin/bsd-csh differ
Binary files /bin/cat and /oldroot/bin/cat differ
...
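
The repeated diff runs above could also be scripted so that each change is
logged with a timestamp. The following is only a sketch of that idea, not
what was used for this report: the function names are invented for
illustration, and it compares checksums against a baseline taken before
mkfs starts instead of diffing two trees.

```python
import hashlib
import os
import time


def snapshot(directory):
    """Map each regular file's relative path to its SHA-256 digest."""
    digests = {}
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            # Skip symlinks and special files; only file contents matter here.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            h = hashlib.sha256()
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests[os.path.relpath(path, directory)] = h.hexdigest()
    return digests


def changed_files(baseline, current):
    """Return sorted paths whose digest differs from, or is missing in, current."""
    return sorted(p for p, d in baseline.items() if current.get(p) != d)


def watch(directory, rounds, interval=5.0):
    """Take a baseline, then re-check `rounds` times and print any changes."""
    baseline = snapshot(directory)
    for _ in range(rounds):
        time.sleep(interval)
        for path in changed_files(baseline, snapshot(directory)):
            print(time.strftime("%H:%M:%S"), "changed:", path)
```

Calling watch("/oldroot/bin", rounds=100) just before starting mkfs.ext2
would then report files such as sleep or cat once their contents no longer
match the baseline.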

I also tested different I/O schedulers; it happens with at least deadline and
anticipatory.
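
To record which I/O scheduler was active during a given test run, the sysfs
file /sys/block/<device>/queue/scheduler can be read; it lists the available
schedulers with the active one in square brackets. A minimal sketch follows,
assuming that file exists on the kernel under test. The helper names are
made up for illustration.

```python
def active_scheduler(sysfs_text):
    """Return the scheduler name enclosed in [brackets], or None if absent."""
    for token in sysfs_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def read_scheduler(device="sda"):
    """Read and parse the scheduler file for one block device."""
    with open(f"/sys/block/{device}/queue/scheduler") as f:
        return active_scheduler(f.read())
```

Logging the result of read_scheduler("sda") next to each test result makes
it easier to tell afterwards which runs used deadline and which used
anticipatory.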

The corruption does NOT happen when running the mkfs command on /dev/sda1, but
it does happen with sda2, sda3 and sda4. It also doesn't happen with extended
partitions of sda1.

Any idea what's going on?


Thanks,
Bernd


-- 
Bernd Schubert
Q-Leap Networks GmbH

Thread overview: 10+ messages
2007-05-04 14:59 mkfs.ext2 triggered RAM corruption Bernd Schubert
2007-05-04 18:49 ` Theodore Tso
2007-05-05  1:36   ` Bernd Schubert
2007-05-05 18:57     ` Theodore Tso
2007-05-05 19:12       ` Jan Engelhardt
2007-05-05 22:06         ` Bernd Schubert
2007-05-05 23:09       ` Bernd Schubert
2007-05-04 20:39 ` Jan-Benedict Glaw
2007-05-05  1:38   ` Bernd Schubert
2007-05-07 18:42   ` Bill Davidsen
