public inbox for linux-kernel@vger.kernel.org
* File Corruption in Kernel 2.4.18
@ 2002-07-18  2:00 J. Hart
  2002-07-18  3:11 ` Kelledin
  2002-07-18  7:21 ` Ville Herva
  0 siblings, 2 replies; 9+ messages in thread
From: J. Hart @ 2002-07-18  2:00 UTC (permalink / raw)
  To: linux-kernel


     A large directory tree (70652 files, 7.6G) is copied recursively to an
empty destination directory using the following commands:

     mkdir aminet1/
     cp -a aminet aminet1/

     The source and destination directories are then compared using
the following command:

     diff -r aminet aminet1/aminet > difflist

     A few of the files at the copy destination, typically three or four, will
usually be corrupt while the source files will be correct.  Occasionally the
copy will be done without any corrupt files at the destination.  The
mem=nopentium option appears to have no effect on this.  An overnight test using
the memtest86 utility shows no memory errors.  The corruption in each file
occurs in precise 4096 byte blocks.  System logs show no evidence of any
trouble, and no kernel panics, warning messages or crashes are observed.  If
there is any other user activity while the copy is running, the system will
frequently lock up, requiring a hard reset and reboot.  This forces a file
system check due to the lack of a clean unmount.  System logs also show no
evidence of any trouble after the lockup, and no kernel panics or other
messages have been observed.
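
     Since the damage is reported to fall on exact 4096 byte boundaries, it may
help to record which blocks of each corrupt file differ from the source.  What
follows is only a minimal shell sketch and not part of the original report.  The
function name diff_blocks is hypothetical, and the sketch assumes a cmp that
supports the -l option (as GNU cmp does) together with awk and sort.

```shell
# Hypothetical helper: list which fixed-size blocks differ between two files.
# Usage: diff_blocks SOURCE COPY [BLOCKSIZE]
# Prints one line per damaged block: "<zero-based block index> <differing bytes>".
diff_blocks() {
    src=$1
    dst=$2
    bs=${3:-4096}    # block size in bytes, defaulting to the 4096 reported above

    # cmp -l prints one line per differing byte, with the 1-based byte
    # offset in the first column.  Any "EOF" notice for files of unequal
    # length goes to stderr and is discarded here.
    cmp -l "$src" "$dst" 2>/dev/null |
        awk -v bs="$bs" '
            { count[int(($1 - 1) / bs)]++ }            # offset -> block index
            END { for (b in count) print b, count[b] }
        ' |
        sort -n
}
```

     Run against one of the files named in difflist, for example
"diff_blocks aminet/FILE aminet1/aminet/FILE" (FILE being a placeholder), this
would show whether whole blocks or only scattered bytes within a block were
altered.  Identical files produce no output.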

     If a tar file is made of the source directory and then extracted, and the
resultant extracted directory compared with the original, similar effects are
observed.

     Are there any kernel boot or build parameters which could be used
to give additional diagnostics?

motherboard   : ASUS A7V
Linux version : Slackware 8
Kernel        : 2.4.18
hard disk     : ATA100 IBM-DTLA-307045 45GB
hd controller : Promise Technology, Inc. 20265
cpu           : 900MHz AMD Athlon

* Re: File Corruption in Kernel 2.4.18
@ 2002-07-18  4:16 Kelledin
  0 siblings, 0 replies; 9+ messages in thread
From: Kelledin @ 2002-07-18  4:16 UTC (permalink / raw)
  To: linux-kernel; +Cc: jhart

Ok, the test:

I chose directory /home/kelledin/gnutella.  It contains
approximately 10GB of files, ranging in size from ~5MB to
~700MB.  Most are ~600-650MB.

System specs are here:
http://www.anandtech.com/mysystemrig.html?rigid=5092

Only out-of-date info on that page is the kernel--I'm running
2.4.18+XFS-1.0.2+RML-preempt, compiled with gcc-2.95.3.  Kernel
was booted with "acpi=no-idle mem=nopentium" options.

While I was compiling jdk-1.4.0, I did the following:

[ kelledin@valhalla ~ ] # mkdir gnutella2
[ kelledin@valhalla ~ ] # cp -a gnutella gnutella2
[ kelledin@valhalla ~ ] # for FNAME in gnutella/*; do cmp
"$FNAME" "gnutella2/$FNAME"; done

The "cp -a" operation took 19 minutes, during which the system
load reached approximately 4.0 and the CPU temperature held at
54 C.  Ambient case temperature held at 26 C.  Swap usage did
not change.  System was somewhat sluggish but responsive enough
to play an mp3 and allow me to open terminal windows.  j2sdk
compile is still apparently going strong.

The comparison check...well, it finished while I was away
getting a snack.  It printed no output, which means the check
probably completed successfully.  Maybe I'll run some md5 sums
later, just to be sure.
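
The md5 check mentioned above could be sketched roughly as follows.
This is not what was actually run; verify_copy is a hypothetical
name, and the sketch assumes GNU md5sum (with -c and --quiet),
find, and mktemp are available.

```shell
# Hypothetical helper: checksum every regular file under SRC, then
# verify the same relative paths under DST against those checksums.
# Returns 0 if every file matches, non-zero otherwise.
verify_copy() {
    src=$1
    dst=$2
    sums=$(mktemp) || return 2    # absolute path, so it survives the cd below

    # Record checksums with paths relative to the source root.
    ( cd "$src" && find . -type f -exec md5sum {} + ) > "$sums"

    # Re-check the same relative paths inside the copy.  --quiet
    # suppresses the "OK" lines, so only mismatches are printed.
    ( cd "$dst" && md5sum -c --quiet "$sums" )
    status=$?

    rm -f "$sums"
    return $status
}
```

For the copy above, that would be something like
"verify_copy gnutella gnutella2/gnutella".  Unlike the cmp loop,
it also descends into subdirectories.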

System load stayed at about 3.0, and temperatures remained
approximately the same as during the copy operation.

The relevant software:

kernel...well, you know.
glibc-2.2.5+linuxthreads+LSB+blowfish+math patches
libacl-2.0.11
libattr-2.0.8
bash-2.05a (Just for you, Hell.Surfers, just for you ;)
fileutils 4.1.8 with ACL patches and a Kelledin special. 
 Tarball can be found at:

ftp://skarpsey.dyndns.org/fileutils-4.1.8acl-kelledin.tar.bz2

Things that might be causing the corruption in our friend
J.Hart's case:

Buggy chipset (damn VIA!!!)
Faulty CPU (heat damage, chipped core?)
Faulty hard drive (hey, it's a DeathStar.)
Faulty IDE controller (if using offboard IDE)
Flaky cable (80-conductor ATA cable doesn't like being folded,
stacked, crumpled, etc., not even slightly)
Buggy IDE driver in the kernel
Buggy filesystem driver
Buggy fileutils
Buggy VM

I can't really test any of the possible software problems,
because I'm all SCSI, all XFS, bleeding-edge fileutils, and
didn't have any really significant swapping going on.  There's a
production server I could possibly test it on, but...well...it's
a production machine.  Maybe I'll repeat the test a few times
later.

On Wednesday 17 July 2002 10:11 pm, Kelledin wrote:
> This could possibly be a problem with your hard drive.
> Judging from the model number, you have a 45GB IBM DeskStar
> 75GXP, one of the first IBM drives to earn the nickname
> "DeathStar" for its high failure rate.  What does IBM's Drive
> Fitness Test tell you?
>
> I'll see about performing your test tonight; I've got a hefty
> little DivX directory I can throw around as I wait for
> j2sdk-1.4.0 to finish compiling.  Such a test should be
> sufficient...
>
> This could also be a recurrence of ye olde VIA686B PCI+IDE
> issue. IIRC, some VIA686B motherboards that had that flaw were
> effectively unfixable, simply because certain motherboard
> manufacturers spotted the problem before everyone else (even
> VIA?) and tried their own partial kludge fixes for it.  Gotta
> love VIA.

--
Kelledin
"If a server crashes in a server farm and no one pings it, does
it still cost four figures to fix?"

* Re: File Corruption in Kernel 2.4.18
@ 2002-07-23  2:56 J. Hart
  2002-07-23  3:04 ` Thunder from the hill
  0 siblings, 1 reply; 9+ messages in thread
From: J. Hart @ 2002-07-23  2:56 UTC (permalink / raw)
  To: linux-kernel; +Cc: jhart


Here is a further update on the file corruption question:

     I ran the DFT utility which picked up two bad sectors, which I then
repaired.  A rerun of DFT after that gave no further reports of any problems.  I
then tried the directory tree copy (cp -a aminet aminet1) which produced one
corrupted file at the destination.  All the other destination files in the tree
(70652 files, 7.6G) appeared to be correct.

     An additional rerun of the IBM DFT utility after this reported no problems
despite the corrupt copy.

     In order to resolve this issue, my employer is considering the replacement
of my current machine with a new one having the following specifications:


motherboard: Asus P4T AGP Pro/4X
ram        : 1GB
OS         : Linux 2.4.7-10 i686 unknown
CPU        : Intel(R) Pentium(R) 4 CPU 1800MHz
Gfx        : Matrox Graphics, Inc. MGA G400 AGP
drives     : Seagate 40GB UATA ST340810A (two of these)
controller : Intel PIIX4 Ultra 100 Chipset
           : (Intel Corporation 82801BA IDE U100)
chipset    : Intel Corporation 82850 850 (Tehama) Chipset Host Bridge (MCH)

     Are there any outstanding issues with machines of this new configuration,
as there seemed to be with my old machine?

With Thanks,

     J. Hart



Thread overview: 9+ messages
2002-07-18  2:00 File Corruption in Kernel 2.4.18 J. Hart
2002-07-18  3:11 ` Kelledin
2002-07-18  7:21 ` Ville Herva
2002-07-18  7:47   ` Wilfried Weissmann
2002-07-21  2:52     ` J. Hart
     [not found]     ` <20020718081630.GX1465@niksula.cs.hut.fi>
2002-07-22 10:10       ` Wilfried Weissmann
  -- strict thread matches above, loose matches on Subject: below --
2002-07-18  4:16 Kelledin
2002-07-23  2:56 J. Hart
2002-07-23  3:04 ` Thunder from the hill
