* Re: [linux-lvm] Random file system errors
@ 2009-04-29  6:19 Gaute Lund
  2009-06-07 11:44 ` Gaute Lund
  0 siblings, 1 reply; 13+ messages in thread
From: Gaute Lund @ 2009-04-29  6:19 UTC (permalink / raw)
  To: LVM general discussion and development

Thanks, and also to others who gave feedback. The approach with md5summing devices came from another source too, and I'll try a systematic approach as soon as time allows.

-gaute

----- Original message -----
From: f-lvm@media.mit.edu
Sent: 29 April 2009 05:52
To: linux-lvm@redhat.com
Subject: [linux-lvm] Random file system errors

Btw, one way to proceed on the test-your-hardware angle without
yanking disks (or even opening the case) and possibly turning this
into a heisenbug if it really -is- something like cabling would be
to do something like this:

   dd if=/dev/hda bs=1M count=1000 | md5sum

for each of hdX and sdX or whatever describes the raw physical
devices.  Do this with the LVM -completely deactivated- so you
know that absolutely nothing can be writing to the disks; you
should probably boot from a LiveCD to ensure this.

Run each test at least twice for the same disk and record the results;
I'll bet that at least one of your disks will return inconsistent
data; perhaps all disks on one IDE channel or one SATA channel will,
or perhaps every single disk will if you've got RAM, PSU, or
bridge-chip troubles, etc.
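
A minimal sketch of the repeat-and-compare procedure described above, for anyone who wants to script it. The helper names hash_prefix and check_device and the DD_EXTRA variable are made up for illustration; only dd, md5sum and cut are real tools. It assumes GNU dd (for bs=1M) and only ever reads from the source.

```shell
#!/usr/bin/env bash
# Sketch only: hash_prefix, check_device and DD_EXTRA are hypothetical names.

# Print the md5 of the first COUNT MiB of SRC.
# Read-only: dd is given "if=", never "of=".
# DD_EXTRA is an optional, assumed knob for extra dd operands, e.g.
# DD_EXTRA=iflag=direct with GNU dd so repeat reads bypass the page cache.
hash_prefix() {
    local src="$1" count="${2:-1000}"
    dd if="$src" bs=1M count="$count" ${DD_EXTRA:-} 2>/dev/null \
        | md5sum | cut -d' ' -f1
}

# Hash SRC RUNS times and print each sum.
# Returns 0 if all sums agree, 1 if any differ, 2 if SRC is unreadable.
check_device() {
    local src="$1" count="${2:-1000}" runs="${3:-2}"
    local first="" sum="" i=1 status=0
    [ -r "$src" ] || { echo "cannot read $src" >&2; return 2; }
    while [ "$i" -le "$runs" ]; do
        sum=$(hash_prefix "$src" "$count")
        echo "$src run $i: $sum"
        if [ -z "$first" ]; then first=$sum; fi
        if [ "$sum" != "$first" ]; then status=1; fi
        i=$((i + 1))
    done
    return "$status"
}
```

Usage would be something like `check_device /dev/hda 1000 2 || echo "inconsistent or unreadable"` for each raw device. Without something like iflag=direct (or dropping caches between runs), a second read of a region that fits in RAM may be served from the page cache and look consistent even on bad hardware.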

If you're seeing a very low frequency of bit flips, raise the count on
the dd to something larger, like maybe 10000 instead or whatever;
that'll slow down the test but raise your confidence in it.

Either way, try it on a USB device as well.  Very different hardware
and software paths.  Might be illuminating.

Just make -damned- sure that your dd is using "if" and not "of"!

If you -can't- make it fail, you might get fancier and try something
that forces lots of head seeking (since that will consume more power
and maybe stress your PSU), or try running all the disk tests in
parallel (since that will chew up more CPU) or perhaps run something
that runs your CPU flat out in one process while doing the dd in
another.
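
One possible way to run all the disk reads in parallel, as suggested above, is with background jobs. This is a rough sketch, not a tuned stress test: parallel_check is a hypothetical helper name, and it does nothing about forcing seeks or loading the CPU.

```shell
#!/usr/bin/env bash
# Sketch only: parallel_check is a hypothetical helper name.
# Usage: parallel_check COUNT_MIB SRC [SRC...]
# Reads COUNT_MIB from every source concurrently, in two passes, and
# prints "MISMATCH: <src>" for each source whose two sums differ.
# Read-only: dd is given "if=", never "of=".
parallel_check() {
    local count="$1"
    shift
    local dir pass n src bad=0
    dir=$(mktemp -d) || return 2
    for pass in 1 2; do
        n=0
        for src in "$@"; do
            n=$((n + 1))
            # Each pipeline runs in the background so all reads overlap.
            dd if="$src" bs=1M count="$count" 2>/dev/null \
                | md5sum > "$dir/$n.$pass" &
        done
        wait    # let every reader of this pass finish before the next pass
    done
    n=0
    for src in "$@"; do
        n=$((n + 1))
        if ! cmp -s "$dir/$n.1" "$dir/$n.2"; then
            echo "MISMATCH: $src"
            bad=1
        fi
    done
    rm -rf "$dir"
    return "$bad"
}
```

For example `parallel_check 1000 /dev/hda /dev/hdb /dev/sda`. The page-cache caveat applies here too: if the total read fits in RAM, the second pass may not touch the disks at all.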

If you still can't make it fail, try activating the LVM -from a LiveCD-
(i.e., -not- booted from it) and then repeat the tests on the LVs.
If it fails on LV's that have no mounted filesystems and aren't being
touched, but works on the raw devices, -then- you're starting to point
a finger at LVM...  (And if you have to mount a FS to start getting
failures, only then might we start thinking about write barriers or
whatever...)

If nothing you do makes it fail, but it fails when you're
booted and running from that LVM, I'd start to suspect LVM and/or
kernel issues in the actual software you're running.  But I'll bet
that you'll see a failure before that point.
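
The narrowing-down order in the last few paragraphs can be written out as a small decision function. This is only a restatement of the reasoning above, not a diagnostic tool; suspect is a hypothetical name and each argument is simply "ok" or "fail" for one stage of testing.

```shell
#!/usr/bin/env bash
# Sketch only: suspect is a hypothetical helper name.
# Arguments, each "ok" or "fail", in the order the tests are run:
#   $1 raw devices, LVM deactivated, booted from a LiveCD
#   $2 LVs activated from the LiveCD, nothing mounted
#   $3 LVs with a filesystem mounted, still from the LiveCD
#   $4 the normally booted system running from that LVM
suspect() {
    local raw="$1" lv_idle="$2" lv_mounted="$3" installed="$4"
    if [ "$raw" = fail ]; then
        echo "hardware (cabling, RAM, PSU, bridge chip)"
    elif [ "$lv_idle" = fail ]; then
        echo "LVM"
    elif [ "$lv_mounted" = fail ]; then
        echo "filesystem interaction (write barriers or similar)"
    elif [ "$installed" = fail ]; then
        echo "LVM and/or kernel in the installed system"
    else
        echo "not reproduced"
    fi
}
```

The first failing stage wins, which matches the idea of stopping as soon as the simplest configuration shows inconsistent reads.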

And report back; it'd be good to close the loop on this if it's proven
-not- to be an LVM issue.

_______________________________________________
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/

* [linux-lvm] Random file system errors
@ 2009-04-28  1:52 Gaute Lund
  2009-04-28  3:32 ` f-lvm
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Gaute Lund @ 2009-04-28  1:52 UTC (permalink / raw)
  To: linux-lvm

I have searched the web and the mailing list without finding anything
similar to this.

At home I have an LVM setup. Reading data gives random errors. I only
recently discovered it's an LVM issue. I think.

The issue: If I md5sum largeish files, or test archives, I sometimes get
errors or randomly different md5sums. Like now, I have 11 folders, all with
rar files in parts: some 300 15MB pieces in 6 folders/sets, totaling 4.2GB,
and 560 50MB pieces in 5 folders/sets, totaling 23GB.

OK, so I "rar t" all of these 5 times over. Errors pop up randomly, 52 times
in the 50MB pieces, 10 times in the 15MB pieces. That's about 1 error for
every 2.1GB of data read. Md5summing multiple files gives about the same
error rate.
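
For reference, the error-rate figure can be rechecked from the numbers quoted above: the two set sizes added together, times 5 passes, divided by 52 + 10 errors. A quick sketch of that arithmetic (gb_per_error is a hypothetical helper name; the inputs are just the totals from this message):

```shell
#!/usr/bin/env bash
# Sketch only: gb_per_error is a hypothetical helper name.
# Usage: gb_per_error SET_SIZE_GB PASSES ERRORS
gb_per_error() {
    LC_ALL=C awk -v s="$1" -v p="$2" -v e="$3" \
        'BEGIN { printf "%.2f\n", s * p / e }'
}

# 4.2 GB + 23 GB = 27.2 GB per pass, 5 passes, 52 + 10 = 62 errors
gb_per_error 27.2 5 62    # prints 2.19
```

That comes out at roughly 2.2GB per error, in line with the "about 1 error for every 2.1GB" estimate.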

If I run repeated tests on a rar set small enough to fit in cache mem, I get
errors, but they are identical with each run.

Is it really an lvm problem? Well, I have created new LVs and use different
filesystems, ext3, xfs, jfs - they're all the same. If I create an md on
some other disks, and put a filesystem on it, without LVM, no problems.

I can't find any other errors, in any logs or dmesg. The errors weren't
there to begin with, they came at one point and got worse. It took a while
before I realized it was a generic disk problem, and for a period I kind of
gave up on it. So it's been there for ... maybe six months?

The VG consists of two software RAID 5 md's, one consisting of four 200GB
IDEs, one of five 500GB SATAs, yielding a VG totaling 2.37TB. Other
hardware is 4GB memory and a Core 2 Duo 6600 CPU.

Machine runs Ubuntu 8.10 with kernel 2.6.27-11, and
  LVM version:     2.02.39 (2008-06-27)
  Library version: 1.02.27 (2008-06-25)
  Driver version:  4.14.0

But the VG was originally created long ago, on LVM1 even.

Well, I guess that's it. Any other information that could be helpful? Any
way I could debug this?

Best regards
Gaute Lund

