* [linux-lvm] Random file system errors
@ 2009-04-28 1:52 Gaute Lund
2009-04-28 3:32 ` f-lvm
` (3 more replies)
0 siblings, 4 replies; 13+ messages in thread
From: Gaute Lund @ 2009-04-28 1:52 UTC (permalink / raw)
To: linux-lvm
I have searched the web and the mailing list without finding anything
similar to this.
At home I have an LVM setup. Reading data gives random errors. I only
recently discovered it's an LVM issue. I think.
The issue: If I md5sum largish files, or test archives, I sometimes get
errors or randomly different md5sums. Right now I have 11 folders, all with
multi-part rar files: some 300 15MB pieces in 6 folders/sets, totaling 4.2GB,
and 560 50MB pieces in 5 folders/sets, totaling 23GB.
OK, so I "rar t" all of these 5 times over. Errors pop up randomly, 52 times
in the 50MB pieces, 10 times in the 15MB pieces. That's about 1 error for
every 2.1GB of data read. Md5summing multiple files gives about the same
error rate.
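As a sanity check on that figure, the arithmetic can be written as a small
shell sketch. It assumes the set sizes quoted above (4.2GB + 23GB = 27.2GB
per pass) and 52 + 10 = 62 errors over 5 passes; gb_per_error is only an
illustrative helper name.

```shell
#!/bin/sh
# gb_per_error PASSES GB_PER_PASS ERRORS
# Hypothetical helper: average amount of data read per error seen.
gb_per_error() {
    awk -v p="$1" -v gb="$2" -v e="$3" 'BEGIN { printf "%.2f\n", p * gb / e }'
}

# 5 passes over 27.2GB with 62 errors in total
gb_per_error 5 27.2 62    # prints 2.19
```

That is roughly 2.2GB per error, in line with the estimate given above.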
If I run repeated tests on a rar set small enough to fit in cache memory, I
get errors, but they are identical on each run.
Is it really an LVM problem? Well, I have created new LVs and used different
filesystems (ext3, xfs, jfs), and they all behave the same. If I create an md
array on some other disks and put a filesystem on it, without LVM, there are
no problems.
I can't find any other errors, in any logs or dmesg. The errors weren't
there to begin with, they came at one point and got worse. It took a while
before I realized it was a generic disk problem, and for a period I kind of
gave up on it. So it's been there for ... maybe six months?
The VG consists of two software RAID 5 md arrays, one made up of four 200GB
IDE disks and one of five 500GB SATA disks, yielding a VG totaling 2.37TB.
Other hardware is 4GB of memory and a Core 2 Duo 6600 CPU.
Machine runs Ubuntu 8.10 with kernel 2.6.27-11, and
LVM version: 2.02.39 (2008-06-27)
Library version: 1.02.27 (2008-06-25)
Driver version: 4.14.0
But the VG was originally created long ago, on LVM1 even.
Well, I guess that's it. Any other information that could be helpful? Any
way I could debug this?
Best regards
Gaute Lund
^ permalink raw reply [flat|nested] 13+ messages in thread
* [linux-lvm] Random file system errors
2009-04-28 1:52 Gaute Lund
@ 2009-04-28 3:32 ` f-lvm
2009-04-28 3:50 ` Steer, Geoff
` (2 subsequent siblings)
3 siblings, 0 replies; 13+ messages in thread
From: f-lvm @ 2009-04-28 3:32 UTC (permalink / raw)
To: linux-lvm
I suspect two things: RAM and one of your disk controllers.
Going for the latter first---when you created non-LVM tests, were you
using the same disk? Probably not. Same IDE or SATA channels? Maybe
you only get random errors from one of your IDE channels, or only one
of your SATA channels, or perhaps everything that passes through the
Northbridge, or something like that. You may have to swap disks
around to do fault isolation. The fact that the errors -stick- once
the data's in RAM makes me think that it's getting trashed on the way
in but that otherwise your RAM might be good.
But maybe it's not. I had a bizarre failure once where I thought I
had a network problem, since I detected it when dd'ing 500GB from one
machine to another. Turns out the problem was bad behavior in my RAM,
but ONLY when the CPU was throttled down! Once I turned off CPU
throttling, the random errors went away. And, of course, memtest86+
never saw it, because it -always- nails the CPU...
(In my case, I saw bit flips via md5sum whether the data was coming
from IDE, SATA, or even a USB stick---and there's very little hardware
in common there. A tight loop md5summing the same file (one which fit
in RAM) got wrong, wrong, right right right right and the "rights"
suddenly started getting spit out much faster---it was at that moment
that I realized the wrong values were probably being produced while
the CPU was throttled. [And putting a 10-second sleep in between each
md5sum got mostly wrong values---but as soon as I started nailing the
CPU in another process, the values being spit out by the loop, even
with the pauses, became correct...]. The dd didn't use enough CPU
time to throttle up, so I saw errors---and at about the same rate as
you, maybe one bit flip every few gig. And I -knew- the data coming
in from the disks -had- to be good because it was a crypto filesystem
---bit errors there would trash entire blocks, and that wasn't
happening.)
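The tight-loop test described above could be sketched roughly like this.
It is only an illustration: hash_repeat is a made-up helper name, the path
in the usage line is a placeholder, and the optional pause is there to
mimic the 10-second sleep that gave the CPU time to throttle down.

```shell
#!/bin/sh
# hash_repeat FILE [RUNS] [PAUSE_SECONDS]
# Hypothetical helper: md5sum the same file RUNS times and print each
# distinct digest together with the number of times it was seen.
# One output line means every run agreed; several lines mean bit flips.
hash_repeat() {
    file=$1
    runs=${2:-10}
    pause=${3:-0}
    i=0
    while [ "$i" -lt "$runs" ]; do
        md5sum < "$file" | cut -d' ' -f1
        if [ "$pause" -gt 0 ]; then
            sleep "$pause"
        fi
        i=$((i + 1))
    done | sort | uniq -c
}

# Example (placeholder path): 20 runs, 10 seconds apart
# hash_repeat /path/to/file-that-fits-in-ram 20 10
```

Running it once back to back and once with a pause, while watching whether
the CPU is throttled, is one way to separate the two cases from the story
above.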
^ permalink raw reply [flat|nested] 13+ messages in thread
* RE: [linux-lvm] Random file system errors
2009-04-28 1:52 Gaute Lund
2009-04-28 3:32 ` f-lvm
@ 2009-04-28 3:50 ` Steer, Geoff
2009-04-28 14:41 ` Clyde E. Kunkel
2009-04-30 16:17 ` Philipp Schmidt
3 siblings, 0 replies; 13+ messages in thread
From: Steer, Geoff @ 2009-04-28 3:50 UTC (permalink / raw)
To: LVM general discussion and development
I've a server with a very similar problem to this.
Running a dd on the /var file system always gives a SCSI error at the
same spot. An fsck will fix the filesystem without errors but it will
always get remounted as read only after a short while.
I'd assumed a disk/controller problem, but no disk errors are logged at
the RAID card firmware or BIOS level. The disks have been rescanned for
bad blocks with no errors reported.
It has 6 disks in a RAID 5 array. IBM 3650 with a ServeRAID 8k controller.
Red Hat 5.3 with the latest patches.
Regards
Geoff
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [linux-lvm] Random file system errors
2009-04-28 1:52 Gaute Lund
2009-04-28 3:32 ` f-lvm
2009-04-28 3:50 ` Steer, Geoff
@ 2009-04-28 14:41 ` Clyde E. Kunkel
2009-04-28 17:00 ` Greg Freemyer
2009-04-30 16:17 ` Philipp Schmidt
3 siblings, 1 reply; 13+ messages in thread
From: Clyde E. Kunkel @ 2009-04-28 14:41 UTC (permalink / raw)
To: LVM general discussion and development
On 04/27/2009 09:52 PM, Gaute Lund wrote:
> <snip>
I am seeing the same thing with large ISO files (DVD-sized distro images).
Running md5sum or sha1sum on the file gives different results each time,
and burning the ISO gives a DVD that contains files with errors. I ran
memory tests overnight and all was good. I turned on smartd checking
and ran disk checks, and all is OK; I continue to look for disk errors
periodically, and all is well.
The Linux system is Fedora rawhide, but the problem also exists in
Fedora 9 and 10. The files are being downloaded with wget to a Download
directory in my home directory, which is an ext3 LV mounted on an ext4
home filesystem. Downloading with wget to a standard non-LV ext3 partition
results in good ISOs with consistent sha1sums. If I cp the good ISO
to the LV Download directory, the problems occur again. So far the problem
only manifests with DVD-sized ISO files. CD-sized ISO files are fine.
I first noticed this problem several months ago, but have not filed a
Bugzilla report since I cannot yet say for sure that LVM is causing the
problem. However, I think at this point I have eliminated wget as the
problem, but not ext4. I need to create an ext3 LV for / to test on.
Any guidance on error capturing or any testing features of LVM2 that can
be turned on?
Thanks.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [linux-lvm] Random file system errors
2009-04-28 14:41 ` Clyde E. Kunkel
@ 2009-04-28 17:00 ` Greg Freemyer
2009-04-29 3:52 ` f-lvm
0 siblings, 1 reply; 13+ messages in thread
From: Greg Freemyer @ 2009-04-28 17:00 UTC (permalink / raw)
To: LVM general discussion and development
On Tue, Apr 28, 2009 at 10:41 AM, Clyde E. Kunkel
<rascal.jumper-747@cox.net> wrote:
> On 04/27/2009 09:52 PM, Gaute Lund wrote:
>> <snip>
>
> I am seeing the same thing with large (distros on DVDs) ISO files also.
> Running md5sum or sha1sum on the file gives different results each time and
> burning the iso gives a dvd that contains files with errors. I ran memory
> tests over night and all was good. I turned on smartd checking and ran disk
> checks and all is ok and I continue to look for disk errors on a periodic
> basis and all is well.
>
> The linux system is Fedora rawhide, but the problem also exists in Fedora 9
> and 10. The files are being downloaded with wget to a Download directory on
> my home directory which is an ext3 LV mounted on an ext4 home filesystem.
> Wgeting to a standard non-LV ext3 partition results in good isos which
> demonstrate consistent sha1sums. If I cp the good iso to the LV Download
> directory, problems again occur. So far the problem only manifests with dvd
> size iso files. CD size iso files are fine.
>
> I first noticed this problem several months ago, but have not bz'd it since
> I cannot yet for sure say it is LVM causing the problem. However, I think
> at this point I have eliminated wget as the problem but not ext4. I need to
> create an ext3 LV for / to test on.
>
> Any guidance on error capturing or any testing features of LVM2 that can be
> turned on?
>
> Thanks.
I'll be shocked if this is not a hardware problem.
I've seen unreliable data that SMART / dmesg will miss caused by:
bad disk IDE (the actual electronics on the disk itself)
bad cables
bad connectors
bad power supply (or undersized)
bad controller ports
bad controller cards
bad PCI slot (etc.)
bad RAM
bad CPU/L1 cache
bad L2 cache
So far you have not ruled most of the above out. The most likely in
my experience is the cables. And luckily they are cheap.
I think you need to do the old part swap thing until you eliminate the
above prior to moving on to assuming it is bad software.
Greg
^ permalink raw reply [flat|nested] 13+ messages in thread
* [linux-lvm] Random file system errors
2009-04-28 17:00 ` Greg Freemyer
@ 2009-04-29 3:52 ` f-lvm
2009-04-29 19:02 ` Clyde E. Kunkel
2009-04-30 23:33 ` Clyde E. Kunkel
0 siblings, 2 replies; 13+ messages in thread
From: f-lvm @ 2009-04-29 3:52 UTC (permalink / raw)
To: linux-lvm
Btw, one way to proceed on the test-your-hardware angle without
yanking disks (or even opening the case) and possibly turning this
into a heisenbug if it really -is- something like cabling would be
to do something like this:
dd if=/dev/hda bs=1M count=1000 | md5sum
for each of hdX and sdX or whatever describes the raw physical
devices. Do this with the LVM -completely deactivated- so you
know that absolutely nothing can be writing to the disks; you
should probably boot from a LiveCD to ensure this.
Run each test at least twice for the same disk and record the results;
I'll bet that at least one of your disks will return inconsistent
data; perhaps all disks on one IDE channel or one SATA channel will,
or perhaps every single disk will if you've got RAM, PSU, or
bridge-chip troubles, etc.
If you're seeing a very low frequency of bit flips, raise the count on
the dd to something larger, like maybe 10000 instead or whatever;
that'll slow down the test but raise your confidence in it.
Either way, try it on a USB device as well. Very different hardware
and software paths. Might be illuminating.
Just make -damned- sure that your dd is using "if" and not "of"!
If you -can't- make it fail, you might get fancier and try something
that forces lots of head seeking (since that will consume more power
and maybe stress your PSU), or try running all the disk tests in
parallel (since that will chew up more CPU) or perhaps run something
that runs your CPU flat out in one process while doing the dd in
another.
If you still can't make it fail, try activating the LVM -from a LiveCD-
(i.e., -not- booted from it) and then repeat the tests on the LV's.
If it fails on LV's that have no mounted filesystems and aren't being
touched, but works on the raw devices, -then- you're starting to point
a finger at LVM... (And if you have to mount a FS to start getting
failures, only then might we start thinking about write barriers or
whatever...)
If everything you do doesn't make it fail, but it fails when you're
booted and running from that LVM, I'd start to suspect LVM and/or
kernel issues in the actual software you're running. But I'll bet
that you'll see a failure before that point.
And report back; it'd be good to close the loop on this if it's proven
-not- to be an LVM issue.
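The read-it-twice-and-compare test described above could be wrapped up
along these lines. Treat it as a rough sketch under a few assumptions:
check_device is a made-up helper name, the device names in the usage line
are placeholders, and the drop_caches hint in the comment is an addition
that is not part of the method as posted. It only reads from the device.

```shell
#!/bin/sh
# check_device DEVICE [COUNT_MIB] [RUNS]
# Hypothetical helper: read the first COUNT_MIB MiB of DEVICE RUNS times
# and report whether every read produced the same md5sum.
# It only ever uses dd's if= (input); nothing is written to DEVICE.
check_device() {
    dev=$1
    count=${2:-1000}
    runs=${3:-2}
    if [ ! -r "$dev" ]; then
        echo "$dev: not readable" >&2
        return 2
    fi
    distinct=$(
        i=0
        while [ "$i" -lt "$runs" ]; do
            # Optional, as root, to keep a re-read from being served out of
            # the page cache: sync; echo 3 > /proc/sys/vm/drop_caches
            dd if="$dev" bs=1M count="$count" 2>/dev/null | md5sum | cut -d' ' -f1
            i=$((i + 1))
        done | sort -u | wc -l | tr -d ' '
    )
    if [ "$distinct" -eq 1 ]; then
        echo "$dev: consistent"
    else
        echo "$dev: INCONSISTENT ($distinct different sums in $runs reads)"
        return 1
    fi
}

# Example (placeholder device names; run from a LiveCD with LVM inactive):
# for d in /dev/hda /dev/hdb /dev/sda /dev/sdb; do check_device "$d" 1000 3; done
```

A device that reports INCONSISTENT, or a whole group of them on one
channel or controller, is where the part swapping would start.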
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [linux-lvm] Random file system errors
@ 2009-04-29 6:19 Gaute Lund
2009-06-07 11:44 ` Gaute Lund
0 siblings, 1 reply; 13+ messages in thread
From: Gaute Lund @ 2009-04-29 6:19 UTC (permalink / raw)
To: LVM general discussion and development
Thanks, and also to others who gave feedback. The approach with md5summing devices came from another source too, and I'll try a systematic approach as soon as time allows.
-gaute
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [linux-lvm] Random file system errors
2009-04-29 3:52 ` f-lvm
@ 2009-04-29 19:02 ` Clyde E. Kunkel
2009-04-30 23:33 ` Clyde E. Kunkel
1 sibling, 0 replies; 13+ messages in thread
From: Clyde E. Kunkel @ 2009-04-29 19:02 UTC (permalink / raw)
To: LVM general discussion and development
On 04/28/2009 11:52 PM, f-lvm@media.mit.edu wrote:
> <snip>
Excellent methodology...will give this a try. It will take some time since
the box is a test box maxed out with SATA drives and an additional IDE
controller. Stay tuned...and thanks.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [linux-lvm] Random file system errors
2009-04-28 1:52 Gaute Lund
` (2 preceding siblings ...)
2009-04-28 14:41 ` Clyde E. Kunkel
@ 2009-04-30 16:17 ` Philipp Schmidt
3 siblings, 0 replies; 13+ messages in thread
From: Philipp Schmidt @ 2009-04-30 16:17 UTC (permalink / raw)
To: LVM general discussion and development
On 2009-04-28 03:52:23 +0200, Gaute Lund <gaute@idrift.no>
wrote in <003a01c9c7a3$fa10a000$ee31e000$@no>:
> I have searched the web and the mailing list without finding anything
> similar to this.
>
> The issue: If I md5sum largeish files, or test archives, I sometimes get
> errors or randomly different md5sums. Like now, I have 11 folders, all with
> rar files in parts: some 300 15MB pieces in 6 folders/sets, totaling 4,2GB,
> and 560 50MB pieces in 5 folders/sets, totaling 23G.
I had the same problem about a year ago - it turned out to be a small
incompatibility between the mainboard and RAM...
AVE!
phils...
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [linux-lvm] Random file system errors
2009-04-29 3:52 ` f-lvm
2009-04-29 19:02 ` Clyde E. Kunkel
@ 2009-04-30 23:33 ` Clyde E. Kunkel
2009-05-01 1:28 ` f-lvm
1 sibling, 1 reply; 13+ messages in thread
From: Clyde E. Kunkel @ 2009-04-30 23:33 UTC (permalink / raw)
To: LVM general discussion and development
On 04/28/2009 11:52 PM, f-lvm@media.mit.edu wrote:
> Btw, one way to proceed on the test-your-hardware angle without
> yanking disks (or even opening the case) and possibly turning this
> into a heisenbug if it really -is- something like cabling would be
> to do something like this:
>
> dd if=/dev/hda bs=1M count=1000 | md5sum
>
> <snip>
>
> And report back; it'd be good to close the loop on this if it's proven
> -not- to be an LVM issue.
>
>
Frequent mismatched sha1sums on /dev/sdf which has the PV for the
problem LV. All other /dev/sd* ran clean.
Thanks for the valuable debugging method. Looks like a SIL 680 card, or
cable, or Seagate drive problem, but will be easy to isolate.
^ permalink raw reply [flat|nested] 13+ messages in thread
* [linux-lvm] Random file system errors
2009-04-30 23:33 ` Clyde E. Kunkel
@ 2009-05-01 1:28 ` f-lvm
0 siblings, 0 replies; 13+ messages in thread
From: f-lvm @ 2009-05-01 1:28 UTC (permalink / raw)
To: linux-lvm
> Date: Thu, 30 Apr 2009 19:33:46 -0400
> From: "Clyde E. Kunkel" <rascal.jumper-747@cox.net>
> On 04/28/2009 11:52 PM, f-lvm@media.mit.edu wrote:
> > Btw, one way to proceed on the test-your-hardware angle without
> > yanking disks (or even opening the case) and possibly turning this
> > into a heisenbug if it really -is- something like cabling would be
> > to do something like this:
> >
> > dd if=/dev/hda bs=1M count=1000 | md5sum
> Frequent mismatched sha1sums on /dev/sdf which has the PV for the
> problem LV. All other /dev/sd* ran clean.
> Thanks for the valuable debugging method. Looks like a SIL 680 card, or
> cable, or Seagate drive problem, but will be easy to isolate.
Good to hear. Well, not good really, since hardware issues never are,
but at least you've got something you can now debug...
(And it lets LVM off the hook as well.)
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [linux-lvm] Random file system errors
2009-04-29 6:19 [linux-lvm] Random file system errors Gaute Lund
@ 2009-06-07 11:44 ` Gaute Lund
2009-06-07 15:16 ` Clyde E. Kunkel
0 siblings, 1 reply; 13+ messages in thread
From: Gaute Lund @ 2009-06-07 11:44 UTC (permalink / raw)
To: LVM general discussion and development
Thanks again Clyde, Geoff and f-lvm@media.mit.edu and others who gave
advice. Turns out it was a RAM issue.
Just to close off this thread, even if it's old. This is "only" a
private/testing box, and I've been busy, so I've only been able to test
stuff every now and then.
A few runs of memtest86 found no errors. I turned to the "md5sums of parts
of disks" approach. If I read large chunks (5 GB) from different places on
the disks, with 5+ iterations per chunk, I occasionally got errors
(diverging md5sums). But this is 10 disks across two controllers, and all
but two gave errors several times, albeit rarely.
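The chunked variant described above might look roughly like this. It is a
sketch under assumptions: scan_chunks is a made-up helper name, offsets and
chunk sizes are given in MiB, and the device and offsets in the usage line
are placeholders rather than the values actually used.

```shell
#!/bin/sh
# scan_chunks DEVICE CHUNK_MIB RUNS OFFSET_MIB...
# Hypothetical helper: for each offset, read CHUNK_MIB MiB starting
# OFFSET_MIB MiB into DEVICE, RUNS times over, and report how many
# different md5sums came back (1 means the reads agreed).
scan_chunks() {
    dev=$1
    chunk=$2
    runs=$3
    shift 3
    for off in "$@"; do
        n=$(
            i=0
            while [ "$i" -lt "$runs" ]; do
                dd if="$dev" bs=1M skip="$off" count="$chunk" 2>/dev/null |
                    md5sum | cut -d' ' -f1
                i=$((i + 1))
            done | sort -u | wc -l | tr -d ' '
        )
        echo "$dev @ ${off}MiB: $n distinct sum(s) in $runs reads"
    done
}

# Example (placeholder device): 5 GiB chunks at 0, 100 and 400 GiB, 5 reads each
# scan_chunks /dev/sda 5120 5 0 102400 409600
```

With chunks larger than the installed RAM, each pass has to go back to the
disk instead of being answered from cache.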
I started swapping hardware, and with different RAM I am OK. I guess clean
runs of memtest shouldn't be trusted 100%. I can even say, the way these
errors have crept up on me gradually over months(!), it means the RAM
stick(s) have failed gradually, without being touched or anything. Scary!
-gaute
--On 29. April 2009 08:19 +0200 Gaute Lund <gaute@idrift.no> wrote:
> Thanks, and also to others who gave feedback. The approach with
> md5summing devices came from another source too, and I'll try a
> systematic approach as soon as time allows.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [linux-lvm] Random file system errors
2009-06-07 11:44 ` Gaute Lund
@ 2009-06-07 15:16 ` Clyde E. Kunkel
0 siblings, 0 replies; 13+ messages in thread
From: Clyde E. Kunkel @ 2009-06-07 15:16 UTC (permalink / raw)
To: LVM general discussion and development
On 06/07/2009 07:44 AM, Gaute Lund wrote:
> <snip>
FWIW, there is a bad version of memtest out there. I don't recall the
version number and don't recall the problem, but, for you, it's moot.
^ permalink raw reply [flat|nested] 13+ messages in thread