* raid 5 mismatch_cnt errors
@ 2010-05-20 16:58 Trey Scarborough
0 siblings, 0 replies; 12+ messages in thread
From: Trey Scarborough @ 2010-05-20 16:58 UTC
To: linux-raid
I have a RAID 5 array with 9 disks, and its mismatch_cnt keeps
growing. This is also causing file corruption on the underlying file
systems. If I copy a group of 100 files of 100 MB each and then run
md5sum on them, 1-3 will be corrupt. If a bad drive is to blame, is
there any way to get a report of how many of these mismatches occur
per drive? I have run SMART tests and do not see one drive that stands
out as causing errors. Could something else be causing these errors?
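The copy-then-checksum test described above could be scripted roughly as follows. This is only a sketch, not something posted in the thread: the function names `md5_of` and `copy_and_verify` are made up for illustration, and the destination directory is assumed to live on the array under test. Note that a freshly copied file may be read back from the page cache rather than from disk, so a clean result here does not prove the on-disk data is good.

```python
import hashlib
import shutil
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(src_dir, dst_dir):
    """Copy each regular file in src_dir to dst_dir.

    Returns the names of files whose copy has a different checksum
    from the source.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    corrupt = []
    for src in sorted(p for p in src_dir.iterdir() if p.is_file()):
        dst = dst_dir / src.name
        shutil.copyfile(src, dst)
        if md5_of(src) != md5_of(dst):
            corrupt.append(src.name)
    return corrupt
```

An empty list means every copy matched its source; any names returned are the files to look at more closely.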
* Re: raid 5 mismatch_cnt errors
2010-05-20 17:02 Trey Scarborough
@ 2010-05-20 21:16 ` Neil Brown
2010-05-20 22:29 ` Trey Scarborough
0 siblings, 1 reply; 12+ messages in thread
From: Neil Brown @ 2010-05-20 21:16 UTC
To: Trey Scarborough; +Cc: linux-raid
On Thu, 20 May 2010 12:02:23 -0500
Trey Scarborough <treys@locallinux.com> wrote:
> I have a raid 5 array with 9 disks and I have a mismatch_cnt that keeps
> growing. This is causing file corruption on the underlaying file systems
> as well. I can copy a group of 100 100mb files and then do a md5sum on
> them and 1-3 will be corrupt. If this is a drive that is bad is there
> anyway to run a report on the count per drive that these mismatches
> occur. I have run smarttools test and do not see one drive that stands
> out to be causing errors. Could something else be causing these errors?
When RAID5 detects an inconsistency there is no way to know which device was
wrong.
SMART only detects some errors, not all.
I have had hard drives before which appeared to have a single-bit error in
their internal buffer. No error would be reported, but the data you read would
sometimes be wrong.
RAID5 cannot help you with this sort of error.
I would suggest backing up all your data (if it isn't already too late),
breaking the array, and testing each device individually,
e.g. create a filesystem on the device, then try copying data onto it and
reading it back off.
NeilBrown
* Re: raid 5 mismatch_cnt errors
2010-05-20 21:16 ` Neil Brown
@ 2010-05-20 22:29 ` Trey Scarborough
2010-05-20 22:38 ` Neil Brown
0 siblings, 1 reply; 12+ messages in thread
From: Trey Scarborough @ 2010-05-20 22:29 UTC
To: Neil Brown; +Cc: linux-raid@vger.kernel.org
That's what I was afraid of. The problem I have, if I back it up, is
knowing which data is bad. Luckily it appears to be a write error, because
once a file has been written correctly I can run checksums on all the files
and I do not see any more errors. I was thinking there might be a way of
doing a resync with the debug level turned up somehow, so that it would log
each mismatch along with the drives it was reading from at the time. I could
then take that information and, considering there are 9 drives in the array,
the one that comes up most often should be the culprit. I could
then remove that drive from the array and test it, leaving the rest in a
state that could be rebuilt, with the data consistent because the
drive causing the bad writes would be removed. Is this something that
might be possible?
Thanks,
Trey
* Re: raid 5 mismatch_cnt errors
2010-05-20 22:29 ` Trey Scarborough
@ 2010-05-20 22:38 ` Neil Brown
2010-05-21 2:16 ` Doug Ledford
0 siblings, 1 reply; 12+ messages in thread
From: Neil Brown @ 2010-05-20 22:38 UTC
To: Trey Scarborough; +Cc: linux-raid@vger.kernel.org
To detect a mismatch, raid5 reads from all drives in parallel, calculates the
parity across the data blocks and compares that to the parity block.
So no: something like that is not possible.
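The parity check described here can be illustrated with a small sketch. It is a toy model only, not the md driver's code: the helper names are hypothetical, blocks are 4 bytes long, and the layout is 8 data blocks plus 1 parity block to mirror the 9-disk array in this thread. It shows that a mismatch is detectable, but that assuming any single device is the wrong one leads to an equally consistent stripe.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def stripe_is_consistent(data_blocks, parity_block):
    """True if the parity computed from the data equals the stored parity."""
    return xor_blocks(data_blocks) == parity_block


def candidate_culprits(data_blocks, parity_block):
    """For an inconsistent stripe, list every device that could be the bad one.

    Treating device i as wrong means rebuilding it from the other devices.
    That always produces a consistent stripe, so every device qualifies.
    """
    blocks = list(data_blocks) + [parity_block]
    candidates = []
    for i in range(len(blocks)):
        others = blocks[:i] + blocks[i + 1:]
        rebuilt = xor_blocks(others)
        repaired = blocks[:i] + [rebuilt] + blocks[i + 1:]
        if xor_blocks(repaired[:-1]) == repaired[-1]:
            candidates.append(i)
    return candidates


# 8 data blocks + 1 parity block (toy values).
data = [bytes([d] * 4) for d in range(1, 9)]
parity = xor_blocks(data)
assert stripe_is_consistent(data, parity)

# Silently flip one bit on "device 3": the check notices the mismatch ...
damaged = list(data)
damaged[3] = bytes([damaged[3][0] ^ 0x01]) + damaged[3][1:]
assert not stripe_is_consistent(damaged, parity)

# ... but all 9 devices come back as equally valid suspects.
print(candidate_culprits(damaged, parity))
```

Because the parity is a single XOR across the stripe, one equation cannot say which of the nine terms is off, which is why a per-drive mismatch tally is not available.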
The only thing I can suggest:
- add a write-intent bitmap so you can remove/re-add devices fairly cheaply
- create a very large file
- write random data to the file without truncating it (use dd of=file
  conv=notrunc), then read it back and see if it matches. If it does, then
  this approach doesn't help. If it doesn't:
  one by one, fail/remove a drive from the array, write new random data to the
  same file, read it back and compare, then --re-add the missing device.
  I'm hoping that you will get an error every time except when the 'bad'
  device has been removed.
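The write-and-read-back step of the procedure above might look something like this. It is a sketch under assumptions, not a tested tool: the function name `write_and_verify` and its parameters are invented for illustration, the test file is assumed to already exist on the array, and the cache-dropping hint is advisory and Linux-specific. Failing, removing and re-adding each drive between runs would still be done by hand with mdadm.

```python
import hashlib
import os


def write_and_verify(path, offset=0, length=1 << 20):
    """Overwrite `length` bytes at `offset` with random data, then re-read.

    The file is opened "r+b", so it is never truncated (the counterpart of
    dd's conv=notrunc). Returns True if the data read back matches what
    was written.
    """
    payload = os.urandom(length)
    expected = hashlib.sha256(payload).hexdigest()

    with open(path, "r+b") as fh:
        fh.seek(offset)
        fh.write(payload)
        fh.flush()
        os.fsync(fh.fileno())  # push the data down to the block device
        # Ask the kernel to drop cached pages so the read below is more
        # likely to come from disk. Advisory only; assumes Linux.
        if hasattr(os, "posix_fadvise"):
            os.posix_fadvise(fh.fileno(), 0, 0, os.POSIX_FADV_DONTNEED)

    with open(path, "rb") as fh:
        fh.seek(offset)
        actual = hashlib.sha256(fh.read(length)).hexdigest()
    return actual == expected
```

Running it repeatedly over different offsets of the large file, once per removed drive, gives the comparison the procedure asks for: the hope is that it returns False at least sometimes in every configuration except the one where the bad device is out of the array.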
NeilBrown
* Re: raid 5 mismatch_cnt errors
2010-05-20 22:38 ` Neil Brown
@ 2010-05-21 2:16 ` Doug Ledford
2010-05-21 16:40 ` MRK
2010-05-26 15:07 ` Bill Davidsen
0 siblings, 2 replies; 12+ messages in thread
From: Doug Ledford @ 2010-05-21 2:16 UTC
To: Neil Brown; +Cc: Trey Scarborough, linux-raid@vger.kernel.org
On 05/20/2010 06:38 PM, Neil Brown wrote:
> On Thu, 20 May 2010 17:29:37 -0500
> Trey Scarborough <treys@locallinux.com> wrote:
>
>> Neil Brown wrote:
>>> On Thu, 20 May 2010 12:02:23 -0500
>>> Trey Scarborough <treys@locallinux.com> wrote:
>>>
>>>
>>>> I have a raid 5 array with 9 disks and I have a mismatch_cnt that keeps
>>>> growing. This is causing file corruption on the underlaying file systems
>>>> as well. I can copy a group of 100 100mb files and then do a md5sum on
>>>> them and 1-3 will be corrupt. If this is a drive that is bad is there
>>>> anyway to run a report on the count per drive that these mismatches
>>>> occur. I have run smarttools test and do not see one drive that stands
>>>> out to be causing errors. Could something else be causing these errors?
>>>>
While a bad drive is certainly a possibility here, this is precisely the
type of failure scenario that would make me suspect bad RAM,
motherboard, or CPU. So I wouldn't rule those out as possibilities either.
--
Doug Ledford <dledford@redhat.com>
GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at
http://people.redhat.com/dledford/Infiniband
* Re: raid 5 mismatch_cnt errors
2010-05-21 2:16 ` Doug Ledford
@ 2010-05-21 16:40 ` MRK
2010-05-21 20:57 ` Doug Ledford
2010-05-26 15:07 ` Bill Davidsen
1 sibling, 1 reply; 12+ messages in thread
From: MRK @ 2010-05-21 16:40 UTC
To: Doug Ledford; +Cc: Neil Brown, Trey Scarborough, linux-raid@vger.kernel.org
On 05/21/2010 04:16 AM, Doug Ledford wrote:
> While a bad drive is certainly a possibility here, this is precisely the
> type of failure scenario that would make me suspect bad RAM,
> motherboard, or CPU. So I wouldn't rule those out as possibilities either.
>
Could the cabling to the drive be causing this? (Maybe it's failing, or maybe
it's partly disconnected.)
I don't remember how far along Linux is in implementing checksums
between the controller and the drive.
* Re: raid 5 mismatch_cnt errors
2010-05-21 16:40 ` MRK
@ 2010-05-21 20:57 ` Doug Ledford
2010-05-24 9:34 ` Tim Small
0 siblings, 1 reply; 12+ messages in thread
From: Doug Ledford @ 2010-05-21 20:57 UTC
To: MRK; +Cc: Neil Brown, Trey Scarborough, linux-raid@vger.kernel.org
On 05/21/2010 12:40 PM, MRK wrote:
> Could the cabling to the drive be causing this? (maybe failing or maybe
> it's partly disconnected)
> I don't remember at what point Linux is at implementing the checksums
> between the controller and the drive.
I don't know. I'm not up on the SATA signaling details so I don't know
if it uses CRC on the signal, but I suspect it does and a bad cable
would cause failed requests. But I wouldn't bet my house on it, so I
would ask some SATA gurus.
--
Doug Ledford <dledford@redhat.com>
GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at
http://people.redhat.com/dledford/Infiniband
* Re: raid 5 mismatch_cnt errors
2010-05-21 20:57 ` Doug Ledford
@ 2010-05-24 9:34 ` Tim Small
2010-05-25 19:09 ` Robert Hancock
0 siblings, 1 reply; 12+ messages in thread
From: Tim Small @ 2010-05-24 9:34 UTC
To: Doug Ledford
Cc: MRK, Neil Brown, Trey Scarborough, linux-raid@vger.kernel.org,
linux-ide
On 21/05/10 21:57, Doug Ledford wrote:
> On 05/21/2010 12:40 PM, MRK wrote:
>
>> On 05/21/2010 04:16 AM, Doug Ledford wrote:
>>
>> Could the cabling to the drive be causing this? (maybe failing or maybe
>> it's partly disconnected)
>> I don't remember at what point Linux is at implementing the checksums
>> between the controller and the drive.
>>
> I don't know. I'm not up on the SATA signaling details so I don't know
> if it uses CRC on the signal, but I suspect it does and a bad cable
> would cause failed requests. But I wouldn't bet my house on it, so I
> would ask some SATA gurus.
>
I wouldn't call myself that, but I believe PATA and SATA-level CRC
errors show up in the UDMA_CRC_Error_Count SMART attribute - look for a
non-zero raw value in the smartctl output. This is presumably just the
error count from the drive's point of view (bad data received at the drive
end). I don't know what happens with CRC errors detected at the Linux
end, or whether detection is controller-dependent. Better ask on
linux-ide.
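Checking that attribute across nine drives is easier with a small parser. The following is a sketch that assumes the usual ten-column attribute table printed by `smartctl -A`, with the raw value in the tenth column; the function name and the sample text are made up for illustration and are not output from the array in this thread.

```python
def udma_crc_error_count(smartctl_attributes_text):
    """Return the raw UDMA_CRC_Error_Count from `smartctl -A` style text.

    Returns None if the attribute is not present.
    """
    for line in smartctl_attributes_text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID followed by the name;
        # the raw value is assumed to be the tenth column.
        if (len(fields) >= 10 and fields[0].isdigit()
                and fields[1] == "UDMA_CRC_Error_Count"):
            return int(fields[9])
    return None


# Illustrative sample only, not real output from any drive in this thread.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       17
"""

print(udma_crc_error_count(SAMPLE))
```

A drive whose count is non-zero, or that keeps climbing between runs, would be a reasonable first candidate for a cable or port swap.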
From the SMART attribute name, presumably the earlier PATA transfer
modes don't support CRC error detection.
An easy thing to check might be to reduce the libata transfer speed from
3 Gbps to 1.5 Gbps. Similarly, try to test each drive and SATA port in
isolation if you can.
Tim.
--
South East Open Source Solutions Limited
Registered in England and Wales with company number 06134732.
Registered Office: 2 Powell Gardens, Redhill, Surrey, RH1 1TQ
VAT number: 900 6633 53 http://seoss.co.uk/ +44-(0)1273-808309
* Re: raid 5 mismatch_cnt errors
2010-05-24 9:34 ` Tim Small
@ 2010-05-25 19:09 ` Robert Hancock
0 siblings, 0 replies; 12+ messages in thread
From: Robert Hancock @ 2010-05-25 19:09 UTC
To: Tim Small
Cc: Doug Ledford, MRK, Neil Brown, Trey Scarborough,
linux-raid@vger.kernel.org, linux-ide
On 05/24/2010 03:34 AM, Tim Small wrote:
> On 21/05/10 21:57, Doug Ledford wrote:
>> On 05/21/2010 12:40 PM, MRK wrote:
>>> On 05/21/2010 04:16 AM, Doug Ledford wrote:
>>> Could the cabling to the drive be causing this? (maybe failing or maybe
>>> it's partly disconnected)
>>> I don't remember at what point Linux is at implementing the checksums
>>> between the controller and the drive.
>> I don't know. I'm not up on the SATA signaling details so I don't know
>> if it uses CRC on the signal, but I suspect it does and a bad cable
>> would cause failed requests. But I wouldn't bet my house on it, so I
>> would ask some SATA gurus.
>
> I wouldn't call myself that, but I believe PATA and SATA-level CRC
> errors show up in the UDMA_CRC_Error_Count SMART variable - look for a
> non-zero raw value in the smartctl output. This is presumably just the
> error-count from the drive's point of view (bad data recd at drive end).
> I don't know what happens with CRC errors detected at the Linux end -
> and whether detection is controller-dependant. Better ask on linux-ide.
>
>
> From the SMART attribute name, presumably the earlier PATA transfer
> modes don't support CRC error detection.
>
> An easy thing to check might be to reduce the libata transfer speed from
> 3GBps to 1.5GBps. Similarly, try to test each drive and SATA port in
> isolation if you can....
ATA transfer errors should cause a bad CRC, resulting in a failed
transfer, which will cause complaints in the kernel log. For PATA, only
UDMA modes can detect CRC errors; PIO and MWDMA transfers can't.
There are other places where data corruption can occur, however, such as
inside the controller or the drive itself.
* Re: raid 5 mismatch_cnt errors
2010-05-21 2:16 ` Doug Ledford
2010-05-21 16:40 ` MRK
@ 2010-05-26 15:07 ` Bill Davidsen
2010-05-26 15:49 ` Doug Ledford
1 sibling, 1 reply; 12+ messages in thread
From: Bill Davidsen @ 2010-05-26 15:07 UTC
To: Doug Ledford; +Cc: Neil Brown, Trey Scarborough, linux-raid@vger.kernel.org
Doug Ledford wrote:
> While a bad drive is certainly a possibility here, this is precisely the
> type of failure scenario that would make me suspect bad RAM,
> motherboard, or CPU. So I wouldn't rule those out as possibilities either.
>
I have the same thought. I would remove half the RAM from the system and
test again, then swap to the "other" half and repeat. Of course, running
memtest first is a good idea, but I have seen failures which only happen
on disk access.
If the system is overclocked, obviously the first step is to cut the speed back.
--
Bill Davidsen <davidsen@tmr.com>
"We can't solve today's problems by using the same thinking we
used in creating them." - Einstein
* Re: raid 5 mismatch_cnt errors
2010-05-26 15:07 ` Bill Davidsen
@ 2010-05-26 15:49 ` Doug Ledford
0 siblings, 0 replies; 12+ messages in thread
From: Doug Ledford @ 2010-05-26 15:49 UTC
To: Bill Davidsen; +Cc: Neil Brown, Trey Scarborough, linux-raid@vger.kernel.org
On 05/26/2010 11:07 AM, Bill Davidsen wrote:
> I have the same thought, I would remove half the RAM from the system and
> test again, then swap to the "other" half and repeat. Of course running
> memtest first is a good idea, but I have seen failures which only happen
> on disk access.
Indeed, I've seen lots of failures that only happen with disk access and
not with memory testers. That is why I have a shell script on the web page
in my sig that uses disk access to test memory.
> If the system is O/C obviously the first step is to cut the speed back...
>
--
Doug Ledford <dledford@redhat.com>
GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at
http://people.redhat.com/dledford/Infiniband