linux-raid.vger.kernel.org archive mirror
* Raid corruption problems.
From: John McMonagle @ 2006-12-15 19:52 UTC (permalink / raw)
  To: linux-raid

I have a RAID1 backup server that seems to get corrupted.
This is the 3rd time in about a year.
I have 2 other backup servers that were cloned from this one, and they
have had no problems.

I've done a couple of kernel upgrades recently.
It now runs a 2.6.18-2 kernel.
It's based on Debian sarge.

It's a low-end Intel server motherboard using the ata_piix SATA driver.
I have another motherboard just like it doing RAID1 with SATA drives that
has had no problems, but it has a much lighter disk load.
smartctl has never shown any problems.
In /sys/block/md2/md I did
echo check > sync_action
No error messages, but mismatch_cnt is 1152.
rd0/errors and rd1/errors are both 0.
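
A minimal sketch of the check described above, assuming the md attributes live under /sys/block/md2/md and that the attribute is named sync_action (the kernel's spelling). The function names, the MD_DIR variable, and the 60-second poll interval are illustrative, not from the original message.

```shell
#!/bin/sh
# Sketch only. MD_DIR is an assumption: the md/ sysfs directory of the array.
MD_DIR="${MD_DIR:-/sys/block/md2/md}"

# Ask md to compare the mirrors without rewriting anything (needs root).
md_start_check() {
    echo check > "$1/sync_action"
}

# Block until the array reports that no check or resync is running.
md_wait_idle() {
    while [ "$(cat "$1/sync_action")" != "idle" ]; do
        sleep 60
    done
}

# Print the mismatch count and the per-device error counters.
md_report() {
    echo "mismatch_cnt: $(cat "$1/mismatch_cnt")"
    for f in "$1"/rd*/errors; do
        if [ -e "$f" ]; then
            echo "$f: $(cat "$f")"
        fi
    done
}

# Usage (as root):
#   md_start_check "$MD_DIR" && md_wait_idle "$MD_DIR" && md_report "$MD_DIR"
```

As far as I know, md(4) notes that on RAID1 a nonzero mismatch_cnt can also have software causes, so the number alone may not prove a hardware fault.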

I'm guessing a hardware problem.
Any suggestions?

John




* Re: Raid corruption problems.
From: Bill Davidsen @ 2006-12-19 21:39 UTC (permalink / raw)
  To: John McMonagle; +Cc: linux-raid

Since memory is the easiest to test, I'd try memtest86+ for at least 12
hours. If this were PATA I'd suggest replugging the cables, but that's a
less likely cause with SATA. Still, it's probably worth trying.

-- 
bill davidsen <davidsen@tmr.com>
  CTO TMR Associates, Inc
  Doing interesting things with small computers since 1979



* Re: Raid corruption problems.
From: John McMonagle @ 2006-12-24 14:00 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: linux-raid

I ran Memtest86+ for over 18 hours with no errors.
The machine also has ECC RAM.
I can look at the cables next time I'm there.
Doesn't SATA do some sort of error checking over the cables?
Anything else to try?
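
On the cable question: SATA frames carry a CRC, and many drives count link CRC errors in SMART attribute 199 (often named UDMA_CRC_Error_Count). A hedged sketch for reading that counter follows. It assumes the usual ten-column `smartctl -A` table and that the drive reports attribute 199 at all; the function names and device path are illustrative.

```shell
#!/bin/sh
# Parse `smartctl -A` output on stdin and print "<attribute name>: <raw value>"
# for attribute 199. Prints nothing if the drive does not report it.
parse_crc() {
    awk '$1 == 199 { print $2 ": " $10 }'
}

# Read the counter from one drive (needs root and smartmontools).
crc_errors() {
    smartctl -A "$1" | parse_crc
}

# Usage (device name is an assumption):
#   crc_errors /dev/sda
```

A raw value that rises between runs would point at the cable or connector rather than the platters.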


Memtest86+ v1.65      | Pass 74% ############################
Pentium 4 (0.09) 2793 MHz   | Test 61% #######################
L1 Cache:   16K 17135MB/s   | Test #7  [Random number sequence]
L2 Cache: 1024K 15179MB/s   | Testing:  112K - 1024M 1024M
Memory  : 1024M  2059MB/s   | Pattern:   189170f4
Chipset : Intel i875P (ECC : Detect / Correct) - PAT : Enabled
Settings: RAM : 199 MHz (DDR398) / CAS : 3-3-3-8 / Dual Channel (128 bits)

 WallTime   Cached  RsvdMem   MemMap   Cache  ECC  Test  Pass  Errors ECC Errs
 ---------  ------  -------  --------  -----  ---  ----  ----  ------ --------
  18:52:27   1024M     120K  e820-Std    on   off   Std    56       0
 -----------------------------------------------------------------------------

-- 
John McMonagle
IT Manager
Advocap Inc.




* Re: Raid corruption problems.
From: Bill Davidsen @ 2006-12-25 19:06 UTC (permalink / raw)
  To: John McMonagle; +Cc: linux-raid

I wish I had another reasonable idea, but memory problems are the big
target. Is the firmware at the current level? (If you can compare it
against the servers that are working well, that would really be good.)
Other than that, the power supply is the only thing even reasonably
likely, particularly if the corruption happens under heavy load. Have you
looked at the BIOS voltage and temperature reports just to see if they
suggest anything?
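
One way to make the firmware comparison concrete is to dump the versions on each server and diff the results. This is a sketch, assuming dmidecode and smartmontools are installed and that `smartctl -i` prints a "Firmware Version:" line for the drives; the function names and device list are illustrative.

```shell
#!/bin/sh
# Extract the drive firmware revision from `smartctl -i` output on stdin.
drive_firmware() {
    awk -F': *' '/^Firmware Version/ { print $2 }'
}

# Print one line per item so reports from two hosts can be diffed.
# Needs root; pass the drives to inspect as arguments.
firmware_report() {
    echo "bios: $(dmidecode -s bios-version)"
    for dev in "$@"; do
        echo "$dev: $(smartctl -i "$dev" | drive_firmware)"
    done
}

# Usage (device names are assumptions):
#   firmware_report /dev/sda /dev/sdb > "$(hostname).fw"
#   diff good-server.fw bad-server.fw
```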

-- 
bill davidsen <davidsen@tmr.com>
  CTO TMR Associates, Inc
  Doing interesting things with small computers since 1979


