From: Chris Eddington <chrise@synplicity.com>
To: Bill Davidsen <davidsen@tmr.com>, David Greaves <david@dgreaves.com>
Cc: linux-raid@vger.kernel.org
Subject: Re: Raid5 assemble after dual sata port failure
Date: Fri, 16 Nov 2007 22:31:30 -0800
Message-ID: <473E8AC2.9020701@synplicity.com>
In-Reply-To: <4737A5CC.8040105@tmr.com>

Yes, these are exactly the kinds of symptoms I've experienced.  I was
losing a drive here and there every couple of months (mostly the last
two drives, sdc and sdd), which I thought was a cable problem (shutting
down, re-plugging the cables and restarting always worked, followed by
re-adding and rebuilding the 4th disk).  But now my guess is that the
motherboard chipset is overheating (or maybe the drives).  I have an MSI
K9N Platinum board with an AMD/Nvidia chipset that provides 4 RAID
ports, plus 2 more RAID ports from a separate chip.  The chipset comes
with a wimpy heatsink and is very hot to the touch.  I had been planning
to replace it but never got around to it.

I've been out of town this week, so I had someone image all three disks.
He used the Ghost disk-imaging application.  He said the third disk
reported media problems, and about 5% of the data could not be recovered
(sector errors).  Using these three copied drives, the array comes up and
xfs_repair still reports a bunch of inode repairs as before, but the
output is a bit different, perhaps even with fewer losses.  Most
importantly, the hpa_sectors errors no longer occur.

Key questions:
- I assume ddrescue will do a much better job of handling errors when
imaging a disk?  My colleague used Ghost, which is just a copy tool.  I
don't understand ddrescue's capabilities on RAID partitions that well.
- fdisk -l reports that all the drives are exactly the same size, with
exactly the same number of sectors (shown below).  I don't quite follow
the hpa_resize issue, but it appears the drives don't have hidden HPA
sectors, I guess?  Note that sdc is the original drive, whereas sda,
sdb, and sdd are the imaged drives.

So what do you recommend I do first?  Should I try xfs_repair on the
Ghost copies, or re-copy the drives myself using ddrescue?  Are there
special ddrescue settings I should consider to verify/correct potential
HPA changes?
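On the ddrescue question: as I understand GNU ddrescue, its advantage over a plain copy is mostly in how it schedules reads. It first copies everything that reads cleanly, skipping past errors, and only afterwards goes back to retry the problem areas, recording progress in a mapfile (called a logfile in older releases) so an interrupted run can resume. The invocation is `ddrescue [options] infile outfile [mapfile]`, source first, which is what David's PS warns about. The Python below is a much-simplified, hypothetical sketch of that two-phase idea only; `rescue_copy`, `read_block` and the simulated disk are illustrative names, not part of ddrescue, and it does not model ddrescue's trimming, splitting or options. Note that neither tool can repair a bad sector: data that never reads is simply missing from the image (zero-filled in this sketch).

```python
from typing import Callable, Dict, List, Tuple


def rescue_copy(read_block: Callable[[int], bytes], nblocks: int,
                block_size: int = 512, retries: int = 2) -> Tuple[bytes, List[int]]:
    """Hypothetical two-phase copy, loosely modelled on ddrescue's strategy.

    read_block(i) is assumed to return block i, or raise OSError (e.g. EIO)
    when the sector cannot be read.
    """
    good: Dict[int, bytes] = {}
    bad: List[int] = []

    # Phase 1: copy everything that reads cleanly; note failures, don't retry yet.
    for i in range(nblocks):
        try:
            good[i] = read_block(i)
        except OSError:
            bad.append(i)

    # Phase 2: revisit only the failed blocks, a limited number of times.
    for _ in range(retries):
        still_bad: List[int] = []
        for i in bad:
            try:
                good[i] = read_block(i)
            except OSError:
                still_bad.append(i)
        bad = still_bad

    # Blocks that never read are zero-filled here so later offsets stay aligned.
    image = b"".join(good.get(i, bytes(block_size)) for i in range(nblocks))
    return image, bad


# Usage with a simulated disk: block 3 is dead, block 5 fails only once.
attempts: Dict[int, int] = {}


def fake_read(i: int) -> bytes:
    attempts[i] = attempts.get(i, 0) + 1
    if i == 3 or (i == 5 and attempts[i] == 1):
        raise OSError("simulated media error")
    return bytes([i]) * 4


image, bad = rescue_copy(fake_read, 8, block_size=4)
print(bad)         # [3]
print(len(image))  # 32
```

Deferring retries matters on a failing drive: every retry on a bad spot is extra wear, so getting the healthy areas off first limits how much is lost if the drive dies partway through.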

Thanks,
Chris

Disk /dev/sda: 500.1 GB, 500107862016 bytes
/dev/sda1               1       60801   488384001   fd  Linux raid autodetect
Disk /dev/sdb: 500.1 GB, 500107862016 bytes
/dev/sdb1               1       60801   488384001   fd  Linux raid autodetect
Disk /dev/sdc: 500.1 GB, 500107862016 bytes
/dev/sdc1               1       60801   488384001   fd  Linux raid autodetect
Disk /dev/sdd: 500.1 GB, 500107862016 bytes
/dev/sdd1               1       60801   488384001   fd  Linux raid autodetect
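The check Bill suggests in the quoted message (compare the claimed disk size with where the last partition ends) can be worked through from the fdisk numbers above. A minimal Python sketch, assuming the classic 255-head, 63-sector/track geometry behind fdisk's cylinder numbers and a partition starting at sector 63; neither assumption is visible in the output, and `partition_fits` is a hypothetical helper. It also compares the result with the `sectors` / `hpa_sectors` values from the ata_hpa_resize log lines.

```python
SECTOR_BYTES = 512


def partition_fits(disk_bytes: int, end_cylinder: int, blocks_1k: int,
                   heads: int = 255, sectors_per_track: int = 63,
                   start_sector: int = 63):
    """Hypothetical helper: does a DOS partition end inside the reported disk?

    Assumes fdisk's usual 255-head / 63-sector geometry and a partition that
    starts at sector 63 (cylinder 1, after the first track).
    """
    disk_sectors = disk_bytes // SECTOR_BYTES
    sectors_per_cyl = heads * sectors_per_track
    # fdisk numbers cylinders from 1, so cylinder N ends at N * sectors_per_cyl - 1.
    end_sector = end_cylinder * sectors_per_cyl - 1
    part_sectors = end_sector - start_sector + 1
    # The "Blocks" column is in 1 KiB units, i.e. two 512-byte sectors each.
    matches_blocks = part_sectors == blocks_1k * 2
    return disk_sectors, end_sector, matches_blocks, end_sector < disk_sectors


# Numbers from the fdisk -l output above (identical for all four drives).
disk_sectors, end_sector, matches, fits = partition_fits(500107862016, 60801, 488384001)
print(disk_sectors, end_sector, matches, fits)  # 976773168 976768064 True True

# From the ata_hpa_resize log lines: current size equals native max size.
sectors, hpa_sectors = 976773168, 976773168
print(sectors == hpa_sectors == disk_sectors)  # True
```

Under those assumptions the partition ends about 5,100 sectors before the end of the disk, and the disk size fdisk reports equals both numbers in the ata_hpa_resize messages. Reading `hpa_sectors` as the drive's native maximum, that suggests no HPA is set on these drives, which would fit Chris's guess.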

Bill Davidsen wrote:
> David Greaves wrote:
>> Chris Eddington wrote:
>>  
>>> Yes, there is some kind of media error message in dmesg, below.  It is
>>> not random, it happens at exactly the same moments in each
>>> xfs_repair -n run.
>>> Nov 11 09:48:25 altair kernel: [37043.300691]          res 51/40:00:01:00:00/00:00:00:00:00/e1 Emask 0x9 (media error)
>>> Nov 11 09:48:25 altair kernel: [37043.304326] ata4.00: ata_hpa_resize 1: sectors = 976773168, hpa_sectors = 976773168
>>> Nov 11 09:48:25 altair kernel: [37043.307672] ata4.00: ata_hpa_resize 1: sectors = 976773168, hpa_sectors = 976773168
>>>     
>>
>> I'm not sure what an ata_hpa_resize error is...
>>   
>
> HPA = Host Protected Area.
>
> By any chance is this disk partitioned such that the partition size
> includes the HPA? If it does, this sounds at least familiar; this
> mailing list post may get you started:
> http://osdir.com/ml/linux.ataraid/2005-09/msg00002.html
>
> In any case, run "fdisk -l" and look at the claimed total disk size
> and the end point of the last partition. The HPA is not included in
> the "disk size", so no partition should extend into it.
>> It probably explains the problems you've been having with the raid
>> not 'just recovering', though.
>>
>> I saw this:
>> http://www.linuxquestions.org/questions/linux-kernel-70/sata-issues-568894/ 
>>
>>   
>
> May be the same thing. Let us know what fdisk reports.
>>
>> What does smartctl say about your drive?
>>
>> IMO the spare drive is no longer useful for data recovery - you may
>> want to use ddrescue to try and copy this drive to the spare drive.
>>
>> David
>> PS Don't get the ddrescue parameters the wrong way round if you go
>> that route...
>
>

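The `res` line in the kernel log quoted above is libata's dump of the ATA result registers: status, then error, then the count/LBA bytes, then the device register. A small Python sketch that decodes the status and error bytes and the `Emask` value; the bit names follow the ATA status/error register definitions and libata's `AC_ERR_*` completion flags, only a subset of each is listed, and the helper names are hypothetical.

```python
# Subsets of the ATA status/error register bits and libata's AC_ERR_* flags.
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x10: "DSC",
               0x08: "DRQ", 0x04: "CORR", 0x02: "IDX", 0x01: "ERR"}
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x10: "IDNF", 0x04: "ABRT"}
EMASK_BITS = {0x10: "AC_ERR_ATA_BUS", 0x08: "AC_ERR_MEDIA",
              0x04: "AC_ERR_TIMEOUT", 0x02: "AC_ERR_HSM", 0x01: "AC_ERR_DEV"}


def decode_bits(value, names):
    """Names of the set bits, highest bit first."""
    return [name for bit, name in sorted(names.items(), reverse=True) if value & bit]


def decode_res(res, emask):
    """Decode a libata 'res' dump: status/error:nsect:lbal:lbam:lbah/hob.../device."""
    status_field, low, _hob, _device = res.split("/")
    status = int(status_field, 16)
    error = int(low.split(":")[0], 16)
    return (decode_bits(status, STATUS_BITS),
            decode_bits(error, ERROR_BITS),
            decode_bits(emask, EMASK_BITS))


print(decode_res("51/40:00:01:00:00/00:00:00:00:00/e1", 0x9))
# (['DRDY', 'DSC', 'ERR'], ['UNC'], ['AC_ERR_MEDIA', 'AC_ERR_DEV'])
```

Read this way, status 0x51 is DRDY|DSC|ERR, error 0x40 is UNC (uncorrectable data error), and Emask 0x9 is AC_ERR_DEV|AC_ERR_MEDIA: the drive itself reported a sector it could not read. That points to a genuinely bad sector rather than a link or cable fault, which would more typically show up as ICRC or AC_ERR_ATA_BUS; treat this as an inference from the one log line, not a diagnosis.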
