* The right way to recover from md partition failure?
@ 2004-08-30 19:38 Jonathan Baker-Bates
2004-08-30 20:14 ` Guy
0 siblings, 1 reply; 8+ messages in thread
From: Jonathan Baker-Bates @ 2004-08-30 19:38 UTC (permalink / raw)
To: linux-raid
I've been reading various FAQs and HOWTOs, but for some reason can't really
get an answer to what I assume is a simple question about how best to get a
failed md RAID 1 partition back into an array.
After a power-outage, I see that cat /proc/mdstat shows:
Personalities : [raid1]
read_ahead 1024 sectors
Event: 3
md1 : active raid1 hdg3[1]
178787264 blocks [2/1] [_U]
md0 : active raid1 hde2[0] hdg2[1]
2048192 blocks [2/2] [UU]
md2 : active raid1 hde1[0] hdg1[1]
104320 blocks [2/2] [UU]
unused devices: <none>
So it looks like /dev/hde3 is down. I'm not sure exactly why this is, but
there were some console messages about a bad block or something. So,
assuming hdg3 is OK (which it seems to be) can I just do the following?
Copy good partition to bad one:
dd if=/dev/hdg3 of=/dev/hde3
Add the resulting copy to the raid:
raidhotadd /dev/md1 /dev/hde3
fsck /dev/md1 to make sure all is well.
Is there a better way?
Jonathan
* RE: The right way to recover from md partition failure?
2004-08-30 19:38 The right way to recover from md partition failure? Jonathan Baker-Bates
@ 2004-08-30 20:14 ` Guy
2004-08-30 21:33 ` David Greaves
2004-08-30 21:44 ` Jonathan Baker-Bates
0 siblings, 2 replies; 8+ messages in thread
From: Guy @ 2004-08-30 20:14 UTC (permalink / raw)
To: 'Jonathan Baker-Bates', linux-raid
No need to copy, that's what md does.
Check whether the disk is still listed in the array:
mdadm -D /dev/md1
I bet you will find the disk is there, but failed.
So, raidhotremove it, then raidhotadd it.
mdadm is the preferred tool. The old raidtools are not supported.
For details:
man mdadm
You may need to install mdadm.
mdadm --manage /dev/md1 -r /dev/hde3
mdadm --manage /dev/md1 -a /dev/hde3
or short form:
mdadm /dev/md1 -r /dev/hde3
mdadm /dev/md1 -a /dev/hde3
It should start to re-sync. Monitor the status with:
cat /proc/mdstat
and/or
mdadm -D /dev/md1
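As a rough sketch of the monitoring step, the status field in /proc/mdstat can be checked mechanically. The helper name degraded_arrays is made up for illustration and is not part of mdadm; it assumes the mdstat layout shown in the original post, where a "_" in the [UU]-style field marks a missing member.

```shell
# Sketch only: degraded_arrays is a hypothetical helper, not an mdadm command.
# It reads /proc/mdstat-style text on stdin and prints each array whose
# member map (the [UU] / [_U] field) contains "_", i.e. a missing member.
degraded_arrays() {
  awk '
    /^md[0-9]+ :/ { name = $1 }                    # remember the current array
    /blocks/ && /\[[U_]*_[U_]*\]/ { print name }   # status line with a gap
  '
}

# Typical use on a live system:
#   degraded_arrays < /proc/mdstat
```

While a re-sync is running the array should keep showing up here until the rebuilt member is marked "U" again.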
Guy
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: The right way to recover from md partition failure?
2004-08-30 20:14 ` Guy
@ 2004-08-30 21:33 ` David Greaves
2004-08-30 21:50 ` Jonathan Baker-Bates
2004-08-30 22:17 ` Philip Molter
2004-08-30 21:44 ` Jonathan Baker-Bates
1 sibling, 2 replies; 8+ messages in thread
From: David Greaves @ 2004-08-30 21:33 UTC (permalink / raw)
To: Guy; +Cc: 'Jonathan Baker-Bates', linux-raid
I think a better approach might be:
mdadm /dev/md1 -r /dev/hde3
dd if=/dev/hde3 of=/dev/null
check logs for nasty errors and only continue if there weren't any :)
mdadm /dev/md1 -a /dev/hde3
Having done this very thing this afternoon!!
If you have "some console messages about a bad block or something" then
I'd make damn sure your disk is good before putting it back.
If you end up doing lots of retries during the resync and an error
occurs on a remaining drive you'll be sorry!
In general a raid failure means you should suspect a disk failure.
I just wish Jeff G would get off his backside and make SMART work with
libata - doesn't the man work on bank holidays? ;)
David
* RE: The right way to recover from md partition failure?
2004-08-30 20:14 ` Guy
2004-08-30 21:33 ` David Greaves
@ 2004-08-30 21:44 ` Jonathan Baker-Bates
1 sibling, 0 replies; 8+ messages in thread
From: Jonathan Baker-Bates @ 2004-08-30 21:44 UTC (permalink / raw)
To: linux-raid
> -----Original Message-----
> From: linux-raid-owner@vger.kernel.org
> [mailto:linux-raid-owner@vger.kernel.org]On Behalf Of Guy
> Sent: 30 August 2004 21:14
> To: 'Jonathan Baker-Bates'; linux-raid@vger.kernel.org
> Subject: RE: The right way to recover from md partition failure?
>
>
> No need to copy, that's what md does.
>
> Verify that the disk is not part of the array:
> mdadm -D /dev/md1
>
> I bet you will find the disk is there, but failed.
> So, raidhotremove it, then raidhotadd it.
>
> mdadm is the preferred tool. The old raidtools are not supported.
> For details:
> man mdadm
>
> You may need to install mdadm.
>
> mdadm --manage /dev/md1 -r /dev/hde3
> mdadm --manage /dev/md1 -a /dev/hde3
>
> or short form:
> mdadm /dev/md1 -r /dev/hde3
> mdadm /dev/md1 -a /dev/hde3
>
> It should start to re-sync. Monitor the status with:
> cat /proc/mdstat
> and/or
> mdadm -D /dev/md1
>
Ah, thanks. I'll need to do a backup just in case before I try that though.
One question meanwhile: If there are bad blocks on the drive, and assuming
mdadm adds that disk to the array OK, can I fsck /dev/md1 in the normal way
and repair or mark them as bad? I'm a bit confused about using fsck and
RAID.
Jonathan
* RE: The right way to recover from md partition failure?
2004-08-30 21:33 ` David Greaves
@ 2004-08-30 21:50 ` Jonathan Baker-Bates
2004-08-30 22:11 ` David Greaves
2004-08-30 22:17 ` Philip Molter
1 sibling, 1 reply; 8+ messages in thread
From: Jonathan Baker-Bates @ 2004-08-30 21:50 UTC (permalink / raw)
To: linux-raid
> -----Original Message-----
> From: David Greaves [mailto:david@dgreaves.com]
> Sent: 30 August 2004 22:33
> To: Guy
> Cc: 'Jonathan Baker-Bates'; linux-raid@vger.kernel.org
> Subject: Re: The right way to recover from md partition failure?
>
>
> I think a better approach might be:
>
> mdadm /dev/md1 -r /dev/hde3
> dd if=/dev/hde3 of=/dev/null
Why the /dev/null-ing?
> check logs for nasty errors and only continue if there weren't any :)
> mdadm /dev/md1 -a /dev/hde3
>
> Having done this very thing this afternoon!!
>
> If you have "some console messages about a bad block or something" then
> I'd make damn sure your disk is good before putting it back.
> If you end up doing lots of retries during the resync and an error
> occurs on a remaining drive you'll be sorry!
>
> In general a raid failure means you should suspect a disk failure.
>
Now it's the issue of making sure the disk is good that was worrying me. How
do I make sure? Hence my question to Guy about fsck.
Jonathan
* Re: The right way to recover from md partition failure?
2004-08-30 21:50 ` Jonathan Baker-Bates
@ 2004-08-30 22:11 ` David Greaves
0 siblings, 0 replies; 8+ messages in thread
From: David Greaves @ 2004-08-30 22:11 UTC (permalink / raw)
To: Jonathan Baker-Bates; +Cc: linux-raid
Jonathan Baker-Bates wrote:
>>-----Original Message-----
>>From: David Greaves [mailto:david@dgreaves.com]
>>Sent: 30 August 2004 22:33
>>To: Guy
>>Cc: 'Jonathan Baker-Bates'; linux-raid@vger.kernel.org
>>Subject: Re: The right way to recover from md partition failure?
>>
>>
>>I think a better approach might be:
>>
>>mdadm /dev/md1 -r /dev/hde3
>>dd if=/dev/hde3 of=/dev/null
>>
>>
>
>Why the /dev/null-ing?
>
>
Since you ask I guess you're new at this?
First off, be careful - check the dd syntax carefully - it can ruin your
whole day.
In this case dd goes straight to the hard disk device and pulls data
from the disk and sends it to /dev/null
The objective is to cause the disk to read every sector in the partition
and cause the OS to flag any low-level read errors.
Even if the dd command doesn't report any errors - CHECK THE LOGS
If it succeeds on a 'retry' then I'd suspect the disk - if you have
*any* errors - suspect the disk.
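A minimal sketch of that read test wrapped in a function, so the re-add only happens after a clean pass. The name read_test and the block size are assumptions for illustration; the dd invocation itself is the one given above.

```shell
# Sketch only: read_test is a hypothetical helper name.
# Reads the whole device (or file) named in $1 and throws the data away.
# dd exits non-zero when a read fails outright; a read that only succeeded
# after a retry does not change the exit status, so the logs still matter.
read_test() {
  dd if="$1" of=/dev/null bs=1048576 2>/dev/null
}

# Possible use, re-adding the partition only after a clean read:
#   read_test /dev/hde3 && mdadm /dev/md1 -a /dev/hde3
```

A zero exit status here is necessary but not sufficient: it says nothing about retries the drive or kernel performed along the way.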
>>check logs for nasty errors and only continue if there weren't any :)
>>
>>
check /var/log/messages and /var/log/kernel
Let us know what they say.
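One hedged way to narrow the logs down to the suspect disk is a simple filter. The helper name disk_errors_for and the keyword list are guesses for illustration, not an exhaustive match for what the kernel may print, and the log path may differ between distributions.

```shell
# Sketch only: disk_errors_for is a hypothetical helper, and the keyword list
# is a guess at what error lines tend to contain, not a complete pattern.
# $1 is the short device name (e.g. hde); log text is read from stdin.
disk_errors_for() {
  grep -i -E "$1.*(error|fail|bad|timeout)"
}

# Possible use:
#   disk_errors_for hde < /var/log/messages
```

No output from the filter only means none of those keywords appeared next to the device name, so a quick read of the raw log around the time of the dd run is still worthwhile.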
>>mdadm /dev/md1 -a /dev/hde3
>>
>>Having done this very thing this afternoon!!
>>
>>If you have "some console messages about a bad block or something" then
>>I'd make damn sure your disk is good before putting it back.
>>If you end up doing lots of retries during the resync and an error
>>occurs on a remaining drive you'll be sorry!
>>
>>In general a raid failure means you should suspect a disk failure.
>>
>>
>>
>
>Now it's the issue of making sure the disk is good that was worrying me. How
>do I make sure? Hence my question to Guy about fsck.
>
>
No
fsck will check to see if the *filesystem* is good - it will be.
To be honest you shouldn't have noticed any problems - the disk failed -
it happens - that's why you have RAID.
Smile - right now your system would be toast without it.
[Aside: FYI, disk systems are 'layered'.
In your case data (files) lives 'on top' of the filesystem which lives
on top of the md1 device which lives on top of the /dev/hd?? devices.
The md1 is designed to keep working if either /dev/hd?? fails - so the
filesystem and your files should never notice.
]
Anyway, of course disks sometimes have glitches (e.g. if they get too hot).
You should probably go and get smartmontools (its smartctl and smartd
programs look at your disk's health status).
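As a sketch of how the SMART health check might be scripted, assuming smartmontools is installed and the drive supports SMART: smartctl -H prints an overall verdict, and the helper below (smart_health_passed, a made-up name) just looks for the usual "passed" wording in that output. The exact strings matched are an assumption about what smartctl prints and may vary by drive type and version.

```shell
# Sketch only: smart_health_passed is a hypothetical helper.
# Reads the output of "smartctl -H <device>" on stdin and succeeds if it
# contains one of the verdict lines assumed here for ATA or SCSI disks.
smart_health_passed() {
  grep -q -E 'test result: PASSED|SMART Health Status: OK'
}

# Possible use:
#   smartctl -H /dev/hde | smart_health_passed || echo "hde: check the disk"
# A long self-test can be started with:
#   smartctl -t long /dev/hde
```

A passing verdict is weak evidence on its own; the full attribute and error-log listing from smartctl -a gives a better picture.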
If you do have errors then shut down if you can and check your cables
and make sure all your fans are OK.
Reboot and try the dd again.
If you get errors again then you can try changing the IDE cable.
If you *still* have errors then get yourself online and dig out the
credit-card for a new disk.
David
* Re: The right way to recover from md partition failure?
2004-08-30 21:33 ` David Greaves
2004-08-30 21:50 ` Jonathan Baker-Bates
@ 2004-08-30 22:17 ` Philip Molter
2004-08-30 23:27 ` Guy
1 sibling, 1 reply; 8+ messages in thread
From: Philip Molter @ 2004-08-30 22:17 UTC (permalink / raw)
To: linux-raid
David Greaves wrote:
> I think a better approach might be:
>
> mdadm /dev/md1 -r /dev/hde3
> dd if=/dev/hde3 of=/dev/null
> check logs for nasty errors and only continue if there weren't any :)
> mdadm /dev/md1 -a /dev/hde3
Normally, for this I:
dd if=/dev/zero of=/dev/hde3
dd if=/dev/hde3 of=/dev/null
The write will usually make the drive remap any bad sectors internally;
bad sectors are usually what cause RAID failures on IDE drives
(in my experience).
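Since the zero-fill destroys whatever is on the target, it may be worth confirming first that the partition really has been removed from every array. A minimal sketch of such a guard follows; listed_in_mdstat is a made-up helper name, and it assumes the /proc/mdstat layout quoted earlier in the thread, where members appear as name[index].

```shell
# Sketch only: listed_in_mdstat is a hypothetical helper.
# $1 is a partition path or name (e.g. /dev/hde3); mdstat text is read from
# stdin. Succeeds if the partition still appears as a member of some array.
listed_in_mdstat() {
  name=${1##*/}    # strip any leading directory, /dev/hde3 -> hde3
  grep -q -E "(^|[[:space:]])${name}\[[0-9]+\]"
}

# Possible use before the destructive write:
#   if listed_in_mdstat /dev/hde3 < /proc/mdstat; then
#     echo "hde3 is still in an array, not touching it" >&2
#   else
#     dd if=/dev/zero of=/dev/hde3
#     dd if=/dev/hde3 of=/dev/null
#   fi
```

The guard only checks membership; getting the if= and of= arguments the right way round is still down to the person typing.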
* RE: The right way to recover from md partition failure?
2004-08-30 22:17 ` Philip Molter
@ 2004-08-30 23:27 ` Guy
0 siblings, 0 replies; 8+ messages in thread
From: Guy @ 2004-08-30 23:27 UTC (permalink / raw)
To: 'Philip Molter', linux-raid
Yes! That was my plan, I just did not take the time to explain.
When md re-syncs the disk, the write to the "bad" disk should fix/re-locate
the bad blocks.
or
What he said (Philip), but be very careful!
dd if=/dev/zero of=/dev/hde3
dd if=/dev/hde3 of=/dev/null
SCSI disks have the same bad block issue. md does not support bad blocks. :(
You should test the disks once per day! In my opinion!
Guy