* Removed two drives (still valid and working) from raid-5 and need to add them back in.
From: mtice @ 2011-03-12 19:12 UTC
To: linux-raid
I have a 4 disk raid 5 array on my Ubuntu 10.10 box. They are /dev/sd[c,d,e,f]. Smartctl started notifying me that /dev/sde had some bad sectors and the number of errors was increasing each day. To mitigate this I decided to buy a new drive and replace it.
I failed /dev/sde via mdadm:
mdadm --manage /dev/md0 --fail /dev/sde
mdadm --manage /dev/md0 --remove /dev/sde
I pulled the drive from the enclosure . . . and found it was the wrong drive (should have been the next drive down . . .). I quickly pushed the drive back in and found that the system renamed the device (/dev/sdh).
I then tried to add that drive back in (this time with the different dev name):
mdadm --manage /dev/md0 --re-add /dev/sdh
(I don't have the output of --detail for this step.)
I rebooted and the original dev name returned (/dev/sdd).
The problem is that now only two drives are active in my raid 5, which of course won't start:
mdadm -As /dev/md0
mdadm: /dev/md0 assembled from 2 drives and 2 spares - not enough to start the array.
I can, however, get it running with:
mdadm --incremental --run --scan
So my question is how can I add these two still-valid spares back into my array?
Here is the output of mdadm --detail /dev/md0:
/dev/md0:
Version : 00.90
Creation Time : Thu May 27 15:35:56 2010
Raid Level : raid5
Used Dev Size : 732574464 (698.64 GiB 750.16 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Fri Mar 11 15:53:35 2011
State : active, degraded, Not Started
Active Devices : 2
Working Devices : 4
Failed Devices : 0
Spare Devices : 2
Layout : left-symmetric
Chunk Size : 64K
UUID : 11c1cdd8:60ec9a90:2e29483d:f114274d (local to host storage)
Events : 0.43200
Number Major Minor RaidDevice State
0 8 80 0 active sync /dev/sdf
1 0 0 1 removed
2 0 0 2 removed
3 8 32 3 active sync /dev/sdc
4 8 64 - spare /dev/sde
5 8 48 - spare /dev/sdd
I appreciate any help.
Matt
* Re: Removed two drives (still valid and working) from raid-5 and need to add them back in.
From: mtice @ 2011-03-13 1:01 UTC
To: mtice; +Cc: linux-raid
> I have a 4 disk raid 5 array on my Ubuntu 10.10 box. They are /dev/sd[c,d,e,f]. Smartctl started notifying me that /dev/sde had some bad sectors and the number of errors was increasing each day. To mitigate this I decided to buy a new drive and replace it.
>
> I failed /dev/sde via mdadm:
>
> mdadm --manage /dev/md0 --fail /dev/sde
> mdadm --manage /dev/md0 --remove /dev/sde
>
> I pulled the drive from the enclosure . . . and found it was the wrong drive (should have been the next drive down . . .). I quickly pushed the drive back in and found that the system renamed the device (/dev/sdh).
> I then tried to add that drive back in (this time with the different dev name):
>
> mdadm --manage /dev/md0 --re-add /dev/sdh
> (I don't have the output of --detail for this step.)
>
> I rebooted and the original dev name returned (/dev/sdd).
>
> The problem is that now only two drives are active in my raid 5, which of course won't start:
>
> mdadm -As /dev/md0
> mdadm: /dev/md0 assembled from 2 drives and 2 spares - not enough to start the array.
>
>
> I can, however, get it running with:
> mdadm --incremental --run --scan
>
> So my question is how can I add these two still-valid spares back into my array?
>
> Here is the output of mdadm --detail /dev/md0:
>
> /dev/md0:
> Version : 00.90
> Creation Time : Thu May 27 15:35:56 2010
> Raid Level : raid5
> Used Dev Size : 732574464 (698.64 GiB 750.16 GB)
> Raid Devices : 4
> Total Devices : 4
> Preferred Minor : 0
> Persistence : Superblock is persistent
>
> Update Time : Fri Mar 11 15:53:35 2011
> State : active, degraded, Not Started
> Active Devices : 2
> Working Devices : 4
> Failed Devices : 0
> Spare Devices : 2
>
> Layout : left-symmetric
> Chunk Size : 64K
>
> UUID : 11c1cdd8:60ec9a90:2e29483d:f114274d (local to host storage)
> Events : 0.43200
>
> Number Major Minor RaidDevice State
> 0 8 80 0 active sync /dev/sdf
> 1 0 0 1 removed
> 2 0 0 2 removed
> 3 8 32 3 active sync /dev/sdc
>
> 4 8 64 - spare /dev/sde
> 5 8 48 - spare /dev/sdd
>
>
> I appreciate any help.
>
> Matt
I did find one older thread with a similar problem. The thread was titled "RAID 5 re-add of removed drive? (failed drive replacement)".
The point that seemed to make the most sense is:
AFAIK, the only solution at this stage is to recreate the array.
You need to use the "--assume-clean" flag (or replace one of the drives
with "missing"), along with _exactly_ the same parameters & drive order
as when you originally created the array (you should be able to get most
of this from mdadm -D). This will rewrite the RAID metadata, but leave
the filesystem untouched.
The question I have is how do I know what order to put the drives in? And is this really the route I need to take?
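If I understand that advice correctly, the command would look something like the sketch below, though the drive order in it is only a guess on my part (which is exactly the part I'm unsure about):
# hypothetical re-create; do not run until the device order is confirmed with mdadm -E
mdadm --create /dev/md0 --assume-clean --metadata=0.90 --level=5 \
      --raid-devices=4 --chunk=64 --layout=left-symmetric \
      /dev/sdf /dev/sdd /dev/sde /dev/sdc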
* Re: Removed two drives (still valid and working) from raid-5 and need to add them back in.
From: Phil Turmel @ 2011-03-13 3:26 UTC
To: mtice; +Cc: linux-raid
[-- Attachment #1: Type: text/plain, Size: 3721 bytes --]
Hi Matt,
On 03/12/2011 08:01 PM, mtice wrote:
>
>> I have a 4 disk raid 5 array on my Ubuntu 10.10 box. They are /dev/sd[c,d,e,f]. Smartctl started notifying me that /dev/sde had some bad sectors and the number of errors was increasing each day. To mitigate this I decided to buy a new drive and replace it.
>>
>> I failed /dev/sde via mdadm:
>>
>> mdadm --manage /dev/md0 --fail /dev/sde
>> mdadm --manage /dev/md0 --remove /dev/sde
>>
>> I pulled the drive from the enclosure . . . and found it was the wrong drive (should have been the next drive down . . .). I quickly pushed the drive back in and found that the system renamed the device (/dev/sdh).
>> I then tried to add that drive back in (this time with the different dev name):
>>
>> mdadm --manage /dev/md0 --re-add /dev/sdh
>> (I don't have the output of --detail for this step.)
>>
>> I rebooted and the original dev name returned (/dev/sdd).
>>
>> The problem is that now only two drives are active in my raid 5, which of course won't start:
>>
>> mdadm -As /dev/md0
>> mdadm: /dev/md0 assembled from 2 drives and 2 spares - not enough to start the array.
>>
>>
>> I can, however, get it running with:
>> mdadm --incremental --run --scan
>>
>> So my question is how can I add these two still-valid spares back into my array?
>>
>> Here is the output of mdadm --detail /dev/md0:
>>
>> /dev/md0:
>> Version : 00.90
>> Creation Time : Thu May 27 15:35:56 2010
>> Raid Level : raid5
>> Used Dev Size : 732574464 (698.64 GiB 750.16 GB)
>> Raid Devices : 4
>> Total Devices : 4
>> Preferred Minor : 0
>> Persistence : Superblock is persistent
>>
>> Update Time : Fri Mar 11 15:53:35 2011
>> State : active, degraded, Not Started
>> Active Devices : 2
>> Working Devices : 4
>> Failed Devices : 0
>> Spare Devices : 2
>>
>> Layout : left-symmetric
>> Chunk Size : 64K
>>
>> UUID : 11c1cdd8:60ec9a90:2e29483d:f114274d (local to host storage)
>> Events : 0.43200
>>
>> Number Major Minor RaidDevice State
>> 0 8 80 0 active sync /dev/sdf
>> 1 0 0 1 removed
>> 2 0 0 2 removed
>> 3 8 32 3 active sync /dev/sdc
>>
>> 4 8 64 - spare /dev/sde
>> 5 8 48 - spare /dev/sdd
>>
>>
>> I appreciate any help.
>>
>> Matt
>
> I did find one older thread with a similar problem. The thread was titled "RAID 5 re-add of removed drive? (failed drive replacement)".
>
> The point that seemed to make the most sense is:
>
> AFAIK, the only solution at this stage is to recreate the array.
>
> You need to use the "--assume-clean" flag (or replace one of the drives
> with "missing"), along with _exactly_ the same parameters & drive order
> as when you originally created the array (you should be able to get most
> of this from mdadm -D). This will rewrite the RAID metadata, but leave
> the filesystem untouched.
>
> The question I have is how do I know what order to put the drives in? And is this really the route I need to take?
If you can avoid --create, do. Please report "mdadm -E /dev/sd[cdef]" so we can see all of the component drives' self-knowledge.
The order for --create will be the numerical order of the "RaidDevice" column. We know from the above what sdc and sdf are, but we need to tell sdd and sde apart.
Before trying to --create, I suggest trying --assemble --force. It's much less likely to do something bad.
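With your device names, that would be roughly the following (if something has already partially assembled the array, it has to be stopped before this will work):
mdadm --assemble --force /dev/md0 /dev/sd[cdef]
cat /proc/mdstat    # check whether all four members came back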
You might find my "lsdrv" script useful to see the serial numbers of these drives, so you won't confuse them in the future. I've attached the most recent version for your convenience.
Phil
[-- Attachment #2: lsdrv --]
[-- Type: text/plain, Size: 1947 bytes --]
#! /bin/bash
#
# Examine specific system host devices to identify the drives attached
#
function describe_controller () {
    local device driver modprefix serial slotname
    driver="`readlink -f \"$1/driver\"`"
    driver="`basename $driver`"
    modprefix="`cut -d: -f1 <\"$1/modalias\"`"
    echo "Controller device @ ${1##/sys/devices/} [$driver]"
    if [[ "$modprefix" == "pci" ]] ; then
        slotname="`basename \"$1\"`"
        echo " `lspci -s $slotname |cut -d\  -f2-`"
        return
    fi
    if [[ "$modprefix" == "usb" ]] ; then
        if [[ -f "$1/busnum" ]] ; then
            device="`cat \"$1/busnum\"`:`cat \"$1/devnum\"`"
            serial="`cat \"$1/serial\"`"
        else
            device="`cat \"$1/../busnum\"`:`cat \"$1/../devnum\"`"
            serial="`cat \"$1/../serial\"`"
        fi
        echo " `lsusb -s $device` {SN: $serial}"
        return
    fi
    echo -e " `cat \"$1/modalias\"`"
}
function describe_device () {
    local empty=1
    while read device ; do
        empty=0
        if [[ "$device" =~ ^(.+/[0-9]+:)([0-9]+:[0-9]+:[0-9]+)/block[/:](.+)$ ]] ; then
            base="${BASH_REMATCH[1]}"
            lun="${BASH_REMATCH[2]}"
            bdev="${BASH_REMATCH[3]}"
            vnd="$(< ${base}${lun}/vendor)"
            mdl="$(< ${base}${lun}/model)"
            sn="`sginfo -s /dev/$bdev | \
                sed -rn -e \"/Serial Number/{s%^.+' *(.+) *'.*\\\$%\\\\1%;p;q}\"`" &>/dev/null
            if [[ -n "$sn" ]] ; then
                echo -e " $1 `echo $lun $bdev $vnd $mdl {SN: $sn}`"
            else
                echo -e " $1 `echo $lun $bdev $vnd $mdl`"
            fi
        else
            echo -e " $1 Unknown $device"
        fi
    done
    [[ $empty -eq 1 ]] && echo -e " $1 [Empty]"
}
function check_host () {
    local found=0
    local pController=
    while read shost ; do
        host=`dirname "$shost"`
        controller=`dirname "$host"`
        bhost=`basename "$host"`
        if [[ "$controller" != "$pController" ]] ; then
            pController="$controller"
            describe_controller "$controller"
        fi
        find $host -regex '.+/target[0-9:]+/[0-9:]+/block[:/][^/]+' |describe_device "$bhost"
    done
}
find /sys/devices/ -name 'scsi_host*' |check_host
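To try it, just make the script executable and run it as root; it shells out to sginfo, which on Ubuntu should come from the sg3-utils package:
chmod +x lsdrv
sudo ./lsdrv    # lists each controller followed by the drives (and serial numbers) behind it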
* Re: Removed two drives (still valid and working) from raid-5 and need to add them back in.
From: mtice @ 2011-03-13 3:38 UTC
To: Phil Turmel; +Cc: linux-raid
Hi Phil, thanks for the reply.
On Mar 12, 2011, at 8:26 PM, Phil Turmel wrote:
> Hi Matt,
>
> On 03/12/2011 08:01 PM, mtice wrote:
>>
>>> I have a 4 disk raid 5 array on my Ubuntu 10.10 box. They are /dev/sd[c,d,e,f]. Smartctl started notifying me that /dev/sde had some bad sectors and the number of errors was increasing each day. To mitigate this I decided to buy a new drive and replace it.
>>>
>>> I failed /dev/sde via mdadm:
>>>
>>> mdadm --manage /dev/md0 --fail /dev/sde
>>> mdadm --manage /dev/md0 --remove /dev/sde
>>>
>>> I pulled the drive from the enclosure . . . and found it was the wrong drive (should have been the next drive down . . .). I quickly pushed the drive back in and found that the system renamed the device (/dev/sdh).
>>> I then tried to add that drive back in (this time with the different dev name):
>>>
>>> mdadm --manage /dev/md0 --re-add /dev/sdh
>>> (I don't have the output of --detail for this step.)
>>>
>>> I rebooted and the original dev name returned (/dev/sdd).
>>>
>>> The problem is that now only two drives are active in my raid 5, which of course won't start:
>>>
>>> mdadm -As /dev/md0
>>> mdadm: /dev/md0 assembled from 2 drives and 2 spares - not enough to start the array.
>>>
>>>
>>> I can, however, get it running with:
>>> mdadm --incremental --run --scan
>>>
>>> So my question is how can I add these two still-valid spares back into my array?
>>>
>>> Here is the output of mdadm --detail /dev/md0:
>>>
>>> /dev/md0:
>>> Version : 00.90
>>> Creation Time : Thu May 27 15:35:56 2010
>>> Raid Level : raid5
>>> Used Dev Size : 732574464 (698.64 GiB 750.16 GB)
>>> Raid Devices : 4
>>> Total Devices : 4
>>> Preferred Minor : 0
>>> Persistence : Superblock is persistent
>>>
>>> Update Time : Fri Mar 11 15:53:35 2011
>>> State : active, degraded, Not Started
>>> Active Devices : 2
>>> Working Devices : 4
>>> Failed Devices : 0
>>> Spare Devices : 2
>>>
>>> Layout : left-symmetric
>>> Chunk Size : 64K
>>>
>>> UUID : 11c1cdd8:60ec9a90:2e29483d:f114274d (local to host storage)
>>> Events : 0.43200
>>>
>>> Number Major Minor RaidDevice State
>>> 0 8 80 0 active sync /dev/sdf
>>> 1 0 0 1 removed
>>> 2 0 0 2 removed
>>> 3 8 32 3 active sync /dev/sdc
>>>
>>> 4 8 64 - spare /dev/sde
>>> 5 8 48 - spare /dev/sdd
>>>
>>>
>>> I appreciate any help.
>>>
>>> Matt
>>
>> I did find one older thread with a similar problem. The thread was titled "RAID 5 re-add of removed drive? (failed drive replacement)".
>>
>> The point that seemed to make the most sense is:
>>
>> AFAIK, the only solution at this stage is to recreate the array.
>>
>> You need to use the "--assume-clean" flag (or replace one of the drives
>> with "missing"), along with _exactly_ the same parameters & drive order
>> as when you originally created the array (you should be able to get most
>> of this from mdadm -D). This will rewrite the RAID metadata, but leave
>> the filesystem untouched.
>>
>> The question I have is how do I know what order to put the drives in? And is this really the route I need to take?
>
> If you can avoid --create, do. Please report "mdadm -E /dev/sd[cdef]" so we can see all of the component drives' self-knowledge.
>
> The order for --create will be the numerical order of the "RaidDevice" column. We know from the above what sdc and sdf are, but we need to tell sdd and sde apart.
>
> Before trying to --create, I suggest trying --assemble --force. It's much less likely to do something bad.
>
> You might find my "lsdrv" script useful to see the serial numbers of these drives, so you won't confuse them in the future. I've attached the most recent version for your convenience.
>
> Phil
>
>
> <lsdrv.txt>
Here is the output of mdadm -E /dev/sd[cdef]:
/dev/sdc:
Magic : a92b4efc
Version : 00.90.00
UUID : 11c1cdd8:60ec9a90:2e29483d:f114274d (local to host storage)
Creation Time : Thu May 27 15:35:56 2010
Raid Level : raid5
Used Dev Size : 732574464 (698.64 GiB 750.16 GB)
Array Size : 2197723392 (2095.91 GiB 2250.47 GB)
Raid Devices : 4
Total Devices : 5
Preferred Minor : 0
Update Time : Fri Mar 11 15:53:35 2011
State : clean
Active Devices : 2
Working Devices : 4
Failed Devices : 2
Spare Devices : 2
Checksum : 3d3b86 - correct
Events : 43200
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 3 8 32 3 active sync /dev/sdc
0 0 8 80 0 active sync /dev/sdf
1 1 0 0 1 faulty removed
2 2 0 0 2 faulty removed
3 3 8 32 3 active sync /dev/sdc
4 4 8 64 4 spare /dev/sde
5 5 8 112 5 spare
/dev/sdd:
Magic : a92b4efc
Version : 00.90.00
UUID : 11c1cdd8:60ec9a90:2e29483d:f114274d (local to host storage)
Creation Time : Thu May 27 15:35:56 2010
Raid Level : raid5
Used Dev Size : 732574464 (698.64 GiB 750.16 GB)
Array Size : 2197723392 (2095.91 GiB 2250.47 GB)
Raid Devices : 4
Total Devices : 5
Preferred Minor : 0
Update Time : Fri Mar 11 15:53:35 2011
State : clean
Active Devices : 2
Working Devices : 4
Failed Devices : 2
Spare Devices : 2
Checksum : 3d3bd4 - correct
Events : 43200
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 5 8 112 5 spare
0 0 8 80 0 active sync /dev/sdf
1 1 0 0 1 faulty removed
2 2 0 0 2 faulty removed
3 3 8 32 3 active sync /dev/sdc
4 4 8 64 4 spare /dev/sde
5 5 8 112 5 spare
/dev/sde:
Magic : a92b4efc
Version : 00.90.00
UUID : 11c1cdd8:60ec9a90:2e29483d:f114274d (local to host storage)
Creation Time : Thu May 27 15:35:56 2010
Raid Level : raid5
Used Dev Size : 732574464 (698.64 GiB 750.16 GB)
Array Size : 2197723392 (2095.91 GiB 2250.47 GB)
Raid Devices : 4
Total Devices : 5
Preferred Minor : 0
Update Time : Fri Mar 11 15:53:35 2011
State : clean
Active Devices : 2
Working Devices : 4
Failed Devices : 2
Spare Devices : 2
Checksum : 3d3ba2 - correct
Events : 43200
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 4 8 64 4 spare /dev/sde
0 0 8 80 0 active sync /dev/sdf
1 1 0 0 1 faulty removed
2 2 0 0 2 faulty removed
3 3 8 32 3 active sync /dev/sdc
4 4 8 64 4 spare /dev/sde
5 5 8 112 5 spare
/dev/sdf:
Magic : a92b4efc
Version : 00.90.00
UUID : 11c1cdd8:60ec9a90:2e29483d:f114274d (local to host storage)
Creation Time : Thu May 27 15:35:56 2010
Raid Level : raid5
Used Dev Size : 732574464 (698.64 GiB 750.16 GB)
Array Size : 2197723392 (2095.91 GiB 2250.47 GB)
Raid Devices : 4
Total Devices : 5
Preferred Minor : 0
Update Time : Fri Mar 11 15:53:35 2011
State : clean
Active Devices : 2
Working Devices : 4
Failed Devices : 2
Spare Devices : 2
Checksum : 3d3bb0 - correct
Events : 43200
Layout : left-symmetric
Chunk Size : 64K
Number Major Minor RaidDevice State
this 0 8 80 0 active sync /dev/sdf
0 0 8 80 0 active sync /dev/sdf
1 1 0 0 1 faulty removed
2 2 0 0 2 faulty removed
3 3 8 32 3 active sync /dev/sdc
4 4 8 64 4 spare /dev/sde
5 5 8 112 5 spare
I ran mdadm --assemble --force /dev/md0 but it failed with:
mdadm: device /dev/md0 already active - cannot assemble it
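That message usually just means /dev/md0 is still partially assembled from the earlier incremental run; the normal way past it is to stop the array and then retry the forced assembly, along these lines:
mdadm --stop /dev/md0
mdadm --assemble --force /dev/md0 /dev/sd[cdef]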