* Unable to re-add a disk after a reboot.
@ 2014-08-14 23:08 Ram Ramesh
2014-08-15 0:19 ` NeilBrown
0 siblings, 1 reply; 6+ messages in thread
From: Ram Ramesh @ 2014-08-14 23:08 UTC (permalink / raw)
To: Linux Raid
Hi,
I just finished converting a 3-disk raid5 to a 4-disk raid6. After a
reboot to start clean, I noticed that one of the disks (the new one I
just added) was missing from /proc/partitions. This was disk 4 in my
/dev/md0. Assuming a cable issue, I powered off, wiggled the cables,
and restarted, and the device was found by the kernel. However, md0 shows
the device as missing and the array as degraded:
lata [rramesh] 280 > cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdb1[0] sdd1[3] sdc1[1]
3906763776 blocks super 1.2 level 6, 512k chunk, algorithm 2
[4/3] [UUU_]
unused devices: <none>
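In the status above, [4/3] means four member slots are configured and three are active, and the underscore in [UUU_] marks the empty slot. As a rough sketch of how that check could be scripted, the function below reads /proc/mdstat-style text on stdin and prints each array that has a missing member. The name degraded_arrays is made up for illustration and is not part of mdadm.

```shell
# Print the name of every md array whose status field contains an
# underscore, e.g. [UUU_]. Input is /proc/mdstat-style text on stdin.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/   { name = $1 }            # remember the current array
        /\[U*_[U_]*\]/  {                        # status group with a missing slot
            if (name != "") print name
            name = ""                            # report each array only once
        }
    '
}
```

On a live system this would be called as: degraded_arrays < /proc/mdstat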
However my attempt to --re-add does not work.
lata [rramesh] 277 > sudo mdadm /dev/md0 --verbose --re-add /dev/sde1
mdadm: --re-add for /dev/sde1 to /dev/md0 is not possible
lata [rramesh] 278 > sudo mdadm -E /dev/sde1
/dev/sde1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : 730051d9:f4c58e0c:504fd1d9:798a84a4
Name : lata:0 (local to host lata)
Creation Time : Sun Oct 6 16:41:01 2013
Raid Level : raid6
Raid Devices : 4
Avail Dev Size : 5860268943 (2794.39 GiB 3000.46 GB)
Array Size : 3906763776 (3725.78 GiB 4000.53 GB)
Used Dev Size : 3906763776 (1862.89 GiB 2000.26 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 03898148:47c40cc2:f365082e:9f7f06cf
Update Time : Thu Aug 14 08:53:16 2014
Checksum : 346e9226 - correct
Events : 1191488
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 3
Array State : AAAA ('A' == active, '.' == missing)
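The Events counter in the superblock above is what md uses to judge whether a member is current: a member whose count lags behind the rest of the array is treated as stale. A minimal sketch for pulling that number out of mdadm -E output so it can be compared across members follows. The helper name events_of is hypothetical, and it assumes the "Events : N" line format shown above.

```shell
# Extract the event count from `mdadm -E <device>` output read on stdin.
events_of() {
    # Split on " : " so the field name and its value are $1 and $2.
    awk -F' : ' '$1 ~ /^ *Events$/ { print $2 }'
}
```

Typical use, as root, would be "mdadm -E /dev/sde1 | events_of" for each member, followed by a comparison of the numbers.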
lata [rramesh] 279 > fgrep UUID /etc/mdadm/mdadm.conf
# ARRAY /dev/md/0 metadata=1.2
UUID=0e9f76b5:4a89171a:a930bccd:78749144 name=zym:0
ARRAY /dev/md0 metadata=1.2 spares=1 name=lata:0
UUID=730051d9:f4c58e0c:504fd1d9:798a84a4
I checked SMART and it shows a lot of Reallocated_Sector_Ct errors
as well. So the disk is dying, but I am not able to understand why mdadm
would not add it.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   091   091   016    Pre-fail  Always   -           53
  2 Throughput_Performance  0x0005   100   100   054    Pre-fail  Offline  -           0
  3 Spin_Up_Time            0x0007   135   135   024    Pre-fail  Always   -           426 (Average 425)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always   -           59
 *5 Reallocated_Sector_Ct   0x0033   001   001   005    Pre-fail  Always   FAILING_NOW 330*
  7 Seek_Error_Rate         0x000b   098   098   067    Pre-fail  Always   -           2
  8 Seek_Time_Performance   0x0005   100   100   020    Pre-fail  Offline  -           0
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always   -           3445
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always   -           0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always   -           59
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always   -           548
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always   -           548
194 Temperature_Celsius     0x0002   153   153   000    Old_age   Always   -           39 (Min/Max 21/43)
196 Reallocated_Event_Count 0x0032   001   001   000    Old_age   Always   -           17604
197 Current_Pending_Sector  0x0022   001   001   000    Old_age   Always   -           13256
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline  -           0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always   -           0
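Two rows in the table carry most of the bad news: Reallocated_Sector_Ct is flagged FAILING_NOW because its normalized value (001) is below its threshold (005), and Current_Pending_Sector has a large raw count. As a sketch only, a filter along these lines could pick such rows out of "smartctl -A" output. The function name smart_warnings is invented here, and it assumes the unwrapped ten-column layout that smartctl prints, with WHEN_FAILED in column 9 and RAW_VALUE starting in column 10.

```shell
# Report SMART attributes that look alarming. Reads `smartctl -A` output
# on stdin; only rows that start with a numeric attribute ID are examined.
smart_warnings() {
    awk '
        $1 ~ /^[0-9]+$/ && NF >= 10 {
            if ($9 != "-")
                # WHEN_FAILED is set, e.g. FAILING_NOW or In_the_past
                print $2 ": " $9 " (raw " $10 ")"
            else if ($2 == "Current_Pending_Sector" && $10 + 0 > 0)
                # sectors waiting to be remapped
                print $2 ": " $10 " sectors pending"
        }
    '
}
```

It would be used as "smartctl -A /dev/sde | smart_warnings"; the device name is the one from this thread.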
Any recommendations while I wait for a replacement?
Ramesh
* Re: Unable to re-add a disk after a reboot.
2014-08-14 23:08 Unable to re-add a disk after a reboot Ram Ramesh
@ 2014-08-15 0:19 ` NeilBrown
2014-08-15 1:33 ` Ram Ramesh
0 siblings, 1 reply; 6+ messages in thread
From: NeilBrown @ 2014-08-15 0:19 UTC (permalink / raw)
To: Ram Ramesh; +Cc: Linux Raid
On Thu, 14 Aug 2014 18:08:30 -0500 Ram Ramesh <rramesh2400@gmail.com> wrote:
> Hi,
>
> I just finished converting a 3-disk raid5 to 4-disk raid6. After a
> reboot to start clean, I noticed that one of the disk (the new one I
> just added) was missing in /proc/partitions. This was disk 4 in my
> /dev/md0. Assuming some cable issue, I powered off, wiggled the cables
> and restarted and the device was found by kernel. However, md0 shows
> device missing and array degraded
>
> lata [rramesh] 280 > cat /proc/mdstat
> Personalities : [raid6] [raid5] [raid4]
> md0 : active raid6 sdb1[0] sdd1[3] sdc1[1]
> 3906763776 blocks super 1.2 level 6, 512k chunk, algorithm 2
> [4/3] [UUU_]
>
> unused devices: <none>
>
> However my attempt to --re-add does not work.
>
> lata [rramesh] 277 > sudo mdadm /dev/md0 --verbose --re-add /dev/sde1
> mdadm: --re-add for /dev/sde1 to /dev/md0 is not possible
"re-add" only makes sense when you have a write-intent bitmap, which you don't
have.
So you need to "--add", which marks the device as a spare and then starts a
complete rebuild.
> I checked the SMART and it shows a lot of reallocated_sector_ct errors
> also. So, the disk is dying, but I am not able understand why mdadm
> would not add.
It will "add". It just won't "re-add".
NeilBrown
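For reference, the "--add" route described in this reply could look roughly like the sketch below. The wrapper name add_as_spare and the MDADM override variable are illustrative only; the underlying command is the same manage-mode invocation used earlier in the thread with --add in place of --re-add.

```shell
# Command used to talk to md; set MDADM=echo to preview without running.
MDADM="${MDADM:-mdadm}"

# Add a device to an array as a spare. On a degraded array md then
# starts a complete rebuild onto it.
#   $1 = array (e.g. /dev/md0), $2 = member device (e.g. /dev/sde1)
add_as_spare() {
    $MDADM "$1" --add "$2"
}
```

Run as root, "add_as_spare /dev/md0 /dev/sde1" would be followed by "cat /proc/mdstat" to watch the recovery progress.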
* Re: Unable to re-add a disk after a reboot.
2014-08-15 0:19 ` NeilBrown
@ 2014-08-15 1:33 ` Ram Ramesh
2014-08-15 4:27 ` Mikael Abrahamsson
0 siblings, 1 reply; 6+ messages in thread
From: Ram Ramesh @ 2014-08-15 1:33 UTC (permalink / raw)
To: NeilBrown; +Cc: Linux Raid
On 08/14/2014 07:19 PM, NeilBrown wrote:
> On Thu, 14 Aug 2014 18:08:30 -0500 Ram Ramesh <rramesh2400@gmail.com> wrote:
>
>> Hi,
>>
>> I just finished converting a 3-disk raid5 to 4-disk raid6. After a
>> reboot to start clean, I noticed that one of the disk (the new one I
>> just added) was missing in /proc/partitions. This was disk 4 in my
>> /dev/md0. Assuming some cable issue, I powered off, wiggled the cables
>> and restarted and the device was found by kernel. However, md0 shows
>> device missing and array degraded
>>
>> lata [rramesh] 280 > cat /proc/mdstat
>> Personalities : [raid6] [raid5] [raid4]
>> md0 : active raid6 sdb1[0] sdd1[3] sdc1[1]
>> 3906763776 blocks super 1.2 level 6, 512k chunk, algorithm 2
>> [4/3] [UUU_]
>>
>> unused devices: <none>
>>
>> However my attempt to --re-add does not work.
>>
>> lata [rramesh] 277 > sudo mdadm /dev/md0 --verbose --re-add /dev/sde1
>> mdadm: --re-add for /dev/sde1 to /dev/md0 is not possible
> "re-add" only makes sense when you have a write-intent bitmap, which you don't
> have.
> So you need to "--add" which marks the device as a spare and then starts a
> complete rebuild.
>
>
>> I checked the SMART and it shows a lot of reallocated_sector_ct errors
>> also. So, the disk is dying, but I am not able understand why mdadm
>> would not add.
> It will "add". It just won't "re-add".
>
> NeilBrown
>
>
Thanks. I did not know that. I thought it would add without a rebuild. This
means that if a cable accidentally came off, or if I booted without one disk
by mistake, my arrays are dead. That looks too restrictive, so I must be
wrong in my conclusion. Please help me see this. Is there an add with
assume-clean?
Anyway, there is no point in rebuilding (or adding) it after it failed
this miserably (it has a 17K reallocated event count, whatever that means).
I will let the array stay degraded until I find a replacement.
I thought a write-intent bitmap was not a good idea. Maybe I did not
research enough. This brings me to the next (probably more important)
question. How do I replace an old drive that has not died without having
to rebuild? If I did a dd image transfer, would it accept
the replacement?
Ramesh
* Re: Unable to re-add a disk after a reboot.
2014-08-15 1:33 ` Ram Ramesh
@ 2014-08-15 4:27 ` Mikael Abrahamsson
2014-08-15 4:45 ` Ram Ramesh
0 siblings, 1 reply; 6+ messages in thread
From: Mikael Abrahamsson @ 2014-08-15 4:27 UTC (permalink / raw)
To: Ram Ramesh; +Cc: NeilBrown, Linux Raid
On Thu, 14 Aug 2014, Ram Ramesh wrote:
> I thought write-intent bitmap was not a good idea. May be I did not
> research enough. This brings me to the next (probably more important)
> question. How do I replace a old drive that has not died without having
> to rebuild? If I did a dd image xfer will it accept the replacement?
If you have a fairly recent kernel and mdadm, there is mdadm --replace.
https://unix.stackexchange.com/questions/74924/how-to-safely-replace-a-not-yet-failed-disk-in-a-linux-raid5-array
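A minimal sketch of that approach, assuming an mdadm and kernel new enough to support --replace: the new disk is first added as a spare, and the old member is then marked for replacement so md copies it onto the spare while the array stays fully redundant. The function name replace_member, the MDADM override variable, and the device names in the usage note are placeholders, not values from this thread.

```shell
# Command used to talk to md; set MDADM=echo to preview without running.
MDADM="${MDADM:-mdadm}"

# Replace a still-working member with a new disk without degrading the array.
#   $1 = array, $2 = old member to retire, $3 = new device
replace_member() {
    array="$1"; old="$2"; new="$3"
    # The new device must be a spare in the array before it can be used
    # as a replacement target.
    $MDADM "$array" --add "$new" &&
    # Mark the old member for replacement and name the spare to use.
    $MDADM "$array" --replace "$old" --with "$new"
}
```

For example, as root: replace_member /dev/md0 /dev/sdb1 /dev/sdf1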
--
Mikael Abrahamsson email: swmike@swm.pp.se
* Re: Unable to re-add a disk after a reboot.
2014-08-15 4:27 ` Mikael Abrahamsson
@ 2014-08-15 4:45 ` Ram Ramesh
2014-08-15 6:21 ` Mikael Abrahamsson
0 siblings, 1 reply; 6+ messages in thread
From: Ram Ramesh @ 2014-08-15 4:45 UTC (permalink / raw)
To: Mikael Abrahamsson; +Cc: NeilBrown, Linux Raid
On 08/14/2014 11:27 PM, Mikael Abrahamsson wrote:
> On Thu, 14 Aug 2014, Ram Ramesh wrote:
>
>> I thought write-intent bitmap was not a good idea. May be I did not
>> research enough. This brings me to the next (probably more important)
>> question. How do I replace a old drive that has not died without
>> having to rebuild? If I did a dd image xfer will it accept the
>> replacement?
>
> If you have a fairly recent kernel and mdadm, there is mdadm --replace.
>
> https://unix.stackexchange.com/questions/74924/how-to-safely-replace-a-not-yet-failed-disk-in-a-linux-raid5-array
>
>
Thanks. If I may, I would like to ask one related question. I have a disk that
has already been kicked out. Will adding a bitmap to the degraded array help in
re-adding (--re-add) the device? I doubt it, but I would rather ask before
trying, as I am paranoid after the disk failure.
Ramesh
* Re: Unable to re-add a disk after a reboot.
2014-08-15 4:45 ` Ram Ramesh
@ 2014-08-15 6:21 ` Mikael Abrahamsson
0 siblings, 0 replies; 6+ messages in thread
From: Mikael Abrahamsson @ 2014-08-15 6:21 UTC (permalink / raw)
To: Ram Ramesh; +Cc: Linux Raid
On Thu, 14 Aug 2014, Ram Ramesh wrote:
> Thanks. If I may, I like to ask one related question. I have a disk that
> is already kicked out. Will adding a bitmap to degraded array help in
> -re_add the device? I doubt it, but I rather ask before trying as I am
> paranoid after the disk failure.
No, it won't.
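The reason is that a write-intent bitmap records which regions were written while a member was absent; one created after the disk was kicked out has no record of the earlier writes. A bitmap can still be useful for future interruptions. As a hedged sketch, an internal bitmap is added or removed through grow mode as shown below. The function names and the MDADM override variable are illustrative, and whether this is accepted while the array is degraded or recovering should be checked against the mdadm version in use.

```shell
# Command used to talk to md; set MDADM=echo to preview without running.
MDADM="${MDADM:-mdadm}"

# Add an internal write-intent bitmap to an existing array ($1).
enable_bitmap() {
    $MDADM --grow --bitmap=internal "$1"
}

# Remove the bitmap again from the array ($1).
disable_bitmap() {
    $MDADM --grow --bitmap=none "$1"
}
```

As root, "enable_bitmap /dev/md0" would apply this to the array discussed here.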
--
Mikael Abrahamsson email: swmike@swm.pp.se