* ddf failed disk disappears after adding spare
From: Albert Pauw @ 2012-08-01 8:29 UTC
To: linux-raid, neilb
Hi Neil,
here is a procedure that shows another problem. It concerns the table
at the end of the mdadm -E output, which lists the disks and their
status. It seems that when a disk has failed and another one is added,
the failed one disappears from the table.
I hope you can find the problem and fix it.
Regards,
Albert
Here is the exact procedure which shows the problem:
Create a container with 5 disks:
mdadm -CR /dev/md127 -e ddf -l container -n 5 /dev/loop[1-5]
Physical Disks : 5
Number RefNo Size Device Type/State
0 d1c8c16e 479232K /dev/loop1 Global-Spare/Online
1 6de79cb6 479232K /dev/loop2 Global-Spare/Online
2 b5fd1d6c 479232K /dev/loop3 Global-Spare/Online
3 0be2d310 479232K /dev/loop4 Global-Spare/Online
4 5d8ac3d0 479232K /dev/loop5 Global-Spare/Online
Create a RAID 5 set of 3 disks in container:
mdadm -CR /dev/md0 -l 5 -n 3 /dev/md127
Physical Disks : 5
Number RefNo Size Device Type/State
0 d1c8c16e 479232K /dev/loop1 active/Online
1 6de79cb6 479232K /dev/loop2 active/Online
2 b5fd1d6c 479232K /dev/loop3 active/Online
3 0be2d310 479232K /dev/loop4 Global-Spare/Online
4 5d8ac3d0 479232K /dev/loop5 Global-Spare/Online
Create a RAID 1 set of 2 disks in container:
mdadm -CR /dev/md1 -l 1 -n 2 /dev/md127
Physical Disks : 5
Number RefNo Size Device Type/State
0 d1c8c16e 479232K /dev/loop1 active/Online
1 6de79cb6 479232K /dev/loop2 active/Online
2 b5fd1d6c 479232K /dev/loop3 active/Online
3 0be2d310 479232K /dev/loop4 active/Online
4 5d8ac3d0 479232K /dev/loop5 active/Online
Fail first disk in RAID 5 set:
mdadm -f /dev/md0 /dev/loop1
Physical Disks : 5
Number RefNo Size Device Type/State
0 d1c8c16e 479232K /dev/loop1 active/Offline, Failed
1 6de79cb6 479232K /dev/loop2 active/Online
2 b5fd1d6c 479232K /dev/loop3 active/Online
3 0be2d310 479232K /dev/loop4 active/Online
4 5d8ac3d0 479232K /dev/loop5 active/Online
Remove failed disk:
mdadm -r /dev/md0 /dev/loop1
Physical Disks : 5
Number RefNo Size Device Type/State
0 d1c8c16e 479232K active/Offline, Failed, Missing
1 6de79cb6 479232K /dev/loop2 active/Online
2 b5fd1d6c 479232K /dev/loop3 active/Online
3 0be2d310 479232K /dev/loop4 active/Online
4 5d8ac3d0 479232K /dev/loop5 active/Online
Add failed disk back:
mdadm -a --force /dev/md0 /dev/loop1
Physical Disks : 5
Number RefNo Size Device Type/State
0 d1c8c16e 479232K /dev/loop1 active/Offline, Failed, Missing
1 6de79cb6 479232K /dev/loop2 active/Online
2 b5fd1d6c 479232K /dev/loop3 active/Online
3 0be2d310 479232K /dev/loop4 active/Online
4 5d8ac3d0 479232K /dev/loop5 active/Online
Add spare disk to container:
mdadm -a --force /dev/md0 /dev/loop6
Physical Disks : 5
Number RefNo Size Device Type/State
0 6de79cb6 479232K /dev/loop2 active/Online
1 b5fd1d6c 479232K /dev/loop3 active/Online
2 0be2d310 479232K /dev/loop4 active/Online
3 5d8ac3d0 479232K /dev/loop5 active/Online
4 1dcfe3cf 479232K /dev/loop6 active/Online, Rebuilding
This is wrong! Physical disks should be 6 now!
Remove the failed disk (which is missing from the list now!) again, zero
its superblock and add it again:
mdadm -r /dev/md0 /dev/loop1
mdadm --zero-superblock /dev/loop1
mdadm -a --force /dev/md0 /dev/loop1
Physical Disks : 6
Number RefNo Size Device Type/State
0 6de79cb6 479232K /dev/loop2 active/Online
1 b5fd1d6c 479232K /dev/loop3 active/Online
2 0be2d310 479232K /dev/loop4 active/Online
3 5d8ac3d0 479232K /dev/loop5 active/Online
4 1dcfe3cf 479232K /dev/loop6 active/Online
5 8147a3ef 479232K /dev/loop1 Global-Spare/Online
And there they are, all 6 of them.
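To pin down which entry drops out between two steps, the tables can be compared by RefNo. The following is a minimal sketch, not part of mdadm: the helper names ddf_refnos and ddf_vanished are made up for illustration, and the parsing assumes that table rows start with a decimal slot number followed by an 8-digit hex RefNo, as in the listings above.

```shell
#!/bin/sh
# Hypothetical helpers (not part of mdadm) for comparing two saved
# copies of `mdadm -E` output.

# Print the RefNo column of the "Physical Disks" table read from stdin,
# one RefNo per line, sorted.
# Assumption: a table row starts with a decimal slot number followed by
# an 8-digit hex RefNo; wrapped continuation lines are ignored.
ddf_refnos() {
    awk '$1 ~ /^[0-9]+$/ && length($2) == 8 && $2 ~ /^[0-9a-fA-F]+$/ { print $2 }' |
        LC_ALL=C sort
}

# Print the RefNos that appear in snapshot file $1 but not in file $2.
ddf_vanished() {
    _before=$(mktemp)
    _after=$(mktemp)
    ddf_refnos < "$1" > "$_before"
    ddf_refnos < "$2" > "$_after"
    LC_ALL=C comm -23 "$_before" "$_after"
    rm -f "$_before" "$_after"
}
```

Saving the mdadm -E output before and after the "Add spare disk" step into two files (any file names will do) and passing them to ddf_vanished should list the RefNo of the entry that was dropped.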
* Re: ddf failed disk disappears after adding spare
From: Albert Pauw @ 2012-08-01 16:46 UTC
To: linux-raid, neilb
Hi Neil,
looking at it again, I think the following happened:
When the disk was removed, its entry got the status "missing", which is
correct.
When I added the same disk back (I actually used add; re-add doesn't
work with containers), the "missing" status wasn't cleared, as can be
seen in the table.
The disk is recognised as belonging to its original slot, but the
"missing" status should have been cleared; the other statuses (failed,
offline) can stay as they are.
When I then add another disk (a spare), the slot of the missing disk is
re-used because it is marked "missing". Only by removing that disk,
zeroing the superblock and adding it again, i.e. effectively adding a
new disk, is the total number of slots increased to 6.
just my two cents,
Albert
* Re: ddf failed disk disappears after adding spare
From: Albert Pauw @ 2012-08-01 17:27 UTC
To: linux-raid, neilb
Sorry again,
I just noticed that the removal of the missing drive's slot is
triggered by a rebuild. In fact, even a failed (but not missing) drive
is removed as well.
I noticed this as follows:
- started with 6 disks
- created md0 with 5 disks
- failed one disk in md0
- the mdadm -E table is shown very briefly with 6 disks, one of them
failed, but when the rebuild kicks in, the failed disk's entry is
removed and 5 entries remain.
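Because the 6-entry table is only visible for a moment, polling makes the transition easier to catch. This is a rough sketch under assumptions: watch_rows is a made-up helper name, it counts rows using the same "slot number plus 8-digit hex RefNo" pattern as the listings in this thread, and the command to poll is passed in by the caller.

```shell
#!/bin/sh
# Hypothetical helper (not part of mdadm).
# Run a command repeatedly and report the number of physical-disk rows
# in its output whenever that number changes.
# Usage: watch_rows COUNT DELAY COMMAND [ARGS...]
watch_rows() {
    _n=$1
    _delay=$2
    shift 2
    _last=
    _i=0
    while [ "$_i" -lt "$_n" ]; do
        # Count rows that look like "<slot> <8-hex RefNo> ...".
        _rows=$("$@" | awk '
            $1 ~ /^[0-9]+$/ && length($2) == 8 && $2 ~ /^[0-9a-fA-F]+$/ { c++ }
            END { print c + 0 }')
        if [ "$_rows" != "$_last" ]; then
            printf 'rows: %s\n' "$_rows"
            _last=$_rows
        fi
        _i=$((_i + 1))
        sleep "$_delay"
    done
}
```

Something like "watch_rows 30 1 mdadm -E /dev/loop2", started just before failing the disk, would poll once a second for 30 seconds; /dev/loop2 is only an assumed member device here, and any device carrying the DDF metadata should do.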
Albert
* Re: ddf failed disk disappears after adding spare
From: NeilBrown @ 2012-08-14 23:39 UTC
To: Albert Pauw; +Cc: linux-raid
On Wed, 01 Aug 2012 10:29:51 +0200 Albert Pauw <albert.pauw@gmail.com> wrote:
> Hi Neil,
>
> here is a procedure which shows you another problem. It has to do with
> the table produced at the end of the mdadm -E command, showing the disks
> and their status. Seems when a disk has failed and another added, the
> failed one disappears.
>
> Hope you can find the problem and fix it.
>
> Regards,
>
> Albert
>
> Here is the exact procedure which shows the problem:
>
> Create a container with 5 disks:
>
> mdadm -CR /dev/md127 -e ddf -l container -n 5 /dev/loop[1-5]
>
> Physical Disks : 5
> Number RefNo Size Device Type/State
> 0 d1c8c16e 479232K /dev/loop1 Global-Spare/Online
> 1 6de79cb6 479232K /dev/loop2 Global-Spare/Online
> 2 b5fd1d6c 479232K /dev/loop3 Global-Spare/Online
> 3 0be2d310 479232K /dev/loop4 Global-Spare/Online
> 4 5d8ac3d0 479232K /dev/loop5 Global-Spare/Online
>
>
> Create a RAID 5 set of 3 disks in container:
>
> mdadm -CR /dev/md0 -l 5 -n 3 /dev/md127
>
> Physical Disks : 5
> Number RefNo Size Device Type/State
> 0 d1c8c16e 479232K /dev/loop1 active/Online
> 1 6de79cb6 479232K /dev/loop2 active/Online
> 2 b5fd1d6c 479232K /dev/loop3 active/Online
> 3 0be2d310 479232K /dev/loop4 Global-Spare/Online
> 4 5d8ac3d0 479232K /dev/loop5 Global-Spare/Online
>
>
> Create a RAID 1 set of 2 disks in container:
>
> mdadm -CR /dev/md1 -l 1 -n 2 /dev/md127
>
> Physical Disks : 5
> Number RefNo Size Device Type/State
> 0 d1c8c16e 479232K /dev/loop1 active/Online
> 1 6de79cb6 479232K /dev/loop2 active/Online
> 2 b5fd1d6c 479232K /dev/loop3 active/Online
> 3 0be2d310 479232K /dev/loop4 active/Online
> 4 5d8ac3d0 479232K /dev/loop5 active/Online
>
>
> Fail first disk in RAID 5 set:
>
> mdadm -f /dev/md0 /dev/loop1
>
> Physical Disks : 5
> Number RefNo Size Device Type/State
> 0 d1c8c16e 479232K /dev/loop1 active/Offline, Failed
> 1 6de79cb6 479232K /dev/loop2 active/Online
> 2 b5fd1d6c 479232K /dev/loop3 active/Online
> 3 0be2d310 479232K /dev/loop4 active/Online
> 4 5d8ac3d0 479232K /dev/loop5 active/Online
>
>
> Remove failed disk:
>
> mdadm -r /dev/md0 /dev/loop1
>
> Physical Disks : 5
> Number RefNo Size Device Type/State
> 0 d1c8c16e 479232K active/Offline, Failed, Missing
> 1 6de79cb6 479232K /dev/loop2 active/Online
> 2 b5fd1d6c 479232K /dev/loop3 active/Online
> 3 0be2d310 479232K /dev/loop4 active/Online
> 4 5d8ac3d0 479232K /dev/loop5 active/Online
>
>
> Add failed disk back:
>
> mdadm -a --force /dev/md0 /dev/loop1
>
> Physical Disks : 5
> Number RefNo Size Device Type/State
> 0 d1c8c16e 479232K /dev/loop1 active/Offline, Failed, Missing
> 1 6de79cb6 479232K /dev/loop2 active/Online
> 2 b5fd1d6c 479232K /dev/loop3 active/Online
> 3 0be2d310 479232K /dev/loop4 active/Online
> 4 5d8ac3d0 479232K /dev/loop5 active/Online
>
>
> Add spare disk to container:
>
> mdadm -a --force /dev/md0 /dev/loop6
>
> Physical Disks : 5
> Number RefNo Size Device Type/State
> 0 6de79cb6 479232K /dev/loop2 active/Online
> 1 b5fd1d6c 479232K /dev/loop3 active/Online
> 2 0be2d310 479232K /dev/loop4 active/Online
> 3 5d8ac3d0 479232K /dev/loop5 active/Online
> 4 1dcfe3cf 479232K /dev/loop6 active/Online, Rebuilding
>
> This is wrong! Physical disks should be 6 now!
Whenever we add a device to the ddf we currently remove any record of any
failed and missing device. We have to forget about devices that have
disappeared at some stage, and this seems like a good place.
The problem here is that a device that is in the array is marked as
'missing'. This is due to the bug I mentioned in the previous email. It is
currently worked around by zeroing the device's superblock
(--zero-superblock) before adding it.
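The pruning described above can be pictured with a toy model. This is only an illustration, not the mdadm code: the table is a plain list of "RefNo state" lines, the function names are invented, and "failed and missing" is read here as "flagged both Failed and Missing", which is one possible interpretation.

```shell
#!/bin/sh
# Toy model only; none of this is taken from the mdadm sources.
# A table is a newline-separated list of "REFNO STATE" records.

# Print the table without records flagged both Failed and Missing.
# (Assumption: "failed and missing" means both flags are set.)
prune_stale() {
    printf '%s\n' "$1" | awk 'NF && !(/Failed/ && /Missing/)'
}

# Add a device: stale records are forgotten first, then the new record
# (RefNo $2, state $3) is appended.
add_device() {
    prune_stale "$1"
    printf '%s %s\n' "$2" "$3"
}
```

Fed the five records from the "Add failed disk back" table plus a new record for 1dcfe3cf, add_device prints five records again: d1c8c16e is dropped because it still carries Missing, and the new disk takes the place of the old entry in the count.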
>
> Removed failed disk (which is missing from list now!) again, zero
> superblock and add again:
>
> mdadm -r /dev/md0 /dev/loop1
> mdadm --zero-superblock /dev/loop1
> mdadm -a --force /dev/md0 /dev/loop1
>
>
> Physical Disks : 6
> Number RefNo Size Device Type/State
> 0 6de79cb6 479232K /dev/loop2 active/Online
> 1 b5fd1d6c 479232K /dev/loop3 active/Online
> 2 0be2d310 479232K /dev/loop4 active/Online
> 3 5d8ac3d0 479232K /dev/loop5 active/Online
> 4 1dcfe3cf 479232K /dev/loop6 active/Online
> 5 8147a3ef 479232K /dev/loop1 Global-Spare/Online
>
> And there they are, all 6 of them.
>
NeilBrown