* ddf failed disk disappears after adding spare
From: Albert Pauw @ 2012-08-01 8:29 UTC
To: linux-raid, neilb

Hi Neil,

here is a procedure which shows you another problem. It has to do with the
table produced at the end of the mdadm -E output, showing the disks and
their status. It seems that when a disk has failed and another is added, the
failed one disappears.

Hope you can find the problem and fix it.

Regards,

Albert

Here is the exact procedure which shows the problem:

Create a container with 5 disks:

mdadm -CR /dev/md127 -e ddf -l container -n 5 /dev/loop[1-5]

Physical Disks : 5
  Number  RefNo     Size     Device      Type/State
  0       d1c8c16e  479232K  /dev/loop1  Global-Spare/Online
  1       6de79cb6  479232K  /dev/loop2  Global-Spare/Online
  2       b5fd1d6c  479232K  /dev/loop3  Global-Spare/Online
  3       0be2d310  479232K  /dev/loop4  Global-Spare/Online
  4       5d8ac3d0  479232K  /dev/loop5  Global-Spare/Online

Create a RAID 5 set of 3 disks in container:

mdadm -CR /dev/md0 -l 5 -n 3 /dev/md127

Physical Disks : 5
  Number  RefNo     Size     Device      Type/State
  0       d1c8c16e  479232K  /dev/loop1  active/Online
  1       6de79cb6  479232K  /dev/loop2  active/Online
  2       b5fd1d6c  479232K  /dev/loop3  active/Online
  3       0be2d310  479232K  /dev/loop4  Global-Spare/Online
  4       5d8ac3d0  479232K  /dev/loop5  Global-Spare/Online

Create a RAID 1 set of 2 disks in container:

mdadm -CR /dev/md1 -l 1 -n 2 /dev/md127

Physical Disks : 5
  Number  RefNo     Size     Device      Type/State
  0       d1c8c16e  479232K  /dev/loop1  active/Online
  1       6de79cb6  479232K  /dev/loop2  active/Online
  2       b5fd1d6c  479232K  /dev/loop3  active/Online
  3       0be2d310  479232K  /dev/loop4  active/Online
  4       5d8ac3d0  479232K  /dev/loop5  active/Online

Fail first disk in RAID 5 set:

mdadm -f /dev/md0 /dev/loop1

Physical Disks : 5
  Number  RefNo     Size     Device      Type/State
  0       d1c8c16e  479232K  /dev/loop1  active/Offline, Failed
  1       6de79cb6  479232K  /dev/loop2  active/Online
  2       b5fd1d6c  479232K  /dev/loop3  active/Online
  3       0be2d310  479232K  /dev/loop4  active/Online
  4       5d8ac3d0  479232K  /dev/loop5  active/Online

Remove failed disk:

mdadm -r /dev/md0 /dev/loop1

Physical Disks : 5
  Number  RefNo     Size     Device      Type/State
  0       d1c8c16e  479232K              active/Offline, Failed, Missing
  1       6de79cb6  479232K  /dev/loop2  active/Online
  2       b5fd1d6c  479232K  /dev/loop3  active/Online
  3       0be2d310  479232K  /dev/loop4  active/Online
  4       5d8ac3d0  479232K  /dev/loop5  active/Online

Add failed disk back:

mdadm -a --force /dev/md0 /dev/loop1

Physical Disks : 5
  Number  RefNo     Size     Device      Type/State
  0       d1c8c16e  479232K  /dev/loop1  active/Offline, Failed, Missing
  1       6de79cb6  479232K  /dev/loop2  active/Online
  2       b5fd1d6c  479232K  /dev/loop3  active/Online
  3       0be2d310  479232K  /dev/loop4  active/Online
  4       5d8ac3d0  479232K  /dev/loop5  active/Online

Add spare disk to container:

mdadm -a --force /dev/md0 /dev/loop6

Physical Disks : 5
  Number  RefNo     Size     Device      Type/State
  0       6de79cb6  479232K  /dev/loop2  active/Online
  1       b5fd1d6c  479232K  /dev/loop3  active/Online
  2       0be2d310  479232K  /dev/loop4  active/Online
  3       5d8ac3d0  479232K  /dev/loop5  active/Online
  4       1dcfe3cf  479232K  /dev/loop6  active/Online, Rebuilding

This is wrong! Physical disks should be 6 now!

Remove the failed disk (which is missing from the list now!) again, zero its
superblock and add it again:

mdadm -r /dev/md0 /dev/loop1
mdadm --zero-superblock /dev/loop1
mdadm -a --force /dev/md0 /dev/loop1

Physical Disks : 6
  Number  RefNo     Size     Device      Type/State
  0       6de79cb6  479232K  /dev/loop2  active/Online
  1       b5fd1d6c  479232K  /dev/loop3  active/Online
  2       0be2d310  479232K  /dev/loop4  active/Online
  3       5d8ac3d0  479232K  /dev/loop5  active/Online
  4       1dcfe3cf  479232K  /dev/loop6  active/Online
  5       8147a3ef  479232K  /dev/loop1  Global-Spare/Online

And there they are, all 6 of them.
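
To compare the table between the steps above, it may help to boil each mdadm -E listing down to one line per slot plus a count. The following is only a sketch: phys_summary is a made-up helper name, and it assumes the table layout is exactly as in the listings in this message (slot number, 8-digit hex RefNo, size, optional device, state).

```shell
# Sketch: summarise the "Physical Disks" table from `mdadm -E` output on stdin.
# phys_summary is a hypothetical helper, not part of mdadm.
phys_summary() {
    awk '
        # Header line, e.g. "Physical Disks : 5": remember the declared count.
        /Physical Disks :/ { declared = $NF; next }
        # Data rows: a slot number followed by an 8-character hex RefNo.
        $1 ~ /^[0-9]+$/ && length($2) == 8 && $2 ~ /^[0-9a-f]+$/ {
            rows++
            # The Device column is empty for a disk that has been removed.
            dev = ($4 ~ /^\/dev\//) ? $4 : "(none)"
            printf "%s %s %s\n", $1, $2, dev
        }
        END { printf "declared=%s listed=%d\n", declared, rows }
    '
}
```

Piping the output of mdadm -E into phys_summary after each step and diffing the results should make a vanished RefNo easy to spot. Which device to examine is not stated in the message, so that part is left to the reader.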
* Re: ddf failed disk disappears after adding spare
From: Albert Pauw @ 2012-08-01 16:46 UTC
To: linux-raid, neilb

Hi Neil,

looking at it again I think the following happened:

When the disk was removed, the entry got the status "missing", which is
correct. When I re-added the same disk (actually I used add; re-add doesn't
work with containers) the "missing" status wasn't cleared, as can be seen.
The disk is recognised as belonging to its original slot, but the missing
status stays set. The other statuses (failed, offline) can stay as they are.

When I now add another disk (a spare), the slot of the missing disk is
re-used, as it is marked "missing". Only by removing that disk, zeroing the
superblock and adding it again, i.e. effectively adding a new disk, is the
total number of slots increased to 6.

Just my two cents,

Albert

On 1 August 2012 10:29, Albert Pauw <albert.pauw@gmail.com> wrote:
> Hi Neil,
>
> here is a procedure which shows you another problem. It has to do with the
> table produced at the end of the mdadm -E command, showing the disks and
> their status. Seems when a disk has failed and another added, the failed one
> disappears.
> [...]
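
The inconsistent state described in this message (a slot that lists a device node but still carries the "Missing" flag) can be picked out of the table mechanically. A minimal sketch, assuming the same row layout as in the listings of this thread; stale_missing is a hypothetical name:

```shell
# Sketch: print the device of every row that is flagged Missing
# although a /dev/... node is shown in the Device column.
stale_missing() {
    awk '$1 ~ /^[0-9]+$/ && $4 ~ /^\/dev\// && /Missing/ { print $4 }'
}
```

Fed the table from the "Add failed disk back" step it should print /dev/loop1, and nothing for the table from the "Remove failed disk" step, where the Device column is empty.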
* Re: ddf failed disk disappears after adding spare
From: Albert Pauw @ 2012-08-01 17:27 UTC
To: linux-raid, neilb

Sorry again, just noticed that the removal of the missing drive's slot is
triggered by a rebuild. In fact, even a failed (but not missing) drive is
removed as well.

I noticed this by the following:

- started with 6 disks
- created md0 with 5 disks
- failed one disk in md0
- the mdadm -E table is shown very briefly with 6 disks, one failed, but
  when the rebuild kicks in, the failed disk entry is removed and 5 entries
  remain.

Albert

On 1 August 2012 18:46, Albert Pauw <albert.pauw@gmail.com> wrote:
> Hi Neil,
>
> looking at it again I think the following happened:
> [...]
* Re: ddf failed disk disappears after adding spare
From: NeilBrown @ 2012-08-14 23:39 UTC
To: Albert Pauw; +Cc: linux-raid

On Wed, 01 Aug 2012 10:29:51 +0200 Albert Pauw <albert.pauw@gmail.com> wrote:

> Hi Neil,
>
> here is a procedure which shows you another problem. It has to do with
> the table produced at the end of the mdadm -E command, showing the disks
> and their status. Seems when a disk has failed and another added, the
> failed one disappears.
>
> Hope you can find the problem and fix it.
>
> Regards,
>
> Albert
>
> Here is the exact procedure which shows the problem:
>
> Create a container with 5 disks:
>
> mdadm -CR /dev/md127 -e ddf -l container -n 5 /dev/loop[1-5]
>
> Physical Disks : 5
>   Number  RefNo     Size     Device      Type/State
>   0       d1c8c16e  479232K  /dev/loop1  Global-Spare/Online
>   1       6de79cb6  479232K  /dev/loop2  Global-Spare/Online
>   2       b5fd1d6c  479232K  /dev/loop3  Global-Spare/Online
>   3       0be2d310  479232K  /dev/loop4  Global-Spare/Online
>   4       5d8ac3d0  479232K  /dev/loop5  Global-Spare/Online
>
> Create a RAID 5 set of 3 disks in container:
>
> mdadm -CR /dev/md0 -l 5 -n 3 /dev/md127
>
> Physical Disks : 5
>   Number  RefNo     Size     Device      Type/State
>   0       d1c8c16e  479232K  /dev/loop1  active/Online
>   1       6de79cb6  479232K  /dev/loop2  active/Online
>   2       b5fd1d6c  479232K  /dev/loop3  active/Online
>   3       0be2d310  479232K  /dev/loop4  Global-Spare/Online
>   4       5d8ac3d0  479232K  /dev/loop5  Global-Spare/Online
>
> Create a RAID 1 set of 2 disks in container:
>
> mdadm -CR /dev/md1 -l 1 -n 2 /dev/md127
>
> Physical Disks : 5
>   Number  RefNo     Size     Device      Type/State
>   0       d1c8c16e  479232K  /dev/loop1  active/Online
>   1       6de79cb6  479232K  /dev/loop2  active/Online
>   2       b5fd1d6c  479232K  /dev/loop3  active/Online
>   3       0be2d310  479232K  /dev/loop4  active/Online
>   4       5d8ac3d0  479232K  /dev/loop5  active/Online
>
> Fail first disk in RAID 5 set:
>
> mdadm -f /dev/md0 /dev/loop1
>
> Physical Disks : 5
>   Number  RefNo     Size     Device      Type/State
>   0       d1c8c16e  479232K  /dev/loop1  active/Offline, Failed
>   1       6de79cb6  479232K  /dev/loop2  active/Online
>   2       b5fd1d6c  479232K  /dev/loop3  active/Online
>   3       0be2d310  479232K  /dev/loop4  active/Online
>   4       5d8ac3d0  479232K  /dev/loop5  active/Online
>
> Remove failed disk:
>
> mdadm -r /dev/md0 /dev/loop1
>
> Physical Disks : 5
>   Number  RefNo     Size     Device      Type/State
>   0       d1c8c16e  479232K              active/Offline, Failed, Missing
>   1       6de79cb6  479232K  /dev/loop2  active/Online
>   2       b5fd1d6c  479232K  /dev/loop3  active/Online
>   3       0be2d310  479232K  /dev/loop4  active/Online
>   4       5d8ac3d0  479232K  /dev/loop5  active/Online
>
> Add failed disk back:
>
> mdadm -a --force /dev/md0 /dev/loop1
>
> Physical Disks : 5
>   Number  RefNo     Size     Device      Type/State
>   0       d1c8c16e  479232K  /dev/loop1  active/Offline, Failed, Missing
>   1       6de79cb6  479232K  /dev/loop2  active/Online
>   2       b5fd1d6c  479232K  /dev/loop3  active/Online
>   3       0be2d310  479232K  /dev/loop4  active/Online
>   4       5d8ac3d0  479232K  /dev/loop5  active/Online
>
> Add spare disk to container:
>
> mdadm -a --force /dev/md0 /dev/loop6
>
> Physical Disks : 5
>   Number  RefNo     Size     Device      Type/State
>   0       6de79cb6  479232K  /dev/loop2  active/Online
>   1       b5fd1d6c  479232K  /dev/loop3  active/Online
>   2       0be2d310  479232K  /dev/loop4  active/Online
>   3       5d8ac3d0  479232K  /dev/loop5  active/Online
>   4       1dcfe3cf  479232K  /dev/loop6  active/Online, Rebuilding
>
> This is wrong! Physical disks should be 6 now!

Whenever we add a device to the ddf we currently remove any record of any
failed and missing device. We have to forget about devices that have
disappeared at some stage, and this seems like a good place.

The problem here is that a device that is in the array is marked as
'missing'. This is due to the bug I mentioned in the previous email.
It can currently be worked around by zeroing the device's superblock
(--zero-superblock) before adding it.

> Removed failed disk (which is missing from list now!) again, zero
> superblock and add again:
>
> mdadm -r /dev/md0 /dev/loop1
> mdadm --zero-superblock /dev/loop1
> mdadm -a --force /dev/md0 /dev/loop1
>
> Physical Disks : 6
>   Number  RefNo     Size     Device      Type/State
>   0       6de79cb6  479232K  /dev/loop2  active/Online
>   1       b5fd1d6c  479232K  /dev/loop3  active/Online
>   2       0be2d310  479232K  /dev/loop4  active/Online
>   3       5d8ac3d0  479232K  /dev/loop5  active/Online
>   4       1dcfe3cf  479232K  /dev/loop6  active/Online
>   5       8147a3ef  479232K  /dev/loop1  Global-Spare/Online
>
> And there they are, all 6 of them.

NeilBrown
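
The rule Neil describes (records of failed and missing devices are dropped whenever a device is added) can be modelled on the table text itself. This is a toy model for reasoning about the listings, not mdadm's code. The helper name purge_and_add, its three arguments, the requirement that a row be both Failed and Missing, and the "Rebuilding" state given to the new row are all assumptions made for illustration.

```shell
# Sketch: model "forget failed+missing records on add" on a table read from stdin.
# Usage (hypothetical): purge_and_add REFNO SIZE DEVICE < table
purge_and_add() {
    awk -v ref="$1" -v size="$2" -v dev="$3" '
        # Data rows: slot number followed by an 8-character RefNo.
        $1 ~ /^[0-9]+$/ && length($2) == 8 {
            if (/Failed/ && /Missing/) next   # record is forgotten
            $1 = n++                          # survivors are renumbered from 0
            print
        }
        # The newly added device takes the next free slot.
        END { printf "%d %s %s %s active/Online, Rebuilding\n", n, ref, size, dev }
    '
}
```

Run over the table from the "Add failed disk back" step with the values 1dcfe3cf 479232K /dev/loop6, the model should leave loop2 to loop5 in slots 0 to 3 and put loop6 in slot 4, which is the ordering of the five-entry table reported in the thread. Albert's third message suggests that entries which are only failed may be dropped as well, in which case the condition would need loosening.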