* Error in rebuild of two "layered" md devices in container
From: Albert Pauw @ 2012-08-01 17:52 UTC (permalink / raw)
To: linux-raid, neilb
Hi Neil,

I found another bug.

- Created a container with six disks
- Created two md devices in it:

  mdadm -CR /dev/md0 -l 6 -n 6 -z 50M
  mdadm -CR /dev/md1 -l 5 -n 6 -z 50M

  The md devices are "layered" in the container across all disks.
  Both are built and come online.

- Failed one disk; both md devices are affected
- Removed the disk
- Cleared the superblock of the removed disk
- Added the disk again (in essence, I just added a spare disk)

Now comes the error:

- md0 is rebuilt
- md1 is NOT rebuilt

Personalities : [raid6] [raid5] [raid4] [raid1]
md1 : active raid6 sde[6] sdg[5] sdf[4] sdd[2] sdc[1] sdb[0]
      204800 blocks super external:/md127/1 level 6, 512k chunk, algorithm 10 [6/6] [UUUUUU]

md0 : active raid5 sdb[6] sdg[5] sdf[4] sdd[2] sdc[1]
      256000 blocks super external:/md127/0 level 5, 512k chunk, algorithm 2 [6/5] [UUU_UU]

md127 : inactive sde[3](S) sdb[0](S) sdg[5](S) sdf[4](S) sdd[2](S) sdc[1](S)
      196608 blocks super external:ddf

unused devices: <none>

 Physical Disks : 6
      Number    RefNo      Size       Device      Type/State
         0    cb5ea6c1  1015808K  /dev/sdc        active/Online
         1    a7f8ed2f  1015808K  /dev/sdd        active/Online
         2    f769a815  1015808K  /dev/sdf        active/Online
         3    025e6835  1015808K  /dev/sdg        active/Online
         4    b22e9e4d  1015808K  /dev/sdb        active/Online
         5    b4cccecc  1015808K  /dev/sde        active/Online

Albert
* Re: Error in rebuild of two "layered" md devices in container
From: Albert Pauw @ 2012-08-01 18:34 UTC (permalink / raw)
To: linux-raid, neilb
Extra note:
because the mdadm -f command has to be addressed to a subarray, only that
subarray is rebuilt, even when other subarrays depend on the same disk,
as they do in this case.
In my original report I failed the disk using mdadm -f /dev/md0 /dev/....

Albert
* Re: Error in rebuild of two "layered" md devices in container
From: NeilBrown @ 2012-08-14 23:43 UTC (permalink / raw)
To: Albert Pauw; +Cc: linux-raid
On Wed, 1 Aug 2012 19:52:51 +0200 Albert Pauw <albert.pauw@gmail.com> wrote:
> Now comes the error:
>
> - md0 is rebuilt
> - md1 is NOT rebuilt

The reason for this is somewhat messy.
mdadm currently only adds a 'spare' device to an array that needs a
replacement device.
In DDF a whole device is either 'active' or 'spare'; there is no concept
of 'partly active, partly spare'.
So when mdadm adds part of the disk to one array, the disk stops being a
spare and becomes active. When mdadm then looks for a spare to add to the
second array, there are no spare devices left.

I can hack around it by allowing any non-failed device to be considered a
spare, which is what the patch below does, but I need to find a better
solution. That might take a while. I've made a note on my to-do list, but
it is a rather long list.

Thanks,
NeilBrown
diff --git a/super-ddf.c b/super-ddf.c
index d006a04..11b98f7 100644
--- a/super-ddf.c
+++ b/super-ddf.c
@@ -2616,7 +2616,7 @@ static int validate_geometry_ddf(struct supertype *st,
 	if (chunk && *chunk == UnSet)
 		*chunk = DEFAULT_CHUNK;
 
-
+	if (level == -1000000) level = LEVEL_CONTAINER;
 	if (level == LEVEL_CONTAINER) {
 		/* Must be a fresh device to add to a container */
 		return validate_geometry_ddf_container(st, level, layout,
@@ -3701,6 +3701,10 @@ static struct mdinfo *ddf_activate_spare(struct active_array *a,
 			} else if (ddf->phys->entries[dl->pdnum].type &
 				   __cpu_to_be16(DDF_Global_Spare)) {
 				is_global = 1;
+			} else if (!(ddf->phys->entries[dl->pdnum].state &
+				     __cpu_to_be16(DDF_Failed))) {
+				/* we can possibly use some of this */
+				is_global = 1;
 			}
 			if ( ! (is_dedicated ||
 				(is_global && global_ok))) {
* Re: Error in rebuild of two "layered" md devices in container
From: Albert Pauw @ 2012-08-15 20:04 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid
Hi Neil,

As a first test, I can confirm that the patch fixes the problem with the
layered md devices in a container.
So far so good on this.

Thanks,
regards,

Albert