[PATCH] imsm: fix: correct checking newly missing disks

linux-raid.vger.kernel.org archive mirror

* [PATCH] imsm: fix: correct checking newly missing disks
@ 2011-11-14 14:52 Lukasz Dorau
  2011-11-15  4:43 ` NeilBrown
  2011-11-30  2:26 ` Dan Williams
  0 siblings, 2 replies; 8+ messages in thread
From: Lukasz Dorau @ 2011-11-14 14:52 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, dan.j.williams, marcin.labun, ed.ciechanowski

The problem occurs when a RAID10 array that is being rebuilt
(after one disk has failed) is assembled incrementally.
mdadm tries to start the array just after the third disk is added,
and the volume is assembled incorrectly (in a degraded state).

The cause is that container_enough depends on the
newly missing disks, which are currently checked incorrectly.
They should always be checked using the first map.

Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
---
 super-intel.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/super-intel.c b/super-intel.c
index 4ebee78..511a32a 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -2529,13 +2529,13 @@ static void getinfo_super_imsm(struct supertype *st, struct mdinfo *info, char *
 
 		failed = imsm_count_failed(super, dev);
 		state = imsm_check_degraded(super, dev, failed);
-		map = get_imsm_map(dev, dev->vol.migr_state);
+		map = get_imsm_map(dev, 0);
 
 		/* any newly missing disks?
 		 * (catches single-degraded vs double-degraded)
 		 */
 		for (j = 0; j < map->num_members; j++) {
-			__u32 ord = get_imsm_ord_tbl_ent(dev, i, -1);
+			__u32 ord = get_imsm_ord_tbl_ent(dev, i, 0);
 			__u32 idx = ord_to_idx(ord);
 
 			if (!(ord & IMSM_ORD_REBUILD) &&

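To make the two one-line changes concrete, here is a minimal sketch of the map selection. The selector semantics below are an assumption inferred from the arguments the hunk passes (`dev->vol.migr_state`, `-1` and `0`); they have not been checked against the real `get_imsm_map()` or `get_imsm_ord_tbl_ent()`, and `struct sketch_dev` / `struct sketch_map` are hypothetical stand-ins, not mdadm's structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: NOT mdadm's struct imsm_dev / struct imsm_map.
 * Only the choice between the two maps is modelled here. */
struct sketch_map {
	int id;                     /* 0 for the first map, 1 for the second */
};

struct sketch_dev {
	int migr_state;             /* non-zero while a migration is in progress */
	struct sketch_map map[2];   /* map[0] = first map, map[1] = second map */
};

/* Assumed selector semantics (inferred from the hunk, not verified):
 *    0 -> always the first map
 *    1 -> the second map while migrating, otherwise NULL
 *   -1 -> the second map while migrating, otherwise the first map */
const struct sketch_map *sketch_get_map(const struct sketch_dev *dev, int which)
{
	assert(which >= -1 && which <= 1);
	if (which == 0)
		return &dev->map[0];
	if (dev->migr_state)
		return &dev->map[1];
	return which == -1 ? &dev->map[0] : NULL;
}
```

Under these assumptions, while a migration is in progress the pre-patch calls (map chosen by `dev->vol.migr_state`, ordinal table read with `-1`) both resolve to the second map, whereas the patched calls (both with `0`) always read the first map.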

* Re: [PATCH] imsm: fix: correct checking newly missing disks
  2011-11-14 14:52 [PATCH] imsm: fix: correct checking newly missing disks Lukasz Dorau
@ 2011-11-15  4:43 ` NeilBrown
  2011-11-30  2:26 ` Dan Williams
  1 sibling, 0 replies; 8+ messages in thread
From: NeilBrown @ 2011-11-15  4:43 UTC (permalink / raw)
  To: Lukasz Dorau; +Cc: linux-raid, dan.j.williams, marcin.labun, ed.ciechanowski

On Mon, 14 Nov 2011 15:52:52 +0100 Lukasz Dorau <lukasz.dorau@intel.com>
wrote:

> The problem occurs when RAID10 array under rebuild
> (after one disk fails) is assembled incrementally.
> Mdadm tries to start array just after adding the third disk
> and the volume is assembled incorrectly (in degraded state).
> 
> The cause is that container_enough depends on
> newly missing disks which are checked incorrectly now.
> They should be checked using always the first map.
> 
> Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
> ---
>  super-intel.c |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/super-intel.c b/super-intel.c
> index 4ebee78..511a32a 100644
> --- a/super-intel.c
> +++ b/super-intel.c
> @@ -2529,13 +2529,13 @@ static void getinfo_super_imsm(struct supertype *st, struct mdinfo *info, char *
>  
>  		failed = imsm_count_failed(super, dev);
>  		state = imsm_check_degraded(super, dev, failed);
> -		map = get_imsm_map(dev, dev->vol.migr_state);
> +		map = get_imsm_map(dev, 0);
>  
>  		/* any newly missing disks?
>  		 * (catches single-degraded vs double-degraded)
>  		 */
>  		for (j = 0; j < map->num_members; j++) {
> -			__u32 ord = get_imsm_ord_tbl_ent(dev, i, -1);
> +			__u32 ord = get_imsm_ord_tbl_ent(dev, i, 0);
>  			__u32 idx = ord_to_idx(ord);
>  
>  			if (!(ord & IMSM_ORD_REBUILD) &&

Applied,
thanks,

NeilBrown


* Re: [PATCH] imsm: fix: correct checking newly missing disks
  2011-11-14 14:52 [PATCH] imsm: fix: correct checking newly missing disks Lukasz Dorau
  2011-11-15  4:43 ` NeilBrown
@ 2011-11-30  2:26 ` Dan Williams
  2011-12-01 14:23   ` Dorau, Lukasz
  2011-12-01 14:40   ` Dorau, Lukasz
  1 sibling, 2 replies; 8+ messages in thread
From: Dan Williams @ 2011-11-30  2:26 UTC (permalink / raw)
  To: Lukasz Dorau; +Cc: neilb, linux-raid, marcin.labun, ed.ciechanowski

On Mon, Nov 14, 2011 at 6:52 AM, Lukasz Dorau <lukasz.dorau@intel.com> wrote:
> The problem occurs when RAID10 array under rebuild
> (after one disk fails) is assembled incrementally.
> Mdadm tries to start array just after adding the third disk
> and the volume is assembled incorrectly (in degraded state).
>
> The cause is that container_enough depends on
> newly missing disks which are checked incorrectly now.
> They should be checked using always the first map.
>
> Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
> ---
>  super-intel.c |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/super-intel.c b/super-intel.c
> index 4ebee78..511a32a 100644
> --- a/super-intel.c
> +++ b/super-intel.c
> @@ -2529,13 +2529,13 @@ static void getinfo_super_imsm(struct supertype *st, struct mdinfo *info, char *
>
>                failed = imsm_count_failed(super, dev);
>                state = imsm_check_degraded(super, dev, failed);
> -               map = get_imsm_map(dev, dev->vol.migr_state);
> +               map = get_imsm_map(dev, 0);
>
>                /* any newly missing disks?
>                 * (catches single-degraded vs double-degraded)
>                 */
>                for (j = 0; j < map->num_members; j++) {
> -                       __u32 ord = get_imsm_ord_tbl_ent(dev, i, -1);
> +                       __u32 ord = get_imsm_ord_tbl_ent(dev, i, 0);

This looks wrong.  I noticed this when looking over Przemyslaw's patch [1].

map[0] always contains the destination state of the migration so the
most reliable source for looking for out of sync disks is map[1].

--
Dan

[1]: http://marc.info/?l=linux-raid&m=132206766827484&w=2
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

* RE: [PATCH] imsm: fix: correct checking newly missing disks
  2011-11-30  2:26 ` Dan Williams
@ 2011-12-01 14:23   ` Dorau, Lukasz
  2011-12-06  1:10     ` NeilBrown
  2011-12-01 14:40   ` Dorau, Lukasz
  1 sibling, 1 reply; 8+ messages in thread
From: Dorau, Lukasz @ 2011-12-01 14:23 UTC (permalink / raw)
  To: Williams, Dan J
  Cc: neilb@suse.de, linux-raid@vger.kernel.org, Labun, Marcin,
	Ciechanowski, Ed, Kwolek, Adam



Regards,
Łukasz


> -----Original Message-----
> From: dan.j.williams@gmail.com [mailto:dan.j.williams@gmail.com] On Behalf
> Of Dan Williams
> Sent: Wednesday, November 30, 2011 3:27 AM
> To: Dorau, Lukasz
> Cc: neilb@suse.de; linux-raid@vger.kernel.org; Labun, Marcin; Ciechanowski, Ed
> Subject: Re: [PATCH] imsm: fix: correct checking newly missing disks
> 
> On Mon, Nov 14, 2011 at 6:52 AM, Lukasz Dorau <lukasz.dorau@intel.com>
> wrote:
> > The problem occurs when RAID10 array under rebuild
> > (after one disk fails) is assembled incrementally.
> > Mdadm tries to start array just after adding the third disk
> > and the volume is assembled incorrectly (in degraded state).
> >
> > The cause is that container_enough depends on
> > newly missing disks which are checked incorrectly now.
> > They should be checked using always the first map.
> >
> > Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
> > ---
> >  super-intel.c |    4 ++--
> >  1 files changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/super-intel.c b/super-intel.c
> > index 4ebee78..511a32a 100644
> > --- a/super-intel.c
> > +++ b/super-intel.c
> > @@ -2529,13 +2529,13 @@ static void getinfo_super_imsm(struct supertype
> *st, struct mdinfo *info, char *
> >
> >                failed = imsm_count_failed(super, dev);
> >                state = imsm_check_degraded(super, dev, failed);
> > -               map = get_imsm_map(dev, dev->vol.migr_state);
> > +               map = get_imsm_map(dev, 0);
> >
> >                /* any newly missing disks?
> >                 * (catches single-degraded vs double-degraded)
> >                 */
> >                for (j = 0; j < map->num_members; j++) {
> > -                       __u32 ord = get_imsm_ord_tbl_ent(dev, i, -1);
> > +                       __u32 ord = get_imsm_ord_tbl_ent(dev, i, 0);
> 
> This looks wrong.  I noticed this when looking over Przemyslaw's patch [1].
> 
> map[0] always contains the destination state of the migration so the
> most reliable source for looking for out of sync disks is map[1].
> 

I am convinced that the patch is good.
We are looking for information about the state of the array during the migration (before it was stopped), so we have to use map[0].
map[1] contains information about the state of the array before the migration, which we do not need.

Regards,
Lukasz

* Re: [PATCH] imsm: fix: correct checking newly missing disks
  2011-12-01 14:23   ` Dorau, Lukasz
@ 2011-12-06  1:10     ` NeilBrown
  2011-12-06  2:19       ` Williams, Dan J
  0 siblings, 1 reply; 8+ messages in thread
From: NeilBrown @ 2011-12-06  1:10 UTC (permalink / raw)
  To: Dorau, Lukasz
  Cc: Williams, Dan J, linux-raid@vger.kernel.org, Labun, Marcin,
	Ciechanowski, Ed, Kwolek, Adam

On Thu, 1 Dec 2011 14:23:16 +0000 "Dorau, Lukasz" <lukasz.dorau@intel.com>
wrote:
> > > diff --git a/super-intel.c b/super-intel.c
> > > index 4ebee78..511a32a 100644
> > > --- a/super-intel.c
> > > +++ b/super-intel.c
> > > @@ -2529,13 +2529,13 @@ static void getinfo_super_imsm(struct supertype
> > *st, struct mdinfo *info, char *
> > >
> > >                failed = imsm_count_failed(super, dev);
> > >                state = imsm_check_degraded(super, dev, failed);
> > > -               map = get_imsm_map(dev, dev->vol.migr_state);
> > > +               map = get_imsm_map(dev, 0);
> > >
> > >                /* any newly missing disks?
> > >                 * (catches single-degraded vs double-degraded)
> > >                 */
> > >                for (j = 0; j < map->num_members; j++) {
> > > -                       __u32 ord = get_imsm_ord_tbl_ent(dev, i, -1);
> > > +                       __u32 ord = get_imsm_ord_tbl_ent(dev, i, 0);
> > 
> > This looks wrong.  I noticed this when looking over Przemyslaw's patch [1].
> > 
> > map[0] always contains the destination state of the migration so the
> > most reliable source for looking for out of sync disks is map[1].
> > 
> 
> I am convinced that the patch is good. 
> We are looking for information what was the state of array during migration (before it was stopped), so we have to use map[0].
> map[1] contains information about the state of array before migration, which we do not need.
> 
> Regards,
> Lukasz

Hi,
 do we have agreement on this?  Dan - do you stand by your original concern
 or have you seen the light :-)

The patch is in, but I'd like to be sure it is right and to be honest I
haven't followed the dance of the maps too closely...

Thanks,
NeilBrown

* Re: [PATCH] imsm: fix: correct checking newly missing disks
  2011-12-06  1:10     ` NeilBrown
@ 2011-12-06  2:19       ` Williams, Dan J
  2011-12-06  2:22         ` NeilBrown
  0 siblings, 1 reply; 8+ messages in thread
From: Williams, Dan J @ 2011-12-06  2:19 UTC (permalink / raw)
  To: NeilBrown
  Cc: Dorau, Lukasz, linux-raid@vger.kernel.org, Labun, Marcin,
	Ciechanowski, Ed, Kwolek, Adam

On Mon, Dec 5, 2011 at 5:10 PM, NeilBrown <neilb@suse.de> wrote:
> On Thu, 1 Dec 2011 14:23:16 +0000 "Dorau, Lukasz" <lukasz.dorau@intel.com>
> wrote:
>> > > diff --git a/super-intel.c b/super-intel.c
>> > > index 4ebee78..511a32a 100644
>> > > --- a/super-intel.c
>> > > +++ b/super-intel.c
>> > > @@ -2529,13 +2529,13 @@ static void getinfo_super_imsm(struct supertype
>> > *st, struct mdinfo *info, char *
>> > >
>> > >                failed = imsm_count_failed(super, dev);
>> > >                state = imsm_check_degraded(super, dev, failed);
>> > > -               map = get_imsm_map(dev, dev->vol.migr_state);
>> > > +               map = get_imsm_map(dev, 0);
>> > >
>> > >                /* any newly missing disks?
>> > >                 * (catches single-degraded vs double-degraded)
>> > >                 */
>> > >                for (j = 0; j < map->num_members; j++) {
>> > > -                       __u32 ord = get_imsm_ord_tbl_ent(dev, i, -1);
>> > > +                       __u32 ord = get_imsm_ord_tbl_ent(dev, i, 0);
>> >
>> > This looks wrong.  I noticed this when looking over Przemyslaw's patch [1].
>> >
>> > map[0] always contains the destination state of the migration so the
>> > most reliable source for looking for out of sync disks is map[1].
>> >
>>
>> I am convinced that the patch is good.
>> We are looking for information what was the state of array during migration (before it was stopped), so we have to use map[0].
>> map[1] contains information about the state of array before migration, which we do not need.
>>
>> Regards,
>> Lukasz
>
> Hi,
>  do we have agreement on this?  Dan - do you stand by your original concern
>  or have you seen the light :-)
>
> The patch is in, but I'd like to be sure it is right and to be honest I
> haven't followed the dance of the maps too closely...

Lukasz is right.  We want to find out whether starting the array with the
current list of disks in the container would regress the state of the
array recorded in the metadata.  map[0] should always record the best
possible state of all the slots in the array; if any of those disks have
gone missing, we don't want incremental assembly to proceed.

Sorry for the noise; my point was 'correct' in isolation, but it missed
that the context is looking for the most optimistic view of the disk.

--
Dan
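The rule Dan describes (if any slot recorded as in sync in the first map points at a disk that has not yet appeared, do not start the array) can be sketched as below, replaying the scenario from the original report: a four-disk RAID10 rebuilding one slot, with only three disks added so far. Everything here is hypothetical: the struct, the `SKETCH_*` macros (modelled on `IMSM_ORD_REBUILD` and `ord_to_idx()` from the hunk, with assumed values) and the contents of both maps are illustrative assumptions, not mdadm's definitions or real on-disk metadata.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constants modelled on names in the hunk; values are assumed. */
#define SKETCH_ORD_REBUILD (1u << 24)   /* slot is out of sync / being rebuilt */
#define SKETCH_ORD_TO_IDX(ord) ((ord) & ~SKETCH_ORD_REBUILD)
#define SKETCH_MAX_DISKS 8

struct sketch_ord_map {
	int num_members;
	uint32_t ord_tbl[SKETCH_MAX_DISKS];  /* container disk index per slot, plus flags */
};

/* "Any newly missing disks?": true if a slot that this map treats as in
 * sync refers to a disk that is not (yet) present in the container. */
bool sketch_newly_missing(const struct sketch_ord_map *map,
			  const bool present[SKETCH_MAX_DISKS])
{
	for (int j = 0; j < map->num_members; j++) {
		uint32_t ord = map->ord_tbl[j];
		uint32_t idx = SKETCH_ORD_TO_IDX(ord);

		if (!(ord & SKETCH_ORD_REBUILD) &&
		    (idx >= SKETCH_MAX_DISKS || !present[idx]))
			return true;
	}
	return false;
}

/* Assumed scenario: disk 3 failed and disk 4 is being rebuilt into slot 3.
 * Only disks 0, 1 and 2 have been added to the container so far. */
bool sketch_three_disks_added(int which_map)
{
	const struct sketch_ord_map maps[2] = {
		/* map[0]: destination state, slot 3 already assigned to disk 4 */
		{ 4, { 0, 1, 2, 4 } },
		/* map[1]: source state, slot 3 flagged as out of sync */
		{ 4, { 0, 1, 2, 4 | SKETCH_ORD_REBUILD } },
	};
	const bool present[SKETCH_MAX_DISKS] = { true, true, true };

	assert(which_map == 0 || which_map == 1);
	return sketch_newly_missing(&maps[which_map], present);
}
```

In this model, checking the first map reports the rebuild target as newly missing, so incremental assembly would wait for it; checking the second map skips the flagged slot and would let the array start degraded, which matches the symptom in the original report. The sketch indexes the ordinal table with the slot counter `j`; the quoted hunk passes `i`, which is declared outside the excerpt, so that argument may be worth checking against the full function.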

* Re: [PATCH] imsm: fix: correct checking newly missing disks
  2011-12-06  2:19       ` Williams, Dan J
@ 2011-12-06  2:22         ` NeilBrown
  0 siblings, 0 replies; 8+ messages in thread
From: NeilBrown @ 2011-12-06  2:22 UTC (permalink / raw)
  To: Williams, Dan J
  Cc: Dorau, Lukasz, linux-raid@vger.kernel.org, Labun, Marcin,
	Ciechanowski, Ed, Kwolek, Adam

On Mon, 5 Dec 2011 18:19:09 -0800 "Williams, Dan J"
<dan.j.williams@intel.com> wrote:

> On Mon, Dec 5, 2011 at 5:10 PM, NeilBrown <neilb@suse.de> wrote:
> > On Thu, 1 Dec 2011 14:23:16 +0000 "Dorau, Lukasz" <lukasz.dorau@intel.com>
> > wrote:
> >> > > diff --git a/super-intel.c b/super-intel.c
> >> > > index 4ebee78..511a32a 100644
> >> > > --- a/super-intel.c
> >> > > +++ b/super-intel.c
> >> > > @@ -2529,13 +2529,13 @@ static void getinfo_super_imsm(struct supertype
> >> > *st, struct mdinfo *info, char *
> >> > >
> >> > >                failed = imsm_count_failed(super, dev);
> >> > >                state = imsm_check_degraded(super, dev, failed);
> >> > > -               map = get_imsm_map(dev, dev->vol.migr_state);
> >> > > +               map = get_imsm_map(dev, 0);
> >> > >
> >> > >                /* any newly missing disks?
> >> > >                 * (catches single-degraded vs double-degraded)
> >> > >                 */
> >> > >                for (j = 0; j < map->num_members; j++) {
> >> > > -                       __u32 ord = get_imsm_ord_tbl_ent(dev, i, -1);
> >> > > +                       __u32 ord = get_imsm_ord_tbl_ent(dev, i, 0);
> >> >
> >> > This looks wrong.  I noticed this when looking over Przemyslaw's patch [1].
> >> >
> >> > map[0] always contains the destination state of the migration so the
> >> > most reliable source for looking for out of sync disks is map[1].
> >> >
> >>
> >> I am convinced that the patch is good.
> >> We are looking for information what was the state of array during migration (before it was stopped), so we have to use map[0].
> >> map[1] contains information about the state of array before migration, which we do not need.
> >>
> >> Regards,
> >> Lukasz
> >
> > Hi,
> >  do we have agreement on this?  Dan - do you stand by your original concern
> >  or have you seen the light :-)
> >
> > The patch is in, but I'd like to be sure it is right and to be honest I
> > haven't followed the dance of the maps too closely...
> 
> Lukasz is right.  We want to find out if starting the array with the
> current list of disks in the container would regress the state of the
> array recorded in the metadata.  map[0] should always record the best
> possible state of all the slots in the array if any of those have gone
> missing we don't want incremental assembly to proceed.
> 
> Sorry for the noise, my point was 'correct' in isolation but it missed
> that the context is looking for the most optimistic view of the disk.

Great - thanks for clearing that up.

NeilBrown

* RE: [PATCH] imsm: fix: correct checking newly missing disks
  2011-11-30  2:26 ` Dan Williams
  2011-12-01 14:23   ` Dorau, Lukasz
@ 2011-12-01 14:40   ` Dorau, Lukasz
  1 sibling, 0 replies; 8+ messages in thread
From: Dorau, Lukasz @ 2011-12-01 14:40 UTC (permalink / raw)
  To: Williams, Dan J
  Cc: neilb@suse.de, linux-raid@vger.kernel.org, Labun, Marcin,
	Ciechanowski, Ed, Kwolek, Adam

> -----Original Message-----
> From: Dorau, Lukasz
> Sent: Thursday, December 01, 2011 3:23 PM
> To: 'Dan Williams'
> Cc: neilb@suse.de; linux-raid@vger.kernel.org; Labun, Marcin; Ciechanowski,
> Ed; Kwolek, Adam
> Subject: RE: [PATCH] imsm: fix: correct checking newly missing disks
> 
> Regards,
> Łukasz
> 

I apologize for the words at the beginning of my last message;
they were unnecessary and included by oversight.
The actual reply is, of course, at the end of the message.

Lukasz

> 
> > -----Original Message-----
> > From: dan.j.williams@gmail.com [mailto:dan.j.williams@gmail.com] On
> Behalf
> > Of Dan Williams
> > Sent: Wednesday, November 30, 2011 3:27 AM
> > To: Dorau, Lukasz
> > Cc: neilb@suse.de; linux-raid@vger.kernel.org; Labun, Marcin; Ciechanowski,
> Ed
> > Subject: Re: [PATCH] imsm: fix: correct checking newly missing disks
> >
> > On Mon, Nov 14, 2011 at 6:52 AM, Lukasz Dorau <lukasz.dorau@intel.com>
> > wrote:
> > > The problem occurs when RAID10 array under rebuild
> > > (after one disk fails) is assembled incrementally.
> > > Mdadm tries to start array just after adding the third disk
> > > and the volume is assembled incorrectly (in degraded state).
> > >
> > > The cause is that container_enough depends on
> > > newly missing disks which are checked incorrectly now.
> > > They should be checked using always the first map.
> > >
> > > Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
> > > ---
> > >  super-intel.c |    4 ++--
> > >  1 files changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/super-intel.c b/super-intel.c
> > > index 4ebee78..511a32a 100644
> > > --- a/super-intel.c
> > > +++ b/super-intel.c
> > > @@ -2529,13 +2529,13 @@ static void getinfo_super_imsm(struct
> supertype
> > *st, struct mdinfo *info, char *
> > >
> > >                failed = imsm_count_failed(super, dev);
> > >                state = imsm_check_degraded(super, dev, failed);
> > > -               map = get_imsm_map(dev, dev->vol.migr_state);
> > > +               map = get_imsm_map(dev, 0);
> > >
> > >                /* any newly missing disks?
> > >                 * (catches single-degraded vs double-degraded)
> > >                 */
> > >                for (j = 0; j < map->num_members; j++) {
> > > -                       __u32 ord = get_imsm_ord_tbl_ent(dev, i, -1);
> > > +                       __u32 ord = get_imsm_ord_tbl_ent(dev, i, 0);
> >
> > This looks wrong.  I noticed this when looking over Przemyslaw's patch [1].
> >
> > map[0] always contains the destination state of the migration so the
> > most reliable source for looking for out of sync disks is map[1].
> >
> 
> I am convinced that the patch is good.
> We are looking for information what was the state of array during migration
> (before it was stopped), so we have to use map[0].
> map[1] contains information about the state of array before migration, which
> we do not need.
> 
> Regards,
> Lukasz
