* Raid0 expansion problem in md
From: Kwolek, Adam @ 2011-12-13 15:45 UTC
To: neilb@suse.de; +Cc: linux-raid@vger.kernel.org
Hi Neil,
On the latest md neil_for-linus branch I've found a raid0 migration problem.
During OLCE everything goes fine in user space, but in the kernel the process does not move forward (older md works fine).

It is stopped in reshape_request() at the line (near raid5.c:3957):
wait_event(conf->wait_for_overlap, atomic_read(&conf->reshape_stripes)==0);

I've found that this problem is a side effect of the patch:
md/raid5: abort any pending parity operations when array fails.
and of the line added by this patch:
sh->reconstruct_state = 0;

During OLCE we enter the code added by that patch because the condition
if (s.failed > conf->max_degraded)
is true, with values:
locked=1 uptodate=5 to_read=0 to_write=0 failed=2 failed_num=4,1

and sh->reconstruct_state is reset from 6 (reconstruct_state_result) to 0 (reconstruct_state_idle).
When sh->reconstruct_state is not reset, the raid0 migration executes without problems.
The problem is probably that the code for finishing the reconstruction (around raid5.c:3300) is then not executed.

In our case the s.failed field should not reach 2, but we get 2 with failed_num = 4,1.
It seems that '1' is the failed disk for the stripe in the old array geometry and '4' is the failed disk for the stripe in the new array geometry.
This means that degradation during reshape is counted twice (the final stripe degradation is the sum of the old-geometry and new-geometry degradation).
When we are reading a degraded stripe (from the old geometry) and writing it (to the new geometry), and the degradation is at different positions (the raid0 OLCE case), analyse_stripe() gives us false failure information.
Possibly we should have separate old_failed and new_failed counters to know in which geometry (old/new) the failure occurs.
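
To make the proposed bookkeeping concrete, here is a deliberately simplified user-space sketch of the idea. This is not kernel code: the names (struct geom_failed, slot_in_old_geometry(), analyse_failures()) and the slot numbers and disk counts are invented for illustration only, and classifying a failure purely by its slot index is a simplification of what analyse_stripe() would actually have to decide.

#include <stdbool.h>
#include <stdio.h>

struct stripe_dev {
	int  slot;        /* device index within the stripe */
	bool in_sync;     /* device holds valid data for this stripe */
};

struct geom_failed {
	int old_failed;   /* failures counted against the old geometry */
	int new_failed;   /* failures counted against the new geometry */
};

/* Illustrative rule: a slot "belongs" to the old geometry if it existed
 * before the grow. */
static bool slot_in_old_geometry(int slot, int old_disks)
{
	return slot < old_disks;
}

static struct geom_failed analyse_failures(const struct stripe_dev *devs,
					   int ndevs, int old_disks)
{
	struct geom_failed gf = { 0, 0 };
	int i;

	for (i = 0; i < ndevs; i++) {
		if (devs[i].in_sync)
			continue;
		if (slot_in_old_geometry(devs[i].slot, old_disks))
			gf.old_failed++;
		else
			gf.new_failed++;
	}
	return gf;
}

int main(void)
{
	/* Hypothetical stripe: one not-in-sync slot on each side of the
	 * boundary between the old and the new geometry. */
	struct stripe_dev devs[] = {
		{ 0, true }, { 1, false }, { 2, true },
		{ 3, true }, { 4, false }, { 5, true },
	};
	struct geom_failed gf = analyse_failures(devs, 6, 4);

	printf("old_failed=%d new_failed=%d (a single summed counter would say %d)\n",
	       gf.old_failed, gf.new_failed, gf.old_failed + gf.new_failed);
	return 0;
}

With a single summed counter the stripe above looks doubly degraded, while per-geometry counters report at most one failure on each side.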
Here is a reproduction script:
export IMSM_NO_PLATFORM=1
#create container
mdadm -C /dev/md/imsm0 -amd -e imsm -n 4 /dev/sdb /dev/sdc /dev/sde /dev/sdd -R
#create array
mdadm -C /dev/md/raid0vol_0 -amd -l 0 --chunk 64 --size 1048 -n 1 /dev/sdb -R --force
#start reshape
mdadm --grow /dev/md/imsm0 --raid-devices 4
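
One way to confirm whether the reshape actually makes progress after the last command (these monitoring commands are not part of the original report, and the volume device name md126 is only an assumption that may differ on a given system):

watch -n1 cat /proc/mdstat                 # with this problem the reshape percentage stays frozen
cat /sys/block/md126/md/sync_action        # should report "reshape" while the grow is in progress
cat /sys/block/md126/md/reshape_position   # stops advancing when the hang occurs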
Please let me know your opinion.
BR
Adam
* Re: Raid0 expansion problem in md
From: NeilBrown @ 2011-12-14 4:42 UTC
To: Kwolek, Adam; +Cc: linux-raid@vger.kernel.org
On Tue, 13 Dec 2011 15:45:30 +0000 "Kwolek, Adam" <adam.kwolek@intel.com> wrote:
> Hi Neil,
>
> On the latest md neil_for-linus branch I've found a raid0 migration problem.
> During OLCE everything goes fine in user space, but in the kernel the process
> does not move forward (older md works fine).
>
> It is stopped in reshape_request() at the line (near raid5.c:3957):
> wait_event(conf->wait_for_overlap, atomic_read(&conf->reshape_stripes)==0);
>
> I've found that this problem is a side effect of the patch:
> md/raid5: abort any pending parity operations when array fails.
> and of the line added by this patch:
> sh->reconstruct_state = 0;
>
> During OLCE we enter the code added by that patch because the condition
> if (s.failed > conf->max_degraded)
> is true, with values:
> locked=1 uptodate=5 to_read=0 to_write=0 failed=2 failed_num=4,1
>
> and sh->reconstruct_state is reset from 6 (reconstruct_state_result) to 0
> (reconstruct_state_idle).
> When sh->reconstruct_state is not reset, the raid0 migration executes without
> problems. The problem is probably that the code for finishing the
> reconstruction (around raid5.c:3300) is then not executed.
>
> In our case the s.failed field should not reach 2, but we get 2 with
> failed_num = 4,1.
> It seems that '1' is the failed disk for the stripe in the old array geometry
> and '4' is the failed disk for the stripe in the new array geometry.
> This means that degradation during reshape is counted twice (the final stripe
> degradation is the sum of the old-geometry and new-geometry degradation).
> When we are reading a degraded stripe (from the old geometry) and writing it
> (to the new geometry), and the degradation is at different positions (the
> raid0 OLCE case), analyse_stripe() gives us false failure information.
> Possibly we should have separate old_failed and new_failed counters to know
> in which geometry (old/new) the failure occurs.
>
>
> Here is a reproduction script:
>
> export IMSM_NO_PLATFORM=1
> #create container
> mdadm -C /dev/md/imsm0 -amd -e imsm -n 4 /dev/sdb /dev/sdc /dev/sde /dev/sdd -R
> #create array
> mdadm -C /dev/md/raid0vol_0 -amd -l 0 --chunk 64 --size 1048 -n 1 /dev/sdb -R --force
> #start reshape
> mdadm --grow /dev/md/imsm0 --raid-devices 4
>
>
> Please let me know your opinion.
Thanks for the excellent problem report.
I think it is best fixed by the following patch.
I also need to fix up the calculation of 'degraded' so that it doesn't say '2' in
this case, which is confusing. Then I'll commit the fixes.
Thanks,
NeilBrown
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 31670f8..858fdbb 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -3065,11 +3065,17 @@ static void analyse_stripe(struct stripe_head *sh, struct stripe_head_state *s)
 			}
 		} else if (test_bit(In_sync, &rdev->flags))
 			set_bit(R5_Insync, &dev->flags);
-		else {
+		else if (sh->sector + STRIPE_SECTORS <= rdev->recovery_offset)
 			/* in sync if before recovery_offset */
-			if (sh->sector + STRIPE_SECTORS <= rdev->recovery_offset)
-				set_bit(R5_Insync, &dev->flags);
-		}
+			set_bit(R5_Insync, &dev->flags);
+		else if (test_bit(R5_UPTODATE, &dev->flags) &&
+			 test_bit(R5_Expanded, &dev->flags))
+			/* If we've reshaped into here, we assume it is Insync.
+			 * We will shortly update recovery_offset to make
+			 * it official.
+			 */
+			set_bit(R5_Insync, &dev->flags);
+
 		if (rdev && test_bit(R5_WriteError, &dev->flags)) {
 			clear_bit(R5_Insync, &dev->flags);
 			if (!test_bit(Faulty, &rdev->flags)) {
* RE: Raid0 expansion problem in md
From: Kwolek, Adam @ 2011-12-14 12:15 UTC
To: NeilBrown; +Cc: linux-raid@vger.kernel.org
> -----Original Message-----
> From: NeilBrown [mailto:neilb@suse.de]
> Sent: Wednesday, December 14, 2011 5:42 AM
> To: Kwolek, Adam
> Cc: linux-raid@vger.kernel.org
> Subject: Re: Raid0 expansion problem in md
>
> On Tue, 13 Dec 2011 15:45:30 +0000 "Kwolek, Adam" <adam.kwolek@intel.com> wrote:
>
> > Hi Neil,
> >
> > On the latest md neil_for-linus branch I've found a raid0 migration problem.
> > During OLCE everything goes fine in user space, but in the kernel the
> > process does not move forward (older md works fine).
> >
> > It is stopped in reshape_request() at the line (near raid5.c:3957):
> > wait_event(conf->wait_for_overlap, atomic_read(&conf->reshape_stripes)==0);
> >
> > I've found that this problem is a side effect of the patch:
> > md/raid5: abort any pending parity operations when array fails.
> > and of the line added by this patch:
> > sh->reconstruct_state = 0;
> >
> > During OLCE we enter the code added by that patch because the condition
> > if (s.failed > conf->max_degraded)
> > is true, with values:
> > locked=1 uptodate=5 to_read=0 to_write=0 failed=2 failed_num=4,1
> >
> > and sh->reconstruct_state is reset from 6 (reconstruct_state_result) to 0
> > (reconstruct_state_idle).
> > When sh->reconstruct_state is not reset, the raid0 migration executes
> > without problems. The problem is probably that the code for finishing the
> > reconstruction (around raid5.c:3300) is then not executed.
> >
> > In our case the s.failed field should not reach 2, but we get 2 with
> > failed_num = 4,1.
> > It seems that '1' is the failed disk for the stripe in the old array
> > geometry and '4' is the failed disk for the stripe in the new array geometry.
> > This means that degradation during reshape is counted twice (the final
> > stripe degradation is the sum of the old-geometry and new-geometry
> > degradation).
> > When we are reading a degraded stripe (from the old geometry) and writing
> > it (to the new geometry), and the degradation is at different positions
> > (the raid0 OLCE case), analyse_stripe() gives us false failure information.
> > Possibly we should have separate old_failed and new_failed counters to know
> > in which geometry (old/new) the failure occurs.
> >
> >
> > Here is a reproduction script:
> >
> > export IMSM_NO_PLATFORM=1
> > #create container
> > mdadm -C /dev/md/imsm0 -amd -e imsm -n 4 /dev/sdb /dev/sdc /dev/sde /dev/sdd -R
> > #create array
> > mdadm -C /dev/md/raid0vol_0 -amd -l 0 --chunk 64 --size 1048 -n 1 /dev/sdb -R --force
> > #start reshape
> > mdadm --grow /dev/md/imsm0 --raid-devices 4
> >
> >
> > Please let me know your opinion.
>
> Thanks for the excellent problem report.
>
> I think it is best fixed by the following patch.
> I also need to fix up the calculation of 'degraded' so that it doesn't say
> '2' in this case, which is confusing. Then I'll commit the fixes.
>
>
> Thanks,
> NeilBrown
Yes this helps :)
Thanks Neil,
BR
Adam
>
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index 31670f8..858fdbb 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -3065,11 +3065,17 @@ static void analyse_stripe(struct stripe_head *sh, struct stripe_head_state *s)
>  			}
>  		} else if (test_bit(In_sync, &rdev->flags))
>  			set_bit(R5_Insync, &dev->flags);
> -		else {
> +		else if (sh->sector + STRIPE_SECTORS <= rdev->recovery_offset)
>  			/* in sync if before recovery_offset */
> -			if (sh->sector + STRIPE_SECTORS <= rdev->recovery_offset)
> -				set_bit(R5_Insync, &dev->flags);
> -		}
> +			set_bit(R5_Insync, &dev->flags);
> +		else if (test_bit(R5_UPTODATE, &dev->flags) &&
> +			 test_bit(R5_Expanded, &dev->flags))
> +			/* If we've reshaped into here, we assume it is Insync.
> +			 * We will shortly update recovery_offset to make
> +			 * it official.
> +			 */
> +			set_bit(R5_Insync, &dev->flags);
> +
>  		if (rdev && test_bit(R5_WriteError, &dev->flags)) {
>  			clear_bit(R5_Insync, &dev->flags);
>  			if (!test_bit(Faulty, &rdev->flags)) {