* Raid0 expansion problem in md
From: Kwolek, Adam @ 2011-12-13 15:45 UTC
To: neilb@suse.de; +Cc: linux-raid@vger.kernel.org
Hi Neil,
On the latest md neil_for-linus branch I've found a raid0 migration problem.
During OLCE (Online Capacity Expansion) everything goes fine in user space, but in the kernel the process does not move forward.
/older md works fine/
It is stuck in md, in reshape_request(), at this line (near raid5.c:3957):
wait_event(conf->wait_for_overlap, atomic_read(&conf->reshape_stripes)==0);
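
Roughly speaking, this wait can only return once every stripe counted into
reshape_stripes has also been completed. A small user-space toy (illustrative
only -- the names merely mirror the md fields, this is not kernel code) shows
the invariant:

#include <stdatomic.h>
#include <stdio.h>

static atomic_int reshape_stripes;

static void submit_reshape_stripe(void)   { atomic_fetch_add(&reshape_stripes, 1); }
static void complete_reshape_stripe(void) { atomic_fetch_sub(&reshape_stripes, 1); }

int main(void)
{
        submit_reshape_stripe();

        /* If handling of the stripe aborts the reconstruction instead of
         * finishing it, the completion below never happens ... */
        complete_reshape_stripe();

        /* ... and a wait on this predicate would then block forever. */
        printf("reshape_stripes == %d -> %s\n", atomic_load(&reshape_stripes),
               atomic_load(&reshape_stripes) == 0 ? "wait completes" : "wait blocks");
        return 0;
}
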
I've found that this problem is a side effect of the patch:
md/raid5: abort any pending parity operations when array fails.
and of the line it adds:
sh->reconstruct_state = 0;
During OLCE we take this branch because the condition
if (s.failed > conf->max_degraded)
is true with the values:
locked=1 uptodate=5 to_read=0 to_write=0 failed=2 failed_num=4,1
and sh->reconstruct_state is reset from 6 (reconstruct_state_result) to 0 (reconstruct_state_idle).
When sh->reconstruct_state is not reset, the raid0 migration executes without a problem.
The problem is probably that the code for finishing the reconstruction (around raid5.c:3300) is never executed.
In our case the field s.failed should not reach the value 2, but we get exactly that, with failed_num = 4,1.
It seems that '1' is the failed disk for the stripe in the old array geometry and '4' is the failed disk for the stripe in the new array geometry.
This means that degradation during the reshape is counted twice /the final stripe degradation is the sum of the old-geometry and new-geometry degradation/.
When we read (from the old array) and write (to the new geometry) a degraded stripe, and the degradation is at different positions (the raid0 OLCE case), analyse_stripe() gives
us false failure information. Possibly we should have old_failed and new_failed counters to know in which geometry (old/new) the failure occurs.
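
Something like the sketch below is what I have in mind (plain C for
illustration only -- the struct and helper names are invented here and this
is not the analyse_stripe() code):

#include <stdbool.h>

/* Hypothetical per-geometry failure accounting for a stripe. */
struct stripe_failures {
        int old_failed;   /* failed devices seen in the old geometry */
        int new_failed;   /* failed devices seen in the new geometry */
};

static void count_failure(struct stripe_failures *s, bool disk_in_old_geometry)
{
        if (disk_in_old_geometry)
                s->old_failed++;
        else
                s->new_failed++;
}

/* The stripe would then count as failed only when a single geometry exceeds
 * max_degraded, instead of summing the degradation of both layouts. */
static bool stripe_failed(const struct stripe_failures *s, int max_degraded)
{
        return s->old_failed > max_degraded || s->new_failed > max_degraded;
}
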
Here is a reproduction script:
export IMSM_NO_PLATFORM=1
#create container
mdadm -C /dev/md/imsm0 -amd -e imsm -n 4 /dev/sdb /dev/sdc /dev/sde /dev/sdd -R
#create array
mdadm -C /dev/md/raid0vol_0 -amd -l 0 --chunk 64 --size 1048 -n 1 /dev/sdb -R --force
#start reshape
mdadm --grow /dev/md/imsm0 --raid-devices 4
Please let me know your opinion.
BR
Adam

* Re: Raid0 expansion problem in md
From: NeilBrown @ 2011-12-14 4:42 UTC
To: Kwolek, Adam; +Cc: linux-raid@vger.kernel.org

On Tue, 13 Dec 2011 15:45:30 +0000 "Kwolek, Adam" <adam.kwolek@intel.com> wrote:

> Hi Neil,
>
> [...]
>
> Please let me know your opinion.

Thanks for the excellent problem report.

I think it is best fixed by the following patch.
I also need to fix up the calculation of 'degraded' so it doesn't say '2' in
this case, which is confusing.  Then I'll commit the fixes.

Thanks,
NeilBrown

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 31670f8..858fdbb 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -3065,11 +3065,17 @@ static void analyse_stripe(struct stripe_head *sh, struct stripe_head_state *s)
 			}
 		} else if (test_bit(In_sync, &rdev->flags))
 			set_bit(R5_Insync, &dev->flags);
-		else {
+		else if (sh->sector + STRIPE_SECTORS <= rdev->recovery_offset)
 			/* in sync if before recovery_offset */
-			if (sh->sector + STRIPE_SECTORS <= rdev->recovery_offset)
-				set_bit(R5_Insync, &dev->flags);
-		}
+			set_bit(R5_Insync, &dev->flags);
+		else if (test_bit(R5_UPTODATE, &dev->flags) &&
+			 test_bit(R5_Expanded, &dev->flags))
+			/* If we've reshaped into here, we assume it is Insync.
+			 * We will shortly update recovery_offset to make
+			 * it official.
+			 */
+			set_bit(R5_Insync, &dev->flags);
+
 		if (rdev && test_bit(R5_WriteError, &dev->flags)) {
 			clear_bit(R5_Insync, &dev->flags);
 			if (!test_bit(Faulty, &rdev->flags)) {
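
The check this patch introduces can be restated as a stand-alone predicate
(illustrative only -- the real code in analyse_stripe() tests the In_sync,
recovery_offset, R5_UPTODATE and R5_Expanded bits on rdev/dev rather than
taking booleans):

#include <stdbool.h>

/* Should a stripe device be treated as in sync?  The parameters stand in
 * for the flag tests named above; this is a sketch, not kernel code. */
static bool stripe_dev_insync(bool rdev_in_sync, bool before_recovery_offset,
                              bool uptodate, bool expanded)
{
        if (rdev_in_sync)
                return true;
        if (before_recovery_offset)
                return true;    /* in sync if before recovery_offset */
        if (uptodate && expanded)
                return true;    /* reshaped into here; recovery_offset will be
                                 * updated shortly to make it official */
        return false;
}
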

* RE: Raid0 expansion problem in md
From: Kwolek, Adam @ 2011-12-14 12:15 UTC
To: NeilBrown; +Cc: linux-raid@vger.kernel.org

> -----Original Message-----
> From: NeilBrown [mailto:neilb@suse.de]
> Sent: Wednesday, December 14, 2011 5:42 AM
> To: Kwolek, Adam
> Cc: linux-raid@vger.kernel.org
> Subject: Re: Raid0 expansion problem in md
>
> [...]
>
> Thanks for the excellent problem report.
>
> I think it is best fixed by the following patch.
> I also need to fix up the calculation of 'degraded' so it doesn't say '2' in
> this case, which is confusing.  Then I'll commit the fixes.
>
> Thanks,
> NeilBrown

Yes this helps :)

Thanks Neil,
BR
Adam

Thread overview: 3 messages
2011-12-13 15:45 Raid0 expansion problem in md  Kwolek, Adam
2011-12-14  4:42 ` NeilBrown
2011-12-14 12:15   ` Kwolek, Adam