linux-raid.vger.kernel.org archive mirror
* Start reshape failed
       [not found] <152634201.28921946.1443426236655.JavaMail.zimbra@redhat.com>
@ 2015-09-28  8:52 ` Xiao Ni
  0 siblings, 0 replies; 3+ messages in thread
From: Xiao Ni @ 2015-09-28  8:52 UTC
  To: linux-raid

Hi Neil

When I add one disk to a 3-disk raid5 array and run --grow to raise the number
of raid devices to 4, the reshape fails to start. I used kernel 4.3.0-rc2 and
the latest mdadm git tree.

[root@storageqe-09 md]# mdadm --grow /dev/md0 --raid-devices=4
mdadm: Failed to initiate reshape!
[root@storageqe-09 md]# uname -r
4.3.0-rc2
[root@storageqe-09 md]# mdadm --version
mdadm - v3.3.4-24-g86a406c - 28th September 2015

After some analysis, I found some hints.

A: When --grow runs, mdadm wants to write "reshape" to sync_action. Before that,
impose_reshape first issues SET_ARRAY_INFO. Handling SET_ARRAY_INFO takes the
mutex mddev->reconfig_mutex and calls mddev_resume, and mddev_resume sets
MD_RECOVERY_NEEDED.

B: At the same time raid5d runs and calls md_check_recovery. But it can't get
the lock mddev->reconfig_mutex, so it misses the chance to clear
MD_RECOVERY_NEEDED.

After A, mdadm writes "reshape" to sync_action, which calls action_store.
action_store returns EBUSY because MD_RECOVERY_NEEDED is still set:

	} else if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) ||
		   test_bit(MD_RECOVERY_NEEDED, &mddev->recovery))
		return -EBUSY;

It's a little complex. I added a call to md_check_recovery in action_store and
that fixes the problem, but I think it may not be the right way to fix this.
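
The sequence described above, and the effect of the extra md_check_recovery
call, can be sketched as a small user-space model. This is only an illustration
under stated assumptions: every model_* name and MODEL_* constant is
hypothetical, the mutex is reduced to a boolean, and none of this is the
kernel's actual code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the MD_RECOVERY_* bits; not kernel definitions. */
enum { MODEL_RECOVERY_RUNNING = 1, MODEL_RECOVERY_NEEDED = 2 };

struct model_mddev {
	bool reconfig_locked;   /* stands in for mddev->reconfig_mutex */
	unsigned int recovery;  /* stands in for the mddev->recovery flag bits */
};

/* Models md_check_recovery(): NEEDED is cleared only if the lock is free. */
static void model_check_recovery(struct model_mddev *m)
{
	assert(m != NULL);
	if (m->reconfig_locked)
		return;                 /* trylock failed: the flag stays set */
	m->reconfig_locked = true;
	m->recovery &= ~(unsigned int)MODEL_RECOVERY_NEEDED;
	m->reconfig_locked = false;
}

/*
 * Models step A. When daemon_runs_inside is true, the daemon (step B)
 * runs while the lock is still held and therefore cannot clear NEEDED.
 */
static void model_set_array_info(struct model_mddev *m, bool daemon_runs_inside)
{
	assert(m != NULL);
	m->reconfig_locked = true;
	m->recovery |= MODEL_RECOVERY_NEEDED;   /* what mddev_resume does in A */
	if (daemon_runs_inside)
		model_check_recovery(m);
	m->reconfig_locked = false;
}

/* Models the busy check quoted above from action_store(). */
static int model_action_store_reshape(struct model_mddev *m)
{
	assert(m != NULL);
	if (m->recovery & (MODEL_RECOVERY_RUNNING | MODEL_RECOVERY_NEEDED))
		return -EBUSY;
	m->recovery |= MODEL_RECOVERY_RUNNING;
	return 0;
}
```

In this model, calling model_set_array_info(&m, true) and then
model_action_store_reshape(&m) yields -EBUSY. Calling model_check_recovery(&m)
in between, once the lock has been released, clears the flag and the second
call returns 0, which mirrors the workaround.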

Could you give some suggestions?

Best Regards
Xiao


* Re: Start reshape failed
@ 2015-12-16  7:47 Kev Dorman
  2015-12-21  1:09 ` NeilBrown
  0 siblings, 1 reply; 3+ messages in thread
From: Kev Dorman @ 2015-12-16  7:47 UTC
  To: linux-raid

Hi,

Xiao Ni posted back in September about a problem where mdadm --grow fails with:

mdadm: Failed to initiate reshape!

I'm seeing the same problem, with mdadm 3.3.4, Kernel 4.3.0.  I can reproduce
this reliably when adding a device to a 2-device raid0 array.  Instead of
switching to raid4 for the reshape, then switching back to raid0 when done, it
ends up in raid4 in recovery mode.

I tried the suggested change of adding a call to md_check_recovery() in
action_store(), and that does seem to help most of the time.

I was curious if there is a better solution, and/or if this or another change
will show up in 4.3.x soon.

Thanks.

Kev


* Re: Start reshape failed
  2015-12-16  7:47 Start reshape failed Kev Dorman
@ 2015-12-21  1:09 ` NeilBrown
  0 siblings, 0 replies; 3+ messages in thread
From: NeilBrown @ 2015-12-21  1:09 UTC
  To: Kev Dorman, linux-raid


On Wed, Dec 16 2015, Kev Dorman wrote:

> Hi,
>
> Xiao Ni posted back in September about a problem where mdadm --grow fails with:
>
> mdadm: Failed to initiate reshape!
>
> I'm seeing the same problem, with mdadm 3.3.4, Kernel 4.3.0.  I can reproduce
> this reliably when adding a device to a 2-device raid0 array.  Instead of
> switching to raid4 for the reshape, then switching back to raid0 when done, it
> ends up in raid4 in recovery mode.
>
> I tried the suggested change of adding a call to md_check_recovery() in
> action_store(), and that does seem to help most of the time.
>
> I was curious if there is a better solution, and/or if this or another change
> will show up in 4.3.x soon.

I have a patch queued in my for-next branch which I'll send to Linus
before Christmas.  It should then get into 4.3.x in early January (at a
guess).

NeilBrown

