* Re: Start reshape failed
@ 2015-12-16 7:47 Kev Dorman
2015-12-21 1:09 ` NeilBrown
From: Kev Dorman @ 2015-12-16 7:47 UTC
To: linux-raid
Hi,
Xiao Ni posted back in September about a problem where mdadm --grow fails with:
mdadm: Failed to initiate reshape!
I'm seeing the same problem with mdadm 3.3.4 on kernel 4.3.0. I can reproduce
this reliably when adding a device to a 2-device raid0 array. Instead of
switching to raid4 for the reshape and then switching back to raid0 when done,
the array ends up in raid4 in recovery mode.
I tried the suggested change of adding a call to md_check_recovery() in
action_store(), and that does seem to help most of the time.
I was curious whether there is a better solution, and/or whether this or
another change will show up in 4.3.x soon.
Thanks.
Kev
* Re: Start reshape failed
2015-12-16 7:47 Start reshape failed Kev Dorman
@ 2015-12-21 1:09 ` NeilBrown
From: NeilBrown @ 2015-12-21 1:09 UTC
To: Kev Dorman, linux-raid
On Wed, Dec 16 2015, Kev Dorman wrote:
> Hi,
>
> Xiao Ni posted back in September about a problem where mdadm --grow fails with:
>
> mdadm: Failed to initiate reshape!
>
> I'm seeing the same problem with mdadm 3.3.4 on kernel 4.3.0. I can reproduce
> this reliably when adding a device to a 2-device raid0 array. Instead of
> switching to raid4 for the reshape and then switching back to raid0 when done,
> the array ends up in raid4 in recovery mode.
>
> I tried the suggested change of adding a call to md_check_recovery() in
> action_store(), and that does seem to help most of the time.
>
> I was curious whether there is a better solution, and/or whether this or
> another change will show up in 4.3.x soon.
I have a patch queued in my for-next branch which I'll send to Linus
before Christmas. It should then get into 4.3.x in early January (at a
guess).
NeilBrown
* Start reshape failed
[not found] <152634201.28921946.1443426236655.JavaMail.zimbra@redhat.com>
@ 2015-09-28 8:52 ` Xiao Ni
From: Xiao Ni @ 2015-09-28 8:52 UTC
To: linux-raid
Hi Neil
When I add one disk to a 3-disk raid5 array and run --grow to raise raid-devices to 4,
the reshape fails to start. I used kernel 4.3.0-rc2 and the latest mdadm git tree.
[root@storageqe-09 md]# mdadm --grow /dev/md0 --raid-devices=4
mdadm: Failed to initiate reshape!
[root@storageqe-09 md]# uname -r
4.3.0-rc2
[root@storageqe-09 md]# mdadm --version
mdadm - v3.3.4-24-g86a406c - 28th September 2015
After some analysis, I found some hints.
A: When --grow runs, mdadm wants to write "reshape" to sync_action. Before that, it issues
SET_ARRAY_INFO in impose_reshape. SET_ARRAY_INFO takes the mutex mddev->reconfig_mutex and
calls mddev_resume, which sets MD_RECOVERY_NEEDED.
B: At the same time, raid5d runs and calls md_check_recovery, but it can't get the lock
mddev->reconfig_mutex, so it misses the chance to clear MD_RECOVERY_NEEDED.
After A, mdadm writes "reshape" to sync_action, which calls action_store. That returns EBUSY
because MD_RECOVERY_NEEDED is still set.
	} else if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) ||
		   test_bit(MD_RECOVERY_NEEDED, &mddev->recovery))
		return -EBUSY;
It's a little complex. I added a call to md_check_recovery in action_store and that fixes the
problem, but I think it may not be the right way to fix this.
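The ordering described in A and B can be replayed in a small user-space C sketch. This is only a toy model under stated assumptions, not kernel code: the `toy_` names, the struct layout, and the flag values are hypothetical, the mutex is reduced to a boolean, and the model assumes md_check_recovery clears MD_RECOVERY_NEEDED whenever it gets the lock. It shows why the stale flag makes the sync_action write fail, and why the extra md_check_recovery call in action_store lets it through.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Illustrative stand-ins for the kernel's recovery flag bits (values are made up). */
enum { MD_RECOVERY_RUNNING = 1 << 0, MD_RECOVERY_NEEDED = 1 << 1 };

/* Toy model of struct mddev: only the two fields involved in the race. */
struct toy_mddev {
    bool reconfig_locked;   /* models mddev->reconfig_mutex being held */
    unsigned recovery;      /* models the mddev->recovery flag word */
};

/* Models md_check_recovery(): it needs the lock and gives up if the lock is held. */
void toy_check_recovery(struct toy_mddev *m)
{
    if (m->reconfig_locked)
        return;                                     /* lock not available: flag stays set */
    m->recovery &= ~(unsigned)MD_RECOVERY_NEEDED;   /* assumed: nothing to do, clear flag */
}

/* Models action_store() handling a "reshape" write to sync_action. */
int toy_action_store(struct toy_mddev *m, bool workaround)
{
    if (workaround)
        toy_check_recovery(m);   /* the extra call added as a workaround */
    if (m->recovery & (MD_RECOVERY_RUNNING | MD_RECOVERY_NEEDED))
        return -EBUSY;
    return 0;                    /* the reshape would be started here */
}

/* Replays steps A and B in the order described; returns action_store's result. */
int toy_grow(bool workaround)
{
    struct toy_mddev m = { false, 0 };

    m.reconfig_locked = true;            /* A: SET_ARRAY_INFO takes the mutex */
    m.recovery |= MD_RECOVERY_NEEDED;    /* A: mddev_resume sets the flag */
    toy_check_recovery(&m);              /* B: raid5d runs but cannot get the lock */
    m.reconfig_locked = false;           /* A: SET_ARRAY_INFO finishes */

    return toy_action_store(&m, workaround);
}

/* Usage: without the workaround the write is rejected, with it the write succeeds. */
void toy_selfcheck(void)
{
    assert(toy_grow(false) == -EBUSY);
    assert(toy_grow(true) == 0);
}
```

The model is deliberately sequential, so it always hits the bad ordering. On a real system the outcome depends on when raid5d happens to run relative to SET_ARRAY_INFO.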
Could you give some suggestions?
Best Regards
Xiao