From: Xiao Ni
Subject: Start reshape failed
Date: Mon, 28 Sep 2015 04:52:26 -0400 (EDT)
Message-ID: <842369480.28953435.1443430346579.JavaMail.zimbra@redhat.com>
References: <152634201.28921946.1443426236655.JavaMail.zimbra@redhat.com>
In-Reply-To: <152634201.28921946.1443426236655.JavaMail.zimbra@redhat.com>
To: linux-raid
List-Id: linux-raid.ids

Hi Neil

When I add one disk to a 3-disk RAID5 array and then run --grow to raise
the number of raid devices to 4, starting the reshape fails. I am using
kernel 4.3.0-rc2 and the latest mdadm git tree.

[root@storageqe-09 md]# mdadm --grow /dev/md0 --raid-devices=4
mdadm: Failed to initiate reshape!
[root@storageqe-09 md]# uname -r
4.3.0-rc2
[root@storageqe-09 md]# mdadm --version
mdadm - v3.3.4-24-g86a406c - 28th September 2015

After some analysis, I found some hints:

A: When --grow runs, it wants to write "reshape" to sync_action. Before
that, it issues the SET_ARRAY_INFO ioctl in impose_reshape. While handling
SET_ARRAY_INFO, the kernel takes mddev->reconfig_mutex and calls
mddev_resume, which sets MD_RECOVERY_NEEDED.

B: At the same time raid5d runs and calls md_check_recovery, but it cannot
take mddev->reconfig_mutex, so it misses its chance to clear
MD_RECOVERY_NEEDED.

After A, mdadm writes "reshape" to sync_action, which ends up in
action_store. That returns -EBUSY because MD_RECOVERY_NEEDED is already
set:

	} else if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) ||
		   test_bit(MD_RECOVERY_NEEDED, &mddev->recovery))
		return -EBUSY;

It's a little complex. If I add a call to md_check_recovery in
action_store, the problem goes away, but I don't think that is the right
way to fix it. Could you give some suggestions?

Best Regards
Xiao