From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-by2nam03on0137.outbound.protection.outlook.com ([104.47.42.137]:35861 "EHLO NAM03-BY2-obe.outbound.protection.outlook.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1727242AbeIQIZ4 (ORCPT ); Mon, 17 Sep 2018 04:25:56 -0400 From: Sasha Levin To: "stable@vger.kernel.org" , "linux-kernel@vger.kernel.org" CC: Guoqing Jiang , Shaohua Li , Sasha Levin Subject: [PATCH AUTOSEL 4.18 027/136] md-cluster: clear another node's suspend_area after the copy is finished Date: Mon, 17 Sep 2018 03:00:28 +0000 Message-ID: <20180917030006.245495-27-alexander.levin@microsoft.com> References: <20180917030006.245495-1-alexander.levin@microsoft.com> In-Reply-To: <20180917030006.245495-1-alexander.levin@microsoft.com> Content-Language: en-US Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Sender: stable-owner@vger.kernel.org List-ID: From: Guoqing Jiang [ Upstream commit 010228e4a932ca1e8365e3b58c8e1e44c16ff793 ] When one node leaves cluster or stops the resyncing (resync or recovery) array, then other nodes need to call recover_bitmaps to continue the unfinished task. But we need to clear suspend_area later after other nodes copy the resync information to their bitmap (by call bitmap_copy_from_slot). Otherwise, all nodes could write to the suspend_area even the suspend_area is not handled by any node, because area_resyncing returns 0 at the beginning of raid1_write_request. Which means one node could write suspend_area while another node is resyncing the same area, then data could be inconsistent. So let's clear suspend_area later to avoid above issue with the protection of bm lock. Also it is straightforward to clear suspend_area after nodes have copied the resync info to bitmap. Signed-off-by: Guoqing Jiang Reviewed-by: NeilBrown Signed-off-by: Shaohua Li Signed-off-by: Sasha Levin --- drivers/md/md-cluster.c | 19 ++++++++++--------- 1 file changed, 10 insertions(+), 9 deletions(-) diff --git a/drivers/md/md-cluster.c b/drivers/md/md-cluster.c index 021cbf9ef1bf..1ac945f7a3c2 100644 --- a/drivers/md/md-cluster.c +++ b/drivers/md/md-cluster.c @@ -304,15 +304,6 @@ static void recover_bitmaps(struct md_thread *thread) while (cinfo->recovery_map) { slot =3D fls64((u64)cinfo->recovery_map) - 1; =20 - /* Clear suspend_area associated with the bitmap */ - spin_lock_irq(&cinfo->suspend_lock); - list_for_each_entry_safe(s, tmp, &cinfo->suspend_list, list) - if (slot =3D=3D s->slot) { - list_del(&s->list); - kfree(s); - } - spin_unlock_irq(&cinfo->suspend_lock); - snprintf(str, 64, "bitmap%04d", slot); bm_lockres =3D lockres_init(mddev, str, NULL, 1); if (!bm_lockres) { @@ -331,6 +322,16 @@ static void recover_bitmaps(struct md_thread *thread) pr_err("md-cluster: Could not copy data from bitmap %d\n", slot); goto clear_bit; } + + /* Clear suspend_area associated with the bitmap */ + spin_lock_irq(&cinfo->suspend_lock); + list_for_each_entry_safe(s, tmp, &cinfo->suspend_list, list) + if (slot =3D=3D s->slot) { + list_del(&s->list); + kfree(s); + } + spin_unlock_irq(&cinfo->suspend_lock); + if (hi > 0) { if (lo < mddev->recovery_cp) mddev->recovery_cp =3D lo; --=20 2.17.1