[PATCH 1/2] md-cluster: fix deadlock issue when add disk to an recoverying array

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH 1/2] md-cluster: fix deadlock issue when add disk to an recoverying array
@ 2016-06-03  3:32 Guoqing Jiang
  2016-06-03  3:32 ` [PATCH 2/2] md: simplify the code with md_kick_rdev_from_array Guoqing Jiang
  2016-06-03 23:21 ` [PATCH 1/2] md-cluster: fix deadlock issue when add disk to an recoverying array Shaohua Li
  0 siblings, 2 replies; 3+ messages in thread
From: Guoqing Jiang @ 2016-06-03  3:32 UTC (permalink / raw)
  To: shli; +Cc: linux-raid, Guoqing Jiang

Add a disk to an array which is performing recovery
is a little complicated, we need to do both reap the
sync thread and perform add disk for the case, then
it caused deadlock as follows.

linux44:~ # ps aux|grep md|grep D
root      1822  0.0  0.0      0     0 ?        D    16:50   0:00 [md127_resync]
root      1848  0.0  0.0  19860   952 pts/0    D+   16:50   0:00 mdadm --manage /dev/md127 --re-add /dev/vdb
linux44:~ # cat /proc/1848/stack
[<ffffffff8107afde>] kthread_stop+0x6e/0x120
[<ffffffffa051ddb0>] md_unregister_thread+0x40/0x80 [md_mod]
[<ffffffffa0526e45>] md_reap_sync_thread+0x15/0x150 [md_mod]
[<ffffffffa05271e0>] action_store+0x260/0x270 [md_mod]
[<ffffffffa05206b4>] md_attr_store+0xb4/0x100 [md_mod]
[<ffffffff81214a7e>] sysfs_write_file+0xbe/0x140
[<ffffffff811a6b98>] vfs_write+0xb8/0x1e0
[<ffffffff811a75b8>] SyS_write+0x48/0xa0
[<ffffffff8152a5c9>] system_call_fastpath+0x16/0x1b
[<00007f068ea1ed30>] 0x7f068ea1ed30
linux44:~ # cat /proc/1822/stack
[<ffffffffa05251a6>] md_do_sync+0x846/0xf40 [md_mod]
[<ffffffffa052402d>] md_thread+0x16d/0x180 [md_mod]
[<ffffffff8107ad94>] kthread+0xb4/0xc0
[<ffffffff8152a518>] ret_from_fork+0x58/0x90

                        Task1848                                Task1822
md_attr_store (held reconfig_mutex by call mddev_lock())
                        action_store
			md_reap_sync_thread
			md_unregister_thread
			kthread_stop                    md_wakeup_thread(mddev->thread);
						wait_event(mddev->sb_wait, !test_bit(MD_CHANGE_PENDING))

md_check_recovery is triggered by wakeup mddev->thread,
but it can't clear MD_CHANGE_PENDING flag since it can't
get lock which was held by md_attr_store already.

To solve the deadlock problem, we move "->resync_finish()"
from md_do_sync to md_reap_sync_thread (after md_update_sb),
also MD_HELD_RESYNC_LOCK is introduced since it is possible
that node can't get resync lock in md_do_sync.

Then we do not need to wait for MD_CHANGE_PENDING is cleared
or not since metadata should be updated after md_update_sb,
so just call resync_finish if MD_HELD_RESYNC_LOCK is set.

We also unified the code after skip label, since set PENDING
for non-clustered case should be harmless.

Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
---
 drivers/md/md.c | 23 +++++++++++------------
 drivers/md/md.h |  3 +++
 2 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 866825f..25d4542 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -7809,6 +7809,7 @@ void md_do_sync(struct md_thread *thread)
 		if (ret)
 			goto skip;
 
+		set_bit(MD_CLUSTER_RESYNC_LOCKED, &mddev->flags);
 		if (!(test_bit(MD_RECOVERY_SYNC, &mddev->recovery) ||
 			test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) ||
 			test_bit(MD_RECOVERY_RECOVER, &mddev->recovery))
@@ -8147,18 +8148,11 @@ void md_do_sync(struct md_thread *thread)
 		}
 	}
  skip:
-	if (mddev_is_clustered(mddev) &&
-	    ret == 0) {
-		/* set CHANGE_PENDING here since maybe another
-		 * update is needed, so other nodes are informed */
-		set_mask_bits(&mddev->flags, 0,
-			      BIT(MD_CHANGE_PENDING) | BIT(MD_CHANGE_DEVS));
-		md_wakeup_thread(mddev->thread);
-		wait_event(mddev->sb_wait,
-			   !test_bit(MD_CHANGE_PENDING, &mddev->flags));
-		md_cluster_ops->resync_finish(mddev);
-	} else
-		set_bit(MD_CHANGE_DEVS, &mddev->flags);
+	/* set CHANGE_PENDING here since maybe another update is needed,
+	 * so other nodes are informed. It should be harmless for normal
+	 * raid */
+	set_mask_bits(&mddev->flags, 0,
+		      BIT(MD_CHANGE_PENDING) | BIT(MD_CHANGE_DEVS));
 
 	spin_lock(&mddev->lock);
 	if (!test_bit(MD_RECOVERY_INTR, &mddev->recovery)) {
@@ -8502,6 +8496,11 @@ void md_reap_sync_thread(struct mddev *mddev)
 			rdev->saved_raid_disk = -1;
 
 	md_update_sb(mddev, 1);
+	/* MD_CHANGE_PENDING should be cleared by md_update_sb, so we can
+	 * call resync_finish here if MD_CLUSTER_RESYNC_LOCKED is set by
+	 * clustered raid */
+	if (test_and_clear_bit(MD_CLUSTER_RESYNC_LOCKED, &mddev->flags))
+		md_cluster_ops->resync_finish(mddev);
 	clear_bit(MD_RECOVERY_RUNNING, &mddev->recovery);
 	clear_bit(MD_RECOVERY_DONE, &mddev->recovery);
 	clear_bit(MD_RECOVERY_SYNC, &mddev->recovery);
diff --git a/drivers/md/md.h b/drivers/md/md.h
index b5c4be7..03b19aa 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -204,6 +204,9 @@ struct mddev {
 #define MD_RELOAD_SB	7	/* Reload the superblock because another node
 				 * updated it.
 				 */
+#define MD_CLUSTER_RESYNC_LOCKED 8 /* cluster raid only, which means node
+				    * already took resync lock, need to
+				    * release the lock */
 
 	int				suspended;
 	atomic_t			active_io;
-- 
2.6.6


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* [PATCH 2/2] md: simplify the code with md_kick_rdev_from_array
  2016-06-03  3:32 [PATCH 1/2] md-cluster: fix deadlock issue when add disk to an recoverying array Guoqing Jiang
@ 2016-06-03  3:32 ` Guoqing Jiang
  2016-06-03 23:21 ` [PATCH 1/2] md-cluster: fix deadlock issue when add disk to an recoverying array Shaohua Li
  1 sibling, 0 replies; 3+ messages in thread
From: Guoqing Jiang @ 2016-06-03  3:32 UTC (permalink / raw)
  To: shli; +Cc: linux-raid, Guoqing Jiang

Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
---
 drivers/md/md.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 25d4542..459f189 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -2478,8 +2478,7 @@ static int add_bound_rdev(struct md_rdev *rdev)
 		if (add_journal)
 			mddev_resume(mddev);
 		if (err) {
-			unbind_rdev_from_array(rdev);
-			export_rdev(rdev);
+			md_kick_rdev_from_array(rdev);
 			return err;
 		}
 	}
-- 
2.6.6


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH 1/2] md-cluster: fix deadlock issue when add disk to an recoverying array
  2016-06-03  3:32 [PATCH 1/2] md-cluster: fix deadlock issue when add disk to an recoverying array Guoqing Jiang
  2016-06-03  3:32 ` [PATCH 2/2] md: simplify the code with md_kick_rdev_from_array Guoqing Jiang
@ 2016-06-03 23:21 ` Shaohua Li
  1 sibling, 0 replies; 3+ messages in thread
From: Shaohua Li @ 2016-06-03 23:21 UTC (permalink / raw)
  To: Guoqing Jiang; +Cc: linux-raid

On Thu, Jun 02, 2016 at 11:32:04PM -0400, Guoqing Jiang wrote:
> Add a disk to an array which is performing recovery
> is a little complicated, we need to do both reap the
> sync thread and perform add disk for the case, then
> it caused deadlock as follows.
> 
> linux44:~ # ps aux|grep md|grep D
> root      1822  0.0  0.0      0     0 ?        D    16:50   0:00 [md127_resync]
> root      1848  0.0  0.0  19860   952 pts/0    D+   16:50   0:00 mdadm --manage /dev/md127 --re-add /dev/vdb
> linux44:~ # cat /proc/1848/stack
> [<ffffffff8107afde>] kthread_stop+0x6e/0x120
> [<ffffffffa051ddb0>] md_unregister_thread+0x40/0x80 [md_mod]
> [<ffffffffa0526e45>] md_reap_sync_thread+0x15/0x150 [md_mod]
> [<ffffffffa05271e0>] action_store+0x260/0x270 [md_mod]
> [<ffffffffa05206b4>] md_attr_store+0xb4/0x100 [md_mod]
> [<ffffffff81214a7e>] sysfs_write_file+0xbe/0x140
> [<ffffffff811a6b98>] vfs_write+0xb8/0x1e0
> [<ffffffff811a75b8>] SyS_write+0x48/0xa0
> [<ffffffff8152a5c9>] system_call_fastpath+0x16/0x1b
> [<00007f068ea1ed30>] 0x7f068ea1ed30
> linux44:~ # cat /proc/1822/stack
> [<ffffffffa05251a6>] md_do_sync+0x846/0xf40 [md_mod]
> [<ffffffffa052402d>] md_thread+0x16d/0x180 [md_mod]
> [<ffffffff8107ad94>] kthread+0xb4/0xc0
> [<ffffffff8152a518>] ret_from_fork+0x58/0x90
> 
>                         Task1848                                Task1822
> md_attr_store (held reconfig_mutex by call mddev_lock())
>                         action_store
> 			md_reap_sync_thread
> 			md_unregister_thread
> 			kthread_stop                    md_wakeup_thread(mddev->thread);
> 						wait_event(mddev->sb_wait, !test_bit(MD_CHANGE_PENDING))
> 
> md_check_recovery is triggered by wakeup mddev->thread,
> but it can't clear MD_CHANGE_PENDING flag since it can't
> get lock which was held by md_attr_store already.
> 
> To solve the deadlock problem, we move "->resync_finish()"
> from md_do_sync to md_reap_sync_thread (after md_update_sb),
> also MD_HELD_RESYNC_LOCK is introduced since it is possible
> that node can't get resync lock in md_do_sync.
> 
> Then we do not need to wait for MD_CHANGE_PENDING is cleared
> or not since metadata should be updated after md_update_sb,
> so just call resync_finish if MD_HELD_RESYNC_LOCK is set.
> 
> We also unified the code after skip label, since set PENDING
> for non-clustered case should be harmless.

applied the two, thanks!

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2016-06-03 23:21 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-06-03  3:32 [PATCH 1/2] md-cluster: fix deadlock issue when add disk to an recoverying array Guoqing Jiang
2016-06-03  3:32 ` [PATCH 2/2] md: simplify the code with md_kick_rdev_from_array Guoqing Jiang
2016-06-03 23:21 ` [PATCH 1/2] md-cluster: fix deadlock issue when add disk to an recoverying array Shaohua Li

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.