From: Guoqing Jiang <gqjiang@suse.com>
To: linux-raid@vger.kernel.org
Cc: shli@fb.com, Guoqing Jiang <gqjiang@suse.com>
Subject: [PATCH V2 09/10] md-cluster: introduce dlm_lock_sync_interruptible to fix tasks hang
Date: Fri, 12 Aug 2016 13:42:42 +0800
Message-ID: <1470980563-26062-10-git-send-email-gqjiang@suse.com>
In-Reply-To: <1470980563-26062-1-git-send-email-gqjiang@suse.com>
When a node leaves the cluster, its bitmap needs to be synced
by another node, so the "md*_recover" thread is triggered for
that purpose. However, with the steps below, tasks can be found
hung on either B or C.
1. Node A creates a resyncing cluster raid1, and the array is
assembled on the other two nodes (B and C).
2. Stop the array on B and C.
3. Stop the array on A.
linux44:~ # ps aux|grep md|grep D
root 5938 0.0 0.1 19852 1964 pts/0 D+ 14:52 0:00 mdadm -S md0
root 5939 0.0 0.0 0 0 ? D 14:52 0:00 [md0_recover]
linux44:~ # cat /proc/5939/stack
[<ffffffffa04cf321>] dlm_lock_sync+0x71/0x90 [md_cluster]
[<ffffffffa04d0705>] recover_bitmaps+0x125/0x220 [md_cluster]
[<ffffffffa052105d>] md_thread+0x16d/0x180 [md_mod]
[<ffffffff8107ad94>] kthread+0xb4/0xc0
[<ffffffff8152a518>] ret_from_fork+0x58/0x90
linux44:~ # cat /proc/5938/stack
[<ffffffff8107afde>] kthread_stop+0x6e/0x120
[<ffffffffa0519da0>] md_unregister_thread+0x40/0x80 [md_mod]
[<ffffffffa04cfd20>] leave+0x70/0x120 [md_cluster]
[<ffffffffa0525e24>] md_cluster_stop+0x14/0x30 [md_mod]
[<ffffffffa05269ab>] bitmap_free+0x14b/0x150 [md_mod]
[<ffffffffa0523f3b>] do_md_stop+0x35b/0x5a0 [md_mod]
[<ffffffffa0524e83>] md_ioctl+0x873/0x1590 [md_mod]
[<ffffffff81288464>] blkdev_ioctl+0x214/0x7d0
[<ffffffff811dd3dd>] block_ioctl+0x3d/0x40
[<ffffffff811b92d4>] do_vfs_ioctl+0x2d4/0x4b0
[<ffffffff811b9538>] SyS_ioctl+0x88/0xa0
[<ffffffff8152a5c9>] system_call_fastpath+0x16/0x1b
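The problem is that recover_bitmaps cannot reliably abort when its
thread is being unregistered: as the stacks above show, md0_recover
is blocked in dlm_lock_sync while mdadm waits in kthread_stop for it
to exit. dlm_lock_sync_interruptible is introduced so that the lock
wait also checks kthread_should_stop(), which fixes the hang.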
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
---
drivers/md/md-cluster.c | 37 ++++++++++++++++++++++++++++++++++++-
1 file changed, 36 insertions(+), 1 deletion(-)
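
For readers unfamiliar with the pattern, the wait logic that dlm_lock_sync_interruptible relies on can be sketched in userspace with pthreads. This is only an illustration under stated assumptions, not kernel code: struct lock_request, complete_request, request_stop, wait_interruptible and run_demo are hypothetical names that do not exist in md-cluster, and the DLM cancel step is reduced to a comment.

```c
/* Userspace sketch (pthreads) of "wait for completion OR stop request".
 * All names here are hypothetical; nothing below is md-cluster code. */
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

struct lock_request {
	pthread_mutex_t mtx;
	pthread_cond_t wq;	/* stands in for res->sync_locking */
	bool done;		/* stands in for res->sync_locking_done */
	bool should_stop;	/* stands in for kthread_should_stop() */
	int status;		/* stands in for res->lksb.sb_status */
};

/* Completion callback, loosely analogous to sync_ast. */
static void complete_request(struct lock_request *r, int status)
{
	pthread_mutex_lock(&r->mtx);
	r->status = status;
	r->done = true;
	pthread_cond_broadcast(&r->wq);
	pthread_mutex_unlock(&r->mtx);
}

/* Stop request: set the flag and wake the waiter so it re-checks. */
static void request_stop(struct lock_request *r)
{
	pthread_mutex_lock(&r->mtx);
	r->should_stop = true;
	pthread_cond_broadcast(&r->wq);
	pthread_mutex_unlock(&r->mtx);
}

/* Returns the request status on completion, -EPERM if interrupted. */
static int wait_interruptible(struct lock_request *r)
{
	int ret;

	pthread_mutex_lock(&r->mtx);
	while (!r->done && !r->should_stop)
		pthread_cond_wait(&r->wq, &r->mtx);
	if (!r->done)
		ret = -EPERM;	/* the patch also cancels the DLM request here */
	else
		ret = r->status;
	r->done = false;	/* reset for the next request */
	pthread_mutex_unlock(&r->mtx);
	return ret;
}

static void *stopper(void *arg)
{
	request_stop(arg);
	return NULL;
}

static void *completer(void *arg)
{
	complete_request(arg, 0);
	return NULL;
}

/* Runs one wait; 'interrupt' selects which event wakes the waiter. */
static int run_demo(bool interrupt)
{
	struct lock_request r = { .done = false, .should_stop = false };
	pthread_t t;
	int ret;

	pthread_mutex_init(&r.mtx, NULL);
	pthread_cond_init(&r.wq, NULL);
	ret = pthread_create(&t, NULL, interrupt ? stopper : completer, &r);
	assert(ret == 0);
	ret = wait_interruptible(&r);
	pthread_join(t, NULL);
	pthread_cond_destroy(&r.wq);
	pthread_mutex_destroy(&r.mtx);
	return ret;
}
```

In this sketch run_demo(true) returns -EPERM and run_demo(false) returns 0. A waiter that only tested the done flag, as plain dlm_lock_sync does with its completion, would never return in the interrupted case, which corresponds to the hang shown in the stacks above.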
diff --git a/drivers/md/md-cluster.c b/drivers/md/md-cluster.c
index 03a51e7..149d19a 100644
--- a/drivers/md/md-cluster.c
+++ b/drivers/md/md-cluster.c
@@ -10,6 +10,7 @@
#include <linux/module.h>
+#include <linux/kthread.h>
#include <linux/dlm.h>
#include <linux/sched.h>
#include <linux/raid/md_p.h>
@@ -144,6 +145,40 @@ static int dlm_unlock_sync(struct dlm_lock_resource *res)
return dlm_lock_sync(res, DLM_LOCK_NL);
}
+/* A variation of dlm_lock_sync, which makes the lock request
+ * interruptible */
+static int dlm_lock_sync_interruptible(struct dlm_lock_resource *res, int mode,
+				       struct mddev *mddev)
+{
+	int ret = 0;
+
+	ret = dlm_lock(res->ls, mode, &res->lksb,
+			res->flags, res->name, strlen(res->name),
+			0, sync_ast, res, res->bast);
+	if (ret)
+		return ret;
+
+	wait_event(res->sync_locking, res->sync_locking_done
+				      || kthread_should_stop());
+	if (!res->sync_locking_done) {
+		/*
+		 * the convert queue contains the lock request when request is
+		 * interrupted, and sync_ast could still be run, so need to
+		 * cancel the request and reset completion
+		 */
+		ret = dlm_unlock(res->ls, res->lksb.sb_lkid, DLM_LKF_CANCEL, &res->lksb, res);
+		res->sync_locking_done = false;
+		if (unlikely(ret != 0))
+			pr_info("failed to cancel previous lock request "
+				 "%s return %d\n", res->name, ret);
+		return -EPERM;
+	} else
+		res->sync_locking_done = false;
+	if (res->lksb.sb_status == 0)
+		res->mode = mode;
+	return res->lksb.sb_status;
+}
+
static struct dlm_lock_resource *lockres_init(struct mddev *mddev,
char *name, void (*bastfn)(void *arg, int mode), int with_lvb)
{
@@ -276,7 +311,7 @@ static void recover_bitmaps(struct md_thread *thread)
goto clear_bit;
}
- ret = dlm_lock_sync(bm_lockres, DLM_LOCK_PW);
+ ret = dlm_lock_sync_interruptible(bm_lockres, DLM_LOCK_PW, mddev);
if (ret) {
pr_err("md-cluster: Could not DLM lock %s: %d\n",
str, ret);
--
2.6.2
Thread overview: 12+ messages
2016-08-12 5:42 [PATCH V2 00/10] The latest changes for md-cluster Guoqing Jiang
2016-08-12 5:42 ` [PATCH V2 01/10] md-cluster: call md_kick_rdev_from_array once ack failed Guoqing Jiang
2016-08-12 5:42 ` [PATCH V2 02/10] md-cluster: use FORCEUNLOCK in lockres_free Guoqing Jiang
2016-08-12 5:42 ` [PATCH V2 03/10] md-cluster: remove some unnecessary dlm_unlock_sync Guoqing Jiang
2016-08-12 5:42 ` [PATCH V2 04/10] md: changes for MD_STILL_CLOSED flag Guoqing Jiang
2016-08-12 5:42 ` [PATCH V2 05/10] md-cluster: clean related infos of cluster Guoqing Jiang
2016-08-12 5:42 ` [PATCH V2 06/10] md-cluster: protect md_find_rdev_nr_rcu with rcu lock Guoqing Jiang
2016-08-12 5:42 ` [PATCH V2 07/10] md: remove obsolete ret in md_start_sync Guoqing Jiang
2016-08-12 5:42 ` [PATCH V2 08/10] md-cluster: convert the completion to wait queue Guoqing Jiang
2016-08-12 5:42 ` Guoqing Jiang [this message]
2016-08-12 5:42 ` [PATCH V2 10/10] md-cluster: make resync lock also could be interruptted Guoqing Jiang
2016-08-17 1:49 ` [PATCH V2 00/10] The latest changes for md-cluster Shaohua Li