From: Xiao Ni <xni@redhat.com>
To: heming zhao <heming.zhao@suse.com>
Cc: linux-raid@vger.kernel.org, song@kernel.org,
guoqing jiang <guoqing.jiang@cloud.ionos.com>,
lidong zhong <lidong.zhong@suse.com>,
neilb@suse.de, colyli@suse.de
Subject: Re: [PATCH v2] md/cluster: fix deadlock when doing reshape job
Date: Thu, 12 Nov 2020 21:50:13 -0500 (EST)
Message-ID: <738037216.25735601.1605235813114.JavaMail.zimbra@redhat.com>
In-Reply-To: <d606f944-f66f-98f7-498d-f3939a395934@suse.com>
----- Original Message -----
> From: "heming zhao" <heming.zhao@suse.com>
> To: "Xiao Ni" <xni@redhat.com>, linux-raid@vger.kernel.org, song@kernel.org, "guoqing jiang"
> <guoqing.jiang@cloud.ionos.com>
> Cc: "lidong zhong" <lidong.zhong@suse.com>, neilb@suse.de, colyli@suse.de
> Sent: Thursday, November 12, 2020 7:27:46 PM
> Subject: Re: [PATCH v2] md/cluster: fix deadlock when doing reshape job
>
> Hello,
>
> On 11/12/20 1:08 PM, Xiao Ni wrote:
> >
> >
> > On 11/11/2020 11:51 PM, Zhao Heming wrote:
> >> There is a similar deadlock in commit 0ba959774e93
> >> ("md-cluster: use sync way to handle METADATA_UPDATED msg")
> >> The issue my patch fixes is very similar to commit 0ba959774e93, except
> >> for step <c>: there, step <c> is "update sb"; in my fix it is "mdadm --remove".
> >>
> >> ... ...
> >> + !test_and_set_bit(MD_CLUSTER_SEND_LOCK, &cinfo->state),
> >> + msecs_to_jiffies(5000));
> >> + if (rv)
> >> + return lock_token(cinfo, mddev_locked);
> >> + else
> >> + return -1;
> >> }
> > Hi Heming
> >
> > Can we handle this problem the same way as metadata_update_start?
> > lock_comm and metadata_update_start are the two users that want to take
> > the token lock, and lock_comm can do the same thing metadata_update_start
> > does. If so, process_recvd_msg can call md_reload_sb without waiting. All
> > threads can work well once the initiating node releases the token lock:
> > resync can send its message and clear MD_CLUSTER_SEND_LOCK, and then
> > lock_comm can continue. In this way, all threads complete without failure.
> >
> >
>
> Currently MD_CLUSTER_SEND_LOCKED_ALREADY is only used when adding a new
> disk (please refer to Documentation/driver-api/md/md-cluster.rst, section
> 5: "Adding a new Device"). During the ADD_NEW_DISK process, md-cluster
> blocks all other message sending until metadata_update_finish calls
> unlock_comm.
>
> With your idea, md-cluster would allow messages to be sent concurrently.
> I'm not very familiar with all the raid1 scenarios, but at the very least
> this breaks the rule for handling ADD_NEW_DISK: it would allow other
> messages to be sent during ADD_NEW_DISK.
Hi Heming
We don't need to change MD_CLUSTER_SEND_LOCKED_ALREADY. Is it OK to do something like this:
diff --git a/drivers/md/md-cluster.c b/drivers/md/md-cluster.c
index 4aaf482..f6f576b 100644
--- a/drivers/md/md-cluster.c
+++ b/drivers/md/md-cluster.c
@@ -664,29 +664,12 @@ static void recv_daemon(struct md_thread *thread)
* Takes the lock on the TOKEN lock resource so no other
* node can communicate while the operation is underway.
*/
-static int lock_token(struct md_cluster_info *cinfo, bool mddev_locked)
+static int lock_token(struct md_cluster_info *cinfo)
{
- int error, set_bit = 0;
+ int error;
struct mddev *mddev = cinfo->mddev;
- /*
- * If resync thread run after raid1d thread, then process_metadata_update
- * could not continue if raid1d held reconfig_mutex (and raid1d is blocked
- * since another node already got EX on Token and waitting the EX of Ack),
- * so let resync wake up thread in case flag is set.
- */
- if (mddev_locked && !test_bit(MD_CLUSTER_HOLDING_MUTEX_FOR_RECVD,
- &cinfo->state)) {
- error = test_and_set_bit_lock(MD_CLUSTER_HOLDING_MUTEX_FOR_RECVD,
- &cinfo->state);
- WARN_ON_ONCE(error);
- md_wakeup_thread(mddev->thread);
- set_bit = 1;
- }
error = dlm_lock_sync(cinfo->token_lockres, DLM_LOCK_EX);
- if (set_bit)
- clear_bit_unlock(MD_CLUSTER_HOLDING_MUTEX_FOR_RECVD, &cinfo->state);
-
if (error)
pr_err("md-cluster(%s:%d): failed to get EX on TOKEN (%d)\n",
__func__, __LINE__, error);
@@ -701,10 +684,30 @@ static int lock_token(struct md_cluster_info *cinfo, bool mddev_locked)
*/
static int lock_comm(struct md_cluster_info *cinfo, bool mddev_locked)
{
+ int ret, set_bit = 0;
+
+ /*
+ * If the resync thread runs after the raid1d thread, process_metadata_update
+ * cannot continue while raid1d holds reconfig_mutex (raid1d is blocked
+ * since another node already got EX on Token and is waiting for EX on Ack),
+ * so let resync wake up the thread in case the flag is set.
+ */
+ if (mddev_locked && !test_bit(MD_CLUSTER_HOLDING_MUTEX_FOR_RECVD,
+ &cinfo->state)) {
+ ret = test_and_set_bit_lock(MD_CLUSTER_HOLDING_MUTEX_FOR_RECVD,
+ &cinfo->state);
+ WARN_ON_ONCE(ret);
+ md_wakeup_thread(mddev->thread);
+ set_bit = 1;
+ }
+
wait_event(cinfo->wait,
!test_and_set_bit(MD_CLUSTER_SEND_LOCK, &cinfo->state));
- return lock_token(cinfo, mddev_locked);
+ ret = lock_token(cinfo);
+ if (ret && set_bit)
+ clear_bit_unlock(MD_CLUSTER_HOLDING_MUTEX_FOR_RECVD, &cinfo->state);
+ return ret;
}
static void unlock_comm(struct md_cluster_info *cinfo)
>
> >> static void unlock_comm(struct md_cluster_info *cinfo)
> >> @@ -784,9 +789,11 @@ static int sendmsg(struct md_cluster_info *cinfo,
> >> struct cluster_msg *cmsg,
> >> {
> >> int ret;
> >> - lock_comm(cinfo, mddev_locked);
> >> ... ...
> >> + if (mddev_is_clustered(mddev)) {
> >> + if (md_cluster_ops->remove_disk(mddev, rdev))
> >> + goto busy;
> >> + }
> >> md_kick_rdev_from_array(rdev);
> >> set_bit(MD_SB_CHANGE_DEVS, &mddev->sb_flags);
> > This code is not related to the deadlock problem. It's better to file a
> > separate patch to fix the missing return-value check.
> >
>
> In my opinion we should include this code in this patch. To completely fix
> the deadlock, the md layer should return an error to userspace.
>
> But prompted by your comments, I found other potential issues in
> md-cluster. There are still some cluster_ops APIs whose callers don't
> check the error return:
> .resync_finish - may cause other nodes to never clear MD_RESYNCING_REMOTE.
> .resync_info_update - this error can be safely ignored.
> .metadata_update_finish - may cause other nodes' in-kernel md info to be
> inconsistent with the on-disk metadata.
>
> Maybe I or others will fix them in the future.
>
> > Best Regards
> > Xiao
> >
>
>
Thread overview:
2020-11-11 15:51 [PATCH v2] md/cluster: fix deadlock when doing reshape job Zhao Heming
2020-11-12 5:08 ` Xiao Ni
2020-11-12 11:27 ` heming.zhao
2020-11-13 2:50 ` Xiao Ni [this message]
2020-11-13 13:23 ` heming.zhao