From: Xiaowei <xiaowei.hu@oracle.com>
To: ocfs2-devel@oss.oracle.com
Subject: [Ocfs2-devel] [PATCH] Fix waiting status race condition in dlm recovery
Date: Wed, 30 May 2012 08:41:09 +0800
Message-ID: <4FC56CA5.8040902@oracle.com>
In-Reply-To: <CAEeiSHXcaKXi7Qm5vLBmTp2CjiB7DCrUee5qmr03YpuJbzP5yg@mail.gmail.com>
On 05/30/2012 06:09 AM, Sunil Mushran wrote:
> On Thu, May 24, 2012 at 10:53 PM, <xiaowei.hu@oracle.com> wrote:
>
> diff --git a/fs/ocfs2/dlm/dlmrecovery.c b/fs/ocfs2/dlm/dlmrecovery.c
> index 01ebfd0..62659e8 100644
> --- a/fs/ocfs2/dlm/dlmrecovery.c
> +++ b/fs/ocfs2/dlm/dlmrecovery.c
> @@ -555,6 +555,7 @@ static int dlm_remaster_locks(struct dlm_ctxt *dlm, u8 dead_node)
>  	int all_nodes_done;
>  	int destroy = 0;
>  	int pass = 0;
> +	int dying = 0;
>
>  	do {
>  		/* we have become recovery master. there is no escaping
> @@ -659,6 +660,7 @@ static int dlm_remaster_locks(struct dlm_ctxt *dlm, u8 dead_node)
>  		list_for_each_entry(ndata, &dlm->reco.node_data, list) {
>  			mlog(0, "checking recovery state of node %u\n",
>  			     ndata->node_num);
> +			dying = 0;
>  			switch (ndata->state) {
>  				case DLM_RECO_NODE_DATA_INIT:
>  				case DLM_RECO_NODE_DATA_REQUESTING:
> @@ -679,6 +681,13 @@ static int dlm_remaster_locks(struct dlm_ctxt *dlm, u8 dead_node)
>  					     dlm->name,
>  					     ndata->node_num,
>  					     ndata->state==DLM_RECO_NODE_DATA_RECEIVING ? "receiving" : "requested");
> +					spin_lock(&dlm->spinlock);
> +					dying = !test_bit(ndata->node_num, dlm->live_nodes_map);
> +					spin_unlock(&dlm->spinlock);
> +					if (dying) {
> +						ndata->state = DLM_RECO_NODE_DATA_DEAD;
> +						break;
> +					}
>
> I would suggest exploring adding this in dlm hb down event. Checking
> live map all over the place is hacky. We do it more than we should
> right now. Let's not add to the mess.
Hi Sunil,

Do you mean we should clear the node's bit in the domain map directly in
the dlm hb down event when the node goes down, and then check with
dlm_is_node_dead() here?

Otherwise, how can we make sure the node stays alive during the whole
migration process? A node could die after it sends out one lock package
and before it sends the next, if there are too many locks on that lockres.

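To make the first question concrete, here is a minimal user-space sketch of that alternative. It is not kernel code and not the actual ocfs2 implementation: every `model_*` name is hypothetical, a pthread mutex stands in for dlm->spinlock, and a plain bool array stands in for the domain map. The idea it illustrates is that the heartbeat-down handler records the death in one place, so the polling loop in the recovery master only reads the per-node state and does not consult the live map itself.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define MODEL_MAX_NODES 4

/* Loosely mirrors the DLM_RECO_NODE_DATA_* states quoted in the patch. */
enum model_reco_state {
	MODEL_RECO_REQUESTED,
	MODEL_RECO_RECEIVING,
	MODEL_RECO_DEAD,
	MODEL_RECO_DONE,
};

/* Hypothetical stand-in for the parts of struct dlm_ctxt used here. */
struct model_dlm {
	pthread_mutex_t lock;                  /* stands in for dlm->spinlock */
	bool domain_map[MODEL_MAX_NODES];      /* true while the node is in the domain */
	enum model_reco_state reco[MODEL_MAX_NODES];
};

/* Every node starts in the domain with its lock state requested. */
void model_dlm_init(struct model_dlm *dlm)
{
	pthread_mutex_init(&dlm->lock, NULL);
	for (int i = 0; i < MODEL_MAX_NODES; i++) {
		dlm->domain_map[i] = true;
		dlm->reco[i] = MODEL_RECO_REQUESTED;
	}
}

/* Hypothetical heartbeat-down handler: the single place a death is recorded. */
void model_hb_node_down(struct model_dlm *dlm, int node)
{
	assert(node >= 0 && node < MODEL_MAX_NODES);
	pthread_mutex_lock(&dlm->lock);
	dlm->domain_map[node] = false;
	/* Like the patch, only touch nodes still in requested/receiving. */
	if (dlm->reco[node] == MODEL_RECO_REQUESTED ||
	    dlm->reco[node] == MODEL_RECO_RECEIVING)
		dlm->reco[node] = MODEL_RECO_DEAD;
	pthread_mutex_unlock(&dlm->lock);
}

/* Models a dlm_is_node_dead()-style query against the domain map. */
bool model_is_node_dead(struct model_dlm *dlm, int node)
{
	bool dead;

	assert(node >= 0 && node < MODEL_MAX_NODES);
	pthread_mutex_lock(&dlm->lock);
	dead = !dlm->domain_map[node];
	pthread_mutex_unlock(&dlm->lock);
	return dead;
}

/* One pass of the polling loop: true when no node is still pending. */
bool model_all_nodes_done(struct model_dlm *dlm)
{
	bool done = true;

	pthread_mutex_lock(&dlm->lock);
	for (int i = 0; i < MODEL_MAX_NODES; i++) {
		switch (dlm->reco[i]) {
		case MODEL_RECO_REQUESTED:
		case MODEL_RECO_RECEIVING:
			done = false;   /* still waiting for this node */
			break;
		case MODEL_RECO_DEAD:
		case MODEL_RECO_DONE:
			break;          /* nothing more to wait for */
		}
	}
	pthread_mutex_unlock(&dlm->lock);
	return done;
}
```

In this model a node that dies between two lock packages is caught the next time the loop runs, because the handler has already moved it from "receiving" to "dead". Whether the real hb down path can safely update the recovery node data this way is exactly the open question.
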
Thanks,
Xiaowei
>
>  					all_nodes_done = 0;
>  					break;
>  				case DLM_RECO_NODE_DATA_DONE:
> --
> 1.7.7.6
>
>
> _______________________________________________
> Ocfs2-devel mailing list
> Ocfs2-devel@oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/ocfs2-devel
>
>