From: Xiaowei <xiaowei.hu@oracle.com>
To: ocfs2-devel@oss.oracle.com
Subject: [Ocfs2-devel] [PATCH] Fix waiting status race condition in dlm recovery
Date: Thu, 26 Jul 2012 14:52:17 +0800
Message-ID: <5010E921.40808@oracle.com>
In-Reply-To: <CAEeiSHWkhD8x8nrix2+Wc1nesH8CExU6kA10nCH0J1nCwUaDtg@mail.gmail.com>

Hi Sunil,

I considered your suggestion for this patch. It is possible to change
the status in the dlm hb down event, but what would need to change are
the dlm_reco_node_data structures on the dlm->reco.node_data list.
This list is initialized in dlm_remaster_locks when it begins the lock
remaster, and it is destroyed before that function exits.
So it does not seem proper to check data in such a list from the dlm hb
down event, am I right?
Changing the status from the dlm hb down event would also make the
recovery thread rely on more information from that event. As it is, the
event already updates dlm->live_nodes_map, which is there for others to
check, right?
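
To illustrate the lifetime concern, here is a minimal user-space sketch.
It is not the ocfs2 code: every model_* name is made up for illustration,
and a fixed array stands in for the dlm->reco.node_data list. It only
shows that a handler running outside the remaster window finds nothing
to update.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_NODES 8

/* Hypothetical per-node recovery states for this model only. */
enum model_reco_state {
    MODEL_RECO_UNTRACKED,
    MODEL_RECO_WAITING,
    MODEL_RECO_DEAD
};

/* Stand-in for the transient per-node data built during a remaster. */
struct model_reco {
    bool list_live; /* true only between begin and end */
    enum model_reco_state state[MODEL_MAX_NODES];
};

/* Build the per-node entries, as the remaster function does on entry. */
void model_remaster_begin(struct model_reco *r, const unsigned *nodes, size_t n)
{
    for (size_t i = 0; i < MODEL_MAX_NODES; i++)
        r->state[i] = MODEL_RECO_UNTRACKED;
    for (size_t i = 0; i < n; i++) {
        assert(nodes[i] < MODEL_MAX_NODES);
        r->state[nodes[i]] = MODEL_RECO_WAITING;
    }
    r->list_live = true;
}

/* Tear the entries down, as the remaster function does before returning. */
void model_remaster_end(struct model_reco *r)
{
    for (size_t i = 0; i < MODEL_MAX_NODES; i++)
        r->state[i] = MODEL_RECO_UNTRACKED;
    r->list_live = false;
}

/* Heartbeat-down handler: returns true only if an entry was updated. */
bool model_hb_down_mark_dead(struct model_reco *r, unsigned node)
{
    assert(node < MODEL_MAX_NODES);
    if (!r->list_live || r->state[node] == MODEL_RECO_UNTRACKED)
        return false; /* no list, or node not part of this remaster */
    r->state[node] = MODEL_RECO_DEAD;
    return true;
}
```

In this model the handler succeeds only while a remaster is in progress,
so it would have to know about the list's lifetime (and, in real code,
its locking) to be safe.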

This race condition only happens when the cluster is already in recovery
and a node dies during that recovery. The recovery thread blocks the
update of dlm->domain_map, so I fall back to checking live_nodes_map,
which is not blocked.
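
A rough user-space model of that fallback, assuming two 32-bit bitmaps
in place of the real maps. The in_recovery flag and all model_* names are
assumptions made for this sketch; this is not the patch itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Two node bitmaps standing in for domain_map and live_nodes_map. */
struct model_dlm {
    uint32_t domain_map;
    uint32_t live_nodes_map;
};

/* Return true if the bit for this node is set in the map. */
bool model_test_node(uint32_t map, unsigned node)
{
    assert(node < 32);
    return (map >> node) & 1u;
}

/*
 * Heartbeat-down event: the live map is always updated, but in this
 * model the domain map update is held back while recovery is running.
 */
void model_hb_down(struct model_dlm *d, unsigned node, bool in_recovery)
{
    assert(node < 32);
    d->live_nodes_map &= ~(UINT32_C(1) << node);
    if (!in_recovery)
        d->domain_map &= ~(UINT32_C(1) << node);
}

/* Check that trusts the domain map only; can miss a death in recovery. */
bool model_node_dead_domain_only(const struct model_dlm *d, unsigned node)
{
    return !model_test_node(d->domain_map, node);
}

/* Check that also consults the live map, as the fallback does. */
bool model_node_dead_with_fallback(const struct model_dlm *d, unsigned node)
{
    return !model_test_node(d->domain_map, node) ||
           !model_test_node(d->live_nodes_map, node);
}
```

With nodes 0 to 2 in both maps and node 2 dying during recovery, the
domain-only check still reports node 2 as alive, while the check with
the fallback reports it as dead.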

Please reconsider this patch.

Thanks,
Xiaowei

On 05/31/2012 09:18 AM, Sunil Mushran wrote:
> On Tue, May 29, 2012 at 5:41 PM, Xiaowei <xiaowei.hu@oracle.com> wrote:
>> On 05/30/2012 06:09 AM, Sunil Mushran wrote:
>> I would suggest exploring adding this in dlm hb down event. Checking live
>> map all
>> over the place is hacky. We do it more than we should right now. Let's not
>> add to the
>> mess.
>>
>> Hi Sunil,
>>
>> Do you mean we should clear the bit in the domain map directly in the
>> dlm hb down event when the node goes down,
>> and check with dlm_is_node_dead here?
>> Or how could we ensure the node is alive during the whole
>> migration process? A node could die even after it sends out one package
>> of locks and before the next, if there are too many locks on that lockres.
> dlm hb down event is triggered when a node is declared dead. That's where we
> clean up pending mles, etc. You can add a check for recovery and add logic to
> change the reco state for that node there.
