From: Sunil Mushran <sunil.mushran@oracle.com>
To: ocfs2-devel@oss.oracle.com
Subject: [Ocfs2-devel] [PATCH] ocfs2/dlm: correct the refmap on recovery master
Date: Tue, 20 Jul 2010 15:33:17 -0700
Message-ID: <4C46242D.3060305@oracle.com>
In-Reply-To: <20100720025948.GB2936@laptop.cn.oracle.com>
On 07/19/2010 07:59 PM, Wengang Wang wrote:
>> Do you have the message sequencing that would lead to this situation?
>> If we migrate the lockres to the reco master, the reco master will send
>> an assert that will make us change the master.
>>
> So first, the problem is not about the change of owner. It is that the bit
> (in the refmap) for the node in question is not cleared on the new master
> (the recovery master), so the new master fails to purge the lockres
> because of the incorrect bit in the refmap.
>
> Second, I have no messages at hand for the situation. But I think it is simple
> enough.
>
> 1) Node A has no interest in lockres A any longer, so it is purging it.
> 2) The owner of lockres A is node B, so node A sends a deref message
> to node B.
> 3) At this time, node B crashes. Node C becomes the recovery master and
> recovers lockres A (because its master was the dead node B).
> 4) Node A migrates lockres A to node C, which sets a refbit for node A.
> 5) Node A fails to send the deref message to node B because B crashed. The
> failure is ignored, and no further action is taken for lockres A.
>
In dlm_do_local_recovery_cleanup(), we explicitly clear the flag...
when the owner is the dead_node. So this should not happen.
Your patch changes the logic to exclude such lockres from the
recovery list. That change, while possibly workable, needs
to be looked into more thoroughly.
In short, there is a disconnect between your description and your patch.
Or with my understanding of it.
> So node A means to drop its ref on the master. But in this situation, node C
> unexpectedly keeps the ref on behalf of node A. Node C eventually fails to
> purge lockres A and hangs on umount.
>
>
>> I think your problem is the one race we have concerning reco/migration.
>> If so, this fix is not enough.
>>
> It's a problem of purging + recovery; there is no pure migration for umount here.
> So what's your concern?
>
See above.
Thread overview: 11+ messages
2010-06-10 16:25 [Ocfs2-devel] [PATCH] ocfs2/dlm: correct the refmap on recovery master Wengang Wang
2010-06-25 1:55 ` Wengang Wang
2010-07-05 10:00 ` Wengang Wang
2010-07-19 10:09 ` Wengang Wang
2010-07-19 23:52 ` Sunil Mushran
2010-07-20 2:59 ` Wengang Wang
2010-07-20 22:33 ` Sunil Mushran [this message]
2010-07-21 12:22 ` Wengang Wang
2010-07-21 18:19 ` Sunil Mushran
2010-07-22 10:51 ` Wengang Wang
2010-07-22 16:58 ` Sunil Mushran