linux-lvm.redhat.com archive mirror
From: Zdenek Kabelac <zkabelac@redhat.com>
To: linux-lvm@redhat.com
Subject: Re: [linux-lvm] clvmd on cman waits forever holding the P_#global lock on node re-join
Date: Thu, 13 Dec 2012 11:04:40 +0100
Message-ID: <50C9A838.3080709@redhat.com>
In-Reply-To: <50C90FC1.4060803@yahoo.co.uk>

On 13.12.2012 00:14, Dmitry Panov wrote:
> Hi everyone,
>
> I've been testing clvm recently and noticed that often the operations are
> blocked when a node rejoins the cluster after being fenced or power cycled.
> I've done some investigation and found a number of issues relating to clvm.
> Here is what's happening:
>
>
> - When a node is fenced there is no "port closed" message sent to clvm which
> means the node id remains in the updown hash, although the node itself is
> removed from the nodes list after a "configuration changed" message is received.
>
> - Then, when the node rejoins, another "configuration changed" message arrives
> but because the node id is still in the hash, it is assumed that clvmd on that
> node is running even though it might not be the case yet (in my case clvmd is
> a pacemaker resource so it takes a couple of seconds before it's started).
>
> - This causes the expected_replies count to be set higher than it should be,
> and as a result enough replies are never received.
>
> - There is a problem with the handling of cmd_timeout, which appears to have
> been fixed today (what a coincidence!) by this patch:
> https://www.redhat.com/archives/lvm-devel/2012-December/msg00024.html
> I was hitting this bug because I'm using the Linux Cluster Management Console,
> which polls LVM often enough that the timeout code never ran. I had fixed this
> independently, and even though my efforts are now probably wasted, I'm
> attaching a patch for your consideration. I believe it enforces the timeout
> more strictly.
>
> Now, the questions:
>
> 1. If the problem with the stuck entry in the updown hash is fixed, it will cause
> operations to fail until clvmd is started on the re-joined node. Is there any
> particular reason for making them fail? Is it to avoid a race condition where a
> newly started clvmd might not receive a message generated by an 'old' node?
>
> 2. The current expected_replies counter seems a bit flawed to me because it
> will fail if a node leaves the cluster before it sends a reply. Should it be
> handled differently? For example instead of a simple counter we could have a
> list of nodes which should be updated when a node leaves the cluster.
>

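To make the suggestion in question 2 of the quoted message concrete, here is a
minimal sketch contrasting a plain reply counter with a set of pending nodes.
It is a simplified model in Python, not clvmd code: the class names, the
on_reply/on_node_left hooks, and the node ids are all hypothetical and chosen
only for illustration.

```python
class CounterTracker:
    """Hypothetical model of a bare expected_replies counter."""

    def __init__(self, nodes):
        self.expected_replies = len(nodes)
        self.received = 0

    def on_reply(self, node_id):
        self.received += 1

    def on_node_left(self, node_id):
        # A bare counter does not know whether this node already replied,
        # so it cannot be adjusted safely here: decrementing could end the
        # wait early if the node had in fact replied before leaving.
        pass

    def done(self):
        return self.received >= self.expected_replies


class PendingSetTracker:
    """Hypothetical alternative: remember which nodes still owe a reply."""

    def __init__(self, nodes):
        self.pending = set(nodes)

    def on_reply(self, node_id):
        self.pending.discard(node_id)

    def on_node_left(self, node_id):
        # A departed node can no longer reply, so stop waiting for it.
        self.pending.discard(node_id)

    def done(self):
        return not self.pending


def run(tracker):
    """Nodes 1 and 2 reply; node 3 leaves the cluster without replying."""
    tracker.on_reply(1)
    tracker.on_reply(2)
    tracker.on_node_left(3)
    return tracker.done()


counter_done = run(CounterTracker([1, 2, 3]))
set_done = run(PendingSetTracker([1, 2, 3]))
print(counter_done, set_done)  # False True
```

In this model the counter waits forever for the third reply, while the set
completes as soon as the departed node is dropped from it. Whether a real
implementation should treat a departed node as "done" or as a failure is a
separate design decision that this sketch does not settle.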

Hmm, this looks more like a logic problem: either in the if() expression in the
(select_status == 0) branch, or the somewhat 'magical' gulm fix applied in 2005
for add_to_lvmqueue() should be running at other times too, not just when a
message arrives.

Neither patch seems to fix the bug itself; both rather try to work around
broken logic in the main loop. It will need some thinking.

Zdenek
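
As an illustration of the timeout symptom discussed in this thread, the sketch
below models a main loop in which the command timeout is only examined when
select() itself times out (select_status == 0). It is a toy simulation in
Python written under that assumption, not the actual clvmd main loop; the
function name, the event list, and the check_every_iteration switch are
hypothetical.

```python
def run_loop(events, cmd_timeout, check_every_iteration):
    """Return the time at which the command times out, or None.

    events is a list of (time, kind) pairs, where kind is "activity"
    (select returned because a descriptor was ready) or "idle"
    (select returned 0 because its own timeout expired).
    """
    started = 0  # the command is assumed to have been issued at t=0
    for now, kind in events:
        select_status = 0 if kind == "idle" else 1
        # Assumed structure: the timeout test is reached only in the
        # idle branch unless check_every_iteration is set.
        if check_every_iteration or select_status == 0:
            if now - started > cmd_timeout:
                return now
    return None


# A client polls once per second, so select never goes idle.
busy = [(t, "activity") for t in range(1, 11)]

idle_only = run_loop(busy, cmd_timeout=5, check_every_iteration=False)
always = run_loop(busy, cmd_timeout=5, check_every_iteration=True)
print(idle_only, always)  # None 6
```

With constant activity the idle-only check never runs, which matches the
reported behaviour of the timeout code never being reached under frequent
polling. Checking on every iteration is only one possible remedy and is not
necessarily the right fix for the main loop.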


Thread overview: 6+ messages
2012-12-12 23:14 [linux-lvm] clvmd on cman waits forever holding the P_#global lock on node re-join Dmitry Panov
2012-12-13 10:04 ` Zdenek Kabelac [this message]
2012-12-13 11:07   ` Dmitry Panov
2012-12-14  7:10   ` Jacek Konieczny
2012-12-14 10:45     ` Dmitry Panov
2012-12-14  7:14 ` Jacek Konieczny
