Date: Thu, 13 Dec 2012 11:04:40 +0100
From: Zdenek Kabelac
To: linux-lvm@redhat.com
Subject: Re: [linux-lvm] clvmd on cman waits forever holding the P_#global lock on node re-join
Message-ID: <50C9A838.3080709@redhat.com>
In-Reply-To: <50C90FC1.4060803@yahoo.co.uk>
References: <50C90FC1.4060803@yahoo.co.uk>

On 13.12.2012 00:14, Dmitry Panov wrote:
> Hi everyone,
>
> I've been testing clvm recently and noticed that operations are often
> blocked when a node rejoins the cluster after being fenced or power cycled.
> I've done some investigation and found a number of issues relating to clvm.
> Here is what's happening:
>
> - When a node is fenced, no "port closed" message is sent to clvmd, which
> means the node id remains in the updown hash, although the node itself is
> removed from the nodes list after a "configuration changed" message is
> received.
>
> - Then, when the node rejoins, another "configuration changed" message
> arrives, but because the node id is still in the hash, clvmd is assumed to
> be running on that node even though that might not be the case yet (in my
> case clvmd is a pacemaker resource, so it takes a couple of seconds before
> it is started).
>
> - This causes the expected_replies count to be set higher than it should
> be, and as a result enough replies are never received.
>
> - There is a problem with the handling of cmd_timeout which appears to have
> been fixed today (what a coincidence!) by this patch:
> https://www.redhat.com/archives/lvm-devel/2012-December/msg00024.html
> The reason I was hitting this bug is that I'm using Linux Cluster
> Management Console, which polls LVM often enough that the timeout code
> never ran. I had fixed this independently, and even though my efforts are
> now probably wasted, I'm attaching a patch for your consideration. I
> believe it enforces the timeout more strictly.
>
> Now, the questions:
>
> 1. If the problem with the stuck entry in the updown hash is fixed, it will
> cause operations to fail until clvmd is started on the re-joined node. Is
> there any particular reason for making them fail? Is it to avoid a race
> condition where a newly started clvmd might not receive a message generated
> by an 'old' node?
>
> 2. The current expected_replies counter seems a bit flawed to me because it
> will fail if a node leaves the cluster before it sends a reply. Should it
> be handled differently? For example, instead of a simple counter we could
> have a list of nodes that is updated when a node leaves the cluster.
>

Hmmm, this rather looks like a logical problem: either in the if() expression
in the (select_status == 0) branch, or the somewhat 'magical' gulm fix
applied in 2005 for add_to_lvmqueue() should be running not only when a
message arrives.

Both patches seem not to fix the bug, but rather to work around the broken
logic in the main loop - it will need some thinking.

Zdenek
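
For illustration, here is a minimal, self-contained sketch of the per-node
reply tracking Dmitry suggests in question 2: a list of nodes still owing a
reply, which shrinks both when a reply arrives and when the membership layer
reports the node gone. All names are hypothetical and this is plain C, not
clvmd's actual data structures or API; it only shows the idea, under the
assumption that the membership callback can call the same removal path as the
reply handler.

/* Sketch only: track outstanding replies per node instead of a bare
 * expected_replies counter, so a node that leaves before replying simply
 * drops off the list instead of leaving the command waiting forever. */
#include <stdio.h>
#include <stdlib.h>

struct pending_node {
        int nodeid;
        struct pending_node *next;
};

struct pending_cmd {
        struct pending_node *waiting;   /* nodes we still expect a reply from */
};

/* Called once per cluster member when the command is sent out. */
static void expect_reply(struct pending_cmd *cmd, int nodeid)
{
        struct pending_node *n = malloc(sizeof(*n));
        if (!n)
                abort();
        n->nodeid = nodeid;
        n->next = cmd->waiting;
        cmd->waiting = n;
}

/* Called when a reply arrives, or when the membership callback reports that
 * the node has left the cluster. Returns 1 if the node was still pending. */
static int drop_pending(struct pending_cmd *cmd, int nodeid)
{
        struct pending_node **pp = &cmd->waiting;

        while (*pp) {
                if ((*pp)->nodeid == nodeid) {
                        struct pending_node *dead = *pp;
                        *pp = dead->next;
                        free(dead);
                        return 1;
                }
                pp = &(*pp)->next;
        }
        return 0;
}

/* The command is complete as soon as nobody is left on the list, so a fenced
 * node can never leave the reply count permanently short. */
static int all_replies_in(const struct pending_cmd *cmd)
{
        return cmd->waiting == NULL;
}

int main(void)
{
        struct pending_cmd cmd = { NULL };

        expect_reply(&cmd, 1);
        expect_reply(&cmd, 2);
        expect_reply(&cmd, 3);

        drop_pending(&cmd, 1);          /* reply from node 1 */
        drop_pending(&cmd, 3);          /* node 3 fenced before replying */
        drop_pending(&cmd, 2);          /* reply from node 2 */

        printf("command complete: %s\n", all_replies_in(&cmd) ? "yes" : "no");
        return 0;
}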