Date: Thu, 13 Dec 2012 11:04:40 +0100
From: Zdenek Kabelac
To: linux-lvm@redhat.com
Subject: Re: [linux-lvm] clvmd on cman waits forever holding the P_#global lock on node re-join
Message-ID: <50C9A838.3080709@redhat.com>
In-Reply-To: <50C90FC1.4060803@yahoo.co.uk>
References: <50C90FC1.4060803@yahoo.co.uk>

On 13.12.2012 00:14, Dmitry Panov wrote:
> Hi everyone,
>
> I've been testing clvm recently and noticed that operations are often
> blocked when a node rejoins the cluster after being fenced or power cycled.
> I've done some investigation and found a number of issues relating to clvm.
> Here is what's happening:
>
> - When a node is fenced, no "port closed" message is sent to clvmd, which
> means the node id remains in the updown hash, although the node itself is
> removed from the nodes list after a "configuration changed" message is
> received.
>
> - Then, when the node rejoins, another "configuration changed" message
> arrives, but because the node id is still in the hash, clvmd is assumed to
> be running on that node even though that might not be the case yet (in my
> case clvmd is a pacemaker resource, so it takes a couple of seconds before
> it is started).
>
> - This causes the expected_replies count to be set higher than it should
> be, and as a result enough replies are never received.
>
> - There is a problem with the handling of cmd_timeout which appears to have
> been fixed today (what a coincidence!) by this patch:
> https://www.redhat.com/archives/lvm-devel/2012-December/msg00024.html
> The reason I was hitting this bug is that I'm using Linux Cluster
> Management Console, which polls LVM often enough that the timeout code
> never ran. I had fixed this independently, and even though my efforts are
> now probably wasted, I'm attaching a patch for your consideration. I
> believe it enforces the timeout more strictly.
>
> Now, the questions:
>
> 1. If the problem with the stuck entry in the updown hash is fixed, it will
> cause operations to fail until clvmd is started on the re-joined node. Is
> there any particular reason for making them fail? Is it to avoid a race
> condition where a newly started clvmd might not receive a message generated
> by an 'old' node?
>
> 2. The current expected_replies counter seems a bit flawed to me because it
> will fail if a node leaves the cluster before it sends a reply. Should it
> be handled differently? For example, instead of a simple counter we could
> have a list of nodes that is updated when a node leaves the cluster.
>

Hmmm, this rather looks like a logical problem: either in the if() expression
in the (select_status == 0) branch, or the somewhat 'magical' gulm fix
applied in 2005 for add_to_lvmqueue() should be running not only when a
message arrives.

Both patches seem not to fix the bug, but rather to work around the broken
logic in the main loop - it will need some thinking.

Zdenek
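
For illustration, here is a minimal, self-contained sketch of the per-node
reply tracking Dmitry suggests in question 2: a list of nodes still owing a
reply, which shrinks both when a reply arrives and when the membership layer
reports the node gone. All names are hypothetical and this is plain C, not
clvmd's actual data structures or API; it only shows the idea, under the
assumption that the membership callback can call the same removal path as the
reply handler.

/* Sketch only: track outstanding replies per node instead of a bare
 * expected_replies counter, so a node that leaves before replying simply
 * drops off the list instead of leaving the command waiting forever. */
#include <stdio.h>
#include <stdlib.h>

struct pending_node {
        int nodeid;
        struct pending_node *next;
};

struct pending_cmd {
        struct pending_node *waiting;   /* nodes we still expect a reply from */
};

/* Called once per cluster member when the command is sent out. */
static void expect_reply(struct pending_cmd *cmd, int nodeid)
{
        struct pending_node *n = malloc(sizeof(*n));
        if (!n)
                abort();
        n->nodeid = nodeid;
        n->next = cmd->waiting;
        cmd->waiting = n;
}

/* Called when a reply arrives, or when the membership callback reports that
 * the node has left the cluster. Returns 1 if the node was still pending. */
static int drop_pending(struct pending_cmd *cmd, int nodeid)
{
        struct pending_node **pp = &cmd->waiting;

        while (*pp) {
                if ((*pp)->nodeid == nodeid) {
                        struct pending_node *dead = *pp;
                        *pp = dead->next;
                        free(dead);
                        return 1;
                }
                pp = &(*pp)->next;
        }
        return 0;
}

/* The command is complete as soon as nobody is left on the list, so a fenced
 * node can never leave the reply count permanently short. */
static int all_replies_in(const struct pending_cmd *cmd)
{
        return cmd->waiting == NULL;
}

int main(void)
{
        struct pending_cmd cmd = { NULL };

        expect_reply(&cmd, 1);
        expect_reply(&cmd, 2);
        expect_reply(&cmd, 3);

        drop_pending(&cmd, 1);          /* reply from node 1 */
        drop_pending(&cmd, 3);          /* node 3 fenced before replying */
        drop_pending(&cmd, 2);          /* reply from node 2 */

        printf("command complete: %s\n", all_replies_in(&cmd) ? "yes" : "no");
        return 0;
}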