* [linux-lvm] clvmd on cman waits forever holding the P_#global lock on node re-join
@ 2012-12-12 23:14 Dmitry Panov
2012-12-13 10:04 ` Zdenek Kabelac
2012-12-14 7:14 ` Jacek Konieczny
0 siblings, 2 replies; 6+ messages in thread
From: Dmitry Panov @ 2012-12-12 23:14 UTC (permalink / raw)
To: linux-lvm
[-- Attachment #1: Type: text/plain, Size: 2174 bytes --]
Hi everyone,
I've been testing clvm recently and noticed that operations are often
blocked when a node rejoins the cluster after being fenced or power
cycled. I've done some investigation and found a number of issues
relating to clvm. Here is what's happening:
- When a node is fenced, no "port closed" message is sent to clvmd,
which means the node id remains in the updown hash, although the node
itself is removed from the nodes list after a "configuration changed"
message is received.
- Then, when the node rejoins, another "configuration changed" message
arrives, but because the node id is still in the hash, it is assumed that
clvmd on that node is running, even though that might not be the case yet
(in my case clvmd is a pacemaker resource, so it takes a couple of
seconds before it's started).
- This causes the expected_replies count to be set to a higher number
than it should be, and as a result there are never enough replies received.
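The chain above can be illustrated with a sketch (all names here are hypothetical, not the actual clvmd code): if the expected reply count is derived from the updown hash, a stale entry for a fenced node inflates the count, so the reply tally can never reach it:

```c
#include <assert.h>

/* Hypothetical sketch: expected_replies derived from the updown hash.
 * A fenced node never sends "port closed", so its entry stays in the
 * hash and it is still counted when it rejoins, even though its clvmd
 * is not up yet and will not reply. */
struct member {
    int nodeid;
    int in_updown_hash;  /* 1 if we believe clvmd runs on this node */
};

int count_expected_replies(const struct member *members, int n)
{
    int expected = 0;
    for (int i = 0; i < n; i++)
        if (members[i].in_updown_hash)
            expected++;
    return expected;
}
```

With three members, one of which is a freshly rejoined node whose stale hash entry survived fencing, the count comes out as 3 while only 2 replies can ever arrive.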
- There is a problem with the handling of cmd_timeout which appears to
be fixed today (what a coincidence!) by this patch:
https://www.redhat.com/archives/lvm-devel/2012-December/msg00024.html
The reason I was hitting this bug is that I'm using the Linux Cluster
Management Console, which polls LVM often enough that the timeout code
never ran. I have
fixed this independently, and even though my efforts are now probably
wasted, I'm attaching a patch for your consideration. I believe it
enforces the timeout more strictly.
Now, the questions:
1. If the problem with the stuck entry in the updown hash is fixed, it
will cause operations to fail until clvmd is started on the re-joined
node. Is there any particular reason for making them fail? Is it to
avoid a race condition where a newly started clvmd might not receive a
message generated by an 'old' node?
2. The current expected_replies counter seems a bit flawed to me because
it will fail if a node leaves the cluster before it sends a reply.
Should it be handled differently? For example, instead of a simple
counter we could keep a list of nodes, updated whenever a node leaves
the cluster.
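A rough sketch of that alternative (hypothetical names, not the clvmd source): keep the set of node ids still expected to reply, and remove an entry both when a reply arrives and when the node leaves the cluster, so a departed node can no longer stall a command:

```c
#include <assert.h>

/* Hypothetical per-command state: the node ids still expected to
 * reply, instead of a bare expected_replies counter. */
struct pending_cmd {
    int waiting[16];   /* node ids we still await a reply from */
    int n_waiting;
};

/* Remove a node id from the waiting set.  Called both when a reply
 * arrives from that node and when the membership layer reports the
 * node has left the cluster.  Returns 1 if the id was present. */
int drop_waiting_node(struct pending_cmd *cmd, int nodeid)
{
    for (int i = 0; i < cmd->n_waiting; i++) {
        if (cmd->waiting[i] == nodeid) {
            /* swap-remove: overwrite with the last entry */
            cmd->waiting[i] = cmd->waiting[--cmd->n_waiting];
            return 1;
        }
    }
    return 0;
}

int all_replies_in(const struct pending_cmd *cmd)
{
    return cmd->n_waiting == 0;
}
```

A command issued to nodes {1, 2, 3} then completes even if node 3 is fenced before replying, because the leave event shrinks the waiting set instead of leaving the counter short forever.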
Best regards,
--
Dmitry Panov
[-- Attachment #2: clvmd_timeout.patch --]
[-- Type: text/x-patch, Size: 3278 bytes --]
--- clvmd.c.orig 2012-03-01 21:14:43.000000000 +0000
+++ clvmd.c 2012-12-12 22:57:27.917901181 +0000
@@ -838,30 +838,33 @@
sigaddset(&ss, SIGTERM);
pthread_sigmask(SIG_UNBLOCK, &ss, NULL);
/* Main loop */
+ time_t deadline = time(NULL) + cmd_timeout;
while (!quit) {
fd_set in;
int select_status;
struct local_client *thisfd;
- struct timeval tv = { cmd_timeout, 0 };
- int quorate = clops->is_quorate();
-
- /* Wait on the cluster FD and all local sockets/pipes */
- local_client_head.fd = clops->get_main_cluster_fd();
- FD_ZERO(&in);
- for (thisfd = &local_client_head; thisfd != NULL;
- thisfd = thisfd->next) {
-
- if (thisfd->removeme)
- continue;
-
- /* if the cluster is not quorate then don't listen for new requests */
- if ((thisfd->type != LOCAL_RENDEZVOUS &&
- thisfd->type != LOCAL_SOCK) || quorate)
- FD_SET(thisfd->fd, &in);
+ struct timeval tv = { deadline - time(NULL), 0 };
+ if (tv.tv_sec > 0) {
+ /* Wait on the cluster FD and all local sockets/pipes */
+ int quorate = clops->is_quorate();
+ local_client_head.fd = clops->get_main_cluster_fd();
+ FD_ZERO(&in);
+ for (thisfd = &local_client_head; thisfd != NULL;
+ thisfd = thisfd->next) {
+
+ if (thisfd->removeme)
+ continue;
+
+ /* if the cluster is not quorate then don't listen for new requests */
+ if ((thisfd->type != LOCAL_RENDEZVOUS &&
+ thisfd->type != LOCAL_SOCK) || quorate)
+ FD_SET(thisfd->fd, &in);
+ }
+
+ select_status = select(FD_SETSIZE, &in, NULL, NULL, &tv);
+ } else {
+ select_status = 0;
}
-
- select_status = select(FD_SETSIZE, &in, NULL, NULL, &tv);
-
if (reread_config) {
int saved_errno = errno;
@@ -936,28 +939,34 @@
}
}
- /* Select timed out. Check for clients that have been waiting too long for a response */
- if (select_status == 0) {
+ /* Check for clients that have been waiting too long for a response and set a new wake-up deadline */
+ if (select_status >= 0) {
time_t the_time = time(NULL);
+ deadline = the_time + cmd_timeout;
for (thisfd = &local_client_head; thisfd != NULL;
thisfd = thisfd->next) {
if (thisfd->type == LOCAL_SOCK
&& thisfd->bits.localsock.sent_out
- && thisfd->bits.localsock.sent_time +
- cmd_timeout < the_time
&& thisfd->bits.localsock.
expected_replies !=
thisfd->bits.localsock.num_replies) {
- /* Send timed out message + replies we already have */
- DEBUGLOG
- ("Request timed-out (send: %ld, now: %ld)\n",
- thisfd->bits.localsock.sent_time,
- the_time);
-
- thisfd->bits.localsock.all_success = 0;
-
- request_timed_out(thisfd);
+ time_t this_deadline = thisfd->bits.localsock.sent_time +
+ cmd_timeout;
+ if (this_deadline < the_time) {
+ /* Send timed out message + replies we already have */
+ DEBUGLOG
+ ("Request timed-out (send: %ld, now: %ld)\n",
+ thisfd->bits.localsock.sent_time,
+ the_time);
+
+ thisfd->bits.localsock.all_success = 0;
+
+ request_timed_out(thisfd);
+ } else {
+ if (this_deadline < deadline)
+ deadline = this_deadline;
+ }
}
}
}
* Re: [linux-lvm] clvmd on cman waits forever holding the P_#global lock on node re-join
2012-12-12 23:14 [linux-lvm] clvmd on cman waits forever holding the P_#global lock on node re-join Dmitry Panov
@ 2012-12-13 10:04 ` Zdenek Kabelac
2012-12-13 11:07 ` Dmitry Panov
2012-12-14 7:10 ` Jacek Konieczny
2012-12-14 7:14 ` Jacek Konieczny
1 sibling, 2 replies; 6+ messages in thread
From: Zdenek Kabelac @ 2012-12-13 10:04 UTC (permalink / raw)
To: linux-lvm
On 13.12.2012 00:14, Dmitry Panov wrote:
> Hi everyone,
>
> I've been testing clvm recently and noticed that often the operations are
> blocked when a node rejoins the cluster after being fenced or power cycled.
> I've done some investigation and found a number of issues relating to clvm.
> Here is what's happening:
>
>
> - When a node is fenced there is no "port closed" message sent to clvm which
> means the node id remains in the updown hash, although the node itself is
> removed from the nodes list after a "configuration changed" message is received.
>
> - Then, when the node rejoins, another "configuration changed" message arrives
> but because the node id is still in the hash, it is assumed that clvmd on that
> node is running even though it might not be the case yet (in my case clvmd is
> a pacemaker resource so it takes a couple of seconds before it's started).
>
> - This causes the expected_replies count set to a higher number than it should
> be, and as a result there are never enough replies received.
>
> - There is a problem with handling of the cmd_timeout which appears to be
> fixed today (what a coincidence!) by this patch:
> https://www.redhat.com/archives/lvm-devel/2012-December/msg00024.html The
> reason why I was hitting this bug is because I'm using Linux Cluster
> Management Console which polls LVM often enough so that the timeout code never
> ran. I have
> fixed this independently and even though my efforts are now probably wasted
> I'm attaching a patch for your consideration. I believe it enforces the
> timeout more strictly.
>
> Now, the questions:
>
> 1. If the problem with stuck entry in the updown hash is fixed it will cause
> operations to fail until clvmd is started on the re-joined node. Is there any
> particular reason for making them fail? Is it to avoid a race condition when
> newly started clvmd might not receive a message generated by an 'old' node?
>
> 2. The current expected_replies counter seems a bit flawed to me because it
> will fail if a node leaves the cluster before it sends a reply. Should it be
> handled differently? For example instead of a simple counter we could have a
> list of nodes which should be updated when a node leaves the cluster.
>
Hmmm, this rather looks like a logical problem either in
the if() expression in the (select_status == 0) branch,
or the somewhat 'magical' gulm fix applied in 2005 for add_to_lvmqueue()
should be running not just when a message arrives.
Both patches seem to be not fixing the bug, but rather trying to work
around broken logic in the main loop - it will need some thinking.
Zdenek
* Re: [linux-lvm] clvmd on cman waits forever holding the P_#global lock on node re-join
2012-12-13 10:04 ` Zdenek Kabelac
@ 2012-12-13 11:07 ` Dmitry Panov
2012-12-14 7:10 ` Jacek Konieczny
1 sibling, 0 replies; 6+ messages in thread
From: Dmitry Panov @ 2012-12-13 11:07 UTC (permalink / raw)
To: linux-lvm
On 13/12/12 10:04, Zdenek Kabelac wrote:
>
>
> Hmmm, this rather looks like a logical problem either in
> the if() expression in the (select_status == 0) branch,
> or the somehow 'magical' gulm fix applied in 2005 for add_to_lvmqueue()
> should be running not just when a message arrives.
>
> Both patches seem to be not fixing the bug, but rather trying to work
> around broken logic in the main loop - it will need some thinking.
>
There are actually 2 bugs: one causes the timeout to occur, and another
causes it not to be handled. My patch clearly fixes the latter. As for
the former, it needs some thinking indeed.
Best regards,
--
Dmitry Panov
* Re: [linux-lvm] clvmd on cman waits forever holding the P_#global lock on node re-join
2012-12-13 10:04 ` Zdenek Kabelac
2012-12-13 11:07 ` Dmitry Panov
@ 2012-12-14 7:10 ` Jacek Konieczny
2012-12-14 10:45 ` Dmitry Panov
1 sibling, 1 reply; 6+ messages in thread
From: Jacek Konieczny @ 2012-12-14 7:10 UTC (permalink / raw)
To: Zdenek Kabelac; +Cc: linux-lvm
On Thu, Dec 13, 2012 at 11:04:40AM +0100, Zdenek Kabelac wrote:
> Hmmm this rather looks like a logical problem either in
> the if() expression in (select_status == 0) branch,
No fix in the (select_status == 0) branch would solve anything, as the
branch is never executed. That is the major problem here.
Select has its timeout set to 60 seconds, but a few fd events come each
minute -> the select never times out, and select_status is always != 0.
Shortening the cmd_timeout setting would help, but that would only be a
workaround, and only until the fd events start coming even more often.
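Jacek's point reduces to the following pattern (a simplified sketch with made-up names, not the actual clvmd loop): with a fresh { cmd_timeout, 0 } timeval per iteration, every fd event restarts the full window, whereas tracking an absolute deadline - as the attached patch does - gives select() only the time remaining:

```c
#include <assert.h>
#include <time.h>

enum { CMD_TIMEOUT = 60 };

/* Seconds select() should wait: the time left until the absolute
 * deadline, never negative. */
time_t select_wait(time_t deadline, time_t now)
{
    return deadline > now ? deadline - now : 0;
}

/* Simulate fd events waking select() every event_gap seconds, and
 * return how long it takes until the wait reaches 0, i.e. until the
 * timeout check is finally allowed to run. */
time_t time_until_timeout_check(time_t start, time_t event_gap)
{
    time_t now = start;
    time_t deadline = start + CMD_TIMEOUT;
    while (select_wait(deadline, now) > 0)
        now += event_gap;   /* an fd event returns from select() early */
    return now - start;
}
```

With the per-iteration timeout, the loop would never observe a zero wait under steady traffic; with the deadline, the wait reaches zero within roughly cmd_timeout seconds no matter how often events arrive.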
Greets,
Jacek
* Re: [linux-lvm] clvmd on cman waits forever holding the P_#global lock on node re-join
2012-12-12 23:14 [linux-lvm] clvmd on cman waits forever holding the P_#global lock on node re-join Dmitry Panov
2012-12-13 10:04 ` Zdenek Kabelac
@ 2012-12-14 7:14 ` Jacek Konieczny
1 sibling, 0 replies; 6+ messages in thread
From: Jacek Konieczny @ 2012-12-14 7:14 UTC (permalink / raw)
To: Dmitry Panov; +Cc: linux-lvm
On Wed, Dec 12, 2012 at 11:14:09PM +0000, Dmitry Panov wrote:
> 1. If the problem with stuck entry in the updown hash is fixed it will
> cause operations to fail until clvmd is started on the re-joined node.
> Is there any particular reason for making them fail? Is it to avoid a
> race condition when newly started clvmd might not receive a message
> generated by an 'old' node?
For me, clvmd would not work (it refused any cluster locking) whenever a
node left the cluster (making clustered LVM useless for fail-over),
unless I applied this patch:
http://www.redhat.com/archives/lvm-devel/2012-November/msg00024.html
Greets,
Jacek
* Re: [linux-lvm] clvmd on cman waits forever holding the P_#global lock on node re-join
2012-12-14 7:10 ` Jacek Konieczny
@ 2012-12-14 10:45 ` Dmitry Panov
0 siblings, 0 replies; 6+ messages in thread
From: Dmitry Panov @ 2012-12-14 10:45 UTC (permalink / raw)
To: linux-lvm
On 14/12/12 07:10, Jacek Konieczny wrote:
> On Thu, Dec 13, 2012 at 11:04:40AM +0100, Zdenek Kabelac wrote:
>> Hmmm this rather looks like a logical problem either in
>> the if() expression in (select_status == 0) branch,
> No fix in the (select_status == 0) branch would solve anything, as the
> branch is never executed. This is the major problem here.
>
> Select has timeout set to 60 seconds, a few fd events come each minute
> -> the select never times out, select_status is always != 0.
>
> Shortening the cmd_timeout setting would work, but that would be only a
> workaround and would work until the fd events come even more often.
Applying my patch would also solve the problem of the timeout code not
being run. And regardless of whether the underlying issue is fixed or
not, the timeout handling must be fixed too, because we can't totally
avoid timeouts (that's why the code is there).
--
Dmitry Panov