From mboxrd@z Thu Jan 1 00:00:00 1970
From: Lon Hohberger
Date: Wed, 27 Oct 2010 17:17:12 -0400
Subject: [Cluster-devel] [PATCH] rgmanager: Halt services if CMAN dies
Message-ID: <1288214232-17790-1-git-send-email-lhh@redhat.com>
List-Id:
To: cluster-devel.redhat.com
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

If cman dies because it receives a kill packet (of doom) from other
hosts, rgmanager does not notice.  This can happen if, for example, you
are using qdiskd and it hangs on I/O to the quorum disk due to frequent
trespasses or other SAN interruptions.  The other instance of qdiskd
will ask CMAN to evict the hung node, causing it to be ejected from the
cluster and fenced.  Data is safe (which is the top priority).

If power-cycle fencing is in use, there is no issue at all; the node
reboots and service failover occurs fairly quickly.  However, problems
can arise if, in the same hung-I/O situation:

 * storage-level fencing is in use, and
 * rgmanager has one or more IP addresses in use as part of cluster
   services.

This is because more recent versions of the IP resource agent actually
ping the IP address prior to bringing it online for use by services.
This prevents accidental take-over of IP addresses in use by other
hosts on the network due to an administrator mistake when setting up
the cluster.  Unfortunately, this behavior also prevents service
failover if the presumed-dead host is still online.

This patch causes rgmanager to use poll() instead of select() when
dealing with the baseline CMAN connection it uses for receiving
membership changes and so forth.  If the socket is closed by CMAN
(either because CMAN died or for some other reason), rgmanager now
detects this and treats it as an emergency cluster shutdown request:
it halts all services and exits as quickly as possible.

Unfortunately, there is a race between this emergency action and
recovery on the surviving host.
It is not possible for rgmanager to guarantee that all services will
halt after the node has been fenced from shared storage (but before the
other host attempts to start the service(s)).  Furthermore, a hung
'stop' request caused by loss of access to shared storage may very well
cause rgmanager to hang forever, preventing some services (or parts of
them) from ever actually being killed.  A main use case for
storage-level fencing over power-cycling is the ability to perform
post-mortem RCA of whatever caused the node to die in the first place.
This implies that having rgmanager kill the host would be an incorrect
resolution.

Resolves: rhbz#639961

Signed-off-by: Lon Hohberger
---
 rgmanager/src/clulib/msg_cluster.c |   32 ++++++++++++++++++++++----------
 1 files changed, 22 insertions(+), 10 deletions(-)

diff --git a/rgmanager/src/clulib/msg_cluster.c b/rgmanager/src/clulib/msg_cluster.c
index 4ec3750..00f28c3 100644
--- a/rgmanager/src/clulib/msg_cluster.c
+++ b/rgmanager/src/clulib/msg_cluster.c
@@ -34,7 +34,9 @@
 #include
 #include
 #include
+#include

+static void process_cman_event(cman_handle_t handle, void *private, int reason, int arg);

 /* Ripped from ccsd's setup_local_socket */

 int cluster_msg_close(msgctx_t *ctx);
@@ -165,18 +167,17 @@ static int
 poll_cluster_messages(int timeout)
 {
 	int ret = -1;
-	fd_set rfds;
-	int fd, lfd, max;
+	int fd, lfd;
 	struct timeval tv;
 	struct timeval *p = NULL;
 	cman_handle_t ch;
+	struct pollfd fds[2];

 	if (timeout >= 0) {
 		p = &tv;
 		tv.tv_sec = tv.tv_usec = timeout;
 	}

-	FD_ZERO(&rfds);

 	/* This sucks - it could cause other threads trying to get
 	   a membership list to block for a long time.  Now, that should
@@ -195,20 +196,31 @@ poll_cluster_messages(int timeout)
 		cman_unlock(ch);
 		return 0;
 	}
-	FD_SET(fd, &rfds);
-	FD_SET(lfd, &rfds);
-	max = (lfd > fd ? lfd : fd);
-	if (select(max + 1, &rfds, NULL, NULL, p) > 0) {
+	fds[0].fd = lfd;
+	fds[1].fd = fd;
+	fds[0].events = POLLIN | POLLHUP | POLLERR;
+	fds[1].events = POLLIN | POLLHUP | POLLERR;
+
+	if (poll(fds, 2, timeout * 1000) > 0) {
+		/* Someone woke us up */

-		if (FD_ISSET(lfd, &rfds)) {
+		if (fds[0].revents & POLLIN) {
 			cman_unlock(ch);
 			errno = EAGAIN;
 			return -1;
 		}

-		cman_dispatch(ch, 0);
-		ret = 0;
+		if (fds[1].revents & (POLLHUP | POLLERR)) {
+			process_cman_event(ch, NULL,
+					   CMAN_REASON_TRY_SHUTDOWN,
+					   0);
+		}
+
+		if (fds[1].revents & POLLIN) {
+			cman_dispatch(ch, 0);
+			ret = 0;
+		}
 	}

 	cman_unlock(ch);
--
1.7.2.3