From mboxrd@z Thu Jan 1 00:00:00 1970
From: Fabio M. Di Nitto
Date: Thu, 28 Oct 2010 08:38:37 +0200
Subject: [Cluster-devel] [PATCH] rgmanager: Halt services if CMAN dies
In-Reply-To: <1288214232-17790-1-git-send-email-lhh@redhat.com>
References: <1288214232-17790-1-git-send-email-lhh@redhat.com>
Message-ID: <4CC91A6D.3050905@redhat.com>
List-Id:
To: cluster-devel.redhat.com
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

Looks sane to me.

Fabio

On 10/27/2010 11:17 PM, Lon Hohberger wrote:
> If cman dies because it receives a kill packet (of doom)
> from other hosts, rgmanager does not notice. This can
> happen if, for example, you are using qdiskd and it hangs
> on I/O to the quorum disk due to frequent trespasses or
> other SAN interruptions. The other instance of qdiskd
> will ask CMAN to evict the hung node, causing it to be
> ejected from the cluster and fenced.
>
> Data is safe (which is the top priority). If power-cycle
> fencing is in use, there is no issue at all; the node
> reboots and service failover occurs fairly quickly.
>
> However, problems can arise if, in the same hung-I/O
> situation:
>
>  * storage-level fencing is in use
>
>  * rgmanager has one or more IP addresses in use
>    as part of cluster services.
>
> This is because more recent versions of the IP resource
> agent actually ping the IP address prior to bringing it
> online for use by services. This prevents accidental
> take-over of IP addresses in use by other hosts on the
> network due to an administrator mistake when setting up
> the cluster.
>
> Unfortunately, this behavior also prevents service
> failover if the presumed-dead host is still online.
>
> This patch causes rgmanager to use poll() instead of
> select() when dealing with the baseline CMAN connection
> it uses for receiving membership changes and so forth.
>
> If the socket is closed by CMAN (either by CMAN's death
> or some other reason), rgmanager can now detect and act
> upon that stimulus. It treats it as an emergency cluster
> shutdown request: it will halt all services and exit as
> quickly as possible.
>
> Unfortunately, there is a race between this emergency
> action and recovery on the surviving host. It is not
> possible for rgmanager to guarantee that all services will
> halt after the node has been fenced from shared storage
> (but before the other host attempts to start the
> service(s)).
>
> Furthermore, a hung 'stop' request caused by loss of
> access to shared storage may very well cause rgmanager
> to hang forever, preventing some services (or parts)
> from ever actually being killed.
>
> A main use case for storage-level fencing over
> power-cycling is the ability to perform post-mortem RCA
> of what happened in order to cause the node to die in the
> first place. This implies that rgmanager killing the host
> would be an incorrect resolution.
>
> Resolves: rhbz#639961
>
> Signed-off-by: Lon Hohberger
> ---
>  rgmanager/src/clulib/msg_cluster.c |   32 ++++++++++++++++++++++----------
>  1 files changed, 22 insertions(+), 10 deletions(-)
>
> diff --git a/rgmanager/src/clulib/msg_cluster.c b/rgmanager/src/clulib/msg_cluster.c
> index 4ec3750..00f28c3 100644
> --- a/rgmanager/src/clulib/msg_cluster.c
> +++ b/rgmanager/src/clulib/msg_cluster.c
> @@ -34,7 +34,9 @@
>  #include
>  #include
>  #include
> +#include
>
> +static void process_cman_event(cman_handle_t handle, void *private, int reason, int arg);
>  /* Ripped from ccsd's setup_local_socket */
>
>  int cluster_msg_close(msgctx_t *ctx);
> @@ -165,18 +167,17 @@ static int
>  poll_cluster_messages(int timeout)
>  {
>  	int ret = -1;
> -	fd_set rfds;
> -	int fd, lfd, max;
> +	int fd, lfd;
>  	struct timeval tv;
>  	struct timeval *p = NULL;
>  	cman_handle_t ch;
> +	struct pollfd fds[2];
>
>  	if (timeout >= 0) {
>  		p = &tv;
>  		tv.tv_sec = tv.tv_usec = timeout;
>  	}
>
> -	FD_ZERO(&rfds);
>
>  	/* This sucks - it could cause other threads trying to get a
>  	   membership list to block for a long time. Now, that should not
> @@ -195,20 +196,31 @@ poll_cluster_messages(int timeout)
>  		cman_unlock(ch);
>  		return 0;
>  	}
> -	FD_SET(fd, &rfds);
> -	FD_SET(lfd, &rfds);
>
> -	max = (lfd > fd ? lfd : fd);
> -	if (select(max + 1, &rfds, NULL, NULL, p) > 0) {
> +	fds[0].fd = lfd;
> +	fds[1].fd = fd;
> +	fds[0].events = POLLIN | POLLHUP | POLLERR;
> +	fds[1].events = POLLIN | POLLHUP | POLLERR;
> +
> +	if (poll(fds, 2, timeout * 1000) > 0) {
> +
>  		/* Someone woke us up */
> -		if (FD_ISSET(lfd, &rfds)) {
> +		if (fds[0].revents & POLLIN) {
>  			cman_unlock(ch);
>  			errno = EAGAIN;
>  			return -1;
>  		}
>
> -		cman_dispatch(ch, 0);
> -		ret = 0;
> +		if (fds[1].revents & (POLLHUP | POLLERR)) {
> +			process_cman_event(ch, NULL,
> +					   CMAN_REASON_TRY_SHUTDOWN,
> +					   0);
> +		}
> +
> +		if (fds[1].revents & POLLIN) {
> +			cman_dispatch(ch, 0);
> +			ret = 0;
> +		}
>  	}
>  	cman_unlock(ch);
>
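
For anyone who wants to see the detection idea outside of rgmanager, here is a minimal standalone sketch of it: poll() one descriptor and treat POLLHUP/POLLERR as "the peer is gone" before looking at POLLIN. The names check_connection() and enum conn_state are made up for this illustration and are not part of rgmanager or libcman. The MSG_PEEK fallback is also my own addition, on the assumption that some systems report a closed peer only as a readable EOF.

```c
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical names for this sketch; none of them exist in rgmanager. */
enum conn_state { CONN_IDLE, CONN_READABLE, CONN_CLOSED, CONN_ERROR };

/*
 * Wait up to timeout_ms for activity on fd and classify the result.
 * A negative timeout_ms blocks indefinitely, as with poll() itself.
 */
enum conn_state
check_connection(int fd, int timeout_ms)
{
	struct pollfd pfd;
	char c;
	ssize_t r;
	int n;

	assert(fd >= 0);

	pfd.fd = fd;
	pfd.events = POLLIN;	/* POLLHUP and POLLERR are reported regardless */
	pfd.revents = 0;

	n = poll(&pfd, 1, timeout_ms);
	if (n < 0)
		return CONN_ERROR;
	if (n == 0)
		return CONN_IDLE;

	/* Check hangup/error before POLLIN, like the patch does for the CMAN fd. */
	if (pfd.revents & (POLLHUP | POLLERR | POLLNVAL))
		return CONN_CLOSED;

	if (pfd.revents & POLLIN) {
		/* Assumption: a closed peer may show up only as readable EOF. */
		r = recv(fd, &c, 1, MSG_PEEK | MSG_DONTWAIT);
		if (r == 0)
			return CONN_CLOSED;
		return (r > 0) ? CONN_READABLE : CONN_ERROR;
	}

	return CONN_IDLE;
}
```

With a socketpair(AF_UNIX, SOCK_STREAM, 0, sv) pair, closing sv[1] and then calling check_connection(sv[0], 100) should report CONN_CLOSED, which corresponds to the condition the patch maps to CMAN_REASON_TRY_SHUTDOWN. select() with only a read set cannot make that distinction directly; it just reports the descriptor as readable.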