[Cluster-devel] [PATCH] rgmanager: Halt services if CMAN dies

From: Fabio M. Di Nitto <fdinitto@redhat.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] [PATCH] rgmanager: Halt services if CMAN dies
Date: Thu, 28 Oct 2010 08:38:37 +0200
Message-ID: <4CC91A6D.3050905@redhat.com>
In-Reply-To: <1288214232-17790-1-git-send-email-lhh@redhat.com>

Looks sane to me.

Fabio

On 10/27/2010 11:17 PM, Lon Hohberger wrote:
> If cman dies because it receives a kill packet (of doom)
> from other hosts, rgmanager does not notice.  This can
> happen if, for example, you are using qdiskd and it hangs
> on I/O to the quorum disk due to frequent trespasses or
> other SAN interruptions.  The other instance of qdiskd
> will ask CMAN to evict the hung node, causing it to be
> ejected from the cluster and fenced.
> 
> Data is safe (which is the top priority).  If power-cycle
> fencing is in use, there is no issue at all; the node
> reboots and service failover occurs fairly quickly.
> 
> However, problems can arise if, in the same hung-I/O
> situation:
> 
>  * storage-level fencing is in use
> 
>  * rgmanager has one or more IP addresses in use
>    as part of cluster services.
> 
> This is because more recent versions of the IP resource
> agent actually ping the IP address prior to bringing it
> online for use by services.  This prevents accidental
> take-over of IP addresses in use by other hosts on the
> network due to an administrator mistake when setting up
> the cluster.
> 
> Unfortunately, this behavior also prevents service
> failover if the presumed-dead host is still online.
> 
> This patch causes rgmanager to use poll() instead of
> select() when dealing with the baseline CMAN connection
> it uses for receiving membership changes and so forth.
> 
> If the socket is closed by CMAN (either by CMAN's death
> or some other reason), rgmanager can now detect that
> condition and act upon it.  It treats the closure as an
> emergency cluster shutdown request: it will halt all
> services and exit as quickly as possible.
> 
> Unfortunately, there is a race between this emergency
> action and recovery on the surviving host.  rgmanager
> cannot guarantee that all services will have halted in
> the window after the node has been fenced from shared
> storage but before the other host attempts to start the
> service(s).
> 
> Furthermore, a hung 'stop' request caused by loss of
> access to shared storage may very well cause rgmanager
> to hang forever, preventing some services (or parts)
> from ever actually being killed.
> 
> A main use case for storage-level fencing over power-
> cycling is the ability to perform post-mortem RCA of what
> happened in order to cause the node to die in the first
> place.  This implies that rgmanager killing the host
> would be an incorrect resolution.
> 
> Resolves: rhbz#639961
> 
> Signed-off-by: Lon Hohberger <lhh@redhat.com>
> ---
>  rgmanager/src/clulib/msg_cluster.c |   32 ++++++++++++++++++++++----------
>  1 files changed, 22 insertions(+), 10 deletions(-)
> 
> diff --git a/rgmanager/src/clulib/msg_cluster.c b/rgmanager/src/clulib/msg_cluster.c
> index 4ec3750..00f28c3 100644
> --- a/rgmanager/src/clulib/msg_cluster.c
> +++ b/rgmanager/src/clulib/msg_cluster.c
> @@ -34,7 +34,9 @@
>  #include <gettid.h>
>  #include <cman-private.h>
>  #include <clulog.h>
> +#include <poll.h>
>  
> +static void process_cman_event(cman_handle_t handle, void *private, int reason, int arg);
>  /* Ripped from ccsd's setup_local_socket */
>  
>  int cluster_msg_close(msgctx_t *ctx);
> @@ -165,18 +167,17 @@ static int
>  poll_cluster_messages(int timeout)
>  {
>  	int ret = -1;
> -	fd_set rfds;
> -	int fd, lfd, max;
> +	int fd, lfd;
>  	struct timeval tv;
>  	struct timeval *p = NULL;
>  	cman_handle_t ch;
> +	struct pollfd fds[2];
>  
>  	if (timeout >= 0) {
>  		p = &tv;
>  		tv.tv_sec = tv.tv_usec = timeout;
>  	}
>  
> -	FD_ZERO(&rfds);
>  
>  	/* This sucks - it could cause other threads trying to get a
>  	   membership list to block for a long time.  Now, that should not
> @@ -195,20 +196,31 @@ poll_cluster_messages(int timeout)
>  		cman_unlock(ch);
>  		return 0;
>  	}
> -	FD_SET(fd, &rfds);
> -	FD_SET(lfd, &rfds);
>  
> -	max = (lfd > fd ? lfd : fd);
> -	if (select(max + 1, &rfds, NULL, NULL, p) > 0) {
> +	fds[0].fd = lfd;
> +	fds[1].fd = fd;
> +	fds[0].events = POLLIN | POLLHUP | POLLERR;
> +	fds[1].events = POLLIN | POLLHUP | POLLERR;
> +
> +	if (poll(fds, 2, timeout * 1000) > 0) {
> +
>  		/* Someone woke us up */
> -		if (FD_ISSET(lfd, &rfds)) {
> +		if (fds[0].revents & POLLIN) {
>  			cman_unlock(ch);
>  			errno = EAGAIN;
>  			return -1;
>  		}
>  
> -		cman_dispatch(ch, 0);
> -		ret = 0;
> +		if (fds[1].revents & (POLLHUP | POLLERR)) {
> +			process_cman_event(ch, NULL,
> +					   CMAN_REASON_TRY_SHUTDOWN,
> +					   0);
> +		}
> +
> +		if (fds[1].revents & POLLIN) {
> +			cman_dispatch(ch, 0);
> +			ret = 0;
> +		}
>  	}
>  	cman_unlock(ch);
>
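
For readers who want to see the hang-up detection in isolation, the
following is a minimal sketch of the poll() technique the commit message
describes.  It is not rgmanager code: check_peer(), demo_peer_closed()
and the peer_state values are hypothetical names made up for this
illustration, and a local socketpair() stands in for the CMAN connection.
The observed flags are those reported on Linux for a UNIX stream socket
whose peer has closed; other platforms may differ.

```c
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch (not taken from rgmanager). */
enum peer_state { PEER_TIMEOUT = 0, PEER_READABLE = 1, PEER_GONE = 2 };

/*
 * Wait up to timeout_ms milliseconds for activity on fd.
 * Hang-up/error is tested before readability, mirroring the order
 * used in the patch, so a dead peer is never mistaken for data.
 */
static enum peer_state
check_peer(int fd, int timeout_ms)
{
	struct pollfd pfd;

	assert(fd >= 0);
	pfd.fd = fd;
	/* POLLHUP and POLLERR are reported in revents even if not requested. */
	pfd.events = POLLIN;
	pfd.revents = 0;

	if (poll(&pfd, 1, timeout_ms) <= 0)
		return PEER_TIMEOUT;	/* timed out, or poll() itself failed */
	if (pfd.revents & (POLLHUP | POLLERR))
		return PEER_GONE;	/* peer closed the socket or it errored */
	if (pfd.revents & POLLIN)
		return PEER_READABLE;	/* normal data is waiting */
	return PEER_TIMEOUT;
}

/*
 * Demonstration helper: create a connected socket pair, close one end
 * to simulate the daemon dying, and report what the other end sees.
 */
static enum peer_state
demo_peer_closed(void)
{
	int sv[2];
	enum peer_state st;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return PEER_TIMEOUT;
	close(sv[1]);			/* simulated daemon death */
	st = check_peer(sv[0], 1000);
	close(sv[0]);
	return st;
}
```

A caller would map PEER_GONE to its emergency path, in the same way the
patch hands CMAN_REASON_TRY_SHUTDOWN to process_cman_event().  With
select() a closed peer only shows up as "readable", so the caller cannot
tell it apart from ordinary data without attempting a read; poll()
exposes the hang-up directly in revents.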

Thread overview: 3+ messages
2010-10-27 21:17 [Cluster-devel] [PATCH] rgmanager: Halt services if CMAN dies Lon Hohberger
2010-10-28  6:38 ` Fabio M. Di Nitto [this message]
2010-10-28 13:44   ` Lon Hohberger