From: Fabio M. Di Nitto <fdinitto@redhat.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] [patch] cman: Added checkquorum script for self fencing
Date: Wed, 02 Feb 2011 08:42:41 +0100 [thread overview]
Message-ID: <4D490AF1.3000104@redhat.com> (raw)
In-Reply-To: <702705878.661260.1296594791720.JavaMail.root@zmail04.collab.prod.int.phx2.redhat.com>
Both patches are ACK.
I have only one simple question and far from being a problem. We ship
this script as example basically, since we don?t install in
/etc/watchdog.d automatically. I think it would be best to move it to
the doc section since we require users to copy and edit the parameters.
Tho it also raises the question that you could simply symlink it from
watchdog.d if it?s in sharedir/cluster since it?s location would be
fixed (vs doc changes with versioning of the package).
Fabio
On 2/1/2011 10:13 PM, Chris Feist wrote:
> cman: Added checkquorum script for self fencing
>
> A checkquorum script has been added which when copied to the
> /etc/watchdog.d directory will cause the machine to node to reboot
> itself if it has lost quorum for ~60 seconds.
>
> Resolves: rhbz#560700
> ---
> cman/Makefile | 2 +-
> cman/man/Makefile | 3 +-
> cman/man/checkquorum.8 | 29 ++++++++++++++
> cman/scripts/Makefile | 10 +++++
> cman/scripts/checkquorum | 97 ++++++++++++++++++++++++++++++++++++++++++++++
> make/install.mk | 4 ++
> 6 files changed, 143 insertions(+), 2 deletions(-)
>
> diff --git a/cman/Makefile b/cman/Makefile
> index ead0baa..1cf8bc9 100644
> --- a/cman/Makefile
> +++ b/cman/Makefile
> @@ -1,4 +1,4 @@
> include ../make/defines.mk
> include $(OBJDIR)/make/passthrough.mk
>
> -SUBDIRS=lib cman_tool daemon qdisk notifyd init.d man
> +SUBDIRS=lib cman_tool daemon qdisk notifyd init.d man scripts
> diff --git a/cman/man/Makefile b/cman/man/Makefile
> index df20abb..f7fbebf 100644
> --- a/cman/man/Makefile
> +++ b/cman/man/Makefile
> @@ -5,7 +5,8 @@ MANTARGET= \
> qdiskd.8 \
> mkqdisk.8 \
> cmannotifyd.8 \
> - cman_notify.8
> + cman_notify.8 \
> + checkquorum.8
>
> include ../../make/defines.mk
> include $(OBJDIR)/make/install.mk
> diff --git a/cman/man/checkquorum.8 b/cman/man/checkquorum.8
> new file mode 100644
> index 0000000..96f61f0
> --- /dev/null
> +++ b/cman/man/checkquorum.8
> @@ -0,0 +1,29 @@
> +.TH "checkquorum" "8" "February 2011" "" "Check Quorum Watchdog Script"
> +.SH "NAME"
> +checkquorum \- Check Quorum Watchdog Script
> +.SH "SYNOPSIS"
> +\fBcheckquorum
> +.SH "DESCRIPTION"
> +.PP
> +The \fBcheckquorum\fP watchdog script, when copied to the
> +.IR /etc/watchdog.d
> +directory and after enabling/starting the watchdog daemon causes the node to reboot if quorum is
> +lost and not regained within a user configurable amount of time (default: 60 seconds).
> +.SH "OPTIONS"
> +The checkquorum script includes several options which can be set by editing
> +the script with a text editor.
> +.TP
> +.BR $wait_time
> +Amount of time in seconds to wait after quorum is lost before trigger a reboot
> +(Default: 60 seconds).
> +.TP
> +.BR $hardreboot
> +Instantly reboot the machine without cleanly shutting down the system.
> +Useful when the machine may hang on reboot. Set to 1 to hard reboot the
> +system, 0 to do a normal reboot.
> +.SH "NOTES"
> +\fBcheckquorum\fP should never be called outside of watchdog except for
> +debugging purposes.
> +
> +.SH "SEE ALSO"
> +watchdog(8)
> diff --git a/cman/scripts/Makefile b/cman/scripts/Makefile
> new file mode 100644
> index 0000000..b4866c8
> --- /dev/null
> +++ b/cman/scripts/Makefile
> @@ -0,0 +1,10 @@
> +SHAREDIRTEX=checkquorum
> +
> +include ../../make/defines.mk
> +include $(OBJDIR)/make/clean.mk
> +include $(OBJDIR)/make/install.mk
> +include $(OBJDIR)/make/uninstall.mk
> +
> +all:
> +
> +clean: generalclean
> diff --git a/cman/scripts/checkquorum b/cman/scripts/checkquorum
> new file mode 100755
> index 0000000..43cbc6d
> --- /dev/null
> +++ b/cman/scripts/checkquorum
> @@ -0,0 +1,97 @@
> +#!/usr/bin/perl -w
> +# Quorum detection watchdog script
> +#
> +# This script will return -2 if the node had quorum at one point
> +# and then subsequently lost it
> +#
> +# Copyright 2011 Red Hat, Inc.
> +
> +# Amount of time in seconds to wait after quorum is lost to fail script
> +$wait_time = 60;
> +
> +# Hard Reboot the system (doesn't cleanly shut down the system)
> +$hardreboot = 0;
> +
> +# Location of temporary file to capture timeouts
> +$timerfile = "/var/run/cluster/checkquorum-timer";
> +
> +# Enable debug messages (0 to disable, 1 to enable)
> +$debugval = 0;
> +
> +# If command is called attempting to 'repair' we automatically fail
> +if (($#ARGV != -1) && ($ARGV[0] eq "repair")) {
> + debug ("Failing on repair\n");
> + exit 1;
> +}
> +
> +if (!quorum()) {
> + if (has_quorum_already_been_formed()) {
> + debug("Quorum has already existed, node can be self fenced!\n");
> + if (-e $timerfile) {
> + $tf = open (FILE, "$timerfile");
> + $time = <FILE>;
> + close (FILE);
> + $timediff = time() - $time;
> + if ($timediff >= $wait_time) {
> + self_fence()
> + } else {
> + $remaining = $wait_time - $timediff;
> + debug("Time has not exceeded wait time ($remaining seconds remaining).\n");
> + }
> + } else {
> + debug("Creating timer file...\n");
> + $tf = open (FILE, ">$timerfile");
> + print FILE time();
> + close (FILE);
> + }
> + } else {
> + debug("This is a new startup no self-fencing will occur.\n");
> + `rm -f $timerfile`;
> + }
> +} else {
> + debug("Quorum exists, no self-fencing should occur.\n");
> + `rm -f $timerfile`;
> +}
> +
> +sub has_quorum_already_been_formed {
> + $oe = `/usr/sbin/corosync-objctl 2>&1 | grep -E "runtime.totem.pg.mrp.srp.operational_entered|Could not initialize objdb library|Cannot connect to quorum service" `;
> + if ($oe =~ /^Could not/ || $oe =~ /^Cannot/) {
> + debug("corosync is not running\n");
> + exit 0;
> + }
> + $oe =~ s/.*=//;
> + if ($oe > 1) {
> + return 1;
> + } else {
> + return 0;
> + }
> +}
> +
> +sub quorum {
> + $cq = `corosync-quorumtool -s 2>&1 | grep -E "Quorate:|Cannot connect to quorum service"`;
> + if ($cq =~ /Cannot connect to quorum service/) {
> + debug("corosync is not running\n");
> + exit 0;
> + }
> + $cq =~ s/Quorate: *//;
> + chomp ($cq);
> + return 1 if ($cq eq "Yes");
> + return 0;
> +}
> +
> +sub self_fence {
> + debug("Self fencing commencing...\n");
> + `rm -f $timerfile`;
> + if ($hardreboot == 1) {
> + `echo 1 > /proc/sys/kernel/sysrq`;
> + `echo b > /proc/sysrq-trigger`;
> + }
> + exit -2;
> +}
> +
> +sub debug {
> + $out = pop(@_);
> + if ($debugval) {
> + print $out;
> + }
> +}
> diff --git a/make/install.mk b/make/install.mk
> index 3f23bca..fa6ac92 100644
> --- a/make/install.mk
> +++ b/make/install.mk
> @@ -66,6 +66,10 @@ ifdef PKGCONF
> install -d ${pkgconfigdir}
> install -m644 ${PKGCONF} ${pkgconfigdir}
> endif
> +ifdef SHAREDIRTEX
> + install -d ${sharedir}
> + install -m755 ${SHAREDIRTEX} ${sharedir}
> +endif
> ifdef SHAREDIRT
> install -d ${sharedir}
> install -m644 ${SHAREDIRT} ${sharedir}
>
next prev parent reply other threads:[~2011-02-02 7:42 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20110201202222.34399120198@lists.fedorahosted.org>
2011-02-01 21:13 ` [Cluster-devel] [patch] cman: Added checkquorum script for self fencing Chris Feist
2011-02-02 7:42 ` Fabio M. Di Nitto [this message]
2011-02-02 23:00 ` Lon Hohberger
2011-02-03 6:15 ` Fabio M. Di Nitto
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4D490AF1.3000104@redhat.com \
--to=fdinitto@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).