* [Cluster-devel] [patch] cman: Added checkquorum script for self fencing
From: Chris Feist @ 2011-02-01 21:13 UTC
To: cluster-devel.redhat.com

cman: Added checkquorum script for self fencing

A checkquorum script has been added which, when copied to the
/etc/watchdog.d directory, will cause the node to reboot itself
if it has lost quorum for ~60 seconds.

Resolves: rhbz#560700
---
 cman/Makefile            |    2 +-
 cman/man/Makefile        |    3 +-
 cman/man/checkquorum.8   |   29 ++++++++++++++
 cman/scripts/Makefile    |   10 +++++
 cman/scripts/checkquorum |   97 ++++++++++++++++++++++++++++++++++++++++++++++
 make/install.mk          |    4 ++
 6 files changed, 143 insertions(+), 2 deletions(-)

diff --git a/cman/Makefile b/cman/Makefile
index ead0baa..1cf8bc9 100644
--- a/cman/Makefile
+++ b/cman/Makefile
@@ -1,4 +1,4 @@
 include ../make/defines.mk
 include $(OBJDIR)/make/passthrough.mk
 
-SUBDIRS=lib cman_tool daemon qdisk notifyd init.d man
+SUBDIRS=lib cman_tool daemon qdisk notifyd init.d man scripts
diff --git a/cman/man/Makefile b/cman/man/Makefile
index df20abb..f7fbebf 100644
--- a/cman/man/Makefile
+++ b/cman/man/Makefile
@@ -5,7 +5,8 @@ MANTARGET= \
 	qdiskd.8 \
 	mkqdisk.8 \
 	cmannotifyd.8 \
-	cman_notify.8
+	cman_notify.8 \
+	checkquorum.8
 
 include ../../make/defines.mk
 include $(OBJDIR)/make/install.mk
diff --git a/cman/man/checkquorum.8 b/cman/man/checkquorum.8
new file mode 100644
index 0000000..96f61f0
--- /dev/null
+++ b/cman/man/checkquorum.8
@@ -0,0 +1,29 @@
+.TH "checkquorum" "8" "February 2011" "" "Check Quorum Watchdog Script"
+.SH "NAME"
+checkquorum \- Check Quorum Watchdog Script
+.SH "SYNOPSIS"
+\fBcheckquorum
+.SH "DESCRIPTION"
+.PP
+The \fBcheckquorum\fP watchdog script, when copied to the
+.IR /etc/watchdog.d
+directory and once the watchdog daemon is enabled and started, causes the node to reboot if quorum is
+lost and not regained within a user-configurable amount of time (default: 60 seconds).
+.SH "OPTIONS"
+The checkquorum script includes several options which can be set by editing
+the script with a text editor.
+.TP
+.BR $wait_time
+Amount of time in seconds to wait after quorum is lost before triggering a reboot
+(Default: 60 seconds).
+.TP
+.BR $hardreboot
+Instantly reboot the machine without cleanly shutting down the system.
+Useful when the machine may hang on reboot. Set to 1 to hard reboot the
+system, 0 to do a normal reboot.
+.SH "NOTES"
+\fBcheckquorum\fP should never be called outside of watchdog except for
+debugging purposes.
+
+.SH "SEE ALSO"
+watchdog(8)
diff --git a/cman/scripts/Makefile b/cman/scripts/Makefile
new file mode 100644
index 0000000..b4866c8
--- /dev/null
+++ b/cman/scripts/Makefile
@@ -0,0 +1,10 @@
+SHAREDIRTEX=checkquorum
+
+include ../../make/defines.mk
+include $(OBJDIR)/make/clean.mk
+include $(OBJDIR)/make/install.mk
+include $(OBJDIR)/make/uninstall.mk
+
+all:
+
+clean: generalclean
diff --git a/cman/scripts/checkquorum b/cman/scripts/checkquorum
new file mode 100755
index 0000000..43cbc6d
--- /dev/null
+++ b/cman/scripts/checkquorum
@@ -0,0 +1,97 @@
+#!/usr/bin/perl -w
+# Quorum detection watchdog script
+#
+# This script will return -2 if the node had quorum at one point
+# and then subsequently lost it
+#
+# Copyright 2011 Red Hat, Inc.
+
+# Amount of time in seconds to wait after quorum is lost to fail script
+$wait_time = 60;
+
+# Hard Reboot the system (doesn't cleanly shut down the system)
+$hardreboot = 0;
+
+# Location of temporary file to capture timeouts
+$timerfile = "/var/run/cluster/checkquorum-timer";
+
+# Enable debug messages (0 to disable, 1 to enable)
+$debugval = 0;
+
+# If command is called attempting to 'repair' we automatically fail
+if (($#ARGV != -1) && ($ARGV[0] eq "repair")) {
+  debug ("Failing on repair\n");
+  exit 1;
+}
+
+if (!quorum()) {
+  if (has_quorum_already_been_formed()) {
+    debug("Quorum has already existed, node can be self fenced!\n");
+    if (-e $timerfile) {
+      $tf = open (FILE, "$timerfile");
+      $time = <FILE>;
+      close (FILE);
+      $timediff = time() - $time;
+      if ($timediff >= $wait_time) {
+        self_fence()
+      } else {
+        $remaining = $wait_time - $timediff;
+        debug("Time has not exceeded wait time ($remaining seconds remaining).\n");
+      }
+    } else {
+      debug("Creating timer file...\n");
+      $tf = open (FILE, ">$timerfile");
+      print FILE time();
+      close (FILE);
+    }
+  } else {
+    debug("This is a new startup no self-fencing will occur.\n");
+    `rm -f $timerfile`;
+  }
+} else {
+  debug("Quorum exists, no self-fencing should occur.\n");
+  `rm -f $timerfile`;
+}
+
+sub has_quorum_already_been_formed {
+  $oe = `/usr/sbin/corosync-objctl 2>&1 | grep -E "runtime.totem.pg.mrp.srp.operational_entered|Could not initialize objdb library|Cannot connect to quorum service" `;
+  if ($oe =~ /^Could not/ || $oe =~ /^Cannot/) {
+    debug("corosync is not running\n");
+    exit 0;
+  }
+  $oe =~ s/.*=//;
+  if ($oe > 1) {
+    return 1;
+  } else {
+    return 0;
+  }
+}
+
+sub quorum {
+  $cq = `corosync-quorumtool -s 2>&1 | grep -E "Quorate:|Cannot connect to quorum service"`;
+  if ($cq =~ /Cannot connect to quorum service/) {
+    debug("corosync is not running\n");
+    exit 0;
+  }
+  $cq =~ s/Quorate: *//;
+  chomp ($cq);
+  return 1 if ($cq eq "Yes");
+  return 0;
+}
+
+sub self_fence {
+  debug("Self fencing commencing...\n");
+  `rm -f $timerfile`;
+  if ($hardreboot == 1) {
+    `echo 1 > /proc/sys/kernel/sysrq`;
+    `echo b > /proc/sysrq-trigger`;
+  }
+  exit -2;
+}
+
+sub debug {
+  $out = pop(@_);
+  if ($debugval) {
+    print $out;
+  }
+}
diff --git a/make/install.mk b/make/install.mk
index 3f23bca..fa6ac92 100644
--- a/make/install.mk
+++ b/make/install.mk
@@ -66,6 +66,10 @@ ifdef PKGCONF
 	install -d ${pkgconfigdir}
 	install -m644 ${PKGCONF} ${pkgconfigdir}
 endif
+ifdef SHAREDIRTEX
+	install -d ${sharedir}
+	install -m755 ${SHAREDIRTEX} ${sharedir}
+endif
 ifdef SHAREDIRT
 	install -d ${sharedir}
 	install -m644 ${SHAREDIRT} ${sharedir}
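Stripped of the corosync queries, the main block of checkquorum is a small state machine driven by one timer file. As a reading aid, here is a hypothetical restatement of that decision as a pure Perl function. The sub name `checkquorum_action` and its argument list are invented for illustration and do not appear in the patch, and the "corosync is not running" early exits are left out.

```perl
use strict;
use warnings;

# Hypothetical restatement of checkquorum's main decision (not part of the
# patch; the corosync-not-running exits are omitted). Arguments:
#   $quorate     - true if corosync-quorumtool reports "Quorate: Yes"
#   $was_formed  - true if quorum was formed before (operational_entered > 1)
#   $timer_start - epoch seconds read from the timer file, or undef if absent
#   $now         - current epoch seconds
#   $wait_time   - grace period in seconds (the script's $wait_time)
# Returns 'clear_timer', 'start_timer', 'wait' or 'fence'.
sub checkquorum_action {
    my ($quorate, $was_formed, $timer_start, $now, $wait_time) = @_;

    # Quorate, or quorum never formed since startup: never fence, drop any timer.
    return 'clear_timer' if $quorate || !$was_formed;

    # First check without quorum: record when the loss was noticed.
    return 'start_timer' unless defined $timer_start;

    # Still inside the grace period: keep the timer, wait for the next check.
    return 'wait' if $now - $timer_start < $wait_time;

    # Grace period used up (the script compares with >=): self-fence.
    return 'fence';
}

# Quorum lost, timer started at t=100, wait_time 60.
print checkquorum_action(0, 1, 100, 130, 60), "\n";    # wait
print checkquorum_action(0, 1, 100, 160, 60), "\n";    # fence
```

Because the script only runs when watchdog calls it, the timer is created at the first check that sees quorum missing, and the reboot happens at the first check at least $wait_time seconds after that. The real delay therefore runs somewhat past $wait_time, depending on the watchdog check interval.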
* [Cluster-devel] [patch] cman: Added checkquorum script for self fencing
From: Fabio M. Di Nitto @ 2011-02-02 7:42 UTC
To: cluster-devel.redhat.com

Both patches are ACK.

I have only one simple question and far from being a problem. We ship
this script as example basically, since we don't install in
/etc/watchdog.d automatically. I think it would be best to move it to
the doc section since we require users to copy and edit the parameters.
Tho it also raises the question that you could simply symlink it from
watchdog.d if it's in sharedir/cluster since its location would be
fixed (vs doc changes with versioning of the package).

Fabio

On 2/1/2011 10:13 PM, Chris Feist wrote:
> cman: Added checkquorum script for self fencing
>
> A checkquorum script has been added which, when copied to the
> /etc/watchdog.d directory, will cause the node to reboot
> itself if it has lost quorum for ~60 seconds.
* [Cluster-devel] [patch] cman: Added checkquorum script for self fencing
From: Lon Hohberger @ 2011-02-02 23:00 UTC
To: cluster-devel.redhat.com

On Wed, 2011-02-02 at 08:42 +0100, Fabio M. Di Nitto wrote:
> Both patches are ACK.
>
> I have only one simple question and far from being a problem. We ship
> this script as example basically, since we don't install in
> /etc/watchdog.d automatically. I think it would be best to move it to
> the doc section since we require users to copy and edit the parameters.
> Tho it also raises the question that you could simply symlink it from
> watchdog.d if it's in sharedir/cluster since its location would be
> fixed (vs doc changes with versioning of the package).

If packaging in CentOS, you will need watchdog >= 5.5-8 or
else /etc/watchdog.d won't exist.

-- 
Lon
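Lon's version note matters because the whole mechanism depends on watchdog's test directory. As we understand watchdog(8), each executable there is run with the argument `test`; on a non-zero exit it is run again as `repair` with that exit code, and if the repair also fails the daemon reboots. The sketch below is a simplified model of that sequence, not watchdog's real implementation; the sub name `watchdog_pass` is hypothetical.

```perl
use strict;
use warnings;

# Simplified, hypothetical model of how watchdog(8) treats one executable in
# its test directory (not watchdog's actual code):
#   1. run "<script> test"; exit status 0 means healthy;
#   2. otherwise run "<script> repair <status>";
#   3. if the repair also fails, the daemon would reboot the machine.
# Returns the outcome ('ok', 'repaired' or 'reboot') and the test status.
sub watchdog_pass {
    my ($script) = @_;

    my $rc = system($script, 'test');
    die "cannot run $script: $!\n" if $rc == -1;
    return ('ok', 0) if $rc == 0;

    # High byte of the wait status is the exit code (0 if killed by a signal).
    my $status = $rc >> 8;

    my $repair = system($script, 'repair', $status);
    return ('repaired', $status) if $repair == 0;
    return ('reboot', $status);
}
```

Pointed at checkquorum on a node that has been without quorum past $wait_time, this model should report ('reboot', 254): the script's `exit -2` reaches the caller as 254 because exit statuses are 8-bit, and the `repair` call deliberately exits 1, so the failure can never be "repaired" away.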
* [Cluster-devel] [patch] cman: Added checkquorum script for self fencing
From: Fabio M. Di Nitto @ 2011-02-03 6:15 UTC
To: cluster-devel.redhat.com

On 02/03/2011 12:00 AM, Lon Hohberger wrote:
> On Wed, 2011-02-02 at 08:42 +0100, Fabio M. Di Nitto wrote:
>> Both patches are ACK.
>>
>> I have only one simple question and far from being a problem. We ship
>> this script as example basically, since we don't install in
>> /etc/watchdog.d automatically. I think it would be best to move it to
>> the doc section since we require users to copy and edit the parameters.
>> Tho it also raises the question that you could simply symlink it from
>> watchdog.d if it's in sharedir/cluster since its location would be
>> fixed (vs doc changes with versioning of the package).
>
> If packaging in CentOS, you will need watchdog >= 5.5-8 or
> else /etc/watchdog.d won't exist.

Let's make that a release note for upstream. We don't package CentOS and
they pull in from our repo directly.

Fabio
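Fabio's alternative, quoted above, is to install the script once under sharedir/cluster and symlink it from watchdog.d. A minimal Perl sketch of that step follows, assuming sharedir/cluster resolves to a fixed path such as /usr/share/cluster on the target system. The helper name `link_checkquorum` is hypothetical and nothing like it ships in the patch. It refuses to run when the watchdog directory is missing, which covers the old-watchdog case Lon raises.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper (not part of the patch): link the packaged checkquorum
# script into watchdog's test directory instead of copying it there.
# Returns 1 if a link was created, 0 if the correct link already existed.
sub link_checkquorum {
    my ($sharedir, $watchdog_dir) = @_;
    my $target = File::Spec->catfile($sharedir, 'checkquorum');
    my $link   = File::Spec->catfile($watchdog_dir, 'checkquorum');

    die "$target does not exist\n" unless -e $target;
    # Older watchdog packages provide no test directory at all.
    die "$watchdog_dir does not exist\n" unless -d $watchdog_dir;

    if (-l $link) {
        return 0 if readlink($link) eq $target;    # already points at the script
        unlink($link) or die "cannot remove stale link $link: $!\n";
    } elsif (-e $link) {
        # A regular file here is probably a locally edited copy; leave it alone.
        die "$link exists and is not a symlink\n";
    }
    symlink($target, $link) or die "cannot create $link: $!\n";
    return 1;
}
```

One trade-off: with a symlink, local edits to $wait_time or $hardreboot land in the packaged file itself, so a package update may overwrite them. That is the price of the fixed location, compared with copying an example out of the doc directory and editing the copy.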