From: Steven Whitehouse <swhiteho@redhat.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] [PATCH] gfs2_utils: Add gfs2_lockgather data gathering script
Date: Tue, 10 Jan 2012 09:24:44 +0000
Message-ID: <1326187485.2717.0.camel@menhir>
In-Reply-To: <42079BC4-5241-40D8-B30A-BB4C5B874B32@redhat.com>

Hi,

Looks good to me. ACK. Do you need one of us to apply this or are you
able to do it directly?

Steve.

On Mon, 2012-01-09 at 17:52 -0500, Adam Drew wrote:
> I wrote a simple data-gathering script for GFS2 called gfs2_lockgather. It should help in situations where data about a possible locking or performance issue involving GFS2 is required. It gathers system information, DLM data, glock data, and thread dumps. The gather can be run on a single node, or one node can trigger it on all nodes. The data gathered is well suited to diagnosing performance and locking issues.
> 
> - Adam
> 
> diff --git a/configure.ac b/configure.ac
> index 81ffad8..3fe1a49 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -285,6 +285,7 @@ AC_CONFIG_FILES([Makefile
> 		 gfs2/tool/Makefile
> 		 gfs2/tune/Makefile
> 		 gfs2/man/Makefile
> +		 gfs2/lockgather/Makefile
> 		 doc/Makefile
> 		 po/Makefile.in
> 		 ])
> diff --git a/gfs2/Makefile.am b/gfs2/Makefile.am
> index 9116bd3..08e59c4 100644
> --- a/gfs2/Makefile.am
> +++ b/gfs2/Makefile.am
> @@ -1,4 +1,4 @@
> MAINTAINERCLEANFILES	= Makefile.in
> 
> SUBDIRS			= libgfs2 convert edit fsck mkfs mount quota tool man \
> -			  tune include #init.d 
> +			  tune include lockgather #init.d 
> diff --git a/gfs2/lockgather/Makefile.am b/gfs2/lockgather/Makefile.am
> new file mode 100644
> index 0000000..fe8b480
> --- /dev/null
> +++ b/gfs2/lockgather/Makefile.am
> @@ -0,0 +1,12 @@
> +MAINTAINERCLEANFILES    = Makefile.in
> +
> +# When an exec_prefix setting would have us install into /usr/sbin,
> +# use /sbin instead.
> +# Accept an existing sbindir value of /usr/sbin (probably for older automake),
> +# or an empty value, for automake-1.11 and newer.
> +sbindir := $(shell rpl=0; test '$(exec_prefix):$(sbindir)' = /usr:/usr/sbin \
> +                        || test '$(exec_prefix):$(sbindir)' = /usr: && rpl=1; \
> +                                     test $$rpl = 1 && echo /sbin || echo '$(exec_prefix)/sbin')
> +
> +
> +dist_sbin_SCRIPTS           = gfs2_lockgather
> diff --git a/gfs2/lockgather/gfs2_lockgather b/gfs2/lockgather/gfs2_lockgather
> new file mode 100644
> index 0000000..ed4a0c5
> --- /dev/null
> +++ b/gfs2/lockgather/gfs2_lockgather
> @@ -0,0 +1,129 @@
> +#!/bin/bash
> +
> +#    gfs2_lockgather - A script that gathers data for diagnosing GFS2 locking issues
> +#    Copyright 2012 Adam Drew <adrew@redhat.com>
> +
> +#    This program is free software: you can redistribute it and/or modify
> +#    it under the terms of the GNU General Public License as published by
> +#    the Free Software Foundation, either version 3 of the License, or
> +#    (at your option) any later version.
> +
> +#    This program is distributed in the hope that it will be useful,
> +#    but WITHOUT ANY WARRANTY; without even the implied warranty of
> +#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +#    GNU General Public License for more details.
> +
> +#    You should have received a copy of the GNU General Public License
> +#    along with this program.  If not, see <http://www.gnu.org/licenses/>.
> +
> +
> +QUIET=false
> +
> +#Handle arguments
> +for var in "$@"
> +do
> +	#Handle running on all nodes
> +	if [ "$var" == "--allnodes" ] ||  [ "$var" == "-a" ] ; then
> +
> +    		for node in $(ccs_tool lsnode | tail --lines=+5 | grep -v "Cluster name" | grep -v "Nodename" | awk '{print $1}') ; do
> +        		#We gather via SSH on all nodes, even the local node
> +        		#We do this because determining which node name is the 
> +        		#node running the script is too much logic to be worth it
> +        		echo "Starting data gathering on $node..." 
> +			ssh  -q -f  root@$node '/sbin/gfs2_lockgather -q' 
> +			echo "gfs2_lockgather will log a message in /var/log/messages on $node when complete or if there is an error."
> +   	 	done	
> +		exit 0
> +	fi
> +
> +	#Handle quiet mode
> +	if [ "$var" == "-q" ] || [ "$var" == "--quiet" ] ; then
> +		QUIET=true
> +	fi
> +	
> +	#Handle help request
> +	if [ "$var" == "--help" ] || [ "$var" == "--info" ] || [ "$var" == "-h" ] ; then
> +
> +		echo "gfs2_lockgather, version 1"
> +		echo "A script that gathers data for diagnosing GFS2 locking issues."
> +		echo "---------------------------------------------------------------"
> +		echo "To gather on a single node invoke the script with no arguments."
> +		echo "To see this message use --help, --info, or -h."
> +		echo "To run with messages suppressed use --quiet or -q."
> +		echo "To gather on all nodes invoke the script with --allnodes or -a."
> +		echo "Only 1 instance of gfs2_lockgather may run on a node at a time."
> +		echo ""
> +		exit 0
> +	fi
> +	
> +done
> +
> +#Check for the lock file. We only want one instance running at a time.
> +if [ -e /var/run/gfs2_lockgather.lock ]; then
> +	echo -ne 'Error: Lock file /var/run/gfs2_lockgather.lock found.\nAnother instance of gfs2_lockgather may be running.\nAnother node may be running a gather on this node.\n' 
> +	logger -t gfs2_lockgather 'Error: Lock file /var/run/gfs2_lockgather.lock found. Another instance may be running. Quitting.'
> +        exit 1
> +fi
> +
> +#Create the gather lock
> +touch  /var/run/gfs2_lockgather.lock 
> +
> +logger -t gfs2_lockgather 'Gather started.'
> +
> +if [ $QUIET == false ] ; then echo -ne '[       ]  Setting up for gather.\t\t\t\t\t\t\t\t\r' ; fi
> +#Get the current datetime for unique naming
> +DATETIME=$(date +%m%d%Y-%H%M%S)
> + 
> +#Set up the directory structure
> +mkdir /tmp/debugfs
> +mount -t debugfs none /tmp/debugfs
> +mkdir /tmp/$(hostname)-$(echo $DATETIME)-gfshangdata
> +mkdir /tmp/$(hostname)-$(echo $DATETIME)-gfshangdata/run1
> +mkdir /tmp/$(hostname)-$(echo $DATETIME)-gfshangdata/run2
> +
> +if [ $QUIET == false ] ; then echo -ne '[#      ]  Gathering environment data.\t\t\t\t\t\t\t\t\r'  ; fi
> +#Gather some basics
> +clustat > /tmp/$(hostname)-$(echo $DATETIME)-gfshangdata/clustat.out
> +cman_tool services > /tmp/$(hostname)-$(echo $DATETIME)-gfshangdata/cman_tool-services.out
> +mount -l > /tmp/$(hostname)-$(echo $DATETIME)-gfshangdata/mount-l.out
> +ps aux > /tmp/$(hostname)-$(echo $DATETIME)-gfshangdata/ps-aux.out
> +uname -a > /tmp/$(hostname)-$(echo $DATETIME)-gfshangdata/uname-a.out
> +
> +if [ $QUIET == false ] ; then echo -ne '[##     ]  Gathering GFS2 and DLM lock data: pass 1\t\t\t\t\t\t\t\t\r'  ; fi
> +#Glock and DLM lock dump 1
> +for dlmfile in $(ls -lsv /tmp/debugfs/dlm/ | grep -v total | awk '{print $10}') ; do dd if=/tmp/debugfs/dlm/$dlmfile bs=1024M of=/tmp/$(hostname)-$(echo $DATETIME)-gfshangdata/run1/$dlmfile &> /dev/null; done
> +for fs in $(ls -lsv /tmp/debugfs/gfs2/ | grep -v total | awk '{print $10}') ; do dd if=/tmp/debugfs/gfs2/$fs/glocks bs=1024M of=/tmp/$(hostname)-$(echo $DATETIME)-gfshangdata/run1/$fs-glocks &> /dev/null; done
> +    
> +#Enable and trigger sysrq
> +echo 1 > /proc/sys/kernel/sysrq
> + 
> +#Thread Dump
> +#This is much faster than waiting for syslog to dump the thread dumps to the messages log
> +if [ $QUIET == false ] ; then echo -ne '[###    ]  Gathering thread dumps.\t\t\t\t\t\t\t\t\r'  ; fi
> +
> +(
> +cat /proc/kmsg > /tmp/thread-dumps &
> +echo 't' > /proc/sysrq-trigger
> +sleep 10
> +kill -9 $!
> +)
> +
> +if [ $QUIET == false ] ; then echo -ne '[####   ]  Gathering GFS2 and DLM lock data: pass 2.\t\t\t\t\t\t\t\t\r' ; fi
> +#Glock and DLM dump 2
> +for dlmfile in $(ls -lsv /tmp/debugfs/dlm/ | grep -v total | awk '{print $10}') ; do dd if=/tmp/debugfs/dlm/$dlmfile bs=1024M of=/tmp/$(hostname)-$(echo $DATETIME)-gfshangdata/run2/$dlmfile &> /dev/null; done
> +for fs in $(ls -lsv /tmp/debugfs/gfs2/ | grep -v total | awk '{print $10}') ; do dd if=/tmp/debugfs/gfs2/$fs/glocks bs=1024M of=/tmp/$(hostname)-$(echo $DATETIME)-gfshangdata/run2/$fs-glocks &> /dev/null; done
> +
> +if [ $QUIET == false ] ; then echo -ne '[#####  ]  Gathering messages logs\t\t\t\t\t\t\t\t\r' ; fi
> +#Get the messages log file
> +cp /var/log/messages /tmp/$(hostname)-$(echo $DATETIME)-gfshangdata/
> +
> +#Tar up the results and clean up temporary files    
> +if [ $QUIET == false ] ; then echo -ne '[###### ]   Cleaning up... 80%.\t\t\t\t\t\t\t\t\r' ; fi
> +tar cjf /tmp/$(hostname)-$(echo $DATETIME)-gfshangdata.tar.bz /tmp/$(hostname)-$(echo $DATETIME)-gfshangdata/ &> /dev/null
> +umount /tmp/debugfs/
> +rm -f  /var/run/gfs2_lockgather.lock 
> +rm -rf /tmp/debugfs
> +rm -rf /tmp/$(hostname)-$(echo $DATETIME)-gfshangdata
> +logger -t gfs2_lockgather "Gather completed. File is /tmp/$(hostname)-$(echo $DATETIME)-gfshangdata.tar.bz"  
> +if [ $QUIET == false ] ; then echo -ne "[#######]  Done. File is /tmp/$(hostname)-$(echo $DATETIME)-gfshangdata.tar.bz\r\t\t\t\t\t\t\t\t\r\n"  ; fi
> +exit 0
> diff --git a/gfs2/man/Makefile.am b/gfs2/man/Makefile.am
> index 0f132d6..648ed84 100644
> --- a/gfs2/man/Makefile.am
> +++ b/gfs2/man/Makefile.am
> @@ -9,5 +9,6 @@ dist_man_MANS		= fsck.gfs2.8 \
> 			  gfs2_quota.8 \
> 			  gfs2_tool.8 \
> 			  mkfs.gfs2.8 \
> +			  gfs2_lockgather.8 \
> 			  mount.gfs2.8 \
> 			  tunegfs2.8
> diff --git a/gfs2/man/gfs2_lockgather.8 b/gfs2/man/gfs2_lockgather.8
> new file mode 100644
> index 0000000..3cd8b9c
> --- /dev/null
> +++ b/gfs2/man/gfs2_lockgather.8
> @@ -0,0 +1,26 @@
> +.TH gfs2_lockgather 8
> +
> +.SH NAME
> +gfs2_lockgather - Gathers data for diagnosing GFS2 locking issues
> +
> +.SH SYNOPSIS
> +.B gfs2_lockgather
> +[\fIOPTIONS\fR]
> +
> +.SH DESCRIPTION
> +gfs2_lockgather will gather data that is useful for diagnosing performance and locking issues 
> +involving GFS2 filesystems. The script gathers basic system and cluster data such as clustat output, 
> +kernel version, thread dumps from all processes, and 2 passes of glock and DLM locking data. After
> +the data is gathered it is stored in a tarball under /tmp. The script can be invoked to gather
> +data from a single node, or to gather data from all nodes via ssh.     
> +.SH OPTIONS
> +.TP
> +\fB-h, --help, --info\fP
> +Display help and usage information.
> +.TP
> +\fB-q, --quiet\fP
> +Quiet mode. Run with output suppressed.
> +.TP
> +\fB-a, --allnodes\fP
> +Gather data from all nodes via ssh.
> +
> 
> 
> 
> 
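A note on the lock file handling in the script above: the existence test and the touch are two separate steps, and the lock is only removed if the script runs to the end. The following is a minimal sketch of one alternative, not something taken from the patch. The helper names acquire_lock and release_lock are hypothetical, and it assumes bash's noclobber option is acceptable for this purpose.

```shell
#!/bin/bash
# Sketch only: acquire_lock and release_lock are hypothetical helper names,
# not functions from the posted patch.

# Try to create the lock file. With noclobber set, bash refuses to
# overwrite an existing file, so the redirection fails (non-zero status)
# when another instance already holds the lock.
acquire_lock() {
    local lockfile=$1
    ( set -o noclobber; echo "$$" > "$lockfile" ) 2>/dev/null
}

# Remove the lock file; harmless if it is already gone.
release_lock() {
    rm -f "$1"
}
```

In the script this could stand in for the `[ -e ... ]` test and the `touch`, for example `acquire_lock /var/run/gfs2_lockgather.lock || exit 1`, followed by `trap 'release_lock /var/run/gfs2_lockgather.lock' EXIT` so that the lock is also removed when the script exits early.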



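The two lock dump passes in the script find the debugfs entries by taking a fixed column from `ls -lsv` output, which depends on the exact ls format. As a rough sketch under stated assumptions, the same names can be produced with a shell glob. The helper name list_entries is hypothetical and does not appear in the patch.

```shell
#!/bin/bash
# Sketch only: list_entries is a hypothetical helper, not part of the patch.

# Print the name of each entry in a directory, one per line, using a glob
# instead of picking a column out of `ls -lsv` with awk.
list_entries() {
    local dir=$1 path
    for path in "$dir"/*; do
        # An unmatched glob stays literal; skip it when nothing exists.
        [ -e "$path" ] || continue
        # Strip the leading directory part to leave only the entry name.
        printf '%s\n' "${path##*/}"
    done
}
```

A loop such as `for dlmfile in $(list_entries /tmp/debugfs/dlm) ; do ... ; done` would then keep the rest of each pass unchanged, on the assumption that the entry names contain no whitespace.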
