From: Andrew Price <anprice@redhat.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] [PATCH] gfs2_lockcapture: Added option to disable process data gathering, added gathering of dlm_tool lockdebug, df, lsof, DLM hash table sizes.
Date: Thu, 06 Jun 2013 10:09:00 +0100 [thread overview]
Message-ID: <51B051AC.7030401@redhat.com> (raw)
In-Reply-To: <1370461752-18653-1-git-send-email-sbradley@redhat.com>
Hi Shane,
On 05/06/13 20:49, sbradley at redhat.com wrote:
> From: Shane Bradley <sbradley@redhat.com>
>
> The script no longer requires GFS2 mounts to capture data which allows the
> capturing of dlm data without having a GFS2 mount. Added -P option so that
> process gathering can be disabled. The following commands will have their
> output saved: dlm_tool lockdebug, df -h, lsof, and contents of
> /sys/kernel/config/dlm/cluster/*_size. The -t option was removed; the output
> directory is now always archived and compressed as a .tar.bz2 file. The man
> page was updated with a list of all the files and command outputs that will be
> in the output directory.
>
> Signed-off-by: Shane Bradley <sbradley@redhat.com>
I've pushed your patch with some tweaks to make the shortlog short and
to tidy up some language in the man page a bit.
Thanks,
Andy
> ---
> gfs2/man/gfs2_lockcapture.8 | 85 +++++++---
> gfs2/scripts/gfs2_lockcapture | 366 ++++++++++++++++++++++++++++++++----------
> 2 files changed, 347 insertions(+), 104 deletions(-)
>
> diff --git a/gfs2/man/gfs2_lockcapture.8 b/gfs2/man/gfs2_lockcapture.8
> index acd9113..0f2fd9a 100644
> --- a/gfs2/man/gfs2_lockcapture.8
> +++ b/gfs2/man/gfs2_lockcapture.8
> @@ -1,22 +1,23 @@
> .TH gfs2_lockcapture 8
>
> .SH NAME
> -gfs2_lockcapture \- will capture locking information from GFS2 file systems and DLM.
> +gfs2_lockcapture \- will capture locking information from GFS2 file-systems and DLM.
>
> .SH SYNOPSIS
> -.B gfs2_lockcapture \fR[-dqyt] [-o \fIoutput directory]\fR [-r \fInumber of runs]\fR [-s \fIseconds to sleep]\fR [-n \fIname of GFS2 filesystem]\fP
> +.B gfs2_lockcapture \fR[-dqyP] [-o \fIoutput directory]\fR [-r \fInumber of runs]\fR [-s \fIseconds to sleep]\fR [-n \fIname of GFS2 file-system]\fP
> .PP
> .B gfs2_lockcapture \fR[-dqyi]
>
> .SH DESCRIPTION
> \fIgfs2_lockcapture\fR is used to capture all the GFS2 lockdump data and
> -corresponding DLM data. The command can be configured to capture the data
> +corresponding DLM data for GFS2 file-systems. The command can be configured to capture the data
> multiple times and how much time to sleep between each iteration of capturing
> -the data. By default all of the mounted GFS2 filesystems will have their data
> -collected unless GFS2 filesystems are specified.
> +the data. By default all of the mounted GFS2 file-systems will have their data
> +collected unless GFS2 file-systems are specified.
> .PP
> -Please note that sysrq -t and -m events are trigger or the pid directories in /proc are
> -collected on each iteration of capturing the data.
> +Please note that sysrq -t (thread) and -m (memory) events are triggered, or
> +the pid directories in /proc are collected, on each iteration of capturing the
> +data unless they are disabled with the -P option.
>
> .SH OPTIONS
> .TP
> @@ -24,31 +25,79 @@ collected on each iteration of capturing the data.
> Prints out a short usage message and exits.
> .TP
> \fB-d, --debug\fP
> -enables debug logging.
> +Enables debug logging.
> .TP
> \fB-q, --quiet\fP
> -disables logging to console.
> +Disables logging to console.
> .TP
> \fB-y, --no_ask\fP
> -disables all questions and assumes yes.
> +Disables all questions and assumes yes.
> .TP
> \fB-i, --info\fP
> -prints information about the mounted GFS2 file systems.
> +Prints information about the mounted GFS2 file-systems.
> .TP
> -\fB-t, --archive\fP
> -the output directory will be archived(tar) and compressed(.bz2).
> +\fB-P, --disable_process_gather\fP
> +Disables the gathering of process information.
> .TP
> \fB-o \fI<output directory>, \fB--path_to_output_dir\fR=\fI<output directory>\fP
> -the directory where all the collect data will stored.
> +The directory where all the collected data will be stored.
> .TP
> \fB-r \fI<number of runs>, \fB--num_of_runs\fR=\fI<number of runs>\fP
> -number of runs capturing the lockdump data.
> +The number of runs capturing the lockdump data. The default is 3 runs.
> .TP
> \fB-s \fI<seconds to sleep>, \fB--seconds_sleep\fR=\fI<seconds to sleep>\fP
> -number of seconds to sleep between runs of capturing the lockdump data.
> +The number of seconds to sleep between runs of capturing the lockdump data. The default is 120 seconds.
> .TP
> \fB-n \fI<name of GFS2 filesystem>, \fB--fs_name\fR=\fI<name of GFS2 filesystem>\fP
> -name of the GFS2 filesystem(s) that will have their lockdump data captured.
> +The name of the GFS2 filesystem(s) that will have their lockdump data captured. By default, all mounted GFS2 file-systems will have their data captured.
> .
> +.SH NOTES
> +The following commands will be run when capturing the data:
> +.IP \(bu 2
> +uname -a
> +.IP \(bu 2
> +uptime
> +.IP \(bu 2
> +ps h -AL -o "tid,s,cmd"
> +.IP \(bu 2
> +df -h
> +.IP \(bu 2
> +lsof
> +.IP \(bu 2
> +mount -l
> +.IP \(bu 2
> +dlm_tool ls
> +.IP \(bu 2
> +dlm_tool lockdebug -v -s -w <lockspace name>
> +.IP \(bu 2
> +echo "t" > /proc/sysrq-trigger (If /proc/1/stack does not exist)
> +.IP \(bu 2
> +echo "m" > /proc/sysrq-trigger (If /proc/1/stack does not exist)
> +
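Incidentally, for anyone skimming the list above: each of these commands is simply run with its stdout redirected to a file in the output directory. A minimal standalone sketch of that pattern, using subprocess rather than the script's own runCommand() helper, with a made-up output path:

```python
# Rough sketch of how a command's output is captured to a file. This is an
# illustration, not the script's actual helper; paths are hypothetical.
import subprocess

def save_command_output(argv, path):
    with open(path, "w") as fout:
        # Write stdout to the file; a non-zero exit status is ignored, since
        # a missing optional tool should not abort the whole capture.
        subprocess.call(argv, stdout=fout, stderr=subprocess.DEVNULL)

save_command_output(["uname", "-a"], "/tmp/uname-a.cmd")
```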
> +.SH AUTHOR
> +.nf
> +Shane Bradley <sbradley@fedoraproject.org>
> +.fi
> +.SH FILES
> +.I /proc/mounts
> +.br
> +.I /proc/slabinfo
> +.br
> +.I /sys/kernel/config/dlm/cluster/lkbtbl_size
> +.br
> +.I /sys/kernel/config/dlm/cluster/dirtbl_size
> +.br
> +.I /sys/kernel/config/dlm/cluster/rsbtbl_size
> +.br
> +.I /sys/kernel/debug/gfs2/
> +.br
> +.I /sys/kernel/debug/dlm/
> +.br
> +.I /proc/<int>/
> +(If /proc/1/stack does exist)
> +.br
> +.I /var/log/messages
> +.br
> +.I /var/log/cluster/
> +.br
> .SH SEE ALSO
> -gfs2_lockanalyze(8)
> diff --git a/gfs2/scripts/gfs2_lockcapture b/gfs2/scripts/gfs2_lockcapture
> index 6a63fc8..81a0aeb 100644
> --- a/gfs2/scripts/gfs2_lockcapture
> +++ b/gfs2/scripts/gfs2_lockcapture
> @@ -1,6 +1,6 @@
> #!/usr/bin/env python
> """
> -The script gfs2_lockcapture will capture locking information from GFS2 file
> +The script "gfs2_lockcapture" will capture locking information from GFS2 file
> systems and DLM.
>
> @author : Shane Bradley
> @@ -12,6 +12,7 @@ import sys
> import os
> import os.path
> import logging
> +import logging.handlers
> from optparse import OptionParser, Option
> import time
> import platform
> @@ -33,7 +34,7 @@ import tarfile
> sure only 1 instance of this script is running at any time.
> @type PATH_TO_PID_FILENAME: String
> """
> -VERSION_NUMBER = "0.9-3"
> +VERSION_NUMBER = "0.9-7"
> MAIN_LOGGER_NAME = "%s" %(os.path.basename(sys.argv[0]))
> PATH_TO_DEBUG_DIR="/sys/kernel/debug"
> PATH_TO_PID_FILENAME = "/var/run/%s.pid" %(os.path.basename(sys.argv[0]))
> @@ -43,7 +44,7 @@ PATH_TO_PID_FILENAME = "/var/run/%s.pid" %(os.path.basename(sys.argv[0]))
> # #####################################################################
> class ClusterNode:
> """
> - This class represents a cluster node that is a current memeber in a cluster.
> + This class represents a cluster node that is a current member in a cluster.
> """
> def __init__(self, clusternodeName, clusternodeID, clusterName, mapOfMountedFilesystemLabels):
> """
> @@ -115,7 +116,7 @@ class ClusterNode:
> mounted GFS2 filesystems. If includeClusterName is False it will only
> return a list of all the mounted GFS2 filesystem names(ex. mygfs2vol1).
>
> - @return: Returns a list of all teh mounted GFS2 filesystem names.
> + @return: Returns a list of all the mounted GFS2 filesystem names.
> @rtype: Array
>
> @param includeClusterName: By default this option is True and will
> @@ -134,6 +135,24 @@ class ClusterNode:
> listOfGFS2MountedFilesystemLabels.append(fsLabelSplit[1])
> return listOfGFS2MountedFilesystemLabels
>
> + def getMountedGFS2FilesystemPaths(self):
> + """
> + Returns a map of all the mounted GFS2 filesystem paths. The key is the
> + GFS2 fs name(clustername:fs name) and value is the mountpoint.
> +
> + @return: Returns a map of all the mounted GFS2 filesystem paths. The key
> + is the GFS2 fs name(clustername:fs name) and value is the mountpoint.
> + @rtype: Map
> + """
> + mapOfGFS2MountedFilesystemPaths = {}
> + for fsLabel in self.__mapOfMountedFilesystemLabels.keys():
> + value = self.__mapOfMountedFilesystemLabels.get(fsLabel)
> + mountPoint = value.split("type", 1)[0].split("on")[1]
> + if (len(mountPoint) > 0):
> + mapOfGFS2MountedFilesystemPaths[fsLabel] = mountPoint
> + return mapOfGFS2MountedFilesystemPaths
> +
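A small aside on getMountedGFS2FilesystemPaths(): the split-based parsing works because the stored value looks like mount(8) output. Sketched standalone below with a made-up sample value; note the approach would misbehave if the device path itself contained the word "on":

```python
# Hypothetical sketch of the mount-string parsing above; the sample value
# is an assumed "<device> on <mount point> type <fs> (<opts>)" string.
def parse_mount_point(value):
    # Take everything before "type", then everything after "on".
    return value.split("type", 1)[0].split("on")[1].strip()

sample = "/dev/mapper/vg0-lv0 on /mnt/gfs2vol1 type gfs2 (rw,relatime)"
print(parse_mount_point(sample))  # /mnt/gfs2vol1
```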
> # #####################################################################
> # Helper functions.
> # #####################################################################
> @@ -328,7 +347,7 @@ def archiveData(pathToSrcDir):
> message = "A compressed archvied file already exists and will be removed: %s" %(pathToTarFilename)
> logging.getLogger(MAIN_LOGGER_NAME).status(message)
> try:
> - os.remove(PATH_TO_PID_FILENAME)
> + os.remove(pathToTarFilename)
> except IOError:
> message = "There was an error removing the file: %s." %(pathToTarFilename)
> logging.getLogger(MAIN_LOGGER_NAME).error(message)
> @@ -508,6 +527,32 @@ def backupOutputDirectory(pathToOutputDir):
> # existing output directory.
> return (not os.path.exists(pathToOutputDir))
>
> +def mountFilesystem(filesystemType, pathToDevice, pathToMountPoint):
> + """
> + This function will attempt to mount a filesystem. If the filesystem is
> + already mounted or the filesystem was successfully mounted then True is
> + returned, otherwise False is returned.
> +
> + @return: If the filesystem is already mounted or the filesystem was
> + successfully mounted then True is returned, otherwise False is returned.
> + @rtype: Boolean
> +
> + @param filesystemType: The type of filesystem that will be mounted.
> + @type filesystemType: String
> + @param pathToDevice: The path to the device that will be mounted.
> + @type pathToDevice: String
> + @param pathToMountPoint: The path to the directory that will be used as the
> + mount point for the device.
> + @type pathToMountPoint: String
> + """
> + if (os.path.ismount(PATH_TO_DEBUG_DIR)):
> + return True
> + listOfCommandOptions = ["-t", filesystemType, pathToDevice, pathToMountPoint]
> + if (not runCommand("mount", listOfCommandOptions)):
> + message = "There was an error mounting the filesystem type %s for the device %s to the mount point %s." %(filesystemType, pathToDevice, pathToMountPoint)
> + logging.getLogger(MAIN_LOGGER_NAME).error(message)
> + return os.path.ismount(PATH_TO_DEBUG_DIR)
> +
> def exitScript(removePidFile=True, errorCode=0):
> """
> This function will cause the script to exit or quit. It will return an error
> @@ -615,6 +660,89 @@ def getClusterNode(listOfGFS2Names):
> else:
> return None
>
> +
> +def getDLMToolDLMLockspaces():
> + """
> + This function returns all the dlm lockspace names found with the
> + command: "dlm_tool ls".
> +
> + @return: A list of all the dlm lockspace names.
> + @rtype: Array
> + """
> + dlmLockspaces = []
> + stdout = runCommandOutput("dlm_tool", ["ls"])
> + if (not stdout == None):
> + stdout = stdout.replace("dlm lockspaces\n", "")
> + dlmToolLSKeys = ["name", "id", "flags", "change", "members"]
> + # Split on newlines
> + stdoutSections = stdout.split("\n\n")
> + for section in stdoutSections:
> + # Create tmp map to hold data
> + dlmToolLSMap = dict.fromkeys(dlmToolLSKeys)
> + lines = section.split("\n")
> + for line in lines:
> + for dlmToolLSKey in dlmToolLSMap.keys():
> + if (line.startswith(dlmToolLSKey)):
> + value = line.replace(dlmToolLSKey, " ", 1).strip().rstrip()
> + dlmToolLSMap[dlmToolLSKey] = value
> + if ((not dlmToolLSMap.get("name") == None) and (not dlmToolLSMap.get("id") == None)):
> + dlmLockspaces.append(dlmToolLSMap.get("name"))
> + return dlmLockspaces
> +
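For reference, the section-based parsing of "dlm_tool ls" reduces to something like the sketch below; the sample text is an approximation of the command's output format, not captured from a real cluster:

```python
# Standalone sketch of the "dlm_tool ls" parsing above. Sections are
# separated by blank lines; each section describes one lockspace.
def parse_dlm_tool_ls(stdout):
    lockspaces = []
    stdout = stdout.replace("dlm lockspaces\n", "")
    for section in stdout.split("\n\n"):
        entry = {}
        for line in section.split("\n"):
            for key in ("name", "id", "flags", "change", "members"):
                if line.strip().startswith(key):
                    entry[key] = line.strip().replace(key, "", 1).strip()
        # Only count a section that has at least a name and an id.
        if entry.get("name") and entry.get("id"):
            lockspaces.append(entry["name"])
    return lockspaces

sample = ("dlm lockspaces\n"
          "name          clvmd\n"
          "id            0x4104eefa\n"
          "flags         0x00000000\n"
          "members       1 2\n")
print(parse_dlm_tool_ls(sample))  # ['clvmd']
```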
> +def getGroupToolDLMLockspaces():
> + """
> + This function returns all the dlm lockspace names found with the
> + command: "group_tool ls".
> +
> + @return: A list of all the dlm lockspace names.
> + @rtype: Array
> + """
> + dlmLockspaces = []
> + stdout = runCommandOutput("group_tool", ["ls"])
> + if (not stdout == None):
> + lines = stdout.split("\n")
> + for line in lines:
> + if (line.startswith("dlm")):
> + dlmLockspaces.append(line.split()[2])
> + return dlmLockspaces
> +
> +def getDLMLockspaces():
> + """
> + Returns a list of the dlm lockspace names.
> +
> + @return: Returns a list of dlm lockspace names.
> + @rtype: Array
> + """
> + message = "Gathering the DLM Lockspace Names."
> + logging.getLogger(MAIN_LOGGER_NAME).debug(message)
> + dlmLockspaces = getDLMToolDLMLockspaces()
> + if (not len(dlmLockspaces) > 0):
> + dlmLockspaces = getGroupToolDLMLockspaces()
> + return dlmLockspaces
> +
> +def getVerifiedDLMLockspaceNames(lockspaceNames):
> + """
> + Returns a list of DLM lockspaces that have been verified to exist in the
> + command output of $(dlm_tool ls).
> +
> + @return: Returns a list of DLM lockspaces that have been verified to exist
> + in the command output of $(dlm_tool ls).
> + @rtype: Array
> +
> + @param lockspaceNames: This is the list of DLM lockspaces that will have
> + their debug directory copied.
> + @type lockspaceNames: Array
> + """
> + # Get a list of all the DLM lockspaces names.
> + dlmLockspaces = getDLMLockspaces()
> + # Verify the lockspaceNames are lockspaces that exist.
> + verifiedLockspaceNames = []
> + for lockspaceName in lockspaceNames:
> + if ((lockspaceName in dlmLockspaces) and
> + (not lockspaceName in verifiedLockspaceNames)):
> + verifiedLockspaceNames.append(lockspaceName)
> + return verifiedLockspaceNames
> +
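The verification step is just an order-preserving, de-duplicating filter; sketched standalone here with the list of existing lockspaces passed in rather than read from "dlm_tool ls":

```python
# Minimal sketch of the lockspace verification above. "existing" stands in
# for the names getDLMLockspaces() would return on a real node.
def verify_lockspaces(requested, existing):
    verified = []
    for name in requested:
        if name in existing and name not in verified:
            verified.append(name)  # keep order, drop duplicates and unknowns
    return verified

print(verify_lockspaces(["myvol", "clvmd", "myvol", "rgmanager"],
                        ["clvmd", "myvol"]))  # ['myvol', 'clvmd']
```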
> def getMountedGFS2Filesystems():
> """
> This function returns a list of all the mounted GFS2 filesystems.
> @@ -659,32 +787,9 @@ def getLabelMapForMountedFilesystems(clusterName, listOfMountedFilesystems):
> mapOfMountedFilesystemLabels[fsLabel] = mountedFilesystem
> return mapOfMountedFilesystemLabels
>
> -def mountFilesystem(filesystemType, pathToDevice, pathToMountPoint):
> - """
> - This function will attempt to mount a filesystem. If the filesystem is
> - already mounted or the filesystem was successfully mounted then True is
> - returned, otherwise False is returned.
> -
> - @return: If the filesystem is already mounted or the filesystem was
> - successfully mounted then True is returned, otherwise False is returned.
> - @rtype: Boolean
> -
> - @param filesystemType: The type of filesystem that will be mounted.
> - @type filesystemType: String
> - @param pathToDevice: The path to the device that will be mounted.
> - @type pathToDevice: String
> - @param pathToMountPoint: The path to the directory that will be used as the
> - mount point for the device.
> - @type pathToMountPoint: String
> - """
> - if (os.path.ismount(PATH_TO_DEBUG_DIR)):
> - return True
> - listOfCommandOptions = ["-t", filesystemType, pathToDevice, pathToMountPoint]
> - if (not runCommand("mount", listOfCommandOptions)):
> - message = "There was an error mounting the filesystem type %s for the device %s to the mount point %s." %(filesystemType, pathToDevice, pathToMountPoint)
> - logging.getLogger(MAIN_LOGGER_NAME).error(message)
> - return os.path.ismount(PATH_TO_DEBUG_DIR)
> -
> +# #####################################################################
> +# Gather output from command functions
> +# #####################################################################
> def gatherGeneralInformation(pathToDSTDir):
> """
> This function will gather general information about the cluster and write
> @@ -712,7 +817,15 @@ def gatherGeneralInformation(pathToDSTDir):
> pathToSrcFile = "/proc/slabinfo"
> copyFile(pathToSrcFile, os.path.join(pathToDSTDir, pathToSrcFile.strip("/")))
>
> + # Copy the DLM hash table sizes:
> + pathToHashTableFiles = ["/sys/kernel/config/dlm/cluster/lkbtbl_size", "/sys/kernel/config/dlm/cluster/dirtbl_size",
> + "/sys/kernel/config/dlm/cluster/rsbtbl_size"]
> + for pathToSrcFile in pathToHashTableFiles:
> + if (os.path.exists(pathToSrcFile)):
> + copyFile(pathToSrcFile, os.path.join(pathToDSTDir, pathToSrcFile.strip("/")))
> +
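The hash table gathering above can be sketched in isolation; the default source paths mirror the ones in the patch, and the copy logic here stands in for the script's copyFile() helper:

```python
# Hedged sketch of gathering the DLM hash table sizes: copy each *_size
# file only if it exists, preserving the source path layout underneath
# the destination directory.
import os
import shutil

def gather_hash_table_sizes(dst_dir,
                            src_files=("/sys/kernel/config/dlm/cluster/lkbtbl_size",
                                       "/sys/kernel/config/dlm/cluster/dirtbl_size",
                                       "/sys/kernel/config/dlm/cluster/rsbtbl_size")):
    copied = []
    for src in src_files:
        if os.path.exists(src):
            dst = os.path.join(dst_dir, src.strip("/"))
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.copyfile(src, dst)
            copied.append(dst)
    return copied
```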
> # Get "ps -eo user,pid,%cpu,%mem,vsz,rss,tty,stat,start,time,comm,wchan" data.
> + # Get " ps h -AL -o tid,s,cmd
> command = "ps"
> pathToCommandOutput = os.path.join(pathToDSTDir, "ps_hALo-tid.s.cmd")
> try:
> @@ -721,7 +834,29 @@ def gatherGeneralInformation(pathToDSTDir):
> runCommand(command, ["h", "-AL", "-o", "tid,s,cmd"], standardOut=fout)
> fout.close()
> except IOError:
> - message = "There was an error the command output for %s to the file %s." %(command, pathToCommandOutput)
> + message = "There was an error writing the command output for %s to the file %s." %(command, pathToCommandOutput)
> + logging.getLogger(MAIN_LOGGER_NAME).error(message)
> +
> + # Get df -h output
> + command = "df"
> + pathToCommandOutput = os.path.join(pathToDSTDir, "df-h.cmd")
> + try:
> + fout = open(pathToCommandOutput, "w")
> + runCommand(command, ["-h"], standardOut=fout)
> + fout.close()
> + except IOError:
> + message = "There was an error writing the command output for %s to the file %s." %(command, pathToCommandOutput)
> + logging.getLogger(MAIN_LOGGER_NAME).error(message)
> +
> + # Get lsof output
> + command = "lsof"
> + pathToCommandOutput = os.path.join(pathToDSTDir, "lsof.cmd")
> + try:
> + fout = open(pathToCommandOutput, "w")
> + runCommand(command, [], standardOut=fout)
> + fout.close()
> + except IOError:
> + message = "There was an error writing the command output for %s to the file %s." %(command, pathToCommandOutput)
> logging.getLogger(MAIN_LOGGER_NAME).error(message)
>
> # Write the status of all the nodes in the cluster out.
> @@ -746,7 +881,9 @@ def gatherGeneralInformation(pathToDSTDir):
> message = "There was an error the command output for %s to the file %s." %(command, pathToCommandOutput)
> logging.getLogger(MAIN_LOGGER_NAME).error(message)
>
> -
> +# #####################################################################
> +# Gather Process Information
> +# #####################################################################
> def isProcPidStackEnabled(pathToPidData):
> """
> Returns true if the init process has the file "stack" in its pid data
> @@ -810,6 +947,9 @@ def triggerSysRQEvents():
> message = "There was an error writing the command output for %s to the file %s." %(command, pathToSysrqTriggerFile)
> logging.getLogger(MAIN_LOGGER_NAME).error(message)
>
> +# #####################################################################
> +# Gather lockdumps and logs
> +# #####################################################################
> def gatherLogs(pathToDSTDir):
> """
> This function will copy all the cluster logs(/var/log/cluster) and the
> @@ -828,29 +968,46 @@ def gatherLogs(pathToDSTDir):
> pathToDSTLogDir = os.path.join(pathToDSTDir, os.path.basename(pathToLogDir))
> copyDirectory(pathToLogDir, pathToDSTDir)
>
> -def gatherDLMLockDumps(pathToDSTDir, listOfGFS2Filesystems):
> +def gatherDLMLockDumps(pathToDSTDir, lockspaceNames):
> """
> - This function copies the debug files for dlm for a GFS2 filesystem in the
> - list to a directory. The list of GFS2 filesystems will only include the
> - filesystem name for each item in the list. For example: "mygfs2vol1"
> + This function copies all the debug files for dlm and sorts them into their
> + own directory based on name of dlm lockspace.
>
> @param pathToDSTDir: This is the path to directory where the files will be
> copied to.
> @type pathToDSTDir: String
> - @param listOfGFS2Filesystems: This is the list of the GFS2 filesystems that
> - will have their debug directory copied.
> - @type listOfGFS2Filesystems: Array
> + @param lockspaceNames: This is the list of DLM lockspaces that will have
> + their debug directory copied.
> + @type lockspaceNames: Array
> """
> + # This function assumes that getVerifiedDLMLockspaceNames has already been
> + # called to verify that the lockspaces exist.
> lockDumpType = "dlm"
> pathToSrcDir = os.path.join(PATH_TO_DEBUG_DIR, lockDumpType)
> pathToOutputDir = os.path.join(pathToDSTDir, lockDumpType)
> message = "Copying the files in the %s lockdump data directory %s." %(lockDumpType.upper(), pathToSrcDir)
> logging.getLogger(MAIN_LOGGER_NAME).debug(message)
> - for filename in os.listdir(pathToSrcDir):
> - for name in listOfGFS2Filesystems:
> - if (filename.startswith(name)):
> - copyFile(os.path.join(pathToSrcDir, filename),
> - os.path.join(os.path.join(pathToOutputDir, name), filename))
> +
> + # Get list of all the dlm lockspaces
> + if (os.path.exists(pathToSrcDir)):
> + for filename in os.listdir(pathToSrcDir):
> + for lockspaceName in lockspaceNames:
> + if (filename.startswith(lockspaceName)):
> + copyFile(os.path.join(pathToSrcDir, filename),
> + os.path.join(os.path.join(pathToOutputDir, lockspaceName), filename))
> +
> + # Run dlm_tool lockdebug against the lockspace names and write to file.
> + for lockspaceName in lockspaceNames:
> + dstDir = os.path.join(pathToOutputDir, lockspaceName)
> + if (mkdirs(dstDir)):
> + pathToCommandOutput = os.path.join(dstDir,"%s_lockdebug" %(lockspaceName))
> + try:
> + fout = open(pathToCommandOutput, "w")
> + runCommand("dlm_tool", ["lockdebug", "-v", "-s", "-w", lockspaceName], standardOut=fout)
> + fout.close()
> + except IOError:
> + message = "There was an error writing the command output to the file %s." %(pathToCommandOutput)
> + logging.getLogger(MAIN_LOGGER_NAME).error(message)
>
> def gatherGFS2LockDumps(pathToDSTDir, listOfGFS2Filesystems):
> """
> @@ -875,6 +1032,8 @@ def gatherGFS2LockDumps(pathToDSTDir, listOfGFS2Filesystems):
> pathToOutputDir = os.path.join(pathToDSTDir, lockDumpType)
> # The number of files that were copied
> fileCopiedCount = 0
> + if (not os.path.exists(pathToSrcDir)):
> + return False
> for dirName in os.listdir(pathToSrcDir):
> pathToCurrentDir = os.path.join(pathToSrcDir, dirName)
> if ((os.path.isdir(pathToCurrentDir)) and (dirName in listOfGFS2Filesystems)):
> @@ -886,6 +1045,7 @@ def gatherGFS2LockDumps(pathToDSTDir, listOfGFS2Filesystems):
> # If the number of files(not directories) copied was greater than zero then files were copied
> # succesfully.
> return (fileCopiedCount > 0)
> +
> # ##############################################################################
> # Get user selected options
> # ##############################################################################
> @@ -922,12 +1082,12 @@ def __getOptions(version) :
> cmdParser.add_option("-i", "--info",
> action="store_true",
> dest="enablePrintInfo",
> - help="prints information about the mounted GFS2 file systems",
> + help="prints information about the mounted GFS2 file-systems",
> default=False)
> - cmdParser.add_option("-t", "--archive",
> + cmdParser.add_option("-P", "--disable_process_gather",
> action="store_true",
> - dest="enableArchiveOutputDir",
> - help="the output directory will be archived(tar) and compressed(.bz2)",
> + dest="disableProcessGather",
> + help="the gathering of process information will be disabled",
> default=False)
> cmdParser.add_option("-o", "--path_to_output_dir",
> action="store",
> @@ -939,21 +1099,21 @@ def __getOptions(version) :
> cmdParser.add_option("-r", "--num_of_runs",
> action="store",
> dest="numberOfRuns",
> - help="number of runs capturing the lockdump data",
> + help="number of runs capturing the lockdump data(default: 3 runs)",
> type="int",
> metavar="<number of runs>",
> - default=2)
> + default=3)
> cmdParser.add_option("-s", "--seconds_sleep",
> action="store",
> dest="secondsToSleep",
> - help="number of seconds to sleep between runs of capturing the lockdump data",
> + help="number of seconds to sleep between runs of capturing the lockdump data(default: 120 seconds)",
> type="int",
> metavar="<seconds to sleep>",
> default=120)
> cmdParser.add_option("-n", "--fs_name",
> action="extend",
> dest="listOfGFS2Names",
> - help="name of the GFS2 filesystem(s) that will have their lockdump data captured",
> + help="name of the GFS2 filesystem(s) that will have their lockdump data captured(default: all GFS2 file-systems will be captured)",
> type="string",
> metavar="<name of GFS2 filesystem>",
> default=[])
> @@ -994,14 +1154,15 @@ class OptionParserExtended(OptionParser):
>
> examplesMessage += "\nIt will do 3 runs of gathering the lockdump information in 10 second intervals for only the"
> examplesMessage += "\nGFS2 filesystems with the names myGFS2vol2,myGFS2vol1. Then it will archive and compress"
> - examplesMessage += "\nthe data collected. All of the lockdump data will be written to the directory: "
> - examplesMessage += "\n/tmp/2012-11-12_095556-gfs2_lockcapture and all the questions will be answered with yes.\n"
> - examplesMessage += "\n# %s -r 3 -s 10 -t -n myGFS2vol2,myGFS2vol1 -o /tmp/2012-11-12_095556-gfs2_lockcapture -y\n" %(self.__commandName)
> + examplesMessage += "\nthe data collected in the output directory:"
> + examplesMessage += "\n/tmp/cluster42-gfs2_lockcapture and all the questions will be answered with yes.\n"
> + examplesMessage += "\n# %s -r 3 -s 10 -n myGFS2vol2,myGFS2vol1 -o /tmp/cluster42-gfs2_lockcapture -y\n" %(self.__commandName)
>
> examplesMessage += "\nIt will do 2 runs of gathering the lockdump information in 25 second intervals for all the"
> - examplesMessage += "\nmounted GFS2 filesystems. Then it will archive and compress the data collected. All of the"
> - examplesMessage += "\nlockdump data will be written to the directory: /tmp/2012-11-12_095556-gfs2_lockcapture.\n"
> - examplesMessage += "\n# %s -r 2 -s 25 -t -o /tmp/2012-11-12_095556-gfs2_lockcapture\n" %(self.__commandName)
> + examplesMessage += "\nmounted GFS2 filesystems. The gathering process data will be disabled. Then it will archive and compress"
> + examplesMessage += "\nthe data collected in the output directory:"
> + examplesMessage += "\n/tmp/cluster42-gfs2_lockcapture and all the questions will be answered with yes.\n"
> + examplesMessage += "\n# %s -r 2 -s 25 -P -o /tmp/cluster42-gfs2_lockcapture\n" %(self.__commandName)
> OptionParser.print_help(self)
> print examplesMessage
>
> @@ -1073,6 +1234,14 @@ if __name__ == "__main__":
> # Create a new status function and level.
> logging.STATUS = logging.INFO + 2
> logging.addLevelName(logging.STATUS, "STATUS")
> +
> + # Log to main system logger that script has started then close the
> + # handler before the other handlers are created.
> + sysLogHandler = logging.handlers.SysLogHandler(address = '/dev/log')
> + logger.addHandler(sysLogHandler)
> + logger.info("Capturing of the data to analyze GFS2 lockdumps.")
> + logger.removeHandler(sysLogHandler)
> +
> # Create a function for the STATUS_LEVEL since not defined by python. This
> # means you can call it like the other predefined message
> # functions. Example: logging.getLogger("loggerName").status(message)
> @@ -1128,7 +1297,6 @@ if __name__ == "__main__":
> message += " %s" %(name)
> message += "."
> logging.getLogger(MAIN_LOGGER_NAME).error(message)
> - exitScript(removePidFile=True, errorCode=1)
> if (cmdLineOpts.enablePrintInfo):
> logging.disable(logging.CRITICAL)
> print "List of all the mounted GFS2 filesystems that can have their lockdump data captured:"
> @@ -1231,27 +1399,48 @@ if __name__ == "__main__":
> # Going to sleep for 2 seconds, so that TIMESTAMP should be in the
> # past in the logs so that capturing sysrq data will be guaranteed.
> time.sleep(2)
> - # Gather the backtraces for all the pids, by grabbing the /proc/<pid
> - # number> or triggering sysrq events to capture task bask traces
> - # from log.
> - message = "Pass (%d/%d): Triggering the sysrq events for the host." %(i, cmdLineOpts.numberOfRuns)
> - logging.getLogger(MAIN_LOGGER_NAME).debug(message)
> - # Gather the data in the /proc/<pid> directory if the file
> - # </proc/<pid>/stack exists. If file exists we will not trigger
> - # sysrq events.
> - pathToPidData = "/proc"
> - if (isProcPidStackEnabled(pathToPidData)):
> - gatherPidData(pathToPidData, os.path.join(pathToOutputRunDir, pathToPidData.strip("/")))
> - else:
> - triggerSysRQEvents()
> +
> + # If enabled then gather the process data.
> + if (not cmdLineOpts.disableProcessGather):
> + # Gather the backtraces for all the pids, by grabbing the /proc/<pid
> + # number> or triggering sysrq events to capture task bask traces
> + # from log.
> + # Gather the data in the /proc/<pid> directory if the file
> + # </proc/<pid>/stack exists. If file exists we will not trigger
> + # sysrq events.
> +
> + # TODO: consider gathering anyway and only triggering sysrq if needed.
> + pathToPidData = "/proc"
> + if (isProcPidStackEnabled(pathToPidData)):
> + message = "Pass (%d/%d): Triggering the capture of all pid directories in %s." %(i, cmdLineOpts.numberOfRuns, pathToPidData)
> + logging.getLogger(MAIN_LOGGER_NAME).debug(message)
> + gatherPidData(pathToPidData, os.path.join(pathToOutputRunDir, pathToPidData.strip("/")))
> + else:
> + message = "Pass (%d/%d): Triggering the sysrq events for the host since stack was not captured in pid directory." %(i, cmdLineOpts.numberOfRuns)
> + logging.getLogger(MAIN_LOGGER_NAME).debug(message)
> + triggerSysRQEvents()
> +
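The fallback logic here boils down to: copy /proc/&lt;pid&gt; data when the kernel exposes per-task stacks via /proc/1/stack, otherwise trigger sysrq events so the backtraces land in the log. A minimal sketch with the two actions passed in as stand-in callables:

```python
# Sketch of the process-capture decision above. The two callables stand in
# for the script's gatherPidData() and triggerSysRQEvents() functions.
import os

def capture_process_data(gather_pid_data, trigger_sysrq, proc_root="/proc"):
    if os.path.exists(os.path.join(proc_root, "1", "stack")):
        gather_pid_data(proc_root)   # cheap: read /proc/<pid>/stack directly
        return "proc"
    trigger_sysrq()                  # fallback: dump task state to the log
    return "sysrq"
```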
> + # #######################################################################
> + # Gather the DLM data and lock-dumps
> + # #######################################################################
> + # Gather data for the DLM lockspaces that are found.
> + lockspaceNames = clusternode.getMountedGFS2FilesystemNames(includeClusterName=False)
> + # In addition always gather these lockspaces(if they exist).
> + lockspaceNames.append("clvmd")
> + lockspaceNames.append("rgmanager")
> + # Verify that these lockspace names exist.
> + lockspaceNames = getVerifiedDLMLockspaceNames(lockspaceNames)
> # Gather the dlm locks.
> - lockDumpType = "dlm"
> - message = "Pass (%d/%d): Gathering the %s lock dumps for the host." %(i, cmdLineOpts.numberOfRuns, lockDumpType.upper())
> + message = "Pass (%d/%d): Gathering the DLM lock-dumps for the host." %(i, cmdLineOpts.numberOfRuns)
> logging.getLogger(MAIN_LOGGER_NAME).debug(message)
> - gatherDLMLockDumps(pathToOutputRunDir, clusternode.getMountedGFS2FilesystemNames(includeClusterName=False))
> + # Add other notable lockspace names that should be captured if they exist.
> + gatherDLMLockDumps(pathToOutputRunDir, lockspaceNames)
> +
> + # #######################################################################
> + # Gather the GFS2 data and lock-dumps
> + # #######################################################################
> # Gather the glock locks from gfs2.
> - lockDumpType = "gfs2"
> - message = "Pass (%d/%d): Gathering the %s lock dumps for the host." %(i, cmdLineOpts.numberOfRuns, lockDumpType.upper())
> + message = "Pass (%d/%d): Gathering the GFS2 lock-dumps for the host." %(i, cmdLineOpts.numberOfRuns)
> logging.getLogger(MAIN_LOGGER_NAME).debug(message)
> if(gatherGFS2LockDumps(pathToOutputRunDir, clusternode.getMountedGFS2FilesystemNames())):
> exitCode = 0
> @@ -1274,16 +1463,21 @@ if __name__ == "__main__":
> # #######################################################################
> message = "All the files have been gathered and this directory contains all the captured data: %s" %(pathToOutputDir)
> logging.getLogger(MAIN_LOGGER_NAME).info(message)
> - if (cmdLineOpts.enableArchiveOutputDir):
> - message = "The lockdump data will now be archived. This could some time depending on the size of the data collected."
> + message = "The lockdump data will now be archive. This could some time depending on the size of the data collected."
> + logging.getLogger(MAIN_LOGGER_NAME).info(message)
> + pathToTarFilename = archiveData(pathToOutputDir)
> + if (os.path.exists(pathToTarFilename)):
> + message = "The compressed archvied file was created: %s" %(pathToTarFilename)
> logging.getLogger(MAIN_LOGGER_NAME).info(message)
> - pathToTarFilename = archiveData(pathToOutputDir)
> - if (os.path.exists(pathToTarFilename)):
> - message = "The compressed archvied file was created: %s" %(pathToTarFilename)
> - logging.getLogger(MAIN_LOGGER_NAME).info(message)
> - else:
> - message = "The compressed archvied failed to be created: %s" %(pathToTarFilename)
> + # Do some cleanup by removing the data directory if the archive file was created.
> + try:
> + shutil.rmtree(pathToOutputDir)
> + except OSError:
> + message = "There was an error removing the directory: %s." %(pathToOutputDir)
> logging.getLogger(MAIN_LOGGER_NAME).error(message)
> + else:
> + message = "The compressed archvied failed to be created: %s" %(pathToTarFilename)
> + logging.getLogger(MAIN_LOGGER_NAME).error(message)
> # #######################################################################
> except KeyboardInterrupt:
> print ""
>