From: Shane Bradley <sbradley@redhat.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] [PATCH] gfs2_lockcapture: Added option to disable process data gathering, added gathering of dlm_tool lockdebug, df, lsof, DLM hash table sizes.
Date: Thu, 6 Jun 2013 08:26:50 -0400 [thread overview]
Message-ID: <00257159-062C-4169-B91C-554440B0646D@redhat.com> (raw)
In-Reply-To: <51B051AC.7030401@redhat.com>
Thanks for helping clean up the man page.
----
Shane Bradley
Senior Software Maintenance Engineer (Cluster HA, GFS, GFS2)
Red Hat Global Support Services VC3 Raleigh, NC
On Jun 6, 2013, at 5:09 AM, Andrew Price <anprice@redhat.com> wrote:
> Hi Shane,
>
> On 05/06/13 20:49, sbradley at redhat.com wrote:
>> From: Shane Bradley <sbradley@redhat.com>
>>
>> The script no longer requires GFS2 mounts to capture data which allows the
>> capturing of dlm data without having a GFS2 mount. Added -P option so that
>> process gathering can be disabled. The following commands will have their
>> output saved: dlm_tool lockdebug, df -h, lsof, and contents of
>> /sys/kernel/config/dlm/cluster/*_size. The -t option was removed and all
>> output directories are .tar.bz2. The man page was updated with list of all
>> the files or command outputs that will be in the output directory.
>>
>> Signed-off-by: Shane Bradley <sbradley@redhat.com>
>
> I've pushed your patch with some tweaks to make the shortlog short and to tidy up some language in the man page a bit.
>
> Thanks,
>
> Andy
>
>> ---
>> gfs2/man/gfs2_lockcapture.8 | 85 +++++++---
>> gfs2/scripts/gfs2_lockcapture | 366 ++++++++++++++++++++++++++++++++----------
>> 2 files changed, 347 insertions(+), 104 deletions(-)
>>
>> diff --git a/gfs2/man/gfs2_lockcapture.8 b/gfs2/man/gfs2_lockcapture.8
>> index acd9113..0f2fd9a 100644
>> --- a/gfs2/man/gfs2_lockcapture.8
>> +++ b/gfs2/man/gfs2_lockcapture.8
>> @@ -1,22 +1,23 @@
>> .TH gfs2_lockcapture 8
>>
>> .SH NAME
>> -gfs2_lockcapture \- will capture locking information from GFS2 file systems and DLM.
>> +gfs2_lockcapture \- will capture locking information from GFS2 file-systems and DLM.
>>
>> .SH SYNOPSIS
>> -.B gfs2_lockcapture \fR[-dqyt] [-o \fIoutput directory]\fR [-r \fInumber of runs]\fR [-s \fIseconds to sleep]\fR [-n \fIname of GFS2 filesystem]\fP
>> +.B gfs2_lockcapture \fR[-dqyP] [-o \fIoutput directory]\fR [-r \fInumber of runs]\fR [-s \fIseconds to sleep]\fR [-n \fIname of GFS2 file-system]\fP
>> .PP
>> .B gfs2_lockcapture \fR[-dqyi]
>>
>> .SH DESCRIPTION
>> \fIgfs2_lockcapture\fR is used to capture all the GFS2 lockdump data and
>> -corresponding DLM data. The command can be configured to capture the data
>> +corresponding DLM data for GFS2 file-systems. The command can be configured to capture the data
>> multiple times and how much time to sleep between each iteration of capturing
>> -the data. By default all of the mounted GFS2 filesystems will have their data
>> -collected unless GFS2 filesystems are specified.
>> +the data. By default all of the mounted GFS2 file-systems will have their data
>> +collected unless GFS2 file-systems are specified.
>> .PP
>> -Please note that sysrq -t and -m events are trigger or the pid directories in /proc are
>> -collected on each iteration of capturing the data.
>> +Please note that sysrq -t (thread) and -m (memory) events are triggered, or
>> +the pid directories in /proc are collected, on each iteration of capturing
>> +the data unless they are disabled with the -P option.
>>
>> .SH OPTIONS
>> .TP
>> @@ -24,31 +25,79 @@ collected on each iteration of capturing the data.
>> Prints out a short usage message and exits.
>> .TP
>> \fB-d, --debug\fP
>> -enables debug logging.
>> +Enables debug logging.
>> .TP
>> \fB-q, --quiet\fP
>> -disables logging to console.
>> +Disables logging to console.
>> .TP
>> \fB-y, --no_ask\fP
>> -disables all questions and assumes yes.
>> +Disables all questions and assumes yes.
>> .TP
>> \fB-i, --info\fP
>> -prints information about the mounted GFS2 file systems.
>> +Prints information about the mounted GFS2 file-systems.
>> .TP
>> -\fB-t, --archive\fP
>> -the output directory will be archived(tar) and compressed(.bz2).
>> +\fB-P, --disable_process_gather\fP
>> +Disables the gathering of process information.
>> .TP
>> \fB-o \fI<output directory>, \fB--path_to_output_dir\fR=\fI<output directory>\fP
>> -the directory where all the collect data will stored.
>> +The directory where all the collected data will be stored.
>> .TP
>> \fB-r \fI<number of runs>, \fB--num_of_runs\fR=\fI<number of runs>\fP
>> -number of runs capturing the lockdump data.
>> +The number of runs capturing the lockdump data. The default is 3 runs.
>> .TP
>> \fB-s \fI<seconds to sleep>, \fB--seconds_sleep\fR=\fI<seconds to sleep>\fP
>> -number of seconds to sleep between runs of capturing the lockdump data.
>> +The number of seconds to sleep between runs of capturing the lockdump data. The default is 120 seconds.
>> .TP
>> \fB-n \fI<name of GFS2 filesystem>, \fB--fs_name\fR=\fI<name of GFS2 filesystem>\fP
>> -name of the GFS2 filesystem(s) that will have their lockdump data captured.
>> +The name of the GFS2 filesystem(s) that will have their lockdump data captured. By default, all mounted GFS2 file-systems will have their data captured.
>> .
>> +.SH NOTES
>> +The following commands will be run when capturing the data:
>> +.IP \(bu 2
>> +uname -a
>> +.IP \(bu 2
>> +uptime
>> +.IP \(bu 2
>> +ps h -AL -o "tid,s,cmd"
>> +.IP \(bu 2
>> +df -h
>> +.IP \(bu 2
>> +lsof
>> +.IP \(bu 2
>> +mount -l
>> +.IP \(bu 2
>> +dlm_tool ls
>> +.IP \(bu 2
>> +dlm_tool lockdebug -v -s -w <lockspace name>
>> +.IP \(bu 2
>> +echo "t" > /proc/sysrq-trigger (If /proc/1/stack does not exist)
>> +.IP \(bu 2
>> +echo "m" > /proc/sysrq-trigger (If /proc/1/stack does not exist)
>> +
>> +.SH AUTHOR
>> +.nf
>> +Shane Bradley <sbradley@fedoraproject.org>
>> +.fi
>> +.SH FILES
>> +.I /proc/mounts
>> +.br
>> +.I /proc/slabinfo
>> +.br
>> +.I /sys/kernel/config/dlm/cluster/lkbtbl_size
>> +.br
>> +.I /sys/kernel/config/dlm/cluster/dirtbl_size
>> +.br
>> +.I /sys/kernel/config/dlm/cluster/rsbtbl_size
>> +.br
>> +.I /sys/kernel/debug/gfs2/
>> +.br
>> +.I /sys/kernel/debug/dlm/
>> +.br
>> +.I /proc/<int>/
>> +(If /proc/1/stack exists)
>> +.br
>> +.I /var/log/messages
>> +.br
>> +.I /var/log/cluster/
>> +.br
>> .SH SEE ALSO
>> -gfs2_lockanalyze(8)
>> diff --git a/gfs2/scripts/gfs2_lockcapture b/gfs2/scripts/gfs2_lockcapture
>> index 6a63fc8..81a0aeb 100644
>> --- a/gfs2/scripts/gfs2_lockcapture
>> +++ b/gfs2/scripts/gfs2_lockcapture
>> @@ -1,6 +1,6 @@
>> #!/usr/bin/env python
>> """
>> -The script gfs2_lockcapture will capture locking information from GFS2 file
>> +The script "gfs2_lockcapture" will capture locking information from GFS2 file
>> systems and DLM.
>>
>> @author : Shane Bradley
>> @@ -12,6 +12,7 @@ import sys
>> import os
>> import os.path
>> import logging
>> +import logging.handlers
>> from optparse import OptionParser, Option
>> import time
>> import platform
>> @@ -33,7 +34,7 @@ import tarfile
>> sure only 1 instance of this script is running at any time.
>> @type PATH_TO_PID_FILENAME: String
>> """
>> -VERSION_NUMBER = "0.9-3"
>> +VERSION_NUMBER = "0.9-7"
>> MAIN_LOGGER_NAME = "%s" %(os.path.basename(sys.argv[0]))
>> PATH_TO_DEBUG_DIR="/sys/kernel/debug"
>> PATH_TO_PID_FILENAME = "/var/run/%s.pid" %(os.path.basename(sys.argv[0]))
>> @@ -43,7 +44,7 @@ PATH_TO_PID_FILENAME = "/var/run/%s.pid" %(os.path.basename(sys.argv[0]))
>> # #####################################################################
>> class ClusterNode:
>> """
>> - This class represents a cluster node that is a current memeber in a cluster.
>> + This class represents a cluster node that is a current member in a cluster.
>> """
>> def __init__(self, clusternodeName, clusternodeID, clusterName, mapOfMountedFilesystemLabels):
>> """
>> @@ -115,7 +116,7 @@ class ClusterNode:
>> mounted GFS2 filesystems. If includeClusterName is False it will only
>> return a list of all the mounted GFS2 filesystem names(ex. mygfs2vol1).
>>
>> - @return: Returns a list of all teh mounted GFS2 filesystem names.
>> + @return: Returns a list of all the mounted GFS2 filesystem names.
>> @rtype: Array
>>
>> @param includeClusterName: By default this option is True and will
>> @@ -134,6 +135,24 @@ class ClusterNode:
>> listOfGFS2MountedFilesystemLabels.append(fsLabelSplit[1])
>> return listOfGFS2MountedFilesystemLabels
>>
>> + def getMountedGFS2FilesystemPaths(self):
>> + """
>> + Returns a map of all the mounted GFS2 filesystem paths. The key is the
>> + GFS2 fs name(clustername:fs name) and value is the mountpoint.
>> +
>> + @return: Returns a map of all the mounted GFS2 filesystem paths. The key
>> + is the GFS2 fs name(clustername:fs name) and value is the mountpoint.
>> + Returns a list of all the mounted GFS2 filesystem paths.
>> + @rtype: Map
>> + """
>> + mapOfGFS2MountedFilesystemPaths = {}
>> + for fsLabel in self.__mapOfMountedFilesystemLabels.keys():
>> + value = self.__mapOfMountedFilesystemLabels.get(fsLabel)
>> + mountPoint = value.split("type", 1)[0].split("on")[1]
>> + if (len(mountPoint) > 0):
>> + mapOfGFS2MountedFilesystemPaths[fsLabel] = mountPoint
>> + return mapOfGFS2MountedFilesystemPaths
>> +
>> # #####################################################################
>> # Helper functions.
>> # #####################################################################
>> @@ -328,7 +347,7 @@ def archiveData(pathToSrcDir):
>> message = "A compressed archvied file already exists and will be removed: %s" %(pathToTarFilename)
>> logging.getLogger(MAIN_LOGGER_NAME).status(message)
>> try:
>> - os.remove(PATH_TO_PID_FILENAME)
>> + os.remove(pathToTarFilename)
>> except IOError:
>> message = "There was an error removing the file: %s." %(pathToTarFilename)
>> logging.getLogger(MAIN_LOGGER_NAME).error(message)
>> @@ -508,6 +527,32 @@ def backupOutputDirectory(pathToOutputDir):
>> # existing output directory.
>> return (not os.path.exists(pathToOutputDir))
>>
>> +def mountFilesystem(filesystemType, pathToDevice, pathToMountPoint):
>> + """
>> + This function will attempt to mount a filesystem. If the filesystem is
>> + already mounted or the filesystem was successfully mounted then True is
>> + returned, otherwise False is returned.
>> +
>> + @return: If the filesystem is already mounted or the filesystem was
>> + successfully mounted then True is returned, otherwise False is returned.
>> + @rtype: Boolean
>> +
>> + @param filesystemType: The type of filesystem that will be mounted.
>> + @type filesystemType: String
>> + @param pathToDevice: The path to the device that will be mounted.
>> + @type pathToDevice: String
>> + @param pathToMountPoint: The path to the directory that will be used as the
>> + mount point for the device.
>> + @type pathToMountPoint: String
>> + """
>> + if (os.path.ismount(PATH_TO_DEBUG_DIR)):
>> + return True
>> + listOfCommandOptions = ["-t", filesystemType, pathToDevice, pathToMountPoint]
>> + if (not runCommand("mount", listOfCommandOptions)):
>> + message = "There was an error mounting the filesystem type %s for the device %s to the mount point %s." %(filesystemType, pathToDevice, pathToMountPoint)
>> + logging.getLogger(MAIN_LOGGER_NAME).error(message)
>> + return os.path.ismount(PATH_TO_DEBUG_DIR)
>> +
>> def exitScript(removePidFile=True, errorCode=0):
>> """
>> This function will cause the script to exit or quit. It will return an error
>> @@ -615,6 +660,89 @@ def getClusterNode(listOfGFS2Names):
>> else:
>> return None
>>
>> +
>> +def getDLMToolDLMLockspaces():
>> + """
>> + This function returns the names of all the dlm lockspace names found with the
>> + command: "dlm_tool ls".
>> +
>> + @return: A list of all the dlm lockspace names.
>> + @rtype: Array
>> + """
>> + dlmLockspaces = []
>> + stdout = runCommandOutput("dlm_tool", ["ls"])
>> + if (not stdout == None):
>> + stdout = stdout.replace("dlm lockspaces\n", "")
>> + dlmToolLSKeys = ["name", "id", "flags", "change", "members"]
>> + # Split on newlines
>> + stdoutSections = stdout.split("\n\n")
>> + for section in stdoutSections:
>> + # Create tmp map to hold data
>> + dlmToolLSMap = dict.fromkeys(dlmToolLSKeys)
>> + lines = section.split("\n")
>> + for line in lines:
>> + for dlmToolLSKey in dlmToolLSMap.keys():
>> + if (line.startswith(dlmToolLSKey)):
>> + value = line.replace(dlmToolLSKey, " ", 1).strip().rstrip()
>> + dlmToolLSMap[dlmToolLSKey] = value
>> + if ((not dlmToolLSMap.get("name") == None) and (not dlmToolLSMap.get("id") == None)):
>> + dlmLockspaces.append(dlmToolLSMap.get("name"))
>> + return dlmLockspaces
>> +
>> +def getGroupToolDLMLockspaces():
>> + """
>> + This function returns the names of all the dlm lockspace names found with the
>> + command: "group_tool ls".
>> +
>> + @return: A list of all the dlm lockspace names.
>> + @rtype: Array
>> + """
>> + dlmLockspaces = []
>> + stdout = runCommandOutput("group_tool", ["ls"])
>> + if (not stdout == None):
>> + lines = stdout.split("\n")
>> + for line in lines:
>> + if (line.startswith("dlm")):
>> + dlmLockspaces.append(line.split()[2])
>> + return dlmLockspaces
>> +
>> +def getDLMLockspaces():
>> + """
>> + Returns a list of the dlm lockspace names.
>> +
>> + @return: Returns a list of dlm lockspace names.
>> + @rtype: Array
>> + """
>> + message = "Gathering the DLM Lockspace Names."
>> + logging.getLogger(MAIN_LOGGER_NAME).debug(message)
>> + dlmLockspaces = getDLMToolDLMLockspaces()
>> + if (not len(dlmLockspaces) > 0):
>> + dlmLockspaces = getGroupToolDLMLockspaces()
>> + return dlmLockspaces
>> +
>> +def getVerifiedDLMLockspaceNames(lockspaceNames):
>> + """
>> +    Returns a list of DLM lockspaces that have been verified to exist in the
>> + command output of $(dlm_tool ls).
>> +
>> +    @return: Returns a list of DLM lockspaces that have been verified to exist
>> + in the command output of $(dlm_tool ls).
>> + @rtype: Array
>> +
>> + @param lockspaceNames: This is the list of DLM lockspaces that will have
>> + their debug directory copied.
>> + @type lockspaceNames: Array
>> + """
>> + # Get a list of all the DLM lockspaces names.
>> + dlmLockspaces = getDLMLockspaces()
>> + # Verify the lockspaceNames are lockspaces that exist.
>> + verifiedLockspaceNames = []
>> + for lockspaceName in lockspaceNames:
>> + if ((lockspaceName in dlmLockspaces) and
>> + (not lockspaceName in verifiedLockspaceNames)):
>> + verifiedLockspaceNames.append(lockspaceName)
>> + return verifiedLockspaceNames
>> +
>> def getMountedGFS2Filesystems():
>> """
>> This function returns a list of all the mounted GFS2 filesystems.
>> @@ -659,32 +787,9 @@ def getLabelMapForMountedFilesystems(clusterName, listOfMountedFilesystems):
>> mapOfMountedFilesystemLabels[fsLabel] = mountedFilesystem
>> return mapOfMountedFilesystemLabels
>>
>> -def mountFilesystem(filesystemType, pathToDevice, pathToMountPoint):
>> - """
>> - This function will attempt to mount a filesystem. If the filesystem is
>> - already mounted or the filesystem was successfully mounted then True is
>> - returned, otherwise False is returned.
>> -
>> - @return: If the filesystem is already mounted or the filesystem was
>> - successfully mounted then True is returned, otherwise False is returned.
>> - @rtype: Boolean
>> -
>> - @param filesystemType: The type of filesystem that will be mounted.
>> - @type filesystemType: String
>> - @param pathToDevice: The path to the device that will be mounted.
>> - @type pathToDevice: String
>> - @param pathToMountPoint: The path to the directory that will be used as the
>> - mount point for the device.
>> - @type pathToMountPoint: String
>> - """
>> - if (os.path.ismount(PATH_TO_DEBUG_DIR)):
>> - return True
>> - listOfCommandOptions = ["-t", filesystemType, pathToDevice, pathToMountPoint]
>> - if (not runCommand("mount", listOfCommandOptions)):
>> - message = "There was an error mounting the filesystem type %s for the device %s to the mount point %s." %(filesystemType, pathToDevice, pathToMountPoint)
>> - logging.getLogger(MAIN_LOGGER_NAME).error(message)
>> - return os.path.ismount(PATH_TO_DEBUG_DIR)
>> -
>> +# #####################################################################
>> +# Gather output from command functions
>> +# #####################################################################
>> def gatherGeneralInformation(pathToDSTDir):
>> """
>> This function will gather general information about the cluster and write
>> @@ -712,7 +817,15 @@ def gatherGeneralInformation(pathToDSTDir):
>> pathToSrcFile = "/proc/slabinfo"
>> copyFile(pathToSrcFile, os.path.join(pathToDSTDir, pathToSrcFile.strip("/")))
>>
>> + # Copy the DLM hash table sizes:
>> + pathToHashTableFiles = ["/sys/kernel/config/dlm/cluster/lkbtbl_size", "/sys/kernel/config/dlm/cluster/dirtbl_size",
>> + "/sys/kernel/config/dlm/cluster/rsbtbl_size"]
>> + for pathToSrcFile in pathToHashTableFiles:
>> + if (os.path.exists(pathToSrcFile)):
>> + copyFile(pathToSrcFile, os.path.join(pathToDSTDir, pathToSrcFile.strip("/")))
>> +
>> # Get "ps -eo user,pid,%cpu,%mem,vsz,rss,tty,stat,start,time,comm,wchan" data.
>> +    # Get "ps h -AL -o tid,s,cmd" data.
>> command = "ps"
>> pathToCommandOutput = os.path.join(pathToDSTDir, "ps_hALo-tid.s.cmd")
>> try:
>> @@ -721,7 +834,29 @@ def gatherGeneralInformation(pathToDSTDir):
>> runCommand(command, ["h", "-AL", "-o", "tid,s,cmd"], standardOut=fout)
>> fout.close()
>> except IOError:
>> - message = "There was an error the command output for %s to the file %s." %(command, pathToCommandOutput)
>> + message = "There was an error writing the command output for %s to the file %s." %(command, pathToCommandOutput)
>> + logging.getLogger(MAIN_LOGGER_NAME).error(message)
>> +
>> +    # Get df -h output
>> + command = "df"
>> + pathToCommandOutput = os.path.join(pathToDSTDir, "df-h.cmd")
>> + try:
>> + fout = open(pathToCommandOutput, "w")
>> + runCommand(command, ["-h"], standardOut=fout)
>> + fout.close()
>> + except IOError:
>> + message = "There was an error writing the command output for %s to the file %s." %(command, pathToCommandOutput)
>> + logging.getLogger(MAIN_LOGGER_NAME).error(message)
>> +
>> +    # Get lsof output
>> + command = "lsof"
>> + pathToCommandOutput = os.path.join(pathToDSTDir, "lsof.cmd")
>> + try:
>> + fout = open(pathToCommandOutput, "w")
>> + runCommand(command, [], standardOut=fout)
>> + fout.close()
>> + except IOError:
>> + message = "There was an error writing the command output for %s to the file %s." %(command, pathToCommandOutput)
>> logging.getLogger(MAIN_LOGGER_NAME).error(message)
>>
>> # Write the status of all the nodes in the cluster out.
>> @@ -746,7 +881,9 @@ def gatherGeneralInformation(pathToDSTDir):
>> message = "There was an error the command output for %s to the file %s." %(command, pathToCommandOutput)
>> logging.getLogger(MAIN_LOGGER_NAME).error(message)
>>
>> -
>> +# #####################################################################
>> +# Gather Process Information
>> +# #####################################################################
>> def isProcPidStackEnabled(pathToPidData):
>> """
>> Returns true if the init process has the file "stack" in its pid data
>> @@ -810,6 +947,9 @@ def triggerSysRQEvents():
>> message = "There was an error writing the command output for %s to the file %s." %(command, pathToSysrqTriggerFile)
>> logging.getLogger(MAIN_LOGGER_NAME).error(message)
>>
>> +# #####################################################################
>> +# Gather lockdumps and logs
>> +# #####################################################################
>> def gatherLogs(pathToDSTDir):
>> """
>> This function will copy all the cluster logs(/var/log/cluster) and the
>> @@ -828,29 +968,46 @@ def gatherLogs(pathToDSTDir):
>> pathToDSTLogDir = os.path.join(pathToDSTDir, os.path.basename(pathToLogDir))
>> copyDirectory(pathToLogDir, pathToDSTDir)
>>
>> -def gatherDLMLockDumps(pathToDSTDir, listOfGFS2Filesystems):
>> +def gatherDLMLockDumps(pathToDSTDir, lockspaceNames):
>> """
>> - This function copies the debug files for dlm for a GFS2 filesystem in the
>> - list to a directory. The list of GFS2 filesystems will only include the
>> - filesystem name for each item in the list. For example: "mygfs2vol1"
>> + This function copies all the debug files for dlm and sorts them into their
>> + own directory based on name of dlm lockspace.
>>
>> @param pathToDSTDir: This is the path to directory where the files will be
>> copied to.
>> @type pathToDSTDir: String
>> - @param listOfGFS2Filesystems: This is the list of the GFS2 filesystems that
>> - will have their debug directory copied.
>> - @type listOfGFS2Filesystems: Array
>> + @param lockspaceNames: This is the list of DLM lockspaces that will have
>> + their debug directory copied.
>> + @type lockspaceNames: Array
>> """
>> +    # This function assumes that getVerifiedDLMLockspaceNames() has already
>> +    # been called to verify that the lockspaces exist.
>> lockDumpType = "dlm"
>> pathToSrcDir = os.path.join(PATH_TO_DEBUG_DIR, lockDumpType)
>> pathToOutputDir = os.path.join(pathToDSTDir, lockDumpType)
>> message = "Copying the files in the %s lockdump data directory %s." %(lockDumpType.upper(), pathToSrcDir)
>> logging.getLogger(MAIN_LOGGER_NAME).debug(message)
>> - for filename in os.listdir(pathToSrcDir):
>> - for name in listOfGFS2Filesystems:
>> - if (filename.startswith(name)):
>> - copyFile(os.path.join(pathToSrcDir, filename),
>> - os.path.join(os.path.join(pathToOutputDir, name), filename))
>> +
>> + # Get list of all the dlm lockspaces
>> + if (os.path.exists(pathToSrcDir)):
>> + for filename in os.listdir(pathToSrcDir):
>> + for lockspaceName in lockspaceNames:
>> + if (filename.startswith(lockspaceName)):
>> + copyFile(os.path.join(pathToSrcDir, filename),
>> + os.path.join(os.path.join(pathToOutputDir, lockspaceName), filename))
>> +
>> + # Run dlm_tool lockdebug against the lockspace names and write to file.
>> + for lockspaceName in lockspaceNames:
>> + dstDir = os.path.join(pathToOutputDir, lockspaceName)
>> + if (mkdirs(dstDir)):
>> + pathToCommandOutput = os.path.join(dstDir,"%s_lockdebug" %(lockspaceName))
>> + try:
>> + fout = open(pathToCommandOutput, "w")
>> + runCommand("dlm_tool", ["lockdebug", "-v", "-s", "-w", lockspaceName], standardOut=fout)
>> + fout.close()
>> + except IOError:
>> + message = "There was an error writing the command output to the file %s." %(pathToCommandOutput)
>> + logging.getLogger(MAIN_LOGGER_NAME).error(message)
>>
>> def gatherGFS2LockDumps(pathToDSTDir, listOfGFS2Filesystems):
>> """
>> @@ -875,6 +1032,8 @@ def gatherGFS2LockDumps(pathToDSTDir, listOfGFS2Filesystems):
>> pathToOutputDir = os.path.join(pathToDSTDir, lockDumpType)
>> # The number of files that were copied
>> fileCopiedCount = 0
>> + if (not os.path.exists(pathToSrcDir)):
>> + return False
>> for dirName in os.listdir(pathToSrcDir):
>> pathToCurrentDir = os.path.join(pathToSrcDir, dirName)
>> if ((os.path.isdir(pathToCurrentDir)) and (dirName in listOfGFS2Filesystems)):
>> @@ -886,6 +1045,7 @@ def gatherGFS2LockDumps(pathToDSTDir, listOfGFS2Filesystems):
>> # If the number of files(not directories) copied was greater than zero then files were copied
>> # succesfully.
>> return (fileCopiedCount > 0)
>> +
>> # ##############################################################################
>> # Get user selected options
>> # ##############################################################################
>> @@ -922,12 +1082,12 @@ def __getOptions(version) :
>> cmdParser.add_option("-i", "--info",
>> action="store_true",
>> dest="enablePrintInfo",
>> - help="prints information about the mounted GFS2 file systems",
>> + help="prints information about the mounted GFS2 file-systems",
>> default=False)
>> - cmdParser.add_option("-t", "--archive",
>> + cmdParser.add_option("-P", "--disable_process_gather",
>> action="store_true",
>> - dest="enableArchiveOutputDir",
>> - help="the output directory will be archived(tar) and compressed(.bz2)",
>> + dest="disableProcessGather",
>> + help="the gathering of process information will be disabled",
>> default=False)
>> cmdParser.add_option("-o", "--path_to_output_dir",
>> action="store",
>> @@ -939,21 +1099,21 @@ def __getOptions(version) :
>> cmdParser.add_option("-r", "--num_of_runs",
>> action="store",
>> dest="numberOfRuns",
>> - help="number of runs capturing the lockdump data",
>> + help="number of runs capturing the lockdump data(default: 3 runs)",
>> type="int",
>> metavar="<number of runs>",
>> - default=2)
>> + default=3)
>> cmdParser.add_option("-s", "--seconds_sleep",
>> action="store",
>> dest="secondsToSleep",
>> - help="number of seconds to sleep between runs of capturing the lockdump data",
>> + help="number of seconds to sleep between runs of capturing the lockdump data(default: 120 seconds)",
>> type="int",
>> metavar="<seconds to sleep>",
>> default=120)
>> cmdParser.add_option("-n", "--fs_name",
>> action="extend",
>> dest="listOfGFS2Names",
>> - help="name of the GFS2 filesystem(s) that will have their lockdump data captured",
>> + help="name of the GFS2 filesystem(s) that will have their lockdump data captured(default: all GFS2 file-systems will be captured)",
>> type="string",
>> metavar="<name of GFS2 filesystem>",
>> default=[])
>> @@ -994,14 +1154,15 @@ class OptionParserExtended(OptionParser):
>>
>> examplesMessage += "\nIt will do 3 runs of gathering the lockdump information in 10 second intervals for only the"
>> examplesMessage += "\nGFS2 filesystems with the names myGFS2vol2,myGFS2vol1. Then it will archive and compress"
>> - examplesMessage += "\nthe data collected. All of the lockdump data will be written to the directory: "
>> - examplesMessage += "\n/tmp/2012-11-12_095556-gfs2_lockcapture and all the questions will be answered with yes.\n"
>> - examplesMessage += "\n# %s -r 3 -s 10 -t -n myGFS2vol2,myGFS2vol1 -o /tmp/2012-11-12_095556-gfs2_lockcapture -y\n" %(self.__commandName)
>> + examplesMessage += "\nthe data collected in the output directory:"
>> + examplesMessage += "\n/tmp/cluster42-gfs2_lockcapture and all the questions will be answered with yes.\n"
>> + examplesMessage += "\n# %s -r 3 -s 10 -n myGFS2vol2,myGFS2vol1 -o /tmp/cluster42-gfs2_lockcapture -y\n" %(self.__commandName)
>>
>> examplesMessage += "\nIt will do 2 runs of gathering the lockdump information in 25 second intervals for all the"
>> - examplesMessage += "\nmounted GFS2 filesystems. Then it will archive and compress the data collected. All of the"
>> - examplesMessage += "\nlockdump data will be written to the directory: /tmp/2012-11-12_095556-gfs2_lockcapture.\n"
>> - examplesMessage += "\n# %s -r 2 -s 25 -t -o /tmp/2012-11-12_095556-gfs2_lockcapture\n" %(self.__commandName)
>> +        examplesMessage += "\nmounted GFS2 filesystems. The gathering of process data will be disabled. Then it will archive and compress"
>> + examplesMessage += "\nthe data collected in the output directory:"
>> +        examplesMessage += "\n/tmp/cluster42-gfs2_lockcapture.\n"
>> + examplesMessage += "\n# %s -r 2 -s 25 -P -o /tmp/cluster42-gfs2_lockcapture\n" %(self.__commandName)
>> OptionParser.print_help(self)
>> print examplesMessage
>>
>> @@ -1073,6 +1234,14 @@ if __name__ == "__main__":
>> # Create a new status function and level.
>> logging.STATUS = logging.INFO + 2
>> logging.addLevelName(logging.STATUS, "STATUS")
>> +
>> + # Log to main system logger that script has started then close the
>> + # handler before the other handlers are created.
>> + sysLogHandler = logging.handlers.SysLogHandler(address = '/dev/log')
>> + logger.addHandler(sysLogHandler)
>> + logger.info("Capturing of the data to analyze GFS2 lockdumps.")
>> + logger.removeHandler(sysLogHandler)
>> +
>> # Create a function for the STATUS_LEVEL since not defined by python. This
>> # means you can call it like the other predefined message
>> # functions. Example: logging.getLogger("loggerName").status(message)
>> @@ -1128,7 +1297,6 @@ if __name__ == "__main__":
>> message += " %s" %(name)
>> message += "."
>> logging.getLogger(MAIN_LOGGER_NAME).error(message)
>> - exitScript(removePidFile=True, errorCode=1)
>> if (cmdLineOpts.enablePrintInfo):
>> logging.disable(logging.CRITICAL)
>> print "List of all the mounted GFS2 filesystems that can have their lockdump data captured:"
>> @@ -1231,27 +1399,48 @@ if __name__ == "__main__":
>> # Going to sleep for 2 seconds, so that TIMESTAMP should be in the
>> # past in the logs so that capturing sysrq data will be guaranteed.
>> time.sleep(2)
>> - # Gather the backtraces for all the pids, by grabbing the /proc/<pid
>> - # number> or triggering sysrq events to capture task bask traces
>> - # from log.
>> - message = "Pass (%d/%d): Triggering the sysrq events for the host." %(i, cmdLineOpts.numberOfRuns)
>> - logging.getLogger(MAIN_LOGGER_NAME).debug(message)
>> - # Gather the data in the /proc/<pid> directory if the file
>> - # </proc/<pid>/stack exists. If file exists we will not trigger
>> - # sysrq events.
>> - pathToPidData = "/proc"
>> - if (isProcPidStackEnabled(pathToPidData)):
>> - gatherPidData(pathToPidData, os.path.join(pathToOutputRunDir, pathToPidData.strip("/")))
>> - else:
>> - triggerSysRQEvents()
>> +
>> + # If enabled then gather the process data.
>> + if (not cmdLineOpts.disableProcessGather):
>> + # Gather the backtraces for all the pids, by grabbing the /proc/<pid
>> +            # number> or triggering sysrq events to capture task back traces
>> + # from log.
>> + # Gather the data in the /proc/<pid> directory if the file
>> + # </proc/<pid>/stack exists. If file exists we will not trigger
>> + # sysrq events.
>> +
>> + # Should I gather anyhow and only capture sysrq if needed.
>> + pathToPidData = "/proc"
>> + if (isProcPidStackEnabled(pathToPidData)):
>> + message = "Pass (%d/%d): Triggering the capture of all pid directories in %s." %(i, cmdLineOpts.numberOfRuns, pathToPidData)
>> + logging.getLogger(MAIN_LOGGER_NAME).debug(message)
>> + gatherPidData(pathToPidData, os.path.join(pathToOutputRunDir, pathToPidData.strip("/")))
>> + else:
>> + message = "Pass (%d/%d): Triggering the sysrq events for the host since stack was not captured in pid directory." %(i, cmdLineOpts.numberOfRuns)
>> + logging.getLogger(MAIN_LOGGER_NAME).debug(message)
>> + triggerSysRQEvents()
>> +
>> + # #######################################################################
>> + # Gather the DLM data and lock-dumps
>> + # #######################################################################
>> + # Gather data for the DLM lockspaces that are found.
>> + lockspaceNames = clusternode.getMountedGFS2FilesystemNames(includeClusterName=False)
>> + # In addition always gather these lockspaces(if they exist).
>> + lockspaceNames.append("clvmd")
>> + lockspaceNames.append("rgmanager")
>> + # Verify that these lockspace names exist.
>> + lockspaceNames = getVerifiedDLMLockspaceNames(lockspaceNames)
>> # Gather the dlm locks.
>> - lockDumpType = "dlm"
>> - message = "Pass (%d/%d): Gathering the %s lock dumps for the host." %(i, cmdLineOpts.numberOfRuns, lockDumpType.upper())
>> + message = "Pass (%d/%d): Gathering the DLM lock-dumps for the host." %(i, cmdLineOpts.numberOfRuns)
>> logging.getLogger(MAIN_LOGGER_NAME).debug(message)
>> - gatherDLMLockDumps(pathToOutputRunDir, clusternode.getMountedGFS2FilesystemNames(includeClusterName=False))
>> + # Add other notable lockspace names that should be captured if they exist.
>> + gatherDLMLockDumps(pathToOutputRunDir, lockspaceNames)
>> +
>> + # #######################################################################
>> + # Gather the GFS2 data and lock-dumps
>> + # #######################################################################
>> # Gather the glock locks from gfs2.
>> - lockDumpType = "gfs2"
>> - message = "Pass (%d/%d): Gathering the %s lock dumps for the host." %(i, cmdLineOpts.numberOfRuns, lockDumpType.upper())
>> + message = "Pass (%d/%d): Gathering the GFS2 lock-dumps for the host." %(i, cmdLineOpts.numberOfRuns)
>> logging.getLogger(MAIN_LOGGER_NAME).debug(message)
>> if(gatherGFS2LockDumps(pathToOutputRunDir, clusternode.getMountedGFS2FilesystemNames())):
>> exitCode = 0
>> @@ -1274,16 +1463,21 @@ if __name__ == "__main__":
>> # #######################################################################
>> message = "All the files have been gathered and this directory contains all the captured data: %s" %(pathToOutputDir)
>> logging.getLogger(MAIN_LOGGER_NAME).info(message)
>> - if (cmdLineOpts.enableArchiveOutputDir):
>> - message = "The lockdump data will now be archived. This could some time depending on the size of the data collected."
>> +    message = "The lockdump data will now be archived. This could take some time depending on the size of the data collected."
>> + logging.getLogger(MAIN_LOGGER_NAME).info(message)
>> + pathToTarFilename = archiveData(pathToOutputDir)
>> + if (os.path.exists(pathToTarFilename)):
>> +        message = "The compressed archived file was created: %s" %(pathToTarFilename)
>> logging.getLogger(MAIN_LOGGER_NAME).info(message)
>> - pathToTarFilename = archiveData(pathToOutputDir)
>> - if (os.path.exists(pathToTarFilename)):
>> - message = "The compressed archvied file was created: %s" %(pathToTarFilename)
>> - logging.getLogger(MAIN_LOGGER_NAME).info(message)
>> - else:
>> - message = "The compressed archvied failed to be created: %s" %(pathToTarFilename)
>> +        # Do some cleanup by removing the data directory if the archive file was created.
>> + try:
>> + shutil.rmtree(pathToOutputDir)
>> + except OSError:
>> + message = "There was an error removing the directory: %s." %(pathToOutputDir)
>> logging.getLogger(MAIN_LOGGER_NAME).error(message)
>> + else:
>> +        message = "The compressed archive file failed to be created: %s" %(pathToTarFilename)
>> + logging.getLogger(MAIN_LOGGER_NAME).error(message)
>> # #######################################################################
>> except KeyboardInterrupt:
>> print ""
>>
>
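For anyone trying out the new behaviour, a rough example of a capture and unpack would look something like the commands below (illustrative only; the node name and output path are made up, and I am assuming the archive ends up as <output directory>.tar.bz2 since -t is gone and the output is now always archived):

# gfs2_lockcapture -r 3 -s 120 -P -o /tmp/node1-gfs2_lockcapture -y
# tar -xjf /tmp/node1-gfs2_lockcapture.tar.bz2 -C /tmp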