cluster-devel.redhat.com archive mirror
From: Andrew Price <anprice@redhat.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] [PATCH] gfs2_lockcapture: Added option to disable process data gathering, added gathering of dlm_tool lockdebug, df, lsof, DLM hash table sizes.
Date: Thu, 06 Jun 2013 10:09:00 +0100	[thread overview]
Message-ID: <51B051AC.7030401@redhat.com> (raw)
In-Reply-To: <1370461752-18653-1-git-send-email-sbradley@redhat.com>

Hi Shane,

On 05/06/13 20:49, sbradley@redhat.com wrote:
> From: Shane Bradley <sbradley@redhat.com>
>
>      The script no longer requires GFS2 mounts to capture data, which allows
>      capturing of DLM data without a GFS2 mount. Added the -P option so that
>      process gathering can be disabled. The following commands will have their
>      output saved: dlm_tool lockdebug, df -h, lsof, and the contents of
>      /sys/kernel/config/dlm/cluster/*_size. The -t option was removed and all
>      output directories are now archived as .tar.bz2. The man page was updated
>      with a list of all the files and command outputs that will be in the
>      output directory.
>
>      Signed-off-by: Shane Bradley <sbradley@redhat.com>

I've pushed your patch with some tweaks to shorten the shortlog and tidy
up the language in the man page a bit.
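
For reference, a typical invocation of the updated script with the new -P
option might look like the following (the filesystem name and output path
here are only placeholders for illustration):

  # gfs2_lockcapture -P -r 3 -s 120 -n myGFS2vol1 -o /tmp/myGFS2vol1-lockcapture

With -P the /proc/<pid> and sysrq gathering is skipped, and now that the -t
option is gone the output directory is always archived and compressed to a
.tar.bz2.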

Thanks,

Andy

> ---
>   gfs2/man/gfs2_lockcapture.8   |  85 +++++++---
>   gfs2/scripts/gfs2_lockcapture | 366 ++++++++++++++++++++++++++++++++----------
>   2 files changed, 347 insertions(+), 104 deletions(-)
>
> diff --git a/gfs2/man/gfs2_lockcapture.8 b/gfs2/man/gfs2_lockcapture.8
> index acd9113..0f2fd9a 100644
> --- a/gfs2/man/gfs2_lockcapture.8
> +++ b/gfs2/man/gfs2_lockcapture.8
> @@ -1,22 +1,23 @@
>   .TH gfs2_lockcapture 8
>
>   .SH NAME
> -gfs2_lockcapture \- will capture locking information from GFS2 file systems and DLM.
> +gfs2_lockcapture \- will capture locking information from GFS2 file-systems and DLM.
>
>   .SH SYNOPSIS
> -.B gfs2_lockcapture \fR[-dqyt]  [-o \fIoutput directory]\fR [-r \fInumber of runs]\fR [-s \fIseconds to sleep]\fR [-n \fIname of GFS2 filesystem]\fP
> +.B gfs2_lockcapture \fR[-dqyP]  [-o \fIoutput directory]\fR [-r \fInumber of runs]\fR [-s \fIseconds to sleep]\fR [-n \fIname of GFS2 file-system]\fP
>   .PP
>   .B gfs2_lockcapture \fR[-dqyi]
>
>   .SH DESCRIPTION
>   \fIgfs2_lockcapture\fR is used to capture all the GFS2 lockdump data and
> -corresponding DLM data. The command can be configured to capture the data
> +corresponding DLM data for GFS2 file-systems. The command can be configured to capture the data
>   multiple times and how much time to sleep between each iteration of capturing
> -the data. By default all of the mounted GFS2 filesystems will have their data
> -collected unless GFS2 filesystems are specified.
> +the data. By default all of the mounted GFS2 file-systems will have their data
> +collected unless GFS2 file-systems are specified.
>   .PP
> -Please note that sysrq -t and -m events are trigger or the pid directories in /proc are
> -collected on each iteration of capturing the data.
> +Please note that sysrq -t (thread) and -m (memory) events are triggered, or the
> +pid directories in /proc are collected, on each iteration of capturing the
> +data unless they are disabled with the -P option.
>
>   .SH OPTIONS
>   .TP
> @@ -24,31 +25,79 @@ collected on each iteration of capturing the data.
>   Prints out a short usage message and exits.
>   .TP
>   \fB-d,  --debug\fP
> -enables debug logging.
> +Enables debug logging.
>   .TP
>   \fB-q,  --quiet\fP
> -disables logging to console.
> +Disables logging to console.
>   .TP
>   \fB-y,  --no_ask\fP
> -disables all questions and assumes yes.
> +Disables all questions and assumes yes.
>   .TP
>   \fB-i,  --info\fP
> -prints information about the mounted GFS2 file systems.
> +Prints information about the mounted GFS2 file-systems.
>   .TP
> -\fB-t,  --archive\fP
> -the output directory will be archived(tar) and compressed(.bz2).
> +\fB-P,  --disable_process_gather\fP
> +The gathering of process information will be disabled.
>   .TP
>   \fB-o \fI<output directory>, \fB--path_to_output_dir\fR=\fI<output directory>\fP
> -the directory where all the collect data will stored.
> +The directory where all the collected data will be stored.
>   .TP
>   \fB-r \fI<number of runs>,  \fB--num_of_runs\fR=\fI<number of runs>\fP
> -number of runs capturing the lockdump data.
> +The number of runs capturing the lockdump data. The default is 3 runs.
>   .TP
>   \fB-s \fI<seconds to sleep>,  \fB--seconds_sleep\fR=\fI<seconds to sleep>\fP
> -number of seconds to sleep between runs of capturing the lockdump data.
> +The number of seconds to sleep between runs of capturing the lockdump data. The default is 120 seconds.
>   .TP
>   \fB-n \fI<name of GFS2 filesystem>,  \fB--fs_name\fR=\fI<name of GFS2 filesystem>\fP
> -name of the GFS2 filesystem(s) that will have their lockdump data captured.
> +The name of the GFS2 filesystem(s) that will have their lockdump data captured. By default, all mounted GFS2 file-systems will have their data captured.
>   .
> +.SH NOTES
> +The following commands will be run when capturing the data:
> +.IP \(bu 2
> +uname -a
> +.IP \(bu 2
> +uptime
> +.IP \(bu 2
> +ps h -AL -o "tid,s,cmd"
> +.IP \(bu 2
> +df -h
> +.IP \(bu 2
> +lsof
> +.IP \(bu 2
> +mount -l
> +.IP \(bu 2
> +dlm_tool ls
> +.IP \(bu 2
> +dlm_tool lockdebug -v -s -w <lockspace name>
> +.IP \(bu 2
> +echo "t" > /proc/sysrq-trigger (If /proc/1/stack does not exist)
> +.IP \(bu 2
> +echo "m" > /proc/sysrq-trigger (If /proc/1/stack does not exist)
> +
> +.SH AUTHOR
> +.nf
> +Shane Bradley <sbradley@fedoraproject.org>
> +.fi
> +.SH FILES
> +.I /proc/mounts
> +.br
> +.I /proc/slabinfo
> +.br
> +.I /sys/kernel/config/dlm/cluster/lkbtbl_size
> +.br
> +.I /sys/kernel/config/dlm/cluster/dirtbl_size
> +.br
> +.I /sys/kernel/config/dlm/cluster/rsbtbl_size
> +.br
> +.I /sys/kernel/debug/gfs2/
> +.br
> +.I /sys/kernel/debug/dlm/
> +.br
> +.I /proc/<int>/
> +(If /proc/1/stack exists)
> +.br
> +.I /var/log/messages
> +.br
> +.I /var/log/cluster/
> +.br
>   .SH SEE ALSO
> -gfs2_lockanalyze(8)
> diff --git a/gfs2/scripts/gfs2_lockcapture b/gfs2/scripts/gfs2_lockcapture
> index 6a63fc8..81a0aeb 100644
> --- a/gfs2/scripts/gfs2_lockcapture
> +++ b/gfs2/scripts/gfs2_lockcapture
> @@ -1,6 +1,6 @@
>   #!/usr/bin/env python
>   """
> -The script gfs2_lockcapture will capture locking information from GFS2 file
> +The script "gfs2_lockcapture" will capture locking information from GFS2 file
>   systems and DLM.
>
>   @author    : Shane Bradley
> @@ -12,6 +12,7 @@ import sys
>   import os
>   import os.path
>   import logging
> +import logging.handlers
>   from optparse import OptionParser, Option
>   import time
>   import platform
> @@ -33,7 +34,7 @@ import tarfile
>   sure only 1 instance of this script is running at any time.
>   @type PATH_TO_PID_FILENAME: String
>   """
> -VERSION_NUMBER = "0.9-3"
> +VERSION_NUMBER = "0.9-7"
>   MAIN_LOGGER_NAME = "%s" %(os.path.basename(sys.argv[0]))
>   PATH_TO_DEBUG_DIR="/sys/kernel/debug"
>   PATH_TO_PID_FILENAME = "/var/run/%s.pid" %(os.path.basename(sys.argv[0]))
> @@ -43,7 +44,7 @@ PATH_TO_PID_FILENAME = "/var/run/%s.pid" %(os.path.basename(sys.argv[0]))
>   # #####################################################################
>   class ClusterNode:
>       """
> -    This class represents a cluster node that is a current memeber in a cluster.
> +    This class represents a cluster node that is a current member in a cluster.
>       """
>       def __init__(self, clusternodeName, clusternodeID, clusterName, mapOfMountedFilesystemLabels):
>           """
> @@ -115,7 +116,7 @@ class ClusterNode:
>           mounted GFS2 filesystems. If includeClusterName is False it will only
>           return a list of all the mounted GFS2 filesystem names(ex. mygfs2vol1).
>
> -        @return: Returns a list of all teh mounted GFS2 filesystem names.
> +        @return: Returns a list of all the mounted GFS2 filesystem names.
>           @rtype: Array
>
>           @param includeClusterName: By default this option is True and will
> @@ -134,6 +135,24 @@ class ClusterNode:
>                       listOfGFS2MountedFilesystemLabels.append(fsLabelSplit[1])
>               return listOfGFS2MountedFilesystemLabels
>
> +    def getMountedGFS2FilesystemPaths(self):
> +        """
> +        Returns a map of all the mounted GFS2 filesystem paths. The key is the
> +        GFS2 fs name(clustername:fs name) and value is the mountpoint.
> +
> +        @return: Returns a map of all the mounted GFS2 filesystem paths. The key
> +        is the GFS2 fs name(clustername:fs name) and value is the mountpoint.
> +        Returns a list of all the mounted GFS2 filesystem paths.
> +        @rtype: Map
> +        """
> +        mapOfGFS2MountedFilesystemPaths = {}
> +        for fsLabel in self.__mapOfMountedFilesystemLabels.keys():
> +            value = self.__mapOfMountedFilesystemLabels.get(fsLabel)
> +            mountPoint = value.split("type", 1)[0].split("on")[1]
> +            if (len(mountPoint) > 0):
> +                mapOfGFS2MountedFilesystemPaths[fsLabel] = mountPoint
> +        return mapOfGFS2MountedFilesystemPaths
> +
>   # #####################################################################
>   # Helper functions.
>   # #####################################################################
> @@ -328,7 +347,7 @@ def archiveData(pathToSrcDir):
>               message = "A compressed archvied file already exists and will be removed: %s" %(pathToTarFilename)
>               logging.getLogger(MAIN_LOGGER_NAME).status(message)
>               try:
> -                os.remove(PATH_TO_PID_FILENAME)
> +                os.remove(pathToTarFilename)
>               except IOError:
>                   message = "There was an error removing the file: %s." %(pathToTarFilename)
>                   logging.getLogger(MAIN_LOGGER_NAME).error(message)
> @@ -508,6 +527,32 @@ def backupOutputDirectory(pathToOutputDir):
>       # existing output directory.
>       return (not os.path.exists(pathToOutputDir))
>
> +def mountFilesystem(filesystemType, pathToDevice, pathToMountPoint):
> +    """
> +    This function will attempt to mount a filesystem. If the filesystem is
> +    already mounted or the filesystem was successfully mounted then True is
> +    returned, otherwise False is returned.
> +
> +    @return: If the filesystem is already mounted or the filesystem was
> +    successfully mounted then True is returned, otherwise False is returned.
> +    @rtype: Boolean
> +
> +    @param filesystemType: The type of filesystem that will be mounted.
> +    @type filesystemType: String
> +    @param pathToDevice: The path to the device that will be mounted.
> +    @type pathToDevice: String
> +    @param pathToMountPoint: The path to the directory that will be used as the
> +    mount point for the device.
> +    @type pathToMountPoint: String
> +    """
> +    if (os.path.ismount(PATH_TO_DEBUG_DIR)):
> +        return True
> +    listOfCommandOptions = ["-t", filesystemType, pathToDevice, pathToMountPoint]
> +    if (not runCommand("mount", listOfCommandOptions)):
> +        message = "There was an error mounting the filesystem type %s for the device %s to the mount point %s." %(filesystemType, pathToDevice, pathToMountPoint)
> +        logging.getLogger(MAIN_LOGGER_NAME).error(message)
> +    return  os.path.ismount(PATH_TO_DEBUG_DIR)
> +
>   def exitScript(removePidFile=True, errorCode=0):
>       """
>       This function will cause the script to exit or quit. It will return an error
> @@ -615,6 +660,89 @@ def getClusterNode(listOfGFS2Names):
>       else:
>           return None
>
> +
> +def getDLMToolDLMLockspaces():
> +    """
> +    This function returns the names of all the dlm lockspace names found with the
> +    command: "dlm_tool ls".
> +
> +    @return: A list of all the dlm lockspace names.
> +    @rtype: Array
> +    """
> +    dlmLockspaces = []
> +    stdout = runCommandOutput("dlm_tool", ["ls"])
> +    if (not stdout == None):
> +        stdout = stdout.replace("dlm lockspaces\n", "")
> +        dlmToolLSKeys = ["name", "id", "flags", "change", "members"]
> +        # Split on newlines
> +        stdoutSections = stdout.split("\n\n")
> +        for section in stdoutSections:
> +            # Create tmp map to hold data
> +            dlmToolLSMap = dict.fromkeys(dlmToolLSKeys)
> +            lines = section.split("\n")
> +            for line in lines:
> +                for dlmToolLSKey in dlmToolLSMap.keys():
> +                    if (line.startswith(dlmToolLSKey)):
> +                        value = line.replace(dlmToolLSKey, " ", 1).strip().rstrip()
> +                        dlmToolLSMap[dlmToolLSKey] = value
> +                if ((not dlmToolLSMap.get("name") == None) and (not dlmToolLSMap.get("id") == None)):
> +                    dlmLockspaces.append(dlmToolLSMap.get("name"))
> +    return dlmLockspaces
> +
> +def getGroupToolDLMLockspaces():
> +    """
> +    This function returns the names of all the dlm lockspace names found with the
> +    command: "group_tool ls".
> +
> +    @return: A list of all the dlm lockspace names.
> +    @rtype: Array
> +    """
> +    dlmLockspaces = []
> +    stdout = runCommandOutput("group_tool", ["ls"])
> +    if (not stdout == None):
> +        lines = stdout.split("\n")
> +        for line in lines:
> +            if (line.startswith("dlm")):
> +                dlmLockspaces.append(line.split()[2])
> +    return dlmLockspaces
> +
> +def getDLMLockspaces():
> +    """
> +    Returns a list of the dlm lockspace names.
> +
> +    @return: Returns a list of dlm lockspace names.
> +    @rtype: Array
> +    """
> +    message = "Gathering the DLM Lockspace Names."
> +    logging.getLogger(MAIN_LOGGER_NAME).debug(message)
> +    dlmLockspaces = getDLMToolDLMLockspaces()
> +    if (not len(dlmLockspaces) > 0):
> +        dlmLockspaces = getGroupToolDLMLockspaces()
> +    return dlmLockspaces
> +
> +def getVerifiedDLMLockspaceNames(lockspaceNames):
> +    """
> +    Returns a list of DLM lockspaces that have been verified to exist in the
> +    command output of $(dlm_tool ls).
> +
> +    @return: Returns a list of DLM lockspaces that have been verified to exist
> +    in the command output of $(dlm_tool ls).
> +    @rtype: Array
> +
> +    @param lockspaceNames: This is the list of DLM lockspaces that will have
> +    their debug directory copied.
> +    @type lockspaceNames: Array
> +    """
> +    # Get a list of all the DLM lockspaces names.
> +    dlmLockspaces = getDLMLockspaces()
> +    # Verify the lockspaceNames are lockspaces that exist.
> +    verifiedLockspaceNames = []
> +    for lockspaceName in lockspaceNames:
> +        if ((lockspaceName in dlmLockspaces) and
> +            (not lockspaceName in verifiedLockspaceNames)):
> +            verifiedLockspaceNames.append(lockspaceName)
> +    return verifiedLockspaceNames
> +
>   def getMountedGFS2Filesystems():
>       """
>       This function returns a list of all the mounted GFS2 filesystems.
> @@ -659,32 +787,9 @@ def getLabelMapForMountedFilesystems(clusterName, listOfMountedFilesystems):
>                   mapOfMountedFilesystemLabels[fsLabel] = mountedFilesystem
>       return mapOfMountedFilesystemLabels
>
> -def mountFilesystem(filesystemType, pathToDevice, pathToMountPoint):
> -    """
> -    This function will attempt to mount a filesystem. If the filesystem is
> -    already mounted or the filesystem was successfully mounted then True is
> -    returned, otherwise False is returned.
> -
> -    @return: If the filesystem is already mounted or the filesystem was
> -    successfully mounted then True is returned, otherwise False is returned.
> -    @rtype: Boolean
> -
> -    @param filesystemType: The type of filesystem that will be mounted.
> -    @type filesystemType: String
> -    @param pathToDevice: The path to the device that will be mounted.
> -    @type pathToDevice: String
> -    @param pathToMountPoint: The path to the directory that will be used as the
> -    mount point for the device.
> -    @type pathToMountPoint: String
> -    """
> -    if (os.path.ismount(PATH_TO_DEBUG_DIR)):
> -        return True
> -    listOfCommandOptions = ["-t", filesystemType, pathToDevice, pathToMountPoint]
> -    if (not runCommand("mount", listOfCommandOptions)):
> -        message = "There was an error mounting the filesystem type %s for the device %s to the mount point %s." %(filesystemType, pathToDevice, pathToMountPoint)
> -        logging.getLogger(MAIN_LOGGER_NAME).error(message)
> -    return  os.path.ismount(PATH_TO_DEBUG_DIR)
> -
> +# #####################################################################
> +# Gather output from command functions
> +# #####################################################################
>   def gatherGeneralInformation(pathToDSTDir):
>       """
>       This function will gather general information about the cluster and write
> @@ -712,7 +817,15 @@ def gatherGeneralInformation(pathToDSTDir):
>       pathToSrcFile = "/proc/slabinfo"
>       copyFile(pathToSrcFile, os.path.join(pathToDSTDir, pathToSrcFile.strip("/")))
>
> +    # Copy the DLM hash table sizes:
> +    pathToHashTableFiles = ["/sys/kernel/config/dlm/cluster/lkbtbl_size", "/sys/kernel/config/dlm/cluster/dirtbl_size",
> +                            "/sys/kernel/config/dlm/cluster/rsbtbl_size"]
> +    for pathToSrcFile in pathToHashTableFiles:
> +        if (os.path.exists(pathToSrcFile)):
> +            copyFile(pathToSrcFile, os.path.join(pathToDSTDir, pathToSrcFile.strip("/")))
> +
>       # Get "ps -eo user,pid,%cpu,%mem,vsz,rss,tty,stat,start,time,comm,wchan" data.
> +    # Get " ps h -AL -o tid,s,cmd
>       command = "ps"
>       pathToCommandOutput = os.path.join(pathToDSTDir, "ps_hALo-tid.s.cmd")
>       try:
> @@ -721,7 +834,29 @@ def gatherGeneralInformation(pathToDSTDir):
>           runCommand(command, ["h", "-AL", "-o", "tid,s,cmd"], standardOut=fout)
>           fout.close()
>       except IOError:
> -        message = "There was an error the command output for %s to the file %s." %(command, pathToCommandOutput)
> +        message = "There was an error writing the command output for %s to the file %s." %(command, pathToCommandOutput)
> +        logging.getLogger(MAIN_LOGGER_NAME).error(message)
> +
> +    # Get df -h output
> +    command = "df"
> +    pathToCommandOutput = os.path.join(pathToDSTDir, "df-h.cmd")
> +    try:
> +        fout = open(pathToCommandOutput, "w")
> +        runCommand(command, ["-h"], standardOut=fout)
> +        fout.close()
> +    except IOError:
> +        message = "There was an error writing the command output for %s to the file %s." %(command, pathToCommandOutput)
> +        logging.getLogger(MAIN_LOGGER_NAME).error(message)
> +
> +    # Get lsof output
> +    command = "lsof"
> +    pathToCommandOutput = os.path.join(pathToDSTDir, "lsof.cmd")
> +    try:
> +        fout = open(pathToCommandOutput, "w")
> +        runCommand(command, [], standardOut=fout)
> +        fout.close()
> +    except IOError:
> +        message = "There was an error writing the command output for %s to the file %s." %(command, pathToCommandOutput)
>           logging.getLogger(MAIN_LOGGER_NAME).error(message)
>
>       # Write the status of all the nodes in the cluster out.
> @@ -746,7 +881,9 @@ def gatherGeneralInformation(pathToDSTDir):
>               message = "There was an error the command output for %s to the file %s." %(command, pathToCommandOutput)
>               logging.getLogger(MAIN_LOGGER_NAME).error(message)
>
> -
> +# #####################################################################
> +# Gather Process Information
> +# #####################################################################
>   def isProcPidStackEnabled(pathToPidData):
>       """
>       Returns true if the init process has the file "stack" in its pid data
> @@ -810,6 +947,9 @@ def triggerSysRQEvents():
>               message = "There was an error writing the command output for %s to the file %s." %(command, pathToSysrqTriggerFile)
>               logging.getLogger(MAIN_LOGGER_NAME).error(message)
>
> +# #####################################################################
> +# Gather lockdumps and logs
> +# #####################################################################
>   def gatherLogs(pathToDSTDir):
>       """
>       This function will copy all the cluster logs(/var/log/cluster) and the
> @@ -828,29 +968,46 @@ def gatherLogs(pathToDSTDir):
>           pathToDSTLogDir = os.path.join(pathToDSTDir, os.path.basename(pathToLogDir))
>           copyDirectory(pathToLogDir, pathToDSTDir)
>
> -def gatherDLMLockDumps(pathToDSTDir, listOfGFS2Filesystems):
> +def gatherDLMLockDumps(pathToDSTDir, lockspaceNames):
>       """
> -    This function copies the debug files for dlm for a GFS2 filesystem in the
> -    list to a directory. The list of GFS2 filesystems will only include the
> -    filesystem name for each item in the list. For example: "mygfs2vol1"
> +    This function copies all the debug files for dlm and sorts them into their
> +    own directory based on the name of the dlm lockspace.
>
>       @param pathToDSTDir: This is the path to directory where the files will be
>       copied to.
>       @type pathToDSTDir: String
> -    @param listOfGFS2Filesystems: This is the list of the GFS2 filesystems that
> -    will have their debug directory copied.
> -    @type listOfGFS2Filesystems: Array
> +    @param lockspaceNames: This is the list of DLM lockspaces that will have
> +    their debug directory copied.
> +    @type lockspaceNames: Array
>       """
> +    # This function assumes that getVerifiedDLMLockspaceNames has already been
> +    # called to verify that the lockspaces exist.
>       lockDumpType = "dlm"
>       pathToSrcDir = os.path.join(PATH_TO_DEBUG_DIR, lockDumpType)
>       pathToOutputDir = os.path.join(pathToDSTDir, lockDumpType)
>       message = "Copying the files in the %s lockdump data directory %s." %(lockDumpType.upper(), pathToSrcDir)
>       logging.getLogger(MAIN_LOGGER_NAME).debug(message)
> -    for filename in os.listdir(pathToSrcDir):
> -        for name in listOfGFS2Filesystems:
> -            if (filename.startswith(name)):
> -                copyFile(os.path.join(pathToSrcDir, filename),
> -                         os.path.join(os.path.join(pathToOutputDir, name), filename))
> +
> +    # Get list of all the dlm lockspaces
> +    if (os.path.exists(pathToSrcDir)):
> +        for filename in os.listdir(pathToSrcDir):
> +            for lockspaceName in lockspaceNames:
> +                if (filename.startswith(lockspaceName)):
> +                    copyFile(os.path.join(pathToSrcDir, filename),
> +                             os.path.join(os.path.join(pathToOutputDir, lockspaceName), filename))
> +
> +    # Run dlm_tool lockdebug against the lockspace names and write to file.
> +    for lockspaceName in lockspaceNames:
> +        dstDir = os.path.join(pathToOutputDir, lockspaceName)
> +        if (mkdirs(dstDir)):
> +            pathToCommandOutput = os.path.join(dstDir,"%s_lockdebug" %(lockspaceName))
> +            try:
> +                fout = open(pathToCommandOutput, "w")
> +                runCommand("dlm_tool", ["lockdebug", "-v", "-s", "-w", lockspaceName], standardOut=fout)
> +                fout.close()
> +            except IOError:
> +                message = "There was an error writing the command output to the file %s." %(pathToCommandOutput)
> +                logging.getLogger(MAIN_LOGGER_NAME).error(message)
>
>   def gatherGFS2LockDumps(pathToDSTDir, listOfGFS2Filesystems):
>       """
> @@ -875,6 +1032,8 @@ def gatherGFS2LockDumps(pathToDSTDir, listOfGFS2Filesystems):
>       pathToOutputDir = os.path.join(pathToDSTDir, lockDumpType)
>       # The number of files that were copied
>       fileCopiedCount = 0
> +    if (not os.path.exists(pathToSrcDir)):
> +        return False
>       for dirName in os.listdir(pathToSrcDir):
>           pathToCurrentDir = os.path.join(pathToSrcDir, dirName)
>           if ((os.path.isdir(pathToCurrentDir)) and (dirName in listOfGFS2Filesystems)):
> @@ -886,6 +1045,7 @@ def gatherGFS2LockDumps(pathToDSTDir, listOfGFS2Filesystems):
>       # If the number of files(not directories) copied was greater than zero then files were copied
>       # succesfully.
>       return (fileCopiedCount > 0)
> +
>   # ##############################################################################
>   # Get user selected options
>   # ##############################################################################
> @@ -922,12 +1082,12 @@ def __getOptions(version) :
>       cmdParser.add_option("-i", "--info",
>                            action="store_true",
>                            dest="enablePrintInfo",
> -                         help="prints information about the mounted GFS2 file systems",
> +                         help="prints information about the mounted GFS2 file-systems",
>                            default=False)
> -    cmdParser.add_option("-t", "--archive",
> +    cmdParser.add_option("-P", "--disable_process_gather",
>                            action="store_true",
> -                         dest="enableArchiveOutputDir",
> -                         help="the output directory will be archived(tar) and compressed(.bz2)",
> +                         dest="disableProcessGather",
> +                         help="the gathering of process information will be disabled",
>                            default=False)
>       cmdParser.add_option("-o", "--path_to_output_dir",
>                            action="store",
> @@ -939,21 +1099,21 @@ def __getOptions(version) :
>       cmdParser.add_option("-r", "--num_of_runs",
>                            action="store",
>                            dest="numberOfRuns",
> -                         help="number of runs capturing the lockdump data",
> +                         help="number of runs capturing the lockdump data(default: 3 runs)",
>                            type="int",
>                            metavar="<number of runs>",
> -                         default=2)
> +                         default=3)
>       cmdParser.add_option("-s", "--seconds_sleep",
>                            action="store",
>                            dest="secondsToSleep",
> -                         help="number of seconds to sleep between runs of capturing the lockdump data",
> +                         help="number of seconds to sleep between runs of capturing the lockdump data(default: 120 seconds)",
>                            type="int",
>                            metavar="<seconds to sleep>",
>                            default=120)
>       cmdParser.add_option("-n", "--fs_name",
>                            action="extend",
>                            dest="listOfGFS2Names",
> -                         help="name of the GFS2 filesystem(s) that will have their lockdump data captured",
> +                         help="name of the GFS2 filesystem(s) that will have their lockdump data captured(default: all GFS2 file-systems will be captured)",
>                            type="string",
>                            metavar="<name of GFS2 filesystem>",
>                            default=[])
> @@ -994,14 +1154,15 @@ class OptionParserExtended(OptionParser):
>
>           examplesMessage += "\nIt will do 3 runs of gathering the lockdump information in 10 second intervals for only the"
>           examplesMessage += "\nGFS2 filesystems with the names myGFS2vol2,myGFS2vol1. Then it will archive and compress"
> -        examplesMessage += "\nthe data collected. All of the lockdump data will be written to the directory: "
> -        examplesMessage += "\n/tmp/2012-11-12_095556-gfs2_lockcapture and all the questions will be answered with yes.\n"
> -        examplesMessage += "\n# %s -r 3 -s 10 -t -n myGFS2vol2,myGFS2vol1 -o /tmp/2012-11-12_095556-gfs2_lockcapture -y\n" %(self.__commandName)
> +        examplesMessage += "\nthe data collected in the output directory:"
> +        examplesMessage += "\n/tmp/cluster42-gfs2_lockcapture and all the questions will be answered with yes.\n"
> +        examplesMessage += "\n# %s -r 3 -s 10 -n myGFS2vol2,myGFS2vol1 -o /tmp/cluster42-gfs2_lockcapture -y\n" %(self.__commandName)
>
>           examplesMessage += "\nIt will do 2 runs of gathering the lockdump information in 25 second intervals for all the"
> -        examplesMessage += "\nmounted GFS2 filesystems. Then it will archive and compress the data collected. All of the"
> -        examplesMessage += "\nlockdump data will be written to the directory: /tmp/2012-11-12_095556-gfs2_lockcapture.\n"
> -        examplesMessage += "\n# %s -r 2 -s 25 -t -o /tmp/2012-11-12_095556-gfs2_lockcapture\n" %(self.__commandName)
> +        examplesMessage += "\nmounted GFS2 filesystems. The gathering process data will be disabled. Then it will archive and compress"
> +        examplesMessage += "\nthe data collected in the output directory:"
> +        examplesMessage += "\n/tmp/cluster42-gfs2_lockcapture and all the questions will be answered with yes.\n"
> +        examplesMessage += "\n# %s -r 2 -s 25 -P -o /tmp/cluster42-gfs2_lockcapture\n" %(self.__commandName)
>           OptionParser.print_help(self)
>           print examplesMessage
>
> @@ -1073,6 +1234,14 @@ if __name__ == "__main__":
>           # Create a new status function and level.
>           logging.STATUS = logging.INFO + 2
>           logging.addLevelName(logging.STATUS, "STATUS")
> +
> +        # Log to main system logger that script has started then close the
> +        # handler before the other handlers are created.
> +        sysLogHandler = logging.handlers.SysLogHandler(address = '/dev/log')
> +        logger.addHandler(sysLogHandler)
> +        logger.info("Capturing of the data to analyze GFS2 lockdumps.")
> +        logger.removeHandler(sysLogHandler)
> +
>           # Create a function for the STATUS_LEVEL since not defined by python. This
>           # means you can call it like the other predefined message
>           # functions. Example: logging.getLogger("loggerName").status(message)
> @@ -1128,7 +1297,6 @@ if __name__ == "__main__":
>                       message += " %s" %(name)
>                   message += "."
>               logging.getLogger(MAIN_LOGGER_NAME).error(message)
> -            exitScript(removePidFile=True, errorCode=1)
>           if (cmdLineOpts.enablePrintInfo):
>               logging.disable(logging.CRITICAL)
>               print "List of all the mounted GFS2 filesystems that can have their lockdump data captured:"
> @@ -1231,27 +1399,48 @@ if __name__ == "__main__":
>               # Going to sleep for 2 seconds, so that TIMESTAMP should be in the
>               # past in the logs so that capturing sysrq data will be guaranteed.
>               time.sleep(2)
> -            # Gather the backtraces for all the pids, by grabbing the /proc/<pid
> -            # number> or triggering sysrq events to capture task bask traces
> -            # from log.
> -            message = "Pass (%d/%d): Triggering the sysrq events for the host." %(i, cmdLineOpts.numberOfRuns)
> -            logging.getLogger(MAIN_LOGGER_NAME).debug(message)
> -            # Gather the data in the /proc/<pid> directory if the file
> -            # </proc/<pid>/stack exists. If file exists we will not trigger
> -            # sysrq events.
> -            pathToPidData = "/proc"
> -            if (isProcPidStackEnabled(pathToPidData)):
> -                gatherPidData(pathToPidData, os.path.join(pathToOutputRunDir, pathToPidData.strip("/")))
> -            else:
> -                triggerSysRQEvents()
> +
> +            # If enabled then gather the process data.
> +            if (not cmdLineOpts.disableProcessGather):
> +                # Gather the backtraces for all the pids, by grabbing the /proc/<pid
> +                # number> or triggering sysrq events to capture task back traces
> +                # from log.
> +                # Gather the data in the /proc/<pid> directory if the file
> +                # </proc/<pid>/stack exists. If file exists we will not trigger
> +                # sysrq events.
> +
> +                # Should I gather anyhow and only capture sysrq if needed.
> +                pathToPidData = "/proc"
> +                if (isProcPidStackEnabled(pathToPidData)):
> +                    message = "Pass (%d/%d): Triggering the capture of all pid directories in %s." %(i, cmdLineOpts.numberOfRuns, pathToPidData)
> +                    logging.getLogger(MAIN_LOGGER_NAME).debug(message)
> +                    gatherPidData(pathToPidData, os.path.join(pathToOutputRunDir, pathToPidData.strip("/")))
> +                else:
> +                    message = "Pass (%d/%d): Triggering the sysrq events for the host since stack was not captured in pid directory." %(i, cmdLineOpts.numberOfRuns)
> +                    logging.getLogger(MAIN_LOGGER_NAME).debug(message)
> +                    triggerSysRQEvents()
> +
> +            # #######################################################################
> +            # Gather the DLM data and lock-dumps
> +            # #######################################################################
> +            # Gather data for the  DLM lockspaces that are found.
> +            lockspaceNames = clusternode.getMountedGFS2FilesystemNames(includeClusterName=False)
> +            # In addition always gather these lockspaces(if they exist).
> +            lockspaceNames.append("clvmd")
> +            lockspaceNames.append("rgmanager")
> +            # Verify that these lockspace names exist.
> +            lockspaceNames = getVerifiedDLMLockspaceNames(lockspaceNames)
>               # Gather the dlm locks.
> -            lockDumpType = "dlm"
> -            message = "Pass (%d/%d): Gathering the %s lock dumps for the host." %(i, cmdLineOpts.numberOfRuns, lockDumpType.upper())
> +            message = "Pass (%d/%d): Gathering the DLM lock-dumps for the host." %(i, cmdLineOpts.numberOfRuns)
>               logging.getLogger(MAIN_LOGGER_NAME).debug(message)
> -            gatherDLMLockDumps(pathToOutputRunDir, clusternode.getMountedGFS2FilesystemNames(includeClusterName=False))
> +            # Add other notable lockspace names that should be captured if they exist.
> +            gatherDLMLockDumps(pathToOutputRunDir, lockspaceNames)
> +
> +            # #######################################################################
> +            # Gather the GFS2 data and lock-dumps
> +            # #######################################################################
>               # Gather the glock locks from gfs2.
> -            lockDumpType = "gfs2"
> -            message = "Pass (%d/%d): Gathering the %s lock dumps for the host." %(i, cmdLineOpts.numberOfRuns, lockDumpType.upper())
> +            message = "Pass (%d/%d): Gathering the GFS2 lock-dumps for the host." %(i, cmdLineOpts.numberOfRuns)
>               logging.getLogger(MAIN_LOGGER_NAME).debug(message)
>               if(gatherGFS2LockDumps(pathToOutputRunDir, clusternode.getMountedGFS2FilesystemNames())):
>                   exitCode = 0
> @@ -1274,16 +1463,21 @@ if __name__ == "__main__":
>           # #######################################################################
>           message = "All the files have been gathered and this directory contains all the captured data: %s" %(pathToOutputDir)
>           logging.getLogger(MAIN_LOGGER_NAME).info(message)
> -        if (cmdLineOpts.enableArchiveOutputDir):
> -            message = "The lockdump data will now be archived. This could some time depending on the size of the data collected."
> +        message = "The lockdump data will now be archive. This could some time depending on the size of the data collected."
> +        logging.getLogger(MAIN_LOGGER_NAME).info(message)
> +        pathToTarFilename = archiveData(pathToOutputDir)
> +        if (os.path.exists(pathToTarFilename)):
> +            message = "The compressed archvied file was created: %s" %(pathToTarFilename)
>               logging.getLogger(MAIN_LOGGER_NAME).info(message)
> -            pathToTarFilename = archiveData(pathToOutputDir)
> -            if (os.path.exists(pathToTarFilename)):
> -                message = "The compressed archvied file was created: %s" %(pathToTarFilename)
> -                logging.getLogger(MAIN_LOGGER_NAME).info(message)
> -            else:
> -                message = "The compressed archvied failed to be created: %s" %(pathToTarFilename)
> +            # Do some cleanup by removing the data directory if the archived file was created.
> +            try:
> +                shutil.rmtree(pathToOutputDir)
> +            except OSError:
> +                message = "There was an error removing the directory: %s." %(pathToOutputDir)
>                   logging.getLogger(MAIN_LOGGER_NAME).error(message)
> +        else:
> +            message = "The compressed archvied failed to be created: %s" %(pathToTarFilename)
> +            logging.getLogger(MAIN_LOGGER_NAME).error(message)
>           # #######################################################################
>       except KeyboardInterrupt:
>           print ""
>



Thread overview: 3+ messages
2013-06-05 19:49 [Cluster-devel] [PATCH] gfs2_lockcapture: Added option to disable process data gathering, added gathering of dlm_tool lockdebug, df, lsof, DLM hash table sizes sbradley
2013-06-06  9:09 ` Andrew Price [this message]
2013-06-06 12:26   ` Shane Bradley
