cluster-devel.redhat.com archive mirror
* [Cluster-devel] [PATCH] gfs2-lockcapture: Modified some of the data gathered
@ 2012-12-13 15:14 sbradley
  2012-12-14 13:37 ` Andrew Price
  0 siblings, 1 reply; 2+ messages in thread
From: sbradley @ 2012-12-13 15:14 UTC (permalink / raw)
  To: cluster-devel.redhat.com

From: sbradley <sbradley@redhat.com>

Changed some variable names in the host data collected, added /proc/<pid>/ to
the files collected, and added a man page.
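
For reference, an invocation that exercises the options touched by this patch
(taken from the updated --help examples below; the filesystem names are just
placeholders) looks like:

    $ gfs2_lockcapture -r 3 -s 10 -t -n myGFS2vol2,myGFS2vol1 -y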

Signed-off-by: shane bradley <sbradley@redhat.com>
---
 gfs2/lockcapture/gfs2_lockcapture | 465 +++++++++++++++++++++++++-------------
 gfs2/man/Makefile.am              |   3 +-
 gfs2/man/gfs2_lockcapture.8       |  53 +++++
 3 files changed, 364 insertions(+), 157 deletions(-)
 create mode 100644 gfs2/man/gfs2_lockcapture.8

diff --git a/gfs2/lockcapture/gfs2_lockcapture b/gfs2/lockcapture/gfs2_lockcapture
index a930a2f..1a64188 100644
--- a/gfs2/lockcapture/gfs2_lockcapture
+++ b/gfs2/lockcapture/gfs2_lockcapture
@@ -1,9 +1,7 @@
 #!/usr/bin/env python
 """
-This script will gather GFS2 glocks and dlm lock dump information for a cluster
-node. The script can get all the mounted GFS2 filesystem data or set of selected
-GFS2 filesystems. The script will also gather some general information about the
-system.
+The script gfs2_lockcapture will capture locking information from GFS2 file
+systems and DLM.
 
 @author    : Shane Bradley
 @contact   : sbradley at redhat.com
@@ -35,7 +33,7 @@ import tarfile
 sure only 1 instance of this script is running at any time.
 @type PATH_TO_PID_FILENAME: String
 """
-VERSION_NUMBER = "0.9-1"
+VERSION_NUMBER = "0.9-2"
 MAIN_LOGGER_NAME = "%s" %(os.path.basename(sys.argv[0]))
 PATH_TO_DEBUG_DIR="/sys/kernel/debug"
 PATH_TO_PID_FILENAME = "/var/run/%s.pid" %(os.path.basename(sys.argv[0]))
@@ -313,7 +311,7 @@ def archiveData(pathToSrcDir):
     @type pathToSrcDir: String
     """
     if (os.path.exists(pathToSrcDir)):
-        pathToTarFilename = "%s.tar.bz2" %(pathToSrcDir)
+        pathToTarFilename = "%s-%s.tar.bz2" %(pathToSrcDir, platform.node())
         if (os.path.exists(pathToTarFilename)):
             message = "A compressed archvied file already exists and will be removed: %s" %(pathToTarFilename)
             logging.getLogger(MAIN_LOGGER_NAME).status(message)
@@ -337,6 +335,127 @@ def archiveData(pathToSrcDir):
             return pathToTarFilename
     return ""
 
+def getDataFromFile(pathToSrcFile) :
+    """
+    This function will return the data in an array, where each newline in the
+    file is a separate item in the array. This should really just be used on
+    relatively small files.
+
+    None is returned if no file is found.
+
+    @return: Returns an array of Strings, where each newline in file is an item
+    in the array.
+    @rtype: Array
+
+    @param pathToSrcFile: The path to the file which will be read.
+    @type pathToSrcFile: String
+    """
+    if (len(pathToSrcFile) > 0) :
+        try:
+            fin = open(pathToSrcFile, "r")
+            data = fin.readlines()
+            fin.close()
+            return data
+        except (IOError, os.error):
+            message = "An error occurred reading the file: %s." %(pathToSrcFile)
+            logging.getLogger(MAIN_LOGGER_NAME).error(message)
+    return None
+
+def copyFile(pathToSrcFile, pathToDstFile):
+    """
+    This function will copy a src file to dst file.
+
+    @return: Returns True if the file was copied successfully.
+    @rtype: Boolean
+
+    @param pathToSrcFile: The path to the source file that will be copied.
+    @type pathToSrcFile: String
+    @param pathToDstFile: The path to the destination of the file.
+    @type pathToDstFile: String
+    """
+    if(not os.path.exists(pathToSrcFile)):
+        message = "The file does not exist with the path: %s." %(pathToSrcFile)
+        logging.getLogger(MAIN_LOGGER_NAME).error(message)
+        return False
+    elif (not os.path.isfile(pathToSrcFile)):
+        message = "The path to the source file is not a regular file: %s." %(pathToSrcFile)
+        logging.getLogger(MAIN_LOGGER_NAME).error(message)
+        return False
+    elif (pathToSrcFile == pathToDstFile):
+        message = "The path to the source file and path to destination file cannot be the same: %s." %(pathToDstFile)
+        logging.getLogger(MAIN_LOGGER_NAME).error(message)
+        return False
+    else:
+        # Create the directory structure if it does not exist.
+        (head, tail) = os.path.split(pathToDstFile)
+        if (not mkdirs(head)) :
+            # The path to the directory was not created so file
+            # could not be copied.
+            return False
+        # Copy the file to the dst path.
+        try:
+            shutil.copy(pathToSrcFile, pathToDstFile)
+        except shutil.Error:
+            message = "Cannot copy the file %s to %s." %(pathToSrcFile, pathToDstFile)
+            logging.getLogger(MAIN_LOGGER_NAME).error(message)
+            return False
+        except OSError:
+            message = "Cannot copy the file %s to %s." %(pathToSrcFile, pathToDstFile)
+            logging.getLogger(MAIN_LOGGER_NAME).error(message)
+            return False
+        except IOError:
+            message = "Cannot copy the file %s to %s." %(pathToSrcFile, pathToDstFile)
+            logging.getLogger(MAIN_LOGGER_NAME).error(message)
+            return False
+        return (os.path.exists(pathToDstFile))
+
+def copyDirectory(pathToSrcDir, pathToDstDir):
+    """
+    This function will copy a src dir to dst dir.
+
+    @return: Returns True if the dir was copied successfully.
+    @rtype: Boolean
+
+    @param pathToSrcDir: The path to the source dir that will be copied.
+    @type pathToSrcDir: String
+    @param pathToDstDir: The path to the destination of the dir.
+    @type pathToDstDir: String
+    """
+    if(not os.path.exists(pathToSrcDir)):
+        message = "The directory does not exist with the path: %s." %(pathToSrcDir)
+        logging.getLogger(MAIN_LOGGER_NAME).error(message)
+        return False
+    elif (not os.path.isdir(pathToSrcDir)):
+        message = "The path to the source directory is not a directory: %s." %(pathToSrcDir)
+        logging.getLogger(MAIN_LOGGER_NAME).error(message)
+        return False
+    elif (pathToSrcDir == pathToDstDir):
+        message = "The path to the source directory and path to destination directory cannot be the same: %s." %(pathToDstDir)
+        logging.getLogger(MAIN_LOGGER_NAME).error(message)
+        return False
+    else:
+        if (not mkdirs(pathToDstDir)) :
+            # The path to the destination directory was not created so the
+            # directory could not be copied.
+            return False
+        # Copy the directory to the dst path.
+        dst = os.path.join(pathToDstDir, os.path.basename(pathToSrcDir))
+        try:
+            shutil.copytree(pathToSrcDir, dst)
+        except shutil.Error:
+            message = "Cannot copy the directory %s to %s." %(pathToSrcDir, dst)
+            logging.getLogger(MAIN_LOGGER_NAME).error(message)
+            return False
+        except OSError:
+            message = "Cannot copy the directory %s to %s." %(pathToSrcDir, dst)
+            logging.getLogger(MAIN_LOGGER_NAME).error(message)
+            return False
+        except IOError:
+            message = "Cannot copy the directory %s to %s." %(pathToSrcDir, dst)
+            logging.getLogger(MAIN_LOGGER_NAME).error(message)
+            return False
+        return (os.path.exists(dst))
+
 def backupOutputDirectory(pathToOutputDir):
     """
     This function will return True if the pathToOutputDir does not exist or the
@@ -464,8 +583,8 @@ def getClusterNode(listOfGFS2Names):
         if (len(listOfGFS2Names) > 0):
             for label in mapOfMountedFilesystemLabels.keys():
                 foundMatch = False
-                for name in listOfGFS2Names:
-                    if ((name == label) or ("%s:%s"%(clusterName, name) == label)):
+                for gfs2FSName in listOfGFS2Names:
+                    if ((gfs2FSName == label) or ("%s:%s"%(clusterName, gfs2FSName) == label)):
                         foundMatch = True
                         break
                 if ((not foundMatch) and (mapOfMountedFilesystemLabels.has_key(label))):
@@ -518,33 +637,6 @@ def getLabelMapForMountedFilesystems(clusterName, listOfMountedFilesystems):
                 mapOfMountedFilesystemLabels[fsLabel] = mountedFilesystem
     return mapOfMountedFilesystemLabels
 
-def verifyDebugFilesystemMounted(enableMounting=True):
-    """
-    This function verifies that the debug filesystem is mounted. If the debug
-    filesystem is mounted then True is returned, otherwise False is returned.
-
-    @return: If the debug filesystem is mounted then True is returned, otherwise
-    False is returned.
-    @rtype: Boolean
-
-    @param enableMounting: If True then the debug filesystem will be mounted if
-    it is currently not mounted.
-    @type enableMounting: Boolean
-    """
-    if (os.path.ismount(PATH_TO_DEBUG_DIR)):
-        message = "The debug filesystem %s is mounted." %(PATH_TO_DEBUG_DIR)
-        logging.getLogger(MAIN_LOGGER_NAME).info(message)
-        return True
-    else:
-        message = "The debug filesystem %s is not mounted." %(PATH_TO_DEBUG_DIR)
-        logging.getLogger(MAIN_LOGGER_NAME).warning(message)
-        if (cmdLineOpts.enableMountDebugFS):
-            if(mountFilesystem("debugfs", "none", PATH_TO_DEBUG_DIR)):
-                message = "The debug filesystem was mounted: %s." %(PATH_TO_DEBUG_DIR)
-                logging.getLogger(MAIN_LOGGER_NAME).info(message)
-                return True
-    return False
-
 def mountFilesystem(filesystemType, pathToDevice, pathToMountPoint):
     """
     This function will attempt to mount a filesystem. If the filesystem is
@@ -583,29 +675,24 @@ def gatherGeneralInformation(pathToDSTDir):
     @type pathToDSTDir: String
     """
     # Gather some general information and write to system.txt.
-    systemString = "HOSTNAME: %s\nDATE: %s\n" %(platform.node(), time.strftime("%Y-%m-%d_%H:%M:%S"))
-    stdout = runCommandOutput("uname", ["-a"])
+    systemString = "HOSTNAME=%s\nTIMESTAMP=%s\n" %(platform.node(), time.strftime("%Y-%m-%d %H:%M:%S"))
+    stdout = runCommandOutput("uname", ["-a"]).strip().rstrip()
     if (not stdout == None):
-        systemString += "UNAME-A: %s\n" %(stdout)
-    stdout = runCommandOutput("uptime", [])
+        systemString += "UNAMEA=%s\n" %(stdout)
+    stdout = runCommandOutput("uptime", []).strip().rstrip()
     if (not stdout == None):
-        systemString += "UPTIME: %s\n" %(stdout)
-    writeToFile(os.path.join(pathToDSTDir, "system.txt"), systemString, createFile=True)
+        systemString += "UPTIME=%s" %(stdout)
+    writeToFile(os.path.join(pathToDSTDir, "hostinformation.txt"), systemString, createFile=True)
 
-    # Get "mount -l" filesystem data.
-    command = "cat"
-    pathToCommandOutput = os.path.join(pathToDSTDir, "cat-proc_mounts.txt")
-    try:
-        fout = open(pathToCommandOutput, "w")
-        runCommand(command, ["/proc/mounts"], standardOut=fout)
-        fout.close()
-    except IOError:
-        message = "There was an error the command output for %s to the file %s." %(command, pathToCommandOutput)
-        logging.getLogger(MAIN_LOGGER_NAME).error(message)
+    # Copy misc files
+    pathToSrcFile = "/proc/mounts"
+    copyFile(pathToSrcFile, os.path.join(pathToDSTDir, pathToSrcFile.strip("/")))
+    pathToSrcFile = "/proc/slabinfo"
+    copyFile(pathToSrcFile, os.path.join(pathToDSTDir, pathToSrcFile.strip("/")))
 
     # Get "ps -eo user,pid,%cpu,%mem,vsz,rss,tty,stat,start,time,comm,wchan" data.
     command = "ps"
-    pathToCommandOutput = os.path.join(pathToDSTDir, "ps.txt")
+    pathToCommandOutput = os.path.join(pathToDSTDir, "ps_hALo-tid.s.cmd")
     try:
         fout = open(pathToCommandOutput, "w")
         #runCommand(command, ["-eo", "user,pid,%cpu,%mem,vsz,rss,tty,stat,start,time,comm,wchan"], standardOut=fout)
@@ -615,6 +702,48 @@ def gatherGeneralInformation(pathToDSTDir):
         message = "There was an error the command output for %s to the file %s." %(command, pathToCommandOutput)
         logging.getLogger(MAIN_LOGGER_NAME).error(message)
 
+
+def isProcPidStackEnabled(pathToPidData):
+    """
+    Returns true if the init process has the file "stack" in its pid data
+    directory which contains the task functions for that process.
+
+    @return: Returns true if the init process has the file "stack" in its pid
+    data directory which contains the task functions for that process.
+    @rtype: Boolean
+
+    @param pathToPidData: The path to the directory where all the pid data
+    directories are located.
+    @type pathToPidData: String
+    """
+    return os.path.exists(os.path.join(pathToPidData, "1/stack"))
+
+def gatherPidData(pathToPidData, pathToDSTDir):
+    """
+    This function will gather selected files from each of the pid directories
+    located in the pid data directory.
+
+    @return: Returns a list of paths to the directory that contains the
+    information about the pid.
+    @rtype: Array
+
+    @param pathToPidData: The path to the directory where all the pid data
+    directories are located.
+    @type pathToPidData: String
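+    @param pathToDSTDir: The path to the directory that the pid data will be
+    copied to.
+    @type pathToDSTDir: String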
+    """
+    # Status has: command name, pid, ppid, state, possibly registers
+    listOfFilesToCopy = ["cmdline", "stack", "status"]
+    listOfPathToPidsData = []
+    if (os.path.exists(pathToPidData)):
+        for srcFilename in os.listdir(pathToPidData):
+            pathToPidDirDST = os.path.join(pathToDSTDir, srcFilename)
+            if (srcFilename.isdigit()):
+                pathToSrcDir = os.path.join(pathToPidData, srcFilename)
+                for filenameToCopy in listOfFilesToCopy:
+                    copyFile(os.path.join(pathToSrcDir, filenameToCopy), os.path.join(pathToPidDirDST, filenameToCopy))
+                if (os.path.exists(pathToPidDirDST)):
+                    listOfPathToPidsData.append(pathToPidDirDST)
+    return listOfPathToPidsData
+
 def triggerSysRQEvents():
     """
     This command will trigger sysrq events which will write the output to
@@ -626,14 +755,15 @@ def triggerSysRQEvents():
     pathToSysrqTriggerFile = "/proc/sysrq-trigger"
     # m - dump information about memory allocation
     # t - dump thread state information
-    triggers = ["m", "t"]
+    # triggers = ["m", "t"]
+    triggers = ["t"]
     for trigger in triggers:
         try:
             fout = open(pathToSysrqTriggerFile, "w")
             runCommand(command, [trigger], standardOut=fout)
             fout.close()
         except IOError:
-            message = "There was an error the command output for %s to the file %s." %(command, pathToSysrqTriggerFile)
+            message = "There was an error writing the command output for %s to the file %s." %(command, pathToSysrqTriggerFile)
             logging.getLogger(MAIN_LOGGER_NAME).error(message)
 
 def gatherLogs(pathToDSTDir):
@@ -645,24 +775,14 @@ def gatherLogs(pathToDSTDir):
     copied to.
     @type pathToDSTDir: String
     """
-    if (mkdirs(pathToDSTDir)):
-        # Copy messages logs that contain the sysrq data.
-        pathToLogFile = "/var/log/messages"
-        pathToDSTLogFile = os.path.join(pathToDSTDir, os.path.basename(pathToLogFile))
-        try:
-            shutil.copyfile(pathToLogFile, pathToDSTLogFile)
-        except shutil.Error:
-            message = "There was an error copying the file: %s to %s." %(pathToLogFile, pathToDSTLogFile)
-            logging.getLogger(MAIN_LOGGER_NAME).error(message)
+    pathToLogFile = "/var/log/messages"
+    pathToDSTLogFile = os.path.join(pathToDSTDir, os.path.basename(pathToLogFile))
+    copyFile(pathToLogFile, pathToDSTLogFile)
 
-        pathToLogDir = "/var/log/cluster"
+    pathToLogDir = "/var/log/cluster"
+    if (os.path.exists(pathToLogDir)):
         pathToDSTLogDir = os.path.join(pathToDSTDir, os.path.basename(pathToLogDir))
-        if (os.path.isdir(pathToLogDir)):
-            try:
-                shutil.copytree(pathToLogDir, pathToDSTLogDir)
-            except shutil.Error:
-                message = "There was an error copying the directory: %s to %s." %(pathToLogDir, pathToDSTLogDir)
-                logging.getLogger(MAIN_LOGGER_NAME).error(message)
+        copyDirectory(pathToLogDir, pathToDSTDir)
 
 def gatherDLMLockDumps(pathToDSTDir, listOfGFS2Filesystems):
     """
@@ -680,23 +800,13 @@ def gatherDLMLockDumps(pathToDSTDir, listOfGFS2Filesystems):
     lockDumpType = "dlm"
     pathToSrcDir = os.path.join(PATH_TO_DEBUG_DIR, lockDumpType)
     pathToOutputDir = os.path.join(pathToDSTDir, lockDumpType)
-    message = "Copying the files in the %s lockdump data directory %s for the selected GFS2 filesystem with dlm debug files." %(lockDumpType.upper(), pathToSrcDir)
-    logging.getLogger(MAIN_LOGGER_NAME).status(message)
+    message = "Copying the files in the %s lockdump data directory %s." %(lockDumpType.upper(), pathToSrcDir)
+    logging.getLogger(MAIN_LOGGER_NAME).debug(message)
     for filename in os.listdir(pathToSrcDir):
         for name in listOfGFS2Filesystems:
             if (filename.startswith(name)):
-                pathToCurrentFilename = os.path.join(pathToSrcDir, filename)
-                pathToDSTDir = os.path.join(pathToOutputDir, name)
-                mkdirs(pathToDSTDir)
-                pathToDSTFilename = os.path.join(pathToDSTDir, filename)
-                try:
-                    shutil.copy(pathToCurrentFilename, pathToDSTFilename)
-                except shutil.Error:
-                    message = "There was an error copying the file: %s to %s." %(pathToCurrentFilename, pathToDSTFilename)
-                    logging.getLogger(MAIN_LOGGER_NAME).error(message)
-                except OSError:
-                    message = "There was an error copying the file: %s to %s." %(pathToCurrentFilename, pathToDSTFilename)
-                    logging.getLogger(MAIN_LOGGER_NAME).error(message)
+                copyFile(os.path.join(pathToSrcDir, filename),
+                         os.path.join(os.path.join(pathToOutputDir, name), filename))
 
 def gatherGFS2LockDumps(pathToDSTDir, listOfGFS2Filesystems):
     """
@@ -718,18 +828,9 @@ def gatherGFS2LockDumps(pathToDSTDir, listOfGFS2Filesystems):
     for dirName in os.listdir(pathToSrcDir):
         pathToCurrentDir = os.path.join(pathToSrcDir, dirName)
         if ((os.path.isdir(pathToCurrentDir)) and (dirName in listOfGFS2Filesystems)):
-            mkdirs(pathToOutputDir)
-            pathToDSTDir = os.path.join(pathToOutputDir, dirName)
-            try:
-                message = "Copying the lockdump data for the %s filesystem: %s" %(lockDumpType.upper(), dirName)
-                logging.getLogger(MAIN_LOGGER_NAME).status(message)
-                shutil.copytree(pathToCurrentDir, pathToDSTDir)
-            except shutil.Error:
-                message = "There was an error copying the directory: %s to %s." %(pathToCurrentDir, pathToDSTDir)
-                logging.getLogger(MAIN_LOGGER_NAME).error(message)
-            except OSError:
-                message = "There was an error copying the directory: %s to %s." %(pathToCurrentDir, pathToDSTDir)
-                logging.getLogger(MAIN_LOGGER_NAME).error(message)
+            message = "Copying the lockdump data for the %s filesystem: %s" %(lockDumpType.upper(), dirName)
+            logging.getLogger(MAIN_LOGGER_NAME).debug(message)
+            copyDirectory(pathToCurrentDir, pathToOutputDir)
 
 # ##############################################################################
 # Get user selected options
@@ -752,52 +853,57 @@ def __getOptions(version) :
     cmdParser.add_option("-d", "--debug",
                          action="store_true",
                          dest="enableDebugLogging",
-                         help="Enables debug logging.",
+                         help="enables debug logging",
                          default=False)
     cmdParser.add_option("-q", "--quiet",
                          action="store_true",
                          dest="disableLoggingToConsole",
-                         help="Disables logging to console.",
+                         help="disables logging to console",
+                         default=False)
+    cmdParser.add_option("-y", "--no_ask",
+                         action="store_true",
+                         dest="disableQuestions",
+                         help="disables all questions and assumes yes",
                          default=False)
     cmdParser.add_option("-i", "--info",
                          action="store_true",
                          dest="enablePrintInfo",
-                         help="Prints to console some basic information about the GFS2 filesystems mounted on the cluster node.",
+                         help="prints information about the mounted GFS2 file systems",
                          default=False)
-    cmdParser.add_option("-M", "--mount_debug_fs",
+    cmdParser.add_option("-t", "--archive",
                          action="store_true",
-                         dest="enableMountDebugFS",
-                         help="Enables the mounting of the debug filesystem if it is not mounted. Default is disabled.",
+                         dest="enableArchiveOutputDir",
+                         help="the output directory will be archived (tar) and compressed (.bz2)",
                          default=False)
     cmdParser.add_option("-o", "--path_to_output_dir",
                          action="store",
                          dest="pathToOutputDir",
-                         help="The path to the output directory where all the collect data will be stored. Default is /tmp/<date>-<hostname>-%s" %(os.path.basename(sys.argv[0])),
+                         help="the directory where all the collected data will be stored",
                          type="string",
+                         metavar="<output directory>",
                          default="")
     cmdParser.add_option("-r", "--num_of_runs",
                          action="store",
                          dest="numberOfRuns",
-                         help="The number of lockdumps runs to do. Default is 2.",
+                         help="number of runs capturing the lockdump data",
                          type="int",
+                         metavar="<number of runs>",
                          default=2)
     cmdParser.add_option("-s", "--seconds_sleep",
                          action="store",
                          dest="secondsToSleep",
-                         help="The number of seconds sleep between runs. Default is 120 seconds.",
+                         help="number of seconds to sleep between runs of capturing the lockdump data",
                          type="int",
+                         metavar="<seconds to sleep>",
                          default=120)
-    cmdParser.add_option("-t", "--archive",
-                         action="store_true",
-                         dest="enableArchiveOutputDir",
-                         help="Enables archiving and compressing of the output directory with tar and bzip2. Default is disabled.",
-                         default=False)
     cmdParser.add_option("-n", "--fs_name",
                          action="extend",
                          dest="listOfGFS2Names",
-                         help="List of GFS2 filesystems that will have their lockdump data gathered.",
+                         help="name of the GFS2 filesystem(s) that will have their lockdump data captured",
                          type="string",
-                         default=[])    # Get the options and return the result.
+                         metavar="<name of GFS2 filesystem>",
+                         default=[])
+    # Get the options and return the result.
     (cmdLineOpts, cmdLineArgs) = cmdParser.parse_args()
     return (cmdLineOpts, cmdLineArgs)
 
@@ -817,7 +923,7 @@ class OptionParserExtended(OptionParser):
         self.__commandName = os.path.basename(sys.argv[0])
         versionMessage = "%s %s\n" %(self.__commandName, version)
 
-        commandDescription  ="%s will capture information about lockdata data for GFS2 and DLM required to analyze a GFS2 filesystem.\n"%(self.__commandName)
+        commandDescription  ="%s will capture locking information from GFS2 file systems and DLM.\n"%(self.__commandName)
 
         OptionParser.__init__(self, option_class=ExtendOption,
                               version=versionMessage,
@@ -831,10 +937,17 @@ class OptionParserExtended(OptionParser):
         examplesMessage = "\n"
         examplesMessage = "\nPrints information about the available GFS2 filesystems that can have lockdump data captured."
         examplesMessage += "\n$ %s -i\n" %(self.__commandName)
-        examplesMessage += "\nThis command will mount the debug directory if it is not mounted. It will do 3 runs of\n"
-        examplesMessage += "gathering the lockdump information in 10 second intervals for only the GFS2 filesystems\n"
-        examplesMessage += "with the names myGFS2vol2,myGFS2vol1. Then it will archive and compress the data collected."
-        examplesMessage += "\n$ %s -M -r 3 -s 10 -t -n myGFS2vol2,myGFS2vol1\n" %(self.__commandName)
+
+        examplesMessage += "\nThis will do 3 runs of gathering the lockdump information in 10 second intervals for only the"
+        examplesMessage += "\nGFS2 filesystems with the names myGFS2vol2,myGFS2vol1. Then it will archive and compress"
+        examplesMessage += "\nthe data collected. All of the lockdump data will be written to the directory: "
+        examplesMessage += "\n/tmp/2012-11-12_095556-gfs2_lockcapture and all the questions will be answered with yes.\n"
+        examplesMessage += "\n$ %s -r 3 -s 10 -t -n myGFS2vol2,myGFS2vol1 -o /tmp/2012-11-12_095556-gfs2_lockcapture -y\n" %(self.__commandName)
+
+        examplesMessage += "\nThis will do 2 runs of gathering the lockdump information in 25 second intervals for all the"
+        examplesMessage += "\nmounted GFS2 filesystems. Then it will archive and compress the data collected. All of the"
+        examplesMessage += "\nlockdump data will be written to the directory: /tmp/2012-11-12_095556-gfs2_lockcapture.\n"
+        examplesMessage += "\n$ %s -r 2 -s 25 -t -o /tmp/2012-11-12_095556-gfs2_lockcapture\n" %(self.__commandName)
         OptionParser.print_help(self)
         print examplesMessage
 
@@ -869,11 +982,13 @@ class ExtendOption (Option):
         @type parser: OptionParser
         """
         if (action == "extend") :
-            valueList=[]
+            valueList = []
             try:
                 for v in value.split(","):
                     # Need to add code for dealing with paths if there is option for paths.
-                    valueList.append(v)
+                    newValue = v.strip()
+                    if (len(newValue) > 0):
+                        valueList.append(newValue)
             except:
                 pass
             else:
@@ -912,17 +1027,10 @@ if __name__ == "__main__":
         streamHandler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
         logger.addHandler(streamHandler)
 
-        # Set the handler for writing to log file.
-        pathToLogFile = "/tmp/%s.log" %(MAIN_LOGGER_NAME)
-        if (((os.access(pathToLogFile, os.W_OK) and os.access("/tmp", os.R_OK))) or (not os.path.exists(pathToLogFile))):
-            fileHandler = logging.FileHandler(pathToLogFile)
-            fileHandler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s", "%Y-%m-%d %H:%M:%S"))
-            logger.addHandler(fileHandler)
-            message = "A log file will be created or appened to: %s" %(pathToLogFile)
-            logging.getLogger(MAIN_LOGGER_NAME).info(message)
-        else:
-            message = "There was permission problem accessing the write attributes for the log file: %s." %(pathToLogFile)
-            logging.getLogger(MAIN_LOGGER_NAME).error(message)
+        # Please note there will not be a global log file created. If a log file
+        # is needed then redirect the output. There will be a log file created
+        # for each run in the corresponding directory.
+
         # #######################################################################
         # Set the logging levels.
         # #######################################################################
@@ -949,6 +1057,26 @@ if __name__ == "__main__":
             # script running.
             writeToFile(PATH_TO_PID_FILENAME, str(os.getpid()), createFile=True)
         # #######################################################################
+        # Verify they want to continue because this script will trigger sysrq events.
+        # #######################################################################
+        if (not cmdLineOpts.disableQuestions):
+            valid = {"yes":True, "y":True, "no":False, "n":False}
+            question = "This script will trigger a sysrq -t event or collect the data for each pid directory located in /proc for each run. Are you sure you want to continue?"
+            prompt = " [y/n] "
+            while True:
+                sys.stdout.write(question + prompt)
+                choice = raw_input().lower()
+                if (choice in valid):
+                    if (valid.get(choice)):
+                        # If yes, or y then exit loop and continue.
+                        break
+                    else:
+                        message = "The script will exit since you chose not to continue."
+                        logging.getLogger(MAIN_LOGGER_NAME).error(message)
+                        exitScript(removePidFile=True, errorCode=1)
+                else:
+                    sys.stdout.write("Please respond with '(y)es' or '(n)o'.\n")
+        # #######################################################################
         # Get the clusternode name and verify that mounted GFS2 filesystems were
         # found.
         # #######################################################################
@@ -976,8 +1104,6 @@ if __name__ == "__main__":
         # proceeding unless it is already created from a previous run data needs
         # to be analyzed. Probably could add more debugging on if file or dir.
         # #######################################################################
-        message = "The gathering of the lockdumps will be performed on the clusternode \"%s\" which is part of the cluster \"%s\"." %(clusternode.getClusterNodeName(), clusternode.getClusterName())
-        logging.getLogger(MAIN_LOGGER_NAME).info(message)
         pathToOutputDir = cmdLineOpts.pathToOutputDir
         if (not len(pathToOutputDir) > 0):
             pathToOutputDir = "%s" %(os.path.join("/tmp", "%s-%s-%s" %(time.strftime("%Y-%m-%d_%H%M%S"), clusternode.getClusterNodeName(), os.path.basename(sys.argv[0]))))
@@ -1000,56 +1126,83 @@ if __name__ == "__main__":
         # Check to see if the debug directory is mounted. If not then
         # log an error.
         # #######################################################################
-        result = verifyDebugFilesystemMounted(cmdLineOpts.enableMountDebugFS)
-        if (not result):
-            message = "Please mount the debug filesystem before running this script. For example: $ mount none -t debugfs %s" %(PATH_TO_DEBUG_DIR)
+        if(mountFilesystem("debugfs", "none", PATH_TO_DEBUG_DIR)):
+            message = "The debug filesystem %s is mounted." %(PATH_TO_DEBUG_DIR)
+            logging.getLogger(MAIN_LOGGER_NAME).info(message)
+        else:
+            message = "There was a problem mounting the debug filesystem: %s" %(PATH_TO_DEBUG_DIR)
+            logging.getLogger(MAIN_LOGGER_NAME).error(message)
+            message = "The debug filesystem is required to be mounted for this script to run."
             logging.getLogger(MAIN_LOGGER_NAME).info(message)
             exitScript(errorCode=1)
-
         # #######################################################################
         # Gather data and the lockdumps.
         # #######################################################################
-        message = "The process of gathering all the required files will begin before capturing the lockdumps."
-        logging.getLogger(MAIN_LOGGER_NAME).info(message)
-        for i in range(0,cmdLineOpts.numberOfRuns):
+        if (cmdLineOpts.numberOfRuns <= 0):
+            message = "The number of runs should be greater than zero."
+            logging.getLogger(MAIN_LOGGER_NAME).error(message)
+            exitScript(errorCode=1)
+        for i in range(1,(cmdLineOpts.numberOfRuns + 1)):
             # The current log count that will start@1 and not zero to make it
             # make sense in logs.
-            currentLogRunCount = (i + 1)
             # Add clusternode name under each run dir to make combining multple
             # clusternode gfs2_lockgather data together and all data in each run directory.
             pathToOutputRunDir = os.path.join(pathToOutputDir, "run%d/%s" %(i, clusternode.getClusterNodeName()))
+            # Create the directory that will be used to capture the data.
             if (not mkdirs(pathToOutputRunDir)):
                 exitScript(errorCode=1)
-            # Gather various bits of data from the clusternode.
-            message = "Gathering some general information about the clusternode %s for run %d/%d." %(clusternode.getClusterNodeName(), currentLogRunCount, cmdLineOpts.numberOfRuns)
+            # Set the handler for writing to log file for this run.
+            currentRunFileHandler = None
+            pathToLogFile = os.path.join(pathToOutputRunDir, "%s.log" %(MAIN_LOGGER_NAME))
+            if (((os.access(pathToLogFile, os.W_OK) and os.access("/tmp", os.R_OK))) or (not os.path.exists(pathToLogFile))):
+                currentRunFileHandler = logging.FileHandler(pathToLogFile)
+                currentRunFileHandler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s", "%Y-%m-%d %H:%M:%S"))
+                logging.getLogger(MAIN_LOGGER_NAME).addHandler(currentRunFileHandler)
+            message = "Pass (%d/%d): Gathering all the lockdump data." %(i, cmdLineOpts.numberOfRuns)
             logging.getLogger(MAIN_LOGGER_NAME).status(message)
+
+            # Gather various bits of data from the clusternode.
+            message = "Pass (%d/%d): Gathering general information about the host." %(i, cmdLineOpts.numberOfRuns)
+            logging.getLogger(MAIN_LOGGER_NAME).debug(message)
             gatherGeneralInformation(pathToOutputRunDir)
-            # Trigger sysrq events to capture memory and thread information
-            message = "Triggering the sysrq events for the clusternode %s for run %d/%d." %(clusternode.getClusterNodeName(), currentLogRunCount, cmdLineOpts.numberOfRuns)
-            logging.getLogger(MAIN_LOGGER_NAME).status(message)
-            triggerSysRQEvents()
+            # Sleep for 2 seconds so that the TIMESTAMP recorded above is in the
+            # past when the sysrq or pid data is written to the logs.
+            time.sleep(2)
+            # Gather the back traces for all the pids, either by copying the
+            # /proc/<pid> data or by triggering sysrq events so that the task
+            # back traces are captured in the logs.
+            message = "Pass (%d/%d): Gathering the process back traces for the host." %(i, cmdLineOpts.numberOfRuns)
+            logging.getLogger(MAIN_LOGGER_NAME).debug(message)
+            # Gather the data in the /proc/<pid> directory if the file
+            # /proc/<pid>/stack exists. If that file exists then sysrq events
+            # will not be triggered.
+            pathToPidData = "/proc"
+            if (isProcPidStackEnabled(pathToPidData)):
+                gatherPidData(pathToPidData, os.path.join(pathToOutputRunDir, pathToPidData.strip("/")))
+            else:
+                triggerSysRQEvents()
             # Gather the dlm locks.
             lockDumpType = "dlm"
-            message = "Gathering the %s lock dumps for clusternode %s for run %d/%d." %(lockDumpType.upper(), clusternode.getClusterNodeName(), currentLogRunCount, cmdLineOpts.numberOfRuns)
-            logging.getLogger(MAIN_LOGGER_NAME).status(message)
+            message = "Pass (%d/%d): Gathering the %s lock dumps for the host." %(i, cmdLineOpts.numberOfRuns, lockDumpType.upper())
+            logging.getLogger(MAIN_LOGGER_NAME).debug(message)
             gatherDLMLockDumps(pathToOutputRunDir, clusternode.getMountedGFS2FilesystemNames(includeClusterName=False))
             # Gather the glock locks from gfs2.
             lockDumpType = "gfs2"
-            message = "Gathering the %s lock dumps for clusternode %s for run %d/%d." %(lockDumpType.upper(), clusternode.getClusterNodeName(), currentLogRunCount, cmdLineOpts.numberOfRuns)
-            logging.getLogger(MAIN_LOGGER_NAME).status(message)
+            message = "Pass (%d/%d): Gathering the %s lock dumps for the host." %(i, cmdLineOpts.numberOfRuns, lockDumpType.upper())
+            logging.getLogger(MAIN_LOGGER_NAME).debug(message)
             gatherGFS2LockDumps(pathToOutputRunDir, clusternode.getMountedGFS2FilesystemNames())
             # Gather log files
-            message = "Gathering the log files for the clusternode %s for run %d/%d." %(clusternode.getClusterNodeName(), currentLogRunCount, cmdLineOpts.numberOfRuns)
-            logging.getLogger(MAIN_LOGGER_NAME).status(message)
+            message = "Pass (%d/%d): Gathering the log files for the host." %(i, cmdLineOpts.numberOfRuns)
+            logging.getLogger(MAIN_LOGGER_NAME).debug(message)
             gatherLogs(os.path.join(pathToOutputRunDir, "logs"))
             # Sleep between each run if secondsToSleep is greater than or equal
             # to 0 and current run is not the last run.
-            if ((cmdLineOpts.secondsToSleep >= 0) and (i < (cmdLineOpts.numberOfRuns - 1))):
-                message = "The script will sleep for %d seconds between each run of capturing the lockdumps." %(cmdLineOpts.secondsToSleep)
+            if ((cmdLineOpts.secondsToSleep >= 0) and (i < cmdLineOpts.numberOfRuns)):
+                message = "The script will sleep for %d seconds between each run of capturing the lockdump data." %(cmdLineOpts.secondsToSleep)
                 logging.getLogger(MAIN_LOGGER_NAME).info(message)
-                message = "The script is sleeping before beginning the next run."
-                logging.getLogger(MAIN_LOGGER_NAME).status(message)
                 time.sleep(cmdLineOpts.secondsToSleep)
+            # Remove the handler:
+            logging.getLogger(MAIN_LOGGER_NAME).removeHandler(currentRunFileHandler)
+
         # #######################################################################
         # Archive the directory that contains all the data and archive it after
         # all the information has been gathered.
diff --git a/gfs2/man/Makefile.am b/gfs2/man/Makefile.am
index 83d6251..8655a76 100644
--- a/gfs2/man/Makefile.am
+++ b/gfs2/man/Makefile.am
@@ -7,4 +7,5 @@ dist_man_MANS		= fsck.gfs2.8 \
 			  gfs2_grow.8 \
 			  gfs2_jadd.8 \
 			  mkfs.gfs2.8 \
-			  tunegfs2.8
+			  tunegfs2.8 \
+			  gfs2_lockcapture.8 
diff --git a/gfs2/man/gfs2_lockcapture.8 b/gfs2/man/gfs2_lockcapture.8
new file mode 100644
index 0000000..854cd71
--- /dev/null
+++ b/gfs2/man/gfs2_lockcapture.8
@@ -0,0 +1,53 @@
+.TH gfs2_lockcapture 8
+
+.SH NAME
+gfs2_lockcapture \- capture locking information from GFS2 file systems and DLM
+
+.SH SYNOPSIS
+.B gfs2_lockcapture \fR[-dqyt] [-o \fIoutput directory\fR] [-r \fInumber of runs\fR] [-s \fIseconds to sleep\fR] [-n \fIname of GFS2 filesystem\fR]\fP
+.PP 
+.B gfs2_lockcapture \fR[-dqyi]
+
+.SH DESCRIPTION
+\fIgfs2_lockcapture\fR is used to capture all the GFS2 lockdump data and the
+corresponding DLM data. The command can be configured to capture the data
+multiple times and to sleep for a set number of seconds between each iteration.
+By default all of the mounted GFS2 filesystems will have their data collected
+unless specific GFS2 filesystems are specified.
+.PP
+Please note that on each iteration of capturing the data, either a sysrq -t
+event is triggered or the pid directories in /proc are collected.
+
+.SH OPTIONS
+.TP
+\fB-h,  --help\fP
+Prints out a short usage message and exits.
+.TP
+\fB-d,  --debug\fP
+enables debug logging.
+.TP
+\fB-q,  --quiet\fP
+disables logging to console.
+.TP
+\fB-y,  --no_ask\fP
+disables all questions and assumes yes.
+.TP
+\fB-i,  --info\fP
+prints information about the mounted GFS2 file systems.
+.TP
+\fB-t,  --archive\fP
+the output directory will be archived (tar) and compressed (.bz2).
+.TP
+\fB-o \fI<output directory>, \fB--path_to_output_dir\fR=\fI<output directory>\fP
+the directory where all the collected data will be stored.
+.TP
+\fB-r \fI<number of runs>,  \fB--num_of_runs\fR=\fI<number of runs>\fP
+number of runs capturing the lockdump data.
+.TP
+\fB-s \fI<seconds to sleep>,  \fB--seconds_sleep\fR=\fI<seconds to sleep>\fP
+number of seconds to sleep between runs of capturing the lockdump data.
+.TP
+\fB-n \fI<name of GFS2 filesystem>,  \fB--fs_name\fR=\fI<name of GFS2 filesystem>\fP
+name of the GFS2 filesystem(s) that will have their lockdump data captured.
+.
+.SH SEE ALSO
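
For reviewers, the collected data ends up laid out roughly as follows. This is a
sketch assembled from the paths used in the patch; the timestamp, node name and
filesystem name are only examples, and the .tar.bz2 is produced only when -t is
given:

    /tmp/2012-11-12_095556-node1-gfs2_lockcapture/
        run1/node1/
            gfs2_lockcapture.log               (per-run log file)
            hostinformation.txt                (hostname, timestamp, uname, uptime)
            ps_hALo-tid.s.cmd                  (ps output)
            proc/mounts, proc/slabinfo
            proc/<pid>/{cmdline,stack,status}  (when /proc/<pid>/stack exists)
            dlm/<fsname>/                      (dlm debugfs files)
            gfs2/<clustername>:<fsname>/       (glock dumps)
            logs/messages, logs/cluster/
        run2/node1/...
    /tmp/2012-11-12_095556-node1-gfs2_lockcapture-node1.tar.bz2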
-- 
1.8.0.2




* [Cluster-devel] [PATCH] gfs2-lockcapture: Modified some of the data gathered
  2012-12-13 15:14 [Cluster-devel] [PATCH] gfs2-lockcapture: Modified some of the data gathered sbradley
@ 2012-12-14 13:37 ` Andrew Price
  0 siblings, 0 replies; 2+ messages in thread
From: Andrew Price @ 2012-12-14 13:37 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi Shane,

On 13/12/12 15:14, sbradley at redhat.com wrote:
> From: sbradley <sbradley@redhat.com>

Looks good to me. Thanks for adding a manpage. Please make sure the 
authorship info is corrected before pushing.

Andy

> Changed some var names in host data collected, added /proc/<pid>/ to files
> collected, and added man page.
>
> Signed-off-by: shane bradley <sbradley@redhat.com>
> ---
>   gfs2/lockcapture/gfs2_lockcapture | 465 +++++++++++++++++++++++++-------------
>   gfs2/man/Makefile.am              |   3 +-
>   gfs2/man/gfs2_lockcapture.8       |  53 +++++
>   3 files changed, 364 insertions(+), 157 deletions(-)
>   create mode 100644 gfs2/man/gfs2_lockcapture.8
>
> diff --git a/gfs2/lockcapture/gfs2_lockcapture b/gfs2/lockcapture/gfs2_lockcapture
> index a930a2f..1a64188 100644
> --- a/gfs2/lockcapture/gfs2_lockcapture
> +++ b/gfs2/lockcapture/gfs2_lockcapture
> @@ -1,9 +1,7 @@
>   #!/usr/bin/env python
>   """
> -This script will gather GFS2 glocks and dlm lock dump information for a cluster
> -node. The script can get all the mounted GFS2 filesystem data or set of selected
> -GFS2 filesystems. The script will also gather some general information about the
> -system.
> +The script gfs2_lockcapture will capture locking information from GFS2 file
> +systems and DLM.
>
>   @author    : Shane Bradley
>   @contact   : sbradley at redhat.com
> @@ -35,7 +33,7 @@ import tarfile
>   sure only 1 instance of this script is running at any time.
>   @type PATH_TO_PID_FILENAME: String
>   """
> -VERSION_NUMBER = "0.9-1"
> +VERSION_NUMBER = "0.9-2"
>   MAIN_LOGGER_NAME = "%s" %(os.path.basename(sys.argv[0]))
>   PATH_TO_DEBUG_DIR="/sys/kernel/debug"
>   PATH_TO_PID_FILENAME = "/var/run/%s.pid" %(os.path.basename(sys.argv[0]))
> @@ -313,7 +311,7 @@ def archiveData(pathToSrcDir):
>       @type pathToSrcDir: String
>       """
>       if (os.path.exists(pathToSrcDir)):
> -        pathToTarFilename = "%s.tar.bz2" %(pathToSrcDir)
> +        pathToTarFilename = "%s-%s.tar.bz2" %(pathToSrcDir, platform.node())
>           if (os.path.exists(pathToTarFilename)):
>               message = "A compressed archvied file already exists and will be removed: %s" %(pathToTarFilename)
>               logging.getLogger(MAIN_LOGGER_NAME).status(message)
> @@ -337,6 +335,127 @@ def archiveData(pathToSrcDir):
>               return pathToTarFilename
>       return ""
>
> +def getDataFromFile(pathToSrcFile) :
> +    """
> +    This function will return the data in an array. Where each newline in file
> +    is a seperate item in the array. This should really just be used on
> +    relatively small files.
> +
> +    None is returned if no file is found.
> +
> +    @return: Returns an array of Strings, where each newline in file is an item
> +    in the array.
> +    @rtype: Array
> +
> +    @param pathToSrcFile: The path to the file which will be read.
> +    @type pathToSrcFile: String
> +    """
> +    if (len(pathToSrcFile) > 0) :
> +        try:
> +            fin = open(pathToSrcFile, "r")
> +            data = fin.readlines()
> +            fin.close()
> +            return data
> +        except (IOError, os.error):
> +            message = "An error occured reading the file: %s." %(pathToSrcFile)
> +            logging.getLogger(MAIN_LOGGER_NAME).error(message)
> +    return None
> +
> +def copyFile(pathToSrcFile, pathToDstFile):
> +    """
> +    This function will copy a src file to dst file.
> +
> +    @return: Returns True if the file was copied successfully.
> +    @rtype: Boolean
> +
> +    @param pathToSrcFile: The path to the source file that will be copied.
> +    @type pathToSrcFile: String
> +    @param pathToDstFile: The path to the destination of the file.
> +    @type pathToDstFile: String
> +    """
> +    if(not os.path.exists(pathToSrcFile)):
> +        message = "The file does not exist with the path: %s." %(pathToSrcFile)
> +        logging.getLogger(MAIN_LOGGER_NAME).error(message)
> +        return False
> +    elif (not os.path.isfile(pathToSrcFile)):
> +        message = "The path to the source file is not a regular file: %s." %(pathToSrcFile)
> +        logging.getLogger(MAIN_LOGGER_NAME).error(message)
> +        return False
> +    elif (pathToSrcFile == pathToDstFile):
> +        message = "The path to the source file and path to destination file cannot be the same: %s." %(pathToDstFile)
> +        logging.getLogger(MAIN_LOGGER_NAME).error(message)
> +        return False
> +    else:
> +        # Create the directory structure if it does not exist.
> +        (head, tail) = os.path.split(pathToDstFile)
> +        if (not mkdirs(head)) :
> +            # The path to the directory was not created so file
> +            # could not be copied.
> +            return False
> +        # Copy the file to the dst path.
> +        try:
> +            shutil.copy(pathToSrcFile, pathToDstFile)
> +        except shutil.Error:
> +            message = "Cannot copy the file %s to %s." %(pathToSrcFile, pathToDstFile)
> +            logging.getLogger(MAIN_LOGGER_NAME).error(message)
> +            return False
> +        except OSError:
> +            message = "Cannot copy the file %s to %s." %(pathToSrcFile, pathToDstFile)
> +            logging.getLogger(MAIN_LOGGER_NAME).error(message)
> +            return False
> +        except IOError:
> +            message = "Cannot copy the file %s to %s." %(pathToSrcFile, pathToDstFile)
> +            logging.getLogger(MAIN_LOGGER_NAME).error(message)
> +            return False
> +        return (os.path.exists(pathToDstFile))
> +
> +def copyDirectory(pathToSrcDir, pathToDstDir):
> +    """
> +    This function will copy a src dir to dst dir.
> +
> +    @return: Returns True if the dir was copied successfully.
> +    @rtype: Boolean
> +
> +    @param pathToSrcDir: The path to the source dir that will be copied.
> +    @type pathToSrcDir: String
> +    @param pathToDstDir: The path to the destination of the dir.
> +    @type pathToDstDir: String
> +    """
> +    if(not os.path.exists(pathToSrcDir)):
> +        message = "The directory does not exist with the path: %s." %(pathToSrcDir)
> +        logging.getLogger(MAIN_LOGGER_NAME).error(message)
> +        return False
> +    elif (not os.path.isdir(pathToSrcDir)):
> +        message = "The path to the source directory is not a directory: %s." %(pathToSrcDir)
> +        logging.getLogger(MAIN_LOGGER_NAME).error(message)
> +        return False
> +    elif (pathToSrcDir == pathToDstDir):
> +        message = "The path to the source directory and path to destination directory cannot be the same: %s." %(pathToDstDir)
> +        logging.getLogger(MAIN_LOGGER_NAME).error(message)
> +        return False
> +    else:
> +        if (not mkdirs(pathToDstDir)) :
> +            # The path to the directory was not created so file
> +            # could not be copied.
> +            return False
> +        # Copy the file to the dst path.
> +        dst = os.path.join(pathToDstDir, os.path.basename(pathToSrcDir))
> +        try:
> +            shutil.copytree(pathToSrcDir, dst)
> +        except shutil.Error:
> +            message = "Cannot copy the directory %s to %s." %(pathToSrcDir, dst)
> +            logging.getLogger(MAIN_LOGGER_NAME).error(message)
> +            return False
> +        except OSError:
> +            message = "Cannot copy the directory %s to %s." %(pathToSrcDir, dst)
> +            logging.getLogger(MAIN_LOGGER_NAME).error(message)
> +            return False
> +        except IOError:
> +            message = "Cannot copy the directory %s to %s." %(pathToSrcDir, dst)
> +            logging.getLogger(MAIN_LOGGER_NAME).error(message)
> +            return False
> +        return (os.path.exists(dst))
> +
>   def backupOutputDirectory(pathToOutputDir):
>       """
>       This function will return True if the pathToOutputDir does not exist or the
> @@ -464,8 +583,8 @@ def getClusterNode(listOfGFS2Names):
>           if (len(listOfGFS2Names) > 0):
>               for label in mapOfMountedFilesystemLabels.keys():
>                   foundMatch = False
> -                for name in listOfGFS2Names:
> -                    if ((name == label) or ("%s:%s"%(clusterName, name) == label)):
> +                for gfs2FSName in listOfGFS2Names:
> +                    if ((gfs2FSName == label) or ("%s:%s"%(clusterName, gfs2FSName) == label)):
>                           foundMatch = True
>                           break
>                   if ((not foundMatch) and (mapOfMountedFilesystemLabels.has_key(label))):
> @@ -518,33 +637,6 @@ def getLabelMapForMountedFilesystems(clusterName, listOfMountedFilesystems):
>                   mapOfMountedFilesystemLabels[fsLabel] = mountedFilesystem
>       return mapOfMountedFilesystemLabels
>
> -def verifyDebugFilesystemMounted(enableMounting=True):
> -    """
> -    This function verifies that the debug filesystem is mounted. If the debug
> -    filesystem is mounted then True is returned, otherwise False is returned.
> -
> -    @return: If the debug filesystem is mounted then True is returned, otherwise
> -    False is returned.
> -    @rtype: Boolean
> -
> -    @param enableMounting: If True then the debug filesystem will be mounted if
> -    it is currently not mounted.
> -    @type enableMounting: Boolean
> -    """
> -    if (os.path.ismount(PATH_TO_DEBUG_DIR)):
> -        message = "The debug filesystem %s is mounted." %(PATH_TO_DEBUG_DIR)
> -        logging.getLogger(MAIN_LOGGER_NAME).info(message)
> -        return True
> -    else:
> -        message = "The debug filesystem %s is not mounted." %(PATH_TO_DEBUG_DIR)
> -        logging.getLogger(MAIN_LOGGER_NAME).warning(message)
> -        if (cmdLineOpts.enableMountDebugFS):
> -            if(mountFilesystem("debugfs", "none", PATH_TO_DEBUG_DIR)):
> -                message = "The debug filesystem was mounted: %s." %(PATH_TO_DEBUG_DIR)
> -                logging.getLogger(MAIN_LOGGER_NAME).info(message)
> -                return True
> -    return False
> -
>   def mountFilesystem(filesystemType, pathToDevice, pathToMountPoint):
>       """
>       This function will attempt to mount a filesystem. If the filesystem is
> @@ -583,29 +675,24 @@ def gatherGeneralInformation(pathToDSTDir):
>       @type pathToDSTDir: String
>       """
>       # Gather some general information and write to system.txt.
> -    systemString = "HOSTNAME: %s\nDATE: %s\n" %(platform.node(), time.strftime("%Y-%m-%d_%H:%M:%S"))
> -    stdout = runCommandOutput("uname", ["-a"])
> +    systemString = "HOSTNAME=%s\nTIMESTAMP=%s\n" %(platform.node(), time.strftime("%Y-%m-%d %H:%M:%S"))
> +    stdout = runCommandOutput("uname", ["-a"]).strip().rstrip()
>       if (not stdout == None):
> -        systemString += "UNAME-A: %s\n" %(stdout)
> -    stdout = runCommandOutput("uptime", [])
> +        systemString += "UNAMEA=%s\n" %(stdout)
> +    stdout = runCommandOutput("uptime", []).strip().rstrip()
>       if (not stdout == None):
> -        systemString += "UPTIME: %s\n" %(stdout)
> -    writeToFile(os.path.join(pathToDSTDir, "system.txt"), systemString, createFile=True)
> +        systemString += "UPTIME=%s" %(stdout)
> +    writeToFile(os.path.join(pathToDSTDir, "hostinformation.txt"), systemString, createFile=True)
>
> -    # Get "mount -l" filesystem data.
> -    command = "cat"
> -    pathToCommandOutput = os.path.join(pathToDSTDir, "cat-proc_mounts.txt")
> -    try:
> -        fout = open(pathToCommandOutput, "w")
> -        runCommand(command, ["/proc/mounts"], standardOut=fout)
> -        fout.close()
> -    except IOError:
> -        message = "There was an error the command output for %s to the file %s." %(command, pathToCommandOutput)
> -        logging.getLogger(MAIN_LOGGER_NAME).error(message)
> +    # Copy misc files
> +    pathToSrcFile = "/proc/mounts"
> +    copyFile(pathToSrcFile, os.path.join(pathToDSTDir, pathToSrcFile.strip("/")))
> +    pathToSrcFile = "/proc/slabinfo"
> +    copyFile(pathToSrcFile, os.path.join(pathToDSTDir, pathToSrcFile.strip("/")))
>
>       # Get "ps -eo user,pid,%cpu,%mem,vsz,rss,tty,stat,start,time,comm,wchan" data.
>       command = "ps"
> -    pathToCommandOutput = os.path.join(pathToDSTDir, "ps.txt")
> +    pathToCommandOutput = os.path.join(pathToDSTDir, "ps_hALo-tid.s.cmd")
>       try:
>           fout = open(pathToCommandOutput, "w")
>           #runCommand(command, ["-eo", "user,pid,%cpu,%mem,vsz,rss,tty,stat,start,time,comm,wchan"], standardOut=fout)
> @@ -615,6 +702,48 @@ def gatherGeneralInformation(pathToDSTDir):
>           message = "There was an error the command output for %s to the file %s." %(command, pathToCommandOutput)
>           logging.getLogger(MAIN_LOGGER_NAME).error(message)
>
> +
> +def isProcPidStackEnabled(pathToPidData):
> +    """
> +    Returns true if the init process has the file "stack" in its pid data
> +    directory which contains the task functions for that process.
> +
> +    @return: Returns true if the init process has the file "stack" in its pid
> +    data directory which contains the task functions for that process.
> +    @rtype: Boolean
> +
> +    @param pathToPidData: The path to the directory where all the pid data
> +    directories are located.
> +    @type pathToPidData: String
> +    """
> +    return os.path.exists(os.path.join(pathToPidData, "1/stack"))
> +
> +def gatherPidData(pathToPidData, pathToDSTDir):
> +    """
> +    This command will gather all the directories which contain data about all the pids.
> +
> +    @return: Returns a list of paths to the directory that contains the
> +    information about the pid.
> +    @rtype: Array
> +
> +    @param pathToPidData: The path to the directory where all the pid data
> +    directories are located.
> +    @type pathToPidData: String
> +    """
> +    # Status has: command name, pid, ppid, state, possibly registers
> +    listOfFilesToCopy = ["cmdline", "stack", "status"]
> +    listOfPathToPidsData = []
> +    if (os.path.exists(pathToPidData)):
> +        for srcFilename in os.listdir(pathToPidData):
> +            pathToPidDirDST = os.path.join(pathToDSTDir, srcFilename)
> +            if (srcFilename.isdigit()):
> +                pathToSrcDir = os.path.join(pathToPidData, srcFilename)
> +                for filenameToCopy in listOfFilesToCopy:
> +                    copyFile(os.path.join(pathToSrcDir, filenameToCopy), os.path.join(pathToPidDirDST, filenameToCopy))
> +                if (os.path.exists(pathToPidDirDST)):
> +                    listOfPathToPidsData.append(pathToPidDirDST)
> +    return listOfPathToPidsData
> +
>   def triggerSysRQEvents():
>       """
>       This command will trigger sysrq events which will write the output to
> @@ -626,14 +755,15 @@ def triggerSysRQEvents():
>       pathToSysrqTriggerFile = "/proc/sysrq-trigger"
>       # m - dump information about memory allocation
>       # t - dump thread state information
> -    triggers = ["m", "t"]
> +    # triggers = ["m", "t"]
> +    triggers = ["t"]
>       for trigger in triggers:
>           try:
>               fout = open(pathToSysrqTriggerFile, "w")
>               runCommand(command, [trigger], standardOut=fout)
>               fout.close()
>           except IOError:
> -            message = "There was an error the command output for %s to the file %s." %(command, pathToSysrqTriggerFile)
> +            message = "There was an error writing the command output for %s to the file %s." %(command, pathToSysrqTriggerFile)
>               logging.getLogger(MAIN_LOGGER_NAME).error(message)
>
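
(A minimal sketch of what the "t" trigger amounts to, assuming the script is
run as root and writing to /proc/sysrq-trigger is permitted: the kernel dumps
the state of every task to the kernel log, which is why the messages file is
collected afterwards.)

    # Ask the kernel to dump task state information to the kernel log.
    fout = open("/proc/sysrq-trigger", "w")
    fout.write("t")
    fout.close()
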
>   def gatherLogs(pathToDSTDir):
> @@ -645,24 +775,14 @@ def gatherLogs(pathToDSTDir):
>       copied to.
>       @type pathToDSTDir: String
>       """
> -    if (mkdirs(pathToDSTDir)):
> -        # Copy messages logs that contain the sysrq data.
> -        pathToLogFile = "/var/log/messages"
> -        pathToDSTLogFile = os.path.join(pathToDSTDir, os.path.basename(pathToLogFile))
> -        try:
> -            shutil.copyfile(pathToLogFile, pathToDSTLogFile)
> -        except shutil.Error:
> -            message = "There was an error copying the file: %s to %s." %(pathToLogFile, pathToDSTLogFile)
> -            logging.getLogger(MAIN_LOGGER_NAME).error(message)
> +    pathToLogFile = "/var/log/messages"
> +    pathToDSTLogFile = os.path.join(pathToDSTDir, os.path.basename(pathToLogFile))
> +    copyFile(pathToLogFile, pathToDSTLogFile)
>
> -        pathToLogDir = "/var/log/cluster"
> +    pathToLogDir = "/var/log/cluster"
> +    if (os.path.exists(pathToLogDir)):
>           pathToDSTLogDir = os.path.join(pathToDSTDir, os.path.basename(pathToLogDir))
> -        if (os.path.isdir(pathToLogDir)):
> -            try:
> -                shutil.copytree(pathToLogDir, pathToDSTLogDir)
> -            except shutil.Error:
> -                message = "There was an error copying the directory: %s to %s." %(pathToLogDir, pathToDSTLogDir)
> -                logging.getLogger(MAIN_LOGGER_NAME).error(message)
> +        copyDirectory(pathToLogDir, pathToDSTDir)
>
>   def gatherDLMLockDumps(pathToDSTDir, listOfGFS2Filesystems):
>       """
> @@ -680,23 +800,13 @@ def gatherDLMLockDumps(pathToDSTDir, listOfGFS2Filesystems):
>       lockDumpType = "dlm"
>       pathToSrcDir = os.path.join(PATH_TO_DEBUG_DIR, lockDumpType)
>       pathToOutputDir = os.path.join(pathToDSTDir, lockDumpType)
> -    message = "Copying the files in the %s lockdump data directory %s for the selected GFS2 filesystem with dlm debug files." %(lockDumpType.upper(), pathToSrcDir)
> -    logging.getLogger(MAIN_LOGGER_NAME).status(message)
> +    message = "Copying the files in the %s lockdump data directory %s." %(lockDumpType.upper(), pathToSrcDir)
> +    logging.getLogger(MAIN_LOGGER_NAME).debug(message)
>       for filename in os.listdir(pathToSrcDir):
>           for name in listOfGFS2Filesystems:
>               if (filename.startswith(name)):
> -                pathToCurrentFilename = os.path.join(pathToSrcDir, filename)
> -                pathToDSTDir = os.path.join(pathToOutputDir, name)
> -                mkdirs(pathToDSTDir)
> -                pathToDSTFilename = os.path.join(pathToDSTDir, filename)
> -                try:
> -                    shutil.copy(pathToCurrentFilename, pathToDSTFilename)
> -                except shutil.Error:
> -                    message = "There was an error copying the file: %s to %s." %(pathToCurrentFilename, pathToDSTFilename)
> -                    logging.getLogger(MAIN_LOGGER_NAME).error(message)
> -                except OSError:
> -                    message = "There was an error copying the file: %s to %s." %(pathToCurrentFilename, pathToDSTFilename)
> -                    logging.getLogger(MAIN_LOGGER_NAME).error(message)
> +                copyFile(os.path.join(pathToSrcDir, filename),
> +                         os.path.join(os.path.join(pathToOutputDir, name), filename))
>
>   def gatherGFS2LockDumps(pathToDSTDir, listOfGFS2Filesystems):
>       """
> @@ -718,18 +828,9 @@ def gatherGFS2LockDumps(pathToDSTDir, listOfGFS2Filesystems):
>       for dirName in os.listdir(pathToSrcDir):
>           pathToCurrentDir = os.path.join(pathToSrcDir, dirName)
>           if ((os.path.isdir(pathToCurrentDir)) and (dirName in listOfGFS2Filesystems)):
> -            mkdirs(pathToOutputDir)
> -            pathToDSTDir = os.path.join(pathToOutputDir, dirName)
> -            try:
> -                message = "Copying the lockdump data for the %s filesystem: %s" %(lockDumpType.upper(), dirName)
> -                logging.getLogger(MAIN_LOGGER_NAME).status(message)
> -                shutil.copytree(pathToCurrentDir, pathToDSTDir)
> -            except shutil.Error:
> -                message = "There was an error copying the directory: %s to %s." %(pathToCurrentDir, pathToDSTDir)
> -                logging.getLogger(MAIN_LOGGER_NAME).error(message)
> -            except OSError:
> -                message = "There was an error copying the directory: %s to %s." %(pathToCurrentDir, pathToDSTDir)
> -                logging.getLogger(MAIN_LOGGER_NAME).error(message)
> +            message = "Copying the lockdump data for the %s filesystem: %s" %(lockDumpType.upper(), dirName)
> +            logging.getLogger(MAIN_LOGGER_NAME).debug(message)
> +            copyDirectory(pathToCurrentDir, pathToOutputDir)
>
>   # ##############################################################################
>   # Get user selected options
> @@ -752,52 +853,57 @@ def __getOptions(version) :
>       cmdParser.add_option("-d", "--debug",
>                            action="store_true",
>                            dest="enableDebugLogging",
> -                         help="Enables debug logging.",
> +                         help="enables debug logging",
>                            default=False)
>       cmdParser.add_option("-q", "--quiet",
>                            action="store_true",
>                            dest="disableLoggingToConsole",
> -                         help="Disables logging to console.",
> +                         help="disables logging to console",
> +                         default=False)
> +    cmdParser.add_option("-y", "--no_ask",
> +                         action="store_true",
> +                         dest="disableQuestions",
> +                         help="disables all questions and assumes yes",
>                            default=False)
>       cmdParser.add_option("-i", "--info",
>                            action="store_true",
>                            dest="enablePrintInfo",
> -                         help="Prints to console some basic information about the GFS2 filesystems mounted on the cluster node.",
> +                         help="prints information about the mounted GFS2 file systems",
>                            default=False)
> -    cmdParser.add_option("-M", "--mount_debug_fs",
> +    cmdParser.add_option("-t", "--archive",
>                            action="store_true",
> -                         dest="enableMountDebugFS",
> -                         help="Enables the mounting of the debug filesystem if it is not mounted. Default is disabled.",
> +                         dest="enableArchiveOutputDir",
> +                         help="the output directory will be archived (tar) and compressed (.bz2)",
>                            default=False)
>       cmdParser.add_option("-o", "--path_to_output_dir",
>                            action="store",
>                            dest="pathToOutputDir",
> -                         help="The path to the output directory where all the collect data will be stored. Default is /tmp/<date>-<hostname>-%s" %(os.path.basename(sys.argv[0])),
> +                         help="the directory where all the collected data will be stored",
>                            type="string",
> +                         metavar="<output directory>",
>                            default="")
>       cmdParser.add_option("-r", "--num_of_runs",
>                            action="store",
>                            dest="numberOfRuns",
> -                         help="The number of lockdumps runs to do. Default is 2.",
> +                         help="number of runs capturing the lockdump data",
>                            type="int",
> +                         metavar="<number of runs>",
>                            default=2)
>       cmdParser.add_option("-s", "--seconds_sleep",
>                            action="store",
>                            dest="secondsToSleep",
> -                         help="The number of seconds sleep between runs. Default is 120 seconds.",
> +                         help="number of seconds to sleep between runs of capturing the lockdump data",
>                            type="int",
> +                         metavar="<seconds to sleep>",
>                            default=120)
> -    cmdParser.add_option("-t", "--archive",
> -                         action="store_true",
> -                         dest="enableArchiveOutputDir",
> -                         help="Enables archiving and compressing of the output directory with tar and bzip2. Default is disabled.",
> -                         default=False)
>       cmdParser.add_option("-n", "--fs_name",
>                            action="extend",
>                            dest="listOfGFS2Names",
> -                         help="List of GFS2 filesystems that will have their lockdump data gathered.",
> +                         help="name of the GFS2 filesystem(s) that will have their lockdump data captured",
>                            type="string",
> -                         default=[])    # Get the options and return the result.
> +                         metavar="<name of GFS2 filesystem>",
> +                         default=[])
> +    # Get the options and return the result.
>       (cmdLineOpts, cmdLineArgs) = cmdParser.parse_args()
>       return (cmdLineOpts, cmdLineArgs)
>
> @@ -817,7 +923,7 @@ class OptionParserExtended(OptionParser):
>           self.__commandName = os.path.basename(sys.argv[0])
>           versionMessage = "%s %s\n" %(self.__commandName, version)
>
> -        commandDescription  ="%s will capture information about lockdata data for GFS2 and DLM required to analyze a GFS2 filesystem.\n"%(self.__commandName)
> +        commandDescription  ="%s will capture locking information from GFS2 file systems and DLM.\n"%(self.__commandName)
>
>           OptionParser.__init__(self, option_class=ExtendOption,
>                                 version=versionMessage,
> @@ -831,10 +937,17 @@ class OptionParserExtended(OptionParser):
>           examplesMessage = "\n"
>           examplesMessage = "\nPrints information about the available GFS2 filesystems that can have lockdump data captured."
>           examplesMessage += "\n$ %s -i\n" %(self.__commandName)
> -        examplesMessage += "\nThis command will mount the debug directory if it is not mounted. It will do 3 runs of\n"
> -        examplesMessage += "gathering the lockdump information in 10 second intervals for only the GFS2 filesystems\n"
> -        examplesMessage += "with the names myGFS2vol2,myGFS2vol1. Then it will archive and compress the data collected."
> -        examplesMessage += "\n$ %s -M -r 3 -s 10 -t -n myGFS2vol2,myGFS2vol1\n" %(self.__commandName)
> +
> +        examplesMessage += "\nIt will do 3 runs of gathering the lockdump information in 10 second intervals for only the"
> +        examplesMessage += "\nGFS2 filesystems with the names myGFS2vol2,myGFS2vol1. Then it will archive and compress"
> +        examplesMessage += "\nthe data collected. All of the lockdump data will be written to the directory: "
> +        examplesMessage += "\n/tmp/2012-11-12_095556-gfs2_lockcapture and all the questions will be answered with yes.\n"
> +        examplesMessage += "\n$ %s -r 3 -s 10 -t -n myGFS2vol2,myGFS2vol1 -o /tmp/2012-11-12_095556-gfs2_lockcapture -y\n" %(self.__commandName)
> +
> +        examplesMessage += "\nIt will do 2 runs of gathering the lockdump information in 25 second intervals for all the"
> +        examplesMessage += "\nmounted GFS2 filesystems. Then it will archive and compress the data collected. All of the"
> +        examplesMessage += "\nlockdump data will be written to the directory: /tmp/2012-11-12_095556-gfs2_lockcapture.\n"
> +        examplesMessage += "\n$ %s -r 2 -s 25 -t -o /tmp/2012-11-12_095556-gfs2_lockcapture\n" %(self.__commandName)
>           OptionParser.print_help(self)
>           print examplesMessage
>
> @@ -869,11 +982,13 @@ class ExtendOption (Option):
>           @type parser: OptionParser
>           """
>           if (action == "extend") :
> -            valueList=[]
> +            valueList = []
>               try:
>                   for v in value.split(","):
>                       # Need to add code for dealing with paths if there is option for paths.
> -                    valueList.append(v)
> +                    newValue = v.strip()
> +                    if (len(newValue) > 0):
> +                        valueList.append(newValue)
>               except:
>                   pass
>               else:
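
(A rough standalone equivalent of the "extend" action above, with a
hypothetical helper name; it simply splits a comma-separated option argument
into a list and drops any empty items.)

    def split_extend_value(value):
        # "-n myGFS2vol2,myGFS2vol1" -> ["myGFS2vol2", "myGFS2vol1"]
        return [item.strip() for item in value.split(",") if len(item.strip()) > 0]

    print split_extend_value("myGFS2vol2,myGFS2vol1")
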
> @@ -912,17 +1027,10 @@ if __name__ == "__main__":
>           streamHandler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
>           logger.addHandler(streamHandler)
>
> -        # Set the handler for writing to log file.
> -        pathToLogFile = "/tmp/%s.log" %(MAIN_LOGGER_NAME)
> -        if (((os.access(pathToLogFile, os.W_OK) and os.access("/tmp", os.R_OK))) or (not os.path.exists(pathToLogFile))):
> -            fileHandler = logging.FileHandler(pathToLogFile)
> -            fileHandler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s", "%Y-%m-%d %H:%M:%S"))
> -            logger.addHandler(fileHandler)
> -            message = "A log file will be created or appened to: %s" %(pathToLogFile)
> -            logging.getLogger(MAIN_LOGGER_NAME).info(message)
> -        else:
> -            message = "There was permission problem accessing the write attributes for the log file: %s." %(pathToLogFile)
> -            logging.getLogger(MAIN_LOGGER_NAME).error(message)
> +        # Please note there will not be a global log file created. If a log file
> +        # is needed then redirect the output. There will be a log file created
> +        # for each run in the corresponding directory.
> +
>           # #######################################################################
>           # Set the logging levels.
>           # #######################################################################
> @@ -949,6 +1057,26 @@ if __name__ == "__main__":
>               # script running.
>               writeToFile(PATH_TO_PID_FILENAME, str(os.getpid()), createFile=True)
>           # #######################################################################
> +        # Verify they want to continue because this script will trigger sysrq events.
> +        # #######################################################################
> +        if (not cmdLineOpts.disableQuestions):
> +            valid = {"yes":True, "y":True, "no":False, "n":False}
> +            question = "This script will trigger a sysrq -t event or collect the data for each pid directory located in /proc for each run. Are you sure you want to continue?"
> +            prompt = " [y/n] "
> +            while True:
> +                sys.stdout.write(question + prompt)
> +                choice = raw_input().lower()
> +                if (choice in valid):
> +                    if (valid.get(choice)):
> +                        # If yes, or y then exit loop and continue.
> +                        break
> +                    else:
> +                        message = "The script will exit since you chose not to continue."
> +                        logging.getLogger(MAIN_LOGGER_NAME).error(message)
> +                        exitScript(removePidFile=True, errorCode=1)
> +                else:
> +                    sys.stdout.write("Please respond with '(y)es' or '(n)o'.\n")
> +        # #######################################################################
>           # Get the clusternode name and verify that mounted GFS2 filesystems were
>           # found.
>           # #######################################################################
> @@ -976,8 +1104,6 @@ if __name__ == "__main__":
>           # proceeding unless it is already created from a previous run data needs
>           # to be analyzed. Probably could add more debugging on if file or dir.
>           # #######################################################################
> -        message = "The gathering of the lockdumps will be performed on the clusternode \"%s\" which is part of the cluster \"%s\"." %(clusternode.getClusterNodeName(), clusternode.getClusterName())
> -        logging.getLogger(MAIN_LOGGER_NAME).info(message)
>           pathToOutputDir = cmdLineOpts.pathToOutputDir
>           if (not len(pathToOutputDir) > 0):
>               pathToOutputDir = "%s" %(os.path.join("/tmp", "%s-%s-%s" %(time.strftime("%Y-%m-%d_%H%M%S"), clusternode.getClusterNodeName(), os.path.basename(sys.argv[0]))))
> @@ -1000,56 +1126,83 @@ if __name__ == "__main__":
>           # Check to see if the debug directory is mounted. If not then
>           # log an error.
>           # #######################################################################
> -        result = verifyDebugFilesystemMounted(cmdLineOpts.enableMountDebugFS)
> -        if (not result):
> -            message = "Please mount the debug filesystem before running this script. For example: $ mount none -t debugfs %s" %(PATH_TO_DEBUG_DIR)
> +        if(mountFilesystem("debugfs", "none", PATH_TO_DEBUG_DIR)):
> +            message = "The debug filesystem %s is mounted." %(PATH_TO_DEBUG_DIR)
> +            logging.getLogger(MAIN_LOGGER_NAME).info(message)
> +        else:
> +            message = "There was a problem mounting the debug filesystem: %s" %(PATH_TO_DEBUG_DIR)
> +            logging.getLogger(MAIN_LOGGER_NAME).error(message)
> +            message = "The debug filesystem is required to be mounted for this script to run."
>               logging.getLogger(MAIN_LOGGER_NAME).info(message)
>               exitScript(errorCode=1)
> -
>           # #######################################################################
>           # Gather data and the lockdumps.
>           # #######################################################################
> -        message = "The process of gathering all the required files will begin before capturing the lockdumps."
> -        logging.getLogger(MAIN_LOGGER_NAME).info(message)
> -        for i in range(0,cmdLineOpts.numberOfRuns):
> +        if (cmdLineOpts.numberOfRuns <= 0):
> +            message = "The number of runs must be greater than zero."
> +            logging.getLogger(MAIN_LOGGER_NAME).error(message)
> +            exitScript(errorCode=1)
> +        for i in range(1,(cmdLineOpts.numberOfRuns + 1)):
>               # The current log count that will start at 1 and not zero to make it
>               # make sense in logs.
> -            currentLogRunCount = (i + 1)
>               # Add clusternode name under each run dir to make combining multple
>               # clusternode gfs2_lockgather data together and all data in each run directory.
>               pathToOutputRunDir = os.path.join(pathToOutputDir, "run%d/%s" %(i, clusternode.getClusterNodeName()))
> +            # Create the directory that will be used to capture the data.
>               if (not mkdirs(pathToOutputRunDir)):
>                   exitScript(errorCode=1)
> -            # Gather various bits of data from the clusternode.
> -            message = "Gathering some general information about the clusternode %s for run %d/%d." %(clusternode.getClusterNodeName(), currentLogRunCount, cmdLineOpts.numberOfRuns)
> +            # Set the handler for writing to log file for this run.
> +            currentRunFileHandler = None
> +            pathToLogFile = os.path.join(pathToOutputRunDir, "%s.log" %(MAIN_LOGGER_NAME))
> +            if (((os.access(pathToLogFile, os.W_OK) and os.access("/tmp", os.R_OK))) or (not os.path.exists(pathToLogFile))):
> +                currentRunFileHandler = logging.FileHandler(pathToLogFile)
> +                currentRunFileHandler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s", "%Y-%m-%d %H:%M:%S"))
> +                logging.getLogger(MAIN_LOGGER_NAME).addHandler(currentRunFileHandler)
> +            message = "Pass (%d/%d): Gathering all the lockdump data." %(i, cmdLineOpts.numberOfRuns)
>               logging.getLogger(MAIN_LOGGER_NAME).status(message)
> +
> +            # Gather various bits of data from the clusternode.
> +            message = "Pass (%d/%d): Gathering general information about the host." %(i, cmdLineOpts.numberOfRuns)
> +            logging.getLogger(MAIN_LOGGER_NAME).debug(message)
>               gatherGeneralInformation(pathToOutputRunDir)
> -            # Trigger sysrq events to capture memory and thread information
> -            message = "Triggering the sysrq events for the clusternode %s for run %d/%d." %(clusternode.getClusterNodeName(), currentLogRunCount, cmdLineOpts.numberOfRuns)
> -            logging.getLogger(MAIN_LOGGER_NAME).status(message)
> -            triggerSysRQEvents()
> +            # Sleep for 2 seconds so that the timestamp logged above is in the
> +            # past before the sysrq data is written, guaranteeing the captured
> +            # sysrq data appears after it in the logs.
> +            time.sleep(2)
> +            # Gather the backtraces for all the pids, either by copying the
> +            # /proc/<pid> data or by triggering sysrq events to capture the
> +            # task back traces in the log.
> +            message = "Pass (%d/%d): Triggering the sysrq events for the host." %(i, cmdLineOpts.numberOfRuns)
> +            logging.getLogger(MAIN_LOGGER_NAME).debug(message)
> +            # Gather the data in the /proc/<pid> directories if the file
> +            # /proc/<pid>/stack exists. If the file exists then sysrq events
> +            # will not be triggered.
> +            pathToPidData = "/proc"
> +            if (isProcPidStackEnabled(pathToPidData)):
> +                gatherPidData(pathToPidData, os.path.join(pathToOutputRunDir, pathToPidData.strip("/")))
> +            else:
> +                triggerSysRQEvents()
>               # Gather the dlm locks.
>               lockDumpType = "dlm"
> -            message = "Gathering the %s lock dumps for clusternode %s for run %d/%d." %(lockDumpType.upper(), clusternode.getClusterNodeName(), currentLogRunCount, cmdLineOpts.numberOfRuns)
> -            logging.getLogger(MAIN_LOGGER_NAME).status(message)
> +            message = "Pass (%d/%d): Gathering the %s lock dumps for the host." %(i, cmdLineOpts.numberOfRuns, lockDumpType.upper())
> +            logging.getLogger(MAIN_LOGGER_NAME).debug(message)
>               gatherDLMLockDumps(pathToOutputRunDir, clusternode.getMountedGFS2FilesystemNames(includeClusterName=False))
>               # Gather the glock locks from gfs2.
>               lockDumpType = "gfs2"
> -            message = "Gathering the %s lock dumps for clusternode %s for run %d/%d." %(lockDumpType.upper(), clusternode.getClusterNodeName(), currentLogRunCount, cmdLineOpts.numberOfRuns)
> -            logging.getLogger(MAIN_LOGGER_NAME).status(message)
> +            message = "Pass (%d/%d): Gathering the %s lock dumps for the host." %(i, cmdLineOpts.numberOfRuns, lockDumpType.upper())
> +            logging.getLogger(MAIN_LOGGER_NAME).debug(message)
>               gatherGFS2LockDumps(pathToOutputRunDir, clusternode.getMountedGFS2FilesystemNames())
>               # Gather log files
> -            message = "Gathering the log files for the clusternode %s for run %d/%d." %(clusternode.getClusterNodeName(), currentLogRunCount, cmdLineOpts.numberOfRuns)
> -            logging.getLogger(MAIN_LOGGER_NAME).status(message)
> +            message = "Pass (%d/%d): Gathering the log files for the host." %(i, cmdLineOpts.numberOfRuns)
> +            logging.getLogger(MAIN_LOGGER_NAME).debug(message)
>               gatherLogs(os.path.join(pathToOutputRunDir, "logs"))
>               # Sleep between each run if secondsToSleep is greater than or equal
>               # to 0 and current run is not the last run.
> -            if ((cmdLineOpts.secondsToSleep >= 0) and (i < (cmdLineOpts.numberOfRuns - 1))):
> -                message = "The script will sleep for %d seconds between each run of capturing the lockdumps." %(cmdLineOpts.secondsToSleep)
> +            if ((cmdLineOpts.secondsToSleep >= 0) and (i < cmdLineOpts.numberOfRuns)):
> +                message = "The script will sleep for %d seconds between each run of capturing the lockdump data." %(cmdLineOpts.secondsToSleep)
>                   logging.getLogger(MAIN_LOGGER_NAME).info(message)
> -                message = "The script is sleeping before beginning the next run."
> -                logging.getLogger(MAIN_LOGGER_NAME).status(message)
>                   time.sleep(cmdLineOpts.secondsToSleep)
> +            # Remove the handler:
> +            logging.getLogger(MAIN_LOGGER_NAME).removeHandler(currentRunFileHandler)
> +
>           # #######################################################################
>           # Archive the directory that contains all the data and archive it after
>           # all the information has been gathered.
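
(Pieced together from the paths used above, a single run roughly produces a
layout like the following; the angle-bracket names are placeholders and the
exact file set depends on what is mounted and what the kernel exposes.)

    /tmp/<date>_<time>-<node>-gfs2_lockcapture/
        run1/<node>/
            gfs2_lockcapture.log                 (per-run log file)
            ps_hALo-tid.s.cmd                    (ps output)
            proc/mounts, proc/slabinfo
            proc/<pid>/{cmdline,stack,status}    (only when /proc/<pid>/stack exists)
            dlm/<fsname>/...                     (copied DLM debugfs files)
            gfs2/<cluster>:<fsname>/...          (copied GFS2 glock debugfs files)
            logs/messages, logs/cluster/
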
> diff --git a/gfs2/man/Makefile.am b/gfs2/man/Makefile.am
> index 83d6251..8655a76 100644
> --- a/gfs2/man/Makefile.am
> +++ b/gfs2/man/Makefile.am
> @@ -7,4 +7,5 @@ dist_man_MANS		= fsck.gfs2.8 \
>   			  gfs2_grow.8 \
>   			  gfs2_jadd.8 \
>   			  mkfs.gfs2.8 \
> -			  tunegfs2.8
> +			  tunegfs2.8 \
> +			  gfs2_lockcapture.8
> diff --git a/gfs2/man/gfs2_lockcapture.8 b/gfs2/man/gfs2_lockcapture.8
> new file mode 100644
> index 0000000..854cd71
> --- /dev/null
> +++ b/gfs2/man/gfs2_lockcapture.8
> @@ -0,0 +1,53 @@
> +.TH gfs2_lockcapture 8
> +
> +.SH NAME
> +gfs2_lockcapture \- capture locking information from GFS2 file systems and DLM.
> +
> +.SH SYNOPSIS
> +.B gfs2_lockcapture \fR[-dqyt]  [-o \fIoutput directory]\fR [-r \fInumber of runs]\fR [-s \fIseconds to sleep]\fR [-n \fIname of GFS2 filesystem]\fP
> +.PP
> +.B gfs2_lockcapture \fR[-dqyi]
> +
> +.SH DESCRIPTION
> +\fIgfs2_lockcapture\fR is used to capture all the GFS2 lockdump data and the
> +corresponding DLM data. The number of times the data is captured and the time
> +to sleep between each capture can be configured. By default all of the mounted
> +GFS2 filesystems will have their data collected unless specific GFS2
> +filesystems are named.
> +.PP
> +Please note that on each iteration of capturing the data, either a sysrq -t
> +event is triggered or the pid directories in /proc are collected.
> +
> +.SH OPTIONS
> +.TP
> +\fB-h,  --help\fP
> +Prints out a short usage message and exits.
> +.TP
> +\fB-d,  --debug\fP
> +enables debug logging.
> +.TP
> +\fB-q,  --quiet\fP
> +disables logging to console.
> +.TP
> +\fB-y,  --no_ask\fP
> +disables all questions and assumes yes.
> +.TP
> +\fB-i,  --info\fP
> +prints information about the mounted GFS2 file systems.
> +.TP
> +\fB-t,  --archive\fP
> +the output directory will be archived (tar) and compressed (.bz2).
> +.TP
> +\fB-o \fI<output directory>, \fB--path_to_output_dir\fR=\fI<output directory>\fP
> +the directory where all the collected data will be stored.
> +.TP
> +\fB-r \fI<number of runs>,  \fB--num_of_runs\fR=\fI<number of runs>\fP
> +number of runs capturing the lockdump data.
> +.TP
> +\fB-s \fI<seconds to sleep>,  \fB--seconds_sleep\fR=\fI<seconds to sleep>\fP
> +number of seconds to sleep between runs of capturing the lockdump data.
> +.TP
> +\fB-n \fI<name of GFS2 filesystem>,  \fB--fs_name\fR=\fI<name of GFS2 filesystem>\fP
> +name of the GFS2 filesystem(s) that will have their lockdump data captured.
> +.
> +.SH SEE ALSO
>


