From: Andrew Price <anprice@redhat.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] [PATCH] gfs2-lockcapture: Modified some of the data gathered
Date: Fri, 14 Dec 2012 13:37:47 +0000 [thread overview]
Message-ID: <50CB2BAB.4040707@redhat.com> (raw)
In-Reply-To: <1355411652-6150-1-git-send-email-sbradley@redhat.com>
Hi Shane,
On 13/12/12 15:14, sbradley at redhat.com wrote:
> From: sbradley <sbradley@redhat.com>
Looks good to me. Thanks for adding a manpage. Please make sure the
authorship info is corrected before pushing.
Andy
> Changed some variable names in the collected host data, added /proc/<pid>/
> files to the collected data, and added a man page.
>
> Signed-off-by: shane bradley <sbradley@redhat.com>
> ---
> gfs2/lockcapture/gfs2_lockcapture | 465 +++++++++++++++++++++++++-------------
> gfs2/man/Makefile.am | 3 +-
> gfs2/man/gfs2_lockcapture.8 | 53 +++++
> 3 files changed, 364 insertions(+), 157 deletions(-)
> create mode 100644 gfs2/man/gfs2_lockcapture.8
>
> diff --git a/gfs2/lockcapture/gfs2_lockcapture b/gfs2/lockcapture/gfs2_lockcapture
> index a930a2f..1a64188 100644
> --- a/gfs2/lockcapture/gfs2_lockcapture
> +++ b/gfs2/lockcapture/gfs2_lockcapture
> @@ -1,9 +1,7 @@
> #!/usr/bin/env python
> """
> -This script will gather GFS2 glocks and dlm lock dump information for a cluster
> -node. The script can get all the mounted GFS2 filesystem data or set of selected
> -GFS2 filesystems. The script will also gather some general information about the
> -system.
> +The script gfs2_lockcapture will capture locking information from GFS2 file
> +systems and DLM.
>
> @author : Shane Bradley
> @contact : sbradley at redhat.com
> @@ -35,7 +33,7 @@ import tarfile
> sure only 1 instance of this script is running at any time.
> @type PATH_TO_PID_FILENAME: String
> """
> -VERSION_NUMBER = "0.9-1"
> +VERSION_NUMBER = "0.9-2"
> MAIN_LOGGER_NAME = "%s" %(os.path.basename(sys.argv[0]))
> PATH_TO_DEBUG_DIR="/sys/kernel/debug"
> PATH_TO_PID_FILENAME = "/var/run/%s.pid" %(os.path.basename(sys.argv[0]))
> @@ -313,7 +311,7 @@ def archiveData(pathToSrcDir):
> @type pathToSrcDir: String
> """
> if (os.path.exists(pathToSrcDir)):
> - pathToTarFilename = "%s.tar.bz2" %(pathToSrcDir)
> + pathToTarFilename = "%s-%s.tar.bz2" %(pathToSrcDir, platform.node())
> if (os.path.exists(pathToTarFilename)):
> message = "A compressed archived file already exists and will be removed: %s" %(pathToTarFilename)
> logging.getLogger(MAIN_LOGGER_NAME).status(message)
> @@ -337,6 +335,127 @@ def archiveData(pathToSrcDir):
> return pathToTarFilename
> return ""
>
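A quick note on the naming change above: suffixing the archive with the node name keeps per-node archives distinct when data from several cluster nodes is combined. A standalone sketch of the new naming (illustrative helper, not in the patch):

```python
import platform

def archive_filename(path_to_src_dir):
    # Mirror the patch's naming change: suffix the archive with the node
    # name so archives gathered from several cluster nodes do not collide.
    return "%s-%s.tar.bz2" % (path_to_src_dir, platform.node())
```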
> +def getDataFromFile(pathToSrcFile) :
> + """
> + This function will return the data in an array, where each line in the
> + file is a separate item in the array. This should only be used on
> + relatively small files.
> +
> + None is returned if the file could not be read.
> +
> + @return: Returns an array of Strings, where each line in the file is an
> + item in the array.
> + @rtype: Array
> +
> + @param pathToSrcFile: The path to the file which will be read.
> + @type pathToSrcFile: String
> + """
> + if (len(pathToSrcFile) > 0) :
> + try:
> + fin = open(pathToSrcFile, "r")
> + data = fin.readlines()
> + fin.close()
> + return data
> + except (IOError, os.error):
> + message = "An error occurred reading the file: %s." %(pathToSrcFile)
> + logging.getLogger(MAIN_LOGGER_NAME).error(message)
> + return None
> +
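As a side note, the same read can be written with a context manager so the handle is closed on the error path as well; a minimal sketch (hypothetical rewrite, not part of the patch):

```python
def get_data_from_file(path_to_src_file):
    # Return the file's lines as a list of strings, or None when the path
    # is empty or the file cannot be read; equivalent to getDataFromFile.
    if not path_to_src_file:
        return None
    try:
        with open(path_to_src_file, "r") as fin:
            return fin.readlines()
    except (IOError, OSError):
        return None
```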
> +def copyFile(pathToSrcFile, pathToDstFile):
> + """
> + This function will copy a src file to dst file.
> +
> + @return: Returns True if the file was copied successfully.
> + @rtype: Boolean
> +
> + @param pathToSrcFile: The path to the source file that will be copied.
> + @type pathToSrcFile: String
> + @param pathToDstFile: The path to the destination of the file.
> + @type pathToDstFile: String
> + """
> + if(not os.path.exists(pathToSrcFile)):
> + message = "The file does not exist with the path: %s." %(pathToSrcFile)
> + logging.getLogger(MAIN_LOGGER_NAME).error(message)
> + return False
> + elif (not os.path.isfile(pathToSrcFile)):
> + message = "The path to the source file is not a regular file: %s." %(pathToSrcFile)
> + logging.getLogger(MAIN_LOGGER_NAME).error(message)
> + return False
> + elif (pathToSrcFile == pathToDstFile):
> + message = "The path to the source file and path to destination file cannot be the same: %s." %(pathToDstFile)
> + logging.getLogger(MAIN_LOGGER_NAME).error(message)
> + return False
> + else:
> + # Create the directory structure if it does not exist.
> + (head, tail) = os.path.split(pathToDstFile)
> + if (not mkdirs(head)) :
> + # The path to the directory was not created so file
> + # could not be copied.
> + return False
> + # Copy the file to the dst path.
> + try:
> + shutil.copy(pathToSrcFile, pathToDstFile)
> + except (shutil.Error, OSError, IOError):
> + message = "Cannot copy the file %s to %s." %(pathToSrcFile, pathToDstFile)
> + logging.getLogger(MAIN_LOGGER_NAME).error(message)
> + return False
> + return (os.path.exists(pathToDstFile))
> +
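The guard-then-copy-then-verify flow of copyFile can be exercised on its own; a condensed sketch (illustrative only, logging and directory creation omitted):

```python
import os
import shutil

def copy_file(path_to_src_file, path_to_dst_file):
    # Refuse missing sources, non-regular files, and src == dst, then copy
    # and report success only if the destination exists afterwards.
    if not os.path.isfile(path_to_src_file):
        return False
    if path_to_src_file == path_to_dst_file:
        return False
    try:
        shutil.copy(path_to_src_file, path_to_dst_file)
    except (shutil.Error, OSError, IOError):
        return False
    return os.path.exists(path_to_dst_file)
```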
> +def copyDirectory(pathToSrcDir, pathToDstDir):
> + """
> + This function will copy a src dir to dst dir.
> +
> + @return: Returns True if the dir was copied successfully.
> + @rtype: Boolean
> +
> + @param pathToSrcDir: The path to the source dir that will be copied.
> + @type pathToSrcDir: String
> + @param pathToDstDir: The path to the destination of the dir.
> + @type pathToDstDir: String
> + """
> + if(not os.path.exists(pathToSrcDir)):
> + message = "The directory does not exist with the path: %s." %(pathToSrcDir)
> + logging.getLogger(MAIN_LOGGER_NAME).error(message)
> + return False
> + elif (not os.path.isdir(pathToSrcDir)):
> + message = "The path to the source directory is not a directory: %s." %(pathToSrcDir)
> + logging.getLogger(MAIN_LOGGER_NAME).error(message)
> + return False
> + elif (pathToSrcDir == pathToDstDir):
> + message = "The path to the source directory and path to destination directory cannot be the same: %s." %(pathToDstDir)
> + logging.getLogger(MAIN_LOGGER_NAME).error(message)
> + return False
> + else:
> + if (not mkdirs(pathToDstDir)) :
> + # The path to the destination directory was not created,
> + # so the directory could not be copied.
> + return False
> + # Copy the directory to the dst path.
> + dst = os.path.join(pathToDstDir, os.path.basename(pathToSrcDir))
> + try:
> + shutil.copytree(pathToSrcDir, dst)
> + except (shutil.Error, OSError, IOError):
> + message = "Cannot copy the directory %s to %s." %(pathToSrcDir, dst)
> + logging.getLogger(MAIN_LOGGER_NAME).error(message)
> + return False
> + return (os.path.exists(dst))
> +
> def backupOutputDirectory(pathToOutputDir):
> """
> This function will return True if the pathToOutputDir does not exist or the
> @@ -464,8 +583,8 @@ def getClusterNode(listOfGFS2Names):
> if (len(listOfGFS2Names) > 0):
> for label in mapOfMountedFilesystemLabels.keys():
> foundMatch = False
> - for name in listOfGFS2Names:
> - if ((name == label) or ("%s:%s"%(clusterName, name) == label)):
> + for gfs2FSName in listOfGFS2Names:
> + if ((gfs2FSName == label) or ("%s:%s"%(clusterName, gfs2FSName) == label)):
> foundMatch = True
> break
> if ((not foundMatch) and (mapOfMountedFilesystemLabels.has_key(label))):
> @@ -518,33 +637,6 @@ def getLabelMapForMountedFilesystems(clusterName, listOfMountedFilesystems):
> mapOfMountedFilesystemLabels[fsLabel] = mountedFilesystem
> return mapOfMountedFilesystemLabels
>
> -def verifyDebugFilesystemMounted(enableMounting=True):
> - """
> - This function verifies that the debug filesystem is mounted. If the debug
> - filesystem is mounted then True is returned, otherwise False is returned.
> -
> - @return: If the debug filesystem is mounted then True is returned, otherwise
> - False is returned.
> - @rtype: Boolean
> -
> - @param enableMounting: If True then the debug filesystem will be mounted if
> - it is currently not mounted.
> - @type enableMounting: Boolean
> - """
> - if (os.path.ismount(PATH_TO_DEBUG_DIR)):
> - message = "The debug filesystem %s is mounted." %(PATH_TO_DEBUG_DIR)
> - logging.getLogger(MAIN_LOGGER_NAME).info(message)
> - return True
> - else:
> - message = "The debug filesystem %s is not mounted." %(PATH_TO_DEBUG_DIR)
> - logging.getLogger(MAIN_LOGGER_NAME).warning(message)
> - if (cmdLineOpts.enableMountDebugFS):
> - if(mountFilesystem("debugfs", "none", PATH_TO_DEBUG_DIR)):
> - message = "The debug filesystem was mounted: %s." %(PATH_TO_DEBUG_DIR)
> - logging.getLogger(MAIN_LOGGER_NAME).info(message)
> - return True
> - return False
> -
> def mountFilesystem(filesystemType, pathToDevice, pathToMountPoint):
> """
> This function will attempt to mount a filesystem. If the filesystem is
> @@ -583,29 +675,24 @@ def gatherGeneralInformation(pathToDSTDir):
> @type pathToDSTDir: String
> """
> # Gather some general information and write to system.txt.
> - systemString = "HOSTNAME: %s\nDATE: %s\n" %(platform.node(), time.strftime("%Y-%m-%d_%H:%M:%S"))
> - stdout = runCommandOutput("uname", ["-a"])
> + systemString = "HOSTNAME=%s\nTIMESTAMP=%s\n" %(platform.node(), time.strftime("%Y-%m-%d %H:%M:%S"))
> + stdout = runCommandOutput("uname", ["-a"])
> if (not stdout == None):
> - systemString += "UNAME-A: %s\n" %(stdout)
> - stdout = runCommandOutput("uptime", [])
> + systemString += "UNAMEA=%s\n" %(stdout.strip())
> + stdout = runCommandOutput("uptime", [])
> if (not stdout == None):
> - systemString += "UPTIME: %s\n" %(stdout)
> - writeToFile(os.path.join(pathToDSTDir, "system.txt"), systemString, createFile=True)
> + systemString += "UPTIME=%s" %(stdout.strip())
> + writeToFile(os.path.join(pathToDSTDir, "hostinformation.txt"), systemString, createFile=True)
>
> - # Get "mount -l" filesystem data.
> - command = "cat"
> - pathToCommandOutput = os.path.join(pathToDSTDir, "cat-proc_mounts.txt")
> - try:
> - fout = open(pathToCommandOutput, "w")
> - runCommand(command, ["/proc/mounts"], standardOut=fout)
> - fout.close()
> - except IOError:
> - message = "There was an error the command output for %s to the file %s." %(command, pathToCommandOutput)
> - logging.getLogger(MAIN_LOGGER_NAME).error(message)
> + # Copy misc files
> + pathToSrcFile = "/proc/mounts"
> + copyFile(pathToSrcFile, os.path.join(pathToDSTDir, pathToSrcFile.strip("/")))
> + pathToSrcFile = "/proc/slabinfo"
> + copyFile(pathToSrcFile, os.path.join(pathToDSTDir, pathToSrcFile.strip("/")))
>
> # Get "ps -eo user,pid,%cpu,%mem,vsz,rss,tty,stat,start,time,comm,wchan" data.
> command = "ps"
> - pathToCommandOutput = os.path.join(pathToDSTDir, "ps.txt")
> + pathToCommandOutput = os.path.join(pathToDSTDir, "ps_hALo-tid.s.cmd")
> try:
> fout = open(pathToCommandOutput, "w")
> #runCommand(command, ["-eo", "user,pid,%cpu,%mem,vsz,rss,tty,stat,start,time,comm,wchan"], standardOut=fout)
> @@ -615,6 +702,48 @@ def gatherGeneralInformation(pathToDSTDir):
> message = "There was an error writing the command output for %s to the file %s." %(command, pathToCommandOutput)
> logging.getLogger(MAIN_LOGGER_NAME).error(message)
>
> +
> +def isProcPidStackEnabled(pathToPidData):
> + """
> + Returns true if the init process has the file "stack" in its pid data
> + directory which contains the task functions for that process.
> +
> + @return: Returns true if the init process has the file "stack" in its pid
> + data directory which contains the task functions for that process.
> + @rtype: Boolean
> +
> + @param pathToPidData: The path to the directory where all the pid data
> + directories are located.
> + @type pathToPidData: String
> + """
> + return os.path.exists(os.path.join(pathToPidData, "1/stack"))
> +
> +def gatherPidData(pathToPidData, pathToDSTDir):
> + """
> + This function will gather selected files from each pid directory located under pathToPidData.
> +
> + @return: Returns a list of paths to the directory that contains the
> + information about the pid.
> + @rtype: Array
> +
> + @param pathToPidData: The path to the directory where all the pid data
> + directories are located.
> + @type pathToPidData: String
> + @param pathToDSTDir: The path to the directory where the pid data will be
> + copied to.
> + @type pathToDSTDir: String
> + """
> + # Status has: command name, pid, ppid, state, possibly registers
> + listOfFilesToCopy = ["cmdline", "stack", "status"]
> + listOfPathToPidsData = []
> + if (os.path.exists(pathToPidData)):
> + for srcFilename in os.listdir(pathToPidData):
> + pathToPidDirDST = os.path.join(pathToDSTDir, srcFilename)
> + if (srcFilename.isdigit()):
> + pathToSrcDir = os.path.join(pathToPidData, srcFilename)
> + for filenameToCopy in listOfFilesToCopy:
> + copyFile(os.path.join(pathToSrcDir, filenameToCopy), os.path.join(pathToPidDirDST, filenameToCopy))
> + if (os.path.exists(pathToPidDirDST)):
> + listOfPathToPidsData.append(pathToPidDirDST)
> + return listOfPathToPidsData
> +
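The pid walk above relies on the convention that only numeric entries under /proc are process directories; that selection can be checked in isolation (hypothetical helper, not in the patch):

```python
def select_pid_entries(entries):
    # Keep only names made entirely of digits, matching the
    # srcFilename.isdigit() test in gatherPidData; everything else
    # under /proc is a kernel interface, not a process directory.
    return [name for name in entries if name.isdigit()]
```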
> def triggerSysRQEvents():
> """
> This command will trigger sysrq events which will write the output to
> @@ -626,14 +755,15 @@ def triggerSysRQEvents():
> pathToSysrqTriggerFile = "/proc/sysrq-trigger"
> # m - dump information about memory allocation
> # t - dump thread state information
> - triggers = ["m", "t"]
> + # triggers = ["m", "t"]
> + triggers = ["t"]
> for trigger in triggers:
> try:
> fout = open(pathToSysrqTriggerFile, "w")
> runCommand(command, [trigger], standardOut=fout)
> fout.close()
> except IOError:
> - message = "There was an error the command output for %s to the file %s." %(command, pathToSysrqTriggerFile)
> + message = "There was an error writing the command output for %s to the file %s." %(command, pathToSysrqTriggerFile)
> logging.getLogger(MAIN_LOGGER_NAME).error(message)
>
> def gatherLogs(pathToDSTDir):
> @@ -645,24 +775,14 @@ def gatherLogs(pathToDSTDir):
> copied to.
> @type pathToDSTDir: String
> """
> - if (mkdirs(pathToDSTDir)):
> - # Copy messages logs that contain the sysrq data.
> - pathToLogFile = "/var/log/messages"
> - pathToDSTLogFile = os.path.join(pathToDSTDir, os.path.basename(pathToLogFile))
> - try:
> - shutil.copyfile(pathToLogFile, pathToDSTLogFile)
> - except shutil.Error:
> - message = "There was an error copying the file: %s to %s." %(pathToLogFile, pathToDSTLogFile)
> - logging.getLogger(MAIN_LOGGER_NAME).error(message)
> + pathToLogFile = "/var/log/messages"
> + pathToDSTLogFile = os.path.join(pathToDSTDir, os.path.basename(pathToLogFile))
> + copyFile(pathToLogFile, pathToDSTLogFile)
>
> - pathToLogDir = "/var/log/cluster"
> + pathToLogDir = "/var/log/cluster"
> + if (os.path.exists(pathToLogDir)):
> pathToDSTLogDir = os.path.join(pathToDSTDir, os.path.basename(pathToLogDir))
> - if (os.path.isdir(pathToLogDir)):
> - try:
> - shutil.copytree(pathToLogDir, pathToDSTLogDir)
> - except shutil.Error:
> - message = "There was an error copying the directory: %s to %s." %(pathToLogDir, pathToDSTLogDir)
> - logging.getLogger(MAIN_LOGGER_NAME).error(message)
> + copyDirectory(pathToLogDir, pathToDSTDir)
>
> def gatherDLMLockDumps(pathToDSTDir, listOfGFS2Filesystems):
> """
> @@ -680,23 +800,13 @@ def gatherDLMLockDumps(pathToDSTDir, listOfGFS2Filesystems):
> lockDumpType = "dlm"
> pathToSrcDir = os.path.join(PATH_TO_DEBUG_DIR, lockDumpType)
> pathToOutputDir = os.path.join(pathToDSTDir, lockDumpType)
> - message = "Copying the files in the %s lockdump data directory %s for the selected GFS2 filesystem with dlm debug files." %(lockDumpType.upper(), pathToSrcDir)
> - logging.getLogger(MAIN_LOGGER_NAME).status(message)
> + message = "Copying the files in the %s lockdump data directory %s." %(lockDumpType.upper(), pathToSrcDir)
> + logging.getLogger(MAIN_LOGGER_NAME).debug(message)
> for filename in os.listdir(pathToSrcDir):
> for name in listOfGFS2Filesystems:
> if (filename.startswith(name)):
> - pathToCurrentFilename = os.path.join(pathToSrcDir, filename)
> - pathToDSTDir = os.path.join(pathToOutputDir, name)
> - mkdirs(pathToDSTDir)
> - pathToDSTFilename = os.path.join(pathToDSTDir, filename)
> - try:
> - shutil.copy(pathToCurrentFilename, pathToDSTFilename)
> - except shutil.Error:
> - message = "There was an error copying the file: %s to %s." %(pathToCurrentFilename, pathToDSTFilename)
> - logging.getLogger(MAIN_LOGGER_NAME).error(message)
> - except OSError:
> - message = "There was an error copying the file: %s to %s." %(pathToCurrentFilename, pathToDSTFilename)
> - logging.getLogger(MAIN_LOGGER_NAME).error(message)
> + copyFile(os.path.join(pathToSrcDir, filename),
> + os.path.join(os.path.join(pathToOutputDir, name), filename))
>
> def gatherGFS2LockDumps(pathToDSTDir, listOfGFS2Filesystems):
> """
> @@ -718,18 +828,9 @@ def gatherGFS2LockDumps(pathToDSTDir, listOfGFS2Filesystems):
> for dirName in os.listdir(pathToSrcDir):
> pathToCurrentDir = os.path.join(pathToSrcDir, dirName)
> if ((os.path.isdir(pathToCurrentDir)) and (dirName in listOfGFS2Filesystems)):
> - mkdirs(pathToOutputDir)
> - pathToDSTDir = os.path.join(pathToOutputDir, dirName)
> - try:
> - message = "Copying the lockdump data for the %s filesystem: %s" %(lockDumpType.upper(), dirName)
> - logging.getLogger(MAIN_LOGGER_NAME).status(message)
> - shutil.copytree(pathToCurrentDir, pathToDSTDir)
> - except shutil.Error:
> - message = "There was an error copying the directory: %s to %s." %(pathToCurrentDir, pathToDSTDir)
> - logging.getLogger(MAIN_LOGGER_NAME).error(message)
> - except OSError:
> - message = "There was an error copying the directory: %s to %s." %(pathToCurrentDir, pathToDSTDir)
> - logging.getLogger(MAIN_LOGGER_NAME).error(message)
> + message = "Copying the lockdump data for the %s filesystem: %s" %(lockDumpType.upper(), dirName)
> + logging.getLogger(MAIN_LOGGER_NAME).debug(message)
> + copyDirectory(pathToCurrentDir, pathToOutputDir)
>
> # ##############################################################################
> # Get user selected options
> @@ -752,52 +853,57 @@ def __getOptions(version) :
> cmdParser.add_option("-d", "--debug",
> action="store_true",
> dest="enableDebugLogging",
> - help="Enables debug logging.",
> + help="enables debug logging",
> default=False)
> cmdParser.add_option("-q", "--quiet",
> action="store_true",
> dest="disableLoggingToConsole",
> - help="Disables logging to console.",
> + help="disables logging to console",
> + default=False)
> + cmdParser.add_option("-y", "--no_ask",
> + action="store_true",
> + dest="disableQuestions",
> + help="disables all questions and assumes yes",
> default=False)
> cmdParser.add_option("-i", "--info",
> action="store_true",
> dest="enablePrintInfo",
> - help="Prints to console some basic information about the GFS2 filesystems mounted on the cluster node.",
> + help="prints information about the mounted GFS2 file systems",
> default=False)
> - cmdParser.add_option("-M", "--mount_debug_fs",
> + cmdParser.add_option("-t", "--archive",
> action="store_true",
> - dest="enableMountDebugFS",
> - help="Enables the mounting of the debug filesystem if it is not mounted. Default is disabled.",
> + dest="enableArchiveOutputDir",
> + help="the output directory will be archived (tar) and compressed (.bz2)",
> default=False)
> cmdParser.add_option("-o", "--path_to_output_dir",
> action="store",
> dest="pathToOutputDir",
> - help="The path to the output directory where all the collect data will be stored. Default is /tmp/<date>-<hostname>-%s" %(os.path.basename(sys.argv[0])),
> + help="the directory where all the collected data will be stored",
> type="string",
> + metavar="<output directory>",
> default="")
> cmdParser.add_option("-r", "--num_of_runs",
> action="store",
> dest="numberOfRuns",
> - help="The number of lockdumps runs to do. Default is 2.",
> + help="number of runs capturing the lockdump data",
> type="int",
> + metavar="<number of runs>",
> default=2)
> cmdParser.add_option("-s", "--seconds_sleep",
> action="store",
> dest="secondsToSleep",
> - help="The number of seconds sleep between runs. Default is 120 seconds.",
> + help="number of seconds to sleep between runs of capturing the lockdump data",
> type="int",
> + metavar="<seconds to sleep>",
> default=120)
> - cmdParser.add_option("-t", "--archive",
> - action="store_true",
> - dest="enableArchiveOutputDir",
> - help="Enables archiving and compressing of the output directory with tar and bzip2. Default is disabled.",
> - default=False)
> cmdParser.add_option("-n", "--fs_name",
> action="extend",
> dest="listOfGFS2Names",
> - help="List of GFS2 filesystems that will have their lockdump data gathered.",
> + help="name of the GFS2 filesystem(s) that will have their lockdump data captured",
> type="string",
> - default=[]) # Get the options and return the result.
> + metavar="<name of GFS2 filesystem>",
> + default=[])
> + # Get the options and return the result.
> (cmdLineOpts, cmdLineArgs) = cmdParser.parse_args()
> return (cmdLineOpts, cmdLineArgs)
>
> @@ -817,7 +923,7 @@ class OptionParserExtended(OptionParser):
> self.__commandName = os.path.basename(sys.argv[0])
> versionMessage = "%s %s\n" %(self.__commandName, version)
>
> - commandDescription ="%s will capture information about lockdata data for GFS2 and DLM required to analyze a GFS2 filesystem.\n"%(self.__commandName)
> + commandDescription ="%s will capture locking information from GFS2 file systems and DLM.\n"%(self.__commandName)
>
> OptionParser.__init__(self, option_class=ExtendOption,
> version=versionMessage,
> @@ -831,10 +937,17 @@ class OptionParserExtended(OptionParser):
> examplesMessage = "\n"
> examplesMessage = "\nPrints information about the available GFS2 filesystems that can have lockdump data captured."
> examplesMessage += "\n$ %s -i\n" %(self.__commandName)
> - examplesMessage += "\nThis command will mount the debug directory if it is not mounted. It will do 3 runs of\n"
> - examplesMessage += "gathering the lockdump information in 10 second intervals for only the GFS2 filesystems\n"
> - examplesMessage += "with the names myGFS2vol2,myGFS2vol1. Then it will archive and compress the data collected."
> - examplesMessage += "\n$ %s -M -r 3 -s 10 -t -n myGFS2vol2,myGFS2vol1\n" %(self.__commandName)
> +
> + examplesMessage += "\nIt will do 3 runs of gathering the lockdump information in 10 second intervals for only the"
> + examplesMessage += "\nGFS2 filesystems with the names myGFS2vol2,myGFS2vol1. Then it will archive and compress"
> + examplesMessage += "\nthe data collected. All of the lockdump data will be written to the directory: "
> + examplesMessage += "\n/tmp/2012-11-12_095556-gfs2_lockcapture and all the questions will be answered with yes.\n"
> + examplesMessage += "\n$ %s -r 3 -s 10 -t -n myGFS2vol2,myGFS2vol1 -o /tmp/2012-11-12_095556-gfs2_lockcapture -y\n" %(self.__commandName)
> +
> + examplesMessage += "\nIt will do 2 runs of gathering the lockdump information in 25 second intervals for all the"
> + examplesMessage += "\nmounted GFS2 filesystems. Then it will archive and compress the data collected. All of the"
> + examplesMessage += "\nlockdump data will be written to the directory: /tmp/2012-11-12_095556-gfs2_lockcapture.\n"
> + examplesMessage += "\n$ %s -r 2 -s 25 -t -o /tmp/2012-11-12_095556-gfs2_lockcapture\n" %(self.__commandName)
> OptionParser.print_help(self)
> print examplesMessage
>
> @@ -869,11 +982,13 @@ class ExtendOption (Option):
> @type parser: OptionParser
> """
> if (action == "extend") :
> - valueList=[]
> + valueList = []
> try:
> for v in value.split(","):
> # Need to add code for dealing with paths if there is option for paths.
> - valueList.append(v)
> + newValue = v.strip()
> + if (len(newValue) > 0):
> + valueList.append(newValue)
> except:
> pass
> else:
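The "extend" action is meant to turn a comma-separated -n argument into a clean list of filesystem names; that parsing, in isolation (illustrative helper name):

```python
def split_fs_names(value):
    # Split on commas, strip surrounding whitespace from each name, and
    # drop empty items, so "a, b,," yields ["a", "b"].
    names = []
    for v in value.split(","):
        v = v.strip()
        if v:
            names.append(v)
    return names
```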
> @@ -912,17 +1027,10 @@ if __name__ == "__main__":
> streamHandler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
> logger.addHandler(streamHandler)
>
> - # Set the handler for writing to log file.
> - pathToLogFile = "/tmp/%s.log" %(MAIN_LOGGER_NAME)
> - if (((os.access(pathToLogFile, os.W_OK) and os.access("/tmp", os.R_OK))) or (not os.path.exists(pathToLogFile))):
> - fileHandler = logging.FileHandler(pathToLogFile)
> - fileHandler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s", "%Y-%m-%d %H:%M:%S"))
> - logger.addHandler(fileHandler)
> - message = "A log file will be created or appened to: %s" %(pathToLogFile)
> - logging.getLogger(MAIN_LOGGER_NAME).info(message)
> - else:
> - message = "There was permission problem accessing the write attributes for the log file: %s." %(pathToLogFile)
> - logging.getLogger(MAIN_LOGGER_NAME).error(message)
> + # Please note there will not be a global log file created. If a log file
> + # is needed then redirect the output. There will be a log file created
> + # for each run in the corresponding directory.
> +
> # #######################################################################
> # Set the logging levels.
> # #######################################################################
> @@ -949,6 +1057,26 @@ if __name__ == "__main__":
> # script running.
> writeToFile(PATH_TO_PID_FILENAME, str(os.getpid()), createFile=True)
> # #######################################################################
> + # Verify they want to continue because this script will trigger sysrq events.
> + # #######################################################################
> + if (not cmdLineOpts.disableQuestions):
> + valid = {"yes":True, "y":True, "no":False, "n":False}
> + question = "This script will either trigger a sysrq -t event or collect the data from each pid directory located in /proc on each run. Are you sure you want to continue?"
> + prompt = " [y/n] "
> + while True:
> + sys.stdout.write(question + prompt)
> + choice = raw_input().lower()
> + if (choice in valid):
> + if (valid.get(choice)):
> + # If yes, or y then exit loop and continue.
> + break
> + else:
> + message = "The script will exit since you chose not to continue."
> + logging.getLogger(MAIN_LOGGER_NAME).error(message)
> + exitScript(removePidFile=True, errorCode=1)
> + else:
> + sys.stdout.write("Please respond with '(y)es' or '(n)o'.\n")
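The confirmation loop above can be factored so it is testable without a terminal; a sketch with the input function injected (hypothetical refactoring, the script calls raw_input inline):

```python
def confirm(question, read_input):
    # Re-prompt until the answer is a recognised yes/no form; read_input
    # stands in for raw_input so the loop can run without a terminal.
    valid = {"yes": True, "y": True, "no": False, "n": False}
    while True:
        choice = read_input("%s [y/n] " % question).strip().lower()
        if choice in valid:
            return valid[choice]
```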
> + # #######################################################################
> # Get the clusternode name and verify that mounted GFS2 filesystems were
> # found.
> # #######################################################################
> @@ -976,8 +1104,6 @@ if __name__ == "__main__":
> # proceeding unless it is already created from a previous run data needs
> # to be analyzed. Probably could add more debugging on if file or dir.
> # #######################################################################
> - message = "The gathering of the lockdumps will be performed on the clusternode \"%s\" which is part of the cluster \"%s\"." %(clusternode.getClusterNodeName(), clusternode.getClusterName())
> - logging.getLogger(MAIN_LOGGER_NAME).info(message)
> pathToOutputDir = cmdLineOpts.pathToOutputDir
> if (not len(pathToOutputDir) > 0):
> pathToOutputDir = "%s" %(os.path.join("/tmp", "%s-%s-%s" %(time.strftime("%Y-%m-%d_%H%M%S"), clusternode.getClusterNodeName(), os.path.basename(sys.argv[0]))))
> @@ -1000,56 +1126,83 @@ if __name__ == "__main__":
> # Check to see if the debug directory is mounted. If not then
> # log an error.
> # #######################################################################
> - result = verifyDebugFilesystemMounted(cmdLineOpts.enableMountDebugFS)
> - if (not result):
> - message = "Please mount the debug filesystem before running this script. For example: $ mount none -t debugfs %s" %(PATH_TO_DEBUG_DIR)
> + if(mountFilesystem("debugfs", "none", PATH_TO_DEBUG_DIR)):
> + message = "The debug filesystem %s is mounted." %(PATH_TO_DEBUG_DIR)
> + logging.getLogger(MAIN_LOGGER_NAME).info(message)
> + else:
> + message = "There was a problem mounting the debug filesystem: %s" %(PATH_TO_DEBUG_DIR)
> + logging.getLogger(MAIN_LOGGER_NAME).error(message)
> + message = "The debug filesystem is required to be mounted for this script to run."
> logging.getLogger(MAIN_LOGGER_NAME).info(message)
> exitScript(errorCode=1)
> -
> # #######################################################################
> # Gather data and the lockdumps.
> # #######################################################################
> - message = "The process of gathering all the required files will begin before capturing the lockdumps."
> - logging.getLogger(MAIN_LOGGER_NAME).info(message)
> - for i in range(0,cmdLineOpts.numberOfRuns):
> + if (cmdLineOpts.numberOfRuns <= 0):
> + message = "The number of runs should be greater than zero."
> + logging.getLogger(MAIN_LOGGER_NAME).error(message)
> + exitScript(errorCode=1)
> + for i in range(1,(cmdLineOpts.numberOfRuns + 1)):
> # The current log count that will start at 1 and not zero to make it
> # make sense in logs.
> - currentLogRunCount = (i + 1)
> # Add the clusternode name under each run directory to make it easier to
> # combine the data from multiple clusternodes.
> pathToOutputRunDir = os.path.join(pathToOutputDir, "run%d/%s" %(i, clusternode.getClusterNodeName()))
> + # Create the the directory that will be used to capture the data.
> if (not mkdirs(pathToOutputRunDir)):
> exitScript(errorCode=1)
> - # Gather various bits of data from the clusternode.
> - message = "Gathering some general information about the clusternode %s for run %d/%d." %(clusternode.getClusterNodeName(), currentLogRunCount, cmdLineOpts.numberOfRuns)
> + # Set the handler for writing to log file for this run.
> + currentRunFileHandler = None
> + pathToLogFile = os.path.join(pathToOutputRunDir, "%s.log" %(MAIN_LOGGER_NAME))
> + if (((os.access(pathToLogFile, os.W_OK) and os.access("/tmp", os.R_OK))) or (not os.path.exists(pathToLogFile))):
> + currentRunFileHandler = logging.FileHandler(pathToLogFile)
> + currentRunFileHandler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s", "%Y-%m-%d %H:%M:%S"))
> + logging.getLogger(MAIN_LOGGER_NAME).addHandler(currentRunFileHandler)
> + message = "Pass (%d/%d): Gathering all the lockdump data." %(i, cmdLineOpts.numberOfRuns)
> logging.getLogger(MAIN_LOGGER_NAME).status(message)
> +
> + # Gather various bits of data from the clusternode.
> + message = "Pass (%d/%d): Gathering general information about the host." %(i, cmdLineOpts.numberOfRuns)
> + logging.getLogger(MAIN_LOGGER_NAME).debug(message)
> gatherGeneralInformation(pathToOutputRunDir)
> - # Trigger sysrq events to capture memory and thread information
> - message = "Triggering the sysrq events for the clusternode %s for run %d/%d." %(clusternode.getClusterNodeName(), currentLogRunCount, cmdLineOpts.numberOfRuns)
> - logging.getLogger(MAIN_LOGGER_NAME).status(message)
> - triggerSysRQEvents()
> + # Sleep for 2 seconds so that the TIMESTAMP is in the past in the logs,
> + # which guarantees the sysrq data is captured after it.
> + time.sleep(2)
> + # Gather the backtraces for all the pids, either by copying the
> + # /proc/<pid> data or by triggering sysrq events to capture the
> + # task back traces from the log.
> + message = "Pass (%d/%d): Triggering the sysrq events for the host." %(i, cmdLineOpts.numberOfRuns)
> + logging.getLogger(MAIN_LOGGER_NAME).debug(message)
> + # Gather the data in the /proc/<pid> directory if the file
> + # /proc/<pid>/stack exists. If the file exists we will not trigger
> + # sysrq events.
> + pathToPidData = "/proc"
> + if (isProcPidStackEnabled(pathToPidData)):
> + gatherPidData(pathToPidData, os.path.join(pathToOutputRunDir, pathToPidData.strip("/")))
> + else:
> + triggerSysRQEvents()
> # Gather the dlm locks.
> lockDumpType = "dlm"
> - message = "Gathering the %s lock dumps for clusternode %s for run %d/%d." %(lockDumpType.upper(), clusternode.getClusterNodeName(), currentLogRunCount, cmdLineOpts.numberOfRuns)
> - logging.getLogger(MAIN_LOGGER_NAME).status(message)
> + message = "Pass (%d/%d): Gathering the %s lock dumps for the host." %(i, cmdLineOpts.numberOfRuns, lockDumpType.upper())
> + logging.getLogger(MAIN_LOGGER_NAME).debug(message)
> gatherDLMLockDumps(pathToOutputRunDir, clusternode.getMountedGFS2FilesystemNames(includeClusterName=False))
> # Gather the glock locks from gfs2.
> lockDumpType = "gfs2"
> - message = "Gathering the %s lock dumps for clusternode %s for run %d/%d." %(lockDumpType.upper(), clusternode.getClusterNodeName(), currentLogRunCount, cmdLineOpts.numberOfRuns)
> - logging.getLogger(MAIN_LOGGER_NAME).status(message)
> + message = "Pass (%d/%d): Gathering the %s lock dumps for the host." %(i, cmdLineOpts.numberOfRuns, lockDumpType.upper())
> + logging.getLogger(MAIN_LOGGER_NAME).debug(message)
> gatherGFS2LockDumps(pathToOutputRunDir, clusternode.getMountedGFS2FilesystemNames())
> # Gather log files
> - message = "Gathering the log files for the clusternode %s for run %d/%d." %(clusternode.getClusterNodeName(), currentLogRunCount, cmdLineOpts.numberOfRuns)
> - logging.getLogger(MAIN_LOGGER_NAME).status(message)
> + message = "Pass (%d/%d): Gathering the log files for the host." %(i, cmdLineOpts.numberOfRuns)
> + logging.getLogger(MAIN_LOGGER_NAME).debug(message)
> gatherLogs(os.path.join(pathToOutputRunDir, "logs"))
> # Sleep between each run if secondsToSleep is greater than or equal
> # to 0 and current run is not the last run.
> - if ((cmdLineOpts.secondsToSleep >= 0) and (i < (cmdLineOpts.numberOfRuns - 1))):
> - message = "The script will sleep for %d seconds between each run of capturing the lockdumps." %(cmdLineOpts.secondsToSleep)
> + if ((cmdLineOpts.secondsToSleep >= 0) and (i < cmdLineOpts.numberOfRuns)):
> + message = "The script will sleep for %d seconds between each run of capturing the lockdump data." %(cmdLineOpts.secondsToSleep)
> logging.getLogger(MAIN_LOGGER_NAME).info(message)
> - message = "The script is sleeping before beginning the next run."
> - logging.getLogger(MAIN_LOGGER_NAME).status(message)
> time.sleep(cmdLineOpts.secondsToSleep)
> + # Remove the handler:
> + logging.getLogger(MAIN_LOGGER_NAME).removeHandler(currentRunFileHandler)
> +
> # #######################################################################
> # Archive the directory that contains all the data and archive it after
> # all the information has been gathered.
> diff --git a/gfs2/man/Makefile.am b/gfs2/man/Makefile.am
> index 83d6251..8655a76 100644
> --- a/gfs2/man/Makefile.am
> +++ b/gfs2/man/Makefile.am
> @@ -7,4 +7,5 @@ dist_man_MANS = fsck.gfs2.8 \
> gfs2_grow.8 \
> gfs2_jadd.8 \
> mkfs.gfs2.8 \
> - tunegfs2.8
> + tunegfs2.8 \
> + gfs2_lockcapture.8
> diff --git a/gfs2/man/gfs2_lockcapture.8 b/gfs2/man/gfs2_lockcapture.8
> new file mode 100644
> index 0000000..854cd71
> --- /dev/null
> +++ b/gfs2/man/gfs2_lockcapture.8
> @@ -0,0 +1,53 @@
> +.TH gfs2_lockcapture 8
> +
> +.SH NAME
> +gfs2_lockcapture \- capture locking information from GFS2 file systems and DLM
> +
> +.SH SYNOPSIS
> +.B gfs2_lockcapture \fR[-dqyt] [-o \fIoutput directory]\fR [-r \fInumber of runs]\fR [-s \fIseconds to sleep]\fR [-n \fIname of GFS2 filesystem]\fP
> +.PP
> +.B gfs2_lockcapture \fR[-dqyi]
> +
> +.SH DESCRIPTION
> +\fIgfs2_lockcapture\fR captures the GFS2 lockdump data and the
> +corresponding DLM data. The command can be configured to capture the data
> +multiple times and to sleep for a set number of seconds between each
> +iteration. By default, data is collected for all of the mounted GFS2
> +filesystems unless a set of GFS2 filesystems is specified.
> +.PP
> +Please note that on each iteration of capturing the data, either the sysrq -t
> +and -m events are triggered or the pid directories in /proc are collected.
> +
> +.SH OPTIONS
> +.TP
> +\fB-h, --help\fP
> +Prints out a short usage message and exits.
> +.TP
> +\fB-d, --debug\fP
> +Enables debug logging.
> +.TP
> +\fB-q, --quiet\fP
> +Disables logging to the console.
> +.TP
> +\fB-y, --no_ask\fP
> +Disables all questions and assumes yes.
> +.TP
> +\fB-i, --info\fP
> +Prints information about the mounted GFS2 file systems.
> +.TP
> +\fB-t, --archive\fP
> +The output directory will be archived (tar) and compressed (.bz2).
> +.TP
> +\fB-o \fI<output directory>, \fB--path_to_output_dir\fR=\fI<output directory>\fP
> +The directory where all of the collected data will be stored.
> +.TP
> +\fB-r \fI<number of runs>, \fB--num_of_runs\fR=\fI<number of runs>\fP
> +The number of times to capture the lockdump data.
> +.TP
> +\fB-s \fI<seconds to sleep>, \fB--seconds_sleep\fR=\fI<seconds to sleep>\fP
> +The number of seconds to sleep between each run of capturing the lockdump data.
> +.TP
> +\fB-n \fI<name of GFS2 filesystem>, \fB--fs_name\fR=\fI<name of GFS2 filesystem>\fP
> +The name of the GFS2 filesystem(s) that will have their lockdump data captured.
> +.
> +.SH SEE ALSO
>
2012-12-13 15:14 [Cluster-devel] [PATCH] gfs2-lockcapture: Modified some of the data gathered sbradley
2012-12-14 13:37 ` Andrew Price [this message]