From: Ben Woodard <woodard@redhat.com>
To: linux-ia64@vger.kernel.org
Subject: new utility for decoding salinfo records
Date: Tue, 11 Jan 2005 15:46:28 +0000
Message-ID: <1105458388.22104.7.camel@quince.llnl.gov>
[-- Attachment #1: Type: text/plain, Size: 8269 bytes --]
Excuse me if this ends up being a duplicate. I mailed this out last
night, but for some reason it hasn't come through. It is not in the
archives, nor have I seen it come back to my mailbox.
Here is a new utility for looking into salinfo records. It does several
things differently than salinfo_decode, and we have found that this helps
considerably in understanding problems on our Itanium servers. The
attached patch applies to salinfo-0.7 and does not modify
salinfo_decode's functioning in any way. In fact, the only files that are
modified are the Makefile and the spec file.
Here is the man page, which tries to illustrate some of the features
designed into salinfo_decode2.
SALINFO_DECODE2(8) Decode Itanium SAL Error Records SALINFO_DECODE2(8)
NAME
salinfo_decode2 - decode Itanium SAL error records
SYNOPSIS
salinfo_decode2 [OPTION]... [FILE | DIRECTORY]...
DESCRIPTION
salinfo_decode2 decodes CMC/CPE/MCA/INIT records obtained from the SAL.
It will take a list of files or directories and print out the requested
information about the salinfo records that are contained within those
files. This is notably different from the salinfo_decode program, which
processes only a single record at a time. Experience has shown that it
can be difficult to identify the underlying hardware failure from the
salinfo logs, because a single failure can result in many salinfo
records being created. salinfo_decode2 allows a system administrator to
glance at a directory full of errors, or some subset of files, and
obtain an overall impression of how meaningful the errors are. This is
done by turning down the verbosity and summarizing what is there. More
experienced administrators can turn up the verbosity and get
progressively more detailed information.
salinfo_decode2 also has the capability to generate output that is
designed to be easily parsed by a machine. This is useful when you want
to automate monitoring of large numbers of machines. For example,
instead of having scripts notify you every time an ignorable single-bit
memory error occurs, the monitoring scripts can easily ignore those
errors and only point out higher priority error conditions.
If no files or directories are specified on the command line, stdin is
read and is assumed to be a SAL record.
salinfo_decode2 also has the advantage that a SAL record from an ia64
system can be inspected and analyzed on a non-ia64 machine, even one
that is not little-endian. For example, a system administrator using an
ia32 workstation can inspect SAL records from an ia64 cluster. The
internal architecture of the original salinfo_decode precludes this kind
of cross-platform use.
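Cross-platform decoding depends on never reinterpreting the raw record
bytes through host-order C structs. A minimal sketch of the usual
technique follows; it is an assumption, not confirmed by this mail, that
salinfo_decode2 works this way, and the helper names get_le16, get_le32
and get_le64 are hypothetical. Each helper assembles a little-endian
field byte by byte, so the result is the same on big- and little-endian
hosts and does not depend on buffer alignment.

```c
#include <stdint.h>

/* Hypothetical helpers: read little-endian integers from a raw SAL
 * record buffer. Every byte is shifted into place explicitly, so the
 * host's own byte order never affects the result. */
static uint16_t get_le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t get_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

static uint64_t get_le64(const unsigned char *p)
{
    /* Low 32 bits come first in a little-endian layout. */
    return (uint64_t)get_le32(p) | ((uint64_t)get_le32(p + 4) << 32);
}
```

Casting the buffer to a struct and reading its members directly only
gives correct values when the host byte order matches the record's,
which is the kind of coupling described above.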
OPTIONS
-h, --help
Print usage and exit
-V, --version
Print version information and exit
-c, --cmc
Only print cmc records
-p, --cpe
Only print cpe records
-m, --mca
Only print mca records
-i, --init
Only print init records
-d, --dimm-offset
Count DIMMs starting at 1, not 0. This is useful when the SAL
reports failures starting with 0 but the numbers silk-screened
on the motherboard begin with 1. This helps reduce system
administrator confusion when replacing a memory DIMM.
-o, --cpu-offset
Count CPUs starting at 1, not 0. This is useful when the SAL
reports failures starting with 0 but the numbers silk-screened
on the motherboard begin with 1. This helps reduce system
administrator confusion when replacing CPUs.
--tiger4
The same as -d -o. The Intel Tiger 4 motherboard’s silkscreen
numbers both CPUs and DIMMs starting from 1 rather than from 0,
which is what the SAL reports.
-f, --forgiving
Be forgiving of errors when opening files and reading data
-r, --recursive
When an argument is a directory, traverse its subdirectories as well
-v, --verbosity
Specify the verbosity level at which to print records, from 1
to 6. However, the higher the verbosity, the more likely it is
that printing of the detailed information hasn’t been
implemented yet. Patches to remedy this situation are eagerly
accepted. The goal of the progressive levels of verbosity is
to facilitate understanding of records, not just to blurt out
every scrap of available information. Since verbosity 6 is
largely unimplemented, if you need all of the available
information, use the original salinfo_decode.
-s, --scriptable
Output in a machine-readable format. This is designed to
facilitate quick and easy shell scripting with the output. Refer
to the EXAMPLES section for intended use.
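The -d, -o and --tiger4 options only change how SAL indices are
labelled for the administrator. A minimal sketch of that mapping in C;
the struct display_opts type and the functions below are hypothetical
and are not taken from the patch.

```c
#include <stdbool.h>

/* Hypothetical display options modelled on -d/--dimm-offset and
 * -o/--cpu-offset. */
struct display_opts {
    bool dimm_offset;   /* -d: label DIMMs starting at 1 */
    bool cpu_offset;    /* -o: label CPUs starting at 1 */
};

/* --tiger4 is documented as the same as -d plus -o. */
static void set_tiger4(struct display_opts *o)
{
    o->dimm_offset = true;
    o->cpu_offset = true;
}

/* Convert the 0-based number reported by the SAL into the label
 * that matches the motherboard silkscreen. */
static int dimm_label(const struct display_opts *o, int sal_dimm)
{
    return sal_dimm + (o->dimm_offset ? 1 : 0);
}

static int cpu_label(const struct display_opts *o, int sal_cpu)
{
    return sal_cpu + (o->cpu_offset ? 1 : 0);
}
```

Keeping the offset as a display-time adjustment leaves the decoded SAL
values untouched, so the same record decodes identically whether or not
the options are given.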
EXAMPLES
Pointing salinfo_decode2 at a directory containing a few errors, with
the verbosity set very low, shows at a glance that all of the errors
were corrected and are therefore largely inconsequential:
$ ./salinfo_decode2 -v1 tigertest/
cpe with severity "corrected" occurred at 12:03:08 on Apr 1 2004
cpe with severity "corrected" occurred at 12:03:10 on Apr 1 2004
cpe with severity "corrected" occurred at 12:32:14 on Apr 1 2004
cpe with severity "corrected" occurred at 17:24:44 on Apr 1 2004
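The -v1 lines above follow a fixed pattern. A minimal sketch, assuming
the record type, severity and timestamp have already been decoded, of
producing that same line in C; format_v1_summary is a hypothetical name
and this only reproduces the output format shown, not the patch's code.

```c
#include <stdio.h>

/* Hypothetical: build the one-line -v1 summary from decoded fields.
 * month is 1-12; out-of-range values are printed as "???". */
static int format_v1_summary(char *buf, size_t len,
                             const char *type, const char *severity,
                             int hour, int min, int sec,
                             int month, int day, int year)
{
    static const char *const names[12] = {
        "Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
    };
    const char *mon = (month >= 1 && month <= 12) ? names[month - 1] : "???";

    /* Time fields are zero-padded; the day of month is not. */
    return snprintf(buf, len,
                    "%s with severity \"%s\" occurred at %02d:%02d:%02d on %s %d %d",
                    type, severity, hour, min, sec, mon, day, year);
}
```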
Here is an example of how different levels of verbosity present the
same SAL record differently:
$ ./salinfo_decode2 -v1 sample_data/tdev2-2004-04-01-12:03:08-cpu1-cpe0
cpe with severity "corrected" occurred at 12:03:08 on Apr 1 2004
$ ./salinfo_decode2 -v2 sample_data/tdev2-2004-04-01-12:03:08-cpu1-cpe0
record 612413502631444488 contains the following sections: (PCI component) (PCI component) (PCI component) (PCI component) (memory) (platform specific)
$ ./salinfo_decode2 -v3 sample_data/tdev2-2004-04-01-12:03:08-cpu1-cpe0
record 612413502631444488 contains the following sections:
PCI component with (vend/dev) 8086/500 at (Seg/Bus/Dev/Func) 0/255/24/0 reported a fault
PCI component with (vend/dev) 8086/501 at (Seg/Bus/Dev/Func) 0/255/24/1 reported a fault
PCI component with (vend/dev) 8086/502 at (Seg/Bus/Dev/Func) 0/255/24/2 reported a fault
PCI component with (vend/dev) 8086/503 at (Seg/Bus/Dev/Func) 0/255/24/3 reported a fault
Memory fault at (node/card/module/bank/device) 0/0/8/0/0
OEM component with id 0x44fc4766d807e40f reported a fault
Here is an example of how to use the scriptable interface to change the
formatting of the output and to select the one record, out of many, that
matches a specific criterion (here, a severity other than "corrected").
$ ./salinfo_decode2 -v1 -s sample_data/ | while read -r line; do
> eval "$line"
> if [ "$severity" != "corrected" ]; then
> echo "$month/$day/$year"
> fi
> done
4/1/2004
BUGS
Many levels of verbosity for many types of errors are not yet
implemented. The project reached a state where it did what the users
needed it to do and then I was asked to work on other things. Patches
are gratefully accepted.
AUTHOR
Ben Woodard <woodard@redhat.com>
SEE ALSO
salinfo_decode(8)
Linux Jan 6, 2005 SALINFO_DECODE2(8)
[-- Attachment #2: salinfo_decode2.patch.gz --]
[-- Type: application/x-gzip, Size: 29546 bytes --]