From: Kirk Bresniker <kirkb@chrome.rose.hp.com>
To: grundler@cup.hp.com (Grant Grundler)
Cc: prumpf@inwestnet.de, parisc-linux@thepuffingroup.com
Subject: Re: [parisc-linux] Linux syscall ABI
Date: Wed, 16 Feb 2000 1:33:37 PST
Message-ID: <200002160933.BAA19152@chrome.rose.hp.com>
In-Reply-To: <200002160234.SAA06672@milano.cup.hp.com>; from "Grant Grundler" at Feb 15, 100 6:34 pm

Grant wrote:

| 
| Given the complexity of the systems, knowing *some* (not all)
| of the HW state is marginally useful at best. When we get
| into debugging driver problems later on, this will be clearer.
| 
| Besides the asynchronous nature of HPMCs, PIMs are unique to each
| class of box. So decoding a PIM on a K-class is quite different
| from the PIM on N or L-class. Only recently have tools been made
| internally available to help decode each type of PIM. I wouldn't
| hold my breath waiting for those to get published.

There are two key takeaways from what Grant has said:

1. There are some platform-specific tools which help PIM analysis.  As
   someone who has read literally thousands of PIM dumps over 10 years'
   worth of server platforms, and as someone who has contributed some of
   the analysis tools, I would say that the tools only automate the
   decoding of status register values (which are all
   implementation-specific). There has never been an expert tool which
   pulls in a PIM dump and spits out the answer.

2. The platforms which Grant specified are server platforms, not the
   workstations.  In my experience, you're going to find many more
   people familiar with server PIM dump output than workstations, simply
   because of the threshold of pain of the customer base. A server
   customer is much more concerned with getting a full analysis of
   each and every failure than a workstation customer.
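
As a rough illustration of point 1, the decoding that such tools
automate amounts to a table lookup over bit fields. The sketch below is
in Python; the register layout, the field names, and the sample value
are all made up for the example and do not describe any real PA-RISC
implementation.

```python
# Hypothetical layout: (field name, lowest bit, width in bits).
# None of these fields correspond to a real processor's status register.
HYPOTHETICAL_FIELDS = [
    ("cache_parity_error", 0, 1),
    ("bus_timeout", 1, 1),
    ("error_source_id", 4, 4),
]


def decode_status(value, fields=HYPOTHETICAL_FIELDS):
    """Split a raw register value into named bit fields."""
    decoded = {}
    for name, low_bit, width in fields:
        mask = (1 << width) - 1
        decoded[name] = (value >> low_bit) & mask
    return decoded


if __name__ == "__main__":
    # 0x31 is an arbitrary sample value, not taken from a real PIM dump.
    print(decode_status(0x31))
```

Output of this kind tells you what the registers say; working out why
they say it is still left to the person reading the dump.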

In general, for real hardware faults, PIM dumps are usually as good
as the underlying hardware error logging registers in telling an
expert what has gone wrong. But in a case like this, where the fault
involves the OS or an OS/hardware interaction, the PIM is usually not
enough.

| 
| If linux could learn to dump host memory to disk, then HPMCs would be
| a bit easier to debug since one could review data structures for suspect
| code. I think that's what the HPMC handler is intended for - not
| attempt to recover. Attempting to recover from an asynchronous fault
| doesn't sound feasible to me. But what do I know anyway....
| 

I don't know what Grant does(n't) know :), but I second the call for a
core dump.  To give an example of a complex hardware/OS interaction, I
was once debugging a system which was regularly getting OS panics due to
data page faults.  As a hardware engineer I would, as a matter of
principle, blame software and then firmware.  But the problem was
actually a double bit error due to a bad SRAM in the instruction cache
which was corrupting an instruction.  I only found this out by comparing
instructions and data in the memory dumps with the data stored in
PIM dumps.
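
The comparison itself is simple in principle. A minimal sketch in
Python, assuming both dumps have already been parsed into lists of
32-bit words covering the same addresses (the function name and the
sample words are invented for illustration, and real dump formats are
platform-specific):

```python
def find_bit_flips(expected_words, observed_words):
    """Return (index, xor_mask, flipped_bit_count) for each differing word."""
    mismatches = []
    for index, (expected, observed) in enumerate(
        zip(expected_words, observed_words)
    ):
        # XOR leaves a 1 in every bit position where the two words differ.
        diff = (expected ^ observed) & 0xFFFFFFFF
        if diff:
            mismatches.append((index, diff, bin(diff).count("1")))
    return mismatches


if __name__ == "__main__":
    # Arbitrary sample words; the second word differs in two bit positions.
    memory_copy = [0x12345678, 0x9ABCDEF0, 0x0F0F0F0F]
    cached_copy = [0x12345678, 0x9ABCDFE0, 0x0F0F0F0F]
    for index, mask, count in find_bit_flips(memory_copy, cached_copy):
        print(f"word {index}: xor mask {mask:#010x}, {count} bit(s) differ")
```

A word that differs in exactly two bits is the sort of pattern that
points toward a double bit error rather than a software bug.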

As to recovery from HPMCs, I can only speak to the hardware-generated
exceptions.  Most of the hardware-generated HPMCs are linked to
events which call into question the validity of information. Get a
parity error on a private, dirty cache line? Well, that means that there
is no valid copy anywhere. Better to dump PIM and halt immediately
rather than possibly commit bad data to permanent storage.  I think
that you have to be pretty confident to continue with anything other
than a core dump or tombstone page.

KMB
--
+============================================================+
|       Kirk Bresniker    	(916) 748-2393		     |
|       8000 Foothills Blvd                                  |
|       Roseville, CA 95747-5649                             |
|       kirkb@rose.hp.com                                    |

