From: Jack Steiner <steiner@sgi.com>
To: linux-ia64@vger.kernel.org
Subject: Re: Preserving CMC/CPE records across reboot
Date: Fri, 13 Jan 2006 15:57:02 +0000
Message-ID: <20060113155702.GB17542@sgi.com>
In-Reply-To: <14947.1137113189@ocs3.ocs.com.au>

On Fri, Jan 13, 2006 at 11:46:29AM +1100, Keith Owens wrote:
> CMC/CPE records (unlike MCA/INIT) are copied into kernel space and
> cleared from NVRAM as soon as they occur.  That decision was made by
> Bjorn Helgaas some years ago.  The idea is that if you do not have
> salinfo_decode or some equivalent program running then the correctable
> errors still need to be deleted from NVRAM.  But if the system hangs
> while reading the CMC/CPE then we get no data at all.
> 
> SGI just had an example of this.  A cpu took a CMC, salinfo_decode
> started running and hung while processing the CMC record, the system
> had to be rebooted.  Because the CMC record had been cleared from NVRAM
> before handing a copy to salinfo_decode, the contents were lost.

On SN, CMC/CPE records are never written to NVRAM. They are saved only
in memory. If the system hangs while trying to log a CMC/CPE and is then
reset, all CMC/CPE records are lost.

It is possible that some of this could be changed, but this is how it
currently works. Also, writing error records to NVRAM is slow, which is
something to avoid on performance-critical paths. I suppose we could
threshold the error rate, which would limit the rate of writing to NVRAM.
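
As a rough illustration of the thresholding idea, here is a minimal
fixed-window limiter sketch. Everything in it is an assumption made for
illustration: the struct, the function names, the tick-based clock and
the window/limit values are hypothetical and do not correspond to any
existing kernel or SAL interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limiter state; none of these names exist in the kernel. */
struct nvram_write_limiter {
	uint64_t window_start;   /* start of the current window, in caller ticks */
	uint64_t window_len;     /* window length, in the same ticks */
	unsigned int max_writes; /* NVRAM writes permitted per window */
	unsigned int writes;     /* writes already granted in this window */
};

static void limiter_init(struct nvram_write_limiter *l,
			 uint64_t window_len, unsigned int max_writes)
{
	assert(l != NULL && window_len > 0);
	l->window_start = 0;
	l->window_len = window_len;
	l->max_writes = max_writes;
	l->writes = 0;
}

/*
 * Returns true if a record arriving at time 'now' may be written to
 * NVRAM. A false return means the caller should keep the record in
 * memory only, so the slow NVRAM write is skipped.
 */
static bool limiter_allow(struct nvram_write_limiter *l, uint64_t now)
{
	assert(l != NULL);
	if (now - l->window_start >= l->window_len) {
		/* A new window has begun: reset the budget. */
		l->window_start = now;
		l->writes = 0;
	}
	if (l->writes >= l->max_writes)
		return false;
	l->writes++;
	return true;
}
```

With a window of 100 ticks and a limit of 2, the first two records in a
window would go to NVRAM and any further ones in that window would stay
in memory only. A real implementation would also need locking or
per-cpu state, which this sketch leaves out.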

> 
> We should be able to keep the first few CMC/CPE records for each cpu in
> NVRAM and discard the later ones if we start getting a backlog.  Then
> if the system hangs while processing a CMC/CPE, the data will still be
> available in NVRAM and will be processed on the next boot.  If the
> reboot hangs again in salinfo processing then we have a solid error,
> either cpu or SAL, so switch the offending cpu out of the system.
> 
> Any objections from other platforms?
> 
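
For reference, the bookkeeping behind the quoted proposal (keep the
first few records per cpu in NVRAM, discard later ones during a
backlog) might look roughly like the sketch below. The names, the cpu
count and the per-cpu limit of 2 are all hypothetical; the proposal
only says "the first few", and no such interface exists today.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CPUS            4 /* hypothetical, for illustration only */
#define NVRAM_KEEP_PER_CPU  2 /* assumed value for "the first few" */

/* Hypothetical count of records currently retained in NVRAM, per cpu. */
struct nvram_backlog {
	unsigned int pending[MAX_CPUS];
};

/*
 * A new CMC/CPE record arrived for 'cpu'. Returns true if it should be
 * left in NVRAM, false if this cpu already has a backlog and the
 * record should be discarded from NVRAM.
 */
static bool backlog_record_arrived(struct nvram_backlog *b, unsigned int cpu)
{
	assert(b != NULL && cpu < MAX_CPUS);
	if (b->pending[cpu] >= NVRAM_KEEP_PER_CPU)
		return false;
	b->pending[cpu]++;
	return true;
}

/*
 * User space has finished processing one record for 'cpu'; only at
 * this point would the NVRAM copy be cleared, freeing a slot.
 */
static void backlog_record_processed(struct nvram_backlog *b, unsigned int cpu)
{
	assert(b != NULL && cpu < MAX_CPUS);
	if (b->pending[cpu] > 0)
		b->pending[cpu]--;
}
```

The point of the ordering is that a slot is released only after
processing completes, so a hang during processing leaves the record in
NVRAM for the next boot.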
