public inbox for linux-ia64@vger.kernel.org
* Preserving CMC/CPE records across reboot
@ 2006-01-13  0:46 Keith Owens
  2006-01-13 15:05 ` Alex Williamson
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Keith Owens @ 2006-01-13  0:46 UTC (permalink / raw)
  To: linux-ia64

CMC/CPE records (unlike MCA/INIT) are copied into kernel space and
cleared from NVRAM as soon as they occur.  That decision was made by
Bjorn Helgaas some years ago.  The idea is that if you do not have
salinfo_decode or some equivalent program running then the correctable
errors still need to be deleted from NVRAM.  But if the system hangs
while reading the CMC/CPE then we get no data at all.

SGI just had an example of this.  A cpu took a CMC, salinfo_decode
started running and hung while processing the CMC record, and the
system had to be rebooted.  Because the CMC record had been cleared
from NVRAM before a copy was handed to salinfo_decode, the contents
were lost.

We should be able to keep the first few CMC/CPE records for each cpu in
NVRAM and discard the later ones if we start getting a backlog.  Then
if the system hangs while processing a CMC/CPE, the data will still be
available in NVRAM and will be processed on the next boot.  If the
reboot hangs again in salinfo processing then we have a solid error,
either cpu or SAL, so we should switch the offending cpu out of the
system.
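
As a rough illustration of the retention policy proposed above, here is
a minimal sketch in C.  It is not existing kernel code.  Every name in
it (on_new_record, on_record_processed, log_state), the limit of 2
records per cpu and the cpu count of 4 are hypothetical, chosen only to
show the "keep the first few, discard the rest" bookkeeping.  It does
not touch SAL or NVRAM; it only decides what the caller would do.

```c
#include <stddef.h>

#define SKETCH_NR_CPUS       4  /* hypothetical cpu count for the sketch */
#define SKETCH_MAX_PER_CPU   2  /* hypothetical value for "the first few" */

/* What the caller should do with a newly arrived CMC/CPE record. */
enum record_action {
	KEEP_IN_NVRAM,     /* leave the record in NVRAM until it is processed */
	CLEAR_FROM_NVRAM   /* backlog: clear it immediately, as is done today */
};

struct cpu_log_state {
	unsigned int kept;       /* records currently retained in NVRAM */
	unsigned int discarded;  /* later records dropped because of backlog */
};

struct cpu_log_state log_state[SKETCH_NR_CPUS];

/* Decide the fate of a new record for 'cpu'. */
enum record_action on_new_record(unsigned int cpu)
{
	struct cpu_log_state *s;

	if (cpu >= SKETCH_NR_CPUS)
		return CLEAR_FROM_NVRAM;  /* unknown cpu: fall back to clearing */

	s = &log_state[cpu];
	if (s->kept < SKETCH_MAX_PER_CPU) {
		s->kept++;
		return KEEP_IN_NVRAM;
	}

	s->discarded++;
	return CLEAR_FROM_NVRAM;
}

/* Call once a retained record has been fully processed and cleared. */
void on_record_processed(unsigned int cpu)
{
	if (cpu < SKETCH_NR_CPUS && log_state[cpu].kept > 0)
		log_state[cpu].kept--;
}
```

With these assumed limits, the first two records for a cpu stay in
NVRAM, a third is cleared straight away, and a slot frees up again once
one of the retained records has been processed.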

Any objections from other platforms?



Thread overview: 5+ messages
2006-01-13  0:46 Preserving CMC/CPE records across reboot Keith Owens
2006-01-13 15:05 ` Alex Williamson
2006-01-13 15:57 ` Jack Steiner
2006-01-13 20:23 ` Luck, Tony
2006-01-14  6:50 ` Keith Owens
