From: Russ Anderson <rja@sgi.com>
To: linux-ia64@vger.kernel.org
Subject: Re: [PATCH] New way of storing MCA/INIT logs
Date: Fri, 07 Mar 2008 16:55:54 +0000
Message-ID: <20080307165553.GA32384@sgi.com>
In-Reply-To: <47CD8142.7050207@bull.net>
On Fri, Mar 07, 2008 at 01:02:47PM +0100, Zoltan Menyhart wrote:
> Russ Anderson wrote:
>
> >Figure 2-1 does show SAL passing up CPEI records to OS, too.
>
> Yes, as I also said:
> "The SAL / PAL can be the origin of CPEIs / CMCIs if they succeed
> in correcting MCAs. They stock the related information until the
> OS calls SAL_GET_STATE_INFO()."
>
> I just want to emphasize that in case of the platform / CPU HW originated
> CPEIs / CMCIs, the SAL does not know of them before we call
> SAL_GET_STATE_INFO(), therefore it cannot store any information about
> them.
In some implementations SAL builds the records in response to
SAL_GET_STATE_INFO(); in other implementations SAL knows of
the CPEI/CMCI and builds/buffers the records before the
SAL_GET_STATE_INFO() call. The SAL spec does not prohibit SAL
from building/buffering the records before SAL_GET_STATE_INFO().

From a practical perspective, I don't think the difference significantly
changes how Linux should handle CPEIs/CMCIs. Linux should try to read/log
the CPEI/CMCI as quickly as possible. The lack of SAL buffering increases
the chance of a record getting lost (overwritten), while SAL buffering
reduces that chance.

If anything, the lack of SAL buffering would be a reason for more
Linux buffers, to reduce the chance of losing records.
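
To illustrate the "more Linux buffers" point, here is a minimal user-space
sketch in C, not kernel code and not what the patch does. It assumes a fixed
ring of slots that the CPEI/CMCI handler fills right after
SAL_GET_STATE_INFO() returns and that a logger drains later. All names
(struct err_ring, ring_put(), ring_get(), RING_SLOTS, REC_MAX_SIZE) are
hypothetical, and locking is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define REC_MAX_SIZE 256 /* hypothetical upper bound on one record */
#define RING_SLOTS   8   /* hypothetical N; must be a power of two */

struct err_rec {
	size_t len;
	unsigned char data[REC_MAX_SIZE];
};

struct err_ring {
	unsigned int head;     /* total records written */
	unsigned int tail;     /* total records read */
	unsigned long dropped; /* records lost because the ring was full */
	struct err_rec slot[RING_SLOTS];
};

/*
 * Store one record as soon as it has been fetched from SAL.
 * Returns 0 on success, -1 if the ring is full and the record is lost.
 * A real handler would need locking or atomics; omitted here.
 */
static int ring_put(struct err_ring *r, const void *rec, size_t len)
{
	assert(len <= REC_MAX_SIZE);
	if (r->head - r->tail == RING_SLOTS) {
		r->dropped++;
		return -1;
	}
	struct err_rec *s = &r->slot[r->head % RING_SLOTS];
	memcpy(s->data, rec, len);
	s->len = len;
	r->head++;
	return 0;
}

/*
 * Copy the oldest record into out (at least REC_MAX_SIZE bytes).
 * Returns the record length, or -1 if the ring is empty.
 */
static long ring_get(struct err_ring *r, void *out)
{
	if (r->head == r->tail)
		return -1;
	const struct err_rec *s = &r->slot[r->tail % RING_SLOTS];
	memcpy(out, s->data, s->len);
	r->tail++;
	return (long)s->len;
}
```

With more slots, the handler can keep accepting records during a burst
before the logger catches up, and the dropped counter only moves once all
N slots are occupied.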
> >See section 5.3.2 CMC and CPE Records
> >
> > Each processor or physical platform could have multiple valid corrected
> > machine check or corrected platform error records. The maximum number of
> > these records present in a system depends on the SAL implementation and
> > the storage space available on the system. There is no requirement for
> > these records to be logged into NVM. The SAL may use an implementation
> > specific error record replacement algorithm for overflow situations. The
> > OS needs to make an explicit call to the SAL procedure
> > SAL_CLEAR_STATE_INFO
> > to clear the CMC and CPE records in order to free up the memory resources
> > that may be used for future records.
>
> As far as I can understand, it is about the events not signaled by
> interrupts, but MCAs, and either the PAL or the SAL manages to correct
> them (=> CMCI, CPEI).
Agreed that SAL-corrected errors can get passed up as CMCI/CPEI.
I do not believe the section prohibits other CMCI/CPEI records from being
built/buffered before the SAL_CLEAR_STATE_INFO() call.

As stated above, from a practical perspective, I don't believe the
difference significantly changes how Linux should behave, other than
possibly being a reason for more Linux buffers.
> You have got N >= 1 buffers for this kind of errors.
My preference is for a larger N. Scaling N with system size
may be the best way to cover both small and large systems.
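
As a rough sketch of what scaling N with system size could look like,
assuming the buffer count is derived from the CPU count and clamped between
a floor and a cap. The function name log_buf_count() and the constants
MIN_LOG_BUFS, MAX_LOG_BUFS and BUFS_PER_CPU are made up for illustration;
the actual values would need tuning.

```c
#include <assert.h>

#define MIN_LOG_BUFS 4u    /* hypothetical floor for small systems */
#define MAX_LOG_BUFS 1024u /* hypothetical cap for very large systems */
#define BUFS_PER_CPU 2u    /* hypothetical scaling factor */

/* Return the number of log buffers to allocate for ncpus processors. */
static unsigned int log_buf_count(unsigned int ncpus)
{
	assert(ncpus > 0);
	/* Widen before multiplying so the product cannot overflow. */
	unsigned long long n = (unsigned long long)ncpus * BUFS_PER_CPU;

	if (n < MIN_LOG_BUFS)
		n = MIN_LOG_BUFS;
	if (n > MAX_LOG_BUFS)
		n = MAX_LOG_BUFS;
	return (unsigned int)n;
}
```

A small box keeps a minimum number of buffers, while a large system gets
more without growing unbounded.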
> >5.4.1 Corrected Error Event Record
> >
> > In response to a CMC/CPE condition, SAL builds and maintains the error
> > record for OS retrieval.
>
> It does not say that the SAL knows about CMCI / CPEI signaled errors
> before we call SAL_GET_STATE_INFO().
It does not say that SAL cannot know before the SAL_GET_STATE_INFO() call.
> Example: the Tiger box with i82870:
I take your word for how the Tiger SAL behaves.
Please take my word that other SAL implementations behave differently.
Thanks,
--
Russ Anderson, OS RAS/Partitioning Project Lead
SGI - Silicon Graphics Inc rja@sgi.com