From: Russ Anderson <rja@sgi.com>
To: linux-ia64@vger.kernel.org
Subject: Re: [PATCH] New way of storing MCA/INIT logs
Date: Fri, 07 Mar 2008 16:55:54 +0000
Message-ID: <20080307165553.GA32384@sgi.com>
In-Reply-To: <47CD8142.7050207@bull.net>
On Fri, Mar 07, 2008 at 01:02:47PM +0100, Zoltan Menyhart wrote:
> Russ Anderson wrote:
>
> >Figure 2-1 does show SAL passing up CPEI records to OS, too.
>
> Yes, as I also said:
> "The SAL / PAL can be the origin of CPEIs / CMCIs if they succeed
> in correcting MCAs. They stock the related information until the
> OS calls SAL_GET_STATE_INFO()."
>
> I just want to emphasize that in case of the platform / CPU HW originated
> CPEIs / CMCIs, the SAL does not know of them before we call
> SAL_GET_STATE_INFO(), therefore it cannot store any information about
> them.
In some implementations SAL builds the records in response to
SAL_GET_STATE_INFO(); in other implementations SAL knows of
the CPEI/CMCI and builds/buffers the records before the
SAL_GET_STATE_INFO() call. The SAL spec does not prohibit SAL
from building/buffering the records before SAL_GET_STATE_INFO().

From a practical perspective, I don't think the difference significantly
changes how Linux should handle CPEIs/CMCIs. Linux should try to read/log
the CPEI/CMCI as quickly as possible. The lack of SAL buffering increases
the chance of a record getting lost (overwritten), while SAL buffering
reduces that chance.
If anything, the lack of SAL buffering would be a reason for more
Linux buffers, to reduce the chance of losing records.
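
To make the "read quickly, buffer on the Linux side" point concrete, here is
a minimal sketch in plain user-space C, not kernel code. Every name in it
(rec_ring, ring_put, drain_records, the fake_* mock firmware, REC_MAX,
LOG_SLOTS) is hypothetical and made up for illustration. The fetch/clear
callbacks only stand in for the SAL_GET_STATE_INFO() and
SAL_CLEAR_STATE_INFO() calls; they do not model the real SAL interface.

```c
#include <stddef.h>
#include <string.h>

#define REC_MAX   256   /* hypothetical maximum record size in bytes */
#define LOG_SLOTS 8     /* hypothetical number of OS-side buffers (N) */

struct rec_slot {
    size_t len;
    unsigned char data[REC_MAX];
};

struct rec_ring {
    struct rec_slot slot[LOG_SLOTS];
    unsigned int head;      /* total records stored so far */
    unsigned int tail;      /* total records consumed so far */
    unsigned int dropped;   /* records lost because no slot was free */
};

/* Copy one record into the ring; 0 on success, -1 if full or oversized. */
static int ring_put(struct rec_ring *r, const void *rec, size_t len)
{
    struct rec_slot *s;

    if (len > REC_MAX || r->head - r->tail == LOG_SLOTS) {
        r->dropped++;
        return -1;
    }
    s = &r->slot[r->head % LOG_SLOTS];
    memcpy(s->data, rec, len);
    s->len = len;
    r->head++;
    return 0;
}

/* Stand-ins for the firmware calls; fetch returns 0 when nothing is pending. */
typedef size_t (*fetch_fn)(void *buf, size_t cap);
typedef void (*clear_fn)(void);

/*
 * Pull every pending record out of the firmware as fast as possible,
 * clearing each one so the firmware can reuse its storage. A record that
 * does not fit in the ring is counted as dropped but still cleared.
 * Returns the number of records fetched.
 */
static unsigned int drain_records(struct rec_ring *r, fetch_fn fetch,
                                  clear_fn clear)
{
    unsigned char tmp[REC_MAX];
    unsigned int n = 0;
    size_t len;

    while ((len = fetch(tmp, sizeof(tmp))) != 0) {
        ring_put(r, tmp, len);
        clear();
        n++;
    }
    return n;
}

/* Mock firmware for illustration only: a counter of pending records. */
static unsigned int fake_pending;

static size_t fake_fetch(void *buf, size_t cap)
{
    static const char payload[] = "corrected error";

    if (fake_pending == 0 || cap < sizeof(payload))
        return 0;
    memcpy(buf, payload, sizeof(payload));
    return sizeof(payload);
}

static void fake_clear(void)
{
    if (fake_pending > 0)
        fake_pending--;
}
```

With only LOG_SLOTS buffers, a burst larger than the ring shows up in the
dropped counter, which is the loss that more Linux buffers would reduce.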
> >See section 5.3.2 CMC and CPE Records
> >
> > Each processor or physical platform could have multiple valid corrected
> > machine check or corrected platform error records. The maximum number of
> > these records present in a system depends on the SAL implementation and
> > the storage space available on the system. There is no requirement for
> > these records to be logged into NVM. The SAL may use an implementation
> > specific error record replacement algorithm for overflow situations. The
> > OS needs to make an explicit call to the SAL procedure
> > SAL_CLEAR_STATE_INFO
> > to clear the CMC and CPE records in order to free up the memory resources
> > that may be used for future records.
>
> As far as I can understand, it is about the events not signaled by
> interrupts, but MCAs, and either the PAL or the SAL manages to correct
> them (=> CMCI, CPEI).
Agreed that SAL corrected errors can get passed up as CMCI/CPEI.
I do not believe section 5.3.2 prohibits other CMCI/CPEI records from being
built/buffered before the SAL_CLEAR_STATE_INFO() call.

As stated above, from a practical perspective, I don't believe the
difference significantly changes how Linux should behave, other than
possibly being a reason for more Linux buffers.
> You have got N >= 1 buffers for this kind of errors.
My preference is for a larger N. Scaling N with system size
may be the best solution for small & large systems.
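
One possible way to scale N with system size, as a sketch only: derive it
from the CPU count and clamp it at both ends. The function name
log_buf_count, the per_cpu factor, and the MIN_LOG_BUFS / MAX_LOG_BUFS
limits are all hypothetical values chosen for illustration, not numbers
taken from the patch or from any kernel source.

```c
#define MIN_LOG_BUFS 4u      /* hypothetical floor for small systems */
#define MAX_LOG_BUFS 1024u   /* hypothetical cap to bound memory use */

/*
 * Number of OS-side record buffers for a system with ncpus processors,
 * allowing per_cpu buffers per processor and clamping the result to
 * [MIN_LOG_BUFS, MAX_LOG_BUFS].
 */
static unsigned int log_buf_count(unsigned int ncpus, unsigned int per_cpu)
{
    /* Widen before multiplying so a huge CPU count cannot overflow. */
    unsigned long long n = (unsigned long long)ncpus * per_cpu;

    if (n < MIN_LOG_BUFS)
        return MIN_LOG_BUFS;
    if (n > MAX_LOG_BUFS)
        return MAX_LOG_BUFS;
    return (unsigned int)n;
}
```

The floor keeps a small box from ending up with too few buffers, and the
cap keeps a very large system from pinning an unbounded amount of memory.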
> >5.4.1 Corrected Error Event Record
> >
> > In response to a CMC/CPE condition, SAL builds and maintains the error
> > record for OS retrieval.
>
> It does not say that the SAL knows about CMCI / CPEI signaled errors
> before we call SAL_GET_STATE_INFO().
It does not say that SAL cannot know before the SAL_GET_STATE_INFO() call.
> Example: the Tiger box with i82870:
I take your word as to how the Tiger SAL behaves.
Please take my word that other SAL implementations behave differently.
Thanks,
--
Russ Anderson, OS RAS/Partitioning Project Lead
SGI - Silicon Graphics Inc rja@sgi.com