From: Jack Steiner <steiner@sgi.com>
To: linux-ia64@vger.kernel.org
Subject: Re: [patch] fix per-CPU MCA mess and make UP kernels work again
Date: Fri, 04 Feb 2005 16:24:22 +0000
Message-ID: <20050204162422.GD20796@sgi.com>
In-Reply-To: <16887.1203.470842.161249@napali.hpl.hp.com>
On Fri, Feb 04, 2005 at 02:00:15PM +1100, Keith Owens wrote:
> On Thu, 3 Feb 2005 20:09:57 -0600,
> Jack Steiner <steiner@sgi.com> wrote:
> >On Thu, Feb 03, 2005 at 05:48:26PM -0600, Russ Anderson wrote:
> >> According to the SAL Spec, MCAs are supposed to be handled
> >> one at a time.
> >
> >It has been a long time since I looked, but I thought the
> >spec allowed either implementation, i.e. serialize OR all-at-once.
> >
> >Maybe I'm remembering the error handling guide but I know
> >I have seen this somewhere.....
>
> It is ambiguous. Extracts from SAL spec.
>
> 4.1.1 says only one processor gets OS_MCA.
>
> When multiple processors experience machine checks simultaneously,
> SAL selects a "monarch" machine check processor to accumulate all the
> error records at the platform level and continue with the machine
> check processing. "Monarch" status is relevant only for the current
> MCA error event.
>
> 4.7.2 (5) also says only one processor.
>
> 5. SAL selects a monarch for handling the error. All slaves
> processors in SAL_MC_RENDEZ check in their status with the SAL on
> the monarch.
>
> But the last sentence of 4.7.2 (8) refers to multiple processors in OS
> MCA.
>
> 8. SAL finishes the MCA handling on all the processors that are in
> MCA and waits for all the processors in MCA to synchronize before
> branching to OS MCA for further processing. Note that the
> hand-off to OS MCA from SAL MCA occurs simultaneously on all
> processors executing in SAL MCA handler.
>
> 4.7.2 (9) lets the OS choose the monarch, which implies that more than
> one cpu can be in OS MCA handler.
>
> 9. OS_MCA may choose a monarch processor to continue with error
> handling. After OS_MCA completes the error handling, the monarch
> processor wakes up all the slaves through a wake-up message as
> shown by (9) in Figure 4-4
>
> The end of 4.7.3 also implies that OS MCA handler can be running on
> multiple cpus. Note 'on all the processors'.
>
> When multiple processors experience machine checks simultaneously,
> SAL selects a monarch machine check processor to accumulate all the
> error records at the platform level. Once this is done, the OS_MCA
> procedure will take control of further error handling on all the
> processors that experienced the machine checks. The OS_MCA layer may
> need to implement a similar monarch processor selection for the error
> recovery phase. The operating system will be aware of which
> processors invoked the SAL_MC_RENDEZ procedure in response to the
> MC_rendezvous interrupt or the INIT signal and shall wake up those
> processors.
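
For illustration only, the OS-level monarch selection that 4.7.2 (9) and 4.7.3 allow could be sketched as below. This is a minimal sketch, not taken from the spec or from the kernel: the names os_mca_monarch, os_mca_elect() and os_mca_release() are hypothetical, and it assumes every CPU that enters OS_MCA at the same time races on one shared word, with the first one to claim it becoming the monarch.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical marker meaning "no CPU has claimed monarch status". */
#define NO_MONARCH (-1)

/* Hypothetical shared word that every CPU entering OS_MCA races on. */
static atomic_int os_mca_monarch = NO_MONARCH;

/*
 * Try to become the monarch for the current MCA event.
 * Returns true for exactly one caller; all others are slaves.
 */
bool os_mca_elect(int cpu)
{
    int expected = NO_MONARCH;

    return atomic_compare_exchange_strong(&os_mca_monarch, &expected, cpu);
}

/*
 * Give up monarch status once error handling is complete.
 * Has no effect if 'cpu' is not the current monarch.
 */
void os_mca_release(int cpu)
{
    int expected = cpu;

    atomic_compare_exchange_strong(&os_mca_monarch, &expected, NO_MONARCH);
}
```

The slaves would then wait for the monarch's wake-up message, which is not modelled here.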
To further muddy the waters, it looks like the latest Error Handling Guide
has addressed the issue:
>> Intel® Itanium® Processor Family Error Handling Guide April 2004
>>
>> Document Number: 249278-003
>>
>> 2.7.1
>>
>> ...
>> The MCA error information is provided to the OS_MCA layer. The MCA
>> error record is logged to the NVM. To simplify SAL implementation, it
>> is strongly recommended that SAL process all MCAs by handing off to the
>> OS as soon as possible to prevent some OSes from experiencing time-outs
>> and potentially crashing the system. >>>> The SAL may maintain a variable in
>> the SAL data area that indicates whether SAL, on one of the processors,
>> is already handling an MCA. If so, MCA processing on other processors will
>> wait within the SAL MCA handler until the current MCA is processed. This
>> situation may arise when local MCAs are experienced on multiple
>> processors. <<<<<<<
However, it says "may maintain a variable...". Should I interpret this as
allowing but not requiring serialization?
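
To make the question concrete, here is a rough sketch of what the optional serialization might look like. It is only an assumption about one possible implementation, not anything the guide specifies: sal_mca_busy stands in for the "variable in the SAL data area", and the function names are made up. A SAL that serializes would have each CPU call the blocking entry point; a SAL that does not would simply skip it and hand every CPU off to OS_MCA.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the "variable in the SAL data area". */
static atomic_flag sal_mca_busy = ATOMIC_FLAG_INIT;

/*
 * Try to become the CPU whose MCA is processed now.
 * Returns true if no other CPU was already handling an MCA.
 */
bool sal_mca_try_enter(void)
{
    return !atomic_flag_test_and_set_explicit(&sal_mca_busy,
                                              memory_order_acquire);
}

/*
 * Serialized variant: wait within the handler until the current
 * MCA has been processed, then take over.
 */
void sal_mca_enter_serialized(void)
{
    while (!sal_mca_try_enter())
        ;   /* spin until the CPU handling the current MCA is done */
}

/* Mark the current MCA as processed so a waiting CPU can proceed. */
void sal_mca_exit(void)
{
    atomic_flag_clear_explicit(&sal_mca_busy, memory_order_release);
}
```

With this reading, local MCAs on several processors are handled one after another, which matches Russ's description; without the flag, all of them reach OS_MCA at once.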
--
Thanks
Jack Steiner (steiner@sgi.com) 651-683-5302
Principal Engineer SGI - Silicon Graphics, Inc.