public inbox for linux-ia64@vger.kernel.org
From: Zoltan Menyhart <Zoltan.Menyhart_AT_bull.net@nospam.org>
To: linux-ia64@vger.kernel.org
Subject: Re: Rework arch/ia64/kernel/salinfo.c for 2.4
Date: Tue, 21 Oct 2003 11:49:18 +0000
Message-ID: <marc-linux-ia64-106673692111884@msgid-missing>
In-Reply-To: <marc-linux-ia64-106664704821234@msgid-missing>

> >I did see an uncorrectable cache error (MCA) and a corrected
> >memory error (CMC) in a single SAL error log record.
> >Can you sort out such a case ?
> 
> That depends on your SAL implementation.  Does it pass one or two
> records to the OS and how does it pass them?  The OS just does what SAL
> says.

It was on an Intel Tiger box. I asked for an MCA SAL log record and
got a single record that also included a corrected memory error.
I just wanted to warn you that such things happen...

> Unless I misread the SAL spec, you can only have one MCA event in the
> OS at a time. MCA rendezvous is a normal interrupt that does not
> generate a record.  At the moment the first MCA is catastrophic and
> requires a reboot, which means that the MCA record is not picked up
> until after the reboot.  If we ever do recovery from MCA then the
> interrupt handler will need to be reviewed but without knowing what the
> recovery model is, it is premature to code for it.

We are thinking of :-) implementing some MCA recovery.
Two cases have been identified:
- translation register errors
- "consuming" poisoned memory data / uncorrectable memory error
They are local, and they can happen physically in parallel on more than
one CPU. We cannot clear the SAL log inside the OS_MCA handler, because we
cannot save the error log in an MCA context. If we cleared it and the
recovery failed, we would lose this information.
Whatever synchronization is used (e.g. rendezvous), another CPU can start
its MCA processing in the meantime.
We have to re-fetch the SAL log later in a process context, save it, and
clear the SAL log. If there is more than one non-cleared SAL log there,
their platform-related information can be mixed up - see App. note 11763,
page 3-3.
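
To make the ordering concrete, here is a rough user-space sketch of the
"mark in MCA context, fetch/save/clear later in process context" idea.
Everything in it is hypothetical: the fake per-CPU log array stands in for
the firmware state, and none of the names are real SAL or kernel
interfaces. It does not address the mixed-up platform information
mentioned above, nor locking against a CPU that takes an MCA while the
logs are being drained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NR_CPUS_DEMO 4   /* hypothetical CPU count for the demo */
#define REC_SIZE     64  /* hypothetical record size */

/* Simulated per-CPU SAL log; a stand-in for firmware state (assumption). */
struct fake_sal_log {
    bool    valid;
    uint8_t data[REC_SIZE];
};

static struct fake_sal_log sal_log[NR_CPUS_DEMO];
static bool    mca_pending[NR_CPUS_DEMO];
static uint8_t saved[NR_CPUS_DEMO][REC_SIZE];
static bool    saved_valid[NR_CPUS_DEMO];

/* MCA context: do not fetch, save or clear anything, only note the event. */
static void os_mca_mark_pending(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS_DEMO);
    mca_pending[cpu] = true;
}

/*
 * Process context: for each CPU with a pending event, save the record
 * first and clear the log only after the save has completed.
 * Returns the number of records saved.
 */
static int drain_pending_logs(void)
{
    int count = 0;

    for (int cpu = 0; cpu < NR_CPUS_DEMO; cpu++) {
        if (!mca_pending[cpu])
            continue;
        mca_pending[cpu] = false;
        if (!sal_log[cpu].valid)
            continue;
        memcpy(saved[cpu], sal_log[cpu].data, REC_SIZE); /* save */
        saved_valid[cpu] = true;
        sal_log[cpu].valid = false;                      /* then clear */
        count++;
    }
    return count;
}
```

The only point of the sketch is the order of operations: the log stays in
place until a copy exists somewhere safe, so a failed recovery does not
lose it.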

> >- do not "clear" nor "shift" MCA logs
> >- the MCA handler can overwrite the buffer of the CPU on which
> >  it executes
> >- for the "read <n>" command, you may:
> >  + calculate a CRC32 of the buffer[n]
> >  + copy_to_user(buffer[n],...)
> >  + calculate again the CRC32 of the buffer[n] and restart
> >    if it is not the same as before
> 
> Doing a CRC at "read <n>" time is too late, the CRC would have to be
> taken in the interrupt handler.  In any case, the record ID is supposed
> to be unique and is the first field in the record.  Checking that the
> ID is unchanged after taking a copy is sufficient and is much cheaper
> than a CRC check.
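
The ID check suggested above could look roughly like the following
user-space sketch. The only thing taken from the discussion is that the
record ID is the first field of the record; the struct layout, the buffer
size and the function name are assumptions for illustration, not the
salinfo.c code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define REC_MAX 256  /* hypothetical buffer size */

/* Hypothetical record layout: the ID is the first field. */
struct sal_rec {
    uint64_t id;
    uint8_t  payload[REC_MAX - sizeof(uint64_t)];
};

/*
 * Copy *src into *dst and retry if the record ID changed during the copy.
 * Returns the number of attempts used, or -1 if max_tries was exhausted.
 */
static int copy_record_stable(const struct sal_rec *src,
                              struct sal_rec *dst, int max_tries)
{
    assert(max_tries > 0);

    for (int attempt = 1; attempt <= max_tries; attempt++) {
        uint64_t before = src->id;      /* ID before taking the copy */

        memcpy(dst, src, sizeof(*dst));

        /* ID after the copy: a mismatch means the buffer was rewritten. */
        if (src->id == before && dst->id == before)
            return attempt;
    }
    return -1;
}
```

This only detects an overwrite that changes the ID between the two reads.
A real version would also need to think about memory ordering against the
MCA handler, which the sketch ignores.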

Thread overview: 8+ messages
2003-10-20 10:47 Rework arch/ia64/kernel/salinfo.c for 2.4 Keith Owens
2003-10-20 14:38 ` Zoltan Menyhart
2003-10-20 14:53 ` Keith Owens
2003-10-20 23:38 ` Bjorn Helgaas
2003-10-21  0:12 ` Keith Owens
2003-10-21 11:49 ` Zoltan Menyhart [this message]
2003-10-21 11:55 ` Zoltan Menyhart
2003-10-21 12:31 ` Keith Owens
