From: Zoltan Menyhart <Zoltan.Menyhart@bull.net>
To: linux-ia64@vger.kernel.org
Subject: Re: [PATCH] New way of storing MCA/INIT logs
Date: Thu, 06 Mar 2008 10:24:06 +0000
Message-ID: <47CFC646.2070402@bull.net>
In-Reply-To: <47CD8142.7050207@bull.net>
Russ Anderson wrote:
> That is not nearly enough. On a large shared memory system multiple
> CPUs can hit the same memory error at the same time (for example).
> There are several test cases in my test environment that cause
> multiple CPUs to go into MCA at the same time. The value needs
> to scale with system size.
These are all consequences of the same bad memory block.
N log instances of the same memory error carry no more information
about the health of the machine than the first one does.
Anyway, the HW guys or the maintenance guys will count the events
as a single occurrence of memory failure.
> What happens on boot up, when salinfo reads all the old records?
> Does that "burst" of records all get logged.
The errors coming from the events before the reboot do not go
through the MCA handler. The salinfo side reads them directly by
calling ia64_sal_get_state_info().
>>The probability to have more than that _independent_ events
>>in a small time frame is very very low. Therefore you can
>>afford losing events of the same "burst".
>
> Large systems turn unlikely probabilities into likely.
A rough estimate can be made as follows:
Assume you have an MTBF of 30,000 hours.
The probability of having an MCA in a given one-minute time frame is
about 1 / (60 * 30,000), which is less than 10^(-6).
The probability of having two independent errors causing MCAs in
the same one-minute time frame is therefore less than 10^(-12).
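The arithmetic can be checked with a few lines of Python. This is only a
sketch under the same assumptions as the estimate (a 30,000 hour MTBF,
failures spread uniformly over time, the two events independent); the
function names are made up for the illustration.

```python
# Rough estimate only: treats the MTBF as a uniform per-minute failure rate.
MTBF_HOURS = 30_000       # figure assumed in the estimate
MINUTES_PER_HOUR = 60


def p_mca_in_one_minute(mtbf_hours: float) -> float:
    """Approximate probability of one MCA in a given one-minute window."""
    return 1.0 / (MINUTES_PER_HOUR * mtbf_hours)


def p_two_independent(mtbf_hours: float) -> float:
    """Probability of two independent MCAs in the same one-minute window."""
    return p_mca_in_one_minute(mtbf_hours) ** 2


if __name__ == "__main__":
    p1 = p_mca_in_one_minute(MTBF_HOURS)
    p2 = p_two_independent(MTBF_HOURS)
    print(f"one MCA in a given minute:       {p1:.2e}")
    print(f"two independent MCAs, same minute: {p2:.2e}")
```

Both values stay below the 10^(-6) and 10^(-12) bounds quoted above.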
> That FIXME was to work around a case where all the CPUs rendezvoued but SAL
> did not identify any of the CPUs as monarch.
I agree. I just wanted to point out that there is no guarantee that the
SALs fully respect the specification. In addition, a rendezvous is
allowed to be unsuccessful.
I designed my code not to rely on a successful rendezvous.
> I have a test case that creates that scenario. With your patch and only
> one of the MCAs (at most) end up getting logged in /var/log/salinfo/decoded .
Could you please describe what your test does and what behavior you
expect from the MCA layer?
Another idea: the integration into the salinfo side is not yet quite smooth :-)
It is the polling that fetches the logs, one by one. Please allow 3 polling
periods for all the logs to be seen.
Thanks,
Zoltan