* Re: Re: RFC: MCA/MCE concept
[not found] <200705301310.18574.Christoph.Egger@amd.com>
@ 2007-05-30 14:00 ` Gavin Maltby
From: Gavin Maltby @ 2007-05-30 14:00 UTC (permalink / raw)
To: xen-devel
Hi,
Apologies for the screwy quoting below - I did not receive the first half of this
thread, so it was forwarded to me.
>>>
>>> - Dom0 got enough CEs so that UEs are very likely to happen in order
>>> to "circumvent" UEs.
The greatest rewards here are in syndrome/row/column/bank analysis of the
error stream. Where something like a bad pin produces tonnes of CEs,
they are always on the same bit, and your chance of a UE is that of a random
radiation-type CE landing within the set of ECC checkwords already undermined
by that pin - not very high. On the other hand, if we're seeing repeated
distinct syndromes from the same chip-select (or chip-select in a pair),
then there is a good chance they could collide "soon" - our data shows that
this combination predicts a UE within hours to a few days. If you have
row/column/bank decoding you can also perform further analysis of the
error source and assess the chances of a collision that would produce a UE.
That example has DIMM memory in mind, but similar approaches apply to
cache memory, where it is ECC-protected, and so on.
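
To make the distinction concrete, here is a minimal Python sketch of the
syndrome-diversity check described above. The record layout, the names
(CorrectableError, classify_chip_selects) and the threshold of two distinct
syndromes are assumptions for illustration only; they are not taken from any
real MCA implementation, and chip-select pairing and row/column/bank decoding
are left out.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CorrectableError:
    """One decoded CE record (hypothetical layout)."""
    chip_select: int  # chip-select the error was attributed to
    syndrome: int     # ECC syndrome reported with the error


def classify_chip_selects(errors, min_distinct=2):
    """Label each chip-select by how many distinct syndromes it produced.

    min_distinct is an assumed threshold, not a measured one.
    """
    syndromes = defaultdict(set)
    for err in errors:
        syndromes[err.chip_select].add(err.syndrome)

    labels = {}
    for chip_select, seen in syndromes.items():
        if len(seen) >= min_distinct:
            # Several different syndromes from one chip-select: these
            # could collide in one checkword and produce a UE.
            labels[chip_select] = "distinct-syndromes"
        else:
            # Many CEs with one syndrome: consistent with a single
            # stuck bit, e.g. a bad pin.
            labels[chip_select] = "single-syndrome"
    return labels


# Usage: chip-select 0 repeats one syndrome 500 times,
# chip-select 2 reports three different syndromes.
events = [CorrectableError(0, 0x4A)] * 500 + [
    CorrectableError(2, 0x11),
    CorrectableError(2, 0x83),
    CorrectableError(2, 0x2C),
]
labels = classify_chip_selects(events)
```

Note that the raw CE count plays no part in the label: 500 errors with one
syndrome still rate as the lower-risk case, while three errors with three
syndromes do not.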
>>> - Possible operations on a DomU
>>> - save/restore DomU
>>> - (live-)migrate DomU to a different physical machine
>>> - etc.
>> Very heavy-weight operations, which I think are unlikely to succeed if
>> you already suspect the system's going to suffer a UE soon.
As above, some predictors can give you hours to a few days' warning of a UE.
Gavin