From: "Luck, Tony" <tony.luck@intel.com>
To: Borislav Petkov <bp@alien8.de>
Cc: X86 ML <x86@kernel.org>, linux-edac <linux-edac@vger.kernel.org>,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 3/4] RAS: Add a Corrected Errors Collector
Date: Thu, 23 Mar 2017 10:20:31 -0700
Message-ID: <20170323172030.GA31747@intel.com>
In-Reply-To: <20170323152228.szeqtuc6umr5knh7@pd.tnic>
On Thu, Mar 23, 2017 at 04:22:28PM +0100, Borislav Petkov wrote:
> On Wed, Mar 22, 2017 at 07:03:39PM +0100, Borislav Petkov wrote:
> > Lemme try to write a small script exercising exactly that scenario to
> > see whether I'm actually not talking crap here :-)
>
> Ok, here's a snapshot from the CEC after letting it run for a couple of
> hours in a guest with a script running twice in parallel and injecting
> random PFNs. We have 0 offlined pages because a PFN number doesn't
> repeat frequently enough to cause an overflow.
>
> When I force the occurrence of a single PFN 1023 or more times and
> do that more than once, this happens:
>
> [ 6629.091239] RAS: Soft-offlining pfn: 0x7fff
> [ 6629.093036] __get_any_page: 0x7fff free buddy page
> [ 6653.259476] RAS: Soft-offlining pfn: 0x7fff
> [ 6653.260100] soft offline: 0x7fff page already poisoned
>
> ...
>
> Stats:
> CEs: 32614
> offlined pages: 2
> ^^^^^^^^^^^^^^^^^
>
> Flags: 0x0
> Timer interval: 86400 seconds
> Decays: 254
> Action threshold: 1023
>
> The "already poisoned" thing shouldn't happen in real life because once
> the page frame is poisoned, it shouldn't generate MCEs.

It can happen if Linux didn't actually take the page offline
(for example, because it was a kernel page). The CEC code only
knows that it queued this page to be taken offline and has no
way to know whether that succeeded.

Some people have grumbled about mcelog(8) doing the same thing.
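
To make the failure mode concrete, here is a small, hypothetical C model (not the collector code from this patch). It assumes a per-PFN counter that queues a soft-offline once it reaches the 1023 action threshold from the stats above and then drops the entry. Because queueing is fire-and-forget, a page that stays in use keeps producing CEs and crosses the threshold again. `ACTION_THRESHOLD`, `record_ce_n()`, `queue_soft_offline()` and the table size are made-up names for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model, not the collector from the patch: each PFN gets a
 * counter, and reaching the action threshold fires a soft-offline request.
 */
#define ACTION_THRESHOLD 1023u /* value from the stats dump above */
#define MAX_TRACKED 8u         /* hypothetical table size */

struct ce_entry {
	uint64_t pfn;
	unsigned int count;
	bool used;
};

static struct ce_entry table[MAX_TRACKED];
static unsigned int offline_requests; /* like the "offlined pages" stat */

/* Stand-in for queueing the page: fire and forget, no result comes back. */
static void queue_soft_offline(uint64_t pfn)
{
	(void)pfn;
	offline_requests++;
}

/* Return the entry for @pfn, creating one if needed (NULL if full). */
static struct ce_entry *find_or_add(uint64_t pfn)
{
	struct ce_entry *free_slot = NULL;

	for (size_t i = 0; i < MAX_TRACKED; i++) {
		if (table[i].used && table[i].pfn == pfn)
			return &table[i];
		if (!table[i].used && !free_slot)
			free_slot = &table[i];
	}
	if (free_slot) {
		free_slot->used = true;
		free_slot->pfn = pfn;
		free_slot->count = 0;
	}
	return free_slot;
}

/* Record @n corrected errors for @pfn. */
static void record_ce_n(uint64_t pfn, unsigned int n)
{
	while (n--) {
		struct ce_entry *e = find_or_add(pfn);

		if (!e)
			return; /* table full; a real collector would evict */
		if (++e->count >= ACTION_THRESHOLD) {
			queue_soft_offline(pfn);
			e->used = false; /* entry dropped, so the PFN can come back */
		}
	}
}
```

Feeding one PFN 2 x 1023 errors in this model yields two offline requests for the same page, which is the same pattern as the "offlined pages: 2" count in the stats above.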

So is it worth keeping track of the page numbers that we
tried to offline? If they show up again, we shouldn't add
them back into the array.
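
One way the suggestion could look, again as a hypothetical sketch rather than a proposed patch: keep a small ring buffer of PFNs that were already queued for offlining, and drop further CEs for those PFNs instead of re-inserting them into the counting array. `MAX_TRIED`, `already_tried()` and `remember_tried()` are invented names, and the sizes are arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch of the proposal; names and sizes are made up. */
#define ACTION_THRESHOLD 1023u
#define MAX_TRACKED 8u
#define MAX_TRIED 4u /* how many already-offlined PFNs to remember */

struct ce_entry {
	uint64_t pfn;
	unsigned int count;
	bool used;
};

static struct ce_entry table[MAX_TRACKED];
static uint64_t tried[MAX_TRIED]; /* ring buffer of PFNs we tried to offline */
static size_t n_tried, tried_next;
static unsigned int offline_requests;

static bool already_tried(uint64_t pfn)
{
	for (size_t i = 0; i < n_tried; i++)
		if (tried[i] == pfn)
			return true;
	return false;
}

/* Remember @pfn, overwriting the oldest entry once the ring is full. */
static void remember_tried(uint64_t pfn)
{
	tried[tried_next] = pfn;
	tried_next = (tried_next + 1) % MAX_TRIED;
	if (n_tried < MAX_TRIED)
		n_tried++;
}

/* Return the entry for @pfn, creating one if needed (NULL if full). */
static struct ce_entry *find_or_add(uint64_t pfn)
{
	struct ce_entry *free_slot = NULL;

	for (size_t i = 0; i < MAX_TRACKED; i++) {
		if (table[i].used && table[i].pfn == pfn)
			return &table[i];
		if (!table[i].used && !free_slot)
			free_slot = &table[i];
	}
	if (free_slot) {
		free_slot->used = true;
		free_slot->pfn = pfn;
		free_slot->count = 0;
	}
	return free_slot;
}

/* Record @n corrected errors for @pfn. */
static void record_ce_n(uint64_t pfn, unsigned int n)
{
	while (n--) {
		struct ce_entry *e;

		/* Already queued once: don't add it back into the array. */
		if (already_tried(pfn))
			continue;
		e = find_or_add(pfn);
		if (!e)
			return; /* table full; a real collector would evict */
		if (++e->count >= ACTION_THRESHOLD) {
			offline_requests++; /* stands in for queueing the soft-offline */
			remember_tried(pfn);
			e->used = false;
		}
	}
}
```

A bounded ring keeps the memory cost fixed, at the price that a PFN evicted from it can be counted and queued again later. It also means a page whose offlining failed for a transient reason is not retried until it falls out of the ring; whether that trade-off is acceptable is exactly the open question here.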
-Tony