From: "Luck, Tony" <tony.luck@intel.com>
To: Borislav Petkov <bp@alien8.de>
Cc: X86 ML <x86@kernel.org>, linux-edac <linux-edac@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 3/4] RAS: Add a Corrected Errors Collector
Date: Mon, 20 Mar 2017 15:48:24 -0700
Message-ID: <20170320224824.GA27165@intel.com>
In-Reply-To: <20170309100818.15466-4-bp@alien8.de>

On Thu, Mar 09, 2017 at 11:08:17AM +0100, Borislav Petkov wrote:
> +config RAS_CEC
> +	bool "Correctable Errors Collector"
> +	depends on X86_MCE && MEMORY_FAILURE && DEBUG_FS
> +	---help---
> +	  This is a small cache which collects correctable memory errors per 4K
> +	  page PFN and counts their repeated occurrence. Once the counter for a
> +	  PFN overflows, we try to soft-offline that page as we take it to mean
> +	  that it has reached a relatively high error count and would probably
> +	  be best if we don't use it anymore.

You added "count_threshold" for me ... so the condition isn't quite "overflows"
like it was in the early versions.

We may need to give some thought to what to do if the attempt to offline
the page fails (e.g. because the page belongs to the kernel). Right now
you delete it from the list, but we will see more errors as the page is
still in use. Eventually the counter will hit count_threshold and we will
try to offline again. Rinse, repeat.
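
To make the loop concrete, here is a rough user-space sketch of the
behaviour described above. It is not code from the patch: struct
tracked_page, try_soft_offline(), record_error() and simulate() are
hypothetical names, and the kernel_owned flag simply stands in for any
page that cannot be offlined. Only count_threshold comes from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space model of one tracked page; not the patch's layout. */
struct tracked_page {
	uint64_t pfn;
	unsigned int count;            /* corrected errors since the entry was (re)added */
	bool kernel_owned;             /* models a page that cannot be offlined */
	unsigned int offline_attempts; /* how many times offlining was tried */
	bool offlined;
};

/* Stand-in for the soft-offline attempt: always fails for kernel-owned pages. */
static bool try_soft_offline(struct tracked_page *p)
{
	p->offline_attempts++;
	if (p->kernel_owned)
		return false;
	p->offlined = true;
	return true;
}

/* Account one corrected error; at the threshold, try to offline the page. */
static void record_error(struct tracked_page *p, unsigned int count_threshold)
{
	assert(count_threshold > 0);
	if (p->offlined)
		return; /* page is no longer in use, so no further errors arrive */
	if (++p->count < count_threshold)
		return;
	try_soft_offline(p);
	/* The entry is dropped whether or not offlining worked, so counting restarts. */
	p->count = 0;
}

/* Feed nr_errors corrected errors into one page; return the offline attempts made. */
static unsigned int simulate(bool kernel_owned, unsigned int nr_errors,
			     unsigned int count_threshold)
{
	struct tracked_page p = { .pfn = 0x1234, .kernel_owned = kernel_owned };

	for (unsigned int i = 0; i < nr_errors; i++)
		record_error(&p, count_threshold);
	return p.offline_attempts;
}
```

With a threshold of 10, simulate(true, 100, 10) makes ten offline attempts,
one per 10 errors, while simulate(false, 100, 10) makes exactly one.
Remembering that an attempt already failed would be one way to break the
cycle, but that is a design question for the patch, not something this
sketch settles.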

Someone also recently sent me a log from a machine with corrected errors
in over 9000 unique addresses. We need a parameter to allocate more than
one page for the collector, or a way to grow the space.
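
For a sense of scale, a back-of-the-envelope sketch follows. It assumes
4K pages (as in the help text) and one 8-byte entry per tracked PFN; the
entry size is my assumption for illustration, and SKETCH_PAGE_SIZE,
collector_capacity() and pages_needed() are made-up names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u /* assumed 4K pages, as in the Kconfig help text */

/* Entries that fit in nr_pages pages, assuming one 8-byte entry per PFN. */
static size_t collector_capacity(size_t nr_pages)
{
	return nr_pages * (SKETCH_PAGE_SIZE / sizeof(uint64_t));
}

/* Smallest number of pages able to hold nr_addrs entries, rounded up. */
static size_t pages_needed(size_t nr_addrs)
{
	size_t per_page = SKETCH_PAGE_SIZE / sizeof(uint64_t);

	assert(per_page > 0);
	return (nr_addrs + per_page - 1) / per_page;
}
```

Under those assumptions a single page holds 512 entries, and 9000 unique
addresses would need 18 pages.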

-Tony

