patches.lists.linux.dev archive mirror
From: Tony Luck <tony.luck@intel.com>
To: yazen.ghannam@amd.com
Cc: tony.luck@intel.com, bp@alien8.de, linux-kernel@vger.kernel.org,
	patches@lists.linux.dev, x86@kernel.org
Subject: [PATCH] RAS/CEC: Reduce offline page threshold for Intel systems
Date: Fri,  1 Jul 2022 12:12:39 -0700
Message-ID: <20220701191239.619940-1-tony.luck@intel.com>
In-Reply-To: <a871b8bd35604921b842dcd65aed0f6c@intel.com>

A large scale study of memory errors on Intel systems in data centers
showed that aggressively taking pages with corrected errors offline is
the best strategy for using corrected errors as a predictor of future
uncorrected errors.

It is unknown whether this would help other vendors. There are some
indicators that it would not.

Set the threshold to "2" on Intel systems.

Do-not-apply-without-agreement-from-AMD
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 drivers/ras/cec.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/ras/cec.c b/drivers/ras/cec.c
index 42f2fc0bc8a9..b1fc193b2036 100644
--- a/drivers/ras/cec.c
+++ b/drivers/ras/cec.c
@@ -556,6 +556,14 @@ static int __init cec_init(void)
 	if (ce_arr.disabled)
 		return -ENODEV;
 
+	/*
+	 * Intel systems may avoid uncorrectable errors
+	 * if pages with corrected errors are aggressively
+	 * taken offline.
+	 */
+	if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL)
+		action_threshold = 2;
+
 	ce_arr.array = (void *)get_zeroed_page(GFP_KERNEL);
 	if (!ce_arr.array) {
 		pr_err("Error allocating CE array page!\n");
-- 
2.35.3
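
For readers who want to see the effect of the lower threshold, here is a minimal user-space sketch of the policy the patch selects. It is not the code in drivers/ras/cec.c: the names `pick_threshold`, `record_corrected_error`, `struct page_record` and `enum vendor` are hypothetical, the `DEFAULT_THRESHOLD` value is an assumption made for illustration, and the real collector also decays counts over time, which this model omits.

```c
#include <stdbool.h>

/* Hypothetical vendor tags; the kernel compares against X86_VENDOR_*. */
enum vendor { VENDOR_INTEL, VENDOR_OTHER };

/* Assumed default, for illustration only; it is not stated in the patch. */
#define DEFAULT_THRESHOLD 1023u

/* Mirror the patch's decision: Intel gets 2, other vendors keep the default. */
static unsigned int pick_threshold(enum vendor v)
{
	return v == VENDOR_INTEL ? 2u : DEFAULT_THRESHOLD;
}

/* Hypothetical per-page bookkeeping for this model. */
struct page_record {
	unsigned long pfn;	/* page frame number */
	unsigned int count;	/* corrected errors seen so far */
	bool offline;		/* set once the page has been taken offline */
};

/*
 * Record one corrected error against a page. Returns true only on the
 * call that pushes the count to the threshold and offlines the page.
 */
static bool record_corrected_error(struct page_record *p, unsigned int threshold)
{
	if (p->offline)
		return false;

	if (++p->count >= threshold) {
		p->offline = true;
		return true;
	}

	return false;
}
```

With a threshold of 2, the first corrected error on a page is only counted and the second one takes the page offline. Under the assumed default, the same page would need many more corrected errors before any action is taken.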



Thread overview: 12+ messages
2022-06-07 21:20 [PATCH] RAS/CEC: Reduce default threshold to offline a page to "2" Tony Luck
2022-06-27 14:40 ` Borislav Petkov
2022-06-27 17:27   ` Luck, Tony
2022-06-28 15:59     ` Borislav Petkov
2022-06-28 16:51       ` Luck, Tony
2022-06-30  7:11         ` Borislav Petkov
2022-06-30 17:02           ` Luck, Tony
2022-07-01  8:49             ` Borislav Petkov
2022-07-01 16:44               ` Luck, Tony
2022-07-01 19:12                 ` Tony Luck [this message]
2022-08-02 12:07                   ` [PATCH] RAS/CEC: Reduce offline page threshold for Intel systems Yazen Ghannam
2022-08-02 16:18                     ` [PATCH v2] " Tony Luck
