patches.lists.linux.dev archive mirror
From: "Luck, Tony" <tony.luck@intel.com>
To: Borislav Petkov <bp@alien8.de>
Cc: "x86@kernel.org" <x86@kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"patches@lists.linux.dev" <patches@lists.linux.dev>,
	Yazen Ghannam <yazen.ghannam@amd.com>
Subject: RE: [PATCH] RAS/CEC: Reduce default threshold to offline a page to "2"
Date: Tue, 28 Jun 2022 16:51:49 +0000
Message-ID: <8f580a2544d846c69c9941e151fa7cc3@intel.com>
In-Reply-To: <Yrsleko0MnGtwaaR@zn.tnic>

>> Existing default is 1023 ... which is not a good choice for anyone (except
>> perhaps ostriches that want to bury their heads in the sand and ignore marginal
>> DIMMs for as long as possible).
>
>Why isn't that a good choice?

It fails to use the capabilities of the h/w and Linux to avoid a fatal error in the future.
Corrected errors are (sometimes) a predictor of marginal/aging memory. Copying
data out of a failing page while there are just corrected errors can avoid losing
that whole page later.

A single error is plausibly a particle strike causing a bit flip. But a second error
in the same page is a long shot (my desktop has 64G of memory, so 16 million
pages ... that's an awful lot of other targets for a second particle strike).

>I'm sure there are error rates where this fits just fine.

Explain further. Apart from the "ostrich" case, I'm not sure what they are.

>> So changing the threshold to "2" would be an improvement in at least
>> being right for one vendor, instead of wrong for all.
>
>So I'm pretty sure that is not needed on AMD at all.

It's far more a property of the DIMMs than of the CPU, unless AMD are using
DECTED (double-error-correct, triple-error-detect) or stronger ECC for memory.

-Tony


Thread overview: 12+ messages
2022-06-07 21:20 [PATCH] RAS/CEC: Reduce default threshold to offline a page to "2" Tony Luck
2022-06-27 14:40 ` Borislav Petkov
2022-06-27 17:27   ` Luck, Tony
2022-06-28 15:59     ` Borislav Petkov
2022-06-28 16:51       ` Luck, Tony [this message]
2022-06-30  7:11         ` Borislav Petkov
2022-06-30 17:02           ` Luck, Tony
2022-07-01  8:49             ` Borislav Petkov
2022-07-01 16:44               ` Luck, Tony
2022-07-01 19:12                 ` [PATCH] RAS/CEC: Reduce offline page threshold for Intel systems Tony Luck
2022-08-02 12:07                   ` Yazen Ghannam
2022-08-02 16:18                     ` [PATCH v2] " Tony Luck
