patches.lists.linux.dev archive mirror
From: "Luck, Tony" <tony.luck@intel.com>
To: Borislav Petkov <bp@alien8.de>
Cc: "x86@kernel.org" <x86@kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"patches@lists.linux.dev" <patches@lists.linux.dev>,
	Yazen Ghannam <yazen.ghannam@amd.com>
Subject: RE: [PATCH] RAS/CEC: Reduce default threshold to offline a page to "2"
Date: Mon, 27 Jun 2022 17:27:57 +0000
Message-ID: <7da92773f7084c57814f7ef4d033bc53@intel.com>
In-Reply-To: <YrnBWjkX82OhXAtL@zn.tnic>

>> 1) Change threshold to "2".
>
> Kinda unconditional that... we haven't talked to other vendors even.

Existing default is 1023 ... which is not a good choice for anyone (except
perhaps ostriches that want to bury their heads in the sand and ignore marginal
DIMMs for as long as possible).

So changing the threshold to "2" would be an improvement in at least being right for
one vendor, instead of wrong for all.

If someone comes up with a different value for another CPU or DIMM vendor
combination ... would we have the RAS_CEC driver check boot_cpu_data.x86_vendor
and SMBIOS to set a different default?
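
For illustration only, a vendor check of that kind could look roughly like
the user-space sketch below. The enum, macro and function names are made up
for this example and are not the kernel's; in-kernel code would compare
boot_cpu_data.x86_vendor against X86_VENDOR_INTEL instead, and the
SMBIOS/DIMM-vendor side of the question is left out entirely.

```c
/* Hypothetical stand-ins for the kernel's X86_VENDOR_* constants. */
enum cpu_vendor { VENDOR_INTEL, VENDOR_AMD, VENDOR_OTHER };

#define CEC_GENERIC_THRESHOLD 1023 /* existing default mentioned above */
#define CEC_INTEL_THRESHOLD   2    /* value proposed in this patch */

/*
 * Pick the default number of corrected errors on a page before it is
 * offlined. Only one vendor has stated a preference so far; every other
 * vendor keeps the existing default until it asks for something else.
 */
static unsigned int cec_default_threshold(enum cpu_vendor vendor)
{
	switch (vendor) {
	case VENDOR_INTEL:
		return CEC_INTEL_THRESHOLD;
	default:
		return CEC_GENERIC_THRESHOLD;
	}
}
```

Another vendor's value would then be one more case in the switch, which is
about as far as this approach scales before it turns into a lookup table.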

>> 2) Do very smart platform dependent things
>
> If you mean AI, that probably won't happen in the kernel.

Agreed. You don't even need the "probably". This isn't kernel material.

Linux already has a hook in the GHES code to take an error record from
the platform and offline a page. So this "smart" code could be done
by BIOS or BMC just providing the resulting list of pages that should
be taken offline to Linux.

-Tony

Thread overview: 12+ messages
2022-06-07 21:20 [PATCH] RAS/CEC: Reduce default threshold to offline a page to "2" Tony Luck
2022-06-27 14:40 ` Borislav Petkov
2022-06-27 17:27   ` Luck, Tony [this message]
2022-06-28 15:59     ` Borislav Petkov
2022-06-28 16:51       ` Luck, Tony
2022-06-30  7:11         ` Borislav Petkov
2022-06-30 17:02           ` Luck, Tony
2022-07-01  8:49             ` Borislav Petkov
2022-07-01 16:44               ` Luck, Tony
2022-07-01 19:12                 ` [PATCH] RAS/CEC: Reduce offline page threshold for Intel systems Tony Luck
2022-08-02 12:07                   ` Yazen Ghannam
2022-08-02 16:18                     ` [PATCH v2] " Tony Luck