From: "Luck, Tony" <tony.luck@intel.com>
To: Borislav Petkov <bp@alien8.de>
Cc: "x86@kernel.org" <x86@kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"patches@lists.linux.dev" <patches@lists.linux.dev>,
Yazen Ghannam <yazen.ghannam@amd.com>
Subject: RE: [PATCH] RAS/CEC: Reduce default threshold to offline a page to "2"
Date: Mon, 27 Jun 2022 17:27:57 +0000
Message-ID: <7da92773f7084c57814f7ef4d033bc53@intel.com>
In-Reply-To: <YrnBWjkX82OhXAtL@zn.tnic>
>> 1) Change threshold to "2".
>
> Kinda unconditional that... we haven't talked to other vendors even.

Existing default is 1023 ... which is not a good choice for anyone (except
perhaps ostriches that want to bury their heads in the sand and ignore marginal
DIMMs for as long as possible).

So changing the threshold to "2" would be an improvement in at least being right for
one vendor, instead of wrong for all.

If someone comes up with a different value for another CPU or DIMM vendor
combination ... would we have the RAS_CEC driver check boot_cpu_data.x86_vendor
and SMBIOS to set a different default?
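
To make the question concrete, here is a rough userspace sketch of what a per-vendor default could look like. It is not the RAS_CEC driver code. The enum, the quirk table, and cec_default_threshold() are hypothetical names made up for illustration, the DIMM vendor string stands in for whatever SMBIOS would report, and falling back to the existing 1023 for unlisted vendors is just one possible policy.

```c
#include <string.h>

/* Hypothetical vendor tags; the kernel itself compares
 * boot_cpu_data.x86_vendor against its own vendor constants. */
enum cpu_vendor { VENDOR_INTEL, VENDOR_AMD, VENDOR_OTHER };

#define CEC_THRESHOLD_EXISTING 1023u /* existing default cited above */
#define CEC_THRESHOLD_PROPOSED 2u    /* value proposed in this patch */

/* One row per CPU vendor / DIMM vendor combination with a known-good value.
 * A NULL dimm_vendor means "any DIMM vendor". */
struct threshold_quirk {
	enum cpu_vendor vendor;
	const char *dimm_vendor;
	unsigned int threshold;
};

static const struct threshold_quirk quirks[] = {
	{ VENDOR_INTEL, NULL, CEC_THRESHOLD_PROPOSED },
};

/* Return the first matching quirk's threshold, else the existing default
 * (assumed fallback policy, not something decided in this thread). */
unsigned int cec_default_threshold(enum cpu_vendor vendor,
				   const char *dimm_vendor)
{
	size_t i;

	for (i = 0; i < sizeof(quirks) / sizeof(quirks[0]); i++) {
		const struct threshold_quirk *q = &quirks[i];

		if (q->vendor != vendor)
			continue;
		if (q->dimm_vendor &&
		    (!dimm_vendor || strcmp(q->dimm_vendor, dimm_vendor) != 0))
			continue;
		return q->threshold;
	}
	return CEC_THRESHOLD_EXISTING;
}
```

With this table an Intel system gets "2" regardless of DIMM vendor, and every other vendor keeps the existing default until someone adds a row for it.
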
>> 2) Do very smart platform dependent things
>
> If you mean AI, that probably won't happen in the kernel.
Agreed. You don't even need the "probably". This isn't kernel material.
Linux already has a hook in the GHES code to take an error record from
the platform and offline a page. So this "smart" code could be done
by BIOS or BMC just providing the resulting list of pages that should
be taken offline to Linux.
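
As a simplified userspace model of that division of labor (this is not the GHES code), the platform does the analysis and the OS side only turns the resulting records into page frame numbers to offline. The record layout, the threshold_exceeded field, the 4 KiB page size, and collect_offline_pfns() are all assumptions made up for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12 /* assumes 4 KiB pages */

/* Hypothetical, stripped-down stand-in for a platform error record. */
struct platform_mem_record {
	uint64_t phys_addr;     /* physical address the platform flagged */
	int corrected;          /* nonzero: corrected (not fatal) error */
	int threshold_exceeded; /* nonzero: platform wants the page offlined */
};

/* Copy the page frame numbers of all records the platform marked for
 * offlining into out[], writing at most out_cap entries.
 * Returns the number of entries written. */
size_t collect_offline_pfns(const struct platform_mem_record *recs, size_t n,
			    uint64_t *out, size_t out_cap)
{
	size_t count = 0;
	size_t i;

	for (i = 0; i < n && count < out_cap; i++) {
		if (!recs[i].corrected || !recs[i].threshold_exceeded)
			continue;
		out[count++] = recs[i].phys_addr >> SKETCH_PAGE_SHIFT;
	}
	return count;
}
```

The point of the sketch is that no counting or policy lives on the OS side: whatever "smart" logic decides a page is bad stays in BIOS or BMC, and the OS only acts on the list.
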
-Tony
Thread overview: 12+ messages
2022-06-07 21:20 [PATCH] RAS/CEC: Reduce default threshold to offline a page to "2" Tony Luck
2022-06-27 14:40 ` Borislav Petkov
2022-06-27 17:27 ` Luck, Tony [this message]
2022-06-28 15:59 ` Borislav Petkov
2022-06-28 16:51 ` Luck, Tony
2022-06-30 7:11 ` Borislav Petkov
2022-06-30 17:02 ` Luck, Tony
2022-07-01 8:49 ` Borislav Petkov
2022-07-01 16:44 ` Luck, Tony
2022-07-01 19:12 ` [PATCH] RAS/CEC: Reduce offline page threshold for Intel systems Tony Luck
2022-08-02 12:07 ` Yazen Ghannam
2022-08-02 16:18 ` [PATCH v2] " Tony Luck