From: Muralidhara M K <muralimk@amd.com>
To: <linux-edac@vger.kernel.org>
Cc: <linux-kernel@vger.kernel.org>, <bp@alien8.de>,
<mchehab@kernel.org>, Muralidhara M K <muralidhara.mk@amd.com>
Subject: [PATCH 0/4] Persist FRU memory poisons
Date: Wed, 29 Nov 2023 07:50:30 +0000
Message-ID: <20231129075034.2159223-1-muralimk@amd.com>
From: Muralidhara M K <muralidhara.mk@amd.com>
This patch set is based on the patches submitted at:
https://lore.kernel.org/linux-edac/20231129073521.2127403-1-muralimk@amd.com/T/#t
MI300A has on-die HBMv3 memory embedded in the socket. Once the number of
memory errors reaches a threshold, the socket has to be replaced. Define the
criteria for identifying the Field Replaceable Unit (FRU) based on the number
of poisoned pages in the socket, by persisting them in non-volatile storage.
A notifier is registered to handle FRU memory poisons. The poison count is
incremented for each injected MCE error until it reaches the maximum number
of FRU poison entries.
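The counting behaviour described above can be modelled in plain userspace C.
This is only a sketch, not the code from patch 2: the names
fru_poison_table, fru_record_poison(), fru_is_full() and the cap
FRU_MAX_ENTRIES are hypothetical, and the MCE notifier and the ERST write
are left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cap, chosen only for illustration. */
#define FRU_MAX_ENTRIES 8

/* Hypothetical per-FRU table of poisoned addresses. */
struct fru_poison_table {
	uint64_t addr[FRU_MAX_ENTRIES];	/* poisoned system physical addresses */
	size_t count;			/* entries used so far */
};

/*
 * Record one poisoned address.
 * Returns 0 when the address was stored, -1 when the table is already full.
 */
int fru_record_poison(struct fru_poison_table *tbl, uint64_t spa)
{
	assert(tbl != NULL);

	if (tbl->count >= FRU_MAX_ENTRIES)
		return -1;

	tbl->addr[tbl->count++] = spa;
	return 0;
}

/* Non-zero once the FRU has used up all of its poison entries. */
int fru_is_full(const struct fru_poison_table *tbl)
{
	assert(tbl != NULL);
	return tbl->count >= FRU_MAX_ENTRIES;
}
```

In this model a caller would invoke fru_record_poison() once per reported
error and treat a -1 return, or a non-zero fru_is_full(), as the point at
which the FRU has reached its limit.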
A sysfs entry per FRU makes it easier for the user to look into the poison
details.
During boot, read the ERST records to identify the poisoned addresses and
retire all system physical addresses in the corresponding HBM row.
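The boot-time replay can be sketched as follows. This is a userspace model
under loud assumptions, not the series' implementation: it pretends that an
HBM row maps to one contiguous, aligned block of ROW_BYTES_SKETCH bytes of
system physical address space, which real address translation is unlikely to
guarantee. All names here (retire_row(), retire_saved_poisons(), retire_fn,
log_retire()) are hypothetical, and the callback stands in for whatever
page-offlining call the module actually uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT_SKETCH 12u			/* assumed 4 KiB pages */
#define ROW_BYTES_SKETCH  (8u * 4096u)		/* assumed: 8 pages per row */

/* Stand-in for the kernel's page retirement call. */
typedef void (*retire_fn)(uint64_t pfn, void *ctx);

/* Example callback that only logs what would be retired. */
struct retire_log {
	size_t calls;
	uint64_t first_pfn;
	uint64_t last_pfn;
};

void log_retire(uint64_t pfn, void *ctx)
{
	struct retire_log *log = ctx;

	if (log->calls == 0)
		log->first_pfn = pfn;
	log->last_pfn = pfn;
	log->calls++;
}

/* Retire every page of the (assumed contiguous) row containing spa. */
size_t retire_row(uint64_t spa, retire_fn retire, void *ctx)
{
	uint64_t base = spa & ~((uint64_t)ROW_BYTES_SKETCH - 1);
	size_t pages = 0;

	assert(retire != NULL);

	for (uint64_t a = base; a < base + ROW_BYTES_SKETCH;
	     a += (uint64_t)1 << PAGE_SHIFT_SKETCH) {
		retire(a >> PAGE_SHIFT_SKETCH, ctx);
		pages++;
	}
	return pages;
}

/* Replay all addresses restored from persistent storage. */
size_t retire_saved_poisons(const uint64_t *spas, size_t count,
			    retire_fn retire, void *ctx)
{
	size_t total = 0;

	for (size_t i = 0; i < count; i++)
		total += retire_row(spas[i], retire, ctx);
	return total;
}
```

The callback indirection keeps the row walk separate from the retirement
mechanism, so the same loop can be exercised with a logging callback such as
log_retire().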
Patch 1:
Add an API to get the maximum CPER record size to be stored in NV storage
Patch 2:
Add FRU memory poison module
Patch 3:
Add sysfs entry to print the required error information from poison records
Patch 4:
Add documentation on FRU memory poisons.
Muralidhara M K (4):
ACPI/APEI: Add erst_get_size() API
RAS/fmp: Add FRU memory poison CPER support for Error persistence
EDAC/amd64: Add sysfs entry to read FRU poison data
RAS/fmp: Add Documentation on Persistence of FRU memory poisons
Documentation/RAS/ras.rst | 122 +++++++
MAINTAINERS | 8 +
drivers/acpi/apei/erst.c | 9 +
drivers/edac/amd64_edac.c | 25 ++
drivers/ras/Kconfig | 1 +
drivers/ras/Makefile | 1 +
drivers/ras/fmp/Kconfig | 18 +
drivers/ras/fmp/Makefile | 10 +
drivers/ras/fmp/fru_mem_poison.c | 595 +++++++++++++++++++++++++++++++
include/acpi/apei.h | 1 +
include/linux/cper.h | 24 ++
include/linux/fru_mem_poison.h | 17 +
12 files changed, 831 insertions(+)
create mode 100644 drivers/ras/fmp/Kconfig
create mode 100644 drivers/ras/fmp/Makefile
create mode 100644 drivers/ras/fmp/fru_mem_poison.c
create mode 100644 include/linux/fru_mem_poison.h
--
2.25.1