From: Srinivasulu Opensrc <sthanneeru.opensrc@micron.com>
To: "mchehab@kernel.org" <mchehab@kernel.org>,
"linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>,
"linux-cxl@vger.kernel.org" <linux-cxl@vger.kernel.org>
Cc: Srinivasulu Thanneeru <sthanneeru@micron.com>,
Ajay Joshi <ajayjoshi@micron.com>,
Senthil Thangaraj <sthangaraj@micron.com>,
"Vandana Salve" <vsalve@micron.com>
Subject: [RFC PATCH] rasdaemon: Add page offline support for cxl memory
Date: Mon, 14 Oct 2024 10:10:37 +0000
Message-ID: <a4cdc0ddd56c450c9bfa1d950a3a37ac@micron.com>
From: Srinivasulu Thanneeru <sthanneeru.opensrc@micron.com>
A CXL Type 3 device implements a threshold for corrected errors, as
described in section 8.2.9.9.11.3 of the CXL 3.1 specification. The device
can set the threshold field in the DRAM event descriptor when it detects
corrected errors that meet or exceed the threshold value.

This patch offlines pages for corrected memory errors when the device sets
the threshold field in the DRAM event descriptor. This helps prevent
corrected errors from developing into uncorrected errors.

Record the HPA for the given DPA, then offline the page at that HPA when
the corrected error threshold is set.
Signed-off-by: Srinivasulu Thanneeru <sthanneeru.opensrc@micron.com>
---
ras-cxl-handler.c | 14 ++++++++++++++
ras-page-isolation.c | 7 +++++++
ras-page-isolation.h | 1 +
ras-record.h | 1 +
4 files changed, 23 insertions(+)
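
For reviewers, the descriptor test added to ras_cxl_dram_event_handler() can
be read in isolation. The following is only a minimal standalone sketch, not
code from the patch: the EX_ macro names and ex_ce_threshold_hit() are
hypothetical, and the bit positions (bit 0 for an uncorrectable event, bit 1
for a threshold event) are an assumption based on the memory event descriptor
layout in the CXL specification.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed bit positions in the memory event descriptor. The EX_ names are
 * hypothetical stand-ins for the CXL_GMER_EVT_DESC_* macros used by the
 * patch; check the real definitions before relying on these values.
 */
#define EX_DESC_UNCORRECTABLE_EVENT (1u << 0)
#define EX_DESC_THRESHOLD_EVENT     (1u << 1)

/*
 * Return true only for a corrected error (uncorrectable bit clear) for
 * which the device reports that its threshold was met or exceeded
 * (threshold bit set). Other descriptor bits are ignored.
 */
static bool ex_ce_threshold_hit(uint8_t descriptor)
{
	return !(descriptor & EX_DESC_UNCORRECTABLE_EVENT) &&
	       (descriptor & EX_DESC_THRESHOLD_EVENT);
}
```

Under these assumptions, a descriptor of 0x02 would request a page offline,
while 0x03 (uncorrectable and threshold) and 0x00 (neither bit set) would not.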
diff --git a/ras-cxl-handler.c b/ras-cxl-handler.c
index 037c19c..c163c6f 100644
--- a/ras-cxl-handler.c
+++ b/ras-cxl-handler.c
@@ -13,6 +13,7 @@
#include "ras-cxl-handler.h"
#include "ras-logger.h"
+#include "ras-page-isolation.h"
#include "ras-record.h"
#include "ras-report.h"
#include "types.h"
@@ -897,6 +898,12 @@ int ras_cxl_dram_event_handler(struct trace_seq *s,
if (trace_seq_printf(s, "dpa:0x%llx ", (unsigned long long)ev.dpa) <= 0)
return -1;
+ if (tep_get_field_val(s, event, "hpa", record, &val, 1) < 0)
+ return -1;
+ ev.hpa = val;
+ if (trace_seq_printf(s, "hpa:0x%llx ", (unsigned long long)ev.hpa) <= 0)
+ return -1;
+
if (tep_get_field_val(s, event, "dpa_flags", record, &val, 1) < 0)
return -1;
ev.dpa_flags = val;
@@ -1005,6 +1012,13 @@ int ras_cxl_dram_event_handler(struct trace_seq *s,
}
}
+#ifdef HAVE_MEMORY_CE_PFA
+ /* Page offline for CE when threshold is set */
+ if (!(ev.descriptor & CXL_GMER_EVT_DESC_UNCORECTABLE_EVENT) &&
+ (ev.descriptor & CXL_GMER_EVT_DESC_THRESHOLD_EVENT))
+ ras_hw_threshold_pageoffline(ev.hpa);
+#endif
+
/* Insert data into the SGBD */
#ifdef HAVE_SQLITE3
ras_store_cxl_dram_event(ras, &ev);
diff --git a/ras-page-isolation.c b/ras-page-isolation.c
index bb6b777..6eb45d0 100644
--- a/ras-page-isolation.c
+++ b/ras-page-isolation.c
@@ -338,3 +338,10 @@ void ras_record_page_error(unsigned long long addr, unsigned int count, time_t t
page_record(pr, count, time);
}
}
+
+void ras_hw_threshold_pageoffline(unsigned long long addr)
+{
+ time_t now = time(NULL);
+
+ ras_record_page_error(addr, threshold.val, now);
+}
diff --git a/ras-page-isolation.h b/ras-page-isolation.h
index 73c9157..ed2f661 100644
--- a/ras-page-isolation.h
+++ b/ras-page-isolation.h
@@ -57,5 +57,6 @@ struct isolation {
void ras_page_account_init(void);
void ras_record_page_error(unsigned long long addr,
unsigned int count, time_t time);
+void ras_hw_threshold_pageoffline(unsigned long long addr);
#endif
diff --git a/ras-record.h b/ras-record.h
index bd861ff..d4969d1 100644
--- a/ras-record.h
+++ b/ras-record.h
@@ -203,6 +203,7 @@ struct ras_cxl_general_media_event {
struct ras_cxl_dram_event {
struct ras_cxl_event_common_hdr hdr;
uint64_t dpa;
+ uint64_t hpa;
uint8_t dpa_flags;
uint8_t descriptor;
uint8_t type;
--
2.46.2