public inbox for linux-edac@vger.kernel.org
* [RFC PATCH] rasdaemon: Add page offline support for cxl memory
@ 2024-10-14 10:10 Srinivasulu Opensrc
From: Srinivasulu Opensrc @ 2024-10-14 10:10 UTC (permalink / raw)
  To: mchehab@kernel.org, linux-edac@vger.kernel.org,
	linux-cxl@vger.kernel.org
  Cc: Srinivasulu Thanneeru, Ajay Joshi, Senthil Thangaraj,
	Vandana Salve

From: Srinivasulu Thanneeru <sthanneeru.opensrc@micron.com>

A CXL Type 3 device implements a threshold for corrected errors, as described
in section 8.2.9.9.11.3 of the CXL 3.1 specification. The device can set the
threshold field in the DRAM event descriptor when it detects corrected errors
that meet or exceed the threshold value.

This patch offlines pages on corrected memory errors when the device sets the
threshold in the DRAM event descriptor. Doing so helps prevent corrected
errors from escalating into uncorrected errors.

Record the HPA for the given DPA, then offline the page at that HPA when the
corrected-error threshold is set.

Signed-off-by: Srinivasulu Thanneeru <sthanneeru.opensrc@micron.com>
---
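Note for reviewers: the offline decision in this patch reduces to a test on
two descriptor bits. The sketch below is a minimal standalone illustration,
not the rasdaemon code itself. The DESC_* bit positions and both helper names
are assumptions made for the example (the patch uses the existing
CXL_GMER_EVT_DESC_* macros), and the page-size rounding is shown only to make
clear that the offline request is page-granular.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed layout of the DRAM event "Memory Event Descriptor" byte.
 * These values are illustrative stand-ins for the CXL_GMER_EVT_DESC_*
 * macros used by the patch.
 */
#define DESC_UNCORRECTABLE_EVENT (1u << 0)
#define DESC_THRESHOLD_EVENT     (1u << 1)

/*
 * Hypothetical helper: true only for a corrected error for which the
 * device has flagged that its corrected-error threshold was reached.
 */
static bool cxl_ce_threshold_reached(uint8_t descriptor)
{
	return !(descriptor & DESC_UNCORRECTABLE_EVENT) &&
	       (descriptor & DESC_THRESHOLD_EVENT);
}

/*
 * Hypothetical helper: base address of the page containing hpa.
 * page_size must be a power of two.
 */
static uint64_t page_base(uint64_t hpa, uint64_t page_size)
{
	return hpa & ~(page_size - 1);
}
```

An uncorrectable event never takes this path, even if the threshold bit is
also set, which matches the condition added to ras_cxl_dram_event_handler().
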
 ras-cxl-handler.c    | 14 ++++++++++++++
 ras-page-isolation.c |  7 +++++++
 ras-page-isolation.h |  1 +
 ras-record.h         |  1 +
 4 files changed, 23 insertions(+)

diff --git a/ras-cxl-handler.c b/ras-cxl-handler.c
index 037c19c..c163c6f 100644
--- a/ras-cxl-handler.c
+++ b/ras-cxl-handler.c
@@ -13,6 +13,7 @@
 
 #include "ras-cxl-handler.h"
 #include "ras-logger.h"
+#include "ras-page-isolation.h"
 #include "ras-record.h"
 #include "ras-report.h"
 #include "types.h"
@@ -897,6 +898,12 @@ int ras_cxl_dram_event_handler(struct trace_seq *s,
 	if (trace_seq_printf(s, "dpa:0x%llx ", (unsigned long long)ev.dpa) <= 0)
 		return -1;
 
+	if (tep_get_field_val(s, event, "hpa", record, &val, 1) < 0)
+		return -1;
+	ev.hpa = val;
+	if (trace_seq_printf(s, "hpa:0x%llx ", (unsigned long long)ev.hpa) <= 0)
+		return -1;
+
 	if (tep_get_field_val(s,  event, "dpa_flags", record, &val, 1) < 0)
 		return -1;
 	ev.dpa_flags = val;
@@ -1005,6 +1012,13 @@ int ras_cxl_dram_event_handler(struct trace_seq *s,
 		}
 	}
 
+#ifdef HAVE_MEMORY_CE_PFA
+	/* Page offline for CE when threshold is set */
+	if (!(ev.descriptor & CXL_GMER_EVT_DESC_UNCORECTABLE_EVENT) &&
+	     (ev.descriptor & CXL_GMER_EVT_DESC_THRESHOLD_EVENT))
+		ras_hw_threshold_pageoffline(ev.hpa);
+#endif
+
 	/* Insert data into the SGBD */
 #ifdef HAVE_SQLITE3
 	ras_store_cxl_dram_event(ras, &ev);
diff --git a/ras-page-isolation.c b/ras-page-isolation.c
index bb6b777..6eb45d0 100644
--- a/ras-page-isolation.c
+++ b/ras-page-isolation.c
@@ -338,3 +338,10 @@ void ras_record_page_error(unsigned long long addr, unsigned int count, time_t t
 		page_record(pr, count, time);
 	}
 }
+
+void ras_hw_threshold_pageoffline(unsigned long long addr)
+{
+	time_t now = time(NULL);
+
+	ras_record_page_error(addr, threshold.val, now);
+}
diff --git a/ras-page-isolation.h b/ras-page-isolation.h
index 73c9157..ed2f661 100644
--- a/ras-page-isolation.h
+++ b/ras-page-isolation.h
@@ -57,5 +57,6 @@ struct isolation {
 void ras_page_account_init(void);
 void ras_record_page_error(unsigned long long addr,
 			   unsigned int count, time_t time);
+void ras_hw_threshold_pageoffline(unsigned long long addr);
 
 #endif
diff --git a/ras-record.h b/ras-record.h
index bd861ff..d4969d1 100644
--- a/ras-record.h
+++ b/ras-record.h
@@ -203,6 +203,7 @@ struct ras_cxl_general_media_event {
 struct ras_cxl_dram_event {
 	struct ras_cxl_event_common_hdr hdr;
 	uint64_t dpa;
+	uint64_t hpa;
 	uint8_t dpa_flags;
 	uint8_t descriptor;
 	uint8_t type;
-- 
2.46.2
