qemu-devel.nongnu.org archive mirror
* [PATCH v4 0/2] cxl: add device reporting poison handler
@ 2024-08-08 15:13 Shiyang Ruan via
  2024-08-08 15:13 ` [PATCH v4 1/2] cxl/core: introduce device reporting poison hanlding Shiyang Ruan via
  2024-08-08 15:13 ` [PATCH v4 2/2] cxl: avoid duplicated report from MCE & device Shiyang Ruan via
  0 siblings, 2 replies; 8+ messages in thread
From: Shiyang Ruan via @ 2024-08-08 15:13 UTC (permalink / raw)
  To: qemu-devel, linux-cxl, linux-edac, linux-mm, dan.j.williams,
	vishal.l.verma, Jonathan.Cameron, alison.schofield
  Cc: bp, dave.jiang, dave, ira.weiny, james.morse, linmiaohe, mchehab,
	nao.horiguchi, rric, tony.luck, ruansy.fnst

This patchset includes "cxl/core: introduce poison creation hanlding"
and "cxl: avoid duplicated report from MCE & device", which were posted
separately.  Changes since the last version of each patch:
 P1: 1. since memory_failure() is now called asynchronously, set the
        flag to 0
     2. also handle the CXL_EVENT_TRANSACTION_SCAN_MEDIA type
 P2: 1. use XArray instead of list_head
     2. add guard() lock for cxl device iteration
 P1&P2: Rebase to v6.11-rc1


The CXL spec defines a POISON feature that lets a CXL memory device
signal that it has a broken page.  There are two major paths for this
notification.

1. CPU handling error
   When a process accesses the broken page, the CXL device returns data
   with POISON.  When the CPU consumes the POISON, it raises some kind
   of error notification.
   To be precise, how the CPU behaves when it consumes POISON is
   architecture dependent.  In my understanding, x86-64 raises a Machine
   Check Exception (MCE) via interrupt #18 in this case.
2. CXL device reporting error
   The CXL device detects the broken page by itself and sends a memory
   error signal to the kernel via one of two optional paths.
   2.a. FW-First
     The CXL device sends the error via VDM to the CXL Host, the CXL
     Host forwards it to System Firmware via interrupt, and finally the
     kernel handles the error.
   2.b. OS-First
     The CXL device sends the error directly to the kernel via
     MSI/MSI-X.

Note: Since I am currently focusing on x86-64, the description below
covers x86-64 only.

The following diagram shows the two major paths and the two optional
sub-paths above.
```
1.  MCE (interrupt #18, while CPU consuming POISON)
     -> do_machine_check()
       -> mce_log()
         -> notify chain (x86_mce_decoder_chain)
           -> memory_failure()
2.a FW-First (optional, CXL device proactively find&report)
     -> CXL device -> Firmware
       -> OS: ACPI->APEI->GHES->CPER -> CXL driver -> trace
2.b OS-First (optional, CXL device proactively find&report)
     -> CXL device -> MSI
       -> OS: CXL driver -> trace
```

For the "1. CPU handling error" path, the current code seems to work
fine.  When I used the error injection feature of QEMU emulation, this
code path was reliably executed.  So, as long as the CPU reliably raises
an MCE when it consumes the POISON, this path has no problem.

So I am working on the 2.a and 2.b paths, so that a POISON error
reported by the CXL device can be handled by the kernel.  This approach
has two advantages.

- Proactively find&report memory problems

   Even if a process has not read the data yet, the kernel/drivers can
   proactively prevent it from using corrupted data.  AFAIK, the current
   kernel only traces POISON error events from the FW-First/OS-First
   paths; it neither handles them nor notifies the processes that are
   using the POISON page, as the MCE path does.  User space tools like
   rasdaemon read the trace and log it, but they do not handle the
   POISON page either.  As a result, the user has to read the error log
   from rasdaemon, determine whether the POISON error comes from CXL
   memory or DDR memory, and find out which applications are affected.
   That is not easy work and cannot be done in time.  Thus, I'd like to
   add a feature that does this work automatically and quickly: once
   the CXL device reports the POISON error (via FW-First/OS-First), the
   kernel handles it immediately, similar to the flow when an MCE is
   triggered.  This is my first motivation.

- Architecture independent

   As mentioned above, the "1. CPU handling error" path is architecture
   dependent.  This route, on the other hand, can be architecture
   independent code.  For a CPU that has no feature similar to the MCE
   of x86-64, this work will be essential.  (To be honest, I did not
   notice this advantage at first, but I think it is also important.)


Shiyang Ruan (2):
  cxl/core: introduce device reporting poison hanlding
  cxl: avoid duplicated report from MCE & device

 arch/x86/include/asm/mce.h |   1 +
 drivers/cxl/core/mbox.c    | 190 ++++++++++++++++++++++++++++++++++---
 drivers/cxl/core/memdev.c  |   6 +-
 drivers/cxl/cxlmem.h       |  11 ++-
 drivers/cxl/pci.c          |   4 +-
 include/linux/cxl-event.h  |  16 +++-
 6 files changed, 207 insertions(+), 21 deletions(-)

-- 
2.34.1




* [PATCH v4 1/2] cxl/core: introduce device reporting poison hanlding
  2024-08-08 15:13 [PATCH v4 0/2] cxl: add device reporting poison handler Shiyang Ruan via
@ 2024-08-08 15:13 ` Shiyang Ruan via
  2024-08-08 18:28   ` Fan Ni
  2024-08-08 15:13 ` [PATCH v4 2/2] cxl: avoid duplicated report from MCE & device Shiyang Ruan via
  1 sibling, 1 reply; 8+ messages in thread
From: Shiyang Ruan via @ 2024-08-08 15:13 UTC (permalink / raw)
  To: qemu-devel, linux-cxl, linux-edac, linux-mm, dan.j.williams,
	vishal.l.verma, Jonathan.Cameron, alison.schofield
  Cc: bp, dave.jiang, dave, ira.weiny, james.morse, linmiaohe, mchehab,
	nao.horiguchi, rric, tony.luck, ruansy.fnst

A CXL device can find and report memory problems even before the CPU
detects an MCE.  AFAIK, the current kernel only traces POISON error
events from the FW-First/OS-First paths; it neither handles them nor
notifies the processes that are using the POISON page, as the MCE path
does.

Thus, the user has to read the logs from the trace and find out which
device reported the error and which applications are affected.  That is
not easy work and cannot be done in time.  A feature is therefore needed
to do this work automatically and quickly: once the CXL device reports
the POISON error (via FW-First/OS-First), the kernel handles it
immediately, similar to the flow when an MCE is triggered.

The current call trace of error reporting&handling looks like this:
```
1.  MCE (interrupt #18, while CPU consuming POISON)
     -> do_machine_check()
       -> mce_log()
         -> notify chain (x86_mce_decoder_chain)
           -> memory_failure()

2.a FW-First (optional, CXL device proactively find&report)
     -> CXL device -> Firmware
       -> OS: ACPI->APEI->GHES->CPER -> CXL driver -> trace
                                                  \-> memory_failure()
                                                      ^----- ADD
2.b OS-First (optional, CXL device proactively find&report)
     -> CXL device -> MSI
       -> OS: CXL driver -> trace
                        \-> memory_failure()
                            ^------------------------------- ADD
```
This patch adds a call to memory_failure() when an error reported by the
CXL device is received, marked as "ADD" in the figure above.
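
The per-record decision made for General Media and DRAM events can be
sketched in user-space C as below.  This is only an illustration under
stated assumptions: the transaction enum mirrors the one this patch adds
to include/linux/cxl-event.h, while should_report_poison(), hpa_to_pfn(),
the LOG_* names and SKETCH_PAGE_SHIFT (4 KiB pages) are hypothetical
stand-ins, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors enum cxl_event_transaction_type added by this patch. */
enum txn_type {
	TXN_UNKNOWN = 0x00,
	TXN_READ,
	TXN_WRITE,
	TXN_SCAN_MEDIA,
	TXN_INJECT_POISON,
	TXN_MEDIA_SCRUB,
	TXN_MEDIA_MANAGEMENT,
};

/* Hypothetical stand-in for enum cxl_event_log_type. */
enum log_type { LOG_INFO, LOG_WARN, LOG_FAIL, LOG_FATAL };

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Same arithmetic as PHYS_PFN(): drop the in-page offset bits. */
static uint64_t hpa_to_pfn(uint64_t hpa)
{
	return hpa >> SKETCH_PAGE_SHIFT;
}

/*
 * Only records from the "fail" log whose transaction type is read,
 * write, scan media or inject poison lead to a poison report.
 */
static bool should_report_poison(enum log_type log, uint8_t txn)
{
	if (log != LOG_FAIL)
		return false;

	switch (txn) {
	case TXN_READ:
	case TXN_WRITE:
	case TXN_SCAN_MEDIA:
	case TXN_INJECT_POISON:
		return true;
	default:
		return false;
	}
}
```

In the patch itself, the same filter appears twice (once per record
type) and a positive result ends in memory_failure_queue(pfn, 0).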

Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
---
 drivers/cxl/core/mbox.c   | 75 ++++++++++++++++++++++++++++++++-------
 drivers/cxl/cxlmem.h      |  8 ++---
 drivers/cxl/pci.c         |  4 +--
 include/linux/cxl-event.h | 16 ++++++++-
 4 files changed, 83 insertions(+), 20 deletions(-)

diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index e5cdeafdf76e..0cb6ef2e6600 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -849,10 +849,55 @@ int cxl_enumerate_cmds(struct cxl_memdev_state *mds)
 }
 EXPORT_SYMBOL_NS_GPL(cxl_enumerate_cmds, CXL);
 
-void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
-			    enum cxl_event_log_type type,
-			    enum cxl_event_type event_type,
-			    const uuid_t *uuid, union cxl_event *evt)
+static void cxl_report_poison(struct cxl_memdev *cxlmd, u64 hpa)
+{
+	unsigned long pfn = PHYS_PFN(hpa);
+
+	memory_failure_queue(pfn, 0);
+}
+
+static void cxl_event_handle_general_media(struct cxl_memdev *cxlmd,
+					   enum cxl_event_log_type type,
+					   u64 hpa,
+					   struct cxl_event_gen_media *rec)
+{
+	if (type == CXL_EVENT_TYPE_FAIL) {
+		switch (rec->media_hdr.transaction_type) {
+		case CXL_EVENT_TRANSACTION_READ:
+		case CXL_EVENT_TRANSACTION_WRITE:
+		case CXL_EVENT_TRANSACTION_SCAN_MEDIA:
+		case CXL_EVENT_TRANSACTION_INJECT_POISON:
+			cxl_report_poison(cxlmd, hpa);
+			break;
+		default:
+			break;
+		}
+	}
+}
+
+static void cxl_event_handle_dram(struct cxl_memdev *cxlmd,
+				  enum cxl_event_log_type type,
+				  u64 hpa,
+				  struct cxl_event_dram *rec)
+{
+	if (type == CXL_EVENT_TYPE_FAIL) {
+		switch (rec->media_hdr.transaction_type) {
+		case CXL_EVENT_TRANSACTION_READ:
+		case CXL_EVENT_TRANSACTION_WRITE:
+		case CXL_EVENT_TRANSACTION_SCAN_MEDIA:
+		case CXL_EVENT_TRANSACTION_INJECT_POISON:
+			cxl_report_poison(cxlmd, hpa);
+			break;
+		default:
+			break;
+		}
+	}
+}
+
+void cxl_event_handle_record(struct cxl_memdev *cxlmd,
+			     enum cxl_event_log_type type,
+			     enum cxl_event_type event_type,
+			     const uuid_t *uuid, union cxl_event *evt)
 {
 	if (event_type == CXL_CPER_EVENT_MEM_MODULE) {
 		trace_cxl_memory_module(cxlmd, type, &evt->mem_module);
@@ -880,18 +925,22 @@ void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
 		if (cxlr)
 			hpa = cxl_dpa_to_hpa(cxlr, cxlmd, dpa);
 
-		if (event_type == CXL_CPER_EVENT_GEN_MEDIA)
+		if (event_type == CXL_CPER_EVENT_GEN_MEDIA) {
 			trace_cxl_general_media(cxlmd, type, cxlr, hpa,
 						&evt->gen_media);
-		else if (event_type == CXL_CPER_EVENT_DRAM)
+			cxl_event_handle_general_media(cxlmd, type, hpa,
+						&evt->gen_media);
+		} else if (event_type == CXL_CPER_EVENT_DRAM) {
 			trace_cxl_dram(cxlmd, type, cxlr, hpa, &evt->dram);
+			cxl_event_handle_dram(cxlmd, type, hpa, &evt->dram);
+		}
 	}
 }
-EXPORT_SYMBOL_NS_GPL(cxl_event_trace_record, CXL);
+EXPORT_SYMBOL_NS_GPL(cxl_event_handle_record, CXL);
 
-static void __cxl_event_trace_record(const struct cxl_memdev *cxlmd,
-				     enum cxl_event_log_type type,
-				     struct cxl_event_record_raw *record)
+static void __cxl_event_handle_record(struct cxl_memdev *cxlmd,
+				      enum cxl_event_log_type type,
+				      struct cxl_event_record_raw *record)
 {
 	enum cxl_event_type ev_type = CXL_CPER_EVENT_GENERIC;
 	const uuid_t *uuid = &record->id;
@@ -903,7 +952,7 @@ static void __cxl_event_trace_record(const struct cxl_memdev *cxlmd,
 	else if (uuid_equal(uuid, &CXL_EVENT_MEM_MODULE_UUID))
 		ev_type = CXL_CPER_EVENT_MEM_MODULE;
 
-	cxl_event_trace_record(cxlmd, type, ev_type, uuid, &record->event);
+	cxl_event_handle_record(cxlmd, type, ev_type, uuid, &record->event);
 }
 
 static int cxl_clear_event_record(struct cxl_memdev_state *mds,
@@ -1012,8 +1061,8 @@ static void cxl_mem_get_records_log(struct cxl_memdev_state *mds,
 			break;
 
 		for (i = 0; i < nr_rec; i++)
-			__cxl_event_trace_record(cxlmd, type,
-						 &payload->records[i]);
+			__cxl_event_handle_record(cxlmd, type,
+						  &payload->records[i]);
 
 		if (payload->flags & CXL_GET_EVENT_FLAG_OVERFLOW)
 			trace_cxl_overflow(cxlmd, type, payload);
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index afb53d058d62..5c4810dcbdeb 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -826,10 +826,10 @@ void set_exclusive_cxl_commands(struct cxl_memdev_state *mds,
 void clear_exclusive_cxl_commands(struct cxl_memdev_state *mds,
 				  unsigned long *cmds);
 void cxl_mem_get_event_records(struct cxl_memdev_state *mds, u32 status);
-void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
-			    enum cxl_event_log_type type,
-			    enum cxl_event_type event_type,
-			    const uuid_t *uuid, union cxl_event *evt);
+void cxl_event_handle_record(struct cxl_memdev *cxlmd,
+			     enum cxl_event_log_type type,
+			     enum cxl_event_type event_type,
+			     const uuid_t *uuid, union cxl_event *evt);
 int cxl_set_timestamp(struct cxl_memdev_state *mds);
 int cxl_poison_state_init(struct cxl_memdev_state *mds);
 int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index 4be35dc22202..6e65ca89f666 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -1029,8 +1029,8 @@ static void cxl_handle_cper_event(enum cxl_event_type ev_type,
 	hdr_flags = get_unaligned_le24(rec->event.generic.hdr.flags);
 	log_type = FIELD_GET(CXL_EVENT_HDR_FLAGS_REC_SEVERITY, hdr_flags);
 
-	cxl_event_trace_record(cxlds->cxlmd, log_type, ev_type,
-			       &uuid_null, &rec->event);
+	cxl_event_handle_record(cxlds->cxlmd, log_type, ev_type,
+				&uuid_null, &rec->event);
 }
 
 static void cxl_cper_work_fn(struct work_struct *work)
diff --git a/include/linux/cxl-event.h b/include/linux/cxl-event.h
index 0bea1afbd747..be4342a2b597 100644
--- a/include/linux/cxl-event.h
+++ b/include/linux/cxl-event.h
@@ -7,6 +7,20 @@
 #include <linux/uuid.h>
 #include <linux/workqueue_types.h>
 
+/*
+ * Event transaction type
+ * CXL rev 3.0 Section 8.2.9.2.1.1; Table 8-43
+ */
+enum cxl_event_transaction_type {
+	CXL_EVENT_TRANSACTION_UNKNOWN = 0X00,
+	CXL_EVENT_TRANSACTION_READ,
+	CXL_EVENT_TRANSACTION_WRITE,
+	CXL_EVENT_TRANSACTION_SCAN_MEDIA,
+	CXL_EVENT_TRANSACTION_INJECT_POISON,
+	CXL_EVENT_TRANSACTION_MEDIA_SCRUB,
+	CXL_EVENT_TRANSACTION_MEDIA_MANAGEMENT,
+};
+
 /*
  * Common Event Record Format
  * CXL rev 3.0 section 8.2.9.2.1; Table 8-42
@@ -26,7 +40,7 @@ struct cxl_event_media_hdr {
 	__le64 phys_addr;
 	u8 descriptor;
 	u8 type;
-	u8 transaction_type;
+	u8 transaction_type;	/* enum cxl_event_transaction_type */
 	/*
 	 * The meaning of Validity Flags from bit 2 is
 	 * different across DRAM and General Media records
-- 
2.34.1




* [PATCH v4 2/2] cxl: avoid duplicated report from MCE & device
  2024-08-08 15:13 [PATCH v4 0/2] cxl: add device reporting poison handler Shiyang Ruan via
  2024-08-08 15:13 ` [PATCH v4 1/2] cxl/core: introduce device reporting poison hanlding Shiyang Ruan via
@ 2024-08-08 15:13 ` Shiyang Ruan via
  2024-08-09  7:31   ` kernel test robot
                     ` (2 more replies)
  1 sibling, 3 replies; 8+ messages in thread
From: Shiyang Ruan via @ 2024-08-08 15:13 UTC (permalink / raw)
  To: qemu-devel, linux-cxl, linux-edac, linux-mm, dan.j.williams,
	vishal.l.verma, Jonathan.Cameron, alison.schofield
  Cc: bp, dave.jiang, dave, ira.weiny, james.morse, linmiaohe, mchehab,
	nao.horiguchi, rric, tony.luck, ruansy.fnst

Since a CXL device is a memory device, when the CPU consumes a poison
page of a CXL device, it always triggers an MCE (via interrupt #18) and
calls memory_failure() to handle the POISON page, no matter which
*-First path is configured.  The CXL device can also find and report the
POISON; the kernel now not only traces the event but also calls
memory_failure() to handle it, which is marked as "NEW" in the figure
below.
```
1.  MCE (interrupt #18, while CPU consuming POISON)
     -> do_machine_check()
       -> mce_log()
         -> notify chain (x86_mce_decoder_chain)
           -> memory_failure() <---------------------------- EXISTS
2.a FW-First (optional, CXL device proactively find&report)
     -> CXL device -> Firmware
       -> OS: ACPI->APEI->GHES->CPER -> CXL driver -> trace
                                                  \-> memory_failure()
                                                      ^----- NEW
2.b OS-First (optional, CXL device proactively find&report)
     -> CXL device -> MSI
       -> OS: CXL driver -> trace
                        \-> memory_failure()
                            ^------------------------------- NEW
```

But this way, memory_failure() could be called twice, or even at the
same time, before the POISON page is cleared: once from (1.) and once
from (2.a or 2.b), as shown in the figure above.  memory_failure() has
its own mutex lock, so it won't actually run concurrently, and the
later call is skipped because the HWPoison bit has already been set.
However, consider this scenario: "CXL device reports POISON error"
triggers the 1st call, the user sees it in the log and clears the
poison by executing the `cxl clear-poison` command, and at the same
time a process tries to access this POISON page, which triggers an MCE
(the 2nd call).  Since there is no lock between the 2nd call and the
clear-poison operation, a race may occur, which may leave the HWPoison
bit of the page in an unknown state.

Thus, we have to avoid the 2nd call.  This patch introduces a new
notifier_block into `x86_mce_decoder_chain` and a POISON cache (an
XArray, `cxl_mce_records`), to stop the 2nd call of memory_failure().
It checks whether the current poison page has already been reported (if
yes, stop the notifier chain and don't call the following
memory_failure() to report again).
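
The two mechanisms described above, a test-and-set record of reported
addresses and a notifier that stops the chain on a hit, can be modelled
in plain user-space C.  This sketch is not the kernel code: a small
fixed-size array stands in for the XArray, there is no locking, and
record_seen(), record_clear(), run_chain(), device_report() and the
SK_NOTIFY_* values are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_RECORDS 16	/* assumption: tiny fixed-size cache */

static uint64_t records[MAX_RECORDS];
static size_t nr_records;

/* Test-and-set: true if hpa was already recorded, else record it. */
static bool record_seen(uint64_t hpa)
{
	for (size_t i = 0; i < nr_records; i++)
		if (records[i] == hpa)
			return true;

	assert(nr_records < MAX_RECORDS);
	records[nr_records++] = hpa;
	return false;
}

/* Forget hpa, as is done when the poison is cleared. */
static void record_clear(uint64_t hpa)
{
	for (size_t i = 0; i < nr_records; i++) {
		if (records[i] == hpa) {
			records[i] = records[--nr_records];
			return;
		}
	}
}

enum { SK_NOTIFY_OK, SK_NOTIFY_STOP };

/* Counts how often the modelled memory_failure() would run. */
static int memory_failure_calls;

/* Higher-priority notifier: stop the chain on an already-seen address. */
static int cxl_notifier(uint64_t hpa)
{
	return record_seen(hpa) ? SK_NOTIFY_STOP : SK_NOTIFY_OK;
}

/* Lower-priority notifier: stands in for the one ending in memory_failure(). */
static int uc_notifier(uint64_t hpa)
{
	(void)hpa;
	memory_failure_calls++;
	return SK_NOTIFY_OK;
}

/* MCE path: walk notifiers in priority order until one says stop. */
static void run_chain(uint64_t hpa)
{
	int (*chain[])(uint64_t) = { cxl_notifier, uc_notifier };

	for (size_t i = 0; i < sizeof(chain) / sizeof(chain[0]); i++)
		if (chain[i](hpa) == SK_NOTIFY_STOP)
			break;
}

/* Device-reported path: skip addresses that were already recorded. */
static void device_report(uint64_t hpa)
{
	if (!record_seen(hpa))
		memory_failure_calls++;
}
```

In this model, whichever path sees an address first records it and the
other path is suppressed until record_clear() runs for that address.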

Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
---
 arch/x86/include/asm/mce.h |   1 +
 drivers/cxl/core/mbox.c    | 115 +++++++++++++++++++++++++++++++++++++
 drivers/cxl/core/memdev.c  |   6 +-
 drivers/cxl/cxlmem.h       |   3 +
 4 files changed, 124 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 3ad29b128943..5da45e870858 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -182,6 +182,7 @@ enum mce_notifier_prios {
 	MCE_PRIO_NFIT,
 	MCE_PRIO_EXTLOG,
 	MCE_PRIO_UC,
+	MCE_PRIO_CXL,
 	MCE_PRIO_EARLY,
 	MCE_PRIO_CEC,
 	MCE_PRIO_HIGHEST = MCE_PRIO_CEC
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 0cb6ef2e6600..b21700428c35 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -4,6 +4,8 @@
 #include <linux/debugfs.h>
 #include <linux/ktime.h>
 #include <linux/mutex.h>
+#include <linux/notifier.h>
+#include <asm/mce.h>
 #include <asm/unaligned.h>
 #include <cxlpci.h>
 #include <cxlmem.h>
@@ -925,6 +927,9 @@ void cxl_event_handle_record(struct cxl_memdev *cxlmd,
 		if (cxlr)
 			hpa = cxl_dpa_to_hpa(cxlr, cxlmd, dpa);
 
+		if (hpa != ULLONG_MAX && cxl_mce_recorded(hpa))
+			return;
+
 		if (event_type == CXL_CPER_EVENT_GEN_MEDIA) {
 			trace_cxl_general_media(cxlmd, type, cxlr, hpa,
 						&evt->gen_media);
@@ -1457,6 +1462,112 @@ int cxl_poison_state_init(struct cxl_memdev_state *mds)
 }
 EXPORT_SYMBOL_NS_GPL(cxl_poison_state_init, CXL);
 
+DEFINE_XARRAY(cxl_mce_records);
+
+bool cxl_mce_recorded(u64 hpa)
+{
+	XA_STATE(xas, &cxl_mce_records, hpa);
+	void *entry;
+
+	xas_lock_irq(&xas);
+	entry = xas_load(&xas);
+	if (entry) {
+		xas_unlock_irq(&xas);
+		return true;
+	}
+	entry = xa_mk_value(hpa);
+	xas_store(&xas, entry);
+	xas_unlock_irq(&xas);
+
+	return false;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_mce_recorded, CXL);
+
+void cxl_mce_clear(u64 hpa)
+{
+	XA_STATE(xas, &cxl_mce_records, hpa);
+	void *entry;
+
+	xas_lock_irq(&xas);
+	entry = xas_load(&xas);
+	if (entry) {
+		xas_store(&xas, NULL);
+	}
+	xas_unlock_irq(&xas);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_mce_clear, CXL);
+
+struct cxl_contains_hpa_context {
+	bool contains;
+	u64 hpa;
+};
+
+static int __cxl_contains_hpa(struct device *dev, void *arg)
+{
+	struct cxl_contains_hpa_context *ctx = arg;
+	struct cxl_endpoint_decoder *cxled;
+	struct range *range;
+	u64 hpa = ctx->hpa;
+
+	if (!is_endpoint_decoder(dev))
+		return 0;
+
+	cxled = to_cxl_endpoint_decoder(dev);
+	range = &cxled->cxld.hpa_range;
+
+	if (range->start <= hpa && hpa <= range->end) {
+		ctx->contains = true;
+		return 1;
+	}
+
+	return 0;
+}
+
+static bool cxl_contains_hpa(const struct cxl_memdev *cxlmd, u64 hpa)
+{
+	struct cxl_contains_hpa_context ctx = {
+		.contains = false,
+		.hpa = hpa,
+	};
+	struct cxl_port *port;
+
+	port = cxlmd->endpoint;
+	guard(rwsem_write)(&cxl_region_rwsem);
+	if (port && cxl_num_decoders_committed(port))
+		device_for_each_child(&port->dev, &ctx, __cxl_contains_hpa);
+
+	return ctx.contains;
+}
+
+static int cxl_handle_mce(struct notifier_block *nb, unsigned long val,
+			  void *data)
+{
+	struct mce *mce = (struct mce *)data;
+	struct cxl_memdev_state *mds = container_of(nb, struct cxl_memdev_state,
+						    mce_notifier);
+	u64 hpa;
+
+	if (!mce || !mce_usable_address(mce))
+		return NOTIFY_DONE;
+
+	hpa = mce->addr & MCI_ADDR_PHYSADDR;
+
+	/* Check if the PFN is located on this CXL device */
+	if (!pfn_valid(hpa >> PAGE_SHIFT) &&
+	    !cxl_contains_hpa(mds->cxlds.cxlmd, hpa))
+		return NOTIFY_DONE;
+
+	/*
+	 * Search PFN in the cxl_mce_records, if already exists, don't continue
+	 * to do memory_failure() to avoid a poison address being reported
+	 * more than once.
+	 */
+	if (cxl_mce_recorded(hpa))
+		return NOTIFY_STOP;
+	else
+		return NOTIFY_OK;
+}
+
 struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev)
 {
 	struct cxl_memdev_state *mds;
@@ -1476,6 +1587,10 @@ struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev)
 	mds->ram_perf.qos_class = CXL_QOS_CLASS_INVALID;
 	mds->pmem_perf.qos_class = CXL_QOS_CLASS_INVALID;
 
+	mds->mce_notifier.notifier_call = cxl_handle_mce;
+	mds->mce_notifier.priority = MCE_PRIO_CXL;
+	mce_register_decode_chain(&mds->mce_notifier);
+
 	return mds;
 }
 EXPORT_SYMBOL_NS_GPL(cxl_memdev_state_create, CXL);
diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
index 0277726afd04..9d4ed4dc4d51 100644
--- a/drivers/cxl/core/memdev.c
+++ b/drivers/cxl/core/memdev.c
@@ -376,10 +376,14 @@ int cxl_clear_poison(struct cxl_memdev *cxlmd, u64 dpa)
 		goto out;
 
 	cxlr = cxl_dpa_to_region(cxlmd, dpa);
-	if (cxlr)
+	if (cxlr) {
+		u64 hpa = cxl_dpa_to_hpa(cxlr, cxlmd, dpa);
+
+		cxl_mce_clear(hpa);
 		dev_warn_once(mds->cxlds.dev,
 			      "poison clear dpa:%#llx region: %s\n", dpa,
 			      dev_name(&cxlr->dev));
+	}
 
 	record = (struct cxl_poison_record) {
 		.address = cpu_to_le64(dpa),
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 5c4810dcbdeb..d2d906c26755 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -502,6 +502,7 @@ struct cxl_memdev_state {
 	struct cxl_fw_state fw;
 
 	struct rcuwait mbox_wait;
+	struct notifier_block mce_notifier;
 	int (*mbox_send)(struct cxl_memdev_state *mds,
 			 struct cxl_mbox_cmd *cmd);
 };
@@ -837,6 +838,8 @@ int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
 int cxl_trigger_poison_list(struct cxl_memdev *cxlmd);
 int cxl_inject_poison(struct cxl_memdev *cxlmd, u64 dpa);
 int cxl_clear_poison(struct cxl_memdev *cxlmd, u64 dpa);
+bool cxl_mce_recorded(u64 hpa);
+void cxl_mce_clear(u64 hpa);
 
 #ifdef CONFIG_CXL_SUSPEND
 void cxl_mem_active_inc(void);
-- 
2.34.1




* Re: [PATCH v4 1/2] cxl/core: introduce device reporting poison hanlding
  2024-08-08 15:13 ` [PATCH v4 1/2] cxl/core: introduce device reporting poison hanlding Shiyang Ruan via
@ 2024-08-08 18:28   ` Fan Ni
  2024-08-21 13:57     ` Shiyang Ruan via
  0 siblings, 1 reply; 8+ messages in thread
From: Fan Ni @ 2024-08-08 18:28 UTC (permalink / raw)
  To: Shiyang Ruan
  Cc: qemu-devel, linux-cxl, linux-edac, linux-mm, dan.j.williams,
	vishal.l.verma, Jonathan.Cameron, alison.schofield, bp,
	dave.jiang, dave, ira.weiny, james.morse, linmiaohe, mchehab,
	nao.horiguchi, rric, tony.luck

On Thu, Aug 08, 2024 at 11:13:27PM +0800, Shiyang Ruan wrote:
> CXL device can find&report memory problems, even before MCE is detected
> by CPU.  AFAIK, the current kernel only traces POISON error event
> from FW-First/OS-First path, but it doesn't handle them, neither
> notify processes who are using the POISON page like MCE does.
> 
> Thus, user have to read logs from trace and find out which device
> reported the error and which applications are affected.  That is not
> an easy work and cannot be handled in time.  Thus, it is needed to add
> the feature to make the work done automatically and quickly.  Once CXL
> device reports the POISON error (via FW-First/OS-First), kernel
> handles it immediately, similar to the flow when a MCE is triggered.
> 
> The current call trace of error reporting&handling looks like this:
> ```
> 1.  MCE (interrupt #18, while CPU consuming POISON)
>      -> do_machine_check()
>        -> mce_log()
>          -> notify chain (x86_mce_decoder_chain)
>            -> memory_failure()
> 
> 2.a FW-First (optional, CXL device proactively find&report)
>      -> CXL device -> Firmware
>        -> OS: ACPI->APEI->GHES->CPER -> CXL driver -> trace
>                                                   \-> memory_failure()
>                                                       ^----- ADD
> 2.b OS-First (optional, CXL device proactively find&report)
>      -> CXL device -> MSI
>        -> OS: CXL driver -> trace
>                         \-> memory_failure()
>                             ^------------------------------- ADD
> ```
> This patch adds calling memory_failure() while CXL device reporting
> error is received, marked as "ADD" in figure above.
> 
> Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
> ---
>  drivers/cxl/core/mbox.c   | 75 ++++++++++++++++++++++++++++++++-------
>  drivers/cxl/cxlmem.h      |  8 ++---
>  drivers/cxl/pci.c         |  4 +--
>  include/linux/cxl-event.h | 16 ++++++++-
>  4 files changed, 83 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index e5cdeafdf76e..0cb6ef2e6600 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -849,10 +849,55 @@ int cxl_enumerate_cmds(struct cxl_memdev_state *mds)
>  }
>  EXPORT_SYMBOL_NS_GPL(cxl_enumerate_cmds, CXL);
>  
> -void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
> -			    enum cxl_event_log_type type,
> -			    enum cxl_event_type event_type,
> -			    const uuid_t *uuid, union cxl_event *evt)
> +static void cxl_report_poison(struct cxl_memdev *cxlmd, u64 hpa)
> +{
> +	unsigned long pfn = PHYS_PFN(hpa);
> +
> +	memory_failure_queue(pfn, 0);
> +}
> +
> +static void cxl_event_handle_general_media(struct cxl_memdev *cxlmd,
> +					   enum cxl_event_log_type type,
> +					   u64 hpa,
> +					   struct cxl_event_gen_media *rec)
> +{
> +	if (type == CXL_EVENT_TYPE_FAIL) {
> +		switch (rec->media_hdr.transaction_type) {
> +		case CXL_EVENT_TRANSACTION_READ:
> +		case CXL_EVENT_TRANSACTION_WRITE:
> +		case CXL_EVENT_TRANSACTION_SCAN_MEDIA:
> +		case CXL_EVENT_TRANSACTION_INJECT_POISON:
> +			cxl_report_poison(cxlmd, hpa);
> +			break;
> +		default:
> +			break;
> +		}
> +	}
> +}
> +
> +static void cxl_event_handle_dram(struct cxl_memdev *cxlmd,
> +				  enum cxl_event_log_type type,
> +				  u64 hpa,
> +				  struct cxl_event_dram *rec)
> +{
> +	if (type == CXL_EVENT_TYPE_FAIL) {
> +		switch (rec->media_hdr.transaction_type) {
> +		case CXL_EVENT_TRANSACTION_READ:
> +		case CXL_EVENT_TRANSACTION_WRITE:
> +		case CXL_EVENT_TRANSACTION_SCAN_MEDIA:
> +		case CXL_EVENT_TRANSACTION_INJECT_POISON:
> +			cxl_report_poison(cxlmd, hpa);
> +			break;
> +		default:
> +			break;
> +		}
> +	}
> +}
> +
> +void cxl_event_handle_record(struct cxl_memdev *cxlmd,
> +			     enum cxl_event_log_type type,
> +			     enum cxl_event_type event_type,
> +			     const uuid_t *uuid, union cxl_event *evt)
>  {
>  	if (event_type == CXL_CPER_EVENT_MEM_MODULE) {
>  		trace_cxl_memory_module(cxlmd, type, &evt->mem_module);
> @@ -880,18 +925,22 @@ void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
>  		if (cxlr)
>  			hpa = cxl_dpa_to_hpa(cxlr, cxlmd, dpa);
>  
> -		if (event_type == CXL_CPER_EVENT_GEN_MEDIA)
> +		if (event_type == CXL_CPER_EVENT_GEN_MEDIA) {
>  			trace_cxl_general_media(cxlmd, type, cxlr, hpa,
>  						&evt->gen_media);
> -		else if (event_type == CXL_CPER_EVENT_DRAM)
> +			cxl_event_handle_general_media(cxlmd, type, hpa,
> +						&evt->gen_media);
> +		} else if (event_type == CXL_CPER_EVENT_DRAM) {
>  			trace_cxl_dram(cxlmd, type, cxlr, hpa, &evt->dram);
> +			cxl_event_handle_dram(cxlmd, type, hpa, &evt->dram);

Does it make sense to call the trace function in
cxl_event_handle_dram/general_media and replace the trace function with
the handle_* here?

> +		}
>  	}
>  }
> -EXPORT_SYMBOL_NS_GPL(cxl_event_trace_record, CXL);
> +EXPORT_SYMBOL_NS_GPL(cxl_event_handle_record, CXL);
>  
> -static void __cxl_event_trace_record(const struct cxl_memdev *cxlmd,
> -				     enum cxl_event_log_type type,
> -				     struct cxl_event_record_raw *record)
> +static void __cxl_event_handle_record(struct cxl_memdev *cxlmd,
> +				      enum cxl_event_log_type type,
> +				      struct cxl_event_record_raw *record)
>  {
>  	enum cxl_event_type ev_type = CXL_CPER_EVENT_GENERIC;
>  	const uuid_t *uuid = &record->id;
> @@ -903,7 +952,7 @@ static void __cxl_event_trace_record(const struct cxl_memdev *cxlmd,
>  	else if (uuid_equal(uuid, &CXL_EVENT_MEM_MODULE_UUID))
>  		ev_type = CXL_CPER_EVENT_MEM_MODULE;
>  
> -	cxl_event_trace_record(cxlmd, type, ev_type, uuid, &record->event);
> +	cxl_event_handle_record(cxlmd, type, ev_type, uuid, &record->event);
>  }
>  
>  static int cxl_clear_event_record(struct cxl_memdev_state *mds,
> @@ -1012,8 +1061,8 @@ static void cxl_mem_get_records_log(struct cxl_memdev_state *mds,
>  			break;
>  
>  		for (i = 0; i < nr_rec; i++)
> -			__cxl_event_trace_record(cxlmd, type,
> -						 &payload->records[i]);
> +			__cxl_event_handle_record(cxlmd, type,
> +						  &payload->records[i]);
>  
>  		if (payload->flags & CXL_GET_EVENT_FLAG_OVERFLOW)
>  			trace_cxl_overflow(cxlmd, type, payload);
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index afb53d058d62..5c4810dcbdeb 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -826,10 +826,10 @@ void set_exclusive_cxl_commands(struct cxl_memdev_state *mds,
>  void clear_exclusive_cxl_commands(struct cxl_memdev_state *mds,
>  				  unsigned long *cmds);
>  void cxl_mem_get_event_records(struct cxl_memdev_state *mds, u32 status);
> -void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
> -			    enum cxl_event_log_type type,
> -			    enum cxl_event_type event_type,
> -			    const uuid_t *uuid, union cxl_event *evt);
> +void cxl_event_handle_record(struct cxl_memdev *cxlmd,
> +			     enum cxl_event_log_type type,
> +			     enum cxl_event_type event_type,
> +			     const uuid_t *uuid, union cxl_event *evt);
>  int cxl_set_timestamp(struct cxl_memdev_state *mds);
>  int cxl_poison_state_init(struct cxl_memdev_state *mds);
>  int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index 4be35dc22202..6e65ca89f666 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -1029,8 +1029,8 @@ static void cxl_handle_cper_event(enum cxl_event_type ev_type,
>  	hdr_flags = get_unaligned_le24(rec->event.generic.hdr.flags);
>  	log_type = FIELD_GET(CXL_EVENT_HDR_FLAGS_REC_SEVERITY, hdr_flags);
>  
> -	cxl_event_trace_record(cxlds->cxlmd, log_type, ev_type,
> -			       &uuid_null, &rec->event);
> +	cxl_event_handle_record(cxlds->cxlmd, log_type, ev_type,
> +				&uuid_null, &rec->event);
>  }
>  
>  static void cxl_cper_work_fn(struct work_struct *work)
> diff --git a/include/linux/cxl-event.h b/include/linux/cxl-event.h
> index 0bea1afbd747..be4342a2b597 100644
> --- a/include/linux/cxl-event.h
> +++ b/include/linux/cxl-event.h
> @@ -7,6 +7,20 @@
>  #include <linux/uuid.h>
>  #include <linux/workqueue_types.h>
>  
> +/*
> + * Event transaction type
> + * CXL rev 3.0 Section 8.2.9.2.1.1; Table 8-43

Here and below, update the specification reference to reflect CXL 3.1.

Fan
> + */
> +enum cxl_event_transaction_type {
> +	CXL_EVENT_TRANSACTION_UNKNOWN = 0X00,
> +	CXL_EVENT_TRANSACTION_READ,
> +	CXL_EVENT_TRANSACTION_WRITE,
> +	CXL_EVENT_TRANSACTION_SCAN_MEDIA,
> +	CXL_EVENT_TRANSACTION_INJECT_POISON,
> +	CXL_EVENT_TRANSACTION_MEDIA_SCRUB,
> +	CXL_EVENT_TRANSACTION_MEDIA_MANAGEMENT,
> +};
> +
>  /*
>   * Common Event Record Format
>   * CXL rev 3.0 section 8.2.9.2.1; Table 8-42
> @@ -26,7 +40,7 @@ struct cxl_event_media_hdr {
>  	__le64 phys_addr;
>  	u8 descriptor;
>  	u8 type;
> -	u8 transaction_type;
> +	u8 transaction_type;	/* enum cxl_event_transaction_type */
>  	/*
>  	 * The meaning of Validity Flags from bit 2 is
>  	 * different across DRAM and General Media records
> -- 
> 2.34.1
> 
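
For readers following the thread, the enum added above can be read as a
decoder for the raw transaction_type byte in struct cxl_event_media_hdr.
A minimal sketch, assuming nothing beyond the enum quoted in the patch:
the helper name cxl_transaction_type_name() and the strings it returns are
hypothetical and are not part of the posted series.

```c
#include <assert.h>	/* static_assert */
#include <stdint.h>

/* Copied from the quoted patch; values follow declaration order. */
enum cxl_event_transaction_type {
	CXL_EVENT_TRANSACTION_UNKNOWN = 0x00,
	CXL_EVENT_TRANSACTION_READ,
	CXL_EVENT_TRANSACTION_WRITE,
	CXL_EVENT_TRANSACTION_SCAN_MEDIA,
	CXL_EVENT_TRANSACTION_INJECT_POISON,
	CXL_EVENT_TRANSACTION_MEDIA_SCRUB,
	CXL_EVENT_TRANSACTION_MEDIA_MANAGEMENT,
};

/* Document the implicit numbering so a reordering is caught at build time. */
static_assert(CXL_EVENT_TRANSACTION_SCAN_MEDIA == 0x03,
	      "scan media is the fourth enumerator");
static_assert(CXL_EVENT_TRANSACTION_MEDIA_MANAGEMENT == 0x06,
	      "media management is the last enumerator");

/* Hypothetical helper: map the raw u8 field to a printable name. */
static inline const char *cxl_transaction_type_name(uint8_t raw)
{
	switch (raw) {
	case CXL_EVENT_TRANSACTION_UNKNOWN:		return "unknown";
	case CXL_EVENT_TRANSACTION_READ:		return "read";
	case CXL_EVENT_TRANSACTION_WRITE:		return "write";
	case CXL_EVENT_TRANSACTION_SCAN_MEDIA:		return "scan media";
	case CXL_EVENT_TRANSACTION_INJECT_POISON:	return "inject poison";
	case CXL_EVENT_TRANSACTION_MEDIA_SCRUB:		return "media scrub";
	case CXL_EVENT_TRANSACTION_MEDIA_MANAGEMENT:	return "media management";
	default:					return "reserved";
	}
}
```

The field stays a u8 in the record layout, so any value past the last
enumerator has to be handled explicitly; the sketch maps those to "reserved".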



* Re: [PATCH v4 2/2] cxl: avoid duplicated report from MCE & device
  2024-08-08 15:13 ` [PATCH v4 2/2] cxl: avoid duplicated report from MCE & device Shiyang Ruan via
@ 2024-08-09  7:31   ` kernel test robot
  2024-08-09  7:31   ` kernel test robot
  2024-08-09 11:48   ` kernel test robot
  2 siblings, 0 replies; 8+ messages in thread
From: kernel test robot @ 2024-08-09  7:31 UTC (permalink / raw)
  To: Shiyang Ruan, qemu-devel, linux-cxl, linux-edac, linux-mm,
	dan.j.williams, vishal.l.verma, Jonathan.Cameron,
	alison.schofield
  Cc: oe-kbuild-all, bp, dave.jiang, dave, ira.weiny, james.morse,
	linmiaohe, mchehab, nao.horiguchi, rric, tony.luck, ruansy.fnst

Hi Shiyang,

kernel test robot noticed the following build errors:

[auto build test ERROR on tip/x86/core]
[also build test ERROR on cxl/next linus/master v6.11-rc2 next-20240809]
[cannot apply to cxl/pending]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Shiyang-Ruan/cxl-core-introduce-device-reporting-poison-hanlding/20240809-013658
base:   tip/x86/core
patch link:    https://lore.kernel.org/r/20240808151328.707869-3-ruansy.fnst%40fujitsu.com
patch subject: [PATCH v4 2/2] cxl: avoid duplicated report from MCE & device
config: um-allyesconfig (https://download.01.org/0day-ci/archive/20240809/202408091537.p9RKx1R2-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240809/202408091537.p9RKx1R2-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202408091537.p9RKx1R2-lkp@intel.com/

All error/warnings (new ones prefixed by >>):

   In file included from drivers/cxl/core/mbox.c:8:
>> arch/x86/include/asm/mce.h:219:43: warning: 'struct cpuinfo_x86' declared inside parameter list will not be visible outside of this definition or declaration
     219 | static inline void mcheck_cpu_init(struct cpuinfo_x86 *c) {}
         |                                           ^~~~~~~~~~~
   arch/x86/include/asm/mce.h:220:44: warning: 'struct cpuinfo_x86' declared inside parameter list will not be visible outside of this definition or declaration
     220 | static inline void mcheck_cpu_clear(struct cpuinfo_x86 *c) {}
         |                                            ^~~~~~~~~~~
   arch/x86/include/asm/mce.h:240:50: warning: 'struct cpuinfo_x86' declared inside parameter list will not be visible outside of this definition or declaration
     240 | static inline void mce_intel_feature_init(struct cpuinfo_x86 *c) { }
         |                                                  ^~~~~~~~~~~
   arch/x86/include/asm/mce.h:241:51: warning: 'struct cpuinfo_x86' declared inside parameter list will not be visible outside of this definition or declaration
     241 | static inline void mce_intel_feature_clear(struct cpuinfo_x86 *c) { }
         |                                                   ^~~~~~~~~~~
   arch/x86/include/asm/mce.h:248:26: warning: 'struct cpuinfo_x86' declared inside parameter list will not be visible outside of this definition or declaration
     248 | int mce_available(struct cpuinfo_x86 *c);
         |                          ^~~~~~~~~~~
   arch/x86/include/asm/mce.h:355:48: warning: 'struct cpuinfo_x86' declared inside parameter list will not be visible outside of this definition or declaration
     355 | static inline void mce_amd_feature_init(struct cpuinfo_x86 *c)          { }
         |                                                ^~~~~~~~~~~
   arch/x86/include/asm/mce.h:358:50: warning: 'struct cpuinfo_x86' declared inside parameter list will not be visible outside of this definition or declaration
     358 | static inline void mce_hygon_feature_init(struct cpuinfo_x86 *c)        { return mce_amd_feature_init(c); }
         |                                                  ^~~~~~~~~~~
   arch/x86/include/asm/mce.h: In function 'mce_hygon_feature_init':
>> arch/x86/include/asm/mce.h:358:103: error: passing argument 1 of 'mce_amd_feature_init' from incompatible pointer type [-Werror=incompatible-pointer-types]
     358 | static inline void mce_hygon_feature_init(struct cpuinfo_x86 *c)        { return mce_amd_feature_init(c); }
         |                                                                                                       ^
         |                                                                                                       |
         |                                                                                                       struct cpuinfo_x86 *
   arch/x86/include/asm/mce.h:355:61: note: expected 'struct cpuinfo_x86 *' but argument is of type 'struct cpuinfo_x86 *'
     355 | static inline void mce_amd_feature_init(struct cpuinfo_x86 *c)          { }
         |                                         ~~~~~~~~~~~~~~~~~~~~^
   In file included from include/linux/container_of.h:5,
                    from include/linux/list.h:5,
                    from include/linux/key.h:14,
                    from include/linux/security.h:27,
                    from drivers/cxl/core/mbox.c:3:
   drivers/cxl/core/mbox.c: In function 'cxl_handle_mce':
>> arch/x86/include/asm/mce.h:94:58: error: 'struct cpuinfo_um' has no member named 'x86_phys_bits'
      94 | #define MCI_ADDR_PHYSADDR       GENMASK_ULL(boot_cpu_data.x86_phys_bits - 1, 0)
         |                                                          ^
   include/linux/build_bug.h:16:62: note: in definition of macro 'BUILD_BUG_ON_ZERO'
      16 | #define BUILD_BUG_ON_ZERO(e) ((int)(sizeof(struct { int:(-!!(e)); })))
         |                                                              ^
   include/linux/bits.h:25:17: note: in expansion of macro '__is_constexpr'
      25 |                 __is_constexpr((l) > (h)), (l) > (h), 0)))
         |                 ^~~~~~~~~~~~~~
   include/linux/bits.h:37:10: note: in expansion of macro 'GENMASK_INPUT_CHECK'
      37 |         (GENMASK_INPUT_CHECK(h, l) + __GENMASK_ULL(h, l))
         |          ^~~~~~~~~~~~~~~~~~~
   arch/x86/include/asm/mce.h:94:33: note: in expansion of macro 'GENMASK_ULL'
      94 | #define MCI_ADDR_PHYSADDR       GENMASK_ULL(boot_cpu_data.x86_phys_bits - 1, 0)
         |                                 ^~~~~~~~~~~
   drivers/cxl/core/mbox.c:1553:27: note: in expansion of macro 'MCI_ADDR_PHYSADDR'
    1553 |         hpa = mce->addr & MCI_ADDR_PHYSADDR;
         |                           ^~~~~~~~~~~~~~~~~
>> arch/x86/include/asm/mce.h:94:58: error: 'struct cpuinfo_um' has no member named 'x86_phys_bits'
      94 | #define MCI_ADDR_PHYSADDR       GENMASK_ULL(boot_cpu_data.x86_phys_bits - 1, 0)
         |                                                          ^
   include/linux/build_bug.h:16:62: note: in definition of macro 'BUILD_BUG_ON_ZERO'
      16 | #define BUILD_BUG_ON_ZERO(e) ((int)(sizeof(struct { int:(-!!(e)); })))
         |                                                              ^
   include/linux/bits.h:37:10: note: in expansion of macro 'GENMASK_INPUT_CHECK'
      37 |         (GENMASK_INPUT_CHECK(h, l) + __GENMASK_ULL(h, l))
         |          ^~~~~~~~~~~~~~~~~~~
   arch/x86/include/asm/mce.h:94:33: note: in expansion of macro 'GENMASK_ULL'
      94 | #define MCI_ADDR_PHYSADDR       GENMASK_ULL(boot_cpu_data.x86_phys_bits - 1, 0)
         |                                 ^~~~~~~~~~~
   drivers/cxl/core/mbox.c:1553:27: note: in expansion of macro 'MCI_ADDR_PHYSADDR'
    1553 |         hpa = mce->addr & MCI_ADDR_PHYSADDR;
         |                           ^~~~~~~~~~~~~~~~~
   include/linux/bits.h:24:28: error: first argument to '__builtin_choose_expr' not a constant
      24 |         (BUILD_BUG_ON_ZERO(__builtin_choose_expr( \
         |                            ^~~~~~~~~~~~~~~~~~~~~
   include/linux/build_bug.h:16:62: note: in definition of macro 'BUILD_BUG_ON_ZERO'
      16 | #define BUILD_BUG_ON_ZERO(e) ((int)(sizeof(struct { int:(-!!(e)); })))
         |                                                              ^
   include/linux/bits.h:37:10: note: in expansion of macro 'GENMASK_INPUT_CHECK'
      37 |         (GENMASK_INPUT_CHECK(h, l) + __GENMASK_ULL(h, l))
         |          ^~~~~~~~~~~~~~~~~~~
   arch/x86/include/asm/mce.h:94:33: note: in expansion of macro 'GENMASK_ULL'
      94 | #define MCI_ADDR_PHYSADDR       GENMASK_ULL(boot_cpu_data.x86_phys_bits - 1, 0)
         |                                 ^~~~~~~~~~~
   drivers/cxl/core/mbox.c:1553:27: note: in expansion of macro 'MCI_ADDR_PHYSADDR'
    1553 |         hpa = mce->addr & MCI_ADDR_PHYSADDR;
         |                           ^~~~~~~~~~~~~~~~~
   include/linux/build_bug.h:16:51: error: bit-field '<anonymous>' width not an integer constant
      16 | #define BUILD_BUG_ON_ZERO(e) ((int)(sizeof(struct { int:(-!!(e)); })))
         |                                                   ^
   include/linux/bits.h:24:10: note: in expansion of macro 'BUILD_BUG_ON_ZERO'
      24 |         (BUILD_BUG_ON_ZERO(__builtin_choose_expr( \
         |          ^~~~~~~~~~~~~~~~~
   include/linux/bits.h:37:10: note: in expansion of macro 'GENMASK_INPUT_CHECK'
      37 |         (GENMASK_INPUT_CHECK(h, l) + __GENMASK_ULL(h, l))
         |          ^~~~~~~~~~~~~~~~~~~
   arch/x86/include/asm/mce.h:94:33: note: in expansion of macro 'GENMASK_ULL'
      94 | #define MCI_ADDR_PHYSADDR       GENMASK_ULL(boot_cpu_data.x86_phys_bits - 1, 0)
         |                                 ^~~~~~~~~~~
   drivers/cxl/core/mbox.c:1553:27: note: in expansion of macro 'MCI_ADDR_PHYSADDR'
    1553 |         hpa = mce->addr & MCI_ADDR_PHYSADDR;
         |                           ^~~~~~~~~~~~~~~~~
   In file included from include/linux/bits.h:7,
                    from include/linux/ratelimit_types.h:5,
                    from include/linux/printk.h:9,
                    from include/asm-generic/bug.h:22,
                    from ./arch/um/include/generated/asm/bug.h:1,
                    from include/linux/bug.h:5,
                    from include/linux/thread_info.h:13,
                    from include/asm-generic/preempt.h:5,
                    from ./arch/um/include/generated/asm/preempt.h:1,
                    from include/linux/preempt.h:79,
                    from include/linux/rcupdate.h:27,
                    from include/linux/rbtree.h:24,
                    from include/linux/key.h:15:
>> arch/x86/include/asm/mce.h:94:58: error: 'struct cpuinfo_um' has no member named 'x86_phys_bits'
      94 | #define MCI_ADDR_PHYSADDR       GENMASK_ULL(boot_cpu_data.x86_phys_bits - 1, 0)
         |                                                          ^
   include/uapi/linux/bits.h:13:52: note: in definition of macro '__GENMASK_ULL'
      13 |          (~_ULL(0) >> (__BITS_PER_LONG_LONG - 1 - (h))))
         |                                                    ^
   arch/x86/include/asm/mce.h:94:33: note: in expansion of macro 'GENMASK_ULL'
      94 | #define MCI_ADDR_PHYSADDR       GENMASK_ULL(boot_cpu_data.x86_phys_bits - 1, 0)
         |                                 ^~~~~~~~~~~
   drivers/cxl/core/mbox.c:1553:27: note: in expansion of macro 'MCI_ADDR_PHYSADDR'
    1553 |         hpa = mce->addr & MCI_ADDR_PHYSADDR;
         |                           ^~~~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors
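
The failing statement at drivers/cxl/core/mbox.c:1553 masks the MCE address
down to the physical address width. On this um config, boot_cpu_data is a
'struct cpuinfo_um' with no x86_phys_bits member, so MCI_ADDR_PHYSADDR cannot
expand. The arithmetic itself can be sketched in plain C. This is an
illustration only: demo_genmask_ull() and demo_mask_phys_addr() are
hypothetical names, and phys_bits is passed as a parameter here instead of
being read from boot_cpu_data.

```c
#include <assert.h>
#include <stdint.h>

/* Build a mask with bits [high:low] set, for 0 <= low <= high <= 63. */
static inline uint64_t demo_genmask_ull(unsigned int high, unsigned int low)
{
	assert(low <= high && high <= 63);
	/* Same shape as the __GENMASK_ULL expansion shown in the log above. */
	return (~UINT64_C(0) << low) & (~UINT64_C(0) >> (63 - high));
}

/* Keep only the low phys_bits bits of addr, for 1 <= phys_bits <= 64. */
static inline uint64_t demo_mask_phys_addr(uint64_t addr, unsigned int phys_bits)
{
	assert(phys_bits >= 1 && phys_bits <= 64);
	return addr & demo_genmask_ull(phys_bits - 1, 0);
}
```

With phys_bits = 46, for example, an all-ones address is reduced to
(1ULL << 46) - 1. Because the width is a runtime value, the compile-time
range check inside GENMASK_ULL has nothing constant to evaluate once the
member lookup fails, which fits the follow-on '__builtin_choose_expr' and
bit-field errors in the log.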


vim +/mce_amd_feature_init +358 arch/x86/include/asm/mce.h

4a24d80b8c3e9f arch/x86/include/asm/mce.h Smita Koralahalli         2020-11-19  210  
58995d2d58e8e5 arch/x86/include/asm/mce.h Hidetoshi Seto            2009-06-15  211  #ifdef CONFIG_X86_MCE
a2202aa29289db arch/x86/include/asm/mce.h Yong Wang                 2009-11-10  212  int mcheck_init(void);
5e09954a9acc3b arch/x86/include/asm/mce.h Borislav Petkov           2009-10-16  213  void mcheck_cpu_init(struct cpuinfo_x86 *c);
8838eb6c0bf3b6 arch/x86/include/asm/mce.h Ashok Raj                 2015-08-12  214  void mcheck_cpu_clear(struct cpuinfo_x86 *c);
4a24d80b8c3e9f arch/x86/include/asm/mce.h Smita Koralahalli         2020-11-19  215  int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info,
4a24d80b8c3e9f arch/x86/include/asm/mce.h Smita Koralahalli         2020-11-19  216  			       u64 lapic_id);
58995d2d58e8e5 arch/x86/include/asm/mce.h Hidetoshi Seto            2009-06-15  217  #else
a2202aa29289db arch/x86/include/asm/mce.h Yong Wang                 2009-11-10  218  static inline int mcheck_init(void) { return 0; }
5e09954a9acc3b arch/x86/include/asm/mce.h Borislav Petkov           2009-10-16 @219  static inline void mcheck_cpu_init(struct cpuinfo_x86 *c) {}
8838eb6c0bf3b6 arch/x86/include/asm/mce.h Ashok Raj                 2015-08-12  220  static inline void mcheck_cpu_clear(struct cpuinfo_x86 *c) {}
4a24d80b8c3e9f arch/x86/include/asm/mce.h Smita Koralahalli         2020-11-19  221  static inline int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info,
4a24d80b8c3e9f arch/x86/include/asm/mce.h Smita Koralahalli         2020-11-19  222  					     u64 lapic_id) { return -EINVAL; }
58995d2d58e8e5 arch/x86/include/asm/mce.h Hidetoshi Seto            2009-06-15  223  #endif
58995d2d58e8e5 arch/x86/include/asm/mce.h Hidetoshi Seto            2009-06-15  224  
b5f2fa4ea00a17 arch/x86/include/asm/mce.h Andi Kleen                2009-02-12  225  void mce_setup(struct mce *m);
e2f430291fe23a include/asm-x86/mce.h      Thomas Gleixner           2007-10-17  226  void mce_log(struct mce *m);
d6126ef5f31ca5 arch/x86/include/asm/mce.h Greg Kroah-Hartman        2012-01-26  227  DECLARE_PER_CPU(struct device *, mce_device);
e2f430291fe23a include/asm-x86/mce.h      Thomas Gleixner           2007-10-17  228  
a0bc32b3cacf19 arch/x86/include/asm/mce.h Akshay Gupta              2020-08-28  229  /* Maximum number of MCA banks per CPU. */
a0bc32b3cacf19 arch/x86/include/asm/mce.h Akshay Gupta              2020-08-28  230  #define MAX_NR_BANKS 64
41fdff322e26c4 arch/x86/include/asm/mce.h Andi Kleen                2009-02-12  231  
e2f430291fe23a include/asm-x86/mce.h      Thomas Gleixner           2007-10-17  232  #ifdef CONFIG_X86_MCE_INTEL
e2f430291fe23a include/asm-x86/mce.h      Thomas Gleixner           2007-10-17  233  void mce_intel_feature_init(struct cpuinfo_x86 *c);
8838eb6c0bf3b6 arch/x86/include/asm/mce.h Ashok Raj                 2015-08-12  234  void mce_intel_feature_clear(struct cpuinfo_x86 *c);
88ccbedd9ca85d arch/x86/include/asm/mce.h Andi Kleen                2009-02-12  235  void cmci_clear(void);
88ccbedd9ca85d arch/x86/include/asm/mce.h Andi Kleen                2009-02-12  236  void cmci_reenable(void);
7a0c819d28f5c9 arch/x86/include/asm/mce.h Srivatsa S. Bhat          2013-03-20  237  void cmci_rediscover(void);
88ccbedd9ca85d arch/x86/include/asm/mce.h Andi Kleen                2009-02-12  238  void cmci_recheck(void);
e2f430291fe23a include/asm-x86/mce.h      Thomas Gleixner           2007-10-17  239  #else
e2f430291fe23a include/asm-x86/mce.h      Thomas Gleixner           2007-10-17  240  static inline void mce_intel_feature_init(struct cpuinfo_x86 *c) { }
8838eb6c0bf3b6 arch/x86/include/asm/mce.h Ashok Raj                 2015-08-12  241  static inline void mce_intel_feature_clear(struct cpuinfo_x86 *c) { }
88ccbedd9ca85d arch/x86/include/asm/mce.h Andi Kleen                2009-02-12  242  static inline void cmci_clear(void) {}
88ccbedd9ca85d arch/x86/include/asm/mce.h Andi Kleen                2009-02-12  243  static inline void cmci_reenable(void) {}
7a0c819d28f5c9 arch/x86/include/asm/mce.h Srivatsa S. Bhat          2013-03-20  244  static inline void cmci_rediscover(void) {}
88ccbedd9ca85d arch/x86/include/asm/mce.h Andi Kleen                2009-02-12  245  static inline void cmci_recheck(void) {}
e2f430291fe23a include/asm-x86/mce.h      Thomas Gleixner           2007-10-17  246  #endif
e2f430291fe23a include/asm-x86/mce.h      Thomas Gleixner           2007-10-17  247  
38736072d45488 arch/x86/include/asm/mce.h H. Peter Anvin            2009-05-28  248  int mce_available(struct cpuinfo_x86 *c);
2d1f406139ec20 arch/x86/include/asm/mce.h Borislav Petkov           2017-05-19  249  bool mce_is_memory_error(struct mce *m);
5d96c9342c23ee arch/x86/include/asm/mce.h Vishal Verma              2018-10-25  250  bool mce_is_correctable(struct mce *m);
1bae0cfe4a171c arch/x86/include/asm/mce.h Yazen Ghannam             2023-06-13  251  bool mce_usable_address(struct mce *m);
88ccbedd9ca85d arch/x86/include/asm/mce.h Andi Kleen                2009-02-12  252  
01ca79f1411eae arch/x86/include/asm/mce.h Andi Kleen                2009-05-27  253  DECLARE_PER_CPU(unsigned, mce_exception_count);
ca84f69697da0f arch/x86/include/asm/mce.h Andi Kleen                2009-05-27  254  DECLARE_PER_CPU(unsigned, mce_poll_count);
01ca79f1411eae arch/x86/include/asm/mce.h Andi Kleen                2009-05-27  255  
ee031c31d6381d arch/x86/include/asm/mce.h Andi Kleen                2009-02-12  256  typedef DECLARE_BITMAP(mce_banks_t, MAX_NR_BANKS);
ee031c31d6381d arch/x86/include/asm/mce.h Andi Kleen                2009-02-12  257  DECLARE_PER_CPU(mce_banks_t, mce_poll_banks);
ee031c31d6381d arch/x86/include/asm/mce.h Andi Kleen                2009-02-12  258  
b79109c3bbcf52 arch/x86/include/asm/mce.h Andi Kleen                2009-02-12  259  enum mcp_flags {
3f2f0680d1161d arch/x86/include/asm/mce.h Borislav Petkov           2015-01-13  260  	MCP_TIMESTAMP	= BIT(0),	/* log time stamp */
3f2f0680d1161d arch/x86/include/asm/mce.h Borislav Petkov           2015-01-13  261  	MCP_UC		= BIT(1),	/* log uncorrected errors */
3f2f0680d1161d arch/x86/include/asm/mce.h Borislav Petkov           2015-01-13  262  	MCP_DONTLOG	= BIT(2),	/* only clear, don't log */
3bff147b187d5d arch/x86/include/asm/mce.h Borislav Petkov           2021-08-23  263  	MCP_QUEUE_LOG	= BIT(3),	/* only queue to genpool */
b79109c3bbcf52 arch/x86/include/asm/mce.h Andi Kleen                2009-02-12  264  };
5b9d292ea87c83 arch/x86/include/asm/mce.h Yazen Ghannam             2024-05-23  265  
5b9d292ea87c83 arch/x86/include/asm/mce.h Yazen Ghannam             2024-05-23  266  void machine_check_poll(enum mcp_flags flags, mce_banks_t *b);
b79109c3bbcf52 arch/x86/include/asm/mce.h Andi Kleen                2009-02-12  267  
9ff36ee9668ff4 arch/x86/include/asm/mce.h Andi Kleen                2009-05-27  268  int mce_notify_irq(void);
e2f430291fe23a include/asm-x86/mce.h      Thomas Gleixner           2007-10-17  269  
ea149b36c7f511 arch/x86/include/asm/mce.h Andi Kleen                2009-04-29  270  DECLARE_PER_CPU(struct mce, injectm);
66f5ddf30a59f8 arch/x86/include/asm/mce.h Tony Luck                 2011-11-03  271  
c3d1fb567a634d arch/x86/include/asm/mce.h Naveen N Rao              2013-07-01  272  /* Disable CMCI/polling for MCA bank claimed by firmware */
c3d1fb567a634d arch/x86/include/asm/mce.h Naveen N Rao              2013-07-01  273  extern void mce_disable_bank(int bank);
c3d1fb567a634d arch/x86/include/asm/mce.h Naveen N Rao              2013-07-01  274  
58995d2d58e8e5 arch/x86/include/asm/mce.h Hidetoshi Seto            2009-06-15  275  /*
58995d2d58e8e5 arch/x86/include/asm/mce.h Hidetoshi Seto            2009-06-15  276   * Exception handler
58995d2d58e8e5 arch/x86/include/asm/mce.h Hidetoshi Seto            2009-06-15  277   */
8cd501c1facc15 arch/x86/include/asm/mce.h Thomas Gleixner           2020-02-25  278  void do_machine_check(struct pt_regs *pt_regs);
58995d2d58e8e5 arch/x86/include/asm/mce.h Hidetoshi Seto            2009-06-15  279  
58995d2d58e8e5 arch/x86/include/asm/mce.h Hidetoshi Seto            2009-06-15  280  /*
58995d2d58e8e5 arch/x86/include/asm/mce.h Hidetoshi Seto            2009-06-15  281   * Threshold handler
58995d2d58e8e5 arch/x86/include/asm/mce.h Hidetoshi Seto            2009-06-15  282   */
b276268631af3a arch/x86/include/asm/mce.h Andi Kleen                2009-02-12  283  extern void (*mce_threshold_vector)(void);
b276268631af3a arch/x86/include/asm/mce.h Andi Kleen                2009-02-12  284  
24fd78a81f6d3f arch/x86/include/asm/mce.h Aravind Gopalakrishnan    2015-05-06  285  /* Deferred error interrupt handler */
24fd78a81f6d3f arch/x86/include/asm/mce.h Aravind Gopalakrishnan    2015-05-06  286  extern void (*deferred_error_int_vector)(void);
24fd78a81f6d3f arch/x86/include/asm/mce.h Aravind Gopalakrishnan    2015-05-06  287  
d334a49113a4a3 arch/x86/include/asm/mce.h Huang Ying                2010-05-18  288  /*
d334a49113a4a3 arch/x86/include/asm/mce.h Huang Ying                2010-05-18  289   * Used by APEI to report memory error via /dev/mcelog
d334a49113a4a3 arch/x86/include/asm/mce.h Huang Ying                2010-05-18  290   */
d334a49113a4a3 arch/x86/include/asm/mce.h Huang Ying                2010-05-18  291  
d334a49113a4a3 arch/x86/include/asm/mce.h Huang Ying                2010-05-18  292  struct cper_sec_mem_err;
d334a49113a4a3 arch/x86/include/asm/mce.h Huang Ying                2010-05-18  293  extern void apei_mce_report_mem_error(int corrected,
d334a49113a4a3 arch/x86/include/asm/mce.h Huang Ying                2010-05-18  294  				      struct cper_sec_mem_err *mem_err);
d334a49113a4a3 arch/x86/include/asm/mce.h Huang Ying                2010-05-18  295  
be0aec23bf4624 arch/x86/include/asm/mce.h Aravind Gopalakrishnan    2016-03-07  296  /*
be0aec23bf4624 arch/x86/include/asm/mce.h Aravind Gopalakrishnan    2016-03-07  297   * Enumerate new IP types and HWID values in AMD processors which support
be0aec23bf4624 arch/x86/include/asm/mce.h Aravind Gopalakrishnan    2016-03-07  298   * Scalable MCA.
be0aec23bf4624 arch/x86/include/asm/mce.h Aravind Gopalakrishnan    2016-03-07  299   */
be0aec23bf4624 arch/x86/include/asm/mce.h Aravind Gopalakrishnan    2016-03-07  300  #ifdef CONFIG_X86_MCE_AMD
5896820e0aa325 arch/x86/include/asm/mce.h Yazen Ghannam             2016-09-12  301  
5896820e0aa325 arch/x86/include/asm/mce.h Yazen Ghannam             2016-09-12  302  /* These may be used by multiple smca_hwid_mcatypes */
5896820e0aa325 arch/x86/include/asm/mce.h Yazen Ghannam             2016-09-12  303  enum smca_bank_types {
5896820e0aa325 arch/x86/include/asm/mce.h Yazen Ghannam             2016-09-12  304  	SMCA_LS = 0,	/* Load Store */
94a311ce248e0b arch/x86/include/asm/mce.h Muralidhara M K           2021-05-26  305  	SMCA_LS_V2,
5896820e0aa325 arch/x86/include/asm/mce.h Yazen Ghannam             2016-09-12  306  	SMCA_IF,	/* Instruction Fetch */
5896820e0aa325 arch/x86/include/asm/mce.h Yazen Ghannam             2016-09-12  307  	SMCA_L2_CACHE,	/* L2 Cache */
5896820e0aa325 arch/x86/include/asm/mce.h Yazen Ghannam             2016-09-12  308  	SMCA_DE,	/* Decoder Unit */
68627a697c1959 arch/x86/include/asm/mce.h Yazen Ghannam             2018-02-21  309  	SMCA_RESERVED,	/* Reserved */
5896820e0aa325 arch/x86/include/asm/mce.h Yazen Ghannam             2016-09-12  310  	SMCA_EX,	/* Execution Unit */
5896820e0aa325 arch/x86/include/asm/mce.h Yazen Ghannam             2016-09-12  311  	SMCA_FP,	/* Floating Point */
5896820e0aa325 arch/x86/include/asm/mce.h Yazen Ghannam             2016-09-12  312  	SMCA_L3_CACHE,	/* L3 Cache */
5896820e0aa325 arch/x86/include/asm/mce.h Yazen Ghannam             2016-09-12  313  	SMCA_CS,	/* Coherent Slave */
94a311ce248e0b arch/x86/include/asm/mce.h Muralidhara M K           2021-05-26  314  	SMCA_CS_V2,
5896820e0aa325 arch/x86/include/asm/mce.h Yazen Ghannam             2016-09-12  315  	SMCA_PIE,	/* Power, Interrupts, etc. */
be0aec23bf4624 arch/x86/include/asm/mce.h Aravind Gopalakrishnan    2016-03-07  316  	SMCA_UMC,	/* Unified Memory Controller */
94a311ce248e0b arch/x86/include/asm/mce.h Muralidhara M K           2021-05-26  317  	SMCA_UMC_V2,
47b744ea5e3cf8 arch/x86/include/asm/mce.h Muralidhara M K           2023-11-02  318  	SMCA_MA_LLC,	/* Memory Attached Last Level Cache */
be0aec23bf4624 arch/x86/include/asm/mce.h Aravind Gopalakrishnan    2016-03-07  319  	SMCA_PB,	/* Parameter Block */
be0aec23bf4624 arch/x86/include/asm/mce.h Aravind Gopalakrishnan    2016-03-07  320  	SMCA_PSP,	/* Platform Security Processor */
94a311ce248e0b arch/x86/include/asm/mce.h Muralidhara M K           2021-05-26  321  	SMCA_PSP_V2,
be0aec23bf4624 arch/x86/include/asm/mce.h Aravind Gopalakrishnan    2016-03-07  322  	SMCA_SMU,	/* System Management Unit */
94a311ce248e0b arch/x86/include/asm/mce.h Muralidhara M K           2021-05-26  323  	SMCA_SMU_V2,
cbfa447edd6a38 arch/x86/include/asm/mce.h Yazen Ghannam             2019-02-01  324  	SMCA_MP5,	/* Microprocessor 5 Unit */
5176a93ab27aef arch/x86/include/asm/mce.h Yazen Ghannam             2021-12-16  325  	SMCA_MPDMA,	/* MPDMA Unit */
cbfa447edd6a38 arch/x86/include/asm/mce.h Yazen Ghannam             2019-02-01  326  	SMCA_NBIO,	/* Northbridge IO Unit */
cbfa447edd6a38 arch/x86/include/asm/mce.h Yazen Ghannam             2019-02-01  327  	SMCA_PCIE,	/* PCI Express Unit */
94a311ce248e0b arch/x86/include/asm/mce.h Muralidhara M K           2021-05-26  328  	SMCA_PCIE_V2,
94a311ce248e0b arch/x86/include/asm/mce.h Muralidhara M K           2021-05-26  329  	SMCA_XGMI_PCS,	/* xGMI PCS Unit */
5176a93ab27aef arch/x86/include/asm/mce.h Yazen Ghannam             2021-12-16  330  	SMCA_NBIF,	/* NBIF Unit */
5176a93ab27aef arch/x86/include/asm/mce.h Yazen Ghannam             2021-12-16  331  	SMCA_SHUB,	/* System HUB Unit */
5176a93ab27aef arch/x86/include/asm/mce.h Yazen Ghannam             2021-12-16  332  	SMCA_SATA,	/* SATA Unit */
5176a93ab27aef arch/x86/include/asm/mce.h Yazen Ghannam             2021-12-16  333  	SMCA_USB,	/* USB Unit */
47b744ea5e3cf8 arch/x86/include/asm/mce.h Muralidhara M K           2023-11-02  334  	SMCA_USR_DP,	/* Ultra Short Reach Data Plane Controller */
47b744ea5e3cf8 arch/x86/include/asm/mce.h Muralidhara M K           2023-11-02  335  	SMCA_USR_CP,	/* Ultra Short Reach Control Plane Controller */
5176a93ab27aef arch/x86/include/asm/mce.h Yazen Ghannam             2021-12-16  336  	SMCA_GMI_PCS,	/* GMI PCS Unit */
94a311ce248e0b arch/x86/include/asm/mce.h Muralidhara M K           2021-05-26  337  	SMCA_XGMI_PHY,	/* xGMI PHY Unit */
94a311ce248e0b arch/x86/include/asm/mce.h Muralidhara M K           2021-05-26  338  	SMCA_WAFL_PHY,	/* WAFL PHY Unit */
5176a93ab27aef arch/x86/include/asm/mce.h Yazen Ghannam             2021-12-16  339  	SMCA_GMI_PHY,	/* GMI PHY Unit */
5896820e0aa325 arch/x86/include/asm/mce.h Yazen Ghannam             2016-09-12  340  	N_SMCA_BANK_TYPES
be0aec23bf4624 arch/x86/include/asm/mce.h Aravind Gopalakrishnan    2016-03-07  341  };
be0aec23bf4624 arch/x86/include/asm/mce.h Aravind Gopalakrishnan    2016-03-07  342  
c6708d50f166be arch/x86/include/asm/mce.h Yazen Ghannam             2017-12-18  343  extern bool amd_mce_is_memory_error(struct mce *m);
e71c3978d6f976 arch/x86/include/asm/mce.h Linus Torvalds            2016-12-12  344  
4d7b02d58c4000 arch/x86/include/asm/mce.h Sebastian Andrzej Siewior 2016-11-10  345  extern int mce_threshold_create_device(unsigned int cpu);
4d7b02d58c4000 arch/x86/include/asm/mce.h Sebastian Andrzej Siewior 2016-11-10  346  extern int mce_threshold_remove_device(unsigned int cpu);
e71c3978d6f976 arch/x86/include/asm/mce.h Linus Torvalds            2016-12-12  347  
9308fd4074551f arch/x86/include/asm/mce.h Yazen Ghannam             2019-03-22  348  void mce_amd_feature_init(struct cpuinfo_x86 *c);
91f75eb481cfae arch/x86/include/asm/mce.h Yazen Ghannam             2021-12-16  349  enum smca_bank_types smca_get_bank_type(unsigned int cpu, unsigned int bank);
4d7b02d58c4000 arch/x86/include/asm/mce.h Sebastian Andrzej Siewior 2016-11-10  350  #else
5896820e0aa325 arch/x86/include/asm/mce.h Yazen Ghannam             2016-09-12  351  
4d7b02d58c4000 arch/x86/include/asm/mce.h Sebastian Andrzej Siewior 2016-11-10  352  static inline int mce_threshold_create_device(unsigned int cpu)		{ return 0; };
4d7b02d58c4000 arch/x86/include/asm/mce.h Sebastian Andrzej Siewior 2016-11-10  353  static inline int mce_threshold_remove_device(unsigned int cpu)		{ return 0; };
c6708d50f166be arch/x86/include/asm/mce.h Yazen Ghannam             2017-12-18  354  static inline bool amd_mce_is_memory_error(struct mce *m)		{ return false; };
9308fd4074551f arch/x86/include/asm/mce.h Yazen Ghannam             2019-03-22  355  static inline void mce_amd_feature_init(struct cpuinfo_x86 *c)		{ }
be0aec23bf4624 arch/x86/include/asm/mce.h Aravind Gopalakrishnan    2016-03-07  356  #endif
be0aec23bf4624 arch/x86/include/asm/mce.h Aravind Gopalakrishnan    2016-03-07  357  
9308fd4074551f arch/x86/include/asm/mce.h Yazen Ghannam             2019-03-22 @358  static inline void mce_hygon_feature_init(struct cpuinfo_x86 *c)	{ return mce_amd_feature_init(c); }
e9c2a283e7d9d4 arch/x86/include/asm/mce.h Arnd Bergmann             2023-05-16  359  
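
The listing above shows the usual stub pattern: real prototypes under the
config option, empty static inlines otherwise. The warnings at lines 219-358
arise because 'struct cpuinfo_x86' is first mentioned inside each parameter
list, so every prototype declares its own distinct type. That also explains
the odd-looking "expected 'struct cpuinfo_x86 *' but argument is of type
'struct cpuinfo_x86 *'" note. A minimal sketch of the same pattern with a
file-scope forward declaration follows. All demo_* names and
DEMO_CONFIG_MCE are hypothetical, and this is not a proposed fix for the
series.

```c
#include <assert.h>	/* assert(), for the usage check in the note below */
#include <stddef.h>	/* NULL */

/*
 * File-scope forward declaration. Without it, a struct tag first seen
 * inside a parameter list is scoped to that one prototype, and two
 * prototypes naming "the same" struct get incompatible pointer types.
 */
struct demo_cpuinfo;

#ifdef DEMO_CONFIG_MCE
/* Real implementation lives elsewhere when the feature is built in. */
int demo_amd_feature_init(struct demo_cpuinfo *c);
#else
/* Stub used when the feature is compiled out. */
static inline int demo_amd_feature_init(struct demo_cpuinfo *c)
{
	(void)c;
	return 0;
}
#endif

/* Wrapper in the style of mce_hygon_feature_init(): forwards to the above. */
static inline int demo_hygon_feature_init(struct demo_cpuinfo *c)
{
	return demo_amd_feature_init(c);
}
```

With DEMO_CONFIG_MCE undefined, assert(demo_hygon_feature_init(NULL) == 0)
holds and the translation unit builds without the parameter-list warning.
In the reported build the type is simply absent because the x86 header is
pulled into a um build, so the series would more likely need to avoid
including it there; that is an assumption, not something stated in the thread.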

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki



* Re: [PATCH v4 2/2] cxl: avoid duplicated report from MCE & device
  2024-08-08 15:13 ` [PATCH v4 2/2] cxl: avoid duplicated report from MCE & device Shiyang Ruan via
  2024-08-09  7:31   ` kernel test robot
@ 2024-08-09  7:31   ` kernel test robot
  2024-08-09 11:48   ` kernel test robot
  2 siblings, 0 replies; 8+ messages in thread
From: kernel test robot @ 2024-08-09  7:31 UTC (permalink / raw)
  To: Shiyang Ruan, qemu-devel, linux-cxl, linux-edac, linux-mm,
	dan.j.williams, vishal.l.verma, Jonathan.Cameron,
	alison.schofield
  Cc: llvm, oe-kbuild-all, bp, dave.jiang, dave, ira.weiny, james.morse,
	linmiaohe, mchehab, nao.horiguchi, rric, tony.luck, ruansy.fnst

Hi Shiyang,

kernel test robot noticed the following build errors:

[auto build test ERROR on tip/x86/core]
[also build test ERROR on cxl/next linus/master v6.11-rc2 next-20240809]
[cannot apply to cxl/pending]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Shiyang-Ruan/cxl-core-introduce-device-reporting-poison-hanlding/20240809-013658
base:   tip/x86/core
patch link:    https://lore.kernel.org/r/20240808151328.707869-3-ruansy.fnst%40fujitsu.com
patch subject: [PATCH v4 2/2] cxl: avoid duplicated report from MCE & device
config: um-allmodconfig (https://download.01.org/0day-ci/archive/20240809/202408091543.UNFvPFFl-lkp@intel.com/config)
compiler: clang version 20.0.0git (https://github.com/llvm/llvm-project f86594788ce93b696675c94f54016d27a6c21d18)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240809/202408091543.UNFvPFFl-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202408091543.UNFvPFFl-lkp@intel.com/

All error/warnings (new ones prefixed by >>):

   In file included from drivers/cxl/core/mbox.c:3:
   In file included from include/linux/security.h:33:
   In file included from include/linux/mm.h:2228:
   include/linux/vmstat.h:514:36: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
     514 |         return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_"
         |                               ~~~~~~~~~~~ ^ ~~~
   In file included from drivers/cxl/core/mbox.c:3:
   In file included from include/linux/security.h:35:
   In file included from include/linux/bpf.h:31:
   In file included from include/linux/memcontrol.h:13:
   In file included from include/linux/cgroup.h:25:
   In file included from include/linux/kernel_stat.h:8:
   In file included from include/linux/interrupt.h:11:
   In file included from include/linux/hardirq.h:11:
   In file included from arch/um/include/asm/hardirq.h:5:
   In file included from include/asm-generic/hardirq.h:17:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:14:
   In file included from arch/um/include/asm/io.h:24:
   include/asm-generic/io.h:548:31: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     548 |         val = __raw_readb(PCI_IOBASE + addr);
         |                           ~~~~~~~~~~ ^
   include/asm-generic/io.h:561:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     561 |         val = __le16_to_cpu((__le16 __force)__raw_readw(PCI_IOBASE + addr));
         |                                                         ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/little_endian.h:37:51: note: expanded from macro '__le16_to_cpu'
      37 | #define __le16_to_cpu(x) ((__force __u16)(__le16)(x))
         |                                                   ^
   In file included from drivers/cxl/core/mbox.c:3:
   In file included from include/linux/security.h:35:
   In file included from include/linux/bpf.h:31:
   In file included from include/linux/memcontrol.h:13:
   In file included from include/linux/cgroup.h:25:
   In file included from include/linux/kernel_stat.h:8:
   In file included from include/linux/interrupt.h:11:
   In file included from include/linux/hardirq.h:11:
   In file included from arch/um/include/asm/hardirq.h:5:
   In file included from include/asm-generic/hardirq.h:17:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:14:
   In file included from arch/um/include/asm/io.h:24:
   include/asm-generic/io.h:574:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     574 |         val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
         |                                                         ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/little_endian.h:35:51: note: expanded from macro '__le32_to_cpu'
      35 | #define __le32_to_cpu(x) ((__force __u32)(__le32)(x))
         |                                                   ^
   In file included from drivers/cxl/core/mbox.c:3:
   In file included from include/linux/security.h:35:
   In file included from include/linux/bpf.h:31:
   In file included from include/linux/memcontrol.h:13:
   In file included from include/linux/cgroup.h:25:
   In file included from include/linux/kernel_stat.h:8:
   In file included from include/linux/interrupt.h:11:
   In file included from include/linux/hardirq.h:11:
   In file included from arch/um/include/asm/hardirq.h:5:
   In file included from include/asm-generic/hardirq.h:17:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:14:
   In file included from arch/um/include/asm/io.h:24:
   include/asm-generic/io.h:585:33: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     585 |         __raw_writeb(value, PCI_IOBASE + addr);
         |                             ~~~~~~~~~~ ^
   include/asm-generic/io.h:595:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     595 |         __raw_writew((u16 __force)cpu_to_le16(value), PCI_IOBASE + addr);
         |                                                       ~~~~~~~~~~ ^
   include/asm-generic/io.h:605:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     605 |         __raw_writel((u32 __force)cpu_to_le32(value), PCI_IOBASE + addr);
         |                                                       ~~~~~~~~~~ ^
   include/asm-generic/io.h:693:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     693 |         readsb(PCI_IOBASE + addr, buffer, count);
         |                ~~~~~~~~~~ ^
   include/asm-generic/io.h:701:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     701 |         readsw(PCI_IOBASE + addr, buffer, count);
         |                ~~~~~~~~~~ ^
   include/asm-generic/io.h:709:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     709 |         readsl(PCI_IOBASE + addr, buffer, count);
         |                ~~~~~~~~~~ ^
   include/asm-generic/io.h:718:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     718 |         writesb(PCI_IOBASE + addr, buffer, count);
         |                 ~~~~~~~~~~ ^
   include/asm-generic/io.h:727:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     727 |         writesw(PCI_IOBASE + addr, buffer, count);
         |                 ~~~~~~~~~~ ^
   include/asm-generic/io.h:736:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
     736 |         writesl(PCI_IOBASE + addr, buffer, count);
         |                 ~~~~~~~~~~ ^
   In file included from drivers/cxl/core/mbox.c:8:
>> arch/x86/include/asm/mce.h:219:43: warning: declaration of 'struct cpuinfo_x86' will not be visible outside of this function [-Wvisibility]
     219 | static inline void mcheck_cpu_init(struct cpuinfo_x86 *c) {}
         |                                           ^
   arch/x86/include/asm/mce.h:220:44: warning: declaration of 'struct cpuinfo_x86' will not be visible outside of this function [-Wvisibility]
     220 | static inline void mcheck_cpu_clear(struct cpuinfo_x86 *c) {}
         |                                            ^
   arch/x86/include/asm/mce.h:240:50: warning: declaration of 'struct cpuinfo_x86' will not be visible outside of this function [-Wvisibility]
     240 | static inline void mce_intel_feature_init(struct cpuinfo_x86 *c) { }
         |                                                  ^
   arch/x86/include/asm/mce.h:241:51: warning: declaration of 'struct cpuinfo_x86' will not be visible outside of this function [-Wvisibility]
     241 | static inline void mce_intel_feature_clear(struct cpuinfo_x86 *c) { }
         |                                                   ^
   arch/x86/include/asm/mce.h:248:26: warning: declaration of 'struct cpuinfo_x86' will not be visible outside of this function [-Wvisibility]
     248 | int mce_available(struct cpuinfo_x86 *c);
         |                          ^
   arch/x86/include/asm/mce.h:355:48: warning: declaration of 'struct cpuinfo_x86' will not be visible outside of this function [-Wvisibility]
     355 | static inline void mce_amd_feature_init(struct cpuinfo_x86 *c)          { }
         |                                                ^
   arch/x86/include/asm/mce.h:358:50: warning: declaration of 'struct cpuinfo_x86' will not be visible outside of this function [-Wvisibility]
     358 | static inline void mce_hygon_feature_init(struct cpuinfo_x86 *c)        { return mce_amd_feature_init(c); }
         |                                                  ^
>> arch/x86/include/asm/mce.h:358:96: error: incompatible pointer types passing 'struct cpuinfo_x86 *' to parameter of type 'struct cpuinfo_x86 *' [-Werror,-Wincompatible-pointer-types]
     358 | static inline void mce_hygon_feature_init(struct cpuinfo_x86 *c)        { return mce_amd_feature_init(c); }
         |                                                                                                       ^
   arch/x86/include/asm/mce.h:355:61: note: passing argument to parameter 'c' here
     355 | static inline void mce_amd_feature_init(struct cpuinfo_x86 *c)          { }
         |                                                             ^
>> drivers/cxl/core/mbox.c:1553:20: error: no member named 'x86_phys_bits' in 'struct cpuinfo_um'
    1553 |         hpa = mce->addr & MCI_ADDR_PHYSADDR;
         |                           ^~~~~~~~~~~~~~~~~
   arch/x86/include/asm/mce.h:94:53: note: expanded from macro 'MCI_ADDR_PHYSADDR'
      94 | #define MCI_ADDR_PHYSADDR       GENMASK_ULL(boot_cpu_data.x86_phys_bits - 1, 0)
         |                                             ~~~~~~~~~~~~~ ^
   include/linux/bits.h:37:23: note: expanded from macro 'GENMASK_ULL'
      37 |         (GENMASK_INPUT_CHECK(h, l) + __GENMASK_ULL(h, l))
         |                              ^
   include/linux/bits.h:25:25: note: expanded from macro 'GENMASK_INPUT_CHECK'
      25 |                 __is_constexpr((l) > (h)), (l) > (h), 0)))
         |                                       ^
   include/linux/compiler.h:290:48: note: expanded from macro '__is_constexpr'
     290 |         (sizeof(int) == sizeof(*(8 ? ((void *)((long)(x) * 0l)) : (int *)8)))
         |                                                       ^
   include/linux/build_bug.h:16:62: note: expanded from macro 'BUILD_BUG_ON_ZERO'
      16 | #define BUILD_BUG_ON_ZERO(e) ((int)(sizeof(struct { int:(-!!(e)); })))
         |                                                              ^
>> drivers/cxl/core/mbox.c:1553:20: error: no member named 'x86_phys_bits' in 'struct cpuinfo_um'
    1553 |         hpa = mce->addr & MCI_ADDR_PHYSADDR;
         |                           ^~~~~~~~~~~~~~~~~
   arch/x86/include/asm/mce.h:94:53: note: expanded from macro 'MCI_ADDR_PHYSADDR'
      94 | #define MCI_ADDR_PHYSADDR       GENMASK_ULL(boot_cpu_data.x86_phys_bits - 1, 0)
         |                                             ~~~~~~~~~~~~~ ^
   include/linux/bits.h:37:23: note: expanded from macro 'GENMASK_ULL'
      37 |         (GENMASK_INPUT_CHECK(h, l) + __GENMASK_ULL(h, l))
         |                              ^
   include/linux/bits.h:25:37: note: expanded from macro 'GENMASK_INPUT_CHECK'
      25 |                 __is_constexpr((l) > (h)), (l) > (h), 0)))
         |                                                   ^
   include/linux/build_bug.h:16:62: note: expanded from macro 'BUILD_BUG_ON_ZERO'
      16 | #define BUILD_BUG_ON_ZERO(e) ((int)(sizeof(struct { int:(-!!(e)); })))
         |                                                              ^
>> drivers/cxl/core/mbox.c:1553:20: error: no member named 'x86_phys_bits' in 'struct cpuinfo_um'
    1553 |         hpa = mce->addr & MCI_ADDR_PHYSADDR;
         |                           ^~~~~~~~~~~~~~~~~
   arch/x86/include/asm/mce.h:94:53: note: expanded from macro 'MCI_ADDR_PHYSADDR'
      94 | #define MCI_ADDR_PHYSADDR       GENMASK_ULL(boot_cpu_data.x86_phys_bits - 1, 0)
         |                                             ~~~~~~~~~~~~~ ^
   include/linux/bits.h:37:45: note: expanded from macro 'GENMASK_ULL'
      37 |         (GENMASK_INPUT_CHECK(h, l) + __GENMASK_ULL(h, l))
         |                                                    ^
   include/uapi/linux/bits.h:13:52: note: expanded from macro '__GENMASK_ULL'
      13 |          (~_ULL(0) >> (__BITS_PER_LONG_LONG - 1 - (h))))
         |                                                    ^
   20 warnings and 4 errors generated.
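
The three errors at mbox.c:1553 all come from one expression:
MCI_ADDR_PHYSADDR expands to GENMASK_ULL(boot_cpu_data.x86_phys_bits - 1, 0),
and on this um-allmodconfig build boot_cpu_data is a struct cpuinfo_um, which
has no x86_phys_bits member.  As a rough illustration of what the mask computes
on x86, here is a userspace sketch.  genmask_ull() and mce_addr_to_hpa() are
hypothetical stand-ins written for this note, not kernel functions, and the
46-bit width used below is an assumed example value.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's GENMASK_ULL(h, l): bits l..h set. */
uint64_t genmask_ull(unsigned int h, unsigned int l)
{
	assert(l <= h && h < 64);
	return (~UINT64_C(0) << l) & (~UINT64_C(0) >> (63 - h));
}

/*
 * Hypothetical helper mirroring "hpa = mce->addr & MCI_ADDR_PHYSADDR".
 * phys_bits plays the role of boot_cpu_data.x86_phys_bits.
 */
uint64_t mce_addr_to_hpa(uint64_t addr, unsigned int phys_bits)
{
	assert(phys_bits >= 1 && phys_bits <= 64);
	return addr & genmask_ull(phys_bits - 1, 0);
}
```

With an assumed width of 46 bits, mce_addr_to_hpa(0xFFFF800012345000, 46)
keeps only the low 46 bits and yields 0x12345000.  A non-x86 build has no such
field to read, so one possible remedy is to keep this code out of those builds
(for example via a Kconfig dependency or an #ifdef on CONFIG_X86_MCE); which
approach fits this patch is for the author to decide.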


vim +358 arch/x86/include/asm/mce.h

4a24d80b8c3e9f8 arch/x86/include/asm/mce.h Smita Koralahalli         2020-11-19  210  
58995d2d58e8e55 arch/x86/include/asm/mce.h Hidetoshi Seto            2009-06-15  211  #ifdef CONFIG_X86_MCE
a2202aa29289db6 arch/x86/include/asm/mce.h Yong Wang                 2009-11-10  212  int mcheck_init(void);
5e09954a9acc3b4 arch/x86/include/asm/mce.h Borislav Petkov           2009-10-16  213  void mcheck_cpu_init(struct cpuinfo_x86 *c);
8838eb6c0bf3b6a arch/x86/include/asm/mce.h Ashok Raj                 2015-08-12  214  void mcheck_cpu_clear(struct cpuinfo_x86 *c);
4a24d80b8c3e9f8 arch/x86/include/asm/mce.h Smita Koralahalli         2020-11-19  215  int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info,
4a24d80b8c3e9f8 arch/x86/include/asm/mce.h Smita Koralahalli         2020-11-19  216  			       u64 lapic_id);
58995d2d58e8e55 arch/x86/include/asm/mce.h Hidetoshi Seto            2009-06-15  217  #else
a2202aa29289db6 arch/x86/include/asm/mce.h Yong Wang                 2009-11-10  218  static inline int mcheck_init(void) { return 0; }
5e09954a9acc3b4 arch/x86/include/asm/mce.h Borislav Petkov           2009-10-16 @219  static inline void mcheck_cpu_init(struct cpuinfo_x86 *c) {}
8838eb6c0bf3b6a arch/x86/include/asm/mce.h Ashok Raj                 2015-08-12  220  static inline void mcheck_cpu_clear(struct cpuinfo_x86 *c) {}
4a24d80b8c3e9f8 arch/x86/include/asm/mce.h Smita Koralahalli         2020-11-19  221  static inline int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info,
4a24d80b8c3e9f8 arch/x86/include/asm/mce.h Smita Koralahalli         2020-11-19  222  					     u64 lapic_id) { return -EINVAL; }
58995d2d58e8e55 arch/x86/include/asm/mce.h Hidetoshi Seto            2009-06-15  223  #endif
58995d2d58e8e55 arch/x86/include/asm/mce.h Hidetoshi Seto            2009-06-15  224  
b5f2fa4ea00a179 arch/x86/include/asm/mce.h Andi Kleen                2009-02-12  225  void mce_setup(struct mce *m);
e2f430291fe23a4 include/asm-x86/mce.h      Thomas Gleixner           2007-10-17  226  void mce_log(struct mce *m);
d6126ef5f31ca54 arch/x86/include/asm/mce.h Greg Kroah-Hartman        2012-01-26  227  DECLARE_PER_CPU(struct device *, mce_device);
e2f430291fe23a4 include/asm-x86/mce.h      Thomas Gleixner           2007-10-17  228  
a0bc32b3cacf194 arch/x86/include/asm/mce.h Akshay Gupta              2020-08-28  229  /* Maximum number of MCA banks per CPU. */
a0bc32b3cacf194 arch/x86/include/asm/mce.h Akshay Gupta              2020-08-28  230  #define MAX_NR_BANKS 64
41fdff322e26c4a arch/x86/include/asm/mce.h Andi Kleen                2009-02-12  231  
e2f430291fe23a4 include/asm-x86/mce.h      Thomas Gleixner           2007-10-17  232  #ifdef CONFIG_X86_MCE_INTEL
e2f430291fe23a4 include/asm-x86/mce.h      Thomas Gleixner           2007-10-17  233  void mce_intel_feature_init(struct cpuinfo_x86 *c);
8838eb6c0bf3b6a arch/x86/include/asm/mce.h Ashok Raj                 2015-08-12  234  void mce_intel_feature_clear(struct cpuinfo_x86 *c);
88ccbedd9ca85d1 arch/x86/include/asm/mce.h Andi Kleen                2009-02-12  235  void cmci_clear(void);
88ccbedd9ca85d1 arch/x86/include/asm/mce.h Andi Kleen                2009-02-12  236  void cmci_reenable(void);
7a0c819d28f5c91 arch/x86/include/asm/mce.h Srivatsa S. Bhat          2013-03-20  237  void cmci_rediscover(void);
88ccbedd9ca85d1 arch/x86/include/asm/mce.h Andi Kleen                2009-02-12  238  void cmci_recheck(void);
e2f430291fe23a4 include/asm-x86/mce.h      Thomas Gleixner           2007-10-17  239  #else
e2f430291fe23a4 include/asm-x86/mce.h      Thomas Gleixner           2007-10-17  240  static inline void mce_intel_feature_init(struct cpuinfo_x86 *c) { }
8838eb6c0bf3b6a arch/x86/include/asm/mce.h Ashok Raj                 2015-08-12  241  static inline void mce_intel_feature_clear(struct cpuinfo_x86 *c) { }
88ccbedd9ca85d1 arch/x86/include/asm/mce.h Andi Kleen                2009-02-12  242  static inline void cmci_clear(void) {}
88ccbedd9ca85d1 arch/x86/include/asm/mce.h Andi Kleen                2009-02-12  243  static inline void cmci_reenable(void) {}
7a0c819d28f5c91 arch/x86/include/asm/mce.h Srivatsa S. Bhat          2013-03-20  244  static inline void cmci_rediscover(void) {}
88ccbedd9ca85d1 arch/x86/include/asm/mce.h Andi Kleen                2009-02-12  245  static inline void cmci_recheck(void) {}
e2f430291fe23a4 include/asm-x86/mce.h      Thomas Gleixner           2007-10-17  246  #endif
e2f430291fe23a4 include/asm-x86/mce.h      Thomas Gleixner           2007-10-17  247  
38736072d45488f arch/x86/include/asm/mce.h H. Peter Anvin            2009-05-28  248  int mce_available(struct cpuinfo_x86 *c);
2d1f406139ec203 arch/x86/include/asm/mce.h Borislav Petkov           2017-05-19  249  bool mce_is_memory_error(struct mce *m);
5d96c9342c23ee1 arch/x86/include/asm/mce.h Vishal Verma              2018-10-25  250  bool mce_is_correctable(struct mce *m);
1bae0cfe4a171cc arch/x86/include/asm/mce.h Yazen Ghannam             2023-06-13  251  bool mce_usable_address(struct mce *m);
88ccbedd9ca85d1 arch/x86/include/asm/mce.h Andi Kleen                2009-02-12  252  
01ca79f1411eae2 arch/x86/include/asm/mce.h Andi Kleen                2009-05-27  253  DECLARE_PER_CPU(unsigned, mce_exception_count);
ca84f69697da0f0 arch/x86/include/asm/mce.h Andi Kleen                2009-05-27  254  DECLARE_PER_CPU(unsigned, mce_poll_count);
01ca79f1411eae2 arch/x86/include/asm/mce.h Andi Kleen                2009-05-27  255  
ee031c31d6381d0 arch/x86/include/asm/mce.h Andi Kleen                2009-02-12  256  typedef DECLARE_BITMAP(mce_banks_t, MAX_NR_BANKS);
ee031c31d6381d0 arch/x86/include/asm/mce.h Andi Kleen                2009-02-12  257  DECLARE_PER_CPU(mce_banks_t, mce_poll_banks);
ee031c31d6381d0 arch/x86/include/asm/mce.h Andi Kleen                2009-02-12  258  
b79109c3bbcf52c arch/x86/include/asm/mce.h Andi Kleen                2009-02-12  259  enum mcp_flags {
3f2f0680d1161df arch/x86/include/asm/mce.h Borislav Petkov           2015-01-13  260  	MCP_TIMESTAMP	= BIT(0),	/* log time stamp */
3f2f0680d1161df arch/x86/include/asm/mce.h Borislav Petkov           2015-01-13  261  	MCP_UC		= BIT(1),	/* log uncorrected errors */
3f2f0680d1161df arch/x86/include/asm/mce.h Borislav Petkov           2015-01-13  262  	MCP_DONTLOG	= BIT(2),	/* only clear, don't log */
3bff147b187d5df arch/x86/include/asm/mce.h Borislav Petkov           2021-08-23  263  	MCP_QUEUE_LOG	= BIT(3),	/* only queue to genpool */
b79109c3bbcf52c arch/x86/include/asm/mce.h Andi Kleen                2009-02-12  264  };
5b9d292ea87c836 arch/x86/include/asm/mce.h Yazen Ghannam             2024-05-23  265  
5b9d292ea87c836 arch/x86/include/asm/mce.h Yazen Ghannam             2024-05-23  266  void machine_check_poll(enum mcp_flags flags, mce_banks_t *b);
b79109c3bbcf52c arch/x86/include/asm/mce.h Andi Kleen                2009-02-12  267  
9ff36ee9668ff41 arch/x86/include/asm/mce.h Andi Kleen                2009-05-27  268  int mce_notify_irq(void);
e2f430291fe23a4 include/asm-x86/mce.h      Thomas Gleixner           2007-10-17  269  
ea149b36c7f511d arch/x86/include/asm/mce.h Andi Kleen                2009-04-29  270  DECLARE_PER_CPU(struct mce, injectm);
66f5ddf30a59f81 arch/x86/include/asm/mce.h Tony Luck                 2011-11-03  271  
c3d1fb567a634dc arch/x86/include/asm/mce.h Naveen N Rao              2013-07-01  272  /* Disable CMCI/polling for MCA bank claimed by firmware */
c3d1fb567a634dc arch/x86/include/asm/mce.h Naveen N Rao              2013-07-01  273  extern void mce_disable_bank(int bank);
c3d1fb567a634dc arch/x86/include/asm/mce.h Naveen N Rao              2013-07-01  274  
58995d2d58e8e55 arch/x86/include/asm/mce.h Hidetoshi Seto            2009-06-15  275  /*
58995d2d58e8e55 arch/x86/include/asm/mce.h Hidetoshi Seto            2009-06-15  276   * Exception handler
58995d2d58e8e55 arch/x86/include/asm/mce.h Hidetoshi Seto            2009-06-15  277   */
8cd501c1facc159 arch/x86/include/asm/mce.h Thomas Gleixner           2020-02-25  278  void do_machine_check(struct pt_regs *pt_regs);
58995d2d58e8e55 arch/x86/include/asm/mce.h Hidetoshi Seto            2009-06-15  279  
58995d2d58e8e55 arch/x86/include/asm/mce.h Hidetoshi Seto            2009-06-15  280  /*
58995d2d58e8e55 arch/x86/include/asm/mce.h Hidetoshi Seto            2009-06-15  281   * Threshold handler
58995d2d58e8e55 arch/x86/include/asm/mce.h Hidetoshi Seto            2009-06-15  282   */
b276268631af3a1 arch/x86/include/asm/mce.h Andi Kleen                2009-02-12  283  extern void (*mce_threshold_vector)(void);
b276268631af3a1 arch/x86/include/asm/mce.h Andi Kleen                2009-02-12  284  
24fd78a81f6d3fe arch/x86/include/asm/mce.h Aravind Gopalakrishnan    2015-05-06  285  /* Deferred error interrupt handler */
24fd78a81f6d3fe arch/x86/include/asm/mce.h Aravind Gopalakrishnan    2015-05-06  286  extern void (*deferred_error_int_vector)(void);
24fd78a81f6d3fe arch/x86/include/asm/mce.h Aravind Gopalakrishnan    2015-05-06  287  
d334a49113a4a33 arch/x86/include/asm/mce.h Huang Ying                2010-05-18  288  /*
d334a49113a4a33 arch/x86/include/asm/mce.h Huang Ying                2010-05-18  289   * Used by APEI to report memory error via /dev/mcelog
d334a49113a4a33 arch/x86/include/asm/mce.h Huang Ying                2010-05-18  290   */
d334a49113a4a33 arch/x86/include/asm/mce.h Huang Ying                2010-05-18  291  
d334a49113a4a33 arch/x86/include/asm/mce.h Huang Ying                2010-05-18  292  struct cper_sec_mem_err;
d334a49113a4a33 arch/x86/include/asm/mce.h Huang Ying                2010-05-18  293  extern void apei_mce_report_mem_error(int corrected,
d334a49113a4a33 arch/x86/include/asm/mce.h Huang Ying                2010-05-18  294  				      struct cper_sec_mem_err *mem_err);
d334a49113a4a33 arch/x86/include/asm/mce.h Huang Ying                2010-05-18  295  
be0aec23bf4624f arch/x86/include/asm/mce.h Aravind Gopalakrishnan    2016-03-07  296  /*
be0aec23bf4624f arch/x86/include/asm/mce.h Aravind Gopalakrishnan    2016-03-07  297   * Enumerate new IP types and HWID values in AMD processors which support
be0aec23bf4624f arch/x86/include/asm/mce.h Aravind Gopalakrishnan    2016-03-07  298   * Scalable MCA.
be0aec23bf4624f arch/x86/include/asm/mce.h Aravind Gopalakrishnan    2016-03-07  299   */
be0aec23bf4624f arch/x86/include/asm/mce.h Aravind Gopalakrishnan    2016-03-07  300  #ifdef CONFIG_X86_MCE_AMD
5896820e0aa3257 arch/x86/include/asm/mce.h Yazen Ghannam             2016-09-12  301  
5896820e0aa3257 arch/x86/include/asm/mce.h Yazen Ghannam             2016-09-12  302  /* These may be used by multiple smca_hwid_mcatypes */
5896820e0aa3257 arch/x86/include/asm/mce.h Yazen Ghannam             2016-09-12  303  enum smca_bank_types {
5896820e0aa3257 arch/x86/include/asm/mce.h Yazen Ghannam             2016-09-12  304  	SMCA_LS = 0,	/* Load Store */
94a311ce248e0b5 arch/x86/include/asm/mce.h Muralidhara M K           2021-05-26  305  	SMCA_LS_V2,
5896820e0aa3257 arch/x86/include/asm/mce.h Yazen Ghannam             2016-09-12  306  	SMCA_IF,	/* Instruction Fetch */
5896820e0aa3257 arch/x86/include/asm/mce.h Yazen Ghannam             2016-09-12  307  	SMCA_L2_CACHE,	/* L2 Cache */
5896820e0aa3257 arch/x86/include/asm/mce.h Yazen Ghannam             2016-09-12  308  	SMCA_DE,	/* Decoder Unit */
68627a697c19593 arch/x86/include/asm/mce.h Yazen Ghannam             2018-02-21  309  	SMCA_RESERVED,	/* Reserved */
5896820e0aa3257 arch/x86/include/asm/mce.h Yazen Ghannam             2016-09-12  310  	SMCA_EX,	/* Execution Unit */
5896820e0aa3257 arch/x86/include/asm/mce.h Yazen Ghannam             2016-09-12  311  	SMCA_FP,	/* Floating Point */
5896820e0aa3257 arch/x86/include/asm/mce.h Yazen Ghannam             2016-09-12  312  	SMCA_L3_CACHE,	/* L3 Cache */
5896820e0aa3257 arch/x86/include/asm/mce.h Yazen Ghannam             2016-09-12  313  	SMCA_CS,	/* Coherent Slave */
94a311ce248e0b5 arch/x86/include/asm/mce.h Muralidhara M K           2021-05-26  314  	SMCA_CS_V2,
5896820e0aa3257 arch/x86/include/asm/mce.h Yazen Ghannam             2016-09-12  315  	SMCA_PIE,	/* Power, Interrupts, etc. */
be0aec23bf4624f arch/x86/include/asm/mce.h Aravind Gopalakrishnan    2016-03-07  316  	SMCA_UMC,	/* Unified Memory Controller */
94a311ce248e0b5 arch/x86/include/asm/mce.h Muralidhara M K           2021-05-26  317  	SMCA_UMC_V2,
47b744ea5e3cf85 arch/x86/include/asm/mce.h Muralidhara M K           2023-11-02  318  	SMCA_MA_LLC,	/* Memory Attached Last Level Cache */
be0aec23bf4624f arch/x86/include/asm/mce.h Aravind Gopalakrishnan    2016-03-07  319  	SMCA_PB,	/* Parameter Block */
be0aec23bf4624f arch/x86/include/asm/mce.h Aravind Gopalakrishnan    2016-03-07  320  	SMCA_PSP,	/* Platform Security Processor */
94a311ce248e0b5 arch/x86/include/asm/mce.h Muralidhara M K           2021-05-26  321  	SMCA_PSP_V2,
be0aec23bf4624f arch/x86/include/asm/mce.h Aravind Gopalakrishnan    2016-03-07  322  	SMCA_SMU,	/* System Management Unit */
94a311ce248e0b5 arch/x86/include/asm/mce.h Muralidhara M K           2021-05-26  323  	SMCA_SMU_V2,
cbfa447edd6a382 arch/x86/include/asm/mce.h Yazen Ghannam             2019-02-01  324  	SMCA_MP5,	/* Microprocessor 5 Unit */
5176a93ab27aef1 arch/x86/include/asm/mce.h Yazen Ghannam             2021-12-16  325  	SMCA_MPDMA,	/* MPDMA Unit */
cbfa447edd6a382 arch/x86/include/asm/mce.h Yazen Ghannam             2019-02-01  326  	SMCA_NBIO,	/* Northbridge IO Unit */
cbfa447edd6a382 arch/x86/include/asm/mce.h Yazen Ghannam             2019-02-01  327  	SMCA_PCIE,	/* PCI Express Unit */
94a311ce248e0b5 arch/x86/include/asm/mce.h Muralidhara M K           2021-05-26  328  	SMCA_PCIE_V2,
94a311ce248e0b5 arch/x86/include/asm/mce.h Muralidhara M K           2021-05-26  329  	SMCA_XGMI_PCS,	/* xGMI PCS Unit */
5176a93ab27aef1 arch/x86/include/asm/mce.h Yazen Ghannam             2021-12-16  330  	SMCA_NBIF,	/* NBIF Unit */
5176a93ab27aef1 arch/x86/include/asm/mce.h Yazen Ghannam             2021-12-16  331  	SMCA_SHUB,	/* System HUB Unit */
5176a93ab27aef1 arch/x86/include/asm/mce.h Yazen Ghannam             2021-12-16  332  	SMCA_SATA,	/* SATA Unit */
5176a93ab27aef1 arch/x86/include/asm/mce.h Yazen Ghannam             2021-12-16  333  	SMCA_USB,	/* USB Unit */
47b744ea5e3cf85 arch/x86/include/asm/mce.h Muralidhara M K           2023-11-02  334  	SMCA_USR_DP,	/* Ultra Short Reach Data Plane Controller */
47b744ea5e3cf85 arch/x86/include/asm/mce.h Muralidhara M K           2023-11-02  335  	SMCA_USR_CP,	/* Ultra Short Reach Control Plane Controller */
5176a93ab27aef1 arch/x86/include/asm/mce.h Yazen Ghannam             2021-12-16  336  	SMCA_GMI_PCS,	/* GMI PCS Unit */
94a311ce248e0b5 arch/x86/include/asm/mce.h Muralidhara M K           2021-05-26  337  	SMCA_XGMI_PHY,	/* xGMI PHY Unit */
94a311ce248e0b5 arch/x86/include/asm/mce.h Muralidhara M K           2021-05-26  338  	SMCA_WAFL_PHY,	/* WAFL PHY Unit */
5176a93ab27aef1 arch/x86/include/asm/mce.h Yazen Ghannam             2021-12-16  339  	SMCA_GMI_PHY,	/* GMI PHY Unit */
5896820e0aa3257 arch/x86/include/asm/mce.h Yazen Ghannam             2016-09-12  340  	N_SMCA_BANK_TYPES
be0aec23bf4624f arch/x86/include/asm/mce.h Aravind Gopalakrishnan    2016-03-07  341  };
be0aec23bf4624f arch/x86/include/asm/mce.h Aravind Gopalakrishnan    2016-03-07  342  
c6708d50f166bea arch/x86/include/asm/mce.h Yazen Ghannam             2017-12-18  343  extern bool amd_mce_is_memory_error(struct mce *m);
e71c3978d6f9765 arch/x86/include/asm/mce.h Linus Torvalds            2016-12-12  344  
4d7b02d58c40005 arch/x86/include/asm/mce.h Sebastian Andrzej Siewior 2016-11-10  345  extern int mce_threshold_create_device(unsigned int cpu);
4d7b02d58c40005 arch/x86/include/asm/mce.h Sebastian Andrzej Siewior 2016-11-10  346  extern int mce_threshold_remove_device(unsigned int cpu);
e71c3978d6f9765 arch/x86/include/asm/mce.h Linus Torvalds            2016-12-12  347  
9308fd4074551f2 arch/x86/include/asm/mce.h Yazen Ghannam             2019-03-22  348  void mce_amd_feature_init(struct cpuinfo_x86 *c);
91f75eb481cfaee arch/x86/include/asm/mce.h Yazen Ghannam             2021-12-16  349  enum smca_bank_types smca_get_bank_type(unsigned int cpu, unsigned int bank);
4d7b02d58c40005 arch/x86/include/asm/mce.h Sebastian Andrzej Siewior 2016-11-10  350  #else
5896820e0aa3257 arch/x86/include/asm/mce.h Yazen Ghannam             2016-09-12  351  
4d7b02d58c40005 arch/x86/include/asm/mce.h Sebastian Andrzej Siewior 2016-11-10  352  static inline int mce_threshold_create_device(unsigned int cpu)		{ return 0; };
4d7b02d58c40005 arch/x86/include/asm/mce.h Sebastian Andrzej Siewior 2016-11-10  353  static inline int mce_threshold_remove_device(unsigned int cpu)		{ return 0; };
c6708d50f166bea arch/x86/include/asm/mce.h Yazen Ghannam             2017-12-18  354  static inline bool amd_mce_is_memory_error(struct mce *m)		{ return false; };
9308fd4074551f2 arch/x86/include/asm/mce.h Yazen Ghannam             2019-03-22  355  static inline void mce_amd_feature_init(struct cpuinfo_x86 *c)		{ }
be0aec23bf4624f arch/x86/include/asm/mce.h Aravind Gopalakrishnan    2016-03-07  356  #endif
be0aec23bf4624f arch/x86/include/asm/mce.h Aravind Gopalakrishnan    2016-03-07  357  
9308fd4074551f2 arch/x86/include/asm/mce.h Yazen Ghannam             2019-03-22 @358  static inline void mce_hygon_feature_init(struct cpuinfo_x86 *c)	{ return mce_amd_feature_init(c); }
e9c2a283e7d9d4e arch/x86/include/asm/mce.h Arnd Bergmann             2023-05-16  359  
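
The -Wvisibility warnings and the error at line 358 appear to share one cause:
when this header is reached, struct cpuinfo_x86 has not been declared, so each
parameter list introduces its own struct cpuinfo_x86 that is visible only
inside that declaration.  The two 'struct cpuinfo_x86 *' types in the error
message are therefore distinct.  A minimal sketch of the C rule, using the
hypothetical names cpu_info, amd_init and hygon_init rather than the kernel's:

```c
#include <stddef.h>

/*
 * File-scope forward declaration.  Without this line, each parameter list
 * below would declare its own, unrelated 'struct cpu_info', and the call
 * in hygon_init() would pass an incompatible pointer type.
 */
struct cpu_info;

/* Returns 1 if a CPU description was supplied, 0 otherwise. */
int amd_init(struct cpu_info *c)
{
	return c != NULL;
}

/* Forwards to amd_init(); both now refer to the same struct type. */
int hygon_init(struct cpu_info *c)
{
	return amd_init(c);
}

/* The full definition can follow later (assumed example layout). */
struct cpu_info {
	unsigned int phys_bits;
};

struct cpu_info demo_cpu = { .phys_bits = 46 };
```

In the kernel the type would normally become visible through the header that
defines it.  On this UML build that definition is not available, which
suggests that the include of asm/mce.h from mbox.c is what needs to become
conditional, rather than mce.h itself.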

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki



* Re: [PATCH v4 2/2] cxl: avoid duplicated report from MCE & device
  2024-08-08 15:13 ` [PATCH v4 2/2] cxl: avoid duplicated report from MCE & device Shiyang Ruan via
  2024-08-09  7:31   ` kernel test robot
  2024-08-09  7:31   ` kernel test robot
@ 2024-08-09 11:48   ` kernel test robot
  2 siblings, 0 replies; 8+ messages in thread
From: kernel test robot @ 2024-08-09 11:48 UTC (permalink / raw)
  To: Shiyang Ruan, qemu-devel, linux-cxl, linux-edac, linux-mm,
	dan.j.williams, vishal.l.verma, Jonathan.Cameron,
	alison.schofield
  Cc: oe-kbuild-all, bp, dave.jiang, dave, ira.weiny, james.morse,
	linmiaohe, mchehab, nao.horiguchi, rric, tony.luck, ruansy.fnst

Hi Shiyang,

kernel test robot noticed the following build warnings:

[auto build test WARNING on tip/x86/core]
[also build test WARNING on cxl/next linus/master v6.11-rc2 next-20240809]
[cannot apply to cxl/pending]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Shiyang-Ruan/cxl-core-introduce-device-reporting-poison-hanlding/20240809-013658
base:   tip/x86/core
patch link:    https://lore.kernel.org/r/20240808151328.707869-3-ruansy.fnst%40fujitsu.com
patch subject: [PATCH v4 2/2] cxl: avoid duplicated report from MCE & device
config: x86_64-randconfig-121-20240809 (https://download.01.org/0day-ci/archive/20240809/202408091914.TFbjPuNQ-lkp@intel.com/config)
compiler: clang version 18.1.5 (https://github.com/llvm/llvm-project 617a15a9eac96088ae5e9134248d8236e34b91b1)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240809/202408091914.TFbjPuNQ-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202408091914.TFbjPuNQ-lkp@intel.com/

sparse warnings: (new ones prefixed by >>)
>> drivers/cxl/core/mbox.c:1465:1: sparse: sparse: symbol 'cxl_mce_records' was not declared. Should it be static?
   drivers/cxl/core/mbox.c: note: in included file (through include/linux/gfp.h, include/linux/xarray.h, include/linux/list_lru.h, ...):
   include/linux/mmzone.h:2018:40: sparse: sparse: self-comparison always evaluates to false

vim +/cxl_mce_records +1465 drivers/cxl/core/mbox.c

  1464	
> 1465	DEFINE_XARRAY(cxl_mce_records);
  1466	
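
One common way to address this class of sparse warning, assuming the
symbol is only used inside mbox.c, is to give it internal linkage, e.g.
`static DEFINE_XARRAY(cxl_mce_records);`.  If other files need it, an
`extern` declaration in a shared header would be the alternative.  The
userspace sketch below only mimics the pattern: `struct record_table`,
`DEFINE_RECORD_TABLE`, `record_add` and `record_contains` are
hypothetical stand-ins, not kernel APIs.

```c
#include <stddef.h>

/* Hypothetical stand-in for a kernel container such as struct xarray. */
struct record_table {
	unsigned long pfn[16];
	size_t count;
};

/*
 * Mimics the shape of a DEFINE_*() macro: it expands to a plain
 * definition, so a storage-class specifier can be written in front.
 */
#define DEFINE_RECORD_TABLE(name) struct record_table name = { .count = 0 }

/* File-local symbol: no other translation unit can reference it. */
static DEFINE_RECORD_TABLE(mce_records);

/* Returns 0 on success, -1 when the table is full. */
int record_add(unsigned long pfn)
{
	size_t cap = sizeof(mce_records.pfn) / sizeof(mce_records.pfn[0]);

	if (mce_records.count >= cap)
		return -1;
	mce_records.pfn[mce_records.count++] = pfn;
	return 0;
}

/* Returns 1 if pfn was recorded earlier, 0 otherwise. */
int record_contains(unsigned long pfn)
{
	for (size_t i = 0; i < mce_records.count; i++)
		if (mce_records.pfn[i] == pfn)
			return 1;
	return 0;
}
```

With the table hidden behind accessors, other files go through
record_add()/record_contains() and never name the symbol itself.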

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v4 1/2] cxl/core: introduce device reporting poison hanlding
  2024-08-08 18:28   ` Fan Ni
@ 2024-08-21 13:57     ` Shiyang Ruan via
  0 siblings, 0 replies; 8+ messages in thread
From: Shiyang Ruan via @ 2024-08-21 13:57 UTC (permalink / raw)
  To: Fan Ni
  Cc: qemu-devel, linux-cxl, linux-edac, linux-mm, dan.j.williams,
	vishal.l.verma, Jonathan.Cameron, alison.schofield, bp,
	dave.jiang, dave, ira.weiny, james.morse, linmiaohe, mchehab,
	nao.horiguchi, rric, tony.luck



On 2024/8/9 2:28, Fan Ni wrote:
> On Thu, Aug 08, 2024 at 11:13:27PM +0800, Shiyang Ruan wrote:
>> A CXL device can find and report memory problems even before the CPU
>> detects an MCE.  AFAIK, the current kernel only traces POISON error
>> events from the FW-First/OS-First paths; it neither handles them nor
>> notifies the processes that are using the POISON page, as MCE does.
>>
>> Thus, users have to read the trace logs to find out which device
>> reported the error and which applications are affected.  That is not
>> easy work and cannot be done in time.  A feature is therefore needed
>> to do this work automatically and quickly: once a CXL device reports a
>> POISON error (via FW-First/OS-First), the kernel handles it
>> immediately, similar to the flow when an MCE is triggered.
>>
>> The current call trace of error reporting and handling looks like this:
>> ```
>> 1.  MCE (interrupt #18, while CPU consuming POISON)
>>       -> do_machine_check()
>>         -> mce_log()
>>           -> notify chain (x86_mce_decoder_chain)
>>             -> memory_failure()
>>
>> 2.a FW-First (optional, CXL device proactively find&report)
>>       -> CXL device -> Firmware
>>         -> OS: ACPI->APEI->GHES->CPER -> CXL driver -> trace
>>                                                    \-> memory_failure()
>>                                                        ^----- ADD
>> 2.b OS-First (optional, CXL device proactively find&report)
>>       -> CXL device -> MSI
>>         -> OS: CXL driver -> trace
>>                          \-> memory_failure()
>>                              ^------------------------------- ADD
>> ```
>> This patch adds a call to memory_failure() when an error report from
>> a CXL device is received, marked as "ADD" in the figure above.
>>
>> Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
>> ---
>>   drivers/cxl/core/mbox.c   | 75 ++++++++++++++++++++++++++++++++-------
>>   drivers/cxl/cxlmem.h      |  8 ++---
>>   drivers/cxl/pci.c         |  4 +--
>>   include/linux/cxl-event.h | 16 ++++++++-
>>   4 files changed, 83 insertions(+), 20 deletions(-)
>>
>> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
>> index e5cdeafdf76e..0cb6ef2e6600 100644
>> --- a/drivers/cxl/core/mbox.c
>> +++ b/drivers/cxl/core/mbox.c
>> @@ -849,10 +849,55 @@ int cxl_enumerate_cmds(struct cxl_memdev_state *mds)
>>   }
>>   EXPORT_SYMBOL_NS_GPL(cxl_enumerate_cmds, CXL);
>>   
>> -void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
>> -			    enum cxl_event_log_type type,
>> -			    enum cxl_event_type event_type,
>> -			    const uuid_t *uuid, union cxl_event *evt)
>> +static void cxl_report_poison(struct cxl_memdev *cxlmd, u64 hpa)
>> +{
>> +	unsigned long pfn = PHYS_PFN(hpa);
>> +
>> +	memory_failure_queue(pfn, 0);
>> +}
>> +
>> +static void cxl_event_handle_general_media(struct cxl_memdev *cxlmd,
>> +					   enum cxl_event_log_type type,
>> +					   u64 hpa,
>> +					   struct cxl_event_gen_media *rec)
>> +{
>> +	if (type == CXL_EVENT_TYPE_FAIL) {
>> +		switch (rec->media_hdr.transaction_type) {
>> +		case CXL_EVENT_TRANSACTION_READ:
>> +		case CXL_EVENT_TRANSACTION_WRITE:
>> +		case CXL_EVENT_TRANSACTION_SCAN_MEDIA:
>> +		case CXL_EVENT_TRANSACTION_INJECT_POISON:
>> +			cxl_report_poison(cxlmd, hpa);
>> +			break;
>> +		default:
>> +			break;
>> +		}
>> +	}
>> +}
>> +
>> +static void cxl_event_handle_dram(struct cxl_memdev *cxlmd,
>> +				  enum cxl_event_log_type type,
>> +				  u64 hpa,
>> +				  struct cxl_event_dram *rec)
>> +{
>> +	if (type == CXL_EVENT_TYPE_FAIL) {
>> +		switch (rec->media_hdr.transaction_type) {
>> +		case CXL_EVENT_TRANSACTION_READ:
>> +		case CXL_EVENT_TRANSACTION_WRITE:
>> +		case CXL_EVENT_TRANSACTION_SCAN_MEDIA:
>> +		case CXL_EVENT_TRANSACTION_INJECT_POISON:
>> +			cxl_report_poison(cxlmd, hpa);
>> +			break;
>> +		default:
>> +			break;
>> +		}
>> +	}
>> +}
>> +
>> +void cxl_event_handle_record(struct cxl_memdev *cxlmd,
>> +			     enum cxl_event_log_type type,
>> +			     enum cxl_event_type event_type,
>> +			     const uuid_t *uuid, union cxl_event *evt)
>>   {
>>   	if (event_type == CXL_CPER_EVENT_MEM_MODULE) {
>>   		trace_cxl_memory_module(cxlmd, type, &evt->mem_module);
>> @@ -880,18 +925,22 @@ void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
>>   		if (cxlr)
>>   			hpa = cxl_dpa_to_hpa(cxlr, cxlmd, dpa);
>>   
>> -		if (event_type == CXL_CPER_EVENT_GEN_MEDIA)
>> +		if (event_type == CXL_CPER_EVENT_GEN_MEDIA) {
>>   			trace_cxl_general_media(cxlmd, type, cxlr, hpa,
>>   						&evt->gen_media);
>> -		else if (event_type == CXL_CPER_EVENT_DRAM)
>> +			cxl_event_handle_general_media(cxlmd, type, hpa,
>> +						&evt->gen_media);
>> +		} else if (event_type == CXL_CPER_EVENT_DRAM) {
>>   			trace_cxl_dram(cxlmd, type, cxlr, hpa, &evt->dram);
>> +			cxl_event_handle_dram(cxlmd, type, hpa, &evt->dram);
> 
> Does it make sense to call the trace function in
> cxl_event_handle_dram/general_media and replace the trace function with
> the handle_* here?

Sorry for the late reply.  I'm not really good at naming functions.  Since
the trace functions already have the framework to deal with each kind of
UUID and event type, I don't think we should build another one for the
same logic.  Thus, I reused it and renamed the functions.  Maybe
"handle" isn't a good word to describe "tracing records and calling
memory_failure() if necessary".  Could you help me find a better name?
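
For illustration only, the "trace, then report poison if necessary"
behaviour under discussion can be sketched in userspace as a single
handler.  Every name below is a hypothetical mock: `mock_trace()` stands
in for the trace_cxl_*() calls, `mock_report_poison()` for the
memory_failure_queue() call, and `MOCK_PAGE_SHIFT` assumes 4 KiB pages.
The enum values follow the order of the enum added by this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the order of enum cxl_event_transaction_type in the patch. */
enum transaction_type {
	TRANSACTION_UNKNOWN = 0x00,
	TRANSACTION_READ,
	TRANSACTION_WRITE,
	TRANSACTION_SCAN_MEDIA,
	TRANSACTION_INJECT_POISON,
	TRANSACTION_MEDIA_SCRUB,
	TRANSACTION_MEDIA_MANAGEMENT,
};
static_assert(TRANSACTION_INJECT_POISON == 0x04,
	      "enum order must match the patch");

/* Hypothetical stand-in for the event log types; only FAIL matters. */
enum log_type { LOG_INFO, LOG_WARN, LOG_FAIL, LOG_FATAL };

#define MOCK_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

static unsigned int traced;   /* records passed to the mock tracer */
static unsigned int reported; /* poison reports queued by the mock */
static uint64_t last_pfn;     /* page frame number of the last report */

static void mock_trace(uint64_t hpa)
{
	(void)hpa;
	traced++;
}

static void mock_report_poison(uint64_t hpa)
{
	last_pfn = hpa >> MOCK_PAGE_SHIFT; /* same idea as PHYS_PFN(hpa) */
	reported++;
}

/* The four transaction types that the patch treats as poison. */
static bool transaction_reports_poison(uint8_t t)
{
	switch (t) {
	case TRANSACTION_READ:
	case TRANSACTION_WRITE:
	case TRANSACTION_SCAN_MEDIA:
	case TRANSACTION_INJECT_POISON:
		return true;
	default:
		return false;
	}
}

/* Always trace; report only failure-log records with a poison type. */
static void handle_media_record(enum log_type type, uint64_t hpa,
				uint8_t transaction_type)
{
	mock_trace(hpa);
	if (type == LOG_FAIL && transaction_reports_poison(transaction_type))
		mock_report_poison(hpa);
}
```

Because the DRAM and General Media handlers in the patch share the same
switch, a helper like transaction_reports_poison() could serve both.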

> 
>> +		}
>>   	}
>>   }
>> -EXPORT_SYMBOL_NS_GPL(cxl_event_trace_record, CXL);
>> +EXPORT_SYMBOL_NS_GPL(cxl_event_handle_record, CXL);
>>   
>> -static void __cxl_event_trace_record(const struct cxl_memdev *cxlmd,
>> -				     enum cxl_event_log_type type,
>> -				     struct cxl_event_record_raw *record)
>> +static void __cxl_event_handle_record(struct cxl_memdev *cxlmd,
>> +				      enum cxl_event_log_type type,
>> +				      struct cxl_event_record_raw *record)
>>   {
>>   	enum cxl_event_type ev_type = CXL_CPER_EVENT_GENERIC;
>>   	const uuid_t *uuid = &record->id;
>> @@ -903,7 +952,7 @@ static void __cxl_event_trace_record(const struct cxl_memdev *cxlmd,
>>   	else if (uuid_equal(uuid, &CXL_EVENT_MEM_MODULE_UUID))
>>   		ev_type = CXL_CPER_EVENT_MEM_MODULE;
>>   
>> -	cxl_event_trace_record(cxlmd, type, ev_type, uuid, &record->event);
>> +	cxl_event_handle_record(cxlmd, type, ev_type, uuid, &record->event);
>>   }
>>   
>>   static int cxl_clear_event_record(struct cxl_memdev_state *mds,
>> @@ -1012,8 +1061,8 @@ static void cxl_mem_get_records_log(struct cxl_memdev_state *mds,
>>   			break;
>>   
>>   		for (i = 0; i < nr_rec; i++)
>> -			__cxl_event_trace_record(cxlmd, type,
>> -						 &payload->records[i]);
>> +			__cxl_event_handle_record(cxlmd, type,
>> +						  &payload->records[i]);
>>   
>>   		if (payload->flags & CXL_GET_EVENT_FLAG_OVERFLOW)
>>   			trace_cxl_overflow(cxlmd, type, payload);
>> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
>> index afb53d058d62..5c4810dcbdeb 100644
>> --- a/drivers/cxl/cxlmem.h
>> +++ b/drivers/cxl/cxlmem.h
>> @@ -826,10 +826,10 @@ void set_exclusive_cxl_commands(struct cxl_memdev_state *mds,
>>   void clear_exclusive_cxl_commands(struct cxl_memdev_state *mds,
>>   				  unsigned long *cmds);
>>   void cxl_mem_get_event_records(struct cxl_memdev_state *mds, u32 status);
>> -void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
>> -			    enum cxl_event_log_type type,
>> -			    enum cxl_event_type event_type,
>> -			    const uuid_t *uuid, union cxl_event *evt);
>> +void cxl_event_handle_record(struct cxl_memdev *cxlmd,
>> +			     enum cxl_event_log_type type,
>> +			     enum cxl_event_type event_type,
>> +			     const uuid_t *uuid, union cxl_event *evt);
>>   int cxl_set_timestamp(struct cxl_memdev_state *mds);
>>   int cxl_poison_state_init(struct cxl_memdev_state *mds);
>>   int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
>> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
>> index 4be35dc22202..6e65ca89f666 100644
>> --- a/drivers/cxl/pci.c
>> +++ b/drivers/cxl/pci.c
>> @@ -1029,8 +1029,8 @@ static void cxl_handle_cper_event(enum cxl_event_type ev_type,
>>   	hdr_flags = get_unaligned_le24(rec->event.generic.hdr.flags);
>>   	log_type = FIELD_GET(CXL_EVENT_HDR_FLAGS_REC_SEVERITY, hdr_flags);
>>   
>> -	cxl_event_trace_record(cxlds->cxlmd, log_type, ev_type,
>> -			       &uuid_null, &rec->event);
>> +	cxl_event_handle_record(cxlds->cxlmd, log_type, ev_type,
>> +				&uuid_null, &rec->event);
>>   }
>>   
>>   static void cxl_cper_work_fn(struct work_struct *work)
>> diff --git a/include/linux/cxl-event.h b/include/linux/cxl-event.h
>> index 0bea1afbd747..be4342a2b597 100644
>> --- a/include/linux/cxl-event.h
>> +++ b/include/linux/cxl-event.h
>> @@ -7,6 +7,20 @@
>>   #include <linux/uuid.h>
>>   #include <linux/workqueue_types.h>
>>   
>> +/*
>> + * Event transaction type
>> + * CXL rev 3.0 Section 8.2.9.2.1.1; Table 8-43
> 
> Here and below, update the specification reference to reflect cxl 3.1.

Ok. Will update it.


--
Thanks,
Ruan.

> 
> Fan
>> + */
>> +enum cxl_event_transaction_type {
>> +	CXL_EVENT_TRANSACTION_UNKNOWN = 0X00,
>> +	CXL_EVENT_TRANSACTION_READ,
>> +	CXL_EVENT_TRANSACTION_WRITE,
>> +	CXL_EVENT_TRANSACTION_SCAN_MEDIA,
>> +	CXL_EVENT_TRANSACTION_INJECT_POISON,
>> +	CXL_EVENT_TRANSACTION_MEDIA_SCRUB,
>> +	CXL_EVENT_TRANSACTION_MEDIA_MANAGEMENT,
>> +};
>> +
>>   /*
>>    * Common Event Record Format
>>    * CXL rev 3.0 section 8.2.9.2.1; Table 8-42
>> @@ -26,7 +40,7 @@ struct cxl_event_media_hdr {
>>   	__le64 phys_addr;
>>   	u8 descriptor;
>>   	u8 type;
>> -	u8 transaction_type;
>> +	u8 transaction_type;	/* enum cxl_event_transaction_type */
>>   	/*
>>   	 * The meaning of Validity Flags from bit 2 is
>>   	 * different across DRAM and General Media records
>> -- 
>> 2.34.1
>>


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2024-08-21 13:58 UTC | newest]

Thread overview: 8+ messages
2024-08-08 15:13 [PATCH v4 0/2] cxl: add device reporting poison handler Shiyang Ruan via
2024-08-08 15:13 ` [PATCH v4 1/2] cxl/core: introduce device reporting poison hanlding Shiyang Ruan via
2024-08-08 18:28   ` Fan Ni
2024-08-21 13:57     ` Shiyang Ruan via
2024-08-08 15:13 ` [PATCH v4 2/2] cxl: avoid duplicated report from MCE & device Shiyang Ruan via
2024-08-09  7:31   ` kernel test robot
2024-08-09  7:31   ` kernel test robot
2024-08-09 11:48   ` kernel test robot
