* [PATCH v4 0/2] cxl: add device reporting poison handler
@ 2024-08-08 15:13 Shiyang Ruan via
2024-08-08 15:13 ` [PATCH v4 1/2] cxl/core: introduce device reporting poison handling Shiyang Ruan via
2024-08-08 15:13 ` [PATCH v4 2/2] cxl: avoid duplicated report from MCE & device Shiyang Ruan via
0 siblings, 2 replies; 8+ messages in thread
From: Shiyang Ruan via @ 2024-08-08 15:13 UTC (permalink / raw)
To: qemu-devel, linux-cxl, linux-edac, linux-mm, dan.j.williams,
vishal.l.verma, Jonathan.Cameron, alison.schofield
Cc: bp, dave.jiang, dave, ira.weiny, james.morse, linmiaohe, mchehab,
nao.horiguchi, rric, tony.luck, ruansy.fnst
This patchset includes "cxl/core: introduce poison creation handling"
and "cxl: avoid duplicated report from MCE & device", which were
previously posted separately. Changes since the last version of each
patch:
P1: 1. Since the failure is now queued asynchronously via
       memory_failure_queue(), set the flags to 0
    2. Also handle the CXL_EVENT_TRANSACTION_SCAN_MEDIA type
P2: 1. Use XArray instead of list_head
    2. Add guard() lock for CXL device iteration
P1&P2: Rebase to v6.11-rc1
The CXL spec defines a POISON feature that a CXL memory device uses to
report that one of its pages has gone bad. There are two major paths
for this notification.
1. CPU handling error
   When a process accesses the broken page, the CXL device returns data
   with POISON. When the CPU consumes the POISON, it raises an error
   notification.
   To be precise, how the CPU behaves when it consumes POISON is
   architecture dependent. As I understand it, x86-64 raises a Machine
   Check Exception (MCE) via interrupt #18 in this case.
2. CXL device reporting error
   The CXL device detects the broken page by itself and sends a memory
   error signal to the kernel via one of two optional paths.
   2.a. FW-First
        The CXL device sends the error via VDM to the CXL host, the CXL
        host forwards it to system firmware via an interrupt, and
        finally the kernel handles the error.
   2.b. OS-First
        The CXL device sends the error directly to the kernel via
        MSI/MSI-X.
Note: Since I'm currently focusing on x86-64, I'll describe x86-64
only.
The following diagram shows the 2 major paths and 2 optional sub-paths
above.
```
1. MCE (interrupt #18, while CPU consuming POISON)
-> do_machine_check()
-> mce_log()
-> notify chain (x86_mce_decoder_chain)
-> memory_failure()
2.a FW-First (optional, CXL device proactively find&report)
-> CXL device -> Firmware
-> OS: ACPI->APEI->GHES->CPER -> CXL driver -> trace
2.b OS-First (optional, CXL device proactively find&report)
-> CXL device -> MSI
-> OS: CXL driver -> trace
```
For the "1. CPU handling error" path, the current code seems to work
fine. When I used the error injection feature under QEMU emulation,
this code path was executed reliably. So, as long as the CPU reliably
raises an MCE when it consumes the POISON, this path has no problem.
I'm therefore working on the 2.a and 2.b paths, so that a POISON error
reported by the CXL device can be handled by the kernel. This path has
two advantages.
- Proactively find & report memory problems
  Even if a process has not read the data yet, the kernel/drivers can
  proactively prevent it from using corrupted data. AFAIK, the current
  kernel only traces POISON error events from the FW-First/OS-First
  paths; it neither handles them nor notifies the processes using the
  POISON page the way MCE does. User space tools like rasdaemon read
  the trace and log it, but likewise do not handle the POISON page. As
  a result, the user has to read the error log from rasdaemon, work out
  whether the POISON error came from CXL memory or DDR memory, and find
  out which applications are affected. That is not easy work and cannot
  be done in time. Thus, I'd like to add a feature that does this work
  automatically and quickly: once the CXL device reports the POISON
  error (via FW-First/OS-First), the kernel handles it immediately,
  similar to the flow when an MCE is triggered. This is my first
  motivation.
- Architecture independent
  As mentioned above, the "1. CPU handling error" path is architecture
  dependent. This route, on the other hand, can be architecture
  independent code. If there is a CPU that has no feature similar to
  the MCE of x86-64, my work will be essential. (To be honest, I did
  not notice this advantage at first, as mentioned later, but I think
  it is also important.)
Shiyang Ruan (2):
cxl/core: introduce device reporting poison handling
cxl: avoid duplicated report from MCE & device
arch/x86/include/asm/mce.h | 1 +
drivers/cxl/core/mbox.c | 190 ++++++++++++++++++++++++++++++++++---
drivers/cxl/core/memdev.c | 6 +-
drivers/cxl/cxlmem.h | 11 ++-
drivers/cxl/pci.c | 4 +-
include/linux/cxl-event.h | 16 +++-
6 files changed, 207 insertions(+), 21 deletions(-)
--
2.34.1
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH v4 1/2] cxl/core: introduce device reporting poison handling
2024-08-08 15:13 [PATCH v4 0/2] cxl: add device reporting poison handler Shiyang Ruan via
@ 2024-08-08 15:13 ` Shiyang Ruan via
2024-08-08 18:28 ` Fan Ni
2024-08-08 15:13 ` [PATCH v4 2/2] cxl: avoid duplicated report from MCE & device Shiyang Ruan via
1 sibling, 1 reply; 8+ messages in thread
From: Shiyang Ruan via @ 2024-08-08 15:13 UTC (permalink / raw)
To: qemu-devel, linux-cxl, linux-edac, linux-mm, dan.j.williams,
vishal.l.verma, Jonathan.Cameron, alison.schofield
Cc: bp, dave.jiang, dave, ira.weiny, james.morse, linmiaohe, mchehab,
nao.horiguchi, rric, tony.luck, ruansy.fnst
A CXL device can find and report memory problems even before the CPU
detects an MCE. AFAIK, the current kernel only traces POISON error
events from the FW-First/OS-First paths; it neither handles them nor
notifies the processes using the POISON page the way MCE does.
Thus, the user has to read the logs from the trace and find out which
device reported the error and which applications are affected. That is
not easy work and cannot be done in time. It is therefore necessary to
add a feature that does this work automatically and quickly: once the
CXL device reports the POISON error (via FW-First/OS-First), the kernel
handles it immediately, similar to the flow when an MCE is triggered.
The current call trace of error reporting&handling looks like this:
```
1. MCE (interrupt #18, while CPU consuming POISON)
-> do_machine_check()
-> mce_log()
-> notify chain (x86_mce_decoder_chain)
-> memory_failure()
2.a FW-First (optional, CXL device proactively find&report)
-> CXL device -> Firmware
-> OS: ACPI->APEI->GHES->CPER -> CXL driver -> trace
\-> memory_failure()
^----- ADD
2.b OS-First (optional, CXL device proactively find&report)
-> CXL device -> MSI
-> OS: CXL driver -> trace
\-> memory_failure()
^------------------------------- ADD
```
This patch adds a call to memory_failure() (queued via
memory_failure_queue()) when an error report from the CXL device is
received, marked as "ADD" in the figure above.
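The filtering applied before a report is forwarded can be sketched in
user space as follows. This is only an illustrative sketch, not the
kernel code: the transaction type names mirror the enum this patch adds
to include/linux/cxl-event.h, while `enum log_severity`,
`should_report_poison()`, `hpa_to_pfn()` and the 4 KiB
`SKETCH_PAGE_SHIFT` are hypothetical stand-ins (the patch itself checks
CXL_EVENT_TYPE_FAIL and uses PHYS_PFN() and memory_failure_queue()).

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors enum cxl_event_transaction_type from this patch. */
enum cxl_event_transaction_type {
	CXL_EVENT_TRANSACTION_UNKNOWN = 0x00,
	CXL_EVENT_TRANSACTION_READ,
	CXL_EVENT_TRANSACTION_WRITE,
	CXL_EVENT_TRANSACTION_SCAN_MEDIA,
	CXL_EVENT_TRANSACTION_INJECT_POISON,
	CXL_EVENT_TRANSACTION_MEDIA_SCRUB,
	CXL_EVENT_TRANSACTION_MEDIA_MANAGEMENT,
};

/* Hypothetical stand-in for the event log severity (assumption). */
enum log_severity { SEV_INFO, SEV_WARN, SEV_FAIL, SEV_FATAL };

#define SKETCH_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/*
 * Only failure-severity records whose transaction type indicates a
 * poisoned access are forwarded to the memory-failure path.
 */
static bool should_report_poison(enum log_severity sev,
				 uint8_t transaction_type)
{
	if (sev != SEV_FAIL)
		return false;

	switch (transaction_type) {
	case CXL_EVENT_TRANSACTION_READ:
	case CXL_EVENT_TRANSACTION_WRITE:
	case CXL_EVENT_TRANSACTION_SCAN_MEDIA:
	case CXL_EVENT_TRANSACTION_INJECT_POISON:
		return true;
	default:
		return false;
	}
}

/* Convert a host physical address to a page frame number. */
static uint64_t hpa_to_pfn(uint64_t hpa)
{
	return hpa >> SKETCH_PAGE_SHIFT;
}
```

In this sketch, a media-scrub record or a warning-severity record is
traced but never forwarded; only the four listed transaction types at
failure severity would lead to a queued memory failure for the page.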
Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
---
drivers/cxl/core/mbox.c | 75 ++++++++++++++++++++++++++++++++-------
drivers/cxl/cxlmem.h | 8 ++---
drivers/cxl/pci.c | 4 +--
include/linux/cxl-event.h | 16 ++++++++-
4 files changed, 83 insertions(+), 20 deletions(-)
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index e5cdeafdf76e..0cb6ef2e6600 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -849,10 +849,55 @@ int cxl_enumerate_cmds(struct cxl_memdev_state *mds)
}
EXPORT_SYMBOL_NS_GPL(cxl_enumerate_cmds, CXL);
-void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
- enum cxl_event_log_type type,
- enum cxl_event_type event_type,
- const uuid_t *uuid, union cxl_event *evt)
+static void cxl_report_poison(struct cxl_memdev *cxlmd, u64 hpa)
+{
+ unsigned long pfn = PHYS_PFN(hpa);
+
+ memory_failure_queue(pfn, 0);
+}
+
+static void cxl_event_handle_general_media(struct cxl_memdev *cxlmd,
+ enum cxl_event_log_type type,
+ u64 hpa,
+ struct cxl_event_gen_media *rec)
+{
+ if (type == CXL_EVENT_TYPE_FAIL) {
+ switch (rec->media_hdr.transaction_type) {
+ case CXL_EVENT_TRANSACTION_READ:
+ case CXL_EVENT_TRANSACTION_WRITE:
+ case CXL_EVENT_TRANSACTION_SCAN_MEDIA:
+ case CXL_EVENT_TRANSACTION_INJECT_POISON:
+ cxl_report_poison(cxlmd, hpa);
+ break;
+ default:
+ break;
+ }
+ }
+}
+
+static void cxl_event_handle_dram(struct cxl_memdev *cxlmd,
+ enum cxl_event_log_type type,
+ u64 hpa,
+ struct cxl_event_dram *rec)
+{
+ if (type == CXL_EVENT_TYPE_FAIL) {
+ switch (rec->media_hdr.transaction_type) {
+ case CXL_EVENT_TRANSACTION_READ:
+ case CXL_EVENT_TRANSACTION_WRITE:
+ case CXL_EVENT_TRANSACTION_SCAN_MEDIA:
+ case CXL_EVENT_TRANSACTION_INJECT_POISON:
+ cxl_report_poison(cxlmd, hpa);
+ break;
+ default:
+ break;
+ }
+ }
+}
+
+void cxl_event_handle_record(struct cxl_memdev *cxlmd,
+ enum cxl_event_log_type type,
+ enum cxl_event_type event_type,
+ const uuid_t *uuid, union cxl_event *evt)
{
if (event_type == CXL_CPER_EVENT_MEM_MODULE) {
trace_cxl_memory_module(cxlmd, type, &evt->mem_module);
@@ -880,18 +925,22 @@ void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
if (cxlr)
hpa = cxl_dpa_to_hpa(cxlr, cxlmd, dpa);
- if (event_type == CXL_CPER_EVENT_GEN_MEDIA)
+ if (event_type == CXL_CPER_EVENT_GEN_MEDIA) {
trace_cxl_general_media(cxlmd, type, cxlr, hpa,
&evt->gen_media);
- else if (event_type == CXL_CPER_EVENT_DRAM)
+ cxl_event_handle_general_media(cxlmd, type, hpa,
+ &evt->gen_media);
+ } else if (event_type == CXL_CPER_EVENT_DRAM) {
trace_cxl_dram(cxlmd, type, cxlr, hpa, &evt->dram);
+ cxl_event_handle_dram(cxlmd, type, hpa, &evt->dram);
+ }
}
}
-EXPORT_SYMBOL_NS_GPL(cxl_event_trace_record, CXL);
+EXPORT_SYMBOL_NS_GPL(cxl_event_handle_record, CXL);
-static void __cxl_event_trace_record(const struct cxl_memdev *cxlmd,
- enum cxl_event_log_type type,
- struct cxl_event_record_raw *record)
+static void __cxl_event_handle_record(struct cxl_memdev *cxlmd,
+ enum cxl_event_log_type type,
+ struct cxl_event_record_raw *record)
{
enum cxl_event_type ev_type = CXL_CPER_EVENT_GENERIC;
const uuid_t *uuid = &record->id;
@@ -903,7 +952,7 @@ static void __cxl_event_trace_record(const struct cxl_memdev *cxlmd,
else if (uuid_equal(uuid, &CXL_EVENT_MEM_MODULE_UUID))
ev_type = CXL_CPER_EVENT_MEM_MODULE;
- cxl_event_trace_record(cxlmd, type, ev_type, uuid, &record->event);
+ cxl_event_handle_record(cxlmd, type, ev_type, uuid, &record->event);
}
static int cxl_clear_event_record(struct cxl_memdev_state *mds,
@@ -1012,8 +1061,8 @@ static void cxl_mem_get_records_log(struct cxl_memdev_state *mds,
break;
for (i = 0; i < nr_rec; i++)
- __cxl_event_trace_record(cxlmd, type,
- &payload->records[i]);
+ __cxl_event_handle_record(cxlmd, type,
+ &payload->records[i]);
if (payload->flags & CXL_GET_EVENT_FLAG_OVERFLOW)
trace_cxl_overflow(cxlmd, type, payload);
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index afb53d058d62..5c4810dcbdeb 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -826,10 +826,10 @@ void set_exclusive_cxl_commands(struct cxl_memdev_state *mds,
void clear_exclusive_cxl_commands(struct cxl_memdev_state *mds,
unsigned long *cmds);
void cxl_mem_get_event_records(struct cxl_memdev_state *mds, u32 status);
-void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
- enum cxl_event_log_type type,
- enum cxl_event_type event_type,
- const uuid_t *uuid, union cxl_event *evt);
+void cxl_event_handle_record(struct cxl_memdev *cxlmd,
+ enum cxl_event_log_type type,
+ enum cxl_event_type event_type,
+ const uuid_t *uuid, union cxl_event *evt);
int cxl_set_timestamp(struct cxl_memdev_state *mds);
int cxl_poison_state_init(struct cxl_memdev_state *mds);
int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
index 4be35dc22202..6e65ca89f666 100644
--- a/drivers/cxl/pci.c
+++ b/drivers/cxl/pci.c
@@ -1029,8 +1029,8 @@ static void cxl_handle_cper_event(enum cxl_event_type ev_type,
hdr_flags = get_unaligned_le24(rec->event.generic.hdr.flags);
log_type = FIELD_GET(CXL_EVENT_HDR_FLAGS_REC_SEVERITY, hdr_flags);
- cxl_event_trace_record(cxlds->cxlmd, log_type, ev_type,
- &uuid_null, &rec->event);
+ cxl_event_handle_record(cxlds->cxlmd, log_type, ev_type,
+ &uuid_null, &rec->event);
}
static void cxl_cper_work_fn(struct work_struct *work)
diff --git a/include/linux/cxl-event.h b/include/linux/cxl-event.h
index 0bea1afbd747..be4342a2b597 100644
--- a/include/linux/cxl-event.h
+++ b/include/linux/cxl-event.h
@@ -7,6 +7,20 @@
#include <linux/uuid.h>
#include <linux/workqueue_types.h>
+/*
+ * Event transaction type
+ * CXL rev 3.0 Section 8.2.9.2.1.1; Table 8-43
+ */
+enum cxl_event_transaction_type {
+ CXL_EVENT_TRANSACTION_UNKNOWN = 0X00,
+ CXL_EVENT_TRANSACTION_READ,
+ CXL_EVENT_TRANSACTION_WRITE,
+ CXL_EVENT_TRANSACTION_SCAN_MEDIA,
+ CXL_EVENT_TRANSACTION_INJECT_POISON,
+ CXL_EVENT_TRANSACTION_MEDIA_SCRUB,
+ CXL_EVENT_TRANSACTION_MEDIA_MANAGEMENT,
+};
+
/*
* Common Event Record Format
* CXL rev 3.0 section 8.2.9.2.1; Table 8-42
@@ -26,7 +40,7 @@ struct cxl_event_media_hdr {
__le64 phys_addr;
u8 descriptor;
u8 type;
- u8 transaction_type;
+ u8 transaction_type; /* enum cxl_event_transaction_type */
/*
* The meaning of Validity Flags from bit 2 is
* different across DRAM and General Media records
--
2.34.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH v4 2/2] cxl: avoid duplicated report from MCE & device
2024-08-08 15:13 [PATCH v4 0/2] cxl: add device reporting poison handler Shiyang Ruan via
2024-08-08 15:13 ` [PATCH v4 1/2] cxl/core: introduce device reporting poison handling Shiyang Ruan via
@ 2024-08-08 15:13 ` Shiyang Ruan via
2024-08-09 7:31 ` kernel test robot
` (2 more replies)
1 sibling, 3 replies; 8+ messages in thread
From: Shiyang Ruan via @ 2024-08-08 15:13 UTC (permalink / raw)
To: qemu-devel, linux-cxl, linux-edac, linux-mm, dan.j.williams,
vishal.l.verma, Jonathan.Cameron, alison.schofield
Cc: bp, dave.jiang, dave, ira.weiny, james.morse, linmiaohe, mchehab,
nao.horiguchi, rric, tony.luck, ruansy.fnst
Since a CXL device is a memory device, when the CPU consumes a poisoned
page of the CXL device it always triggers an MCE (via interrupt #18)
and calls memory_failure() to handle the POISON page, no matter which
-First path is configured. The CXL device can also find and report the
POISON; the kernel now not only traces it but also calls
memory_failure() to handle it, which is marked as "NEW" in the figure
below.
```
1. MCE (interrupt #18, while CPU consuming POISON)
-> do_machine_check()
-> mce_log()
-> notify chain (x86_mce_decoder_chain)
-> memory_failure() <---------------------------- EXISTS
2.a FW-First (optional, CXL device proactively find&report)
-> CXL device -> Firmware
-> OS: ACPI->APEI->GHES->CPER -> CXL driver -> trace
\-> memory_failure()
^----- NEW
2.b OS-First (optional, CXL device proactively find&report)
-> CXL device -> MSI
-> OS: CXL driver -> trace
\-> memory_failure()
^------------------------------- NEW
```
But in this way, memory_failure() could be called twice, or even at the
same time, as shown in the figure above: (1.) and (2.a or 2.b), before
the POISON page is cleared. memory_failure() has its own mutex lock, so
the two calls won't actually run at the same time, and the later call
is skipped because the HWPoison bit has already been set. However,
consider this scenario: "CXL device reports POISON error" triggers the
1st call; the user sees it in the log and wants to clear the poison by
running the `cxl clear-poison` command; and at the same time, a process
tries to access this POISON page, which triggers an MCE (the 2nd call).
Since there is no lock between the 2nd call and the clear-poison
operation, a race may occur, which may leave the HWPoison bit of the
page in an unknown state.
Thus, we have to avoid the 2nd call. This patch[2] introduces a new
notifier_block into `x86_mce_decoder_chain` and a POISON record cache
(an XArray), to stop the 2nd call of memory_failure(). It checks
whether the current poison page has already been reported (if yes, it
stops the notifier chain so that the following memory_failure() is not
called to report it again).
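The deduplication idea can be illustrated with a small user-space
sketch. It is not the kernel implementation: a fixed-size array stands
in for the XArray, locking is omitted, and `record_once()`,
`record_clear()`, `device_path()`, `mce_path()` and the
`SKETCH_NOTIFY_*` values are hypothetical stand-ins for
cxl_mce_recorded(), cxl_mce_clear(), the two reporting paths and the
kernel's notifier return codes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_RECORDS 64 /* arbitrary capacity for this sketch */

static uint64_t records[MAX_RECORDS];
static bool used[MAX_RECORDS];

/* Return true if hpa was already recorded; else record it, return false. */
static bool record_once(uint64_t hpa)
{
	size_t free_slot = MAX_RECORDS;

	for (size_t i = 0; i < MAX_RECORDS; i++) {
		if (used[i] && records[i] == hpa)
			return true;
		if (!used[i] && free_slot == MAX_RECORDS)
			free_slot = i;
	}
	if (free_slot < MAX_RECORDS) {
		records[free_slot] = hpa;
		used[free_slot] = true;
	}
	return false;
}

/* Forget hpa, e.g. once the poison has been cleared. */
static void record_clear(uint64_t hpa)
{
	for (size_t i = 0; i < MAX_RECORDS; i++)
		if (used[i] && records[i] == hpa)
			used[i] = false;
}

enum { SKETCH_NOTIFY_OK, SKETCH_NOTIFY_STOP };

static int memory_failure_calls; /* counts simulated memory_failure() calls */

/* Device-report path: skip if this address was already reported. */
static void device_path(uint64_t hpa)
{
	if (record_once(hpa))
		return;
	memory_failure_calls++;
}

/* MCE path: the dedup check runs first; STOP skips memory_failure(). */
static int mce_path(uint64_t hpa)
{
	if (record_once(hpa))
		return SKETCH_NOTIFY_STOP;
	memory_failure_calls++;
	return SKETCH_NOTIFY_OK;
}
```

In this sketch, whichever path sees an address first records it and
proceeds; the other path then finds the record and backs off. Clearing
the record (as the clear-poison operation does in the patch) lets a
later error at the same address be reported again.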
Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
---
arch/x86/include/asm/mce.h | 1 +
drivers/cxl/core/mbox.c | 115 +++++++++++++++++++++++++++++++++++++
drivers/cxl/core/memdev.c | 6 +-
drivers/cxl/cxlmem.h | 3 +
4 files changed, 124 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 3ad29b128943..5da45e870858 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -182,6 +182,7 @@ enum mce_notifier_prios {
MCE_PRIO_NFIT,
MCE_PRIO_EXTLOG,
MCE_PRIO_UC,
+ MCE_PRIO_CXL,
MCE_PRIO_EARLY,
MCE_PRIO_CEC,
MCE_PRIO_HIGHEST = MCE_PRIO_CEC
diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 0cb6ef2e6600..b21700428c35 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -4,6 +4,8 @@
#include <linux/debugfs.h>
#include <linux/ktime.h>
#include <linux/mutex.h>
+#include <linux/notifier.h>
+#include <asm/mce.h>
#include <asm/unaligned.h>
#include <cxlpci.h>
#include <cxlmem.h>
@@ -925,6 +927,9 @@ void cxl_event_handle_record(struct cxl_memdev *cxlmd,
if (cxlr)
hpa = cxl_dpa_to_hpa(cxlr, cxlmd, dpa);
+ if (hpa != ULLONG_MAX && cxl_mce_recorded(hpa))
+ return;
+
if (event_type == CXL_CPER_EVENT_GEN_MEDIA) {
trace_cxl_general_media(cxlmd, type, cxlr, hpa,
&evt->gen_media);
@@ -1457,6 +1462,112 @@ int cxl_poison_state_init(struct cxl_memdev_state *mds)
}
EXPORT_SYMBOL_NS_GPL(cxl_poison_state_init, CXL);
+DEFINE_XARRAY(cxl_mce_records);
+
+bool cxl_mce_recorded(u64 hpa)
+{
+ XA_STATE(xas, &cxl_mce_records, hpa);
+ void *entry;
+
+ xas_lock_irq(&xas);
+ entry = xas_load(&xas);
+ if (entry) {
+ xas_unlock_irq(&xas);
+ return true;
+ }
+ entry = xa_mk_value(hpa);
+ xas_store(&xas, entry);
+ xas_unlock_irq(&xas);
+
+ return false;
+}
+EXPORT_SYMBOL_NS_GPL(cxl_mce_recorded, CXL);
+
+void cxl_mce_clear(u64 hpa)
+{
+ XA_STATE(xas, &cxl_mce_records, hpa);
+ void *entry;
+
+ xas_lock_irq(&xas);
+ entry = xas_load(&xas);
+ if (entry) {
+ xas_store(&xas, NULL);
+ }
+ xas_unlock_irq(&xas);
+}
+EXPORT_SYMBOL_NS_GPL(cxl_mce_clear, CXL);
+
+struct cxl_contains_hpa_context {
+ bool contains;
+ u64 hpa;
+};
+
+static int __cxl_contains_hpa(struct device *dev, void *arg)
+{
+ struct cxl_contains_hpa_context *ctx = arg;
+ struct cxl_endpoint_decoder *cxled;
+ struct range *range;
+ u64 hpa = ctx->hpa;
+
+ if (!is_endpoint_decoder(dev))
+ return 0;
+
+ cxled = to_cxl_endpoint_decoder(dev);
+ range = &cxled->cxld.hpa_range;
+
+ if (range->start <= hpa && hpa <= range->end) {
+ ctx->contains = true;
+ return 1;
+ }
+
+ return 0;
+}
+
+static bool cxl_contains_hpa(const struct cxl_memdev *cxlmd, u64 hpa)
+{
+ struct cxl_contains_hpa_context ctx = {
+ .contains = false,
+ .hpa = hpa,
+ };
+ struct cxl_port *port;
+
+ port = cxlmd->endpoint;
+ guard(rwsem_write)(&cxl_region_rwsem);
+ if (port && cxl_num_decoders_committed(port))
+ device_for_each_child(&port->dev, &ctx, __cxl_contains_hpa);
+
+ return ctx.contains;
+}
+
+static int cxl_handle_mce(struct notifier_block *nb, unsigned long val,
+ void *data)
+{
+ struct mce *mce = (struct mce *)data;
+ struct cxl_memdev_state *mds = container_of(nb, struct cxl_memdev_state,
+ mce_notifier);
+ u64 hpa;
+
+ if (!mce || !mce_usable_address(mce))
+ return NOTIFY_DONE;
+
+ hpa = mce->addr & MCI_ADDR_PHYSADDR;
+
+ /* Check if the PFN is located on this CXL device */
+ if (!pfn_valid(hpa >> PAGE_SHIFT) &&
+ !cxl_contains_hpa(mds->cxlds.cxlmd, hpa))
+ return NOTIFY_DONE;
+
+ /*
+ * Search PFN in the cxl_mce_records, if already exists, don't continue
+ * to do memory_failure() to avoid a poison address being reported
+ * more than once.
+ */
+ if (cxl_mce_recorded(hpa))
+ return NOTIFY_STOP;
+ else
+ return NOTIFY_OK;
+}
+
struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev)
{
struct cxl_memdev_state *mds;
@@ -1476,6 +1587,10 @@ struct cxl_memdev_state *cxl_memdev_state_create(struct device *dev)
mds->ram_perf.qos_class = CXL_QOS_CLASS_INVALID;
mds->pmem_perf.qos_class = CXL_QOS_CLASS_INVALID;
+ mds->mce_notifier.notifier_call = cxl_handle_mce;
+ mds->mce_notifier.priority = MCE_PRIO_CXL;
+ mce_register_decode_chain(&mds->mce_notifier);
+
return mds;
}
EXPORT_SYMBOL_NS_GPL(cxl_memdev_state_create, CXL);
diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
index 0277726afd04..9d4ed4dc4d51 100644
--- a/drivers/cxl/core/memdev.c
+++ b/drivers/cxl/core/memdev.c
@@ -376,10 +376,14 @@ int cxl_clear_poison(struct cxl_memdev *cxlmd, u64 dpa)
goto out;
cxlr = cxl_dpa_to_region(cxlmd, dpa);
- if (cxlr)
+ if (cxlr) {
+ u64 hpa = cxl_dpa_to_hpa(cxlr, cxlmd, dpa);
+
+ cxl_mce_clear(hpa);
dev_warn_once(mds->cxlds.dev,
"poison clear dpa:%#llx region: %s\n", dpa,
dev_name(&cxlr->dev));
+ }
record = (struct cxl_poison_record) {
.address = cpu_to_le64(dpa),
diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
index 5c4810dcbdeb..d2d906c26755 100644
--- a/drivers/cxl/cxlmem.h
+++ b/drivers/cxl/cxlmem.h
@@ -502,6 +502,7 @@ struct cxl_memdev_state {
struct cxl_fw_state fw;
struct rcuwait mbox_wait;
+ struct notifier_block mce_notifier;
int (*mbox_send)(struct cxl_memdev_state *mds,
struct cxl_mbox_cmd *cmd);
};
@@ -837,6 +838,8 @@ int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
int cxl_trigger_poison_list(struct cxl_memdev *cxlmd);
int cxl_inject_poison(struct cxl_memdev *cxlmd, u64 dpa);
int cxl_clear_poison(struct cxl_memdev *cxlmd, u64 dpa);
+bool cxl_mce_recorded(u64 pfn);
+void cxl_mce_clear(u64 pfn);
#ifdef CONFIG_CXL_SUSPEND
void cxl_mem_active_inc(void);
--
2.34.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH v4 1/2] cxl/core: introduce device reporting poison handling
2024-08-08 15:13 ` [PATCH v4 1/2] cxl/core: introduce device reporting poison handling Shiyang Ruan via
@ 2024-08-08 18:28 ` Fan Ni
2024-08-21 13:57 ` Shiyang Ruan via
0 siblings, 1 reply; 8+ messages in thread
From: Fan Ni @ 2024-08-08 18:28 UTC (permalink / raw)
To: Shiyang Ruan
Cc: qemu-devel, linux-cxl, linux-edac, linux-mm, dan.j.williams,
vishal.l.verma, Jonathan.Cameron, alison.schofield, bp,
dave.jiang, dave, ira.weiny, james.morse, linmiaohe, mchehab,
nao.horiguchi, rric, tony.luck
On Thu, Aug 08, 2024 at 11:13:27PM +0800, Shiyang Ruan wrote:
> CXL device can find&report memory problems, even before MCE is detected
> by CPU. AFAIK, the current kernel only traces POISON error event
> from FW-First/OS-First path, but it doesn't handle them, neither
> notify processes who are using the POISON page like MCE does.
>
> Thus, user have to read logs from trace and find out which device
> reported the error and which applications are affected. That is not
> an easy work and cannot be handled in time. Thus, it is needed to add
> the feature to make the work done automatically and quickly. Once CXL
> device reports the POISON error (via FW-First/OS-First), kernel
> handles it immediately, similar to the flow when a MCE is triggered.
>
> The current call trace of error reporting&handling looks like this:
> ```
> 1. MCE (interrupt #18, while CPU consuming POISON)
> -> do_machine_check()
> -> mce_log()
> -> notify chain (x86_mce_decoder_chain)
> -> memory_failure()
>
> 2.a FW-First (optional, CXL device proactively find&report)
> -> CXL device -> Firmware
> -> OS: ACPI->APEI->GHES->CPER -> CXL driver -> trace
> \-> memory_failure()
> ^----- ADD
> 2.b OS-First (optional, CXL device proactively find&report)
> -> CXL device -> MSI
> -> OS: CXL driver -> trace
> \-> memory_failure()
> ^------------------------------- ADD
> ```
> This patch adds calling memory_failure() while CXL device reporting
> error is received, marked as "ADD" in figure above.
>
> Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
> ---
> drivers/cxl/core/mbox.c | 75 ++++++++++++++++++++++++++++++++-------
> drivers/cxl/cxlmem.h | 8 ++---
> drivers/cxl/pci.c | 4 +--
> include/linux/cxl-event.h | 16 ++++++++-
> 4 files changed, 83 insertions(+), 20 deletions(-)
>
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index e5cdeafdf76e..0cb6ef2e6600 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -849,10 +849,55 @@ int cxl_enumerate_cmds(struct cxl_memdev_state *mds)
> }
> EXPORT_SYMBOL_NS_GPL(cxl_enumerate_cmds, CXL);
>
> -void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
> - enum cxl_event_log_type type,
> - enum cxl_event_type event_type,
> - const uuid_t *uuid, union cxl_event *evt)
> +static void cxl_report_poison(struct cxl_memdev *cxlmd, u64 hpa)
> +{
> + unsigned long pfn = PHYS_PFN(hpa);
> +
> + memory_failure_queue(pfn, 0);
> +}
> +
> +static void cxl_event_handle_general_media(struct cxl_memdev *cxlmd,
> + enum cxl_event_log_type type,
> + u64 hpa,
> + struct cxl_event_gen_media *rec)
> +{
> + if (type == CXL_EVENT_TYPE_FAIL) {
> + switch (rec->media_hdr.transaction_type) {
> + case CXL_EVENT_TRANSACTION_READ:
> + case CXL_EVENT_TRANSACTION_WRITE:
> + case CXL_EVENT_TRANSACTION_SCAN_MEDIA:
> + case CXL_EVENT_TRANSACTION_INJECT_POISON:
> + cxl_report_poison(cxlmd, hpa);
> + break;
> + default:
> + break;
> + }
> + }
> +}
> +
> +static void cxl_event_handle_dram(struct cxl_memdev *cxlmd,
> + enum cxl_event_log_type type,
> + u64 hpa,
> + struct cxl_event_dram *rec)
> +{
> + if (type == CXL_EVENT_TYPE_FAIL) {
> + switch (rec->media_hdr.transaction_type) {
> + case CXL_EVENT_TRANSACTION_READ:
> + case CXL_EVENT_TRANSACTION_WRITE:
> + case CXL_EVENT_TRANSACTION_SCAN_MEDIA:
> + case CXL_EVENT_TRANSACTION_INJECT_POISON:
> + cxl_report_poison(cxlmd, hpa);
> + break;
> + default:
> + break;
> + }
> + }
> +}
> +
> +void cxl_event_handle_record(struct cxl_memdev *cxlmd,
> + enum cxl_event_log_type type,
> + enum cxl_event_type event_type,
> + const uuid_t *uuid, union cxl_event *evt)
> {
> if (event_type == CXL_CPER_EVENT_MEM_MODULE) {
> trace_cxl_memory_module(cxlmd, type, &evt->mem_module);
> @@ -880,18 +925,22 @@ void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
> if (cxlr)
> hpa = cxl_dpa_to_hpa(cxlr, cxlmd, dpa);
>
> - if (event_type == CXL_CPER_EVENT_GEN_MEDIA)
> + if (event_type == CXL_CPER_EVENT_GEN_MEDIA) {
> trace_cxl_general_media(cxlmd, type, cxlr, hpa,
> &evt->gen_media);
> - else if (event_type == CXL_CPER_EVENT_DRAM)
> + cxl_event_handle_general_media(cxlmd, type, hpa,
> + &evt->gen_media);
> + } else if (event_type == CXL_CPER_EVENT_DRAM) {
> trace_cxl_dram(cxlmd, type, cxlr, hpa, &evt->dram);
> + cxl_event_handle_dram(cxlmd, type, hpa, &evt->dram);
Does it make sense to call the trace function in
cxl_event_handle_dram/general_media and replace the trace function with
the handle_* here?
> + }
> }
> }
> -EXPORT_SYMBOL_NS_GPL(cxl_event_trace_record, CXL);
> +EXPORT_SYMBOL_NS_GPL(cxl_event_handle_record, CXL);
>
> -static void __cxl_event_trace_record(const struct cxl_memdev *cxlmd,
> - enum cxl_event_log_type type,
> - struct cxl_event_record_raw *record)
> +static void __cxl_event_handle_record(struct cxl_memdev *cxlmd,
> + enum cxl_event_log_type type,
> + struct cxl_event_record_raw *record)
> {
> enum cxl_event_type ev_type = CXL_CPER_EVENT_GENERIC;
> const uuid_t *uuid = &record->id;
> @@ -903,7 +952,7 @@ static void __cxl_event_trace_record(const struct cxl_memdev *cxlmd,
> else if (uuid_equal(uuid, &CXL_EVENT_MEM_MODULE_UUID))
> ev_type = CXL_CPER_EVENT_MEM_MODULE;
>
> - cxl_event_trace_record(cxlmd, type, ev_type, uuid, &record->event);
> + cxl_event_handle_record(cxlmd, type, ev_type, uuid, &record->event);
> }
>
> static int cxl_clear_event_record(struct cxl_memdev_state *mds,
> @@ -1012,8 +1061,8 @@ static void cxl_mem_get_records_log(struct cxl_memdev_state *mds,
> break;
>
> for (i = 0; i < nr_rec; i++)
> - __cxl_event_trace_record(cxlmd, type,
> - &payload->records[i]);
> + __cxl_event_handle_record(cxlmd, type,
> + &payload->records[i]);
>
> if (payload->flags & CXL_GET_EVENT_FLAG_OVERFLOW)
> trace_cxl_overflow(cxlmd, type, payload);
> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index afb53d058d62..5c4810dcbdeb 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -826,10 +826,10 @@ void set_exclusive_cxl_commands(struct cxl_memdev_state *mds,
> void clear_exclusive_cxl_commands(struct cxl_memdev_state *mds,
> unsigned long *cmds);
> void cxl_mem_get_event_records(struct cxl_memdev_state *mds, u32 status);
> -void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
> - enum cxl_event_log_type type,
> - enum cxl_event_type event_type,
> - const uuid_t *uuid, union cxl_event *evt);
> +void cxl_event_handle_record(struct cxl_memdev *cxlmd,
> + enum cxl_event_log_type type,
> + enum cxl_event_type event_type,
> + const uuid_t *uuid, union cxl_event *evt);
> int cxl_set_timestamp(struct cxl_memdev_state *mds);
> int cxl_poison_state_init(struct cxl_memdev_state *mds);
> int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index 4be35dc22202..6e65ca89f666 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -1029,8 +1029,8 @@ static void cxl_handle_cper_event(enum cxl_event_type ev_type,
> hdr_flags = get_unaligned_le24(rec->event.generic.hdr.flags);
> log_type = FIELD_GET(CXL_EVENT_HDR_FLAGS_REC_SEVERITY, hdr_flags);
>
> - cxl_event_trace_record(cxlds->cxlmd, log_type, ev_type,
> - &uuid_null, &rec->event);
> + cxl_event_handle_record(cxlds->cxlmd, log_type, ev_type,
> + &uuid_null, &rec->event);
> }
>
> static void cxl_cper_work_fn(struct work_struct *work)
> diff --git a/include/linux/cxl-event.h b/include/linux/cxl-event.h
> index 0bea1afbd747..be4342a2b597 100644
> --- a/include/linux/cxl-event.h
> +++ b/include/linux/cxl-event.h
> @@ -7,6 +7,20 @@
> #include <linux/uuid.h>
> #include <linux/workqueue_types.h>
>
> +/*
> + * Event transaction type
> + * CXL rev 3.0 Section 8.2.9.2.1.1; Table 8-43
Here and below, update the specification reference to reflect cxl 3.1.
Fan
> + */
> +enum cxl_event_transaction_type {
> + CXL_EVENT_TRANSACTION_UNKNOWN = 0X00,
> + CXL_EVENT_TRANSACTION_READ,
> + CXL_EVENT_TRANSACTION_WRITE,
> + CXL_EVENT_TRANSACTION_SCAN_MEDIA,
> + CXL_EVENT_TRANSACTION_INJECT_POISON,
> + CXL_EVENT_TRANSACTION_MEDIA_SCRUB,
> + CXL_EVENT_TRANSACTION_MEDIA_MANAGEMENT,
> +};
> +
> /*
> * Common Event Record Format
> * CXL rev 3.0 section 8.2.9.2.1; Table 8-42
> @@ -26,7 +40,7 @@ struct cxl_event_media_hdr {
> __le64 phys_addr;
> u8 descriptor;
> u8 type;
> - u8 transaction_type;
> + u8 transaction_type; /* enum cxl_event_transaction_type */
> /*
> * The meaning of Validity Flags from bit 2 is
> * different across DRAM and General Media records
> --
> 2.34.1
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v4 2/2] cxl: avoid duplicated report from MCE & device
2024-08-08 15:13 ` [PATCH v4 2/2] cxl: avoid duplicated report from MCE & device Shiyang Ruan via
@ 2024-08-09 7:31 ` kernel test robot
2024-08-09 7:31 ` kernel test robot
2024-08-09 11:48 ` kernel test robot
2 siblings, 0 replies; 8+ messages in thread
From: kernel test robot @ 2024-08-09 7:31 UTC (permalink / raw)
To: Shiyang Ruan, qemu-devel, linux-cxl, linux-edac, linux-mm,
dan.j.williams, vishal.l.verma, Jonathan.Cameron,
alison.schofield
Cc: oe-kbuild-all, bp, dave.jiang, dave, ira.weiny, james.morse,
linmiaohe, mchehab, nao.horiguchi, rric, tony.luck, ruansy.fnst
Hi Shiyang,
kernel test robot noticed the following build errors:
[auto build test ERROR on tip/x86/core]
[also build test ERROR on cxl/next linus/master v6.11-rc2 next-20240809]
[cannot apply to cxl/pending]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Shiyang-Ruan/cxl-core-introduce-device-reporting-poison-hanlding/20240809-013658
base: tip/x86/core
patch link: https://lore.kernel.org/r/20240808151328.707869-3-ruansy.fnst%40fujitsu.com
patch subject: [PATCH v4 2/2] cxl: avoid duplicated report from MCE & device
config: um-allyesconfig (https://download.01.org/0day-ci/archive/20240809/202408091537.p9RKx1R2-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240809/202408091537.p9RKx1R2-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202408091537.p9RKx1R2-lkp@intel.com/
All errors/warnings (new ones prefixed by >>):
In file included from drivers/cxl/core/mbox.c:8:
>> arch/x86/include/asm/mce.h:219:43: warning: 'struct cpuinfo_x86' declared inside parameter list will not be visible outside of this definition or declaration
219 | static inline void mcheck_cpu_init(struct cpuinfo_x86 *c) {}
| ^~~~~~~~~~~
arch/x86/include/asm/mce.h:220:44: warning: 'struct cpuinfo_x86' declared inside parameter list will not be visible outside of this definition or declaration
220 | static inline void mcheck_cpu_clear(struct cpuinfo_x86 *c) {}
| ^~~~~~~~~~~
arch/x86/include/asm/mce.h:240:50: warning: 'struct cpuinfo_x86' declared inside parameter list will not be visible outside of this definition or declaration
240 | static inline void mce_intel_feature_init(struct cpuinfo_x86 *c) { }
| ^~~~~~~~~~~
arch/x86/include/asm/mce.h:241:51: warning: 'struct cpuinfo_x86' declared inside parameter list will not be visible outside of this definition or declaration
241 | static inline void mce_intel_feature_clear(struct cpuinfo_x86 *c) { }
| ^~~~~~~~~~~
arch/x86/include/asm/mce.h:248:26: warning: 'struct cpuinfo_x86' declared inside parameter list will not be visible outside of this definition or declaration
248 | int mce_available(struct cpuinfo_x86 *c);
| ^~~~~~~~~~~
arch/x86/include/asm/mce.h:355:48: warning: 'struct cpuinfo_x86' declared inside parameter list will not be visible outside of this definition or declaration
355 | static inline void mce_amd_feature_init(struct cpuinfo_x86 *c) { }
| ^~~~~~~~~~~
arch/x86/include/asm/mce.h:358:50: warning: 'struct cpuinfo_x86' declared inside parameter list will not be visible outside of this definition or declaration
358 | static inline void mce_hygon_feature_init(struct cpuinfo_x86 *c) { return mce_amd_feature_init(c); }
| ^~~~~~~~~~~
arch/x86/include/asm/mce.h: In function 'mce_hygon_feature_init':
>> arch/x86/include/asm/mce.h:358:103: error: passing argument 1 of 'mce_amd_feature_init' from incompatible pointer type [-Werror=incompatible-pointer-types]
358 | static inline void mce_hygon_feature_init(struct cpuinfo_x86 *c) { return mce_amd_feature_init(c); }
| ^
| |
| struct cpuinfo_x86 *
arch/x86/include/asm/mce.h:355:61: note: expected 'struct cpuinfo_x86 *' but argument is of type 'struct cpuinfo_x86 *'
355 | static inline void mce_amd_feature_init(struct cpuinfo_x86 *c) { }
| ~~~~~~~~~~~~~~~~~~~~^
In file included from include/linux/container_of.h:5,
from include/linux/list.h:5,
from include/linux/key.h:14,
from include/linux/security.h:27,
from drivers/cxl/core/mbox.c:3:
drivers/cxl/core/mbox.c: In function 'cxl_handle_mce':
>> arch/x86/include/asm/mce.h:94:58: error: 'struct cpuinfo_um' has no member named 'x86_phys_bits'
94 | #define MCI_ADDR_PHYSADDR GENMASK_ULL(boot_cpu_data.x86_phys_bits - 1, 0)
| ^
include/linux/build_bug.h:16:62: note: in definition of macro 'BUILD_BUG_ON_ZERO'
16 | #define BUILD_BUG_ON_ZERO(e) ((int)(sizeof(struct { int:(-!!(e)); })))
| ^
include/linux/bits.h:25:17: note: in expansion of macro '__is_constexpr'
25 | __is_constexpr((l) > (h)), (l) > (h), 0)))
| ^~~~~~~~~~~~~~
include/linux/bits.h:37:10: note: in expansion of macro 'GENMASK_INPUT_CHECK'
37 | (GENMASK_INPUT_CHECK(h, l) + __GENMASK_ULL(h, l))
| ^~~~~~~~~~~~~~~~~~~
arch/x86/include/asm/mce.h:94:33: note: in expansion of macro 'GENMASK_ULL'
94 | #define MCI_ADDR_PHYSADDR GENMASK_ULL(boot_cpu_data.x86_phys_bits - 1, 0)
| ^~~~~~~~~~~
drivers/cxl/core/mbox.c:1553:27: note: in expansion of macro 'MCI_ADDR_PHYSADDR'
1553 | hpa = mce->addr & MCI_ADDR_PHYSADDR;
| ^~~~~~~~~~~~~~~~~
>> arch/x86/include/asm/mce.h:94:58: error: 'struct cpuinfo_um' has no member named 'x86_phys_bits'
94 | #define MCI_ADDR_PHYSADDR GENMASK_ULL(boot_cpu_data.x86_phys_bits - 1, 0)
| ^
include/linux/build_bug.h:16:62: note: in definition of macro 'BUILD_BUG_ON_ZERO'
16 | #define BUILD_BUG_ON_ZERO(e) ((int)(sizeof(struct { int:(-!!(e)); })))
| ^
include/linux/bits.h:37:10: note: in expansion of macro 'GENMASK_INPUT_CHECK'
37 | (GENMASK_INPUT_CHECK(h, l) + __GENMASK_ULL(h, l))
| ^~~~~~~~~~~~~~~~~~~
arch/x86/include/asm/mce.h:94:33: note: in expansion of macro 'GENMASK_ULL'
94 | #define MCI_ADDR_PHYSADDR GENMASK_ULL(boot_cpu_data.x86_phys_bits - 1, 0)
| ^~~~~~~~~~~
drivers/cxl/core/mbox.c:1553:27: note: in expansion of macro 'MCI_ADDR_PHYSADDR'
1553 | hpa = mce->addr & MCI_ADDR_PHYSADDR;
| ^~~~~~~~~~~~~~~~~
include/linux/bits.h:24:28: error: first argument to '__builtin_choose_expr' not a constant
24 | (BUILD_BUG_ON_ZERO(__builtin_choose_expr( \
| ^~~~~~~~~~~~~~~~~~~~~
include/linux/build_bug.h:16:62: note: in definition of macro 'BUILD_BUG_ON_ZERO'
16 | #define BUILD_BUG_ON_ZERO(e) ((int)(sizeof(struct { int:(-!!(e)); })))
| ^
include/linux/bits.h:37:10: note: in expansion of macro 'GENMASK_INPUT_CHECK'
37 | (GENMASK_INPUT_CHECK(h, l) + __GENMASK_ULL(h, l))
| ^~~~~~~~~~~~~~~~~~~
arch/x86/include/asm/mce.h:94:33: note: in expansion of macro 'GENMASK_ULL'
94 | #define MCI_ADDR_PHYSADDR GENMASK_ULL(boot_cpu_data.x86_phys_bits - 1, 0)
| ^~~~~~~~~~~
drivers/cxl/core/mbox.c:1553:27: note: in expansion of macro 'MCI_ADDR_PHYSADDR'
1553 | hpa = mce->addr & MCI_ADDR_PHYSADDR;
| ^~~~~~~~~~~~~~~~~
include/linux/build_bug.h:16:51: error: bit-field '<anonymous>' width not an integer constant
16 | #define BUILD_BUG_ON_ZERO(e) ((int)(sizeof(struct { int:(-!!(e)); })))
| ^
include/linux/bits.h:24:10: note: in expansion of macro 'BUILD_BUG_ON_ZERO'
24 | (BUILD_BUG_ON_ZERO(__builtin_choose_expr( \
| ^~~~~~~~~~~~~~~~~
include/linux/bits.h:37:10: note: in expansion of macro 'GENMASK_INPUT_CHECK'
37 | (GENMASK_INPUT_CHECK(h, l) + __GENMASK_ULL(h, l))
| ^~~~~~~~~~~~~~~~~~~
arch/x86/include/asm/mce.h:94:33: note: in expansion of macro 'GENMASK_ULL'
94 | #define MCI_ADDR_PHYSADDR GENMASK_ULL(boot_cpu_data.x86_phys_bits - 1, 0)
| ^~~~~~~~~~~
drivers/cxl/core/mbox.c:1553:27: note: in expansion of macro 'MCI_ADDR_PHYSADDR'
1553 | hpa = mce->addr & MCI_ADDR_PHYSADDR;
| ^~~~~~~~~~~~~~~~~
In file included from include/linux/bits.h:7,
from include/linux/ratelimit_types.h:5,
from include/linux/printk.h:9,
from include/asm-generic/bug.h:22,
from ./arch/um/include/generated/asm/bug.h:1,
from include/linux/bug.h:5,
from include/linux/thread_info.h:13,
from include/asm-generic/preempt.h:5,
from ./arch/um/include/generated/asm/preempt.h:1,
from include/linux/preempt.h:79,
from include/linux/rcupdate.h:27,
from include/linux/rbtree.h:24,
from include/linux/key.h:15:
>> arch/x86/include/asm/mce.h:94:58: error: 'struct cpuinfo_um' has no member named 'x86_phys_bits'
94 | #define MCI_ADDR_PHYSADDR GENMASK_ULL(boot_cpu_data.x86_phys_bits - 1, 0)
| ^
include/uapi/linux/bits.h:13:52: note: in definition of macro '__GENMASK_ULL'
13 | (~_ULL(0) >> (__BITS_PER_LONG_LONG - 1 - (h))))
| ^
arch/x86/include/asm/mce.h:94:33: note: in expansion of macro 'GENMASK_ULL'
94 | #define MCI_ADDR_PHYSADDR GENMASK_ULL(boot_cpu_data.x86_phys_bits - 1, 0)
| ^~~~~~~~~~~
drivers/cxl/core/mbox.c:1553:27: note: in expansion of macro 'MCI_ADDR_PHYSADDR'
1553 | hpa = mce->addr & MCI_ADDR_PHYSADDR;
| ^~~~~~~~~~~~~~~~~
cc1: some warnings being treated as errors
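The 'x86_phys_bits' errors above come from MCI_ADDR_PHYSADDR expanding to
GENMASK_ULL(boot_cpu_data.x86_phys_bits - 1, 0) in a build where
boot_cpu_data is a 'struct cpuinfo_um'. A minimal standalone sketch of the
mask arithmetic, together with one possible way to keep the x86-only part
behind a config switch, might look like the following. DEMO_HAVE_X86_MCE,
demo_phys_bits and the demo_* helpers are made-up names for illustration;
this is not the fix from the patch, and the pass-through fallback is only
an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Low-bits mask, equivalent to GENMASK_ULL(bits - 1, 0) for 1 <= bits <= 64. */
static inline uint64_t demo_low_bits_mask(unsigned int bits)
{
	assert(bits >= 1 && bits <= 64);
	return ~UINT64_C(0) >> (64 - bits);
}

/*
 * DEMO_HAVE_X86_MCE is a hypothetical switch standing in for a Kconfig
 * symbol such as CONFIG_X86_MCE; define it to build the "x86" branch.
 */
#ifdef DEMO_HAVE_X86_MCE
/* Stand-in for boot_cpu_data.x86_phys_bits; 46 is an arbitrary example. */
static unsigned int demo_phys_bits = 46;

/* Mirrors "hpa = mce->addr & MCI_ADDR_PHYSADDR" from the log above. */
static inline uint64_t demo_mce_to_hpa(uint64_t mce_addr)
{
	return mce_addr & demo_low_bits_mask(demo_phys_bits);
}
#else
/* Assumed fallback: no x86 MCE data, so the address is passed through. */
static inline uint64_t demo_mce_to_hpa(uint64_t mce_addr)
{
	return mce_addr;
}
#endif
```

With the guard in place, code that has no x86 CPU data never evaluates the
x86-only field, which is the property the um builds above are missing.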
vim +/mce_amd_feature_init +358 arch/x86/include/asm/mce.h
4a24d80b8c3e9f arch/x86/include/asm/mce.h Smita Koralahalli 2020-11-19 210
58995d2d58e8e5 arch/x86/include/asm/mce.h Hidetoshi Seto 2009-06-15 211 #ifdef CONFIG_X86_MCE
a2202aa29289db arch/x86/include/asm/mce.h Yong Wang 2009-11-10 212 int mcheck_init(void);
5e09954a9acc3b arch/x86/include/asm/mce.h Borislav Petkov 2009-10-16 213 void mcheck_cpu_init(struct cpuinfo_x86 *c);
8838eb6c0bf3b6 arch/x86/include/asm/mce.h Ashok Raj 2015-08-12 214 void mcheck_cpu_clear(struct cpuinfo_x86 *c);
4a24d80b8c3e9f arch/x86/include/asm/mce.h Smita Koralahalli 2020-11-19 215 int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info,
4a24d80b8c3e9f arch/x86/include/asm/mce.h Smita Koralahalli 2020-11-19 216 u64 lapic_id);
58995d2d58e8e5 arch/x86/include/asm/mce.h Hidetoshi Seto 2009-06-15 217 #else
a2202aa29289db arch/x86/include/asm/mce.h Yong Wang 2009-11-10 218 static inline int mcheck_init(void) { return 0; }
5e09954a9acc3b arch/x86/include/asm/mce.h Borislav Petkov 2009-10-16 @219 static inline void mcheck_cpu_init(struct cpuinfo_x86 *c) {}
8838eb6c0bf3b6 arch/x86/include/asm/mce.h Ashok Raj 2015-08-12 220 static inline void mcheck_cpu_clear(struct cpuinfo_x86 *c) {}
4a24d80b8c3e9f arch/x86/include/asm/mce.h Smita Koralahalli 2020-11-19 221 static inline int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info,
4a24d80b8c3e9f arch/x86/include/asm/mce.h Smita Koralahalli 2020-11-19 222 u64 lapic_id) { return -EINVAL; }
58995d2d58e8e5 arch/x86/include/asm/mce.h Hidetoshi Seto 2009-06-15 223 #endif
58995d2d58e8e5 arch/x86/include/asm/mce.h Hidetoshi Seto 2009-06-15 224
b5f2fa4ea00a17 arch/x86/include/asm/mce.h Andi Kleen 2009-02-12 225 void mce_setup(struct mce *m);
e2f430291fe23a include/asm-x86/mce.h Thomas Gleixner 2007-10-17 226 void mce_log(struct mce *m);
d6126ef5f31ca5 arch/x86/include/asm/mce.h Greg Kroah-Hartman 2012-01-26 227 DECLARE_PER_CPU(struct device *, mce_device);
e2f430291fe23a include/asm-x86/mce.h Thomas Gleixner 2007-10-17 228
a0bc32b3cacf19 arch/x86/include/asm/mce.h Akshay Gupta 2020-08-28 229 /* Maximum number of MCA banks per CPU. */
a0bc32b3cacf19 arch/x86/include/asm/mce.h Akshay Gupta 2020-08-28 230 #define MAX_NR_BANKS 64
41fdff322e26c4 arch/x86/include/asm/mce.h Andi Kleen 2009-02-12 231
e2f430291fe23a include/asm-x86/mce.h Thomas Gleixner 2007-10-17 232 #ifdef CONFIG_X86_MCE_INTEL
e2f430291fe23a include/asm-x86/mce.h Thomas Gleixner 2007-10-17 233 void mce_intel_feature_init(struct cpuinfo_x86 *c);
8838eb6c0bf3b6 arch/x86/include/asm/mce.h Ashok Raj 2015-08-12 234 void mce_intel_feature_clear(struct cpuinfo_x86 *c);
88ccbedd9ca85d arch/x86/include/asm/mce.h Andi Kleen 2009-02-12 235 void cmci_clear(void);
88ccbedd9ca85d arch/x86/include/asm/mce.h Andi Kleen 2009-02-12 236 void cmci_reenable(void);
7a0c819d28f5c9 arch/x86/include/asm/mce.h Srivatsa S. Bhat 2013-03-20 237 void cmci_rediscover(void);
88ccbedd9ca85d arch/x86/include/asm/mce.h Andi Kleen 2009-02-12 238 void cmci_recheck(void);
e2f430291fe23a include/asm-x86/mce.h Thomas Gleixner 2007-10-17 239 #else
e2f430291fe23a include/asm-x86/mce.h Thomas Gleixner 2007-10-17 240 static inline void mce_intel_feature_init(struct cpuinfo_x86 *c) { }
8838eb6c0bf3b6 arch/x86/include/asm/mce.h Ashok Raj 2015-08-12 241 static inline void mce_intel_feature_clear(struct cpuinfo_x86 *c) { }
88ccbedd9ca85d arch/x86/include/asm/mce.h Andi Kleen 2009-02-12 242 static inline void cmci_clear(void) {}
88ccbedd9ca85d arch/x86/include/asm/mce.h Andi Kleen 2009-02-12 243 static inline void cmci_reenable(void) {}
7a0c819d28f5c9 arch/x86/include/asm/mce.h Srivatsa S. Bhat 2013-03-20 244 static inline void cmci_rediscover(void) {}
88ccbedd9ca85d arch/x86/include/asm/mce.h Andi Kleen 2009-02-12 245 static inline void cmci_recheck(void) {}
e2f430291fe23a include/asm-x86/mce.h Thomas Gleixner 2007-10-17 246 #endif
e2f430291fe23a include/asm-x86/mce.h Thomas Gleixner 2007-10-17 247
38736072d45488 arch/x86/include/asm/mce.h H. Peter Anvin 2009-05-28 248 int mce_available(struct cpuinfo_x86 *c);
2d1f406139ec20 arch/x86/include/asm/mce.h Borislav Petkov 2017-05-19 249 bool mce_is_memory_error(struct mce *m);
5d96c9342c23ee arch/x86/include/asm/mce.h Vishal Verma 2018-10-25 250 bool mce_is_correctable(struct mce *m);
1bae0cfe4a171c arch/x86/include/asm/mce.h Yazen Ghannam 2023-06-13 251 bool mce_usable_address(struct mce *m);
88ccbedd9ca85d arch/x86/include/asm/mce.h Andi Kleen 2009-02-12 252
01ca79f1411eae arch/x86/include/asm/mce.h Andi Kleen 2009-05-27 253 DECLARE_PER_CPU(unsigned, mce_exception_count);
ca84f69697da0f arch/x86/include/asm/mce.h Andi Kleen 2009-05-27 254 DECLARE_PER_CPU(unsigned, mce_poll_count);
01ca79f1411eae arch/x86/include/asm/mce.h Andi Kleen 2009-05-27 255
ee031c31d6381d arch/x86/include/asm/mce.h Andi Kleen 2009-02-12 256 typedef DECLARE_BITMAP(mce_banks_t, MAX_NR_BANKS);
ee031c31d6381d arch/x86/include/asm/mce.h Andi Kleen 2009-02-12 257 DECLARE_PER_CPU(mce_banks_t, mce_poll_banks);
ee031c31d6381d arch/x86/include/asm/mce.h Andi Kleen 2009-02-12 258
b79109c3bbcf52 arch/x86/include/asm/mce.h Andi Kleen 2009-02-12 259 enum mcp_flags {
3f2f0680d1161d arch/x86/include/asm/mce.h Borislav Petkov 2015-01-13 260 MCP_TIMESTAMP = BIT(0), /* log time stamp */
3f2f0680d1161d arch/x86/include/asm/mce.h Borislav Petkov 2015-01-13 261 MCP_UC = BIT(1), /* log uncorrected errors */
3f2f0680d1161d arch/x86/include/asm/mce.h Borislav Petkov 2015-01-13 262 MCP_DONTLOG = BIT(2), /* only clear, don't log */
3bff147b187d5d arch/x86/include/asm/mce.h Borislav Petkov 2021-08-23 263 MCP_QUEUE_LOG = BIT(3), /* only queue to genpool */
b79109c3bbcf52 arch/x86/include/asm/mce.h Andi Kleen 2009-02-12 264 };
5b9d292ea87c83 arch/x86/include/asm/mce.h Yazen Ghannam 2024-05-23 265
5b9d292ea87c83 arch/x86/include/asm/mce.h Yazen Ghannam 2024-05-23 266 void machine_check_poll(enum mcp_flags flags, mce_banks_t *b);
b79109c3bbcf52 arch/x86/include/asm/mce.h Andi Kleen 2009-02-12 267
9ff36ee9668ff4 arch/x86/include/asm/mce.h Andi Kleen 2009-05-27 268 int mce_notify_irq(void);
e2f430291fe23a include/asm-x86/mce.h Thomas Gleixner 2007-10-17 269
ea149b36c7f511 arch/x86/include/asm/mce.h Andi Kleen 2009-04-29 270 DECLARE_PER_CPU(struct mce, injectm);
66f5ddf30a59f8 arch/x86/include/asm/mce.h Tony Luck 2011-11-03 271
c3d1fb567a634d arch/x86/include/asm/mce.h Naveen N Rao 2013-07-01 272 /* Disable CMCI/polling for MCA bank claimed by firmware */
c3d1fb567a634d arch/x86/include/asm/mce.h Naveen N Rao 2013-07-01 273 extern void mce_disable_bank(int bank);
c3d1fb567a634d arch/x86/include/asm/mce.h Naveen N Rao 2013-07-01 274
58995d2d58e8e5 arch/x86/include/asm/mce.h Hidetoshi Seto 2009-06-15 275 /*
58995d2d58e8e5 arch/x86/include/asm/mce.h Hidetoshi Seto 2009-06-15 276 * Exception handler
58995d2d58e8e5 arch/x86/include/asm/mce.h Hidetoshi Seto 2009-06-15 277 */
8cd501c1facc15 arch/x86/include/asm/mce.h Thomas Gleixner 2020-02-25 278 void do_machine_check(struct pt_regs *pt_regs);
58995d2d58e8e5 arch/x86/include/asm/mce.h Hidetoshi Seto 2009-06-15 279
58995d2d58e8e5 arch/x86/include/asm/mce.h Hidetoshi Seto 2009-06-15 280 /*
58995d2d58e8e5 arch/x86/include/asm/mce.h Hidetoshi Seto 2009-06-15 281 * Threshold handler
58995d2d58e8e5 arch/x86/include/asm/mce.h Hidetoshi Seto 2009-06-15 282 */
b276268631af3a arch/x86/include/asm/mce.h Andi Kleen 2009-02-12 283 extern void (*mce_threshold_vector)(void);
b276268631af3a arch/x86/include/asm/mce.h Andi Kleen 2009-02-12 284
24fd78a81f6d3f arch/x86/include/asm/mce.h Aravind Gopalakrishnan 2015-05-06 285 /* Deferred error interrupt handler */
24fd78a81f6d3f arch/x86/include/asm/mce.h Aravind Gopalakrishnan 2015-05-06 286 extern void (*deferred_error_int_vector)(void);
24fd78a81f6d3f arch/x86/include/asm/mce.h Aravind Gopalakrishnan 2015-05-06 287
d334a49113a4a3 arch/x86/include/asm/mce.h Huang Ying 2010-05-18 288 /*
d334a49113a4a3 arch/x86/include/asm/mce.h Huang Ying 2010-05-18 289 * Used by APEI to report memory error via /dev/mcelog
d334a49113a4a3 arch/x86/include/asm/mce.h Huang Ying 2010-05-18 290 */
d334a49113a4a3 arch/x86/include/asm/mce.h Huang Ying 2010-05-18 291
d334a49113a4a3 arch/x86/include/asm/mce.h Huang Ying 2010-05-18 292 struct cper_sec_mem_err;
d334a49113a4a3 arch/x86/include/asm/mce.h Huang Ying 2010-05-18 293 extern void apei_mce_report_mem_error(int corrected,
d334a49113a4a3 arch/x86/include/asm/mce.h Huang Ying 2010-05-18 294 struct cper_sec_mem_err *mem_err);
d334a49113a4a3 arch/x86/include/asm/mce.h Huang Ying 2010-05-18 295
be0aec23bf4624 arch/x86/include/asm/mce.h Aravind Gopalakrishnan 2016-03-07 296 /*
be0aec23bf4624 arch/x86/include/asm/mce.h Aravind Gopalakrishnan 2016-03-07 297 * Enumerate new IP types and HWID values in AMD processors which support
be0aec23bf4624 arch/x86/include/asm/mce.h Aravind Gopalakrishnan 2016-03-07 298 * Scalable MCA.
be0aec23bf4624 arch/x86/include/asm/mce.h Aravind Gopalakrishnan 2016-03-07 299 */
be0aec23bf4624 arch/x86/include/asm/mce.h Aravind Gopalakrishnan 2016-03-07 300 #ifdef CONFIG_X86_MCE_AMD
5896820e0aa325 arch/x86/include/asm/mce.h Yazen Ghannam 2016-09-12 301
5896820e0aa325 arch/x86/include/asm/mce.h Yazen Ghannam 2016-09-12 302 /* These may be used by multiple smca_hwid_mcatypes */
5896820e0aa325 arch/x86/include/asm/mce.h Yazen Ghannam 2016-09-12 303 enum smca_bank_types {
5896820e0aa325 arch/x86/include/asm/mce.h Yazen Ghannam 2016-09-12 304 SMCA_LS = 0, /* Load Store */
94a311ce248e0b arch/x86/include/asm/mce.h Muralidhara M K 2021-05-26 305 SMCA_LS_V2,
5896820e0aa325 arch/x86/include/asm/mce.h Yazen Ghannam 2016-09-12 306 SMCA_IF, /* Instruction Fetch */
5896820e0aa325 arch/x86/include/asm/mce.h Yazen Ghannam 2016-09-12 307 SMCA_L2_CACHE, /* L2 Cache */
5896820e0aa325 arch/x86/include/asm/mce.h Yazen Ghannam 2016-09-12 308 SMCA_DE, /* Decoder Unit */
68627a697c1959 arch/x86/include/asm/mce.h Yazen Ghannam 2018-02-21 309 SMCA_RESERVED, /* Reserved */
5896820e0aa325 arch/x86/include/asm/mce.h Yazen Ghannam 2016-09-12 310 SMCA_EX, /* Execution Unit */
5896820e0aa325 arch/x86/include/asm/mce.h Yazen Ghannam 2016-09-12 311 SMCA_FP, /* Floating Point */
5896820e0aa325 arch/x86/include/asm/mce.h Yazen Ghannam 2016-09-12 312 SMCA_L3_CACHE, /* L3 Cache */
5896820e0aa325 arch/x86/include/asm/mce.h Yazen Ghannam 2016-09-12 313 SMCA_CS, /* Coherent Slave */
94a311ce248e0b arch/x86/include/asm/mce.h Muralidhara M K 2021-05-26 314 SMCA_CS_V2,
5896820e0aa325 arch/x86/include/asm/mce.h Yazen Ghannam 2016-09-12 315 SMCA_PIE, /* Power, Interrupts, etc. */
be0aec23bf4624 arch/x86/include/asm/mce.h Aravind Gopalakrishnan 2016-03-07 316 SMCA_UMC, /* Unified Memory Controller */
94a311ce248e0b arch/x86/include/asm/mce.h Muralidhara M K 2021-05-26 317 SMCA_UMC_V2,
47b744ea5e3cf8 arch/x86/include/asm/mce.h Muralidhara M K 2023-11-02 318 SMCA_MA_LLC, /* Memory Attached Last Level Cache */
be0aec23bf4624 arch/x86/include/asm/mce.h Aravind Gopalakrishnan 2016-03-07 319 SMCA_PB, /* Parameter Block */
be0aec23bf4624 arch/x86/include/asm/mce.h Aravind Gopalakrishnan 2016-03-07 320 SMCA_PSP, /* Platform Security Processor */
94a311ce248e0b arch/x86/include/asm/mce.h Muralidhara M K 2021-05-26 321 SMCA_PSP_V2,
be0aec23bf4624 arch/x86/include/asm/mce.h Aravind Gopalakrishnan 2016-03-07 322 SMCA_SMU, /* System Management Unit */
94a311ce248e0b arch/x86/include/asm/mce.h Muralidhara M K 2021-05-26 323 SMCA_SMU_V2,
cbfa447edd6a38 arch/x86/include/asm/mce.h Yazen Ghannam 2019-02-01 324 SMCA_MP5, /* Microprocessor 5 Unit */
5176a93ab27aef arch/x86/include/asm/mce.h Yazen Ghannam 2021-12-16 325 SMCA_MPDMA, /* MPDMA Unit */
cbfa447edd6a38 arch/x86/include/asm/mce.h Yazen Ghannam 2019-02-01 326 SMCA_NBIO, /* Northbridge IO Unit */
cbfa447edd6a38 arch/x86/include/asm/mce.h Yazen Ghannam 2019-02-01 327 SMCA_PCIE, /* PCI Express Unit */
94a311ce248e0b arch/x86/include/asm/mce.h Muralidhara M K 2021-05-26 328 SMCA_PCIE_V2,
94a311ce248e0b arch/x86/include/asm/mce.h Muralidhara M K 2021-05-26 329 SMCA_XGMI_PCS, /* xGMI PCS Unit */
5176a93ab27aef arch/x86/include/asm/mce.h Yazen Ghannam 2021-12-16 330 SMCA_NBIF, /* NBIF Unit */
5176a93ab27aef arch/x86/include/asm/mce.h Yazen Ghannam 2021-12-16 331 SMCA_SHUB, /* System HUB Unit */
5176a93ab27aef arch/x86/include/asm/mce.h Yazen Ghannam 2021-12-16 332 SMCA_SATA, /* SATA Unit */
5176a93ab27aef arch/x86/include/asm/mce.h Yazen Ghannam 2021-12-16 333 SMCA_USB, /* USB Unit */
47b744ea5e3cf8 arch/x86/include/asm/mce.h Muralidhara M K 2023-11-02 334 SMCA_USR_DP, /* Ultra Short Reach Data Plane Controller */
47b744ea5e3cf8 arch/x86/include/asm/mce.h Muralidhara M K 2023-11-02 335 SMCA_USR_CP, /* Ultra Short Reach Control Plane Controller */
5176a93ab27aef arch/x86/include/asm/mce.h Yazen Ghannam 2021-12-16 336 SMCA_GMI_PCS, /* GMI PCS Unit */
94a311ce248e0b arch/x86/include/asm/mce.h Muralidhara M K 2021-05-26 337 SMCA_XGMI_PHY, /* xGMI PHY Unit */
94a311ce248e0b arch/x86/include/asm/mce.h Muralidhara M K 2021-05-26 338 SMCA_WAFL_PHY, /* WAFL PHY Unit */
5176a93ab27aef arch/x86/include/asm/mce.h Yazen Ghannam 2021-12-16 339 SMCA_GMI_PHY, /* GMI PHY Unit */
5896820e0aa325 arch/x86/include/asm/mce.h Yazen Ghannam 2016-09-12 340 N_SMCA_BANK_TYPES
be0aec23bf4624 arch/x86/include/asm/mce.h Aravind Gopalakrishnan 2016-03-07 341 };
be0aec23bf4624 arch/x86/include/asm/mce.h Aravind Gopalakrishnan 2016-03-07 342
c6708d50f166be arch/x86/include/asm/mce.h Yazen Ghannam 2017-12-18 343 extern bool amd_mce_is_memory_error(struct mce *m);
e71c3978d6f976 arch/x86/include/asm/mce.h Linus Torvalds 2016-12-12 344
4d7b02d58c4000 arch/x86/include/asm/mce.h Sebastian Andrzej Siewior 2016-11-10 345 extern int mce_threshold_create_device(unsigned int cpu);
4d7b02d58c4000 arch/x86/include/asm/mce.h Sebastian Andrzej Siewior 2016-11-10 346 extern int mce_threshold_remove_device(unsigned int cpu);
e71c3978d6f976 arch/x86/include/asm/mce.h Linus Torvalds 2016-12-12 347
9308fd4074551f arch/x86/include/asm/mce.h Yazen Ghannam 2019-03-22 348 void mce_amd_feature_init(struct cpuinfo_x86 *c);
91f75eb481cfae arch/x86/include/asm/mce.h Yazen Ghannam 2021-12-16 349 enum smca_bank_types smca_get_bank_type(unsigned int cpu, unsigned int bank);
4d7b02d58c4000 arch/x86/include/asm/mce.h Sebastian Andrzej Siewior 2016-11-10 350 #else
5896820e0aa325 arch/x86/include/asm/mce.h Yazen Ghannam 2016-09-12 351
4d7b02d58c4000 arch/x86/include/asm/mce.h Sebastian Andrzej Siewior 2016-11-10 352 static inline int mce_threshold_create_device(unsigned int cpu) { return 0; };
4d7b02d58c4000 arch/x86/include/asm/mce.h Sebastian Andrzej Siewior 2016-11-10 353 static inline int mce_threshold_remove_device(unsigned int cpu) { return 0; };
c6708d50f166be arch/x86/include/asm/mce.h Yazen Ghannam 2017-12-18 354 static inline bool amd_mce_is_memory_error(struct mce *m) { return false; };
9308fd4074551f arch/x86/include/asm/mce.h Yazen Ghannam 2019-03-22 355 static inline void mce_amd_feature_init(struct cpuinfo_x86 *c) { }
be0aec23bf4624 arch/x86/include/asm/mce.h Aravind Gopalakrishnan 2016-03-07 356 #endif
be0aec23bf4624 arch/x86/include/asm/mce.h Aravind Gopalakrishnan 2016-03-07 357
9308fd4074551f arch/x86/include/asm/mce.h Yazen Ghannam 2019-03-22 @358 static inline void mce_hygon_feature_init(struct cpuinfo_x86 *c) { return mce_amd_feature_init(c); }
e9c2a283e7d9d4 arch/x86/include/asm/mce.h Arnd Bergmann 2023-05-16 359
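The lines marked @219 and @358 above are where 'struct cpuinfo_x86' is first
seen inside a parameter list. In C, a struct tag that first appears there is
scoped to that one declaration, so each stub ends up with its own distinct
type; that is why the compiler reports 'struct cpuinfo_x86 *' as incompatible
with 'struct cpuinfo_x86 *'. A minimal standalone sketch, assuming nothing
about how the patch will actually be fixed, shows that a file-scope forward
declaration makes both stubs refer to the same type. amd_init_stub,
hygon_init_stub and demo_call_hygon are hypothetical stand-ins, not kernel
functions.

```c
#include <assert.h>
#include <stddef.h>

/*
 * File-scope forward declaration. Without this line, each
 * "struct cpuinfo_x86 *" below would declare a new type local to that
 * declaration, and the call in hygon_init_stub() would mix two of them.
 */
struct cpuinfo_x86;

/* Counts calls so the example has an observable effect. */
static int amd_init_calls;

/* Stand-in for the empty mce_amd_feature_init() stub. */
static inline void amd_init_stub(struct cpuinfo_x86 *c)
{
	(void)c;
	amd_init_calls++;
}

/* Stand-in for mce_hygon_feature_init(), which forwards to the AMD stub. */
static inline void hygon_init_stub(struct cpuinfo_x86 *c)
{
	amd_init_stub(c);	/* same type on both sides, so this compiles */
}

/* Exercises the stubs; returns how many times the AMD stub has run. */
static int demo_call_hygon(void)
{
	int before = amd_init_calls;

	hygon_init_stub(NULL);
	assert(amd_init_calls == before + 1);
	return amd_init_calls;
}
```

Removing the forward declaration from this sketch should reproduce the same
class of warning and error that the report shows for mce.h.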
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH v4 2/2] cxl: avoid duplicated report from MCE & device
2024-08-08 15:13 ` [PATCH v4 2/2] cxl: avoid duplicated report from MCE & device Shiyang Ruan via
2024-08-09 7:31 ` kernel test robot
@ 2024-08-09 7:31 ` kernel test robot
2024-08-09 11:48 ` kernel test robot
2 siblings, 0 replies; 8+ messages in thread
From: kernel test robot @ 2024-08-09 7:31 UTC (permalink / raw)
To: Shiyang Ruan, qemu-devel, linux-cxl, linux-edac, linux-mm,
dan.j.williams, vishal.l.verma, Jonathan.Cameron,
alison.schofield
Cc: llvm, oe-kbuild-all, bp, dave.jiang, dave, ira.weiny, james.morse,
linmiaohe, mchehab, nao.horiguchi, rric, tony.luck, ruansy.fnst
Hi Shiyang,
kernel test robot noticed the following build errors:
[auto build test ERROR on tip/x86/core]
[also build test ERROR on cxl/next linus/master v6.11-rc2 next-20240809]
[cannot apply to cxl/pending]
[If your patch is applied to the wrong git tree, kindly drop us a note.
When submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Shiyang-Ruan/cxl-core-introduce-device-reporting-poison-hanlding/20240809-013658
base: tip/x86/core
patch link: https://lore.kernel.org/r/20240808151328.707869-3-ruansy.fnst%40fujitsu.com
patch subject: [PATCH v4 2/2] cxl: avoid duplicated report from MCE & device
config: um-allmodconfig (https://download.01.org/0day-ci/archive/20240809/202408091543.UNFvPFFl-lkp@intel.com/config)
compiler: clang version 20.0.0git (https://github.com/llvm/llvm-project f86594788ce93b696675c94f54016d27a6c21d18)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240809/202408091543.UNFvPFFl-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202408091543.UNFvPFFl-lkp@intel.com/
All errors/warnings (new ones prefixed by >>):
In file included from drivers/cxl/core/mbox.c:3:
In file included from include/linux/security.h:33:
In file included from include/linux/mm.h:2228:
include/linux/vmstat.h:514:36: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
514 | return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_"
| ~~~~~~~~~~~ ^ ~~~
In file included from drivers/cxl/core/mbox.c:3:
In file included from include/linux/security.h:35:
In file included from include/linux/bpf.h:31:
In file included from include/linux/memcontrol.h:13:
In file included from include/linux/cgroup.h:25:
In file included from include/linux/kernel_stat.h:8:
In file included from include/linux/interrupt.h:11:
In file included from include/linux/hardirq.h:11:
In file included from arch/um/include/asm/hardirq.h:5:
In file included from include/asm-generic/hardirq.h:17:
In file included from include/linux/irq.h:20:
In file included from include/linux/io.h:14:
In file included from arch/um/include/asm/io.h:24:
include/asm-generic/io.h:548:31: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
548 | val = __raw_readb(PCI_IOBASE + addr);
| ~~~~~~~~~~ ^
include/asm-generic/io.h:561:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
561 | val = __le16_to_cpu((__le16 __force)__raw_readw(PCI_IOBASE + addr));
| ~~~~~~~~~~ ^
include/uapi/linux/byteorder/little_endian.h:37:51: note: expanded from macro '__le16_to_cpu'
37 | #define __le16_to_cpu(x) ((__force __u16)(__le16)(x))
| ^
In file included from drivers/cxl/core/mbox.c:3:
In file included from include/linux/security.h:35:
In file included from include/linux/bpf.h:31:
In file included from include/linux/memcontrol.h:13:
In file included from include/linux/cgroup.h:25:
In file included from include/linux/kernel_stat.h:8:
In file included from include/linux/interrupt.h:11:
In file included from include/linux/hardirq.h:11:
In file included from arch/um/include/asm/hardirq.h:5:
In file included from include/asm-generic/hardirq.h:17:
In file included from include/linux/irq.h:20:
In file included from include/linux/io.h:14:
In file included from arch/um/include/asm/io.h:24:
include/asm-generic/io.h:574:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
574 | val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
| ~~~~~~~~~~ ^
include/uapi/linux/byteorder/little_endian.h:35:51: note: expanded from macro '__le32_to_cpu'
35 | #define __le32_to_cpu(x) ((__force __u32)(__le32)(x))
| ^
In file included from drivers/cxl/core/mbox.c:3:
In file included from include/linux/security.h:35:
In file included from include/linux/bpf.h:31:
In file included from include/linux/memcontrol.h:13:
In file included from include/linux/cgroup.h:25:
In file included from include/linux/kernel_stat.h:8:
In file included from include/linux/interrupt.h:11:
In file included from include/linux/hardirq.h:11:
In file included from arch/um/include/asm/hardirq.h:5:
In file included from include/asm-generic/hardirq.h:17:
In file included from include/linux/irq.h:20:
In file included from include/linux/io.h:14:
In file included from arch/um/include/asm/io.h:24:
include/asm-generic/io.h:585:33: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
585 | __raw_writeb(value, PCI_IOBASE + addr);
| ~~~~~~~~~~ ^
include/asm-generic/io.h:595:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
595 | __raw_writew((u16 __force)cpu_to_le16(value), PCI_IOBASE + addr);
| ~~~~~~~~~~ ^
include/asm-generic/io.h:605:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
605 | __raw_writel((u32 __force)cpu_to_le32(value), PCI_IOBASE + addr);
| ~~~~~~~~~~ ^
include/asm-generic/io.h:693:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
693 | readsb(PCI_IOBASE + addr, buffer, count);
| ~~~~~~~~~~ ^
include/asm-generic/io.h:701:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
701 | readsw(PCI_IOBASE + addr, buffer, count);
| ~~~~~~~~~~ ^
include/asm-generic/io.h:709:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
709 | readsl(PCI_IOBASE + addr, buffer, count);
| ~~~~~~~~~~ ^
include/asm-generic/io.h:718:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
718 | writesb(PCI_IOBASE + addr, buffer, count);
| ~~~~~~~~~~ ^
include/asm-generic/io.h:727:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
727 | writesw(PCI_IOBASE + addr, buffer, count);
| ~~~~~~~~~~ ^
include/asm-generic/io.h:736:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
736 | writesl(PCI_IOBASE + addr, buffer, count);
| ~~~~~~~~~~ ^
In file included from drivers/cxl/core/mbox.c:8:
>> arch/x86/include/asm/mce.h:219:43: warning: declaration of 'struct cpuinfo_x86' will not be visible outside of this function [-Wvisibility]
219 | static inline void mcheck_cpu_init(struct cpuinfo_x86 *c) {}
| ^
arch/x86/include/asm/mce.h:220:44: warning: declaration of 'struct cpuinfo_x86' will not be visible outside of this function [-Wvisibility]
220 | static inline void mcheck_cpu_clear(struct cpuinfo_x86 *c) {}
| ^
arch/x86/include/asm/mce.h:240:50: warning: declaration of 'struct cpuinfo_x86' will not be visible outside of this function [-Wvisibility]
240 | static inline void mce_intel_feature_init(struct cpuinfo_x86 *c) { }
| ^
arch/x86/include/asm/mce.h:241:51: warning: declaration of 'struct cpuinfo_x86' will not be visible outside of this function [-Wvisibility]
241 | static inline void mce_intel_feature_clear(struct cpuinfo_x86 *c) { }
| ^
arch/x86/include/asm/mce.h:248:26: warning: declaration of 'struct cpuinfo_x86' will not be visible outside of this function [-Wvisibility]
248 | int mce_available(struct cpuinfo_x86 *c);
| ^
arch/x86/include/asm/mce.h:355:48: warning: declaration of 'struct cpuinfo_x86' will not be visible outside of this function [-Wvisibility]
355 | static inline void mce_amd_feature_init(struct cpuinfo_x86 *c) { }
| ^
arch/x86/include/asm/mce.h:358:50: warning: declaration of 'struct cpuinfo_x86' will not be visible outside of this function [-Wvisibility]
358 | static inline void mce_hygon_feature_init(struct cpuinfo_x86 *c) { return mce_amd_feature_init(c); }
| ^
>> arch/x86/include/asm/mce.h:358:96: error: incompatible pointer types passing 'struct cpuinfo_x86 *' to parameter of type 'struct cpuinfo_x86 *' [-Werror,-Wincompatible-pointer-types]
358 | static inline void mce_hygon_feature_init(struct cpuinfo_x86 *c) { return mce_amd_feature_init(c); }
| ^
arch/x86/include/asm/mce.h:355:61: note: passing argument to parameter 'c' here
355 | static inline void mce_amd_feature_init(struct cpuinfo_x86 *c) { }
| ^
>> drivers/cxl/core/mbox.c:1553:20: error: no member named 'x86_phys_bits' in 'struct cpuinfo_um'
1553 | hpa = mce->addr & MCI_ADDR_PHYSADDR;
| ^~~~~~~~~~~~~~~~~
arch/x86/include/asm/mce.h:94:53: note: expanded from macro 'MCI_ADDR_PHYSADDR'
94 | #define MCI_ADDR_PHYSADDR GENMASK_ULL(boot_cpu_data.x86_phys_bits - 1, 0)
| ~~~~~~~~~~~~~ ^
include/linux/bits.h:37:23: note: expanded from macro 'GENMASK_ULL'
37 | (GENMASK_INPUT_CHECK(h, l) + __GENMASK_ULL(h, l))
| ^
include/linux/bits.h:25:25: note: expanded from macro 'GENMASK_INPUT_CHECK'
25 | __is_constexpr((l) > (h)), (l) > (h), 0)))
| ^
include/linux/compiler.h:290:48: note: expanded from macro '__is_constexpr'
290 | (sizeof(int) == sizeof(*(8 ? ((void *)((long)(x) * 0l)) : (int *)8)))
| ^
include/linux/build_bug.h:16:62: note: expanded from macro 'BUILD_BUG_ON_ZERO'
16 | #define BUILD_BUG_ON_ZERO(e) ((int)(sizeof(struct { int:(-!!(e)); })))
| ^
>> drivers/cxl/core/mbox.c:1553:20: error: no member named 'x86_phys_bits' in 'struct cpuinfo_um'
1553 | hpa = mce->addr & MCI_ADDR_PHYSADDR;
| ^~~~~~~~~~~~~~~~~
arch/x86/include/asm/mce.h:94:53: note: expanded from macro 'MCI_ADDR_PHYSADDR'
94 | #define MCI_ADDR_PHYSADDR GENMASK_ULL(boot_cpu_data.x86_phys_bits - 1, 0)
| ~~~~~~~~~~~~~ ^
include/linux/bits.h:37:23: note: expanded from macro 'GENMASK_ULL'
37 | (GENMASK_INPUT_CHECK(h, l) + __GENMASK_ULL(h, l))
| ^
include/linux/bits.h:25:37: note: expanded from macro 'GENMASK_INPUT_CHECK'
25 | __is_constexpr((l) > (h)), (l) > (h), 0)))
| ^
include/linux/build_bug.h:16:62: note: expanded from macro 'BUILD_BUG_ON_ZERO'
16 | #define BUILD_BUG_ON_ZERO(e) ((int)(sizeof(struct { int:(-!!(e)); })))
| ^
>> drivers/cxl/core/mbox.c:1553:20: error: no member named 'x86_phys_bits' in 'struct cpuinfo_um'
1553 | hpa = mce->addr & MCI_ADDR_PHYSADDR;
| ^~~~~~~~~~~~~~~~~
arch/x86/include/asm/mce.h:94:53: note: expanded from macro 'MCI_ADDR_PHYSADDR'
94 | #define MCI_ADDR_PHYSADDR GENMASK_ULL(boot_cpu_data.x86_phys_bits - 1, 0)
| ~~~~~~~~~~~~~ ^
include/linux/bits.h:37:45: note: expanded from macro 'GENMASK_ULL'
37 | (GENMASK_INPUT_CHECK(h, l) + __GENMASK_ULL(h, l))
| ^
include/uapi/linux/bits.h:13:52: note: expanded from macro '__GENMASK_ULL'
13 | (~_ULL(0) >> (__BITS_PER_LONG_LONG - 1 - (h))))
| ^
20 warnings and 4 errors generated.
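For reference, the failing expression at drivers/cxl/core/mbox.c:1553 masks the MCE address with MCI_ADDR_PHYSADDR, which the log shows expanding to GENMASK_ULL(boot_cpu_data.x86_phys_bits - 1, 0). The arithmetic can be sketched in plain userspace C as below. This is only an illustration: genmask_ull(), mce_addr_to_hpa(), hpa_to_pfn() and PAGE_SHIFT_4K are hypothetical names, and the 4 KiB page size is an assumption; in the kernel the address width comes from boot_cpu_data, which is the member lookup that fails in this build.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT_4K 12 /* assumption: 4 KiB pages */

/* Userspace stand-in for GENMASK_ULL(h, l): bits l..h set, inclusive. */
static uint64_t genmask_ull(unsigned int h, unsigned int l)
{
	assert(l <= h && h < 64);
	return (~UINT64_C(0) << l) & (~UINT64_C(0) >> (63 - h));
}

/* Keep only the low phys_bits bits of the address logged by the MCE. */
static uint64_t mce_addr_to_hpa(uint64_t mce_addr, unsigned int phys_bits)
{
	assert(phys_bits >= 1 && phys_bits <= 64);
	return mce_addr & genmask_ull(phys_bits - 1, 0);
}

/* Same shift as PHYS_PFN() under the 4 KiB page assumption. */
static uint64_t hpa_to_pfn(uint64_t hpa)
{
	return hpa >> PAGE_SHIFT_4K;
}
```

With an assumed width of 52 bits, mce_addr_to_hpa(0xFFF0000012345678, 52) clears bits 63..52 and hpa_to_pfn() then drops the 12-bit page offset.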
vim +358 arch/x86/include/asm/mce.h
4a24d80b8c3e9f8 arch/x86/include/asm/mce.h Smita Koralahalli 2020-11-19 210
58995d2d58e8e55 arch/x86/include/asm/mce.h Hidetoshi Seto 2009-06-15 211 #ifdef CONFIG_X86_MCE
a2202aa29289db6 arch/x86/include/asm/mce.h Yong Wang 2009-11-10 212 int mcheck_init(void);
5e09954a9acc3b4 arch/x86/include/asm/mce.h Borislav Petkov 2009-10-16 213 void mcheck_cpu_init(struct cpuinfo_x86 *c);
8838eb6c0bf3b6a arch/x86/include/asm/mce.h Ashok Raj 2015-08-12 214 void mcheck_cpu_clear(struct cpuinfo_x86 *c);
4a24d80b8c3e9f8 arch/x86/include/asm/mce.h Smita Koralahalli 2020-11-19 215 int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info,
4a24d80b8c3e9f8 arch/x86/include/asm/mce.h Smita Koralahalli 2020-11-19 216 u64 lapic_id);
58995d2d58e8e55 arch/x86/include/asm/mce.h Hidetoshi Seto 2009-06-15 217 #else
a2202aa29289db6 arch/x86/include/asm/mce.h Yong Wang 2009-11-10 218 static inline int mcheck_init(void) { return 0; }
5e09954a9acc3b4 arch/x86/include/asm/mce.h Borislav Petkov 2009-10-16 @219 static inline void mcheck_cpu_init(struct cpuinfo_x86 *c) {}
8838eb6c0bf3b6a arch/x86/include/asm/mce.h Ashok Raj 2015-08-12 220 static inline void mcheck_cpu_clear(struct cpuinfo_x86 *c) {}
4a24d80b8c3e9f8 arch/x86/include/asm/mce.h Smita Koralahalli 2020-11-19 221 static inline int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info,
4a24d80b8c3e9f8 arch/x86/include/asm/mce.h Smita Koralahalli 2020-11-19 222 u64 lapic_id) { return -EINVAL; }
58995d2d58e8e55 arch/x86/include/asm/mce.h Hidetoshi Seto 2009-06-15 223 #endif
58995d2d58e8e55 arch/x86/include/asm/mce.h Hidetoshi Seto 2009-06-15 224
b5f2fa4ea00a179 arch/x86/include/asm/mce.h Andi Kleen 2009-02-12 225 void mce_setup(struct mce *m);
e2f430291fe23a4 include/asm-x86/mce.h Thomas Gleixner 2007-10-17 226 void mce_log(struct mce *m);
d6126ef5f31ca54 arch/x86/include/asm/mce.h Greg Kroah-Hartman 2012-01-26 227 DECLARE_PER_CPU(struct device *, mce_device);
e2f430291fe23a4 include/asm-x86/mce.h Thomas Gleixner 2007-10-17 228
a0bc32b3cacf194 arch/x86/include/asm/mce.h Akshay Gupta 2020-08-28 229 /* Maximum number of MCA banks per CPU. */
a0bc32b3cacf194 arch/x86/include/asm/mce.h Akshay Gupta 2020-08-28 230 #define MAX_NR_BANKS 64
41fdff322e26c4a arch/x86/include/asm/mce.h Andi Kleen 2009-02-12 231
e2f430291fe23a4 include/asm-x86/mce.h Thomas Gleixner 2007-10-17 232 #ifdef CONFIG_X86_MCE_INTEL
e2f430291fe23a4 include/asm-x86/mce.h Thomas Gleixner 2007-10-17 233 void mce_intel_feature_init(struct cpuinfo_x86 *c);
8838eb6c0bf3b6a arch/x86/include/asm/mce.h Ashok Raj 2015-08-12 234 void mce_intel_feature_clear(struct cpuinfo_x86 *c);
88ccbedd9ca85d1 arch/x86/include/asm/mce.h Andi Kleen 2009-02-12 235 void cmci_clear(void);
88ccbedd9ca85d1 arch/x86/include/asm/mce.h Andi Kleen 2009-02-12 236 void cmci_reenable(void);
7a0c819d28f5c91 arch/x86/include/asm/mce.h Srivatsa S. Bhat 2013-03-20 237 void cmci_rediscover(void);
88ccbedd9ca85d1 arch/x86/include/asm/mce.h Andi Kleen 2009-02-12 238 void cmci_recheck(void);
e2f430291fe23a4 include/asm-x86/mce.h Thomas Gleixner 2007-10-17 239 #else
e2f430291fe23a4 include/asm-x86/mce.h Thomas Gleixner 2007-10-17 240 static inline void mce_intel_feature_init(struct cpuinfo_x86 *c) { }
8838eb6c0bf3b6a arch/x86/include/asm/mce.h Ashok Raj 2015-08-12 241 static inline void mce_intel_feature_clear(struct cpuinfo_x86 *c) { }
88ccbedd9ca85d1 arch/x86/include/asm/mce.h Andi Kleen 2009-02-12 242 static inline void cmci_clear(void) {}
88ccbedd9ca85d1 arch/x86/include/asm/mce.h Andi Kleen 2009-02-12 243 static inline void cmci_reenable(void) {}
7a0c819d28f5c91 arch/x86/include/asm/mce.h Srivatsa S. Bhat 2013-03-20 244 static inline void cmci_rediscover(void) {}
88ccbedd9ca85d1 arch/x86/include/asm/mce.h Andi Kleen 2009-02-12 245 static inline void cmci_recheck(void) {}
e2f430291fe23a4 include/asm-x86/mce.h Thomas Gleixner 2007-10-17 246 #endif
e2f430291fe23a4 include/asm-x86/mce.h Thomas Gleixner 2007-10-17 247
38736072d45488f arch/x86/include/asm/mce.h H. Peter Anvin 2009-05-28 248 int mce_available(struct cpuinfo_x86 *c);
2d1f406139ec203 arch/x86/include/asm/mce.h Borislav Petkov 2017-05-19 249 bool mce_is_memory_error(struct mce *m);
5d96c9342c23ee1 arch/x86/include/asm/mce.h Vishal Verma 2018-10-25 250 bool mce_is_correctable(struct mce *m);
1bae0cfe4a171cc arch/x86/include/asm/mce.h Yazen Ghannam 2023-06-13 251 bool mce_usable_address(struct mce *m);
88ccbedd9ca85d1 arch/x86/include/asm/mce.h Andi Kleen 2009-02-12 252
01ca79f1411eae2 arch/x86/include/asm/mce.h Andi Kleen 2009-05-27 253 DECLARE_PER_CPU(unsigned, mce_exception_count);
ca84f69697da0f0 arch/x86/include/asm/mce.h Andi Kleen 2009-05-27 254 DECLARE_PER_CPU(unsigned, mce_poll_count);
01ca79f1411eae2 arch/x86/include/asm/mce.h Andi Kleen 2009-05-27 255
ee031c31d6381d0 arch/x86/include/asm/mce.h Andi Kleen 2009-02-12 256 typedef DECLARE_BITMAP(mce_banks_t, MAX_NR_BANKS);
ee031c31d6381d0 arch/x86/include/asm/mce.h Andi Kleen 2009-02-12 257 DECLARE_PER_CPU(mce_banks_t, mce_poll_banks);
ee031c31d6381d0 arch/x86/include/asm/mce.h Andi Kleen 2009-02-12 258
b79109c3bbcf52c arch/x86/include/asm/mce.h Andi Kleen 2009-02-12 259 enum mcp_flags {
3f2f0680d1161df arch/x86/include/asm/mce.h Borislav Petkov 2015-01-13 260 MCP_TIMESTAMP = BIT(0), /* log time stamp */
3f2f0680d1161df arch/x86/include/asm/mce.h Borislav Petkov 2015-01-13 261 MCP_UC = BIT(1), /* log uncorrected errors */
3f2f0680d1161df arch/x86/include/asm/mce.h Borislav Petkov 2015-01-13 262 MCP_DONTLOG = BIT(2), /* only clear, don't log */
3bff147b187d5df arch/x86/include/asm/mce.h Borislav Petkov 2021-08-23 263 MCP_QUEUE_LOG = BIT(3), /* only queue to genpool */
b79109c3bbcf52c arch/x86/include/asm/mce.h Andi Kleen 2009-02-12 264 };
5b9d292ea87c836 arch/x86/include/asm/mce.h Yazen Ghannam 2024-05-23 265
5b9d292ea87c836 arch/x86/include/asm/mce.h Yazen Ghannam 2024-05-23 266 void machine_check_poll(enum mcp_flags flags, mce_banks_t *b);
b79109c3bbcf52c arch/x86/include/asm/mce.h Andi Kleen 2009-02-12 267
9ff36ee9668ff41 arch/x86/include/asm/mce.h Andi Kleen 2009-05-27 268 int mce_notify_irq(void);
e2f430291fe23a4 include/asm-x86/mce.h Thomas Gleixner 2007-10-17 269
ea149b36c7f511d arch/x86/include/asm/mce.h Andi Kleen 2009-04-29 270 DECLARE_PER_CPU(struct mce, injectm);
66f5ddf30a59f81 arch/x86/include/asm/mce.h Tony Luck 2011-11-03 271
c3d1fb567a634dc arch/x86/include/asm/mce.h Naveen N Rao 2013-07-01 272 /* Disable CMCI/polling for MCA bank claimed by firmware */
c3d1fb567a634dc arch/x86/include/asm/mce.h Naveen N Rao 2013-07-01 273 extern void mce_disable_bank(int bank);
c3d1fb567a634dc arch/x86/include/asm/mce.h Naveen N Rao 2013-07-01 274
58995d2d58e8e55 arch/x86/include/asm/mce.h Hidetoshi Seto 2009-06-15 275 /*
58995d2d58e8e55 arch/x86/include/asm/mce.h Hidetoshi Seto 2009-06-15 276 * Exception handler
58995d2d58e8e55 arch/x86/include/asm/mce.h Hidetoshi Seto 2009-06-15 277 */
8cd501c1facc159 arch/x86/include/asm/mce.h Thomas Gleixner 2020-02-25 278 void do_machine_check(struct pt_regs *pt_regs);
58995d2d58e8e55 arch/x86/include/asm/mce.h Hidetoshi Seto 2009-06-15 279
58995d2d58e8e55 arch/x86/include/asm/mce.h Hidetoshi Seto 2009-06-15 280 /*
58995d2d58e8e55 arch/x86/include/asm/mce.h Hidetoshi Seto 2009-06-15 281 * Threshold handler
58995d2d58e8e55 arch/x86/include/asm/mce.h Hidetoshi Seto 2009-06-15 282 */
b276268631af3a1 arch/x86/include/asm/mce.h Andi Kleen 2009-02-12 283 extern void (*mce_threshold_vector)(void);
b276268631af3a1 arch/x86/include/asm/mce.h Andi Kleen 2009-02-12 284
24fd78a81f6d3fe arch/x86/include/asm/mce.h Aravind Gopalakrishnan 2015-05-06 285 /* Deferred error interrupt handler */
24fd78a81f6d3fe arch/x86/include/asm/mce.h Aravind Gopalakrishnan 2015-05-06 286 extern void (*deferred_error_int_vector)(void);
24fd78a81f6d3fe arch/x86/include/asm/mce.h Aravind Gopalakrishnan 2015-05-06 287
d334a49113a4a33 arch/x86/include/asm/mce.h Huang Ying 2010-05-18 288 /*
d334a49113a4a33 arch/x86/include/asm/mce.h Huang Ying 2010-05-18 289 * Used by APEI to report memory error via /dev/mcelog
d334a49113a4a33 arch/x86/include/asm/mce.h Huang Ying 2010-05-18 290 */
d334a49113a4a33 arch/x86/include/asm/mce.h Huang Ying 2010-05-18 291
d334a49113a4a33 arch/x86/include/asm/mce.h Huang Ying 2010-05-18 292 struct cper_sec_mem_err;
d334a49113a4a33 arch/x86/include/asm/mce.h Huang Ying 2010-05-18 293 extern void apei_mce_report_mem_error(int corrected,
d334a49113a4a33 arch/x86/include/asm/mce.h Huang Ying 2010-05-18 294 struct cper_sec_mem_err *mem_err);
d334a49113a4a33 arch/x86/include/asm/mce.h Huang Ying 2010-05-18 295
be0aec23bf4624f arch/x86/include/asm/mce.h Aravind Gopalakrishnan 2016-03-07 296 /*
be0aec23bf4624f arch/x86/include/asm/mce.h Aravind Gopalakrishnan 2016-03-07 297 * Enumerate new IP types and HWID values in AMD processors which support
be0aec23bf4624f arch/x86/include/asm/mce.h Aravind Gopalakrishnan 2016-03-07 298 * Scalable MCA.
be0aec23bf4624f arch/x86/include/asm/mce.h Aravind Gopalakrishnan 2016-03-07 299 */
be0aec23bf4624f arch/x86/include/asm/mce.h Aravind Gopalakrishnan 2016-03-07 300 #ifdef CONFIG_X86_MCE_AMD
5896820e0aa3257 arch/x86/include/asm/mce.h Yazen Ghannam 2016-09-12 301
5896820e0aa3257 arch/x86/include/asm/mce.h Yazen Ghannam 2016-09-12 302 /* These may be used by multiple smca_hwid_mcatypes */
5896820e0aa3257 arch/x86/include/asm/mce.h Yazen Ghannam 2016-09-12 303 enum smca_bank_types {
5896820e0aa3257 arch/x86/include/asm/mce.h Yazen Ghannam 2016-09-12 304 SMCA_LS = 0, /* Load Store */
94a311ce248e0b5 arch/x86/include/asm/mce.h Muralidhara M K 2021-05-26 305 SMCA_LS_V2,
5896820e0aa3257 arch/x86/include/asm/mce.h Yazen Ghannam 2016-09-12 306 SMCA_IF, /* Instruction Fetch */
5896820e0aa3257 arch/x86/include/asm/mce.h Yazen Ghannam 2016-09-12 307 SMCA_L2_CACHE, /* L2 Cache */
5896820e0aa3257 arch/x86/include/asm/mce.h Yazen Ghannam 2016-09-12 308 SMCA_DE, /* Decoder Unit */
68627a697c19593 arch/x86/include/asm/mce.h Yazen Ghannam 2018-02-21 309 SMCA_RESERVED, /* Reserved */
5896820e0aa3257 arch/x86/include/asm/mce.h Yazen Ghannam 2016-09-12 310 SMCA_EX, /* Execution Unit */
5896820e0aa3257 arch/x86/include/asm/mce.h Yazen Ghannam 2016-09-12 311 SMCA_FP, /* Floating Point */
5896820e0aa3257 arch/x86/include/asm/mce.h Yazen Ghannam 2016-09-12 312 SMCA_L3_CACHE, /* L3 Cache */
5896820e0aa3257 arch/x86/include/asm/mce.h Yazen Ghannam 2016-09-12 313 SMCA_CS, /* Coherent Slave */
94a311ce248e0b5 arch/x86/include/asm/mce.h Muralidhara M K 2021-05-26 314 SMCA_CS_V2,
5896820e0aa3257 arch/x86/include/asm/mce.h Yazen Ghannam 2016-09-12 315 SMCA_PIE, /* Power, Interrupts, etc. */
be0aec23bf4624f arch/x86/include/asm/mce.h Aravind Gopalakrishnan 2016-03-07 316 SMCA_UMC, /* Unified Memory Controller */
94a311ce248e0b5 arch/x86/include/asm/mce.h Muralidhara M K 2021-05-26 317 SMCA_UMC_V2,
47b744ea5e3cf85 arch/x86/include/asm/mce.h Muralidhara M K 2023-11-02 318 SMCA_MA_LLC, /* Memory Attached Last Level Cache */
be0aec23bf4624f arch/x86/include/asm/mce.h Aravind Gopalakrishnan 2016-03-07 319 SMCA_PB, /* Parameter Block */
be0aec23bf4624f arch/x86/include/asm/mce.h Aravind Gopalakrishnan 2016-03-07 320 SMCA_PSP, /* Platform Security Processor */
94a311ce248e0b5 arch/x86/include/asm/mce.h Muralidhara M K 2021-05-26 321 SMCA_PSP_V2,
be0aec23bf4624f arch/x86/include/asm/mce.h Aravind Gopalakrishnan 2016-03-07 322 SMCA_SMU, /* System Management Unit */
94a311ce248e0b5 arch/x86/include/asm/mce.h Muralidhara M K 2021-05-26 323 SMCA_SMU_V2,
cbfa447edd6a382 arch/x86/include/asm/mce.h Yazen Ghannam 2019-02-01 324 SMCA_MP5, /* Microprocessor 5 Unit */
5176a93ab27aef1 arch/x86/include/asm/mce.h Yazen Ghannam 2021-12-16 325 SMCA_MPDMA, /* MPDMA Unit */
cbfa447edd6a382 arch/x86/include/asm/mce.h Yazen Ghannam 2019-02-01 326 SMCA_NBIO, /* Northbridge IO Unit */
cbfa447edd6a382 arch/x86/include/asm/mce.h Yazen Ghannam 2019-02-01 327 SMCA_PCIE, /* PCI Express Unit */
94a311ce248e0b5 arch/x86/include/asm/mce.h Muralidhara M K 2021-05-26 328 SMCA_PCIE_V2,
94a311ce248e0b5 arch/x86/include/asm/mce.h Muralidhara M K 2021-05-26 329 SMCA_XGMI_PCS, /* xGMI PCS Unit */
5176a93ab27aef1 arch/x86/include/asm/mce.h Yazen Ghannam 2021-12-16 330 SMCA_NBIF, /* NBIF Unit */
5176a93ab27aef1 arch/x86/include/asm/mce.h Yazen Ghannam 2021-12-16 331 SMCA_SHUB, /* System HUB Unit */
5176a93ab27aef1 arch/x86/include/asm/mce.h Yazen Ghannam 2021-12-16 332 SMCA_SATA, /* SATA Unit */
5176a93ab27aef1 arch/x86/include/asm/mce.h Yazen Ghannam 2021-12-16 333 SMCA_USB, /* USB Unit */
47b744ea5e3cf85 arch/x86/include/asm/mce.h Muralidhara M K 2023-11-02 334 SMCA_USR_DP, /* Ultra Short Reach Data Plane Controller */
47b744ea5e3cf85 arch/x86/include/asm/mce.h Muralidhara M K 2023-11-02 335 SMCA_USR_CP, /* Ultra Short Reach Control Plane Controller */
5176a93ab27aef1 arch/x86/include/asm/mce.h Yazen Ghannam 2021-12-16 336 SMCA_GMI_PCS, /* GMI PCS Unit */
94a311ce248e0b5 arch/x86/include/asm/mce.h Muralidhara M K 2021-05-26 337 SMCA_XGMI_PHY, /* xGMI PHY Unit */
94a311ce248e0b5 arch/x86/include/asm/mce.h Muralidhara M K 2021-05-26 338 SMCA_WAFL_PHY, /* WAFL PHY Unit */
5176a93ab27aef1 arch/x86/include/asm/mce.h Yazen Ghannam 2021-12-16 339 SMCA_GMI_PHY, /* GMI PHY Unit */
5896820e0aa3257 arch/x86/include/asm/mce.h Yazen Ghannam 2016-09-12 340 N_SMCA_BANK_TYPES
be0aec23bf4624f arch/x86/include/asm/mce.h Aravind Gopalakrishnan 2016-03-07 341 };
be0aec23bf4624f arch/x86/include/asm/mce.h Aravind Gopalakrishnan 2016-03-07 342
c6708d50f166bea arch/x86/include/asm/mce.h Yazen Ghannam 2017-12-18 343 extern bool amd_mce_is_memory_error(struct mce *m);
e71c3978d6f9765 arch/x86/include/asm/mce.h Linus Torvalds 2016-12-12 344
4d7b02d58c40005 arch/x86/include/asm/mce.h Sebastian Andrzej Siewior 2016-11-10 345 extern int mce_threshold_create_device(unsigned int cpu);
4d7b02d58c40005 arch/x86/include/asm/mce.h Sebastian Andrzej Siewior 2016-11-10 346 extern int mce_threshold_remove_device(unsigned int cpu);
e71c3978d6f9765 arch/x86/include/asm/mce.h Linus Torvalds 2016-12-12 347
9308fd4074551f2 arch/x86/include/asm/mce.h Yazen Ghannam 2019-03-22 348 void mce_amd_feature_init(struct cpuinfo_x86 *c);
91f75eb481cfaee arch/x86/include/asm/mce.h Yazen Ghannam 2021-12-16 349 enum smca_bank_types smca_get_bank_type(unsigned int cpu, unsigned int bank);
4d7b02d58c40005 arch/x86/include/asm/mce.h Sebastian Andrzej Siewior 2016-11-10 350 #else
5896820e0aa3257 arch/x86/include/asm/mce.h Yazen Ghannam 2016-09-12 351
4d7b02d58c40005 arch/x86/include/asm/mce.h Sebastian Andrzej Siewior 2016-11-10 352 static inline int mce_threshold_create_device(unsigned int cpu) { return 0; };
4d7b02d58c40005 arch/x86/include/asm/mce.h Sebastian Andrzej Siewior 2016-11-10 353 static inline int mce_threshold_remove_device(unsigned int cpu) { return 0; };
c6708d50f166bea arch/x86/include/asm/mce.h Yazen Ghannam 2017-12-18 354 static inline bool amd_mce_is_memory_error(struct mce *m) { return false; };
9308fd4074551f2 arch/x86/include/asm/mce.h Yazen Ghannam 2019-03-22 355 static inline void mce_amd_feature_init(struct cpuinfo_x86 *c) { }
be0aec23bf4624f arch/x86/include/asm/mce.h Aravind Gopalakrishnan 2016-03-07 356 #endif
be0aec23bf4624f arch/x86/include/asm/mce.h Aravind Gopalakrishnan 2016-03-07 357
9308fd4074551f2 arch/x86/include/asm/mce.h Yazen Ghannam 2019-03-22 @358 static inline void mce_hygon_feature_init(struct cpuinfo_x86 *c) { return mce_amd_feature_init(c); }
e9c2a283e7d9d4e arch/x86/include/asm/mce.h Arnd Bergmann 2023-05-16 359
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH v4 2/2] cxl: avoid duplicated report from MCE & device
2024-08-08 15:13 ` [PATCH v4 2/2] cxl: avoid duplicated report from MCE & device Shiyang Ruan via
2024-08-09 7:31 ` kernel test robot
2024-08-09 7:31 ` kernel test robot
@ 2024-08-09 11:48 ` kernel test robot
2 siblings, 0 replies; 8+ messages in thread
From: kernel test robot @ 2024-08-09 11:48 UTC (permalink / raw)
To: Shiyang Ruan, qemu-devel, linux-cxl, linux-edac, linux-mm,
dan.j.williams, vishal.l.verma, Jonathan.Cameron,
alison.schofield
Cc: oe-kbuild-all, bp, dave.jiang, dave, ira.weiny, james.morse,
linmiaohe, mchehab, nao.horiguchi, rric, tony.luck, ruansy.fnst
Hi Shiyang,
kernel test robot noticed the following build warnings:
[auto build test WARNING on tip/x86/core]
[also build test WARNING on cxl/next linus/master v6.11-rc2 next-20240809]
[cannot apply to cxl/pending]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Shiyang-Ruan/cxl-core-introduce-device-reporting-poison-hanlding/20240809-013658
base: tip/x86/core
patch link: https://lore.kernel.org/r/20240808151328.707869-3-ruansy.fnst%40fujitsu.com
patch subject: [PATCH v4 2/2] cxl: avoid duplicated report from MCE & device
config: x86_64-randconfig-121-20240809 (https://download.01.org/0day-ci/archive/20240809/202408091914.TFbjPuNQ-lkp@intel.com/config)
compiler: clang version 18.1.5 (https://github.com/llvm/llvm-project 617a15a9eac96088ae5e9134248d8236e34b91b1)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240809/202408091914.TFbjPuNQ-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202408091914.TFbjPuNQ-lkp@intel.com/
sparse warnings: (new ones prefixed by >>)
>> drivers/cxl/core/mbox.c:1465:1: sparse: sparse: symbol 'cxl_mce_records' was not declared. Should it be static?
drivers/cxl/core/mbox.c: note: in included file (through include/linux/gfp.h, include/linux/xarray.h, include/linux/list_lru.h, ...):
include/linux/mmzone.h:2018:40: sparse: sparse: self-comparison always evaluates to false
vim +/cxl_mce_records +1465 drivers/cxl/core/mbox.c
1464
> 1465 DEFINE_XARRAY(cxl_mce_records);
1466
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH v4 1/2] cxl/core: introduce device reporting poison hanlding
2024-08-08 18:28 ` Fan Ni
@ 2024-08-21 13:57 ` Shiyang Ruan via
0 siblings, 0 replies; 8+ messages in thread
From: Shiyang Ruan via @ 2024-08-21 13:57 UTC (permalink / raw)
To: Fan Ni
Cc: qemu-devel, linux-cxl, linux-edac, linux-mm, dan.j.williams,
vishal.l.verma, Jonathan.Cameron, alison.schofield, bp,
dave.jiang, dave, ira.weiny, james.morse, linmiaohe, mchehab,
nao.horiguchi, rric, tony.luck
On 2024/8/9 2:28, Fan Ni wrote:
> On Thu, Aug 08, 2024 at 11:13:27PM +0800, Shiyang Ruan wrote:
>> A CXL device can find and report memory problems even before the CPU
>> detects an MCE. AFAIK, the current kernel only traces POISON error
>> events from the FW-First/OS-First paths; it neither handles them nor
>> notifies the processes that are using the POISON page, as MCE does.
>>
>> Thus, users have to read logs from the trace and work out which device
>> reported the error and which applications are affected. That is not
>> easy and cannot be done in time, so a feature is needed to do this
>> work automatically and quickly. Once a CXL device reports a POISON
>> error (via FW-First/OS-First), the kernel handles it immediately,
>> similar to the flow when an MCE is triggered.
>>
>> The current call trace of error reporting&handling looks like this:
>> ```
>> 1. MCE (interrupt #18, while CPU consuming POISON)
>> -> do_machine_check()
>> -> mce_log()
>> -> notify chain (x86_mce_decoder_chain)
>> -> memory_failure()
>>
>> 2.a FW-First (optional, CXL device proactively find&report)
>> -> CXL device -> Firmware
>> -> OS: ACPI->APEI->GHES->CPER -> CXL driver -> trace
>> \-> memory_failure()
>> ^----- ADD
>> 2.b OS-First (optional, CXL device proactively find&report)
>> -> CXL device -> MSI
>> -> OS: CXL driver -> trace
>> \-> memory_failure()
>> ^------------------------------- ADD
>> ```
>> This patch adds a call to memory_failure() when an error reported by a
>> CXL device is received, marked as "ADD" in the figure above.
>>
>> Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
>> ---
>> drivers/cxl/core/mbox.c | 75 ++++++++++++++++++++++++++++++++-------
>> drivers/cxl/cxlmem.h | 8 ++---
>> drivers/cxl/pci.c | 4 +--
>> include/linux/cxl-event.h | 16 ++++++++-
>> 4 files changed, 83 insertions(+), 20 deletions(-)
>>
>> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
>> index e5cdeafdf76e..0cb6ef2e6600 100644
>> --- a/drivers/cxl/core/mbox.c
>> +++ b/drivers/cxl/core/mbox.c
>> @@ -849,10 +849,55 @@ int cxl_enumerate_cmds(struct cxl_memdev_state *mds)
>> }
>> EXPORT_SYMBOL_NS_GPL(cxl_enumerate_cmds, CXL);
>>
>> -void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
>> - enum cxl_event_log_type type,
>> - enum cxl_event_type event_type,
>> - const uuid_t *uuid, union cxl_event *evt)
>> +static void cxl_report_poison(struct cxl_memdev *cxlmd, u64 hpa)
>> +{
>> + unsigned long pfn = PHYS_PFN(hpa);
>> +
>> + memory_failure_queue(pfn, 0);
>> +}
>> +
>> +static void cxl_event_handle_general_media(struct cxl_memdev *cxlmd,
>> + enum cxl_event_log_type type,
>> + u64 hpa,
>> + struct cxl_event_gen_media *rec)
>> +{
>> + if (type == CXL_EVENT_TYPE_FAIL) {
>> + switch (rec->media_hdr.transaction_type) {
>> + case CXL_EVENT_TRANSACTION_READ:
>> + case CXL_EVENT_TRANSACTION_WRITE:
>> + case CXL_EVENT_TRANSACTION_SCAN_MEDIA:
>> + case CXL_EVENT_TRANSACTION_INJECT_POISON:
>> + cxl_report_poison(cxlmd, hpa);
>> + break;
>> + default:
>> + break;
>> + }
>> + }
>> +}
>> +
>> +static void cxl_event_handle_dram(struct cxl_memdev *cxlmd,
>> + enum cxl_event_log_type type,
>> + u64 hpa,
>> + struct cxl_event_dram *rec)
>> +{
>> + if (type == CXL_EVENT_TYPE_FAIL) {
>> + switch (rec->media_hdr.transaction_type) {
>> + case CXL_EVENT_TRANSACTION_READ:
>> + case CXL_EVENT_TRANSACTION_WRITE:
>> + case CXL_EVENT_TRANSACTION_SCAN_MEDIA:
>> + case CXL_EVENT_TRANSACTION_INJECT_POISON:
>> + cxl_report_poison(cxlmd, hpa);
>> + break;
>> + default:
>> + break;
>> + }
>> + }
>> +}
>> +
>> +void cxl_event_handle_record(struct cxl_memdev *cxlmd,
>> + enum cxl_event_log_type type,
>> + enum cxl_event_type event_type,
>> + const uuid_t *uuid, union cxl_event *evt)
>> {
>> if (event_type == CXL_CPER_EVENT_MEM_MODULE) {
>> trace_cxl_memory_module(cxlmd, type, &evt->mem_module);
>> @@ -880,18 +925,22 @@ void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
>> if (cxlr)
>> hpa = cxl_dpa_to_hpa(cxlr, cxlmd, dpa);
>>
>> - if (event_type == CXL_CPER_EVENT_GEN_MEDIA)
>> + if (event_type == CXL_CPER_EVENT_GEN_MEDIA) {
>> trace_cxl_general_media(cxlmd, type, cxlr, hpa,
>> &evt->gen_media);
>> - else if (event_type == CXL_CPER_EVENT_DRAM)
>> + cxl_event_handle_general_media(cxlmd, type, hpa,
>> + &evt->gen_media);
>> + } else if (event_type == CXL_CPER_EVENT_DRAM) {
>> trace_cxl_dram(cxlmd, type, cxlr, hpa, &evt->dram);
>> + cxl_event_handle_dram(cxlmd, type, hpa, &evt->dram);
>
> Does it make sense to call the trace function in
> cxl_event_handle_dram/general_media and replace the trace function with
> the handle_* here?
Sorry for the late reply. I'm not very good at naming functions. Since
the trace functions already have the framework for dealing with each
UUID and event type, I don't think we should build a second one for the
same logic. So I reused it and renamed the functions. Maybe "handle"
isn't a good word for "trace the record and call memory_failure() if
necessary". Could you help me find a better name?
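As an illustration only (not part of the posted patch): cxl_event_handle_general_media() and cxl_event_handle_dram() above contain the same switch on transaction_type, so whatever name is chosen, that check could live in one shared predicate. A minimal sketch follows, with the enum copied from the include/linux/cxl-event.h hunk of this patch; the helper name cxl_transaction_reports_poison() is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Copied from the enum this patch adds to include/linux/cxl-event.h. */
enum cxl_event_transaction_type {
	CXL_EVENT_TRANSACTION_UNKNOWN = 0x00,
	CXL_EVENT_TRANSACTION_READ,
	CXL_EVENT_TRANSACTION_WRITE,
	CXL_EVENT_TRANSACTION_SCAN_MEDIA,
	CXL_EVENT_TRANSACTION_INJECT_POISON,
	CXL_EVENT_TRANSACTION_MEDIA_SCRUB,
	CXL_EVENT_TRANSACTION_MEDIA_MANAGEMENT,
};

/* The u8 transaction_type field is compared against these values. */
static_assert(CXL_EVENT_TRANSACTION_INJECT_POISON == 0x04,
	      "enum order must match the encoding of transaction_type");

/*
 * Hypothetical helper: true for the transaction types on which the
 * patch calls cxl_report_poison(), false for everything else.
 */
static bool cxl_transaction_reports_poison(uint8_t transaction_type)
{
	switch (transaction_type) {
	case CXL_EVENT_TRANSACTION_READ:
	case CXL_EVENT_TRANSACTION_WRITE:
	case CXL_EVENT_TRANSACTION_SCAN_MEDIA:
	case CXL_EVENT_TRANSACTION_INJECT_POISON:
		return true;
	default:
		return false;
	}
}
```

Both handlers would then reduce to the log-type check plus a call to this predicate, which keeps the list of reported transaction types in one place.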
>
>> + }
>> }
>> }
>> -EXPORT_SYMBOL_NS_GPL(cxl_event_trace_record, CXL);
>> +EXPORT_SYMBOL_NS_GPL(cxl_event_handle_record, CXL);
>>
>> -static void __cxl_event_trace_record(const struct cxl_memdev *cxlmd,
>> - enum cxl_event_log_type type,
>> - struct cxl_event_record_raw *record)
>> +static void __cxl_event_handle_record(struct cxl_memdev *cxlmd,
>> + enum cxl_event_log_type type,
>> + struct cxl_event_record_raw *record)
>> {
>> enum cxl_event_type ev_type = CXL_CPER_EVENT_GENERIC;
>> const uuid_t *uuid = &record->id;
>> @@ -903,7 +952,7 @@ static void __cxl_event_trace_record(const struct cxl_memdev *cxlmd,
>> else if (uuid_equal(uuid, &CXL_EVENT_MEM_MODULE_UUID))
>> ev_type = CXL_CPER_EVENT_MEM_MODULE;
>>
>> - cxl_event_trace_record(cxlmd, type, ev_type, uuid, &record->event);
>> + cxl_event_handle_record(cxlmd, type, ev_type, uuid, &record->event);
>> }
>>
>> static int cxl_clear_event_record(struct cxl_memdev_state *mds,
>> @@ -1012,8 +1061,8 @@ static void cxl_mem_get_records_log(struct cxl_memdev_state *mds,
>> break;
>>
>> for (i = 0; i < nr_rec; i++)
>> - __cxl_event_trace_record(cxlmd, type,
>> - &payload->records[i]);
>> + __cxl_event_handle_record(cxlmd, type,
>> + &payload->records[i]);
>>
>> if (payload->flags & CXL_GET_EVENT_FLAG_OVERFLOW)
>> trace_cxl_overflow(cxlmd, type, payload);
>> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
>> index afb53d058d62..5c4810dcbdeb 100644
>> --- a/drivers/cxl/cxlmem.h
>> +++ b/drivers/cxl/cxlmem.h
>> @@ -826,10 +826,10 @@ void set_exclusive_cxl_commands(struct cxl_memdev_state *mds,
>> void clear_exclusive_cxl_commands(struct cxl_memdev_state *mds,
>> unsigned long *cmds);
>> void cxl_mem_get_event_records(struct cxl_memdev_state *mds, u32 status);
>> -void cxl_event_trace_record(const struct cxl_memdev *cxlmd,
>> - enum cxl_event_log_type type,
>> - enum cxl_event_type event_type,
>> - const uuid_t *uuid, union cxl_event *evt);
>> +void cxl_event_handle_record(struct cxl_memdev *cxlmd,
>> + enum cxl_event_log_type type,
>> + enum cxl_event_type event_type,
>> + const uuid_t *uuid, union cxl_event *evt);
>> int cxl_set_timestamp(struct cxl_memdev_state *mds);
>> int cxl_poison_state_init(struct cxl_memdev_state *mds);
>> int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
>> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
>> index 4be35dc22202..6e65ca89f666 100644
>> --- a/drivers/cxl/pci.c
>> +++ b/drivers/cxl/pci.c
>> @@ -1029,8 +1029,8 @@ static void cxl_handle_cper_event(enum cxl_event_type ev_type,
>> hdr_flags = get_unaligned_le24(rec->event.generic.hdr.flags);
>> log_type = FIELD_GET(CXL_EVENT_HDR_FLAGS_REC_SEVERITY, hdr_flags);
>>
>> - cxl_event_trace_record(cxlds->cxlmd, log_type, ev_type,
>> - &uuid_null, &rec->event);
>> + cxl_event_handle_record(cxlds->cxlmd, log_type, ev_type,
>> + &uuid_null, &rec->event);
>> }
>>
>> static void cxl_cper_work_fn(struct work_struct *work)
>> diff --git a/include/linux/cxl-event.h b/include/linux/cxl-event.h
>> index 0bea1afbd747..be4342a2b597 100644
>> --- a/include/linux/cxl-event.h
>> +++ b/include/linux/cxl-event.h
>> @@ -7,6 +7,20 @@
>> #include <linux/uuid.h>
>> #include <linux/workqueue_types.h>
>>
>> +/*
>> + * Event transaction type
>> + * CXL rev 3.0 Section 8.2.9.2.1.1; Table 8-43
>
> Here and below, update the specification reference to reflect CXL 3.1.
Ok. Will update it.
--
Thanks,
Ruan.
>
> Fan
>> + */
>> +enum cxl_event_transaction_type {
>> + CXL_EVENT_TRANSACTION_UNKNOWN = 0X00,
>> + CXL_EVENT_TRANSACTION_READ,
>> + CXL_EVENT_TRANSACTION_WRITE,
>> + CXL_EVENT_TRANSACTION_SCAN_MEDIA,
>> + CXL_EVENT_TRANSACTION_INJECT_POISON,
>> + CXL_EVENT_TRANSACTION_MEDIA_SCRUB,
>> + CXL_EVENT_TRANSACTION_MEDIA_MANAGEMENT,
>> +};
>> +
>> /*
>> * Common Event Record Format
>> * CXL rev 3.0 section 8.2.9.2.1; Table 8-42
>> @@ -26,7 +40,7 @@ struct cxl_event_media_hdr {
>> __le64 phys_addr;
>> u8 descriptor;
>> u8 type;
>> - u8 transaction_type;
>> + u8 transaction_type; /* enum cxl_event_transaction_type */
>> /*
>> * The meaning of Validity Flags from bit 2 is
>> * different across DRAM and General Media records
>> --
>> 2.34.1
>>
Thread overview: 8+ messages
2024-08-08 15:13 [PATCH v4 0/2] cxl: add device reporting poison handler Shiyang Ruan via
2024-08-08 15:13 ` [PATCH v4 1/2] cxl/core: introduce device reporting poison hanlding Shiyang Ruan via
2024-08-08 18:28 ` Fan Ni
2024-08-21 13:57 ` Shiyang Ruan via
2024-08-08 15:13 ` [PATCH v4 2/2] cxl: avoid duplicated report from MCE & device Shiyang Ruan via
2024-08-09 7:31 ` kernel test robot
2024-08-09 7:31 ` kernel test robot
2024-08-09 11:48 ` kernel test robot