* [PATCH v2 0/5] Rework mce_setup()
@ 2024-06-24 21:20 Yazen Ghannam
2024-06-24 21:20 ` [PATCH v2 1/5] x86/topology: Export helper to get CPU number from APIC ID Yazen Ghannam
` (4 more replies)
0 siblings, 5 replies; 18+ messages in thread
From: Yazen Ghannam @ 2024-06-24 21:20 UTC (permalink / raw)
To: linux-edac
Cc: linux-kernel, tony.luck, x86, avadhut.naik, john.allen,
Yazen Ghannam
Hi all,
Patches 1-2 are new in this revision. They address a comment from Boris
to reuse a new helper function from the recent APIC/topology rework.
Patches 3-4 are unchanged.
Patch 5 is updated with the changes from the new patches.
Thanks,
Yazen
Link:
https://lkml.kernel.org/r/20240521125434.1555845-1-yazen.ghannam@amd.com
Yazen Ghannam (5):
x86/topology: Export helper to get CPU number from APIC ID
x86/mce: Fixup APIC ID search for x86 CPER decoding
x86/mce: Rename mce_setup() to mce_prep_record()
x86/mce: Define mce_prep_record() helpers for common and per-CPU
fields
x86/mce: Use mce_prep_record() helpers for
apei_smca_report_x86_error()
arch/x86/include/asm/mce.h | 2 +-
arch/x86/include/asm/topology.h | 6 +++++
arch/x86/kernel/cpu/mce/amd.c | 2 +-
arch/x86/kernel/cpu/mce/apei.c | 23 +++++++-----------
arch/x86/kernel/cpu/mce/core.c | 38 ++++++++++++++++++++----------
arch/x86/kernel/cpu/mce/internal.h | 2 ++
arch/x86/kernel/cpu/topology.c | 15 +++++++++---
7 files changed, 55 insertions(+), 33 deletions(-)
base-commit: 2db56ea9b277e1790f6a4d8fdd949ab81ba34c76
--
2.34.1
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v2 1/5] x86/topology: Export helper to get CPU number from APIC ID
2024-06-24 21:20 [PATCH v2 0/5] Rework mce_setup() Yazen Ghannam
@ 2024-06-24 21:20 ` Yazen Ghannam
2024-06-25 6:50 ` Thomas Gleixner
2024-06-24 21:20 ` [PATCH v2 2/5] x86/mce: Fixup APIC ID search for x86 CPER decoding Yazen Ghannam
` (3 subsequent siblings)
4 siblings, 1 reply; 18+ messages in thread
From: Yazen Ghannam @ 2024-06-24 21:20 UTC (permalink / raw)
To: linux-edac
Cc: linux-kernel, tony.luck, x86, avadhut.naik, john.allen,
Yazen Ghannam
The need to look up a CPU number from an APIC ID is done in at least one
other place outside of APIC/topology code: apei_smca_report_x86_error().
With the recent topology rework, there is now a helper function to do
just this task.
Export the helper so other code can use it. Also, update the name to
match other exported topology_* helpers.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
Link:
https://lkml.kernel.org/r/20240618172447.GA1387@yaz-khff2.amd.com
v1->v2:
* New in v2.
arch/x86/include/asm/topology.h | 6 ++++++
arch/x86/kernel/cpu/topology.c | 15 ++++++++++++---
2 files changed, 18 insertions(+), 3 deletions(-)
diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h
index abe3a8f22cbd..d8c3c3c818bc 100644
--- a/arch/x86/include/asm/topology.h
+++ b/arch/x86/include/asm/topology.h
@@ -171,11 +171,17 @@ static inline unsigned int topology_num_threads_per_package(void)
#ifdef CONFIG_X86_LOCAL_APIC
int topology_get_logical_id(u32 apicid, enum x86_topology_domains at_level);
+int topology_get_cpunr(u32 apic_id);
#else
static inline int topology_get_logical_id(u32 apicid, enum x86_topology_domains at_level)
{
return 0;
}
+
+static inline int topology_get_cpunr(u32 apic_id)
+{
+ return -ENODEV;
+}
#endif
#ifdef CONFIG_SMP
diff --git a/arch/x86/kernel/cpu/topology.c b/arch/x86/kernel/cpu/topology.c
index 621a151ccf7d..fc74578ee3bd 100644
--- a/arch/x86/kernel/cpu/topology.c
+++ b/arch/x86/kernel/cpu/topology.c
@@ -95,7 +95,15 @@ static inline u32 topo_apicid(u32 apicid, enum x86_topology_domains dom)
return apicid & (UINT_MAX << x86_topo_system.dom_shifts[dom - 1]);
}
-static int topo_lookup_cpuid(u32 apic_id)
+/**
+ * topology_get_cpunr - Retrieve the CPU number for the given APIC ID
+ * @apic_id: The APIC ID for which to lookup the CPU number
+ *
+ * Returns:
+ * - >= 0: The CPU number for the given APIC ID
+ * - -ENODEV: @apic_id does not match any known CPU
+ */
+int topology_get_cpunr(u32 apic_id)
{
int i;
@@ -106,10 +114,11 @@ static int topo_lookup_cpuid(u32 apic_id)
}
return -ENODEV;
}
+EXPORT_SYMBOL_GPL(topology_get_cpunr);
static __init int topo_get_cpunr(u32 apic_id)
{
- int cpu = topo_lookup_cpuid(apic_id);
+ int cpu = topology_get_cpunr(apic_id);
if (cpu >= 0)
return cpu;
@@ -388,7 +397,7 @@ int topology_hotplug_apic(u32 apic_id, u32 acpi_id)
if (!test_bit(apic_id, apic_maps[TOPO_SMT_DOMAIN].map))
return -ENODEV;
- cpu = topo_lookup_cpuid(apic_id);
+ cpu = topology_get_cpunr(apic_id);
if (cpu < 0)
return -ENOSPC;
--
2.34.1
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v2 2/5] x86/mce: Fixup APIC ID search for x86 CPER decoding
2024-06-24 21:20 [PATCH v2 0/5] Rework mce_setup() Yazen Ghannam
2024-06-24 21:20 ` [PATCH v2 1/5] x86/topology: Export helper to get CPU number from APIC ID Yazen Ghannam
@ 2024-06-24 21:20 ` Yazen Ghannam
2024-06-25 6:50 ` Thomas Gleixner
2024-06-24 21:20 ` [PATCH v2 3/5] x86/mce: Rename mce_setup() to mce_prep_record() Yazen Ghannam
` (2 subsequent siblings)
4 siblings, 1 reply; 18+ messages in thread
From: Yazen Ghannam @ 2024-06-24 21:20 UTC (permalink / raw)
To: linux-edac
Cc: linux-kernel, tony.luck, x86, avadhut.naik, john.allen,
Yazen Ghannam
Replace redundant local search code with a call to an x86/topology
helper function.
Additionally, this now handles the case where a CPU match is not found.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
Link:
https://lkml.kernel.org/r/20240618172447.GA1387@yaz-khff2.amd.com
v1->v2:
* New in v2.
arch/x86/kernel/cpu/mce/apei.c | 18 +++++++-----------
1 file changed, 7 insertions(+), 11 deletions(-)
diff --git a/arch/x86/kernel/cpu/mce/apei.c b/arch/x86/kernel/cpu/mce/apei.c
index 7f7309ff67d0..59f1c9ea47f5 100644
--- a/arch/x86/kernel/cpu/mce/apei.c
+++ b/arch/x86/kernel/cpu/mce/apei.c
@@ -66,12 +66,16 @@ EXPORT_SYMBOL_GPL(apei_mce_report_mem_error);
int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id)
{
const u64 *i_mce = ((const u64 *) (ctx_info + 1));
- unsigned int cpu;
struct mce m;
+ int cpu;
if (!boot_cpu_has(X86_FEATURE_SMCA))
return -EINVAL;
+ cpu = topology_get_cpunr(lapic_id);
+ if (cpu < 0)
+ return cpu;
+
/*
* The starting address of the register array extracted from BERT must
* match with the first expected register in the register layout of
@@ -99,16 +103,8 @@ int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id)
mce_setup(&m);
- m.extcpu = -1;
- m.socketid = -1;
-
- for_each_possible_cpu(cpu) {
- if (cpu_data(cpu).topo.initial_apicid == lapic_id) {
- m.extcpu = cpu;
- m.socketid = cpu_data(m.extcpu).topo.pkg_id;
- break;
- }
- }
+ m.extcpu = cpu;
+ m.socketid = cpu_data(cpu).topo.pkg_id;
m.apicid = lapic_id;
m.bank = (ctx_info->msr_addr >> 4) & 0xFF;
--
2.34.1
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v2 3/5] x86/mce: Rename mce_setup() to mce_prep_record()
2024-06-24 21:20 [PATCH v2 0/5] Rework mce_setup() Yazen Ghannam
2024-06-24 21:20 ` [PATCH v2 1/5] x86/topology: Export helper to get CPU number from APIC ID Yazen Ghannam
2024-06-24 21:20 ` [PATCH v2 2/5] x86/mce: Fixup APIC ID search for x86 CPER decoding Yazen Ghannam
@ 2024-06-24 21:20 ` Yazen Ghannam
2024-06-24 21:20 ` [PATCH v2 4/5] x86/mce: Define mce_prep_record() helpers for common and per-CPU fields Yazen Ghannam
2024-06-24 21:20 ` [PATCH v2 5/5] x86/mce: Use mce_prep_record() helpers for apei_smca_report_x86_error() Yazen Ghannam
4 siblings, 0 replies; 18+ messages in thread
From: Yazen Ghannam @ 2024-06-24 21:20 UTC (permalink / raw)
To: linux-edac
Cc: linux-kernel, tony.luck, x86, avadhut.naik, john.allen,
Yazen Ghannam, Borislav Petkov
There is no MCE "setup" done in mce_setup(). Rather this function
initializes and prepares an MCE record.
Rename the function to highlight what it does.
No functional change is intended.
Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
Link:
https://lkml.kernel.org/r/20240521125434.1555845-2-yazen.ghannam@amd.com
v1->v2:
* No change.
arch/x86/include/asm/mce.h | 2 +-
arch/x86/kernel/cpu/mce/amd.c | 2 +-
arch/x86/kernel/cpu/mce/apei.c | 4 ++--
arch/x86/kernel/cpu/mce/core.c | 6 +++---
4 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 3ad29b128943..3b9970117a0f 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -221,7 +221,7 @@ static inline int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info,
u64 lapic_id) { return -EINVAL; }
#endif
-void mce_setup(struct mce *m);
+void mce_prep_record(struct mce *m);
void mce_log(struct mce *m);
DECLARE_PER_CPU(struct device *, mce_device);
diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 9a0133ef7e20..14bf8c232e45 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -780,7 +780,7 @@ static void __log_error(unsigned int bank, u64 status, u64 addr, u64 misc)
{
struct mce m;
- mce_setup(&m);
+ mce_prep_record(&m);
m.status = status;
m.misc = misc;
diff --git a/arch/x86/kernel/cpu/mce/apei.c b/arch/x86/kernel/cpu/mce/apei.c
index 59f1c9ea47f5..8e8ad63da5b6 100644
--- a/arch/x86/kernel/cpu/mce/apei.c
+++ b/arch/x86/kernel/cpu/mce/apei.c
@@ -44,7 +44,7 @@ void apei_mce_report_mem_error(int severity, struct cper_sec_mem_err *mem_err)
else
lsb = PAGE_SHIFT;
- mce_setup(&m);
+ mce_prep_record(&m);
m.bank = -1;
/* Fake a memory read error with unknown channel */
m.status = MCI_STATUS_VAL | MCI_STATUS_EN | MCI_STATUS_ADDRV | MCI_STATUS_MISCV | 0x9f;
@@ -101,7 +101,7 @@ int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id)
if (ctx_info->reg_arr_size < 48)
return -EINVAL;
- mce_setup(&m);
+ mce_prep_record(&m);
m.extcpu = cpu;
m.socketid = cpu_data(cpu).topo.pkg_id;
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index b85ec7a4ec9e..dd5192ef52e0 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -118,7 +118,7 @@ static struct irq_work mce_irq_work;
BLOCKING_NOTIFIER_HEAD(x86_mce_decoder_chain);
/* Do initial initialization of a struct mce */
-void mce_setup(struct mce *m)
+void mce_prep_record(struct mce *m)
{
memset(m, 0, sizeof(struct mce));
m->cpu = m->extcpu = smp_processor_id();
@@ -436,11 +436,11 @@ static noinstr void mce_wrmsrl(u32 msr, u64 v)
static noinstr void mce_gather_info(struct mce *m, struct pt_regs *regs)
{
/*
- * Enable instrumentation around mce_setup() which calls external
+ * Enable instrumentation around mce_prep_record() which calls external
* facilities.
*/
instrumentation_begin();
- mce_setup(m);
+ mce_prep_record(m);
instrumentation_end();
m->mcgstatus = mce_rdmsrl(MSR_IA32_MCG_STATUS);
--
2.34.1
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v2 4/5] x86/mce: Define mce_prep_record() helpers for common and per-CPU fields
2024-06-24 21:20 [PATCH v2 0/5] Rework mce_setup() Yazen Ghannam
` (2 preceding siblings ...)
2024-06-24 21:20 ` [PATCH v2 3/5] x86/mce: Rename mce_setup() to mce_prep_record() Yazen Ghannam
@ 2024-06-24 21:20 ` Yazen Ghannam
2024-06-25 13:19 ` Nikolay Borisov
2024-06-24 21:20 ` [PATCH v2 5/5] x86/mce: Use mce_prep_record() helpers for apei_smca_report_x86_error() Yazen Ghannam
4 siblings, 1 reply; 18+ messages in thread
From: Yazen Ghannam @ 2024-06-24 21:20 UTC (permalink / raw)
To: linux-edac
Cc: linux-kernel, tony.luck, x86, avadhut.naik, john.allen,
Yazen Ghannam
Generally, MCA information for an error is gathered on the CPU that
reported the error. In this case, CPU-specific information from the
running CPU will be correct.
However, this will be incorrect if the MCA information is gathered while
running on a CPU that didn't report the error. One example is creating
an MCA record using mce_prep_record() for errors reported from ACPI.
Split mce_prep_record() so that there is a helper function to gather
common, i.e. not CPU-specific, information and another helper for
CPU-specific information.
Leave mce_prep_record() defined as-is for the common case when running
on the reporting CPU.
Get MCG_CAP in the global helper even though the register is per-CPU.
This value is not already cached per-CPU like other values. And it does
not assist with any per-CPU decoding or handling.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
Link:
https://lkml.kernel.org/r/20240521125434.1555845-3-yazen.ghannam@amd.com
v1->v2:
* No change.
arch/x86/kernel/cpu/mce/core.c | 34 ++++++++++++++++++++----------
arch/x86/kernel/cpu/mce/internal.h | 2 ++
2 files changed, 25 insertions(+), 11 deletions(-)
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index dd5192ef52e0..0133f88dfffb 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -117,20 +117,32 @@ static struct irq_work mce_irq_work;
*/
BLOCKING_NOTIFIER_HEAD(x86_mce_decoder_chain);
-/* Do initial initialization of a struct mce */
-void mce_prep_record(struct mce *m)
+void mce_prep_record_common(struct mce *m)
{
memset(m, 0, sizeof(struct mce));
- m->cpu = m->extcpu = smp_processor_id();
+
+ m->cpuid = cpuid_eax(1);
+ m->cpuvendor = boot_cpu_data.x86_vendor;
+ m->mcgcap = __rdmsr(MSR_IA32_MCG_CAP);
/* need the internal __ version to avoid deadlocks */
- m->time = __ktime_get_real_seconds();
- m->cpuvendor = boot_cpu_data.x86_vendor;
- m->cpuid = cpuid_eax(1);
- m->socketid = cpu_data(m->extcpu).topo.pkg_id;
- m->apicid = cpu_data(m->extcpu).topo.initial_apicid;
- m->mcgcap = __rdmsr(MSR_IA32_MCG_CAP);
- m->ppin = cpu_data(m->extcpu).ppin;
- m->microcode = boot_cpu_data.microcode;
+ m->time = __ktime_get_real_seconds();
+}
+
+void mce_prep_record_per_cpu(unsigned int cpu, struct mce *m)
+{
+ m->cpu = cpu;
+ m->extcpu = cpu;
+ m->apicid = cpu_data(m->extcpu).topo.initial_apicid;
+ m->microcode = cpu_data(m->extcpu).microcode;
+ m->ppin = cpu_data(m->extcpu).ppin;
+ m->socketid = cpu_data(m->extcpu).topo.pkg_id;
+}
+
+/* Do initial initialization of a struct mce */
+void mce_prep_record(struct mce *m)
+{
+ mce_prep_record_common(m);
+ mce_prep_record_per_cpu(smp_processor_id(), m);
}
DEFINE_PER_CPU(struct mce, injectm);
diff --git a/arch/x86/kernel/cpu/mce/internal.h b/arch/x86/kernel/cpu/mce/internal.h
index 01f8f03969e6..43c7f3b71df5 100644
--- a/arch/x86/kernel/cpu/mce/internal.h
+++ b/arch/x86/kernel/cpu/mce/internal.h
@@ -261,6 +261,8 @@ enum mca_msr {
/* Decide whether to add MCE record to MCE event pool or filter it out. */
extern bool filter_mce(struct mce *m);
+void mce_prep_record_common(struct mce *m);
+void mce_prep_record_per_cpu(unsigned int cpu, struct mce *m);
#ifdef CONFIG_X86_MCE_AMD
extern bool amd_filter_mce(struct mce *m);
--
2.34.1
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v2 5/5] x86/mce: Use mce_prep_record() helpers for apei_smca_report_x86_error()
2024-06-24 21:20 [PATCH v2 0/5] Rework mce_setup() Yazen Ghannam
` (3 preceding siblings ...)
2024-06-24 21:20 ` [PATCH v2 4/5] x86/mce: Define mce_prep_record() helpers for common and per-CPU fields Yazen Ghannam
@ 2024-06-24 21:20 ` Yazen Ghannam
4 siblings, 0 replies; 18+ messages in thread
From: Yazen Ghannam @ 2024-06-24 21:20 UTC (permalink / raw)
To: linux-edac
Cc: linux-kernel, tony.luck, x86, avadhut.naik, john.allen,
Yazen Ghannam
Current AMD systems can report MCA errors using the ACPI Boot Error
Record Table (BERT). The BERT entries for MCA errors will be an x86
Common Platform Error Record (CPER) with an MSR register context that
matches the MCAX/SMCA register space.
However, the BERT will not necessarily be processed on the CPU that
reported the MCA errors. Therefore, the correct CPU number needs to be
determined and the information saved in struct mce.
Use the newly defined mce_prep_record_*() helpers to get the correct
data.
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
Link:
https://lkml.kernel.org/r/20240521125434.1555845-4-yazen.ghannam@amd.com
v1->v2:
* Rebased on new changes from patches 1 and 2.
arch/x86/kernel/cpu/mce/apei.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kernel/cpu/mce/apei.c b/arch/x86/kernel/cpu/mce/apei.c
index 8e8ad63da5b6..7e60db2addec 100644
--- a/arch/x86/kernel/cpu/mce/apei.c
+++ b/arch/x86/kernel/cpu/mce/apei.c
@@ -101,12 +101,9 @@ int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id)
if (ctx_info->reg_arr_size < 48)
return -EINVAL;
- mce_prep_record(&m);
-
- m.extcpu = cpu;
- m.socketid = cpu_data(cpu).topo.pkg_id;
+ mce_prep_record_common(&m);
+ mce_prep_record_per_cpu(cpu, &m);
- m.apicid = lapic_id;
m.bank = (ctx_info->msr_addr >> 4) & 0xFF;
m.status = *i_mce;
m.addr = *(i_mce + 1);
--
2.34.1
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH v2 1/5] x86/topology: Export helper to get CPU number from APIC ID
2024-06-24 21:20 ` [PATCH v2 1/5] x86/topology: Export helper to get CPU number from APIC ID Yazen Ghannam
@ 2024-06-25 6:50 ` Thomas Gleixner
2024-06-26 1:42 ` Yazen Ghannam
0 siblings, 1 reply; 18+ messages in thread
From: Thomas Gleixner @ 2024-06-25 6:50 UTC (permalink / raw)
To: Yazen Ghannam, linux-edac
Cc: linux-kernel, tony.luck, x86, avadhut.naik, john.allen,
Yazen Ghannam
On Mon, Jun 24 2024 at 16:20, Yazen Ghannam wrote:
> The need to look up a CPU number from an APIC ID is done in at least one
> other place outside of APIC/topology code:
> apei_smca_report_x86_error().
The need .... is done?
> #ifdef CONFIG_X86_LOCAL_APIC
> int topology_get_logical_id(u32 apicid, enum x86_topology_domains at_level);
> +int topology_get_cpunr(u32 apic_id);
> #else
> static inline int topology_get_logical_id(u32 apicid, enum x86_topology_domains at_level)
> {
> return 0;
> }
> +
> +static inline int topology_get_cpunr(u32 apic_id)
> +{
> + return -ENODEV;
Why ENODEV and not 0?
> +}
> #endif
Thanks,
tglx
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 2/5] x86/mce: Fixup APIC ID search for x86 CPER decoding
2024-06-24 21:20 ` [PATCH v2 2/5] x86/mce: Fixup APIC ID search for x86 CPER decoding Yazen Ghannam
@ 2024-06-25 6:50 ` Thomas Gleixner
2024-06-26 1:44 ` Yazen Ghannam
0 siblings, 1 reply; 18+ messages in thread
From: Thomas Gleixner @ 2024-06-25 6:50 UTC (permalink / raw)
To: Yazen Ghannam, linux-edac
Cc: linux-kernel, tony.luck, x86, avadhut.naik, john.allen,
Yazen Ghannam
On Mon, Jun 24 2024 at 16:20, Yazen Ghannam wrote:
> /*
> * The starting address of the register array extracted from BERT must
> * match with the first expected register in the register layout of
> @@ -99,16 +103,8 @@ int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id)
>
> mce_setup(&m);
>
> - m.extcpu = -1;
> - m.socketid = -1;
> -
> - for_each_possible_cpu(cpu) {
> - if (cpu_data(cpu).topo.initial_apicid == lapic_id) {
> - m.extcpu = cpu;
> - m.socketid = cpu_data(m.extcpu).topo.pkg_id;
> - break;
> - }
> - }
> + m.extcpu = cpu;
> + m.socketid = cpu_data(cpu).topo.pkg_id;
topology_physical_package_id() ?
Thanks,
tglx
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 4/5] x86/mce: Define mce_prep_record() helpers for common and per-CPU fields
2024-06-24 21:20 ` [PATCH v2 4/5] x86/mce: Define mce_prep_record() helpers for common and per-CPU fields Yazen Ghannam
@ 2024-06-25 13:19 ` Nikolay Borisov
2024-06-26 1:45 ` Yazen Ghannam
0 siblings, 1 reply; 18+ messages in thread
From: Nikolay Borisov @ 2024-06-25 13:19 UTC (permalink / raw)
To: Yazen Ghannam, linux-edac
Cc: linux-kernel, tony.luck, x86, avadhut.naik, john.allen
On 25.06.24 г. 0:20 ч., Yazen Ghannam wrote:
> Generally, MCA information for an error is gathered on the CPU that
> reported the error. In this case, CPU-specific information from the
> running CPU will be correct.
>
> However, this will be incorrect if the MCA information is gathered while
> running on a CPU that didn't report the error. One example is creating
> an MCA record using mce_prep_record() for errors reported from ACPI.
>
> Split mce_prep_record() so that there is a helper function to gather
> common, i.e. not CPU-specific, information and another helper for
> CPU-specific information.
>
> Leave mce_prep_record() defined as-is for the common case when running
> on the reporting CPU.
>
> Get MCG_CAP in the global helper even though the register is per-CPU.
> This value is not already cached per-CPU like other values. And it does
> not assist with any per-CPU decoding or handling.
>
> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
> ---
> Link:
> https://lkml.kernel.org/r/20240521125434.1555845-3-yazen.ghannam@amd.com
>
> v1->v2:
> * No change.
>
> arch/x86/kernel/cpu/mce/core.c | 34 ++++++++++++++++++++----------
> arch/x86/kernel/cpu/mce/internal.h | 2 ++
> 2 files changed, 25 insertions(+), 11 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> index dd5192ef52e0..0133f88dfffb 100644
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -117,20 +117,32 @@ static struct irq_work mce_irq_work;
> */
> BLOCKING_NOTIFIER_HEAD(x86_mce_decoder_chain);
>
> -/* Do initial initialization of a struct mce */
> -void mce_prep_record(struct mce *m)
> +void mce_prep_record_common(struct mce *m)
> {
> memset(m, 0, sizeof(struct mce));
> - m->cpu = m->extcpu = smp_processor_id();
> +
> + m->cpuid = cpuid_eax(1);
> + m->cpuvendor = boot_cpu_data.x86_vendor;
> + m->mcgcap = __rdmsr(MSR_IA32_MCG_CAP);
> /* need the internal __ version to avoid deadlocks */
> - m->time = __ktime_get_real_seconds();
> - m->cpuvendor = boot_cpu_data.x86_vendor;
> - m->cpuid = cpuid_eax(1);
> - m->socketid = cpu_data(m->extcpu).topo.pkg_id;
> - m->apicid = cpu_data(m->extcpu).topo.initial_apicid;
> - m->mcgcap = __rdmsr(MSR_IA32_MCG_CAP);
> - m->ppin = cpu_data(m->extcpu).ppin;
> - m->microcode = boot_cpu_data.microcode;
> + m->time = __ktime_get_real_seconds();
> +}
> +
> +void mce_prep_record_per_cpu(unsigned int cpu, struct mce *m)
> +{
> + m->cpu = cpu;
> + m->extcpu = cpu;
> + m->apicid = cpu_data(m->extcpu).topo.initial_apicid;
> + m->microcode = cpu_data(m->extcpu).microcode;
> + m->ppin = cpu_data(m->extcpu).ppin;
nit: Similar to tglx's feedback for patch 2 you could use topology_ppin()
> + m->socketid = cpu_data(m->extcpu).topo.pkg_id;
nit: topology_physical_package_id()
> +}
> +
<snip>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 1/5] x86/topology: Export helper to get CPU number from APIC ID
2024-06-25 6:50 ` Thomas Gleixner
@ 2024-06-26 1:42 ` Yazen Ghannam
2024-06-28 8:37 ` Borislav Petkov
0 siblings, 1 reply; 18+ messages in thread
From: Yazen Ghannam @ 2024-06-26 1:42 UTC (permalink / raw)
To: Thomas Gleixner
Cc: linux-edac, linux-kernel, tony.luck, x86, avadhut.naik,
john.allen
On Tue, Jun 25, 2024 at 08:50:13AM +0200, Thomas Gleixner wrote:
> On Mon, Jun 24 2024 at 16:20, Yazen Ghannam wrote:
>
> > The need to look up a CPU number from an APIC ID is done in at least one
> > other place outside of APIC/topology code:
> > apei_smca_report_x86_error().
>
> The need .... is done?
Hmm, yeah the wording isn't clear. I'll rephrase it.
>
> > #ifdef CONFIG_X86_LOCAL_APIC
> > int topology_get_logical_id(u32 apicid, enum x86_topology_domains at_level);
> > +int topology_get_cpunr(u32 apic_id);
> > #else
> > static inline int topology_get_logical_id(u32 apicid, enum x86_topology_domains at_level)
> > {
> > return 0;
> > }
> > +
> > +static inline int topology_get_cpunr(u32 apic_id)
> > +{
> > + return -ENODEV;
>
> Why ENODEV and not 0?
>
Since '0' is a valid CPU number, my first thought was that we should
return an error code if we can't explicitly find the correct CPU number.
But would/could we have SMP support without X86_LOCAL_APIC? If not, then
we'd only have CPU 0 in any case. Is this why the function above returns
0?
I can make the change to return 0 here also. But...
The MCE code depends on X86_LOCAL_APIC, and it is the only external
user. So should I drop the config check and export?
Thanks,
Yazen
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 2/5] x86/mce: Fixup APIC ID search for x86 CPER decoding
2024-06-25 6:50 ` Thomas Gleixner
@ 2024-06-26 1:44 ` Yazen Ghannam
0 siblings, 0 replies; 18+ messages in thread
From: Yazen Ghannam @ 2024-06-26 1:44 UTC (permalink / raw)
To: Thomas Gleixner
Cc: linux-edac, linux-kernel, tony.luck, x86, avadhut.naik,
john.allen
On Tue, Jun 25, 2024 at 08:50:47AM +0200, Thomas Gleixner wrote:
> On Mon, Jun 24 2024 at 16:20, Yazen Ghannam wrote:
> > /*
> > * The starting address of the register array extracted from BERT must
> > * match with the first expected register in the register layout of
> > @@ -99,16 +103,8 @@ int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id)
> >
> > mce_setup(&m);
> >
> > - m.extcpu = -1;
> > - m.socketid = -1;
> > -
> > - for_each_possible_cpu(cpu) {
> > - if (cpu_data(cpu).topo.initial_apicid == lapic_id) {
> > - m.extcpu = cpu;
> > - m.socketid = cpu_data(m.extcpu).topo.pkg_id;
> > - break;
> > - }
> > - }
> > + m.extcpu = cpu;
> > + m.socketid = cpu_data(cpu).topo.pkg_id;
>
> topology_physical_package_id() ?
>
Yes, will change.
Thanks,
Yazen
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 4/5] x86/mce: Define mce_prep_record() helpers for common and per-CPU fields
2024-06-25 13:19 ` Nikolay Borisov
@ 2024-06-26 1:45 ` Yazen Ghannam
0 siblings, 0 replies; 18+ messages in thread
From: Yazen Ghannam @ 2024-06-26 1:45 UTC (permalink / raw)
To: Nikolay Borisov
Cc: linux-edac, linux-kernel, tony.luck, x86, avadhut.naik,
john.allen
On Tue, Jun 25, 2024 at 04:19:40PM +0300, Nikolay Borisov wrote:
[...]
> > +
> > +void mce_prep_record_per_cpu(unsigned int cpu, struct mce *m)
> > +{
> > + m->cpu = cpu;
> > + m->extcpu = cpu;
> > + m->apicid = cpu_data(m->extcpu).topo.initial_apicid;
> > + m->microcode = cpu_data(m->extcpu).microcode;
> > + m->ppin = cpu_data(m->extcpu).ppin;
>
> nit: Similar to tglx's feedback for patch 2 you could use topology_ppin()
>
> > + m->socketid = cpu_data(m->extcpu).topo.pkg_id;
> nit: topology_physical_package_id()
>
>
Yes, will update both.
Thanks,
Yazen
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 1/5] x86/topology: Export helper to get CPU number from APIC ID
2024-06-26 1:42 ` Yazen Ghannam
@ 2024-06-28 8:37 ` Borislav Petkov
2024-06-28 14:15 ` Yazen Ghannam
0 siblings, 1 reply; 18+ messages in thread
From: Borislav Petkov @ 2024-06-28 8:37 UTC (permalink / raw)
To: Yazen Ghannam
Cc: Thomas Gleixner, linux-edac, linux-kernel, tony.luck, x86,
avadhut.naik, john.allen
On Tue, Jun 25, 2024 at 09:42:12PM -0400, Yazen Ghannam wrote:
> But would/could we have SMP support without X86_LOCAL_APIC?
Look at how X86_LOCAL_APIC is defined in Kconfig.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 1/5] x86/topology: Export helper to get CPU number from APIC ID
2024-06-28 8:37 ` Borislav Petkov
@ 2024-06-28 14:15 ` Yazen Ghannam
2024-06-28 14:45 ` Borislav Petkov
0 siblings, 1 reply; 18+ messages in thread
From: Yazen Ghannam @ 2024-06-28 14:15 UTC (permalink / raw)
To: Borislav Petkov
Cc: Thomas Gleixner, linux-edac, linux-kernel, tony.luck, x86,
avadhut.naik, john.allen
On Fri, Jun 28, 2024 at 10:37:22AM +0200, Borislav Petkov wrote:
> On Tue, Jun 25, 2024 at 09:42:12PM -0400, Yazen Ghannam wrote:
> > But would/could we have SMP support without X86_LOCAL_APIC?
>
> Look at how X86_LOCAL_APIC is defined in Kconfig.
>
Thanks for the tip. It looks to me that SMP and X86_LOCAL_APIC are
generally independent.
Thanks,
Yazen
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 1/5] x86/topology: Export helper to get CPU number from APIC ID
2024-06-28 14:15 ` Yazen Ghannam
@ 2024-06-28 14:45 ` Borislav Petkov
2024-07-01 17:51 ` Yazen Ghannam
0 siblings, 1 reply; 18+ messages in thread
From: Borislav Petkov @ 2024-06-28 14:45 UTC (permalink / raw)
To: Yazen Ghannam
Cc: Thomas Gleixner, linux-edac, linux-kernel, tony.luck, x86,
avadhut.naik, john.allen
On Fri, Jun 28, 2024 at 10:15:42AM -0400, Yazen Ghannam wrote:
> Thanks for the tip. It looks to me that SMP and X86_LOCAL_APIC are
> generally independent.
They are?
config X86_LOCAL_APIC
def_bool y
depends on X86_64 || SMP
^^^^^^^ ^^^
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 1/5] x86/topology: Export helper to get CPU number from APIC ID
2024-06-28 14:45 ` Borislav Petkov
@ 2024-07-01 17:51 ` Yazen Ghannam
2024-07-01 19:07 ` Borislav Petkov
0 siblings, 1 reply; 18+ messages in thread
From: Yazen Ghannam @ 2024-07-01 17:51 UTC (permalink / raw)
To: Borislav Petkov
Cc: Thomas Gleixner, linux-edac, linux-kernel, tony.luck, x86,
avadhut.naik, john.allen
On Fri, Jun 28, 2024 at 04:45:35PM +0200, Borislav Petkov wrote:
> On Fri, Jun 28, 2024 at 10:15:42AM -0400, Yazen Ghannam wrote:
> > Thanks for the tip. It looks to me that SMP and X86_LOCAL_APIC are
> > generally independent.
>
> They are?
>
> config X86_LOCAL_APIC
> def_bool y
> depends on X86_64 || SMP
> ^^^^^^^ ^^^
>
Right, SMP does not depend on X86_LOCAL_APIC. Otherwise, there would be
a circular dependency here.
X86_LOCAL_APIC depends on the logical OR of a bunch of options. So it
depends on "any one" of the options to be enabled. But it doesn't need
all of them.
config UP_LATE_INIT
def_bool y
depends on !SMP && X86_LOCAL_APIC
^^^^^^^^^^^^
This shows that X86_LOCAL_APIC doesn't have a hard dependency on SMP.
Otherwise, this option would never work.
Thanks,
Yazen
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 1/5] x86/topology: Export helper to get CPU number from APIC ID
2024-07-01 17:51 ` Yazen Ghannam
@ 2024-07-01 19:07 ` Borislav Petkov
2024-07-12 14:28 ` Yazen Ghannam
0 siblings, 1 reply; 18+ messages in thread
From: Borislav Petkov @ 2024-07-01 19:07 UTC (permalink / raw)
To: Yazen Ghannam
Cc: Thomas Gleixner, linux-edac, linux-kernel, tony.luck, x86,
avadhut.naik, john.allen
On Mon, Jul 01, 2024 at 01:51:42PM -0400, Yazen Ghannam wrote:
> X86_LOCAL_APIC depends on the logical OR of a bunch of options. So it
> depends on "any one" of the options to be enabled. But it doesn't need
> all of them.
Yes, I can see that.
How does any of that info answer your initial question?
IOW, if you don't have LAPIC support in the kernel, what CPU number should we
return for any APIC ID serving as input, and why?
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 1/5] x86/topology: Export helper to get CPU number from APIC ID
2024-07-01 19:07 ` Borislav Petkov
@ 2024-07-12 14:28 ` Yazen Ghannam
0 siblings, 0 replies; 18+ messages in thread
From: Yazen Ghannam @ 2024-07-12 14:28 UTC (permalink / raw)
To: Borislav Petkov
Cc: Thomas Gleixner, linux-edac, linux-kernel, tony.luck, x86,
avadhut.naik, john.allen
On Mon, Jul 01, 2024 at 09:07:04PM +0200, Borislav Petkov wrote:
> On Mon, Jul 01, 2024 at 01:51:42PM -0400, Yazen Ghannam wrote:
> > X86_LOCAL_APIC depends on the logical OR of a bunch of options. So it
> > depends on "any one" of the options to be enabled. But it doesn't need
> > all of them.
>
> Yes, I can see that.
>
> How does any of that info answer your initial question?
>
> IOW, if you don't have LAPIC support in the kernel, what CPU number should we
> return for any APIC ID serving as input, and why?
>
I still think it should return an error code, because theoretically
LAPIC can be disabled and SMP can be enabled.
But I spent some time trying to see if this would work in practice, and
it looks like you can't disable X86_LOCAL_APIC without hitting a bunch
of build errors on x86_64. It seems like a lot of common APIC and SMP
code implicitly depends on X86_LOCAL_APIC. This was true even for a tiny
config. However, this worked for an i386 build (with SMP=n).
I think the most practical option is to keep the local search routine in
mce/apei. I don't think all the additional complexity is worth it for a
simple for-loop.
Regarding the X86_LOCAL_APIC=n build issues, should these be
investigated?
Thanks,
Yazen
^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2024-07-12 14:28 UTC | newest]
Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-24 21:20 [PATCH v2 0/5] Rework mce_setup() Yazen Ghannam
2024-06-24 21:20 ` [PATCH v2 1/5] x86/topology: Export helper to get CPU number from APIC ID Yazen Ghannam
2024-06-25 6:50 ` Thomas Gleixner
2024-06-26 1:42 ` Yazen Ghannam
2024-06-28 8:37 ` Borislav Petkov
2024-06-28 14:15 ` Yazen Ghannam
2024-06-28 14:45 ` Borislav Petkov
2024-07-01 17:51 ` Yazen Ghannam
2024-07-01 19:07 ` Borislav Petkov
2024-07-12 14:28 ` Yazen Ghannam
2024-06-24 21:20 ` [PATCH v2 2/5] x86/mce: Fixup APIC ID search for x86 CPER decoding Yazen Ghannam
2024-06-25 6:50 ` Thomas Gleixner
2024-06-26 1:44 ` Yazen Ghannam
2024-06-24 21:20 ` [PATCH v2 3/5] x86/mce: Rename mce_setup() to mce_prep_record() Yazen Ghannam
2024-06-24 21:20 ` [PATCH v2 4/5] x86/mce: Define mce_prep_record() helpers for common and per-CPU fields Yazen Ghannam
2024-06-25 13:19 ` Nikolay Borisov
2024-06-26 1:45 ` Yazen Ghannam
2024-06-24 21:20 ` [PATCH v2 5/5] x86/mce: Use mce_prep_record() helpers for apei_smca_report_x86_error() Yazen Ghannam
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox