From: Borislav Petkov <bp@suse.de>
To: James Feeney <james@nurealm.net>
Cc: linux-smp@vger.kernel.org, Jens Axboe <axboe@kernel.dk>,
lkml <linux-kernel@vger.kernel.org>
Subject: Re: linux 5.12 - fails to boot - soft lockup - CPU#0 stuck for 23s! - RIP smp_call_function_single
Date: Sun, 23 May 2021 19:05:14 +0200 [thread overview]
Message-ID: <YKqLSqIM7Gi5x+IA@zn.tnic> (raw)
In-Reply-To: <373464e3-b8a0-0fe0-b890-41df0eecf090@nurealm.net>
On Sat, May 22, 2021 at 05:28:42PM -0600, James Feeney wrote:
> Out of curiosity, I added "pr_info("%s : mcheck_intel_therm_init()
> use to run here", __func__);" to __init mcheck_init() in
> arch/x86/kernel/cpu/mce/core.c, just to see *when* it *would have*
> run, compared to when it is running now.
Yes, this is probably the only timing-sensitive difference this patch of
mine introduces. So who knows what's in that LVT APIC register earlier
and *maybe* reading it later might give us different values.
So do the same game again pls, but this time apply the patch below. It
is practically a revert of my patch and with it, your box should *not*
freeze anymore *if* my patch really is the culprit.
Thx.
---
diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index ddfb3cad8dff..5ac8b827bc12 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -296,6 +296,12 @@ struct cper_sec_mem_err;
extern void apei_mce_report_mem_error(int corrected,
struct cper_sec_mem_err *mem_err);
+#ifdef CONFIG_X86_THERMAL_VECTOR
+extern void mcheck_intel_therm_init(void);
+#else
+static inline void mcheck_intel_therm_init(void) { }
+#endif
+
/*
* Enumerate new IP types and HWID values in AMD processors which support
* Scalable MCA.
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index bf7fe87a7e88..ded20b8612fe 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -2190,6 +2190,7 @@ __setup("mce", mcheck_enable);
int __init mcheck_init(void)
{
+ mcheck_intel_therm_init();
mce_register_decode_chain(&early_nb);
mce_register_decode_chain(&mce_uc_nb);
mce_register_decode_chain(&mce_default_nb);
diff --git a/drivers/thermal/intel/therm_throt.c b/drivers/thermal/intel/therm_throt.c
index f8e882592ba5..0ebd2386839f 100644
--- a/drivers/thermal/intel/therm_throt.c
+++ b/drivers/thermal/intel/therm_throt.c
@@ -621,19 +621,30 @@ bool x86_thermal_enabled(void)
return atomic_read(&therm_throt_en);
}
+void __init mcheck_intel_therm_init(void)
+{
+ /*
+ * This function is only called on boot CPU. Save the init thermal
+ * LVT value on BSP and use that value to restore APs' thermal LVT
+ * entry BIOS programmed later
+ */
+ if (intel_thermal_supported(&boot_cpu_data)) {
+ lvtthmr_init = apic_read(APIC_LVTTHMR);
+ pr_info("%s: lvtthmr_init: 0x%x\n", __func__, lvtthmr_init);
+ } else {
+ pr_info("%s: !intel_thermal_supported\n", __func__);
+ }
+}
+
void intel_init_thermal(struct cpuinfo_x86 *c)
{
unsigned int cpu = smp_processor_id();
int tm2 = 0;
- u32 l, h;
+ u32 l, h, tmp = -1;
if (!intel_thermal_supported(c))
return;
- /* On the BSP? */
- if (c == &boot_cpu_data)
- lvtthmr_init = apic_read(APIC_LVTTHMR);
-
/*
* First check if its enabled already, in which case there might
* be some SMM goo which handles it, so we can't even put a handler
@@ -652,13 +663,17 @@ void intel_init_thermal(struct cpuinfo_x86 *c)
* BIOS has programmed on AP based on BSP's info we saved since BIOS
* is always setting the same value for all threads/cores.
*/
- if ((h & APIC_DM_FIXED_MASK) != APIC_DM_FIXED)
+ if ((h & APIC_DM_FIXED_MASK) != APIC_DM_FIXED) {
apic_write(APIC_LVTTHMR, lvtthmr_init);
+ tmp = apic_read(APIC_LVTTHMR);
+ }
+ pr_info("%s: CPU%d, lvtthmr_init: 0x%x, read: 0x%x, misc_enable (low): 0x%x\n",
+ __func__, cpu, lvtthmr_init, tmp, l);
if ((l & MSR_IA32_MISC_ENABLE_TM1) && (h & APIC_DM_SMI)) {
if (system_state == SYSTEM_BOOTING)
- pr_debug("CPU%d: Thermal monitoring handled by SMI\n", cpu);
+ pr_info("CPU%d: Thermal monitoring handled by SMI\n", cpu);
return;
}
--
Regards/Gruss,
Boris.
SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer, HRB 36809, AG Nürnberg
next prev parent reply other threads:[~2021-05-23 17:05 UTC|newest]
Thread overview: 33+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-05-17 8:13 linux 5.12 - fails to boot - soft lockup - CPU#0 stuck for 23s! - RIP smp_call_function_single James Feeney
2021-05-17 8:32 ` Borislav Petkov
2021-05-19 3:58 ` James Feeney
2021-05-19 11:12 ` Borislav Petkov
2021-05-19 20:03 ` James Feeney
2021-05-19 21:18 ` Borislav Petkov
2021-05-20 3:12 ` James Feeney
2021-05-20 9:21 ` Borislav Petkov
2021-05-21 22:11 ` James Feeney
2021-05-22 9:06 ` Borislav Petkov
2021-05-22 23:28 ` James Feeney
2021-05-22 23:28 ` James Feeney
2021-05-23 17:05 ` Borislav Petkov [this message]
2021-05-23 23:02 ` James Feeney
2021-05-24 7:51 ` Borislav Petkov
2021-05-25 4:02 ` James Feeney
2021-05-27 10:31 ` [PATCH] x86/thermal: Fix LVT thermal setup for SMI delivery mode Borislav Petkov
2021-05-27 11:49 ` Thomas Gleixner
2021-05-27 11:56 ` Borislav Petkov
2021-05-27 18:54 ` Borislav Petkov
2021-05-28 8:23 ` Thomas Gleixner
2021-05-28 11:19 ` Borislav Petkov
2021-05-31 18:26 ` James Feeney
2021-05-27 18:09 ` Srinivas Pandruvada
2021-05-27 19:01 ` Borislav Petkov
2021-05-27 20:28 ` Srinivas Pandruvada
2021-05-28 7:05 ` James Feeney
2021-05-31 21:46 ` [tip: x86/urgent] " tip-bot2 for Borislav Petkov
-- strict thread matches above, loose matches on Subject: below --
2021-05-14 18:39 linux 5.12 - fails to boot - soft lockup - CPU#0 stuck for 23s! - RIP smp_call_function_single James Feeney
2021-05-17 12:27 ` Christoph Hellwig
2021-05-19 15:50 ` James Feeney
2021-05-03 9:44 James Feeney
2021-04-30 17:04 James Feeney
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YKqLSqIM7Gi5x+IA@zn.tnic \
--to=bp@suse.de \
--cc=axboe@kernel.dk \
--cc=james@nurealm.net \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-smp@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.