From: Thomas Gleixner <tglx@linutronix.de>
To: Jan Kiszka <jan.kiszka@siemens.com>,
Henning Schild <henning.schild@siemens.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
x86@kernel.org, linux-kernel@vger.kernel.org,
Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
Guenter Roeck <linux@roeck-us.net>,
xenomai@xenomai.org, guocai.he.cn@windriver.com
Subject: Re: sched: Unexpected reschedule of offline CPU#2!
Date: Tue, 03 Sep 2024 17:27:58 +0200
Message-ID: <8734mg92pt.ffs@tglx>
In-Reply-To: <745f219e-1593-4fbd-fa7f-1719ef6f444d@siemens.com>
On Tue, Jul 27 2021 at 10:46, Jan Kiszka wrote:
Picking up this dead thread again.
> What is supposed to prevent the following in mainline:
>
> CPU 0                    CPU 1                    CPU 2
>
> native_stop_other_cpus                            <INTERRUPT>
> send_IPI_allbutself                               ...
>                          <INTERRUPT>
>                          sysvec_reboot
>                          stop_this_cpu
>                          set_cpu_online(false)
>                                                   native_smp_send_reschedule(1)
>                                                   if (cpu_is_offline(1)) ...
Nothing. And if I read the stack trace correctly, that's probably what
happens.
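The interleaving in the first diagram can be replayed step by step in a tiny single-threaded model. This is an illustration only, not kernel code: `cpu_online[]`, `model_send_reschedule()` and `replay_unsynchronized_stop()` are hypothetical stand-ins for the online mask, the offline check in native_smp_send_reschedule() and the three CPUs in the diagram. The printed message mirrors the warning in the subject line.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define NR_MODEL_CPUS 3                  /* CPU 0 (control), CPU 1, CPU 2 */

static bool cpu_online[NR_MODEL_CPUS];  /* hypothetical model of cpu_online_mask */

/* Hypothetical model of the offline check before a reschedule IPI is sent. */
static bool model_send_reschedule(int cpu)
{
	assert(cpu >= 0 && cpu < NR_MODEL_CPUS);
	if (!cpu_online[cpu]) {
		printf("sched: Unexpected reschedule of offline CPU#%d!\n", cpu);
		return false;
	}
	return true;                     /* the real code would send the IPI here */
}

/* Replays the diagram; returns false if CPU 2 kicked an offline CPU. */
static bool replay_unsynchronized_stop(void)
{
	for (int i = 0; i < NR_MODEL_CPUS; i++)
		cpu_online[i] = true;

	/*
	 * CPU 0 has sent the reboot IPI to all other CPUs. CPU 2 is still in
	 * its interrupt handler, so only CPU 1 acts on the IPI right away:
	 * sysvec_reboot() -> stop_this_cpu() -> set_cpu_online(false).
	 */
	cpu_online[1] = false;

	/* CPU 2, still inside its interrupt, now kicks CPU 1. */
	return model_send_reschedule(1);
}
```

Nothing orders CPU 1 going offline against CPU 2 finishing its interrupt, so the check fires whenever the steps happen in this order.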
But we can be slightly smarter about this for the reboot IPI (the NMI
case does not have that issue).
CPU 0                    CPU 1                    CPU 2

native_stop_other_cpus                            <INTERRUPT>
send_IPI_allbutself                               ...
                         <IPI>
                         sysvec_reboot
                         wait_for_others();
                                                  </INTERRUPT>
                                                  <IPI>
                                                  sysvec_reboot
                                                  wait_for_others();
                         stop_this_cpu();         stop_this_cpu();
                         set_cpu_online(false);   set_cpu_online(false);
Something like the uncompiled patch below.
Thanks,
tglx
---
--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -68,5 +68,6 @@ bool intel_find_matching_signature(void
int intel_microcode_sanity_check(void *mc, bool print_err, int hdr_type);
extern struct cpumask cpus_stop_mask;
+extern atomic_t cpus_stop_in_ipi;
#endif /* _ASM_X86_CPU_H */
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -721,7 +721,7 @@ bool xen_set_default_idle(void);
#define xen_set_default_idle 0
#endif
-void __noreturn stop_this_cpu(void *dummy);
+void __noreturn stop_this_cpu(bool sync);
void microcode_check(struct cpuinfo_x86 *prev_info);
void store_cpu_caps(struct cpuinfo_x86 *info);
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -791,9 +791,10 @@ bool xen_set_default_idle(void)
}
#endif
+atomic_t cpus_stop_in_ipi;
struct cpumask cpus_stop_mask;
-void __noreturn stop_this_cpu(void *dummy)
+void __noreturn stop_this_cpu(bool sync)
{
struct cpuinfo_x86 *c = this_cpu_ptr(&cpu_info);
unsigned int cpu = smp_processor_id();
@@ -801,6 +802,16 @@ void __noreturn stop_this_cpu(void *dumm
local_irq_disable();
/*
+ * Account this CPU and loop until the other CPUs reached this
+ * point. If they don't react, the control CPU will raise an NMI.
+ */
+ if (sync) {
+ atomic_dec(&cpus_stop_in_ipi);
+ while (atomic_read(&cpus_stop_in_ipi))
+ cpu_relax();
+ }
+
+ /*
* Remove this CPU from the online mask and disable it
* unconditionally. This might be redundant in case that the reboot
* vector was handled late and stop_other_cpus() sent an NMI.
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -788,7 +788,7 @@ static void native_machine_halt(void)
tboot_shutdown(TB_SHUTDOWN_HALT);
- stop_this_cpu(NULL);
+ stop_this_cpu(false);
}
static void native_machine_power_off(void)
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -125,7 +125,7 @@ static int smp_stop_nmi_callback(unsigne
return NMI_HANDLED;
cpu_emergency_disable_virtualization();
- stop_this_cpu(NULL);
+ stop_this_cpu(false);
return NMI_HANDLED;
}
@@ -137,7 +137,7 @@ DEFINE_IDTENTRY_SYSVEC(sysvec_reboot)
{
apic_eoi();
cpu_emergency_disable_virtualization();
- stop_this_cpu(NULL);
+ stop_this_cpu(true);
}
static int register_stop_handler(void)
@@ -189,6 +189,7 @@ static void native_stop_other_cpus(int w
*/
cpumask_copy(&cpus_stop_mask, cpu_online_mask);
cpumask_clear_cpu(this_cpu, &cpus_stop_mask);
+ atomic_set(&cpus_stop_in_ipi, num_online_cpus() - 1);
if (!cpumask_empty(&cpus_stop_mask)) {
apic_send_IPI_allbutself(REBOOT_VECTOR);
@@ -235,10 +236,12 @@ static void native_stop_other_cpus(int w
local_irq_restore(flags);
/*
- * Ensure that the cpus_stop_mask cache lines are invalidated on
- * the other CPUs. See comment vs. SME in stop_this_cpu().
+ * Ensure that the cpus_stop_mask and cpus_stop_in_ipi cache lines
+ * are invalidated on the other CPUs. See comment vs. SME in
+ * stop_this_cpu().
*/
cpumask_clear(&cpus_stop_mask);
+ atomic_set(&cpus_stop_in_ipi, 0);
}
/*
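The rendezvous from the second diagram, which the patch implements in the sync path of stop_this_cpu(), can be modeled in userspace with C11 atomics and POSIX threads. This is a sketch under stated assumptions, not kernel code. Each thread stands in for a CPU that took the reboot IPI. The names `online[]`, `stop_cpu()` and `run_stop()` are hypothetical, `sched_yield()` replaces cpu_relax(), and the NMI fallback is not modeled. The check before the decrement stands in for interrupt work a CPU may still be doing, such as sending a reschedule IPI. No thread clears its online bit until the counter reaches zero, so that check should never see an offline peer.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_STOPPING 3                    /* CPUs other than the control CPU (assumption) */

static atomic_int cpus_stop_in_ipi;      /* models the counter added by the patch */
static atomic_bool online[NR_STOPPING];  /* hypothetical model of cpu_online_mask bits */
static atomic_int violations;            /* offline peers observed too early */

/* One modeled CPU handling the reboot IPI, i.e. stop_this_cpu(true). */
static void *stop_cpu(void *arg)
{
	int cpu = (int)(long)arg;

	/*
	 * Until this CPU has accounted itself it may still be running
	 * interrupt code that kicks other CPUs: none may be offline yet.
	 */
	for (int i = 0; i < NR_STOPPING; i++)
		if (!atomic_load(&online[i]))
			atomic_fetch_add(&violations, 1);

	atomic_fetch_sub(&cpus_stop_in_ipi, 1);   /* account this CPU */
	while (atomic_load(&cpus_stop_in_ipi))    /* wait for the others */
		sched_yield();                    /* stands in for cpu_relax() */

	atomic_store(&online[cpu], false);        /* set_cpu_online(false) */
	return NULL;
}

/* Control CPU: arm the counter, "send" the IPIs, report violations seen. */
static int run_stop(void)
{
	pthread_t t[NR_STOPPING];

	atomic_store(&violations, 0);
	for (int i = 0; i < NR_STOPPING; i++)
		atomic_store(&online[i], true);
	/* Armed before the IPIs go out, like atomic_set() in the patch. */
	atomic_store(&cpus_stop_in_ipi, NR_STOPPING);

	for (long i = 0; i < NR_STOPPING; i++) {
		int rc = pthread_create(&t[i], NULL, stop_cpu, (void *)i);
		assert(rc == 0);
		(void)rc;
	}
	for (int i = 0; i < NR_STOPPING; i++)
		pthread_join(t[i], NULL);

	return atomic_load(&violations);
}
```

A driver can call run_stop() repeatedly and expect 0 every time. Dropping the spin loop would let an early thread go offline before a late one has run its check, which is the window the first diagram shows.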
Thread overview: 22+ messages
2019-07-27 16:44 sched: Unexpected reschedule of offline CPU#2! Guenter Roeck
2019-07-29 9:35 ` Peter Zijlstra
2019-07-29 9:58 ` Thomas Gleixner
2019-07-29 10:13 ` Peter Zijlstra
2019-07-29 10:38 ` Thomas Gleixner
2019-07-29 10:47 ` Peter Zijlstra
2019-07-29 20:50 ` Guenter Roeck
2019-08-16 10:22 ` Thomas Gleixner
2019-08-16 19:32 ` Guenter Roeck
2019-08-17 20:21 ` Thomas Gleixner
2021-07-27 8:00 ` Henning Schild
2021-07-27 8:46 ` Jan Kiszka
2024-09-03 6:15 ` guocai.he.cn
2024-09-03 15:27 ` Thomas Gleixner [this message]
2024-09-04 7:46 ` guocai he
2024-09-18 1:50 ` My branch is v5.2/standard/preempt-rt/intel-x86 and I make a patch according guocai.he.cn
2024-09-18 2:59 ` [PATCH] patch for poweroff guocai.he.cn
2025-07-09 13:44 ` sched: Unexpected reschedule of offline CPU#2! Phil Auld
2025-07-19 21:17 ` Thomas Gleixner
2025-07-20 10:47 ` Thomas Gleixner
2025-07-20 14:14 ` Guenter Roeck
2025-07-28 13:13 ` Phil Auld