public inbox for linux-kernel@vger.kernel.org
* [patch V2 0/8] x86/smp: Cure stop_other_cpus() and kexec() troubles
@ 2023-06-13 12:17 Thomas Gleixner
From: Thomas Gleixner @ 2023-06-13 12:17 UTC (permalink / raw)
  To: LKML
  Cc: x86, Mario Limonciello, Tom Lendacky, Tony Battersby, Ashok Raj,
	Tony Luck, Arjan van de Veen, Eric Biederman

This is the second version of the kexec() vs. mwait_play_dead()
series. Version 1 can be found here:

  https://lore.kernel.org/r/20230603193439.502645149@linutronix.de

Aside from picking up the correction of the original patch 5, this also
integrates a fix for intermittent reboot hangs reported by Tony:

  https://lore.kernel.org/r/3817d810-e0f1-8ef8-0bbd-663b919ca49b@cybernetics.com

which touches the same area. While the fix is halfway independent, I added
it here as these changes conflict nicely.

So the two issues are:

  1) stop_other_cpus() continues after observing num_online_cpus() == 1.
     
     This is problematic because the CPUs to be stopped clear their online
     bit first and then eventually invoke WBINVD, which can take a long
     time. There seems to be an interaction between the WBINVD and the
     reboot mechanics as this intermittently results in hangs.

  2) The kexec() kernel can overwrite the memory locations which "offline"
     CPUs are monitoring. This write brings them out of MWAIT and they
     resume execution on overwritten text, page tables, data and stacks,
     resulting in triple faults.

Cure them by:

  #1 Synchronizing stop_other_cpus() with an atomic variable which is
     decremented in stop_this_cpu() _after_ WBINVD completes.

  #2 Bringing offline CPUs out of MWAIT and moving them into HLT before
     starting the kexec() kernel. Optionally send them an INIT IPI so they
     go back into the wait-for-startup state.
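
To make the ordering in #1 concrete, here is a rough user-space model, not
the kernel code. It assumes pthreads stand in for CPUs, a yield loop stands
in for a slow WBINVD, and thread creation stands in for the stop IPI. All
names (cpus_stop_pending, stop_this_cpu_model(), stop_other_cpus_model())
are made up for illustration, and the timeout handling a real
implementation would need is left out.

```c
/* User-space model of cure #1. All names are illustrative, not the kernel's. */
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdlib.h>

#define NR_OTHER_CPUS 3

static atomic_int cpus_stop_pending;	/* hypothetical: CPUs not yet fully stopped */
static atomic_int num_online;		/* stands in for num_online_cpus() */
static atomic_int flushes_done;		/* counts completed "WBINVD"s */

/* Runs on each "CPU" that is being stopped. */
static void *stop_this_cpu_model(void *arg)
{
	(void)arg;
	atomic_fetch_sub(&num_online, 1);	/* online bit is cleared first */
	for (int i = 0; i < 1000; i++)		/* stand-in for a slow WBINVD */
		sched_yield();
	atomic_fetch_add(&flushes_done, 1);
	atomic_fetch_sub(&cpus_stop_pending, 1);	/* report only after the flush */
	return NULL;
}

/*
 * Runs on the "CPU" that initiates the shutdown. Returns the number of
 * flushes that had completed at the moment it decided to proceed.
 */
static int stop_other_cpus_model(void)
{
	pthread_t cpu[NR_OTHER_CPUS];
	int i, seen;

	atomic_store(&num_online, NR_OTHER_CPUS + 1);
	atomic_store(&flushes_done, 0);
	atomic_store(&cpus_stop_pending, NR_OTHER_CPUS);

	for (i = 0; i < NR_OTHER_CPUS; i++)	/* stand-in for the stop IPI */
		if (pthread_create(&cpu[i], NULL, stop_this_cpu_model, NULL))
			abort();

	/*
	 * Waiting for num_online == 1 instead could return while flushes
	 * are still in flight. The pending counter cannot reach zero
	 * before every flush has finished.
	 */
	while (atomic_load(&cpus_stop_pending) > 0)
		sched_yield();
	seen = atomic_load(&flushes_done);

	for (i = 0; i < NR_OTHER_CPUS; i++)
		pthread_join(cpu[i], NULL);

	assert(seen == NR_OTHER_CPUS);	/* the property the ordering provides */
	return seen;
}
```

Because each thread bumps flushes_done before it decrements the pending
counter, the initiator cannot observe zero pending CPUs with a flush still
outstanding. Polling num_online in the same place gives no such guarantee.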

The series is also available from git:

    git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git x86/kexec

Thanks,

	tglx
---
 include/asm/cpu.h |    2 
 include/asm/smp.h |    4 +
 kernel/process.c  |   16 +++++
 kernel/smp.c      |   79 ++++++++++++++++++----------
 kernel/smpboot.c  |  149 ++++++++++++++++++++++++++++++++++++++++--------------
 5 files changed, 183 insertions(+), 67 deletions(-)


Thread overview: 16+ messages
2023-06-13 12:17 [patch V2 0/8] x86/smp: Cure stop_other_cpus() and kexec() troubles Thomas Gleixner
2023-06-13 12:17 ` [patch V2 1/8] x86/smp: Make stop_other_cpus() more robust Thomas Gleixner
2023-06-14 19:42   ` Ashok Raj
2023-06-14 19:53     ` Thomas Gleixner
2023-06-14 20:47       ` Ashok Raj
2023-06-14 22:40         ` Thomas Gleixner
2023-06-13 12:17 ` [patch V2 2/8] x86/smp: Dont access non-existing CPUID leaf Thomas Gleixner
2023-06-13 12:17 ` [patch V2 3/8] x86/smp: Remove pointless wmb() from native_stop_other_cpus() Thomas Gleixner
2023-06-15  8:58   ` Peter Zijlstra
2023-06-15 10:57     ` Thomas Gleixner
2023-06-13 12:17 ` [patch V2 4/8] x86/smp: Acquire stopping_cpu unconditionally Thomas Gleixner
2023-06-15  9:02   ` Peter Zijlstra
2023-06-13 12:18 ` [patch V2 5/8] x86/smp: Use dedicated cache-line for mwait_play_dead() Thomas Gleixner
2023-06-13 12:18 ` [patch V2 6/8] x86/smp: Cure kexec() vs. mwait_play_dead() breakage Thomas Gleixner
2023-06-13 12:18 ` [patch V2 7/8] x86/smp: Split sending INIT IPI out into a helper function Thomas Gleixner
2023-06-13 12:18 ` [patch V2 8/8] x86/smp: Put CPUs into INIT on shutdown if possible Thomas Gleixner
