public inbox for linux-kernel@vger.kernel.org
* [RFC 0/3] kexec: Force kexec to proceed under heavy deadline load
@ 2025-10-22 12:13 Pingfan Liu
  2025-10-22 12:13 ` [RFC 1/3] sched/deadline: Skip the deadline bandwidth check if kexec_in_progress Pingfan Liu
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Pingfan Liu @ 2025-10-22 12:13 UTC (permalink / raw)
  To: kexec, linux-kernel
  Cc: Pingfan Liu, Waiman Long, Peter Zijlstra, Juri Lelli,
	Pierre Gondois, Andrew Morton, Baoquan He, Ingo Molnar,
	Vincent Guittot, Dietmar Eggemann, Steven Rostedt,
	Valentin Schneider, Rafael J. Wysocki, Joel Granados

During the discussion of a scheduler deadline bug [1], Pierre Gondois
pointed out a potential issue during kexec: as CPUs are unplugged, the
available DL bandwidth of the root domain gradually decreases. At some
point, the remaining bandwidth is insufficient and the overflow check
trips, causing CPU hot-removal to fail and kexec to hang [2].
    
I reproduced it on a system with 160 CPUs using the following commands:
  seq 10 | xargs -I{} -P10 sh -c 'chrt -d -T 1000000 -P 1000000 0 yes > /dev/null &'
  kexec -e

The system hung during the kexec process.
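
As a rough illustration of the failure mode, the sketch below models the
root-domain bandwidth check with plain shell arithmetic. It is a toy
model, not the kernel code: it assumes equal-capacity CPUs and the
default 95% limit (sched_rt_runtime_us / sched_rt_period_us =
950000 / 1000000), and it ignores anything else the real check accounts
for. The function names are made up for this example and are not kernel
interfaces.

```shell
#!/bin/sh
# Toy model of the root-domain deadline bandwidth check during CPU
# hot-removal. Assumptions (not taken from the series): all CPUs have
# equal capacity and the per-CPU limit is 950000/1000000.
# Bandwidth is expressed in parts per million (ppm) of one CPU.

LIMIT_PPM=950000   # assumed per-CPU admission limit

# dl_task_bw_ppm RUNTIME_NS PERIOD_NS
# Prints the utilization of one DL task in ppm.
dl_task_bw_ppm() (
    echo $(( $1 * 1000000 / $2 ))
)

# first_rejected_cpus NCPUS NTASKS RUNTIME_NS PERIOD_NS
# Removes CPUs one at a time and prints the online-CPU count that the
# first rejected removal would have left behind, or 0 if every removal
# down to a single CPU is accepted.
first_rejected_cpus() (
    total=$(( $2 * $(dl_task_bw_ppm "$3" "$4") ))
    cpus=$1
    while [ "$cpus" -gt 1 ]; do
        remaining=$(( cpus - 1 ))
        # Reject the removal if the remaining CPUs cannot hold the
        # bandwidth that has already been admitted.
        if [ $(( remaining * LIMIT_PPM )) -lt "$total" ]; then
            echo "$remaining"
            exit 0
        fi
        cpus=$remaining
    done
    echo 0
)

# Reproducer parameters: 160 CPUs, 10 tasks, runtime = period = 1 ms.
first_rejected_cpus 160 10 1000000 1000000
```

With the reproducer's numbers the model prints 10: under these
assumptions, ten CPUs offer 9.5 CPUs' worth of bandwidth against the 10
already admitted, so the removal that would leave ten CPUs online is the
first one refused.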
 
This series skips the DL bandwidth check while kexec is in progress,
migrates tasks from a dying CPU directly to the kexec CPU, and promotes
the kexec task to a DL task. This way, the heavy deadline load cannot
starve the CPU hot-removal kthread, and the kexec task can make progress.

In contrast to this series, a more aggressive alternative is to send
SIGKILL to all DL tasks at the beginning of the kexec process.
Let us discuss how to resolve this issue.


[1]: https://lore.kernel.org/all/20250929133602.32462-1-piliu@redhat.com/
[2]: https://lore.kernel.org/all/3408aca5-e6c9-434a-9950-82e9147fcbba@arm.com/

Pingfan Liu (3):
  sched/deadline: Skip the deadline bandwidth check if kexec_in_progress
  kernel/cpu: Mark nonboot cpus as inactive when shutting down nonboot
    cpus
  kexec_core: Promote the kexec to DL task

 kernel/cpu.c            | 10 ++++++++++
 kernel/kexec_core.c     | 28 ++++++++++++++++++++++++++++
 kernel/sched/deadline.c |  7 +++++++
 3 files changed, 45 insertions(+)

-- 
2.49.0



Thread overview: 10+ messages
2025-10-22 12:13 [RFC 0/3] kexec: Force kexec to proceed under heavy deadline load Pingfan Liu
2025-10-22 12:13 ` [RFC 1/3] sched/deadline: Skip the deadline bandwidth check if kexec_in_progress Pingfan Liu
2025-10-22 12:13 ` [RFC 2/3] kernel/cpu: Mark nonboot cpus as inactive when shutting down nonboot cpus Pingfan Liu
2025-10-27 17:06   ` Thomas Gleixner
2025-10-28  2:51     ` Pingfan Liu
2025-10-28 12:59       ` Thomas Gleixner
2025-10-29 11:36         ` Pingfan Liu
2025-10-29 12:13           ` Thomas Gleixner
2025-10-29 13:39             ` Pingfan Liu
2025-10-22 12:13 ` [RFC 3/3] kexec_core: Promote the kexec to DL task Pingfan Liu
