From: Thomas Gleixner <tglx@linutronix.de>
To: David Woodhouse <dwmw2@infradead.org>, Ming Lei <ming.lei@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>,
Jason Wang <jasowang@redhat.com>,
"x86@kernel.org" <x86@kernel.org>, hpa <hpa@zytor.com>,
dyoung <dyoung@redhat.com>, kexec <kexec@lists.infradead.org>,
linux-ext4 <linux-ext4@vger.kernel.org>,
"Michael S. Tsirkin" <mst@redhat.com>,
Stefano Garzarella <sgarzare@redhat.com>,
eperezma <eperezma@redhat.com>,
Paolo Bonzini <bonzini@redhat.com>,
Petr Mladek <pmladek@suse.com>,
John Ogness <jogness@linutronix.de>,
Peter Zijlstra <peterz@infradead.org>,
Jens Axboe <axboe@kernel.dk>,
"Rafael J. Wysocki" <rafael@kernel.org>
Subject: Re: Lockdep warnings on kexec (virtio_blk, hrtimers)
Date: Fri, 13 Dec 2024 18:05:15 +0100 [thread overview]
Message-ID: <87msgz8qes.ffs@tglx> (raw)
In-Reply-To: <1a7c126f3ab8ae75e755d01a6bf9bc06730dd239.camel@infradead.org>
On Fri, Dec 13 2024 at 14:07, David Woodhouse wrote:
> On Fri, 2024-12-13 at 14:23 +0100, Thomas Gleixner wrote:
>> That's only true for the case where the new kernel takes over.
>>
>> In the case KEXEC_JUMP=n and kexec_image->preserve_context == true, then
>> it is supposed to align with suspend/resume and if you look at the code
>> then it actually mimics suspend/resume in the most dilettanteish way.
>
> Did you mean KEXEC_JUMP=y there?
Yes, of course.
> I spent a while the other week trying to understand the case where
> CONFIG_KEXEC_JUMP=n and kexec_image->preserve_context=true, and came to
> the conclusion that it was a mirage. Userspace can't *actually* set the
> KEXEC_PRESERVE_CONTEXT bit when setting up the image, if KEXEC_JUMP=n.
>
> The whole of the code path for that case is dead code. It's confusing
> because as discussed elsewhere, we don't just #ifdef out the whole of
> that dead code path, but only the bits which don't actually *compile*
> (like references to restore_processor_state() etc.).
Yes, I had to stare at it quite a while. :)
>> It's a patently bad idea to clobber the kernel with kexec jump "fixes"
>> instead of using the well tested and established suspend/resume
>> machinery.
>>
>> All it takes is to:
>>
>> 1) disable the wakeup logic
>>
>> 2) provide a mechanism to invoke machine_kexec() instead of the
>> actual suspend mechanism.
>>
>> No?
>
> Agreed. The hacky proof of concept I posted earlier invoking
> machine_kexec() instead of suspend_ops->enter() works fine. I'll look
> at cleaning it up and making it not invoke all the ACPI hooks for
> *actual* suspend to RAM, etc.
Something like the below? It survived an hour of loop testing.
> As I noted though, it doesn't address that linux-scsi report which was
> a *real* kexec, not a kjump.
I was not looking at that path at all.
Thanks,
tglx
---
--- a/include/linux/suspend.h
+++ b/include/linux/suspend.h
@@ -582,4 +582,7 @@ enum suspend_stat_step {
void dpm_save_failed_dev(const char *name);
void dpm_save_failed_step(enum suspend_stat_step step);
+int kexec_suspend(void);
+void kexec_machine_execute(void);
+
#endif /* _LINUX_SUSPEND_H */
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -984,6 +984,12 @@ bool kexec_load_permitted(int kexec_imag
return true;
}
+void kexec_machine_execute(void)
+{
+ kmsg_dump(KMSG_DUMP_SHUTDOWN);
+ machine_kexec(kexec_image);
+}
+
/*
* Move into place and start executing a preloaded standalone
* executable. If nothing was preloaded return an error.
@@ -999,38 +1005,9 @@ int kernel_kexec(void)
goto Unlock;
}
-#ifdef CONFIG_KEXEC_JUMP
- if (kexec_image->preserve_context) {
- pm_prepare_console();
- error = freeze_processes();
- if (error) {
- error = -EBUSY;
- goto Restore_console;
- }
- suspend_console();
- error = dpm_suspend_start(PMSG_FREEZE);
- if (error)
- goto Resume_console;
- /* At this point, dpm_suspend_start() has been called,
- * but *not* dpm_suspend_end(). We *must* call
- * dpm_suspend_end() now. Otherwise, drivers for
- * some devices (e.g. interrupt controllers) become
- * desynchronized with the actual state of the
- * hardware at resume time, and evil weirdness ensues.
- */
- error = dpm_suspend_end(PMSG_FREEZE);
- if (error)
- goto Resume_devices;
- error = suspend_disable_secondary_cpus();
- if (error)
- goto Enable_cpus;
- local_irq_disable();
- error = syscore_suspend();
- if (error)
- goto Enable_irqs;
- } else
-#endif
- {
+ if (IS_ENABLED(CONFIG_KEXEC_JUMP) && kexec_image->preserve_context) {
+ error = kexec_suspend();
+ } else {
kexec_in_progress = true;
kernel_restart_prepare("kexec reboot");
migrate_to_reboot_cpu();
@@ -1045,30 +1022,10 @@ int kernel_kexec(void)
cpu_hotplug_enable();
pr_notice("Starting new kernel\n");
machine_shutdown();
+ kexec_machine_execute();
}
- kmsg_dump(KMSG_DUMP_SHUTDOWN);
- machine_kexec(kexec_image);
-
-#ifdef CONFIG_KEXEC_JUMP
- if (kexec_image->preserve_context) {
- syscore_resume();
- Enable_irqs:
- local_irq_enable();
- Enable_cpus:
- suspend_enable_secondary_cpus();
- dpm_resume_start(PMSG_RESTORE);
- Resume_devices:
- dpm_resume_end(PMSG_RESTORE);
- Resume_console:
- resume_console();
- thaw_processes();
- Restore_console:
- pm_restore_console();
- }
-#endif
-
- Unlock:
+Unlock:
kexec_unlock();
return error;
}
--- a/kernel/power/suspend.c
+++ b/kernel/power/suspend.c
@@ -400,7 +400,7 @@ void __weak arch_suspend_enable_irqs(voi
*
* This function should be called after devices have been suspended.
*/
-static int suspend_enter(suspend_state_t state, bool *wakeup)
+static int suspend_enter(suspend_state_t state, bool *wakeup, bool kexec_jump)
{
int error;
@@ -445,15 +445,19 @@ static int suspend_enter(suspend_state_t
error = syscore_suspend();
if (!error) {
- *wakeup = pm_wakeup_pending();
- if (!(suspend_test(TEST_CORE) || *wakeup)) {
- trace_suspend_resume(TPS("machine_suspend"),
- state, true);
- error = suspend_ops->enter(state);
- trace_suspend_resume(TPS("machine_suspend"),
- state, false);
- } else if (*wakeup) {
- error = -EBUSY;
+ if (IS_ENABLED(CONFIG_KEXEC_JUMP) && kexec_jump) {
+ kexec_machine_execute();
+ } else {
+ *wakeup = pm_wakeup_pending();
+ if (!(suspend_test(TEST_CORE) || *wakeup)) {
+ trace_suspend_resume(TPS("machine_suspend"),
+ state, true);
+ error = suspend_ops->enter(state);
+ trace_suspend_resume(TPS("machine_suspend"),
+ state, false);
+ } else if (*wakeup) {
+ error = -EBUSY;
+ }
}
syscore_resume();
}
@@ -485,7 +489,7 @@ static int suspend_enter(suspend_state_t
* suspend_devices_and_enter - Suspend devices and enter system sleep state.
* @state: System sleep state to enter.
*/
-int suspend_devices_and_enter(suspend_state_t state)
+static int __suspend_devices_and_enter(suspend_state_t state, bool kexec_jump)
{
int error;
bool wakeup = false;
@@ -514,7 +518,7 @@ int suspend_devices_and_enter(suspend_st
goto Recover_platform;
do {
- error = suspend_enter(state, &wakeup);
+ error = suspend_enter(state, &wakeup, kexec_jump);
} while (!error && !wakeup && platform_suspend_again(state));
Resume_devices:
@@ -536,6 +540,15 @@ int suspend_devices_and_enter(suspend_st
}
/**
+ * suspend_devices_and_enter - Suspend devices and enter system sleep state.
+ * @state: System sleep state to enter.
+ */
+int suspend_devices_and_enter(suspend_state_t state)
+{
+ return __suspend_devices_and_enter(state, false);
+}
+
+/**
* suspend_finish - Clean up before finishing the suspend sequence.
*
* Call platform code to clean up, restart processes, and free the console that
@@ -607,6 +620,21 @@ static int enter_state(suspend_state_t s
return error;
}
+#ifdef CONFIG_KEXEC_JUMP
+int kexec_suspend(void)
+{
+ int error;
+
+ ksys_sync_helper();
+ error = freeze_processes();
+ if (error)
+ return error;
+ error = __suspend_devices_and_enter(PM_SUSPEND_MEM, true);
+ thaw_processes();
+ return error;
+}
+#endif
+
/**
* pm_suspend - Externally visible function for suspending the system.
* @state: System sleep state to enter.
next prev parent reply other threads:[~2024-12-13 17:05 UTC|newest]
Thread overview: 34+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-12-09 14:28 Lockdep warnings on kexec (virtio_blk, hrtimers) David Woodhouse
2024-12-10 1:56 ` Jason Wang
2024-12-11 12:42 ` Stefan Hajnoczi
2024-12-12 11:07 ` David Woodhouse
2024-12-12 13:34 ` Thomas Gleixner
2024-12-12 13:46 ` David Woodhouse
2024-12-12 18:04 ` Thomas Gleixner
2024-12-12 19:19 ` David Woodhouse
2024-12-13 0:14 ` Thomas Gleixner
2024-12-13 9:31 ` David Woodhouse
2024-12-13 9:43 ` David Woodhouse
2024-12-13 10:42 ` Thomas Gleixner
2024-12-13 11:09 ` Ming Lei
2024-12-13 11:31 ` Thomas Gleixner
2024-12-13 11:48 ` Ming Lei
2024-12-13 13:23 ` Thomas Gleixner
2024-12-13 14:07 ` David Woodhouse
2024-12-13 17:05 ` Thomas Gleixner [this message]
2024-12-13 17:17 ` David Woodhouse
2024-12-13 17:48 ` Rafael J. Wysocki
2024-12-13 17:32 ` Rafael J. Wysocki
2024-12-13 19:06 ` Rafael J. Wysocki
2024-12-13 20:16 ` David Woodhouse
2024-12-14 9:57 ` David Woodhouse
2024-12-16 12:14 ` Rafael J. Wysocki
2024-12-13 17:59 ` Rafael J. Wysocki
2024-12-13 13:17 ` David Woodhouse
2024-12-13 11:12 ` David Woodhouse
2024-12-13 11:33 ` Ming Lei
2024-12-13 11:20 ` Peter Zijlstra
2024-12-13 13:13 ` Thomas Gleixner
2024-12-16 13:20 ` [PATCH] sched: Prevent rescheduling when interrupts are disabled Thomas Gleixner
2024-12-16 17:41 ` David Woodhouse
2024-12-12 11:12 ` Lockdep warnings on kexec (virtio_blk, hrtimers) Ming Lei
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87msgz8qes.ffs@tglx \
--to=tglx@linutronix.de \
--cc=axboe@kernel.dk \
--cc=bonzini@redhat.com \
--cc=dwmw2@infradead.org \
--cc=dyoung@redhat.com \
--cc=eperezma@redhat.com \
--cc=hpa@zytor.com \
--cc=jasowang@redhat.com \
--cc=jogness@linutronix.de \
--cc=kexec@lists.infradead.org \
--cc=linux-ext4@vger.kernel.org \
--cc=ming.lei@redhat.com \
--cc=mst@redhat.com \
--cc=peterz@infradead.org \
--cc=pmladek@suse.com \
--cc=rafael@kernel.org \
--cc=sgarzare@redhat.com \
--cc=stefanha@redhat.com \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).