From: Thomas Gleixner <tglx@linutronix.de>
To: David Woodhouse <dwmw2@infradead.org>, Ming Lei <ming.lei@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>,
Jason Wang <jasowang@redhat.com>,
"x86@kernel.org" <x86@kernel.org>, hpa <hpa@zytor.com>,
dyoung <dyoung@redhat.com>, kexec <kexec@lists.infradead.org>,
linux-ext4 <linux-ext4@vger.kernel.org>,
"Michael S. Tsirkin" <mst@redhat.com>,
Stefano Garzarella <sgarzare@redhat.com>,
eperezma <eperezma@redhat.com>,
Paolo Bonzini <bonzini@redhat.com>,
Petr Mladek <pmladek@suse.com>,
John Ogness <jogness@linutronix.de>,
Peter Zijlstra <peterz@infradead.org>,
Jens Axboe <axboe@kernel.dk>,
"Rafael J. Wysocki" <rafael@kernel.org>
Subject: Re: Lockdep warnings on kexec (virtio_blk, hrtimers)
Date: Fri, 13 Dec 2024 18:05:15 +0100 [thread overview]
Message-ID: <87msgz8qes.ffs@tglx> (raw)
In-Reply-To: <1a7c126f3ab8ae75e755d01a6bf9bc06730dd239.camel@infradead.org>
On Fri, Dec 13 2024 at 14:07, David Woodhouse wrote:
> On Fri, 2024-12-13 at 14:23 +0100, Thomas Gleixner wrote:
>> That's only true for the case where the new kernel takes over.
>>
>> In the case KEXEC_JUMP=n and kexec_image->preserve_context == true, then
>> it is supposed to align with suspend/resume and if you look at the code
>> then it actually mimics suspend/resume in the most dilettanteish way.
>
> Did you mean KEXEC_JUMP=y there?
Yes, of course.
> I spent a while the other week trying to understand the case where
> CONFIG_KEXEC_JUMP=n and kexec_image->preserve_context=true, and came to
> the conclusion that it was a mirage. Userspace can't *actually* set the
> KEXEC_PRESERVE_CONTEXT bit when setting up the image, if KEXEC_JUMP=n.
>
> The whole of the code path for that case is dead code. It's confusing
> because as discussed elsewhere, we don't just #ifdef out the whole of
> that dead code path, but only the bits which don't actually *compile*
> (like references to restore_processor_state() etc.).
Yes, I had to stare at it quite a while. :)
>> It's a patently bad idea to clobber the kernel with kexec jump "fixes"
>> instead of using the well tested and established suspend/resume
>> machinery.
>>
>> All it takes is to:
>>
>> 1) disable the wakeup logic
>>
>> 2) provide a mechanism to invoke machine_kexec() instead of the
>> actual suspend mechanism.
>>
>> No?
>
> Agreed. The hacky proof of concept I posted earlier invoking
> machine_kexec() instead of suspend_ops->enter() works fine. I'll look
> at cleaning it up and making it not invoke all the ACPI hooks for
> *actual* suspend to RAM, etc.
Something like the below? It survived an hour of loop testing.
> As I noted though, it doesn't address that linux-scsi report which was
> a *real* kexec, not a kjump.
I was not looking at that path at all.
Thanks,
tglx
---
--- a/include/linux/suspend.h
+++ b/include/linux/suspend.h
@@ -582,4 +582,7 @@ enum suspend_stat_step {
void dpm_save_failed_dev(const char *name);
void dpm_save_failed_step(enum suspend_stat_step step);
+int kexec_suspend(void);
+void kexec_machine_execute(void);
+
#endif /* _LINUX_SUSPEND_H */
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -984,6 +984,12 @@ bool kexec_load_permitted(int kexec_imag
return true;
}
+void kexec_machine_execute(void)
+{
+ kmsg_dump(KMSG_DUMP_SHUTDOWN);
+ machine_kexec(kexec_image);
+}
+
/*
* Move into place and start executing a preloaded standalone
* executable. If nothing was preloaded return an error.
@@ -999,38 +1005,9 @@ int kernel_kexec(void)
goto Unlock;
}
-#ifdef CONFIG_KEXEC_JUMP
- if (kexec_image->preserve_context) {
- pm_prepare_console();
- error = freeze_processes();
- if (error) {
- error = -EBUSY;
- goto Restore_console;
- }
- suspend_console();
- error = dpm_suspend_start(PMSG_FREEZE);
- if (error)
- goto Resume_console;
- /* At this point, dpm_suspend_start() has been called,
- * but *not* dpm_suspend_end(). We *must* call
- * dpm_suspend_end() now. Otherwise, drivers for
- * some devices (e.g. interrupt controllers) become
- * desynchronized with the actual state of the
- * hardware at resume time, and evil weirdness ensues.
- */
- error = dpm_suspend_end(PMSG_FREEZE);
- if (error)
- goto Resume_devices;
- error = suspend_disable_secondary_cpus();
- if (error)
- goto Enable_cpus;
- local_irq_disable();
- error = syscore_suspend();
- if (error)
- goto Enable_irqs;
- } else
-#endif
- {
+ if (IS_ENABLED(CONFIG_KEXEC_JUMP) && kexec_image->preserve_context) {
+ error = kexec_suspend();
+ } else {
kexec_in_progress = true;
kernel_restart_prepare("kexec reboot");
migrate_to_reboot_cpu();
@@ -1045,30 +1022,10 @@ int kernel_kexec(void)
cpu_hotplug_enable();
pr_notice("Starting new kernel\n");
machine_shutdown();
+ kexec_machine_execute();
}
- kmsg_dump(KMSG_DUMP_SHUTDOWN);
- machine_kexec(kexec_image);
-
-#ifdef CONFIG_KEXEC_JUMP
- if (kexec_image->preserve_context) {
- syscore_resume();
- Enable_irqs:
- local_irq_enable();
- Enable_cpus:
- suspend_enable_secondary_cpus();
- dpm_resume_start(PMSG_RESTORE);
- Resume_devices:
- dpm_resume_end(PMSG_RESTORE);
- Resume_console:
- resume_console();
- thaw_processes();
- Restore_console:
- pm_restore_console();
- }
-#endif
-
- Unlock:
+Unlock:
kexec_unlock();
return error;
}
--- a/kernel/power/suspend.c
+++ b/kernel/power/suspend.c
@@ -400,7 +400,7 @@ void __weak arch_suspend_enable_irqs(voi
*
* This function should be called after devices have been suspended.
*/
-static int suspend_enter(suspend_state_t state, bool *wakeup)
+static int suspend_enter(suspend_state_t state, bool *wakeup, bool kexec_jump)
{
int error;
@@ -445,15 +445,19 @@ static int suspend_enter(suspend_state_t
error = syscore_suspend();
if (!error) {
- *wakeup = pm_wakeup_pending();
- if (!(suspend_test(TEST_CORE) || *wakeup)) {
- trace_suspend_resume(TPS("machine_suspend"),
- state, true);
- error = suspend_ops->enter(state);
- trace_suspend_resume(TPS("machine_suspend"),
- state, false);
- } else if (*wakeup) {
- error = -EBUSY;
+ if (IS_ENABLED(CONFIG_KEXEC_JUMP) && kexec_jump) {
+ kexec_machine_execute();
+ } else {
+ *wakeup = pm_wakeup_pending();
+ if (!(suspend_test(TEST_CORE) || *wakeup)) {
+ trace_suspend_resume(TPS("machine_suspend"),
+ state, true);
+ error = suspend_ops->enter(state);
+ trace_suspend_resume(TPS("machine_suspend"),
+ state, false);
+ } else if (*wakeup) {
+ error = -EBUSY;
+ }
}
syscore_resume();
}
@@ -485,7 +489,7 @@ static int suspend_enter(suspend_state_t
* suspend_devices_and_enter - Suspend devices and enter system sleep state.
* @state: System sleep state to enter.
*/
-int suspend_devices_and_enter(suspend_state_t state)
+static int __suspend_devices_and_enter(suspend_state_t state, bool kexec_jump)
{
int error;
bool wakeup = false;
@@ -514,7 +518,7 @@ int suspend_devices_and_enter(suspend_st
goto Recover_platform;
do {
- error = suspend_enter(state, &wakeup);
+ error = suspend_enter(state, &wakeup, kexec_jump);
} while (!error && !wakeup && platform_suspend_again(state));
Resume_devices:
@@ -536,6 +540,15 @@ int suspend_devices_and_enter(suspend_st
}
/**
+ * suspend_devices_and_enter - Suspend devices and enter system sleep state.
+ * @state: System sleep state to enter.
+ */
+int suspend_devices_and_enter(suspend_state_t state)
+{
+ return __suspend_devices_and_enter(state, false);
+}
+
+/**
* suspend_finish - Clean up before finishing the suspend sequence.
*
* Call platform code to clean up, restart processes, and free the console that
@@ -607,6 +620,21 @@ static int enter_state(suspend_state_t s
return error;
}
+#ifdef CONFIG_KEXEC_JUMP
+int kexec_suspend(void)
+{
+ int error;
+
+ ksys_sync_helper();
+ error = freeze_processes();
+ if (error)
+ return error;
+ error = __suspend_devices_and_enter(PM_SUSPEND_MEM, true);
+ thaw_processes();
+ return error;
+}
+#endif
+
/**
* pm_suspend - Externally visible function for suspending the system.
* @state: System sleep state to enter.
next prev parent reply other threads:[~2024-12-13 17:06 UTC|newest]
Thread overview: 34+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-12-09 14:28 Lockdep warnings on kexec (virtio_blk, hrtimers) David Woodhouse
2024-12-10 1:56 ` Jason Wang
2024-12-11 12:42 ` Stefan Hajnoczi
2024-12-12 11:07 ` David Woodhouse
2024-12-12 13:34 ` Thomas Gleixner
2024-12-12 13:46 ` David Woodhouse
2024-12-12 18:04 ` Thomas Gleixner
2024-12-12 19:19 ` David Woodhouse
2024-12-13 0:14 ` Thomas Gleixner
2024-12-13 9:31 ` David Woodhouse
2024-12-13 9:43 ` David Woodhouse
2024-12-13 10:42 ` Thomas Gleixner
2024-12-13 11:09 ` Ming Lei
2024-12-13 11:31 ` Thomas Gleixner
2024-12-13 11:48 ` Ming Lei
2024-12-13 13:23 ` Thomas Gleixner
2024-12-13 14:07 ` David Woodhouse
2024-12-13 17:05 ` Thomas Gleixner [this message]
2024-12-13 17:17 ` David Woodhouse
2024-12-13 17:48 ` Rafael J. Wysocki
2024-12-13 17:32 ` Rafael J. Wysocki
2024-12-13 19:06 ` Rafael J. Wysocki
2024-12-13 20:16 ` David Woodhouse
2024-12-14 9:57 ` David Woodhouse
2024-12-16 12:14 ` Rafael J. Wysocki
2024-12-13 17:59 ` Rafael J. Wysocki
2024-12-13 13:17 ` David Woodhouse
2024-12-13 11:12 ` David Woodhouse
2024-12-13 11:33 ` Ming Lei
2024-12-13 11:20 ` Peter Zijlstra
2024-12-13 13:13 ` Thomas Gleixner
2024-12-16 13:20 ` [PATCH] sched: Prevent rescheduling when interrupts are disabled Thomas Gleixner
2024-12-16 17:41 ` David Woodhouse
2024-12-12 11:12 ` Lockdep warnings on kexec (virtio_blk, hrtimers) Ming Lei
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87msgz8qes.ffs@tglx \
--to=tglx@linutronix.de \
--cc=axboe@kernel.dk \
--cc=bonzini@redhat.com \
--cc=dwmw2@infradead.org \
--cc=dyoung@redhat.com \
--cc=eperezma@redhat.com \
--cc=hpa@zytor.com \
--cc=jasowang@redhat.com \
--cc=jogness@linutronix.de \
--cc=kexec@lists.infradead.org \
--cc=linux-ext4@vger.kernel.org \
--cc=ming.lei@redhat.com \
--cc=mst@redhat.com \
--cc=peterz@infradead.org \
--cc=pmladek@suse.com \
--cc=rafael@kernel.org \
--cc=sgarzare@redhat.com \
--cc=stefanha@redhat.com \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.