* [PATCH v2 0/2] KVM: Fix race between reboot and hardware enabling
@ 2023-05-12 23:31 ` Sean Christopherson
0 siblings, 0 replies; 43+ messages in thread
From: Sean Christopherson @ 2023-05-12 23:31 UTC (permalink / raw)
To: kvm-riscv
Fix a bug where enabling hardware virtualization can race with a forced
reboot, e.g. `reboot -f`, and result in virt hardware being enabled when
the reboot is attempted, and thus hanging the reboot.
Found by inspection, confirmed by hacking the reboot flow to wait until
KVM loads (the problematic window is ridiculously small).
Fully tested only on x86, compile tested on other architectures.
v2:
- Rename KVM's callback to kvm_shutdown() to match the hook. [Marc]
- Don't add a spurious newline. [Marc]
v1: https://lore.kernel.org/all/20230310221414.811690-1-seanjc at google.com
Sean Christopherson (2):
KVM: Use syscore_ops instead of reboot_notifier to hook
restart/shutdown
KVM: Don't enable hardware after a restart/shutdown is initiated
virt/kvm/kvm_main.c | 43 +++++++++++++++++++++++++++----------------
1 file changed, 27 insertions(+), 16 deletions(-)
base-commit: b3c98052d46948a8d65d2778c7f306ff38366aac
--
2.40.1.606.ga4b1b128d6-goog
^ permalink raw reply [flat|nested] 43+ messages in thread* [PATCH v2 0/2] KVM: Fix race between reboot and hardware enabling @ 2023-05-12 23:31 ` Sean Christopherson 0 siblings, 0 replies; 43+ messages in thread From: Sean Christopherson @ 2023-05-12 23:31 UTC (permalink / raw) To: Paolo Bonzini Cc: kvm, linux-kernel, Marc Zyngier, Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu, kvmarm, Huacai Chen, Aleksandar Markovic, Anup Patel, Atish Patra, kvm-riscv, Sean Christopherson Fix a bug where enabling hardware virtualization can race with a forced reboot, e.g. `reboot -f`, and result in virt hardware being enabled when the reboot is attempted, and thus hanging the reboot. Found by inspection, confirmed by hacking the reboot flow to wait until KVM loads (the problematic window is ridiculously small). Fully tested only on x86, compile tested on other architectures. v2: - Rename KVM's callback to kvm_shutdown() to match the hook. [Marc] - Don't add a spurious newline. [Marc] v1: https://lore.kernel.org/all/20230310221414.811690-1-seanjc@google.com Sean Christopherson (2): KVM: Use syscore_ops instead of reboot_notifier to hook restart/shutdown KVM: Don't enable hardware after a restart/shutdown is initiated virt/kvm/kvm_main.c | 43 +++++++++++++++++++++++++++---------------- 1 file changed, 27 insertions(+), 16 deletions(-) base-commit: b3c98052d46948a8d65d2778c7f306ff38366aac -- 2.40.1.606.ga4b1b128d6-goog ^ permalink raw reply [flat|nested] 43+ messages in thread
* [PATCH v2 1/2] KVM: Use syscore_ops instead of reboot_notifier to hook restart/shutdown 2023-05-12 23:31 ` Sean Christopherson @ 2023-05-12 23:31 ` Sean Christopherson -1 siblings, 0 replies; 43+ messages in thread From: Sean Christopherson @ 2023-05-12 23:31 UTC (permalink / raw) To: kvm-riscv Use syscore_ops.shutdown to disable hardware virtualization during a reboot instead of using the dedicated reboot_notifier so that KVM disables virtualization _after_ system_state has been updated. This will allow fixing a race in KVM's handling of a forced reboot where KVM can end up enabling hardware virtualization between kernel_restart_prepare() and machine_restart(). Rename KVM's hook to match the syscore op to avoid any possible confusion from wiring up a "reboot" helper to a "shutdown" hook (neither "shutdown nor "reboot" is completely accurate as the hook handles both). Opportunistically rewrite kvm_shutdown()'s comment to make it less VMX specific, and to explain why kvm_rebooting exists. Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: James Morse <james.morse@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Zenghui Yu <yuzenghui@huawei.com> Cc: kvmarm at lists.linux.dev Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Aleksandar Markovic <aleksandar.qemu.devel@gmail.com> Cc: Anup Patel <anup@brainfault.org> Cc: Atish Patra <atishp@atishpatra.org> Cc: kvm-riscv at lists.infradead.org Signed-off-by: Sean Christopherson <seanjc@google.com> --- virt/kvm/kvm_main.c | 26 +++++++++++--------------- 1 file changed, 11 insertions(+), 15 deletions(-) diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index d1abb331ea68..e771b6a013c9 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -5211,26 +5211,24 @@ static int hardware_enable_all(void) return r; } -static int kvm_reboot(struct notifier_block *notifier, unsigned long val, - void *v) +static void kvm_shutdown(void) { /* - * Some (well, at least mine) BIOSes hang on reboot if - * in vmx root mode. - * - * And Intel TXT required VMX off for all cpu when system shutdown. + * Disable hardware virtualization and set kvm_rebooting to indicate + * that KVM has asynchronously disabled hardware virtualization, i.e. + * that relevant errors and exceptions aren't entirely unexpected. + * Some flavors of hardware virtualization need to be disabled before + * transferring control to firmware (to perform shutdown/reboot), e.g. + * on x86, virtualization can block INIT interrupts, which are used by + * firmware to pull APs back under firmware control. Note, this path + * is used for both shutdown and reboot scenarios, i.e. neither name is + * 100% comprehensive. */ pr_info("kvm: exiting hardware virtualization\n"); kvm_rebooting = true; on_each_cpu(hardware_disable_nolock, NULL, 1); - return NOTIFY_OK; } -static struct notifier_block kvm_reboot_notifier = { - .notifier_call = kvm_reboot, - .priority = 0, -}; - static int kvm_suspend(void) { /* @@ -5261,6 +5259,7 @@ static void kvm_resume(void) static struct syscore_ops kvm_syscore_ops = { .suspend = kvm_suspend, .resume = kvm_resume, + .shutdown = kvm_shutdown, }; #else /* CONFIG_KVM_GENERIC_HARDWARE_ENABLING */ static int hardware_enable_all(void) @@ -5965,7 +5964,6 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module) if (r) return r; - register_reboot_notifier(&kvm_reboot_notifier); register_syscore_ops(&kvm_syscore_ops); #endif @@ -6037,7 +6035,6 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module) err_vcpu_cache: #ifdef CONFIG_KVM_GENERIC_HARDWARE_ENABLING unregister_syscore_ops(&kvm_syscore_ops); - unregister_reboot_notifier(&kvm_reboot_notifier); cpuhp_remove_state_nocalls(CPUHP_AP_KVM_ONLINE); #endif return r; @@ -6063,7 +6060,6 @@ void kvm_exit(void) kvm_async_pf_deinit(); #ifdef CONFIG_KVM_GENERIC_HARDWARE_ENABLING unregister_syscore_ops(&kvm_syscore_ops); - unregister_reboot_notifier(&kvm_reboot_notifier); cpuhp_remove_state_nocalls(CPUHP_AP_KVM_ONLINE); #endif kvm_irqfd_exit(); -- 2.40.1.606.ga4b1b128d6-goog ^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH v2 1/2] KVM: Use syscore_ops instead of reboot_notifier to hook restart/shutdown @ 2023-05-12 23:31 ` Sean Christopherson 0 siblings, 0 replies; 43+ messages in thread From: Sean Christopherson @ 2023-05-12 23:31 UTC (permalink / raw) To: Paolo Bonzini Cc: kvm, linux-kernel, Marc Zyngier, Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu, kvmarm, Huacai Chen, Aleksandar Markovic, Anup Patel, Atish Patra, kvm-riscv, Sean Christopherson Use syscore_ops.shutdown to disable hardware virtualization during a reboot instead of using the dedicated reboot_notifier so that KVM disables virtualization _after_ system_state has been updated. This will allow fixing a race in KVM's handling of a forced reboot where KVM can end up enabling hardware virtualization between kernel_restart_prepare() and machine_restart(). Rename KVM's hook to match the syscore op to avoid any possible confusion from wiring up a "reboot" helper to a "shutdown" hook (neither "shutdown nor "reboot" is completely accurate as the hook handles both). Opportunistically rewrite kvm_shutdown()'s comment to make it less VMX specific, and to explain why kvm_rebooting exists. Cc: Marc Zyngier <maz@kernel.org> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: James Morse <james.morse@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Zenghui Yu <yuzenghui@huawei.com> Cc: kvmarm@lists.linux.dev Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Aleksandar Markovic <aleksandar.qemu.devel@gmail.com> Cc: Anup Patel <anup@brainfault.org> Cc: Atish Patra <atishp@atishpatra.org> Cc: kvm-riscv@lists.infradead.org Signed-off-by: Sean Christopherson <seanjc@google.com> --- virt/kvm/kvm_main.c | 26 +++++++++++--------------- 1 file changed, 11 insertions(+), 15 deletions(-) diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index d1abb331ea68..e771b6a013c9 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -5211,26 +5211,24 @@ static int hardware_enable_all(void) return r; } -static int kvm_reboot(struct notifier_block *notifier, unsigned long val, - void *v) +static void kvm_shutdown(void) { /* - * Some (well, at least mine) BIOSes hang on reboot if - * in vmx root mode. - * - * And Intel TXT required VMX off for all cpu when system shutdown. + * Disable hardware virtualization and set kvm_rebooting to indicate + * that KVM has asynchronously disabled hardware virtualization, i.e. + * that relevant errors and exceptions aren't entirely unexpected. + * Some flavors of hardware virtualization need to be disabled before + * transferring control to firmware (to perform shutdown/reboot), e.g. + * on x86, virtualization can block INIT interrupts, which are used by + * firmware to pull APs back under firmware control. Note, this path + * is used for both shutdown and reboot scenarios, i.e. neither name is + * 100% comprehensive. */ pr_info("kvm: exiting hardware virtualization\n"); kvm_rebooting = true; on_each_cpu(hardware_disable_nolock, NULL, 1); - return NOTIFY_OK; } -static struct notifier_block kvm_reboot_notifier = { - .notifier_call = kvm_reboot, - .priority = 0, -}; - static int kvm_suspend(void) { /* @@ -5261,6 +5259,7 @@ static void kvm_resume(void) static struct syscore_ops kvm_syscore_ops = { .suspend = kvm_suspend, .resume = kvm_resume, + .shutdown = kvm_shutdown, }; #else /* CONFIG_KVM_GENERIC_HARDWARE_ENABLING */ static int hardware_enable_all(void) @@ -5965,7 +5964,6 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module) if (r) return r; - register_reboot_notifier(&kvm_reboot_notifier); register_syscore_ops(&kvm_syscore_ops); #endif @@ -6037,7 +6035,6 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module) err_vcpu_cache: #ifdef CONFIG_KVM_GENERIC_HARDWARE_ENABLING unregister_syscore_ops(&kvm_syscore_ops); - unregister_reboot_notifier(&kvm_reboot_notifier); cpuhp_remove_state_nocalls(CPUHP_AP_KVM_ONLINE); #endif return r; @@ -6063,7 +6060,6 @@ void kvm_exit(void) kvm_async_pf_deinit(); #ifdef CONFIG_KVM_GENERIC_HARDWARE_ENABLING unregister_syscore_ops(&kvm_syscore_ops); - unregister_reboot_notifier(&kvm_reboot_notifier); cpuhp_remove_state_nocalls(CPUHP_AP_KVM_ONLINE); #endif kvm_irqfd_exit(); -- 2.40.1.606.ga4b1b128d6-goog ^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: [PATCH v2 1/2] KVM: Use syscore_ops instead of reboot_notifier to hook restart/shutdown 2023-05-12 23:31 ` Sean Christopherson (?) @ 2023-12-09 7:26 ` Gowans, James -1 siblings, 0 replies; 43+ messages in thread From: Gowans, James @ 2023-12-09 7:26 UTC (permalink / raw) To: pbonzini@redhat.com, Graf (AWS), Alexander, seanjc@google.com, Schönherr, Jan H., ebiederm@xmission.com Cc: yuzenghui@huawei.com, atishp@atishpatra.org, kvm-riscv@lists.infradead.org, james.morse@arm.com, suzuki.poulose@arm.com, oliver.upton@linux.dev, chenhuacai@kernel.org, linux-kernel@vger.kernel.org, kvmarm@lists.linux.dev, maz@kernel.org, kvm@vger.kernel.org, aleksandar.qemu.devel@gmail.com, anup@brainfault.org, kexec@lists.infradead.org Hi Sean, Blast from the past but I've just been bitten by this patch when rebasing across v6.4. On Fri, 2023-05-12 at 16:31 -0700, Sean Christopherson wrote: > Use syscore_ops.shutdown to disable hardware virtualization during a > reboot instead of using the dedicated reboot_notifier so that KVM disables > virtualization _after_ system_state has been updated. This will allow > fixing a race in KVM's handling of a forced reboot where KVM can end up > enabling hardware virtualization between kernel_restart_prepare() and > machine_restart(). The issue is that, AFAICT, the syscore_ops.shutdown are not called when doing a kexec. Reboot notifiers are called across kexec via: kernel_kexec kernel_restart_prepare blocking_notifier_call_chain kvm_reboot So after this patch, KVM is not shutdown during kexec; if hardware virt mode is enabled then the kexec hangs in exactly the same manner as you describe with the reboot. Some specific shutdown callbacks, for example IOMMU, HPET, IRQ, etc are called in native_machine_shutdown, but KVM is not one of these. Thoughts on possible ways to fix this: a) go back to reboot notifiers b) get kexec to call syscore_shutdown() to invoke all of these callbacks c) Add a KVM-specific callback to native_machine_shutdown(); we only need this for Intel x86, right? My slight preference is towards adding syscore_shutdown() to kexec, but I'm not sure that's feasible. Adding kexec maintainers for input. JG _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v2 1/2] KVM: Use syscore_ops instead of reboot_notifier to hook restart/shutdown @ 2023-12-09 7:26 ` Gowans, James 0 siblings, 0 replies; 43+ messages in thread From: Gowans, James @ 2023-12-09 7:26 UTC (permalink / raw) To: pbonzini@redhat.com, Graf (AWS), Alexander, seanjc@google.com, Schönherr, Jan H., ebiederm@xmission.com Cc: yuzenghui@huawei.com, atishp@atishpatra.org, kvm-riscv@lists.infradead.org, james.morse@arm.com, suzuki.poulose@arm.com, oliver.upton@linux.dev, chenhuacai@kernel.org, linux-kernel@vger.kernel.org, kvmarm@lists.linux.dev, maz@kernel.org, kvm@vger.kernel.org, aleksandar.qemu.devel@gmail.com, anup@brainfault.org, kexec@lists.infradead.org Hi Sean, Blast from the past but I've just been bitten by this patch when rebasing across v6.4. On Fri, 2023-05-12 at 16:31 -0700, Sean Christopherson wrote: > Use syscore_ops.shutdown to disable hardware virtualization during a > reboot instead of using the dedicated reboot_notifier so that KVM disables > virtualization _after_ system_state has been updated. This will allow > fixing a race in KVM's handling of a forced reboot where KVM can end up > enabling hardware virtualization between kernel_restart_prepare() and > machine_restart(). The issue is that, AFAICT, the syscore_ops.shutdown are not called when doing a kexec. Reboot notifiers are called across kexec via: kernel_kexec kernel_restart_prepare blocking_notifier_call_chain kvm_reboot So after this patch, KVM is not shutdown during kexec; if hardware virt mode is enabled then the kexec hangs in exactly the same manner as you describe with the reboot. Some specific shutdown callbacks, for example IOMMU, HPET, IRQ, etc are called in native_machine_shutdown, but KVM is not one of these. Thoughts on possible ways to fix this: a) go back to reboot notifiers b) get kexec to call syscore_shutdown() to invoke all of these callbacks c) Add a KVM-specific callback to native_machine_shutdown(); we only need this for Intel x86, right? My slight preference is towards adding syscore_shutdown() to kexec, but I'm not sure that's feasible. Adding kexec maintainers for input. JG ^ permalink raw reply [flat|nested] 43+ messages in thread
* [PATCH v2 1/2] KVM: Use syscore_ops instead of reboot_notifier to hook restart/shutdown @ 2023-12-09 7:26 ` Gowans, James 0 siblings, 0 replies; 43+ messages in thread From: Gowans, James @ 2023-12-09 7:26 UTC (permalink / raw) To: kvm-riscv Hi Sean, Blast from the past but I've just been bitten by this patch when rebasing across v6.4. On Fri, 2023-05-12 at 16:31 -0700, Sean Christopherson wrote: > Use syscore_ops.shutdown to disable hardware virtualization during a > reboot instead of using the dedicated reboot_notifier so that KVM disables > virtualization _after_ system_state has been updated.? This will allow > fixing a race in KVM's handling of a forced reboot where KVM can end up > enabling hardware virtualization between kernel_restart_prepare() and > machine_restart(). The issue is that, AFAICT, the syscore_ops.shutdown are not called when doing a kexec. Reboot notifiers are called across kexec via: kernel_kexec kernel_restart_prepare blocking_notifier_call_chain kvm_reboot So after this patch, KVM is not shutdown during kexec; if hardware virt mode is enabled then the kexec hangs in exactly the same manner as you describe with the reboot. Some specific shutdown callbacks, for example IOMMU, HPET, IRQ, etc are called in native_machine_shutdown, but KVM is not one of these. Thoughts on possible ways to fix this: a) go back to reboot notifiers b) get kexec to call syscore_shutdown() to invoke all of these callbacks c) Add a KVM-specific callback to native_machine_shutdown(); we only need this for Intel x86, right? My slight preference is towards adding syscore_shutdown() to kexec, but I'm not sure that's feasible. Adding kexec maintainers for input. JG ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v2 1/2] KVM: Use syscore_ops instead of reboot_notifier to hook restart/shutdown 2023-12-09 7:26 ` Gowans, James (?) @ 2023-12-10 4:53 ` Eric W. Biederman -1 siblings, 0 replies; 43+ messages in thread From: Eric W. Biederman @ 2023-12-10 4:53 UTC (permalink / raw) To: Gowans, James Cc: pbonzini@redhat.com, Graf (AWS), Alexander, seanjc@google.com, Schönherr, Jan H., yuzenghui@huawei.com, atishp@atishpatra.org, kvm-riscv@lists.infradead.org, james.morse@arm.com, suzuki.poulose@arm.com, oliver.upton@linux.dev, chenhuacai@kernel.org, linux-kernel@vger.kernel.org, kvmarm@lists.linux.dev, maz@kernel.org, kvm@vger.kernel.org, aleksandar.qemu.devel@gmail.com, anup@brainfault.org, kexec@lists.infradead.org "Gowans, James" <jgowans@amazon.com> writes: > Hi Sean, > > Blast from the past but I've just been bitten by this patch when > rebasing across v6.4. > > On Fri, 2023-05-12 at 16:31 -0700, Sean Christopherson wrote: >> Use syscore_ops.shutdown to disable hardware virtualization during a >> reboot instead of using the dedicated reboot_notifier so that KVM disables >> virtualization _after_ system_state has been updated. This will allow >> fixing a race in KVM's handling of a forced reboot where KVM can end up >> enabling hardware virtualization between kernel_restart_prepare() and >> machine_restart(). > > The issue is that, AFAICT, the syscore_ops.shutdown are not called when > doing a kexec. Reboot notifiers are called across kexec via: > > kernel_kexec > kernel_restart_prepare > blocking_notifier_call_chain > kvm_reboot > > So after this patch, KVM is not shutdown during kexec; if hardware virt > mode is enabled then the kexec hangs in exactly the same manner as you > describe with the reboot. kernel_restart_prepare calls device_shutdown. Which should call all of the shutdown operations. This has been the way the code has been structured since December 2005. > Some specific shutdown callbacks, for example IOMMU, HPET, IRQ, etc are > called in native_machine_shutdown, but KVM is not one of these. > > Thoughts on possible ways to fix this: > a) go back to reboot notifiers > b) get kexec to call syscore_shutdown() to invoke all of these callbacks > c) Add a KVM-specific callback to native_machine_shutdown(); we only > need this for Intel x86, right? > > My slight preference is towards adding syscore_shutdown() to kexec, but > I'm not sure that's feasible. Adding kexec maintainers for input. Why isn't device_suthdown calling syscore_shutdown? What problem are you running into with your rebase that worked with reboot notifiers that is not working with syscore_shutdown? Eric _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v2 1/2] KVM: Use syscore_ops instead of reboot_notifier to hook restart/shutdown @ 2023-12-10 4:53 ` Eric W. Biederman 0 siblings, 0 replies; 43+ messages in thread From: Eric W. Biederman @ 2023-12-10 4:53 UTC (permalink / raw) To: Gowans, James Cc: pbonzini@redhat.com, Graf (AWS), Alexander, seanjc@google.com, Schönherr, Jan H., yuzenghui@huawei.com, atishp@atishpatra.org, kvm-riscv@lists.infradead.org, james.morse@arm.com, suzuki.poulose@arm.com, oliver.upton@linux.dev, chenhuacai@kernel.org, linux-kernel@vger.kernel.org, kvmarm@lists.linux.dev, maz@kernel.org, kvm@vger.kernel.org, aleksandar.qemu.devel@gmail.com, anup@brainfault.org, kexec@lists.infradead.org "Gowans, James" <jgowans@amazon.com> writes: > Hi Sean, > > Blast from the past but I've just been bitten by this patch when > rebasing across v6.4. > > On Fri, 2023-05-12 at 16:31 -0700, Sean Christopherson wrote: >> Use syscore_ops.shutdown to disable hardware virtualization during a >> reboot instead of using the dedicated reboot_notifier so that KVM disables >> virtualization _after_ system_state has been updated. This will allow >> fixing a race in KVM's handling of a forced reboot where KVM can end up >> enabling hardware virtualization between kernel_restart_prepare() and >> machine_restart(). > > The issue is that, AFAICT, the syscore_ops.shutdown are not called when > doing a kexec. Reboot notifiers are called across kexec via: > > kernel_kexec > kernel_restart_prepare > blocking_notifier_call_chain > kvm_reboot > > So after this patch, KVM is not shutdown during kexec; if hardware virt > mode is enabled then the kexec hangs in exactly the same manner as you > describe with the reboot. kernel_restart_prepare calls device_shutdown. Which should call all of the shutdown operations. This has been the way the code has been structured since December 2005. > Some specific shutdown callbacks, for example IOMMU, HPET, IRQ, etc are > called in native_machine_shutdown, but KVM is not one of these. > > Thoughts on possible ways to fix this: > a) go back to reboot notifiers > b) get kexec to call syscore_shutdown() to invoke all of these callbacks > c) Add a KVM-specific callback to native_machine_shutdown(); we only > need this for Intel x86, right? > > My slight preference is towards adding syscore_shutdown() to kexec, but > I'm not sure that's feasible. Adding kexec maintainers for input. Why isn't device_suthdown calling syscore_shutdown? What problem are you running into with your rebase that worked with reboot notifiers that is not working with syscore_shutdown? Eric ^ permalink raw reply [flat|nested] 43+ messages in thread
* [PATCH v2 1/2] KVM: Use syscore_ops instead of reboot_notifier to hook restart/shutdown @ 2023-12-10 4:53 ` Eric W. Biederman 0 siblings, 0 replies; 43+ messages in thread From: Eric W. Biederman @ 2023-12-10 4:53 UTC (permalink / raw) To: kvm-riscv "Gowans, James" <jgowans@amazon.com> writes: > Hi Sean, > > Blast from the past but I've just been bitten by this patch when > rebasing across v6.4. > > On Fri, 2023-05-12 at 16:31 -0700, Sean Christopherson wrote: >> Use syscore_ops.shutdown to disable hardware virtualization during a >> reboot instead of using the dedicated reboot_notifier so that KVM disables >> virtualization _after_ system_state has been updated.? This will allow >> fixing a race in KVM's handling of a forced reboot where KVM can end up >> enabling hardware virtualization between kernel_restart_prepare() and >> machine_restart(). > > The issue is that, AFAICT, the syscore_ops.shutdown are not called when > doing a kexec. Reboot notifiers are called across kexec via: > > kernel_kexec > kernel_restart_prepare > blocking_notifier_call_chain > kvm_reboot > > So after this patch, KVM is not shutdown during kexec; if hardware virt > mode is enabled then the kexec hangs in exactly the same manner as you > describe with the reboot. kernel_restart_prepare calls device_shutdown. Which should call all of the shutdown operations. This has been the way the code has been structured since December 2005. > Some specific shutdown callbacks, for example IOMMU, HPET, IRQ, etc are > called in native_machine_shutdown, but KVM is not one of these. > > Thoughts on possible ways to fix this: > a) go back to reboot notifiers > b) get kexec to call syscore_shutdown() to invoke all of these callbacks > c) Add a KVM-specific callback to native_machine_shutdown(); we only > need this for Intel x86, right? > > My slight preference is towards adding syscore_shutdown() to kexec, but > I'm not sure that's feasible. Adding kexec maintainers for input. Why isn't device_suthdown calling syscore_shutdown? What problem are you running into with your rebase that worked with reboot notifiers that is not working with syscore_shutdown? Eric ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v2 1/2] KVM: Use syscore_ops instead of reboot_notifier to hook restart/shutdown 2023-12-10 4:53 ` Eric W. Biederman (?) @ 2023-12-11 7:54 ` Gowans, James -1 siblings, 0 replies; 43+ messages in thread From: Gowans, James @ 2023-12-11 7:54 UTC (permalink / raw) To: Graf (AWS), Alexander, seanjc@google.com, ebiederm@xmission.com, Schönherr, Jan H. Cc: yuzenghui@huawei.com, kvm-riscv@lists.infradead.org, kexec@lists.infradead.org, james.morse@arm.com, oliver.upton@linux.dev, suzuki.poulose@arm.com, chenhuacai@kernel.org, atishp@atishpatra.org, linux-kernel@vger.kernel.org, maz@kernel.org, pbonzini@redhat.com, kvm@vger.kernel.org, kvmarm@lists.linux.dev, anup@brainfault.org, aleksandar.qemu.devel@gmail.com On Sat, 2023-12-09 at 22:53 -0600, Eric W. Biederman wrote: > > On Fri, 2023-05-12 at 16:31 -0700, Sean Christopherson wrote: > > > Use syscore_ops.shutdown to disable hardware virtualization during a > > > reboot instead of using the dedicated reboot_notifier so that KVM disables > > > virtualization _after_ system_state has been updated. This will allow > > > fixing a race in KVM's handling of a forced reboot where KVM can end up > > > enabling hardware virtualization between kernel_restart_prepare() and > > > machine_restart(). > > > > The issue is that, AFAICT, the syscore_ops.shutdown are not called when > > doing a kexec. Reboot notifiers are called across kexec via: > > > > kernel_kexec > > kernel_restart_prepare > > blocking_notifier_call_chain > > kvm_reboot > > > > So after this patch, KVM is not shutdown during kexec; if hardware virt > > mode is enabled then the kexec hangs in exactly the same manner as you > > describe with the reboot. > > kernel_restart_prepare calls device_shutdown. Which should call all > of the shutdown operations. This has been the way the code has been > structured since December 2005. Yes, kernel_reset_prepare calls device_shutdown which calls dev->driver->shutdown for each dev which has a driver. KVM, however, is not a dev with a driver, hence doesn't have a dev->driver->shutdown callback. So KVM is no-op'ed in device_shutdown. KVM adds its shutdown callback to syscore_ops and expects to be shut down that way. > > > Some specific shutdown callbacks, for example IOMMU, HPET, IRQ, etc are > > called in native_machine_shutdown, but KVM is not one of these. > > > > Thoughts on possible ways to fix this: > > a) go back to reboot notifiers > > b) get kexec to call syscore_shutdown() to invoke all of these callbacks > > c) Add a KVM-specific callback to native_machine_shutdown(); we only > > need this for Intel x86, right? > > > > My slight preference is towards adding syscore_shutdown() to kexec, but > > I'm not sure that's feasible. Adding kexec maintainers for input. > > Why isn't device_suthdown calling syscore_shutdown? I'm not sure - that would indeed be one way to fix; adding a call to syscore_shutdown() inside device_shutdown. We may need to clean up other callers to not do their own syscore_shutdown and rather depend on device_shutdown to do it all. Would you support adding syscore_shutdown() to device_shutdown()? > > What problem are you running into with your rebase that worked with > reboot notifiers that is not working with syscore_shutdown? Prior to this commit [1] which changed KVM from reboot notifiers to syscore_ops, KVM's reboot notifier shutdown callback was invoked on kexec via kernel_restart_prepare. After this commit, KVM is not being shut down because currently the kexec flow does not call syscore_shutdown. JG [1] https://github.com/torvalds/linux/commit/6735150b6997 _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v2 1/2] KVM: Use syscore_ops instead of reboot_notifier to hook restart/shutdown @ 2023-12-11 7:54 ` Gowans, James 0 siblings, 0 replies; 43+ messages in thread From: Gowans, James @ 2023-12-11 7:54 UTC (permalink / raw) To: Graf (AWS), Alexander, seanjc@google.com, ebiederm@xmission.com, Schönherr, Jan H. Cc: yuzenghui@huawei.com, kvm-riscv@lists.infradead.org, kexec@lists.infradead.org, james.morse@arm.com, oliver.upton@linux.dev, suzuki.poulose@arm.com, chenhuacai@kernel.org, atishp@atishpatra.org, linux-kernel@vger.kernel.org, maz@kernel.org, pbonzini@redhat.com, kvm@vger.kernel.org, kvmarm@lists.linux.dev, anup@brainfault.org, aleksandar.qemu.devel@gmail.com On Sat, 2023-12-09 at 22:53 -0600, Eric W. Biederman wrote: > > On Fri, 2023-05-12 at 16:31 -0700, Sean Christopherson wrote: > > > Use syscore_ops.shutdown to disable hardware virtualization during a > > > reboot instead of using the dedicated reboot_notifier so that KVM disables > > > virtualization _after_ system_state has been updated. This will allow > > > fixing a race in KVM's handling of a forced reboot where KVM can end up > > > enabling hardware virtualization between kernel_restart_prepare() and > > > machine_restart(). > > > > The issue is that, AFAICT, the syscore_ops.shutdown are not called when > > doing a kexec. Reboot notifiers are called across kexec via: > > > > kernel_kexec > > kernel_restart_prepare > > blocking_notifier_call_chain > > kvm_reboot > > > > So after this patch, KVM is not shutdown during kexec; if hardware virt > > mode is enabled then the kexec hangs in exactly the same manner as you > > describe with the reboot. > > kernel_restart_prepare calls device_shutdown. Which should call all > of the shutdown operations. This has been the way the code has been > structured since December 2005. Yes, kernel_reset_prepare calls device_shutdown which calls dev->driver->shutdown for each dev which has a driver. KVM, however, is not a dev with a driver, hence doesn't have a dev->driver->shutdown callback. So KVM is no-op'ed in device_shutdown. KVM adds its shutdown callback to syscore_ops and expects to be shut down that way. > > > Some specific shutdown callbacks, for example IOMMU, HPET, IRQ, etc are > > called in native_machine_shutdown, but KVM is not one of these. > > > > Thoughts on possible ways to fix this: > > a) go back to reboot notifiers > > b) get kexec to call syscore_shutdown() to invoke all of these callbacks > > c) Add a KVM-specific callback to native_machine_shutdown(); we only > > need this for Intel x86, right? > > > > My slight preference is towards adding syscore_shutdown() to kexec, but > > I'm not sure that's feasible. Adding kexec maintainers for input. > > Why isn't device_suthdown calling syscore_shutdown? I'm not sure - that would indeed be one way to fix; adding a call to syscore_shutdown() inside device_shutdown. We may need to clean up other callers to not do their own syscore_shutdown and rather depend on device_shutdown to do it all. Would you support adding syscore_shutdown() to device_shutdown()? > > What problem are you running into with your rebase that worked with > reboot notifiers that is not working with syscore_shutdown? Prior to this commit [1] which changed KVM from reboot notifiers to syscore_ops, KVM's reboot notifier shutdown callback was invoked on kexec via kernel_restart_prepare. After this commit, KVM is not being shut down because currently the kexec flow does not call syscore_shutdown. JG [1] https://github.com/torvalds/linux/commit/6735150b6997 ^ permalink raw reply [flat|nested] 43+ messages in thread
* [PATCH v2 1/2] KVM: Use syscore_ops instead of reboot_notifier to hook restart/shutdown @ 2023-12-11 7:54 ` Gowans, James 0 siblings, 0 replies; 43+ messages in thread From: Gowans, James @ 2023-12-11 7:54 UTC (permalink / raw) To: kvm-riscv On Sat, 2023-12-09 at 22:53 -0600, Eric W. Biederman wrote: > > On Fri, 2023-05-12 at 16:31 -0700, Sean Christopherson wrote: > > > Use syscore_ops.shutdown to disable hardware virtualization during a > > > reboot instead of using the dedicated reboot_notifier so that KVM disables > > > virtualization _after_ system_state has been updated.? This will allow > > > fixing a race in KVM's handling of a forced reboot where KVM can end up > > > enabling hardware virtualization between kernel_restart_prepare() and > > > machine_restart(). > > > > The issue is that, AFAICT, the syscore_ops.shutdown are not called when > > doing a kexec. Reboot notifiers are called across kexec via: > > > > kernel_kexec > > ?? kernel_restart_prepare > > ???? blocking_notifier_call_chain > > ?????? kvm_reboot > > > > So after this patch, KVM is not shutdown during kexec; if hardware virt > > mode is enabled then the kexec hangs in exactly the same manner as you > > describe with the reboot. > > kernel_restart_prepare calls device_shutdown.? Which should call all > of the shutdown operations.? This has been the way the code has been > structured since December 2005. Yes, kernel_reset_prepare calls device_shutdown which calls dev->driver->shutdown for each dev which has a driver. KVM, however, is not a dev with a driver, hence doesn't have a dev->driver->shutdown callback. So KVM is no-op'ed in device_shutdown. KVM adds its shutdown callback to syscore_ops and expects to be shut down that way. > > > Some specific shutdown callbacks, for example IOMMU, HPET, IRQ, etc are > > called in native_machine_shutdown, but KVM is not one of these. > > > > Thoughts on possible ways to fix this: > > a) go back to reboot notifiers > > b) get kexec to call syscore_shutdown() to invoke all of these callbacks > > c) Add a KVM-specific callback to native_machine_shutdown(); we only > > need this for Intel x86, right? > > > > My slight preference is towards adding syscore_shutdown() to kexec, but > > I'm not sure that's feasible. Adding kexec maintainers for input. > > Why isn't device_suthdown calling syscore_shutdown? I'm not sure - that would indeed be one way to fix; adding a call to syscore_shutdown() inside device_shutdown. We may need to clean up other callers to not do their own syscore_shutdown and rather depend on device_shutdown to do it all. Would you support adding syscore_shutdown() to device_shutdown()? > > What problem are you running into with your rebase that worked with > reboot notifiers that is not working with syscore_shutdown? Prior to this commit [1] which changed KVM from reboot notifiers to syscore_ops, KVM's reboot notifier shutdown callback was invoked on kexec via kernel_restart_prepare. After this commit, KVM is not being shut down because currently the kexec flow does not call syscore_shutdown. JG [1] https://github.com/torvalds/linux/commit/6735150b6997 ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v2 1/2] KVM: Use syscore_ops instead of reboot_notifier to hook restart/shutdown 2023-12-11 7:54 ` Gowans, James (?) @ 2023-12-11 10:27 ` Gowans, James -1 siblings, 0 replies; 43+ messages in thread From: Gowans, James @ 2023-12-11 10:27 UTC (permalink / raw) To: Graf (AWS), Alexander, seanjc@google.com, ebiederm@xmission.com, Schönherr, Jan H. Cc: yuzenghui@huawei.com, kvm-riscv@lists.infradead.org, kexec@lists.infradead.org, james.morse@arm.com, oliver.upton@linux.dev, suzuki.poulose@arm.com, chenhuacai@kernel.org, atishp@atishpatra.org, linux-kernel@vger.kernel.org, maz@kernel.org, pbonzini@redhat.com, kvm@vger.kernel.org, kvmarm@lists.linux.dev, anup@brainfault.org, aleksandar.qemu.devel@gmail.com On Mon, 2023-12-11 at 09:54 +0200, James Gowans wrote: > > > > What problem are you running into with your rebase that worked with > > reboot notifiers that is not working with syscore_shutdown? > > Prior to this commit [1] which changed KVM from reboot notifiers to > syscore_ops, KVM's reboot notifier shutdown callback was invoked on > kexec via kernel_restart_prepare. > > After this commit, KVM is not being shut down because currently the > kexec flow does not call syscore_shutdown. I think I missed what you're asking here; you're asking for a reproducer for the specific failure? 1. Launch a QEMU VM with -enable-kvm flag 2. Do an immediate (-f flag) kexec: kexec -f --reuse-cmdline ./bzImage Somewhere after doing the RET to new kernel in the relocate_kernel asm function the new kernel starts triple faulting; I can't exactly figure out where but I think it has to do with the new kernel trying to modify CR3 while the VMXE bit is still set in CR4 causing the triple fault. If KVM has been shut down via the shutdown callback, or alternatively if the QEMU process has actually been killed first (by not doing a -f exec) then the VMXE bit is clear and the kexec goes smoothly. So, TL;DR: kexec -f use to work with a KVM VM active, now it goes into a triple fault crash. JG _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v2 1/2] KVM: Use syscore_ops instead of reboot_notifier to hook restart/shutdown @ 2023-12-11 10:27 ` Gowans, James 0 siblings, 0 replies; 43+ messages in thread From: Gowans, James @ 2023-12-11 10:27 UTC (permalink / raw) To: Graf (AWS), Alexander, seanjc@google.com, ebiederm@xmission.com, Schönherr, Jan H. Cc: yuzenghui@huawei.com, kvm-riscv@lists.infradead.org, kexec@lists.infradead.org, james.morse@arm.com, oliver.upton@linux.dev, suzuki.poulose@arm.com, chenhuacai@kernel.org, atishp@atishpatra.org, linux-kernel@vger.kernel.org, maz@kernel.org, pbonzini@redhat.com, kvm@vger.kernel.org, kvmarm@lists.linux.dev, anup@brainfault.org, aleksandar.qemu.devel@gmail.com On Mon, 2023-12-11 at 09:54 +0200, James Gowans wrote: > > > > What problem are you running into with your rebase that worked with > > reboot notifiers that is not working with syscore_shutdown? > > Prior to this commit [1] which changed KVM from reboot notifiers to > syscore_ops, KVM's reboot notifier shutdown callback was invoked on > kexec via kernel_restart_prepare. > > After this commit, KVM is not being shut down because currently the > kexec flow does not call syscore_shutdown. I think I missed what you're asking here; you're asking for a reproducer for the specific failure? 1. Launch a QEMU VM with -enable-kvm flag 2. Do an immediate (-f flag) kexec: kexec -f --reuse-cmdline ./bzImage Somewhere after doing the RET to new kernel in the relocate_kernel asm function the new kernel starts triple faulting; I can't exactly figure out where but I think it has to do with the new kernel trying to modify CR3 while the VMXE bit is still set in CR4 causing the triple fault. If KVM has been shut down via the shutdown callback, or alternatively if the QEMU process has actually been killed first (by not doing a -f exec) then the VMXE bit is clear and the kexec goes smoothly. So, TL;DR: kexec -f use to work with a KVM VM active, now it goes into a triple fault crash. JG ^ permalink raw reply [flat|nested] 43+ messages in thread
* [PATCH v2 1/2] KVM: Use syscore_ops instead of reboot_notifier to hook restart/shutdown @ 2023-12-11 10:27 ` Gowans, James 0 siblings, 0 replies; 43+ messages in thread From: Gowans, James @ 2023-12-11 10:27 UTC (permalink / raw) To: kvm-riscv On Mon, 2023-12-11 at 09:54 +0200, James Gowans wrote: > > > > What problem are you running into with your rebase that worked with > > reboot notifiers that is not working with syscore_shutdown? > > Prior to this commit [1] which changed KVM from reboot notifiers to > syscore_ops, KVM's reboot notifier shutdown callback was invoked on > kexec via kernel_restart_prepare. > > After this commit, KVM is not being shut down because currently the > kexec flow does not call syscore_shutdown. I think I missed what you're asking here; you're asking for a reproducer for the specific failure? 1. Launch a QEMU VM with -enable-kvm flag 2. Do an immediate (-f flag) kexec: kexec -f --reuse-cmdline ./bzImage Somewhere after doing the RET to new kernel in the relocate_kernel asm function the new kernel starts triple faulting; I can't exactly figure out where but I think it has to do with the new kernel trying to modify CR3 while the VMXE bit is still set in CR4 causing the triple fault. If KVM has been shut down via the shutdown callback, or alternatively if the QEMU process has actually been killed first (by not doing a -f exec) then the VMXE bit is clear and the kexec goes smoothly. So, TL;DR: kexec -f use to work with a KVM VM active, now it goes into a triple fault crash. JG ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v2 1/2] KVM: Use syscore_ops instead of reboot_notifier to hook restart/shutdown 2023-12-11 10:27 ` Gowans, James (?) @ 2023-12-11 23:50 ` Eric W. Biederman -1 siblings, 0 replies; 43+ messages in thread From: Eric W. Biederman @ 2023-12-11 23:50 UTC (permalink / raw) To: Gowans, James Cc: Graf (AWS), Alexander, seanjc@google.com, Schönherr, Jan H., yuzenghui@huawei.com, kvm-riscv@lists.infradead.org, kexec@lists.infradead.org, james.morse@arm.com, oliver.upton@linux.dev, suzuki.poulose@arm.com, chenhuacai@kernel.org, atishp@atishpatra.org, linux-kernel@vger.kernel.org, maz@kernel.org, pbonzini@redhat.com, kvm@vger.kernel.org, kvmarm@lists.linux.dev, anup@brainfault.org, aleksandar.qemu.devel@gmail.com "Gowans, James" <jgowans@amazon.com> writes: > On Mon, 2023-12-11 at 09:54 +0200, James Gowans wrote: >> > >> > What problem are you running into with your rebase that worked with >> > reboot notifiers that is not working with syscore_shutdown? >> >> Prior to this commit [1] which changed KVM from reboot notifiers to >> syscore_ops, KVM's reboot notifier shutdown callback was invoked on >> kexec via kernel_restart_prepare. >> >> After this commit, KVM is not being shut down because currently the >> kexec flow does not call syscore_shutdown. > > I think I missed what you're asking here; you're asking for a reproducer > for the specific failure? > > 1. Launch a QEMU VM with -enable-kvm flag > > 2. Do an immediate (-f flag) kexec: > kexec -f --reuse-cmdline ./bzImage > > Somewhere after doing the RET to new kernel in the relocate_kernel asm > function the new kernel starts triple faulting; I can't exactly figure > out where but I think it has to do with the new kernel trying to modify > CR3 while the VMXE bit is still set in CR4 causing the triple fault. > > If KVM has been shut down via the shutdown callback, or alternatively if > the QEMU process has actually been killed first (by not doing a -f exec) > then the VMXE bit is clear and the kexec goes smoothly. > > So, TL;DR: kexec -f use to work with a KVM VM active, now it goes into a > triple fault crash. You mentioned I rebase so I thought your were backporting kernel patches. By rebase do you mean you porting your userspace to a newer kernel? In any event I believe the bug with respect to kexec was introduced in commit 6f389a8f1dd2 ("PM / reboot: call syscore_shutdown() after disable_nonboot_cpus()"). That is where syscore_shutdown was removed from kernel_restart_prepare(). At this point it looks like someone just needs to add the missing syscore_shutdown call into kernel_kexec() right after migrate_to_reboot_cpu() is called. That said I am not seeing the reboot notifiers being called on the kexec path either so your issue with kvm might be deeper. Eric _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v2 1/2] KVM: Use syscore_ops instead of reboot_notifier to hook restart/shutdown @ 2023-12-11 23:50 ` Eric W. Biederman 0 siblings, 0 replies; 43+ messages in thread From: Eric W. Biederman @ 2023-12-11 23:50 UTC (permalink / raw) To: Gowans, James Cc: Graf (AWS), Alexander, seanjc@google.com, Schönherr, Jan H., yuzenghui@huawei.com, kvm-riscv@lists.infradead.org, kexec@lists.infradead.org, james.morse@arm.com, oliver.upton@linux.dev, suzuki.poulose@arm.com, chenhuacai@kernel.org, atishp@atishpatra.org, linux-kernel@vger.kernel.org, maz@kernel.org, pbonzini@redhat.com, kvm@vger.kernel.org, kvmarm@lists.linux.dev, anup@brainfault.org, aleksandar.qemu.devel@gmail.com "Gowans, James" <jgowans@amazon.com> writes: > On Mon, 2023-12-11 at 09:54 +0200, James Gowans wrote: >> > >> > What problem are you running into with your rebase that worked with >> > reboot notifiers that is not working with syscore_shutdown? >> >> Prior to this commit [1] which changed KVM from reboot notifiers to >> syscore_ops, KVM's reboot notifier shutdown callback was invoked on >> kexec via kernel_restart_prepare. >> >> After this commit, KVM is not being shut down because currently the >> kexec flow does not call syscore_shutdown. > > I think I missed what you're asking here; you're asking for a reproducer > for the specific failure? > > 1. Launch a QEMU VM with -enable-kvm flag > > 2. Do an immediate (-f flag) kexec: > kexec -f --reuse-cmdline ./bzImage > > Somewhere after doing the RET to new kernel in the relocate_kernel asm > function the new kernel starts triple faulting; I can't exactly figure > out where but I think it has to do with the new kernel trying to modify > CR3 while the VMXE bit is still set in CR4 causing the triple fault. > > If KVM has been shut down via the shutdown callback, or alternatively if > the QEMU process has actually been killed first (by not doing a -f exec) > then the VMXE bit is clear and the kexec goes smoothly. > > So, TL;DR: kexec -f use to work with a KVM VM active, now it goes into a > triple fault crash. You mentioned I rebase so I thought your were backporting kernel patches. By rebase do you mean you porting your userspace to a newer kernel? In any event I believe the bug with respect to kexec was introduced in commit 6f389a8f1dd2 ("PM / reboot: call syscore_shutdown() after disable_nonboot_cpus()"). That is where syscore_shutdown was removed from kernel_restart_prepare(). At this point it looks like someone just needs to add the missing syscore_shutdown call into kernel_kexec() right after migrate_to_reboot_cpu() is called. That said I am not seeing the reboot notifiers being called on the kexec path either so your issue with kvm might be deeper. Eric ^ permalink raw reply [flat|nested] 43+ messages in thread
* [PATCH v2 1/2] KVM: Use syscore_ops instead of reboot_notifier to hook restart/shutdown @ 2023-12-11 23:50 ` Eric W. Biederman 0 siblings, 0 replies; 43+ messages in thread From: Eric W. Biederman @ 2023-12-11 23:50 UTC (permalink / raw) To: kvm-riscv "Gowans, James" <jgowans@amazon.com> writes: > On Mon, 2023-12-11 at 09:54 +0200, James Gowans wrote: >> > >> > What problem are you running into with your rebase that worked with >> > reboot notifiers that is not working with syscore_shutdown? >> >> Prior to this commit [1] which changed KVM from reboot notifiers to >> syscore_ops, KVM's reboot notifier shutdown callback was invoked on >> kexec via kernel_restart_prepare. >> >> After this commit, KVM is not being shut down because currently the >> kexec flow does not call syscore_shutdown. > > I think I missed what you're asking here; you're asking for a reproducer > for the specific failure? > > 1. Launch a QEMU VM with -enable-kvm flag > > 2. Do an immediate (-f flag) kexec: > kexec -f --reuse-cmdline ./bzImage > > Somewhere after doing the RET to new kernel in the relocate_kernel asm > function the new kernel starts triple faulting; I can't exactly figure > out where but I think it has to do with the new kernel trying to modify > CR3 while the VMXE bit is still set in CR4 causing the triple fault. > > If KVM has been shut down via the shutdown callback, or alternatively if > the QEMU process has actually been killed first (by not doing a -f exec) > then the VMXE bit is clear and the kexec goes smoothly. > > So, TL;DR: kexec -f use to work with a KVM VM active, now it goes into a > triple fault crash. You mentioned I rebase so I thought your were backporting kernel patches. By rebase do you mean you porting your userspace to a newer kernel? In any event I believe the bug with respect to kexec was introduced in commit 6f389a8f1dd2 ("PM / reboot: call syscore_shutdown() after disable_nonboot_cpus()"). That is where syscore_shutdown was removed from kernel_restart_prepare(). At this point it looks like someone just needs to add the missing syscore_shutdown call into kernel_kexec() right after migrate_to_reboot_cpu() is called. That said I am not seeing the reboot notifiers being called on the kexec path either so your issue with kvm might be deeper. Eric ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v2 1/2] KVM: Use syscore_ops instead of reboot_notifier to hook restart/shutdown 2023-12-11 23:50 ` Eric W. Biederman (?) @ 2023-12-12 8:50 ` Gowans, James -1 siblings, 0 replies; 43+ messages in thread From: Gowans, James @ 2023-12-12 8:50 UTC (permalink / raw) To: Graf (AWS), Alexander, seanjc@google.com, ebiederm@xmission.com, Schönherr, Jan H. Cc: yuzenghui@huawei.com, kexec@lists.infradead.org, kvm-riscv@lists.infradead.org, james.morse@arm.com, oliver.upton@linux.dev, suzuki.poulose@arm.com, chenhuacai@kernel.org, atishp@atishpatra.org, linux-kernel@vger.kernel.org, maz@kernel.org, pbonzini@redhat.com, kvm@vger.kernel.org, kvmarm@lists.linux.dev, aleksandar.qemu.devel@gmail.com, anup@brainfault.org On Mon, 2023-12-11 at 17:50 -0600, Eric W. Biederman wrote: > "Gowans, James" <jgowans@amazon.com> writes: > > > On Mon, 2023-12-11 at 09:54 +0200, James Gowans wrote: > > > > > > > > What problem are you running into with your rebase that worked with > > > > reboot notifiers that is not working with syscore_shutdown? > > > > > > Prior to this commit [1] which changed KVM from reboot notifiers to > > > syscore_ops, KVM's reboot notifier shutdown callback was invoked on > > > kexec via kernel_restart_prepare. > > > > > > After this commit, KVM is not being shut down because currently the > > > kexec flow does not call syscore_shutdown. > > > > I think I missed what you're asking here; you're asking for a reproducer > > for the specific failure? > > > > 1. Launch a QEMU VM with -enable-kvm flag > > > > 2. Do an immediate (-f flag) kexec: > > kexec -f --reuse-cmdline ./bzImage > > > > Somewhere after doing the RET to new kernel in the relocate_kernel asm > > function the new kernel starts triple faulting; I can't exactly figure > > out where but I think it has to do with the new kernel trying to modify > > CR3 while the VMXE bit is still set in CR4 causing the triple fault. > > > > If KVM has been shut down via the shutdown callback, or alternatively if > > the QEMU process has actually been killed first (by not doing a -f exec) > > then the VMXE bit is clear and the kexec goes smoothly. > > > > So, TL;DR: kexec -f use to work with a KVM VM active, now it goes into a > > triple fault crash. > > You mentioned I rebase so I thought your were backporting kernel patches. > By rebase do you mean you porting your userspace to a newer kernel? I've been working on some patches and when I rebased my work-in-progress patches to latest master then kexec stopped working when KVM VMs exist. Originally the WIP patches were based on an older stable version. > > In any event I believe the bug with respect to kexec was introduced in > commit 6f389a8f1dd2 ("PM / reboot: call syscore_shutdown() after > disable_nonboot_cpus()"). That is where syscore_shutdown was removed > from kernel_restart_prepare(). > > At this point it looks like someone just needs to add the missing > syscore_shutdown call into kernel_kexec() right after > migrate_to_reboot_cpu() is called. Seems good and I'm happy to do that; one thing we need to check first: are all CPUs online at that point? The commit message for 6f389a8f1dd2 ("PM / reboot: call syscore_shutdown() after disable_nonboot_cpus()") speaks about: "one CPU on-line and interrupts disabled" when syscore_shutdown is called. KVM's syscore shutdown hook does: on_each_cpu(hardware_disable_nolock, NULL, 1); ... so that smells to me like it wants all the CPUs to be online at kvm_shutdown point. It's not clear to me: 1. Does hardware_disable_nolock actually need to be done on *every* CPU or would the offlined ones be fine to ignore because they will be reset and the VMXE bit will be cleared that way? With cooperative CPU handover we probably do indeed want to do this on every CPU and not depend on resetting. 2. Are CPUs actually offline at this point? When that commit was authored there used to be a call to hardware_disable_nolock() but that's not there anymore. > > That said I am not seeing the reboot notifiers being called on the kexec > path either so your issue with kvm might be deeper. Previously it was called via: kernel_kexec kernel_restart_prepare blocking_notifier_call_chain(&reboot_notifier_list, SYS_RESTART, cmd); kvm_shutdown JG _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v2 1/2] KVM: Use syscore_ops instead of reboot_notifier to hook restart/shutdown @ 2023-12-12 8:50 ` Gowans, James 0 siblings, 0 replies; 43+ messages in thread From: Gowans, James @ 2023-12-12 8:50 UTC (permalink / raw) To: Graf (AWS), Alexander, seanjc@google.com, ebiederm@xmission.com, Schönherr, Jan H. Cc: yuzenghui@huawei.com, kexec@lists.infradead.org, kvm-riscv@lists.infradead.org, james.morse@arm.com, oliver.upton@linux.dev, suzuki.poulose@arm.com, chenhuacai@kernel.org, atishp@atishpatra.org, linux-kernel@vger.kernel.org, maz@kernel.org, pbonzini@redhat.com, kvm@vger.kernel.org, kvmarm@lists.linux.dev, aleksandar.qemu.devel@gmail.com, anup@brainfault.org On Mon, 2023-12-11 at 17:50 -0600, Eric W. Biederman wrote: > "Gowans, James" <jgowans@amazon.com> writes: > > > On Mon, 2023-12-11 at 09:54 +0200, James Gowans wrote: > > > > > > > > What problem are you running into with your rebase that worked with > > > > reboot notifiers that is not working with syscore_shutdown? > > > > > > Prior to this commit [1] which changed KVM from reboot notifiers to > > > syscore_ops, KVM's reboot notifier shutdown callback was invoked on > > > kexec via kernel_restart_prepare. > > > > > > After this commit, KVM is not being shut down because currently the > > > kexec flow does not call syscore_shutdown. > > > > I think I missed what you're asking here; you're asking for a reproducer > > for the specific failure? > > > > 1. Launch a QEMU VM with -enable-kvm flag > > > > 2. Do an immediate (-f flag) kexec: > > kexec -f --reuse-cmdline ./bzImage > > > > Somewhere after doing the RET to new kernel in the relocate_kernel asm > > function the new kernel starts triple faulting; I can't exactly figure > > out where but I think it has to do with the new kernel trying to modify > > CR3 while the VMXE bit is still set in CR4 causing the triple fault. > > > > If KVM has been shut down via the shutdown callback, or alternatively if > > the QEMU process has actually been killed first (by not doing a -f exec) > > then the VMXE bit is clear and the kexec goes smoothly. > > > > So, TL;DR: kexec -f use to work with a KVM VM active, now it goes into a > > triple fault crash. > > You mentioned I rebase so I thought your were backporting kernel patches. > By rebase do you mean you porting your userspace to a newer kernel? I've been working on some patches and when I rebased my work-in-progress patches to latest master then kexec stopped working when KVM VMs exist. Originally the WIP patches were based on an older stable version. > > In any event I believe the bug with respect to kexec was introduced in > commit 6f389a8f1dd2 ("PM / reboot: call syscore_shutdown() after > disable_nonboot_cpus()"). That is where syscore_shutdown was removed > from kernel_restart_prepare(). > > At this point it looks like someone just needs to add the missing > syscore_shutdown call into kernel_kexec() right after > migrate_to_reboot_cpu() is called. Seems good and I'm happy to do that; one thing we need to check first: are all CPUs online at that point? The commit message for 6f389a8f1dd2 ("PM / reboot: call syscore_shutdown() after disable_nonboot_cpus()") speaks about: "one CPU on-line and interrupts disabled" when syscore_shutdown is called. KVM's syscore shutdown hook does: on_each_cpu(hardware_disable_nolock, NULL, 1); ... so that smells to me like it wants all the CPUs to be online at kvm_shutdown point. It's not clear to me: 1. Does hardware_disable_nolock actually need to be done on *every* CPU or would the offlined ones be fine to ignore because they will be reset and the VMXE bit will be cleared that way? With cooperative CPU handover we probably do indeed want to do this on every CPU and not depend on resetting. 2. Are CPUs actually offline at this point? When that commit was authored there used to be a call to hardware_disable_nolock() but that's not there anymore. > > That said I am not seeing the reboot notifiers being called on the kexec > path either so your issue with kvm might be deeper. Previously it was called via: kernel_kexec kernel_restart_prepare blocking_notifier_call_chain(&reboot_notifier_list, SYS_RESTART, cmd); kvm_shutdown JG ^ permalink raw reply [flat|nested] 43+ messages in thread
* [PATCH v2 1/2] KVM: Use syscore_ops instead of reboot_notifier to hook restart/shutdown @ 2023-12-12 8:50 ` Gowans, James 0 siblings, 0 replies; 43+ messages in thread From: Gowans, James @ 2023-12-12 8:50 UTC (permalink / raw) To: kvm-riscv On Mon, 2023-12-11 at 17:50 -0600, Eric W. Biederman wrote: > "Gowans, James" <jgowans@amazon.com> writes: > > > On Mon, 2023-12-11 at 09:54 +0200, James Gowans wrote: > > > > > > > > What problem are you running into with your rebase that worked with > > > > reboot notifiers that is not working with syscore_shutdown? > > > > > > Prior to this commit [1] which changed KVM from reboot notifiers to > > > syscore_ops, KVM's reboot notifier shutdown callback was invoked on > > > kexec via kernel_restart_prepare. > > > > > > After this commit, KVM is not being shut down because currently the > > > kexec flow does not call syscore_shutdown. > > > > I think I missed what you're asking here; you're asking for a reproducer > > for the specific failure? > > > > 1. Launch a QEMU VM with -enable-kvm flag > > > > 2. Do an immediate (-f flag) kexec: > > kexec -f --reuse-cmdline ./bzImage > > > > Somewhere after doing the RET to new kernel in the relocate_kernel asm > > function the new kernel starts triple faulting; I can't exactly figure > > out where but I think it has to do with the new kernel trying to modify > > CR3 while the VMXE bit is still set in CR4 causing the triple fault. > > > > If KVM has been shut down via the shutdown callback, or alternatively if > > the QEMU process has actually been killed first (by not doing a -f exec) > > then the VMXE bit is clear and the kexec goes smoothly. > > > > So, TL;DR: kexec -f use to work with a KVM VM active, now it goes into a > > triple fault crash. > > You mentioned I rebase so I thought your were backporting kernel patches. > By rebase do you mean you porting your userspace to a newer kernel? I've been working on some patches and when I rebased my work-in-progress patches to latest master then kexec stopped working when KVM VMs exist. Originally the WIP patches were based on an older stable version. > > In any event I believe the bug with respect to kexec was introduced in > commit 6f389a8f1dd2 ("PM / reboot: call syscore_shutdown() after > disable_nonboot_cpus()"). That is where syscore_shutdown was removed > from kernel_restart_prepare(). > > At this point it looks like someone just needs to add the missing > syscore_shutdown call into kernel_kexec() right after > migrate_to_reboot_cpu() is called. Seems good and I'm happy to do that; one thing we need to check first: are all CPUs online at that point? The commit message for 6f389a8f1dd2 ("PM / reboot: call syscore_shutdown() after disable_nonboot_cpus()") speaks about: "one CPU on-line and interrupts disabled" when syscore_shutdown is called. KVM's syscore shutdown hook does: on_each_cpu(hardware_disable_nolock, NULL, 1); ... so that smells to me like it wants all the CPUs to be online at kvm_shutdown point. It's not clear to me: 1. Does hardware_disable_nolock actually need to be done on *every* CPU or would the offlined ones be fine to ignore because they will be reset and the VMXE bit will be cleared that way? With cooperative CPU handover we probably do indeed want to do this on every CPU and not depend on resetting. 2. Are CPUs actually offline at this point? When that commit was authored there used to be a call to hardware_disable_nolock() but that's not there anymore. > > That said I am not seeing the reboot notifiers being called on the kexec > path either so your issue with kvm might be deeper. Previously it was called via: kernel_kexec kernel_restart_prepare blocking_notifier_call_chain(&reboot_notifier_list, SYS_RESTART, cmd); kvm_shutdown JG ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v2 1/2] KVM: Use syscore_ops instead of reboot_notifier to hook restart/shutdown 2023-12-12 8:50 ` Gowans, James (?) @ 2023-12-12 13:38 ` Paolo Bonzini -1 siblings, 0 replies; 43+ messages in thread From: Paolo Bonzini @ 2023-12-12 13:38 UTC (permalink / raw) To: Gowans, James Cc: Graf (AWS), Alexander, seanjc@google.com, ebiederm@xmission.com, Schönherr, Jan H., yuzenghui@huawei.com, kexec@lists.infradead.org, kvm-riscv@lists.infradead.org, james.morse@arm.com, oliver.upton@linux.dev, suzuki.poulose@arm.com, chenhuacai@kernel.org, atishp@atishpatra.org, linux-kernel@vger.kernel.org, maz@kernel.org, kvm@vger.kernel.org, kvmarm@lists.linux.dev, aleksandar.qemu.devel@gmail.com, anup@brainfault.org On Tue, Dec 12, 2023 at 9:51 AM Gowans, James <jgowans@amazon.com> wrote: > 1. Does hardware_disable_nolock actually need to be done on *every* CPU > or would the offlined ones be fine to ignore because they will be reset > and the VMXE bit will be cleared that way? With cooperative CPU handover > we probably do indeed want to do this on every CPU and not depend on > resetting. Offlined and onlined CPUs are handled via the CPU hotplug state machine, which calls into kvm_online_cpu and kvm_offline_cpu. Paolo _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v2 1/2] KVM: Use syscore_ops instead of reboot_notifier to hook restart/shutdown @ 2023-12-12 13:38 ` Paolo Bonzini 0 siblings, 0 replies; 43+ messages in thread From: Paolo Bonzini @ 2023-12-12 13:38 UTC (permalink / raw) To: Gowans, James Cc: Graf (AWS), Alexander, seanjc@google.com, ebiederm@xmission.com, Schönherr, Jan H., yuzenghui@huawei.com, kexec@lists.infradead.org, kvm-riscv@lists.infradead.org, james.morse@arm.com, oliver.upton@linux.dev, suzuki.poulose@arm.com, chenhuacai@kernel.org, atishp@atishpatra.org, linux-kernel@vger.kernel.org, maz@kernel.org, kvm@vger.kernel.org, kvmarm@lists.linux.dev, aleksandar.qemu.devel@gmail.com, anup@brainfault.org On Tue, Dec 12, 2023 at 9:51 AM Gowans, James <jgowans@amazon.com> wrote: > 1. Does hardware_disable_nolock actually need to be done on *every* CPU > or would the offlined ones be fine to ignore because they will be reset > and the VMXE bit will be cleared that way? With cooperative CPU handover > we probably do indeed want to do this on every CPU and not depend on > resetting. Offlined and onlined CPUs are handled via the CPU hotplug state machine, which calls into kvm_online_cpu and kvm_offline_cpu. Paolo ^ permalink raw reply [flat|nested] 43+ messages in thread
* [PATCH v2 1/2] KVM: Use syscore_ops instead of reboot_notifier to hook restart/shutdown @ 2023-12-12 13:38 ` Paolo Bonzini 0 siblings, 0 replies; 43+ messages in thread From: Paolo Bonzini @ 2023-12-12 13:38 UTC (permalink / raw) To: kvm-riscv On Tue, Dec 12, 2023 at 9:51?AM Gowans, James <jgowans@amazon.com> wrote: > 1. Does hardware_disable_nolock actually need to be done on *every* CPU > or would the offlined ones be fine to ignore because they will be reset > and the VMXE bit will be cleared that way? With cooperative CPU handover > we probably do indeed want to do this on every CPU and not depend on > resetting. Offlined and onlined CPUs are handled via the CPU hotplug state machine, which calls into kvm_online_cpu and kvm_offline_cpu. Paolo ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v2 1/2] KVM: Use syscore_ops instead of reboot_notifier to hook restart/shutdown 2023-12-12 8:50 ` Gowans, James (?) @ 2023-12-13 6:47 ` Gowans, James -1 siblings, 0 replies; 43+ messages in thread From: Gowans, James @ 2023-12-13 6:47 UTC (permalink / raw) To: Graf (AWS), Alexander, seanjc@google.com, ebiederm@xmission.com, Schönherr, Jan H. Cc: yuzenghui@huawei.com, kexec@lists.infradead.org, kvm-riscv@lists.infradead.org, james.morse@arm.com, oliver.upton@linux.dev, suzuki.poulose@arm.com, chenhuacai@kernel.org, atishp@atishpatra.org, linux-kernel@vger.kernel.org, maz@kernel.org, pbonzini@redhat.com, kvm@vger.kernel.org, kvmarm@lists.linux.dev, aleksandar.qemu.devel@gmail.com, anup@brainfault.org On Tue, 2023-12-12 at 10:50 +0200, James Gowans wrote: > > > > In any event I believe the bug with respect to kexec was introduced in > > commit 6f389a8f1dd2 ("PM / reboot: call syscore_shutdown() after > > disable_nonboot_cpus()"). That is where syscore_shutdown was removed > > from kernel_restart_prepare(). > > > > At this point it looks like someone just needs to add the missing > > syscore_shutdown call into kernel_kexec() right after > > migrate_to_reboot_cpu() is called. > > Seems good and I'm happy to do that; one thing we need to check first: > are all CPUs online at that point? The commit message for > 6f389a8f1dd2 ("PM / reboot: call syscore_shutdown() after disable_nonboot_cpus()") > speaks about: "one CPU on-line and interrupts disabled" when > syscore_shutdown is called. KVM's syscore shutdown hook does: > > on_each_cpu(hardware_disable_nolock, NULL, 1); > > ... so that smells to me like it wants all the CPUs to be online at > kvm_shutdown point. > > It's not clear to me: > > 1. Does hardware_disable_nolock actually need to be done on *every* CPU > or would the offlined ones be fine to ignore because they will be reset > and the VMXE bit will be cleared that way? With cooperative CPU handover > we probably do indeed want to do this on every CPU and not depend on > resetting. > > 2. Are CPUs actually offline at this point? When that commit was > authored there used to be a call to hardware_disable_nolock() but that's > not there anymore. I've sent out a patch: https://lore.kernel.org/kexec/20231213064004.2419447-1-jgowans@amazon.com/T/#u Let's continue the discussion there. JG _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v2 1/2] KVM: Use syscore_ops instead of reboot_notifier to hook restart/shutdown @ 2023-12-13 6:47 ` Gowans, James 0 siblings, 0 replies; 43+ messages in thread From: Gowans, James @ 2023-12-13 6:47 UTC (permalink / raw) To: Graf (AWS), Alexander, seanjc@google.com, ebiederm@xmission.com, Schönherr, Jan H. Cc: yuzenghui@huawei.com, kexec@lists.infradead.org, kvm-riscv@lists.infradead.org, james.morse@arm.com, oliver.upton@linux.dev, suzuki.poulose@arm.com, chenhuacai@kernel.org, atishp@atishpatra.org, linux-kernel@vger.kernel.org, maz@kernel.org, pbonzini@redhat.com, kvm@vger.kernel.org, kvmarm@lists.linux.dev, aleksandar.qemu.devel@gmail.com, anup@brainfault.org On Tue, 2023-12-12 at 10:50 +0200, James Gowans wrote: > > > > In any event I believe the bug with respect to kexec was introduced in > > commit 6f389a8f1dd2 ("PM / reboot: call syscore_shutdown() after > > disable_nonboot_cpus()"). That is where syscore_shutdown was removed > > from kernel_restart_prepare(). > > > > At this point it looks like someone just needs to add the missing > > syscore_shutdown call into kernel_kexec() right after > > migrate_to_reboot_cpu() is called. > > Seems good and I'm happy to do that; one thing we need to check first: > are all CPUs online at that point? The commit message for > 6f389a8f1dd2 ("PM / reboot: call syscore_shutdown() after disable_nonboot_cpus()") > speaks about: "one CPU on-line and interrupts disabled" when > syscore_shutdown is called. KVM's syscore shutdown hook does: > > on_each_cpu(hardware_disable_nolock, NULL, 1); > > ... so that smells to me like it wants all the CPUs to be online at > kvm_shutdown point. > > It's not clear to me: > > 1. Does hardware_disable_nolock actually need to be done on *every* CPU > or would the offlined ones be fine to ignore because they will be reset > and the VMXE bit will be cleared that way? With cooperative CPU handover > we probably do indeed want to do this on every CPU and not depend on > resetting. > > 2. Are CPUs actually offline at this point? When that commit was > authored there used to be a call to hardware_disable_nolock() but that's > not there anymore. I've sent out a patch: https://lore.kernel.org/kexec/20231213064004.2419447-1-jgowans@amazon.com/T/#u Let's continue the discussion there. JG ^ permalink raw reply [flat|nested] 43+ messages in thread
* [PATCH v2 1/2] KVM: Use syscore_ops instead of reboot_notifier to hook restart/shutdown @ 2023-12-13 6:47 ` Gowans, James 0 siblings, 0 replies; 43+ messages in thread From: Gowans, James @ 2023-12-13 6:47 UTC (permalink / raw) To: kvm-riscv On Tue, 2023-12-12 at 10:50 +0200, James Gowans wrote: > > > > In any event I believe the bug with respect to kexec was introduced in > > commit 6f389a8f1dd2 ("PM / reboot: call syscore_shutdown() after > > disable_nonboot_cpus()").? That is where syscore_shutdown was removed > > from kernel_restart_prepare(). > > > > At this point it looks like someone just needs to add the missing > > syscore_shutdown call into kernel_kexec() right after > > migrate_to_reboot_cpu() is called. > > Seems good and I'm happy to do that; one thing we need to check first: > are all CPUs online at that point? The commit message for > 6f389a8f1dd2 ("PM / reboot: call syscore_shutdown() after disable_nonboot_cpus()") > speaks about: "one CPU on-line and interrupts disabled" when > syscore_shutdown is called. KVM's syscore shutdown hook does: > > on_each_cpu(hardware_disable_nolock, NULL, 1); > > ... so that smells to me like it wants all the CPUs to be online at > kvm_shutdown point. > > It's not clear to me: > > 1. Does hardware_disable_nolock actually need to be done on *every* CPU > or would the offlined ones be fine to ignore because they will be reset > and the VMXE bit will be cleared that way? With cooperative CPU handover > we probably do indeed want to do this on every CPU and not depend on > resetting. > > 2. Are CPUs actually offline at this point? When that commit was > authored there used to be a call to hardware_disable_nolock() but that's > not there anymore. I've sent out a patch: https://lore.kernel.org/kexec/20231213064004.2419447-1-jgowans at amazon.com/T/#u Let's continue the discussion there. JG ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v2 1/2] KVM: Use syscore_ops instead of reboot_notifier to hook restart/shutdown 2023-12-09 7:26 ` Gowans, James (?) @ 2023-12-11 17:34 ` Sean Christopherson -1 siblings, 0 replies; 43+ messages in thread From: Sean Christopherson @ 2023-12-11 17:34 UTC (permalink / raw) To: James Gowans Cc: pbonzini@redhat.com, Alexander Graf, Jan Schönherr, ebiederm@xmission.com, yuzenghui@huawei.com, atishp@atishpatra.org, kvm-riscv@lists.infradead.org, james.morse@arm.com, suzuki.poulose@arm.com, oliver.upton@linux.dev, chenhuacai@kernel.org, linux-kernel@vger.kernel.org, kvmarm@lists.linux.dev, maz@kernel.org, kvm@vger.kernel.org, aleksandar.qemu.devel@gmail.com, anup@brainfault.org, kexec@lists.infradead.org On Sat, Dec 09, 2023, James Gowans wrote: > Hi Sean, > > Blast from the past but I've just been bitten by this patch when > rebasing across v6.4. > > On Fri, 2023-05-12 at 16:31 -0700, Sean Christopherson wrote: > > Use syscore_ops.shutdown to disable hardware virtualization during a > > reboot instead of using the dedicated reboot_notifier so that KVM disables > > virtualization _after_ system_state has been updated. This will allow > > fixing a race in KVM's handling of a forced reboot where KVM can end up > > enabling hardware virtualization between kernel_restart_prepare() and > > machine_restart(). > > The issue is that, AFAICT, the syscore_ops.shutdown are not called when > doing a kexec. Reboot notifiers are called across kexec via: > > kernel_kexec > kernel_restart_prepare > blocking_notifier_call_chain > kvm_reboot > > So after this patch, KVM is not shutdown during kexec; if hardware virt > mode is enabled then the kexec hangs in exactly the same manner as you > describe with the reboot. > > Some specific shutdown callbacks, for example IOMMU, HPET, IRQ, etc are > called in native_machine_shutdown, but KVM is not one of these. > > Thoughts on possible ways to fix this: > a) go back to reboot notifiers > b) get kexec to call syscore_shutdown() to invoke all of these callbacks > c) Add a KVM-specific callback to native_machine_shutdown(); we only > need this for Intel x86, right? I don't like (c). VMX is the most sensitive/problematic, e.g. the whole blocking of INIT thing, but SVM can also run afoul of EFER.SVME being cleared, and KVM really should leave virtualization enabled across kexec(), even if leaving virtualization enabled is relatively benign on other architectures. One more option would be: d) Add another sycore hook, e.g. syscore_kexec() specifically for this path. > My slight preference is towards adding syscore_shutdown() to kexec, but > I'm not sure that's feasible. Adding kexec maintainers for input. In a vacuum, that'd be my preference too. It's the obvious choice IMO, e.g. the kexec_image->preserve_context path does syscore_suspend() (and then resume(), so it's not completely uncharted territory. However, there's a rather big wrinkle in that not all of the existing .shutdown() implementations are obviously ok to call during kexec. Luckily, AFAICT there are very few users of the syscore .shutdown hook, so it's at least feasible to go that route. x86's mce_syscore_shutdown() should be ok, and arguably is correct, e.g. I don't see how leaving #MC reporting enabled across kexec can work. ledtrig_cpu_syscore_shutdown() is also likely ok and arguably correct. The interrupt controllers though? x86 disables IRQs at the very beginning of machine_kexec(), so it's likely fine. But every other architecture? No clue. E.g. PPC's default_machine_kexec() sends IPIs to shutdown other CPUs, though I have no idea if that can run afoul of any of the paths below. arch/powerpc/platforms/cell/spu_base.c .shutdown = spu_shutdown, arch/x86/kernel/cpu/mce/core.c .shutdown = mce_syscore_shutdown, arch/x86/kernel/i8259.c .shutdown = i8259A_shutdown, drivers/irqchip/irq-i8259.c .shutdown = i8259A_shutdown, drivers/irqchip/irq-sun6i-r.c .shutdown = sun6i_r_intc_shutdown, drivers/leds/trigger/ledtrig-cpu.c .shutdown = ledtrig_cpu_syscore_shutdown, drivers/power/reset/sc27xx-poweroff.c .shutdown = sc27xx_poweroff_shutdown, kernel/irq/generic-chip.c .shutdown = irq_gc_shutdown, virt/kvm/kvm_main.c .shutdown = kvm_shutdown, The whole thing is a bit of a mess. E.g. x86 treats machine_shutdown() from kexec pretty much the same as shutdown for reboot, but other architectures have what appear to be unique paths for handling kexec. FWIW, if we want to go with option (b), syscore_shutdown() hooks could use kexec_in_progress to differentiate between "regular" shutdown/reboot and kexec. _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v2 1/2] KVM: Use syscore_ops instead of reboot_notifier to hook restart/shutdown @ 2023-12-11 17:34 ` Sean Christopherson 0 siblings, 0 replies; 43+ messages in thread From: Sean Christopherson @ 2023-12-11 17:34 UTC (permalink / raw) To: James Gowans Cc: pbonzini@redhat.com, Alexander Graf, Jan Schönherr, ebiederm@xmission.com, yuzenghui@huawei.com, atishp@atishpatra.org, kvm-riscv@lists.infradead.org, james.morse@arm.com, suzuki.poulose@arm.com, oliver.upton@linux.dev, chenhuacai@kernel.org, linux-kernel@vger.kernel.org, kvmarm@lists.linux.dev, maz@kernel.org, kvm@vger.kernel.org, aleksandar.qemu.devel@gmail.com, anup@brainfault.org, kexec@lists.infradead.org On Sat, Dec 09, 2023, James Gowans wrote: > Hi Sean, > > Blast from the past but I've just been bitten by this patch when > rebasing across v6.4. > > On Fri, 2023-05-12 at 16:31 -0700, Sean Christopherson wrote: > > Use syscore_ops.shutdown to disable hardware virtualization during a > > reboot instead of using the dedicated reboot_notifier so that KVM disables > > virtualization _after_ system_state has been updated. This will allow > > fixing a race in KVM's handling of a forced reboot where KVM can end up > > enabling hardware virtualization between kernel_restart_prepare() and > > machine_restart(). > > The issue is that, AFAICT, the syscore_ops.shutdown are not called when > doing a kexec. Reboot notifiers are called across kexec via: > > kernel_kexec > kernel_restart_prepare > blocking_notifier_call_chain > kvm_reboot > > So after this patch, KVM is not shutdown during kexec; if hardware virt > mode is enabled then the kexec hangs in exactly the same manner as you > describe with the reboot. > > Some specific shutdown callbacks, for example IOMMU, HPET, IRQ, etc are > called in native_machine_shutdown, but KVM is not one of these. > > Thoughts on possible ways to fix this: > a) go back to reboot notifiers > b) get kexec to call syscore_shutdown() to invoke all of these callbacks > c) Add a KVM-specific callback to native_machine_shutdown(); we only > need this for Intel x86, right? I don't like (c). VMX is the most sensitive/problematic, e.g. the whole blocking of INIT thing, but SVM can also run afoul of EFER.SVME being cleared, and KVM really should leave virtualization enabled across kexec(), even if leaving virtualization enabled is relatively benign on other architectures. One more option would be: d) Add another sycore hook, e.g. syscore_kexec() specifically for this path. > My slight preference is towards adding syscore_shutdown() to kexec, but > I'm not sure that's feasible. Adding kexec maintainers for input. In a vacuum, that'd be my preference too. It's the obvious choice IMO, e.g. the kexec_image->preserve_context path does syscore_suspend() (and then resume(), so it's not completely uncharted territory. However, there's a rather big wrinkle in that not all of the existing .shutdown() implementations are obviously ok to call during kexec. Luckily, AFAICT there are very few users of the syscore .shutdown hook, so it's at least feasible to go that route. x86's mce_syscore_shutdown() should be ok, and arguably is correct, e.g. I don't see how leaving #MC reporting enabled across kexec can work. ledtrig_cpu_syscore_shutdown() is also likely ok and arguably correct. The interrupt controllers though? x86 disables IRQs at the very beginning of machine_kexec(), so it's likely fine. But every other architecture? No clue. E.g. PPC's default_machine_kexec() sends IPIs to shutdown other CPUs, though I have no idea if that can run afoul of any of the paths below. arch/powerpc/platforms/cell/spu_base.c .shutdown = spu_shutdown, arch/x86/kernel/cpu/mce/core.c .shutdown = mce_syscore_shutdown, arch/x86/kernel/i8259.c .shutdown = i8259A_shutdown, drivers/irqchip/irq-i8259.c .shutdown = i8259A_shutdown, drivers/irqchip/irq-sun6i-r.c .shutdown = sun6i_r_intc_shutdown, drivers/leds/trigger/ledtrig-cpu.c .shutdown = ledtrig_cpu_syscore_shutdown, drivers/power/reset/sc27xx-poweroff.c .shutdown = sc27xx_poweroff_shutdown, kernel/irq/generic-chip.c .shutdown = irq_gc_shutdown, virt/kvm/kvm_main.c .shutdown = kvm_shutdown, The whole thing is a bit of a mess. E.g. x86 treats machine_shutdown() from kexec pretty much the same as shutdown for reboot, but other architectures have what appear to be unique paths for handling kexec. FWIW, if we want to go with option (b), syscore_shutdown() hooks could use kexec_in_progress to differentiate between "regular" shutdown/reboot and kexec. ^ permalink raw reply [flat|nested] 43+ messages in thread
* [PATCH v2 1/2] KVM: Use syscore_ops instead of reboot_notifier to hook restart/shutdown @ 2023-12-11 17:34 ` Sean Christopherson 0 siblings, 0 replies; 43+ messages in thread From: Sean Christopherson @ 2023-12-11 17:34 UTC (permalink / raw) To: kvm-riscv On Sat, Dec 09, 2023, James Gowans wrote: > Hi Sean, > > Blast from the past but I've just been bitten by this patch when > rebasing across v6.4. > > On Fri, 2023-05-12 at 16:31 -0700, Sean Christopherson wrote: > > Use syscore_ops.shutdown to disable hardware virtualization during a > > reboot instead of using the dedicated reboot_notifier so that KVM disables > > virtualization _after_ system_state has been updated.? This will allow > > fixing a race in KVM's handling of a forced reboot where KVM can end up > > enabling hardware virtualization between kernel_restart_prepare() and > > machine_restart(). > > The issue is that, AFAICT, the syscore_ops.shutdown are not called when > doing a kexec. Reboot notifiers are called across kexec via: > > kernel_kexec > kernel_restart_prepare > blocking_notifier_call_chain > kvm_reboot > > So after this patch, KVM is not shutdown during kexec; if hardware virt > mode is enabled then the kexec hangs in exactly the same manner as you > describe with the reboot. > > Some specific shutdown callbacks, for example IOMMU, HPET, IRQ, etc are > called in native_machine_shutdown, but KVM is not one of these. > > Thoughts on possible ways to fix this: > a) go back to reboot notifiers > b) get kexec to call syscore_shutdown() to invoke all of these callbacks > c) Add a KVM-specific callback to native_machine_shutdown(); we only > need this for Intel x86, right? I don't like (c). VMX is the most sensitive/problematic, e.g. the whole blocking of INIT thing, but SVM can also run afoul of EFER.SVME being cleared, and KVM really should leave virtualization enabled across kexec(), even if leaving virtualization enabled is relatively benign on other architectures. One more option would be: d) Add another sycore hook, e.g. syscore_kexec() specifically for this path. > My slight preference is towards adding syscore_shutdown() to kexec, but > I'm not sure that's feasible. Adding kexec maintainers for input. In a vacuum, that'd be my preference too. It's the obvious choice IMO, e.g. the kexec_image->preserve_context path does syscore_suspend() (and then resume(), so it's not completely uncharted territory. However, there's a rather big wrinkle in that not all of the existing .shutdown() implementations are obviously ok to call during kexec. Luckily, AFAICT there are very few users of the syscore .shutdown hook, so it's at least feasible to go that route. x86's mce_syscore_shutdown() should be ok, and arguably is correct, e.g. I don't see how leaving #MC reporting enabled across kexec can work. ledtrig_cpu_syscore_shutdown() is also likely ok and arguably correct. The interrupt controllers though? x86 disables IRQs at the very beginning of machine_kexec(), so it's likely fine. But every other architecture? No clue. E.g. PPC's default_machine_kexec() sends IPIs to shutdown other CPUs, though I have no idea if that can run afoul of any of the paths below. arch/powerpc/platforms/cell/spu_base.c .shutdown = spu_shutdown, arch/x86/kernel/cpu/mce/core.c .shutdown = mce_syscore_shutdown, arch/x86/kernel/i8259.c .shutdown = i8259A_shutdown, drivers/irqchip/irq-i8259.c .shutdown = i8259A_shutdown, drivers/irqchip/irq-sun6i-r.c .shutdown = sun6i_r_intc_shutdown, drivers/leds/trigger/ledtrig-cpu.c .shutdown = ledtrig_cpu_syscore_shutdown, drivers/power/reset/sc27xx-poweroff.c .shutdown = sc27xx_poweroff_shutdown, kernel/irq/generic-chip.c .shutdown = irq_gc_shutdown, virt/kvm/kvm_main.c .shutdown = kvm_shutdown, The whole thing is a bit of a mess. E.g. x86 treats machine_shutdown() from kexec pretty much the same as shutdown for reboot, but other architectures have what appear to be unique paths for handling kexec. FWIW, if we want to go with option (b), syscore_shutdown() hooks could use kexec_in_progress to differentiate between "regular" shutdown/reboot and kexec. ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v2 1/2] KVM: Use syscore_ops instead of reboot_notifier to hook restart/shutdown 2023-12-11 17:34 ` Sean Christopherson (?) @ 2023-12-11 17:51 ` Sean Christopherson -1 siblings, 0 replies; 43+ messages in thread From: Sean Christopherson @ 2023-12-11 17:51 UTC (permalink / raw) To: James Gowans Cc: pbonzini@redhat.com, Alexander Graf, Jan Schönherr, ebiederm@xmission.com, yuzenghui@huawei.com, atishp@atishpatra.org, kvm-riscv@lists.infradead.org, james.morse@arm.com, suzuki.poulose@arm.com, oliver.upton@linux.dev, chenhuacai@kernel.org, linux-kernel@vger.kernel.org, kvmarm@lists.linux.dev, maz@kernel.org, kvm@vger.kernel.org, aleksandar.qemu.devel@gmail.com, anup@brainfault.org, kexec@lists.infradead.org On Mon, Dec 11, 2023, Sean Christopherson wrote: > On Sat, Dec 09, 2023, James Gowans wrote: > > Hi Sean, > > > > Blast from the past but I've just been bitten by this patch when > > rebasing across v6.4. > > > > On Fri, 2023-05-12 at 16:31 -0700, Sean Christopherson wrote: > > > Use syscore_ops.shutdown to disable hardware virtualization during a > > > reboot instead of using the dedicated reboot_notifier so that KVM disables > > > virtualization _after_ system_state has been updated. This will allow > > > fixing a race in KVM's handling of a forced reboot where KVM can end up > > > enabling hardware virtualization between kernel_restart_prepare() and > > > machine_restart(). > > > > The issue is that, AFAICT, the syscore_ops.shutdown are not called when > > doing a kexec. Reboot notifiers are called across kexec via: > > > > kernel_kexec > > kernel_restart_prepare > > blocking_notifier_call_chain > > kvm_reboot > > > > So after this patch, KVM is not shutdown during kexec; if hardware virt > > mode is enabled then the kexec hangs in exactly the same manner as you > > describe with the reboot. > > > > Some specific shutdown callbacks, for example IOMMU, HPET, IRQ, etc are > > called in native_machine_shutdown, but KVM is not one of these. > > > > Thoughts on possible ways to fix this: > > a) go back to reboot notifiers > > b) get kexec to call syscore_shutdown() to invoke all of these callbacks > > c) Add a KVM-specific callback to native_machine_shutdown(); we only > > need this for Intel x86, right? > > I don't like (c). VMX is the most sensitive/problematic, e.g. the whole blocking > of INIT thing, but SVM can also run afoul of EFER.SVME being cleared, and KVM really > should leave virtualization enabled across kexec(), even if leaving virtualization *shouldn't* > enabled is relatively benign on other architectures. > > One more option would be: > > d) Add another sycore hook, e.g. syscore_kexec() specifically for this path. > > > My slight preference is towards adding syscore_shutdown() to kexec, but > > I'm not sure that's feasible. Adding kexec maintainers for input. > > In a vacuum, that'd be my preference too. It's the obvious choice IMO, e.g. the > kexec_image->preserve_context path does syscore_suspend() (and then resume(), so > it's not completely uncharted territory. > > However, there's a rather big wrinkle in that not all of the existing .shutdown() > implementations are obviously ok to call during kexec. Luckily, AFAICT there are > very few users of the syscore .shutdown hook, so it's at least feasible to go that > route. > > x86's mce_syscore_shutdown() should be ok, and arguably is correct, e.g. I don't > see how leaving #MC reporting enabled across kexec can work. > > ledtrig_cpu_syscore_shutdown() is also likely ok and arguably correct. > > The interrupt controllers though? x86 disables IRQs at the very beginning of > machine_kexec(), so it's likely fine. But every other architecture? No clue. > E.g. PPC's default_machine_kexec() sends IPIs to shutdown other CPUs, though I > have no idea if that can run afoul of any of the paths below. > > arch/powerpc/platforms/cell/spu_base.c .shutdown = spu_shutdown, > arch/x86/kernel/cpu/mce/core.c .shutdown = mce_syscore_shutdown, > arch/x86/kernel/i8259.c .shutdown = i8259A_shutdown, > drivers/irqchip/irq-i8259.c .shutdown = i8259A_shutdown, > drivers/irqchip/irq-sun6i-r.c .shutdown = sun6i_r_intc_shutdown, > drivers/leds/trigger/ledtrig-cpu.c .shutdown = ledtrig_cpu_syscore_shutdown, > drivers/power/reset/sc27xx-poweroff.c .shutdown = sc27xx_poweroff_shutdown, > kernel/irq/generic-chip.c .shutdown = irq_gc_shutdown, > virt/kvm/kvm_main.c .shutdown = kvm_shutdown, > > The whole thing is a bit of a mess. E.g. x86 treats machine_shutdown() from > kexec pretty much the same as shutdown for reboot, but other architectures have > what appear to be unique paths for handling kexec. > > FWIW, if we want to go with option (b), syscore_shutdown() hooks could use > kexec_in_progress to differentiate between "regular" shutdown/reboot and kexec. _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v2 1/2] KVM: Use syscore_ops instead of reboot_notifier to hook restart/shutdown @ 2023-12-11 17:51 ` Sean Christopherson 0 siblings, 0 replies; 43+ messages in thread From: Sean Christopherson @ 2023-12-11 17:51 UTC (permalink / raw) To: James Gowans Cc: pbonzini@redhat.com, Alexander Graf, Jan Schönherr, ebiederm@xmission.com, yuzenghui@huawei.com, atishp@atishpatra.org, kvm-riscv@lists.infradead.org, james.morse@arm.com, suzuki.poulose@arm.com, oliver.upton@linux.dev, chenhuacai@kernel.org, linux-kernel@vger.kernel.org, kvmarm@lists.linux.dev, maz@kernel.org, kvm@vger.kernel.org, aleksandar.qemu.devel@gmail.com, anup@brainfault.org, kexec@lists.infradead.org On Mon, Dec 11, 2023, Sean Christopherson wrote: > On Sat, Dec 09, 2023, James Gowans wrote: > > Hi Sean, > > > > Blast from the past but I've just been bitten by this patch when > > rebasing across v6.4. > > > > On Fri, 2023-05-12 at 16:31 -0700, Sean Christopherson wrote: > > > Use syscore_ops.shutdown to disable hardware virtualization during a > > > reboot instead of using the dedicated reboot_notifier so that KVM disables > > > virtualization _after_ system_state has been updated. This will allow > > > fixing a race in KVM's handling of a forced reboot where KVM can end up > > > enabling hardware virtualization between kernel_restart_prepare() and > > > machine_restart(). > > > > The issue is that, AFAICT, the syscore_ops.shutdown are not called when > > doing a kexec. Reboot notifiers are called across kexec via: > > > > kernel_kexec > > kernel_restart_prepare > > blocking_notifier_call_chain > > kvm_reboot > > > > So after this patch, KVM is not shutdown during kexec; if hardware virt > > mode is enabled then the kexec hangs in exactly the same manner as you > > describe with the reboot. > > > > Some specific shutdown callbacks, for example IOMMU, HPET, IRQ, etc are > > called in native_machine_shutdown, but KVM is not one of these. > > > > Thoughts on possible ways to fix this: > > a) go back to reboot notifiers > > b) get kexec to call syscore_shutdown() to invoke all of these callbacks > > c) Add a KVM-specific callback to native_machine_shutdown(); we only > > need this for Intel x86, right? > > I don't like (c). VMX is the most sensitive/problematic, e.g. the whole blocking > of INIT thing, but SVM can also run afoul of EFER.SVME being cleared, and KVM really > should leave virtualization enabled across kexec(), even if leaving virtualization *shouldn't* > enabled is relatively benign on other architectures. > > One more option would be: > > d) Add another sycore hook, e.g. syscore_kexec() specifically for this path. > > > My slight preference is towards adding syscore_shutdown() to kexec, but > > I'm not sure that's feasible. Adding kexec maintainers for input. > > In a vacuum, that'd be my preference too. It's the obvious choice IMO, e.g. the > kexec_image->preserve_context path does syscore_suspend() (and then resume(), so > it's not completely uncharted territory. > > However, there's a rather big wrinkle in that not all of the existing .shutdown() > implementations are obviously ok to call during kexec. Luckily, AFAICT there are > very few users of the syscore .shutdown hook, so it's at least feasible to go that > route. > > x86's mce_syscore_shutdown() should be ok, and arguably is correct, e.g. I don't > see how leaving #MC reporting enabled across kexec can work. > > ledtrig_cpu_syscore_shutdown() is also likely ok and arguably correct. > > The interrupt controllers though? x86 disables IRQs at the very beginning of > machine_kexec(), so it's likely fine. But every other architecture? No clue. > E.g. PPC's default_machine_kexec() sends IPIs to shutdown other CPUs, though I > have no idea if that can run afoul of any of the paths below. > > arch/powerpc/platforms/cell/spu_base.c .shutdown = spu_shutdown, > arch/x86/kernel/cpu/mce/core.c .shutdown = mce_syscore_shutdown, > arch/x86/kernel/i8259.c .shutdown = i8259A_shutdown, > drivers/irqchip/irq-i8259.c .shutdown = i8259A_shutdown, > drivers/irqchip/irq-sun6i-r.c .shutdown = sun6i_r_intc_shutdown, > drivers/leds/trigger/ledtrig-cpu.c .shutdown = ledtrig_cpu_syscore_shutdown, > drivers/power/reset/sc27xx-poweroff.c .shutdown = sc27xx_poweroff_shutdown, > kernel/irq/generic-chip.c .shutdown = irq_gc_shutdown, > virt/kvm/kvm_main.c .shutdown = kvm_shutdown, > > The whole thing is a bit of a mess. E.g. x86 treats machine_shutdown() from > kexec pretty much the same as shutdown for reboot, but other architectures have > what appear to be unique paths for handling kexec. > > FWIW, if we want to go with option (b), syscore_shutdown() hooks could use > kexec_in_progress to differentiate between "regular" shutdown/reboot and kexec. ^ permalink raw reply [flat|nested] 43+ messages in thread
* [PATCH v2 1/2] KVM: Use syscore_ops instead of reboot_notifier to hook restart/shutdown @ 2023-12-11 17:51 ` Sean Christopherson 0 siblings, 0 replies; 43+ messages in thread From: Sean Christopherson @ 2023-12-11 17:51 UTC (permalink / raw) To: kvm-riscv On Mon, Dec 11, 2023, Sean Christopherson wrote: > On Sat, Dec 09, 2023, James Gowans wrote: > > Hi Sean, > > > > Blast from the past but I've just been bitten by this patch when > > rebasing across v6.4. > > > > On Fri, 2023-05-12 at 16:31 -0700, Sean Christopherson wrote: > > > Use syscore_ops.shutdown to disable hardware virtualization during a > > > reboot instead of using the dedicated reboot_notifier so that KVM disables > > > virtualization _after_ system_state has been updated.? This will allow > > > fixing a race in KVM's handling of a forced reboot where KVM can end up > > > enabling hardware virtualization between kernel_restart_prepare() and > > > machine_restart(). > > > > The issue is that, AFAICT, the syscore_ops.shutdown are not called when > > doing a kexec. Reboot notifiers are called across kexec via: > > > > kernel_kexec > > kernel_restart_prepare > > blocking_notifier_call_chain > > kvm_reboot > > > > So after this patch, KVM is not shutdown during kexec; if hardware virt > > mode is enabled then the kexec hangs in exactly the same manner as you > > describe with the reboot. > > > > Some specific shutdown callbacks, for example IOMMU, HPET, IRQ, etc are > > called in native_machine_shutdown, but KVM is not one of these. > > > > Thoughts on possible ways to fix this: > > a) go back to reboot notifiers > > b) get kexec to call syscore_shutdown() to invoke all of these callbacks > > c) Add a KVM-specific callback to native_machine_shutdown(); we only > > need this for Intel x86, right? > > I don't like (c). VMX is the most sensitive/problematic, e.g. the whole blocking > of INIT thing, but SVM can also run afoul of EFER.SVME being cleared, and KVM really > should leave virtualization enabled across kexec(), even if leaving virtualization *shouldn't* > enabled is relatively benign on other architectures. > > One more option would be: > > d) Add another sycore hook, e.g. syscore_kexec() specifically for this path. > > > My slight preference is towards adding syscore_shutdown() to kexec, but > > I'm not sure that's feasible. Adding kexec maintainers for input. > > In a vacuum, that'd be my preference too. It's the obvious choice IMO, e.g. the > kexec_image->preserve_context path does syscore_suspend() (and then resume(), so > it's not completely uncharted territory. > > However, there's a rather big wrinkle in that not all of the existing .shutdown() > implementations are obviously ok to call during kexec. Luckily, AFAICT there are > very few users of the syscore .shutdown hook, so it's at least feasible to go that > route. > > x86's mce_syscore_shutdown() should be ok, and arguably is correct, e.g. I don't > see how leaving #MC reporting enabled across kexec can work. > > ledtrig_cpu_syscore_shutdown() is also likely ok and arguably correct. > > The interrupt controllers though? x86 disables IRQs at the very beginning of > machine_kexec(), so it's likely fine. But every other architecture? No clue. > E.g. PPC's default_machine_kexec() sends IPIs to shutdown other CPUs, though I > have no idea if that can run afoul of any of the paths below. > > arch/powerpc/platforms/cell/spu_base.c .shutdown = spu_shutdown, > arch/x86/kernel/cpu/mce/core.c .shutdown = mce_syscore_shutdown, > arch/x86/kernel/i8259.c .shutdown = i8259A_shutdown, > drivers/irqchip/irq-i8259.c .shutdown = i8259A_shutdown, > drivers/irqchip/irq-sun6i-r.c .shutdown = sun6i_r_intc_shutdown, > drivers/leds/trigger/ledtrig-cpu.c .shutdown = ledtrig_cpu_syscore_shutdown, > drivers/power/reset/sc27xx-poweroff.c .shutdown = sc27xx_poweroff_shutdown, > kernel/irq/generic-chip.c .shutdown = irq_gc_shutdown, > virt/kvm/kvm_main.c .shutdown = kvm_shutdown, > > The whole thing is a bit of a mess. E.g. x86 treats machine_shutdown() from > kexec pretty much the same as shutdown for reboot, but other architectures have > what appear to be unique paths for handling kexec. > > FWIW, if we want to go with option (b), syscore_shutdown() hooks could use > kexec_in_progress to differentiate between "regular" shutdown/reboot and kexec. ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v2 1/2] KVM: Use syscore_ops instead of reboot_notifier to hook restart/shutdown 2023-12-11 17:34 ` Sean Christopherson (?) @ 2023-12-11 18:47 ` Gowans, James -1 siblings, 0 replies; 43+ messages in thread From: Gowans, James @ 2023-12-11 18:47 UTC (permalink / raw) To: pbonzini@redhat.com, Graf (AWS), Alexander, seanjc@google.com, Schönherr, Jan H., ebiederm@xmission.com Cc: yuzenghui@huawei.com, kvm-riscv@lists.infradead.org, kexec@lists.infradead.org, james.morse@arm.com, oliver.upton@linux.dev, suzuki.poulose@arm.com, chenhuacai@kernel.org, atishp@atishpatra.org, linux-kernel@vger.kernel.org, maz@kernel.org, kvmarm@lists.linux.dev, kvm@vger.kernel.org, aleksandar.qemu.devel@gmail.com, anup@brainfault.org On Mon, 2023-12-11 at 09:34 -0800, Sean Christopherson wrote: > On Sat, Dec 09, 2023, James Gowans wrote: > > Thoughts on possible ways to fix this: > > a) go back to reboot notifiers > > b) get kexec to call syscore_shutdown() to invoke all of these callbacks > > c) Add a KVM-specific callback to native_machine_shutdown(); we only > > need this for Intel x86, right? > > I don't like (c). VMX is the most sensitive/problematic, e.g. the whole blocking > of INIT thing, but SVM can also run afoul of EFER.SVME being cleared, and KVM really > should leave virtualization enabled across kexec(), even if leaving virtualization > enabled is relatively benign on other architectures. Good to know. Agreed that clean shutdown in all cases is best and we discard (c). > > One more option would be: > > d) Add another sycore hook, e.g. syscore_kexec() specifically for this path. > > > My slight preference is towards adding syscore_shutdown() to kexec, but > > I'm not sure that's feasible. Adding kexec maintainers for input. > > In a vacuum, that'd be my preference too. It's the obvious choice IMO, e.g. the > kexec_image->preserve_context path does syscore_suspend() (and then resume(), so > it's not completely uncharted territory. > > However, there's a rather big wrinkle in that not all of the existing .shutdown() > implementations are obviously ok to call during kexec. Luckily, AFAICT there are > very few users of the syscore .shutdown hook, so it's at least feasible to go that > route. > > x86's mce_syscore_shutdown() should be ok, and arguably is correct, e.g. I don't > see how leaving #MC reporting enabled across kexec can work. > > ledtrig_cpu_syscore_shutdown() is also likely ok and arguably correct. I like your observation here that we probably have other misses like MCE which should be shut down too - that's a hint that adding syscore_shutdown() to kexec is the way to go. > > The interrupt controllers though? x86 disables IRQs at the very beginning of > machine_kexec(), so it's likely fine. But every other architecture? No clue. > E.g. PPC's default_machine_kexec() sends IPIs to shutdown other CPUs, though I > have no idea if that can run afoul of any of the paths below. > > arch/powerpc/platforms/cell/spu_base.c .shutdown = spu_shutdown, > arch/x86/kernel/cpu/mce/core.c .shutdown = mce_syscore_shutdown, > arch/x86/kernel/i8259.c .shutdown = i8259A_shutdown, > drivers/irqchip/irq-i8259.c .shutdown = i8259A_shutdown, > drivers/irqchip/irq-sun6i-r.c .shutdown = sun6i_r_intc_shutdown, > drivers/leds/trigger/ledtrig-cpu.c .shutdown = ledtrig_cpu_syscore_shutdown, > drivers/power/reset/sc27xx-poweroff.c .shutdown = sc27xx_poweroff_shutdown, > kernel/irq/generic-chip.c .shutdown = irq_gc_shutdown, > virt/kvm/kvm_main.c .shutdown = kvm_shutdown, > > The whole thing is a bit of a mess. E.g. x86 treats machine_shutdown() from > kexec pretty much the same as shutdown for reboot, but other architectures have > what appear to be unique paths for handling kexec. > > FWIW, if we want to go with option (b), syscore_shutdown() hooks could use > kexec_in_progress to differentiate between "regular" shutdown/reboot and kexec. Yeah, perhaps that's the best: add syscore_shutdown to kexec and get the callers to handle both cases if necessary. We could get maintainers for all of these drivers to sign off on the change and say whether they need to differentiate between kexec and reboot. Eric, what are your thoughts on this approach? I can try to whip up a patch for this and add the maintainers for all of the drivers. JG _______________________________________________ kexec mailing list kexec@lists.infradead.org http://lists.infradead.org/mailman/listinfo/kexec ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v2 1/2] KVM: Use syscore_ops instead of reboot_notifier to hook restart/shutdown @ 2023-12-11 18:47 ` Gowans, James 0 siblings, 0 replies; 43+ messages in thread From: Gowans, James @ 2023-12-11 18:47 UTC (permalink / raw) To: pbonzini@redhat.com, Graf (AWS), Alexander, seanjc@google.com, Schönherr, Jan H., ebiederm@xmission.com Cc: yuzenghui@huawei.com, kvm-riscv@lists.infradead.org, kexec@lists.infradead.org, james.morse@arm.com, oliver.upton@linux.dev, suzuki.poulose@arm.com, chenhuacai@kernel.org, atishp@atishpatra.org, linux-kernel@vger.kernel.org, maz@kernel.org, kvmarm@lists.linux.dev, kvm@vger.kernel.org, aleksandar.qemu.devel@gmail.com, anup@brainfault.org On Mon, 2023-12-11 at 09:34 -0800, Sean Christopherson wrote: > On Sat, Dec 09, 2023, James Gowans wrote: > > Thoughts on possible ways to fix this: > > a) go back to reboot notifiers > > b) get kexec to call syscore_shutdown() to invoke all of these callbacks > > c) Add a KVM-specific callback to native_machine_shutdown(); we only > > need this for Intel x86, right? > > I don't like (c). VMX is the most sensitive/problematic, e.g. the whole blocking > of INIT thing, but SVM can also run afoul of EFER.SVME being cleared, and KVM really > should leave virtualization enabled across kexec(), even if leaving virtualization > enabled is relatively benign on other architectures. Good to know. Agreed that clean shutdown in all cases is best and we discard (c). > > One more option would be: > > d) Add another sycore hook, e.g. syscore_kexec() specifically for this path. > > > My slight preference is towards adding syscore_shutdown() to kexec, but > > I'm not sure that's feasible. Adding kexec maintainers for input. > > In a vacuum, that'd be my preference too. It's the obvious choice IMO, e.g. the > kexec_image->preserve_context path does syscore_suspend() (and then resume(), so > it's not completely uncharted territory. > > However, there's a rather big wrinkle in that not all of the existing .shutdown() > implementations are obviously ok to call during kexec. Luckily, AFAICT there are > very few users of the syscore .shutdown hook, so it's at least feasible to go that > route. > > x86's mce_syscore_shutdown() should be ok, and arguably is correct, e.g. I don't > see how leaving #MC reporting enabled across kexec can work. > > ledtrig_cpu_syscore_shutdown() is also likely ok and arguably correct. I like your observation here that we probably have other misses like MCE which should be shut down too - that's a hint that adding syscore_shutdown() to kexec is the way to go. > > The interrupt controllers though? x86 disables IRQs at the very beginning of > machine_kexec(), so it's likely fine. But every other architecture? No clue. > E.g. PPC's default_machine_kexec() sends IPIs to shutdown other CPUs, though I > have no idea if that can run afoul of any of the paths below. > > arch/powerpc/platforms/cell/spu_base.c .shutdown = spu_shutdown, > arch/x86/kernel/cpu/mce/core.c .shutdown = mce_syscore_shutdown, > arch/x86/kernel/i8259.c .shutdown = i8259A_shutdown, > drivers/irqchip/irq-i8259.c .shutdown = i8259A_shutdown, > drivers/irqchip/irq-sun6i-r.c .shutdown = sun6i_r_intc_shutdown, > drivers/leds/trigger/ledtrig-cpu.c .shutdown = ledtrig_cpu_syscore_shutdown, > drivers/power/reset/sc27xx-poweroff.c .shutdown = sc27xx_poweroff_shutdown, > kernel/irq/generic-chip.c .shutdown = irq_gc_shutdown, > virt/kvm/kvm_main.c .shutdown = kvm_shutdown, > > The whole thing is a bit of a mess. E.g. x86 treats machine_shutdown() from > kexec pretty much the same as shutdown for reboot, but other architectures have > what appear to be unique paths for handling kexec. > > FWIW, if we want to go with option (b), syscore_shutdown() hooks could use > kexec_in_progress to differentiate between "regular" shutdown/reboot and kexec. Yeah, perhaps that's the best: add syscore_shutdown to kexec and get the callers to handle both cases if necessary. We could get maintainers for all of these drivers to sign off on the change and say whether they need to differentiate between kexec and reboot. Eric, what are your thoughts on this approach? I can try to whip up a patch for this and add the maintainers for all of the drivers. JG ^ permalink raw reply [flat|nested] 43+ messages in thread
* [PATCH v2 1/2] KVM: Use syscore_ops instead of reboot_notifier to hook restart/shutdown @ 2023-12-11 18:47 ` Gowans, James 0 siblings, 0 replies; 43+ messages in thread From: Gowans, James @ 2023-12-11 18:47 UTC (permalink / raw) To: kvm-riscv On Mon, 2023-12-11 at 09:34 -0800, Sean Christopherson wrote: > On Sat, Dec 09, 2023, James Gowans wrote: > > Thoughts on possible ways to fix this: > > a) go back to reboot notifiers > > b) get kexec to call syscore_shutdown() to invoke all of these callbacks > > c) Add a KVM-specific callback to native_machine_shutdown(); we only > > need this for Intel x86, right? > > I don't like (c). VMX is the most sensitive/problematic, e.g. the whole blocking > of INIT thing, but SVM can also run afoul of EFER.SVME being cleared, and KVM really > should leave virtualization enabled across kexec(), even if leaving virtualization > enabled is relatively benign on other architectures. Good to know. Agreed that clean shutdown in all cases is best and we discard (c). > > One more option would be: > > d) Add another sycore hook, e.g. syscore_kexec() specifically for this path. > > > My slight preference is towards adding syscore_shutdown() to kexec, but > > I'm not sure that's feasible. Adding kexec maintainers for input. > > In a vacuum, that'd be my preference too. It's the obvious choice IMO, e.g. the > kexec_image->preserve_context path does syscore_suspend() (and then resume(), so > it's not completely uncharted territory. > > However, there's a rather big wrinkle in that not all of the existing .shutdown() > implementations are obviously ok to call during kexec. Luckily, AFAICT there are > very few users of the syscore .shutdown hook, so it's at least feasible to go that > route. > > x86's mce_syscore_shutdown() should be ok, and arguably is correct, e.g. I don't > see how leaving #MC reporting enabled across kexec can work. > > ledtrig_cpu_syscore_shutdown() is also likely ok and arguably correct. I like your observation here that we probably have other misses like MCE which should be shut down too - that's a hint that adding syscore_shutdown() to kexec is the way to go. > > The interrupt controllers though? x86 disables IRQs at the very beginning of > machine_kexec(), so it's likely fine. But every other architecture? No clue. > E.g. PPC's default_machine_kexec() sends IPIs to shutdown other CPUs, though I > have no idea if that can run afoul of any of the paths below. > > arch/powerpc/platforms/cell/spu_base.c .shutdown = spu_shutdown, > arch/x86/kernel/cpu/mce/core.c .shutdown = mce_syscore_shutdown, > arch/x86/kernel/i8259.c .shutdown = i8259A_shutdown, > drivers/irqchip/irq-i8259.c .shutdown = i8259A_shutdown, > drivers/irqchip/irq-sun6i-r.c .shutdown = sun6i_r_intc_shutdown, > drivers/leds/trigger/ledtrig-cpu.c .shutdown = ledtrig_cpu_syscore_shutdown, > drivers/power/reset/sc27xx-poweroff.c .shutdown = sc27xx_poweroff_shutdown, > kernel/irq/generic-chip.c .shutdown = irq_gc_shutdown, > virt/kvm/kvm_main.c .shutdown = kvm_shutdown, > > The whole thing is a bit of a mess. E.g. x86 treats machine_shutdown() from > kexec pretty much the same as shutdown for reboot, but other architectures have > what appear to be unique paths for handling kexec. > > FWIW, if we want to go with option (b), syscore_shutdown() hooks could use > kexec_in_progress to differentiate between "regular" shutdown/reboot and kexec. Yeah, perhaps that's the best: add syscore_shutdown to kexec and get the callers to handle both cases if necessary. We could get maintainers for all of these drivers to sign off on the change and say whether they need to differentiate between kexec and reboot. Eric, what are your thoughts on this approach? I can try to whip up a patch for this and add the maintainers for all of the drivers. JG ^ permalink raw reply [flat|nested] 43+ messages in thread
* [PATCH v2 2/2] KVM: Don't enable hardware after a restart/shutdown is initiated 2023-05-12 23:31 ` Sean Christopherson @ 2023-05-12 23:31 ` Sean Christopherson -1 siblings, 0 replies; 43+ messages in thread From: Sean Christopherson @ 2023-05-12 23:31 UTC (permalink / raw) To: kvm-riscv Reject hardware enabling, i.e. VM creation, if a restart/shutdown has been initiated to avoid re-enabling hardware between kvm_reboot() and machine_{halt,power_off,restart}(). The restart case is especially problematic (for x86) as enabling VMX (or clearing GIF in KVM_RUN on SVM) blocks INIT, which results in the restart/reboot hanging as BIOS is unable to wake and rendezvous with APs. Note, this bug, and the original issue that motivated the addition of kvm_reboot(), is effectively limited to a forced reboot, e.g. `reboot -f`. In a "normal" reboot, userspace will gracefully teardown userspace before triggering the kernel reboot (modulo bugs, errors, etc), i.e. any process that might do ioctl(KVM_CREATE_VM) is long gone. Fixes: 8e1c18157d87 ("KVM: VMX: Disable VMX when system shutdown") Signed-off-by: Sean Christopherson <seanjc@google.com> --- virt/kvm/kvm_main.c | 17 ++++++++++++++++- 1 file changed, 16 insertions(+), 1 deletion(-) diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index e771b6a013c9..cc36a7fc8a86 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -5182,7 +5182,20 @@ static void hardware_disable_all(void) static int hardware_enable_all(void) { atomic_t failed = ATOMIC_INIT(0); - int r = 0; + int r; + + /* + * Do not enable hardware virtualization if the system is going down. + * If userspace initiated a forced reboot, e.g. reboot -f, then it's + * possible for an in-flight KVM_CREATE_VM to trigger hardware enabling + * after kvm_reboot() is called. Note, this relies on system_state + * being set _before_ kvm_reboot(), which is why KVM uses a syscore ops + * hook instead of registering a dedicated reboot notifier (the latter + * runs before system_state is updated). + */ + if (system_state == SYSTEM_HALT || system_state == SYSTEM_POWER_OFF || + system_state == SYSTEM_RESTART) + return -EBUSY; /* * When onlining a CPU, cpu_online_mask is set before kvm_online_cpu() @@ -5195,6 +5208,8 @@ static int hardware_enable_all(void) cpus_read_lock(); mutex_lock(&kvm_lock); + r = 0; + kvm_usage_count++; if (kvm_usage_count == 1) { on_each_cpu(hardware_enable_nolock, &failed, 1); -- 2.40.1.606.ga4b1b128d6-goog ^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH v2 2/2] KVM: Don't enable hardware after a restart/shutdown is initiated @ 2023-05-12 23:31 ` Sean Christopherson 0 siblings, 0 replies; 43+ messages in thread From: Sean Christopherson @ 2023-05-12 23:31 UTC (permalink / raw) To: Paolo Bonzini Cc: kvm, linux-kernel, Marc Zyngier, Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu, kvmarm, Huacai Chen, Aleksandar Markovic, Anup Patel, Atish Patra, kvm-riscv, Sean Christopherson Reject hardware enabling, i.e. VM creation, if a restart/shutdown has been initiated to avoid re-enabling hardware between kvm_reboot() and machine_{halt,power_off,restart}(). The restart case is especially problematic (for x86) as enabling VMX (or clearing GIF in KVM_RUN on SVM) blocks INIT, which results in the restart/reboot hanging as BIOS is unable to wake and rendezvous with APs. Note, this bug, and the original issue that motivated the addition of kvm_reboot(), is effectively limited to a forced reboot, e.g. `reboot -f`. In a "normal" reboot, userspace will gracefully teardown userspace before triggering the kernel reboot (modulo bugs, errors, etc), i.e. any process that might do ioctl(KVM_CREATE_VM) is long gone. Fixes: 8e1c18157d87 ("KVM: VMX: Disable VMX when system shutdown") Signed-off-by: Sean Christopherson <seanjc@google.com> --- virt/kvm/kvm_main.c | 17 ++++++++++++++++- 1 file changed, 16 insertions(+), 1 deletion(-) diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index e771b6a013c9..cc36a7fc8a86 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -5182,7 +5182,20 @@ static void hardware_disable_all(void) static int hardware_enable_all(void) { atomic_t failed = ATOMIC_INIT(0); - int r = 0; + int r; + + /* + * Do not enable hardware virtualization if the system is going down. + * If userspace initiated a forced reboot, e.g. reboot -f, then it's + * possible for an in-flight KVM_CREATE_VM to trigger hardware enabling + * after kvm_reboot() is called. Note, this relies on system_state + * being set _before_ kvm_reboot(), which is why KVM uses a syscore ops + * hook instead of registering a dedicated reboot notifier (the latter + * runs before system_state is updated). + */ + if (system_state == SYSTEM_HALT || system_state == SYSTEM_POWER_OFF || + system_state == SYSTEM_RESTART) + return -EBUSY; /* * When onlining a CPU, cpu_online_mask is set before kvm_online_cpu() @@ -5195,6 +5208,8 @@ static int hardware_enable_all(void) cpus_read_lock(); mutex_lock(&kvm_lock); + r = 0; + kvm_usage_count++; if (kvm_usage_count == 1) { on_each_cpu(hardware_enable_nolock, &failed, 1); -- 2.40.1.606.ga4b1b128d6-goog ^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH v2 0/2] KVM: Fix race between reboot and hardware enabling 2023-05-12 23:31 ` Sean Christopherson @ 2023-05-18 7:38 ` Marc Zyngier -1 siblings, 0 replies; 43+ messages in thread From: Marc Zyngier @ 2023-05-18 7:38 UTC (permalink / raw) To: kvm-riscv On Sat, 13 May 2023 00:31:25 +0100, Sean Christopherson <seanjc@google.com> wrote: > > Fix a bug where enabling hardware virtualization can race with a forced > reboot, e.g. `reboot -f`, and result in virt hardware being enabled when > the reboot is attempted, and thus hanging the reboot. > > Found by inspection, confirmed by hacking the reboot flow to wait until > KVM loads (the problematic window is ridiculously small). > > Fully tested only on x86, compile tested on other architectures. > > v2: > - Rename KVM's callback to kvm_shutdown() to match the hook. [Marc] > - Don't add a spurious newline. [Marc] > > v1: https://lore.kernel.org/all/20230310221414.811690-1-seanjc at google.com > > Sean Christopherson (2): > KVM: Use syscore_ops instead of reboot_notifier to hook > restart/shutdown > KVM: Don't enable hardware after a restart/shutdown is initiated > > virt/kvm/kvm_main.c | 43 +++++++++++++++++++++++++++---------------- > 1 file changed, 27 insertions(+), 16 deletions(-) Acked-by: Marc Zyngier <maz@kernel.org> M. -- Without deviation from the norm, progress is not possible. ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v2 0/2] KVM: Fix race between reboot and hardware enabling @ 2023-05-18 7:38 ` Marc Zyngier 0 siblings, 0 replies; 43+ messages in thread From: Marc Zyngier @ 2023-05-18 7:38 UTC (permalink / raw) To: Sean Christopherson Cc: Paolo Bonzini, kvm, linux-kernel, Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu, kvmarm, Huacai Chen, Aleksandar Markovic, Anup Patel, Atish Patra, kvm-riscv On Sat, 13 May 2023 00:31:25 +0100, Sean Christopherson <seanjc@google.com> wrote: > > Fix a bug where enabling hardware virtualization can race with a forced > reboot, e.g. `reboot -f`, and result in virt hardware being enabled when > the reboot is attempted, and thus hanging the reboot. > > Found by inspection, confirmed by hacking the reboot flow to wait until > KVM loads (the problematic window is ridiculously small). > > Fully tested only on x86, compile tested on other architectures. > > v2: > - Rename KVM's callback to kvm_shutdown() to match the hook. [Marc] > - Don't add a spurious newline. [Marc] > > v1: https://lore.kernel.org/all/20230310221414.811690-1-seanjc@google.com > > Sean Christopherson (2): > KVM: Use syscore_ops instead of reboot_notifier to hook > restart/shutdown > KVM: Don't enable hardware after a restart/shutdown is initiated > > virt/kvm/kvm_main.c | 43 +++++++++++++++++++++++++++---------------- > 1 file changed, 27 insertions(+), 16 deletions(-) Acked-by: Marc Zyngier <maz@kernel.org> M. -- Without deviation from the norm, progress is not possible. ^ permalink raw reply [flat|nested] 43+ messages in thread
* [PATCH v2 0/2] KVM: Fix race between reboot and hardware enabling 2023-05-12 23:31 ` Sean Christopherson @ 2023-05-19 17:41 ` Paolo Bonzini -1 siblings, 0 replies; 43+ messages in thread From: Paolo Bonzini @ 2023-05-19 17:41 UTC (permalink / raw) To: kvm-riscv On 5/13/23 01:31, Sean Christopherson wrote: > Fix a bug where enabling hardware virtualization can race with a forced > reboot, e.g. `reboot -f`, and result in virt hardware being enabled when > the reboot is attempted, and thus hanging the reboot. > > Found by inspection, confirmed by hacking the reboot flow to wait until > KVM loads (the problematic window is ridiculously small). > > Fully tested only on x86, compile tested on other architectures. > > v2: > - Rename KVM's callback to kvm_shutdown() to match the hook. [Marc] > - Don't add a spurious newline. [Marc] > > v1: https://lore.kernel.org/all/20230310221414.811690-1-seanjc at google.com > > Sean Christopherson (2): > KVM: Use syscore_ops instead of reboot_notifier to hook > restart/shutdown > KVM: Don't enable hardware after a restart/shutdown is initiated > > virt/kvm/kvm_main.c | 43 +++++++++++++++++++++++++++---------------- > 1 file changed, 27 insertions(+), 16 deletions(-) > > > base-commit: b3c98052d46948a8d65d2778c7f306ff38366aac Queued, thanks. Paolo ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v2 0/2] KVM: Fix race between reboot and hardware enabling @ 2023-05-19 17:41 ` Paolo Bonzini 0 siblings, 0 replies; 43+ messages in thread From: Paolo Bonzini @ 2023-05-19 17:41 UTC (permalink / raw) To: Sean Christopherson Cc: kvm, linux-kernel, Marc Zyngier, Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu, kvmarm, Huacai Chen, Aleksandar Markovic, Anup Patel, Atish Patra, kvm-riscv On 5/13/23 01:31, Sean Christopherson wrote: > Fix a bug where enabling hardware virtualization can race with a forced > reboot, e.g. `reboot -f`, and result in virt hardware being enabled when > the reboot is attempted, and thus hanging the reboot. > > Found by inspection, confirmed by hacking the reboot flow to wait until > KVM loads (the problematic window is ridiculously small). > > Fully tested only on x86, compile tested on other architectures. > > v2: > - Rename KVM's callback to kvm_shutdown() to match the hook. [Marc] > - Don't add a spurious newline. [Marc] > > v1: https://lore.kernel.org/all/20230310221414.811690-1-seanjc@google.com > > Sean Christopherson (2): > KVM: Use syscore_ops instead of reboot_notifier to hook > restart/shutdown > KVM: Don't enable hardware after a restart/shutdown is initiated > > virt/kvm/kvm_main.c | 43 +++++++++++++++++++++++++++---------------- > 1 file changed, 27 insertions(+), 16 deletions(-) > > > base-commit: b3c98052d46948a8d65d2778c7f306ff38366aac Queued, thanks. Paolo ^ permalink raw reply [flat|nested] 43+ messages in thread
end of thread, other threads:[~2023-12-13 6:48 UTC | newest] Thread overview: 43+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-05-12 23:31 [PATCH v2 0/2] KVM: Fix race between reboot and hardware enabling Sean Christopherson 2023-05-12 23:31 ` Sean Christopherson 2023-05-12 23:31 ` [PATCH v2 1/2] KVM: Use syscore_ops instead of reboot_notifier to hook restart/shutdown Sean Christopherson 2023-05-12 23:31 ` Sean Christopherson 2023-12-09 7:26 ` Gowans, James 2023-12-09 7:26 ` Gowans, James 2023-12-09 7:26 ` Gowans, James 2023-12-10 4:53 ` Eric W. Biederman 2023-12-10 4:53 ` Eric W. Biederman 2023-12-10 4:53 ` Eric W. Biederman 2023-12-11 7:54 ` Gowans, James 2023-12-11 7:54 ` Gowans, James 2023-12-11 7:54 ` Gowans, James 2023-12-11 10:27 ` Gowans, James 2023-12-11 10:27 ` Gowans, James 2023-12-11 10:27 ` Gowans, James 2023-12-11 23:50 ` Eric W. Biederman 2023-12-11 23:50 ` Eric W. Biederman 2023-12-11 23:50 ` Eric W. Biederman 2023-12-12 8:50 ` Gowans, James 2023-12-12 8:50 ` Gowans, James 2023-12-12 8:50 ` Gowans, James 2023-12-12 13:38 ` Paolo Bonzini 2023-12-12 13:38 ` Paolo Bonzini 2023-12-12 13:38 ` Paolo Bonzini 2023-12-13 6:47 ` Gowans, James 2023-12-13 6:47 ` Gowans, James 2023-12-13 6:47 ` Gowans, James 2023-12-11 17:34 ` Sean Christopherson 2023-12-11 17:34 ` Sean Christopherson 2023-12-11 17:34 ` Sean Christopherson 2023-12-11 17:51 ` Sean Christopherson 2023-12-11 17:51 ` Sean Christopherson 2023-12-11 17:51 ` Sean Christopherson 2023-12-11 18:47 ` Gowans, James 2023-12-11 18:47 ` Gowans, James 2023-12-11 18:47 ` Gowans, James 2023-05-12 23:31 ` [PATCH v2 2/2] KVM: Don't enable hardware after a restart/shutdown is initiated Sean Christopherson 2023-05-12 23:31 ` Sean Christopherson 2023-05-18 7:38 ` [PATCH v2 0/2] KVM: Fix race between reboot and hardware enabling Marc Zyngier 2023-05-18 7:38 ` Marc Zyngier 2023-05-19 17:41 ` Paolo Bonzini 2023-05-19 17:41 ` Paolo Bonzini
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.