From: Sean Christopherson <seanjc@google.com>
To: James Gowans <jgowans@amazon.com>
Cc: "pbonzini@redhat.com" <pbonzini@redhat.com>,
"Alexander Graf" <graf@amazon.de>,
"Jan Schönherr" <jschoenh@amazon.de>,
"ebiederm@xmission.com" <ebiederm@xmission.com>,
"yuzenghui@huawei.com" <yuzenghui@huawei.com>,
"atishp@atishpatra.org" <atishp@atishpatra.org>,
"kvm-riscv@lists.infradead.org" <kvm-riscv@lists.infradead.org>,
"james.morse@arm.com" <james.morse@arm.com>,
"suzuki.poulose@arm.com" <suzuki.poulose@arm.com>,
"oliver.upton@linux.dev" <oliver.upton@linux.dev>,
"chenhuacai@kernel.org" <chenhuacai@kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"kvmarm@lists.linux.dev" <kvmarm@lists.linux.dev>,
"maz@kernel.org" <maz@kernel.org>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"aleksandar.qemu.devel@gmail.com"
<aleksandar.qemu.devel@gmail.com>,
"anup@brainfault.org" <anup@brainfault.org>,
"kexec@lists.infradead.org" <kexec@lists.infradead.org>
Subject: Re: [PATCH v2 1/2] KVM: Use syscore_ops instead of reboot_notifier to hook restart/shutdown
Date: Mon, 11 Dec 2023 09:34:24 -0800 [thread overview]
Message-ID: <ZXdIIBUXcCIK28ys@google.com> (raw)
In-Reply-To: <cfed942fc767fa7b2fabc68a3357a7b95bd6a589.camel@amazon.com>
On Sat, Dec 09, 2023, James Gowans wrote:
> Hi Sean,
>
> Blast from the past but I've just been bitten by this patch when
> rebasing across v6.4.
>
> On Fri, 2023-05-12 at 16:31 -0700, Sean Christopherson wrote:
> > Use syscore_ops.shutdown to disable hardware virtualization during a
> > reboot instead of using the dedicated reboot_notifier so that KVM disables
> > virtualization _after_ system_state has been updated. This will allow
> > fixing a race in KVM's handling of a forced reboot where KVM can end up
> > enabling hardware virtualization between kernel_restart_prepare() and
> > machine_restart().
>
> The issue is that, AFAICT, the syscore_ops.shutdown are not called when
> doing a kexec. Reboot notifiers are called across kexec via:
>
> kernel_kexec
> kernel_restart_prepare
> blocking_notifier_call_chain
> kvm_reboot
>
> So after this patch, KVM is not shutdown during kexec; if hardware virt
> mode is enabled then the kexec hangs in exactly the same manner as you
> describe with the reboot.
>
> Some specific shutdown callbacks, for example IOMMU, HPET, IRQ, etc are
> called in native_machine_shutdown, but KVM is not one of these.
>
> Thoughts on possible ways to fix this:
> a) go back to reboot notifiers
> b) get kexec to call syscore_shutdown() to invoke all of these callbacks
> c) Add a KVM-specific callback to native_machine_shutdown(); we only
> need this for Intel x86, right?
I don't like (c). VMX is the most sensitive/problematic, e.g. the whole blocking
of INIT thing, but SVM can also run afoul of EFER.SVME being cleared, and KVM really
should leave virtualization enabled across kexec(), even if leaving virtualization
enabled is relatively benign on other architectures.
One more option would be:
d) Add another sycore hook, e.g. syscore_kexec() specifically for this path.
> My slight preference is towards adding syscore_shutdown() to kexec, but
> I'm not sure that's feasible. Adding kexec maintainers for input.
In a vacuum, that'd be my preference too. It's the obvious choice IMO, e.g. the
kexec_image->preserve_context path does syscore_suspend() (and then resume(), so
it's not completely uncharted territory.
However, there's a rather big wrinkle in that not all of the existing .shutdown()
implementations are obviously ok to call during kexec. Luckily, AFAICT there are
very few users of the syscore .shutdown hook, so it's at least feasible to go that
route.
x86's mce_syscore_shutdown() should be ok, and arguably is correct, e.g. I don't
see how leaving #MC reporting enabled across kexec can work.
ledtrig_cpu_syscore_shutdown() is also likely ok and arguably correct.
The interrupt controllers though? x86 disables IRQs at the very beginning of
machine_kexec(), so it's likely fine. But every other architecture? No clue.
E.g. PPC's default_machine_kexec() sends IPIs to shutdown other CPUs, though I
have no idea if that can run afoul of any of the paths below.
arch/powerpc/platforms/cell/spu_base.c .shutdown = spu_shutdown,
arch/x86/kernel/cpu/mce/core.c .shutdown = mce_syscore_shutdown,
arch/x86/kernel/i8259.c .shutdown = i8259A_shutdown,
drivers/irqchip/irq-i8259.c .shutdown = i8259A_shutdown,
drivers/irqchip/irq-sun6i-r.c .shutdown = sun6i_r_intc_shutdown,
drivers/leds/trigger/ledtrig-cpu.c .shutdown = ledtrig_cpu_syscore_shutdown,
drivers/power/reset/sc27xx-poweroff.c .shutdown = sc27xx_poweroff_shutdown,
kernel/irq/generic-chip.c .shutdown = irq_gc_shutdown,
virt/kvm/kvm_main.c .shutdown = kvm_shutdown,
The whole thing is a bit of a mess. E.g. x86 treats machine_shutdown() from
kexec pretty much the same as shutdown for reboot, but other architectures have
what appear to be unique paths for handling kexec.
FWIW, if we want to go with option (b), syscore_shutdown() hooks could use
kexec_in_progress to differentiate between "regular" shutdown/reboot and kexec.
next prev parent reply other threads:[~2023-12-11 17:34 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-05-12 23:31 [PATCH v2 0/2] KVM: Fix race between reboot and hardware enabling Sean Christopherson
2023-05-12 23:31 ` [PATCH v2 1/2] KVM: Use syscore_ops instead of reboot_notifier to hook restart/shutdown Sean Christopherson
2023-12-09 7:26 ` Gowans, James
2023-12-10 4:53 ` Eric W. Biederman
2023-12-11 7:54 ` Gowans, James
2023-12-11 10:27 ` Gowans, James
2023-12-11 23:50 ` Eric W. Biederman
2023-12-12 8:50 ` Gowans, James
2023-12-12 13:38 ` Paolo Bonzini
2023-12-13 6:47 ` Gowans, James
2023-12-11 17:34 ` Sean Christopherson [this message]
2023-12-11 17:51 ` Sean Christopherson
2023-12-11 18:47 ` Gowans, James
2023-05-12 23:31 ` [PATCH v2 2/2] KVM: Don't enable hardware after a restart/shutdown is initiated Sean Christopherson
2023-05-18 7:38 ` [PATCH v2 0/2] KVM: Fix race between reboot and hardware enabling Marc Zyngier
2023-05-19 17:41 ` Paolo Bonzini
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZXdIIBUXcCIK28ys@google.com \
--to=seanjc@google.com \
--cc=aleksandar.qemu.devel@gmail.com \
--cc=anup@brainfault.org \
--cc=atishp@atishpatra.org \
--cc=chenhuacai@kernel.org \
--cc=ebiederm@xmission.com \
--cc=graf@amazon.de \
--cc=james.morse@arm.com \
--cc=jgowans@amazon.com \
--cc=jschoenh@amazon.de \
--cc=kexec@lists.infradead.org \
--cc=kvm-riscv@lists.infradead.org \
--cc=kvm@vger.kernel.org \
--cc=kvmarm@lists.linux.dev \
--cc=linux-kernel@vger.kernel.org \
--cc=maz@kernel.org \
--cc=oliver.upton@linux.dev \
--cc=pbonzini@redhat.com \
--cc=suzuki.poulose@arm.com \
--cc=yuzenghui@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox