* Broadwell server reboot with vmx: unexpected exit reason 0x3
From: Jinpu Wang @ 2019-09-30  8:43 UTC
To: kvm

Dear KVM experts,

We had a Broadwell server reboot itself recently. Before the reboot,
there were error messages from KVM in netconsole:
[5599380.317055] kvm [9046]: vcpu1, guest rIP: 0xffffffff816ad716 vmx: unexpected exit reason 0x3
[5599380.317060] kvm [49626]: vcpu0, guest rIP: 0xffffffff81060fe6 vmx: unexpected exit reason 0x3
[5599380.317062] kvm [36632]: vcpu0, guest rIP: 0xffffffff8103970d vmx: unexpected exit reason 0x3
[5599380.317064] kvm [9620]: vcpu1, guest rIP: 0xffffffffb6c1b08e vmx: unexpected exit reason 0x3
[5599380.317067] kvm [49925]: vcpu5, guest rIP: 0xffffffff9b406ea2 vmx: unexpected exit reason 0x3
[5599380.317068] kvm [49925]: vcpu3, guest rIP: 0xffffffff9b406ea2 vmx: unexpected exit reason 0x3
[5599380.317070] kvm [33871]: vcpu2, guest rIP: 0xffffffff81060fe6 vmx: unexpected exit reason 0x3
[5599380.317072] kvm [49925]: vcpu4, guest rIP: 0xffffffff9b406ea2 vmx: unexpected exit reason 0x3
[5599380.317074] kvm [48505]: vcpu1, guest rIP: 0xffffffffaf36bf9b vmx: unexpected exit reason 0x3
[5599380.317076] kvm [21880]: vcpu1, guest rIP: 0xffffffff8103970d vmx: unexpected exit reason 0x3

Kernel version is: 4.14.129
CPU is Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz
No crash dump was generated, only the above messages right before the server reboot.

Does anyone have an idea what could cause the reboot? Is there a known
problem in this regard?

I noticed that EXIT_REASON_INIT_SIGNAL(3) was introduced recently; is it related?
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/arch/x86/kvm?id=4b9852f4f38909a9ca74e71afb35aafba0871aa1

Regards,
Jinpu

* Re: Broadwell server reboot with vmx: unexpected exit reason 0x3
From: Liran Alon @ 2019-09-30 10:48 UTC
To: Jinpu Wang; +Cc: kvm

> On 30 Sep 2019, at 11:43, Jinpu Wang <jinpu.wang@cloud.ionos.com> wrote:
>
> Dear KVM experts,
>
> We had a Broadwell server reboot itself recently. Before the reboot,
> there were error messages from KVM in netconsole:
> [5599380.317055] kvm [9046]: vcpu1, guest rIP: 0xffffffff816ad716 vmx:
> unexpected exit reason 0x3
> [5599380.317060] kvm [49626]: vcpu0, guest rIP: 0xffffffff81060fe6
> vmx: unexpected exit reason 0x3
> [5599380.317062] kvm [36632]: vcpu0, guest rIP: 0xffffffff8103970d
> vmx: unexpected exit reason 0x3
> [5599380.317064] kvm [9620]: vcpu1, guest rIP: 0xffffffffb6c1b08e vmx:
> unexpected exit reason 0x3
> [5599380.317067] kvm [49925]: vcpu5, guest rIP: 0xffffffff9b406ea2
> vmx: unexpected exit reason 0x3
> [5599380.317068] kvm [49925]: vcpu3, guest rIP: 0xffffffff9b406ea2
> vmx: unexpected exit reason 0x3
> [5599380.317070] kvm [33871]: vcpu2, guest rIP: 0xffffffff81060fe6
> vmx: unexpected exit reason 0x3
> [5599380.317072] kvm [49925]: vcpu4, guest rIP: 0xffffffff9b406ea2
> vmx: unexpected exit reason 0x3
> [5599380.317074] kvm [48505]: vcpu1, guest rIP: 0xffffffffaf36bf9b
> vmx: unexpected exit reason 0x3
> [5599380.317076] kvm [21880]: vcpu1, guest rIP: 0xffffffff8103970d
> vmx: unexpected exit reason 0x3

The only way a CPU will raise this exit reason (3 == EXIT_REASON_INIT_SIGNAL)
is if the CPU is in VMX non-root mode while it has a pending INIT signal in its LAPIC.

In simple terms, it means that one CPU was running inside a guest while
another CPU sent it a signal to reset itself.

I see in the code that kvm_init() does register_reboot_notifier(&kvm_reboot_notifier).
kvm_reboot() runs hardware_disable_nolock() on each CPU before reboot,
which should result in every CPU running VMX’s hardware_disable(), which should
exit VMX operation (VMXOFF) and disable VMX (clear CR4.VMXE).

Therefore, I’m quite puzzled as to how a server reboot triggers the scenario you
present here. Can you send your full kernel log?

>
> Kernel version is: 4.14.129
> CPU is Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz
> No crash dump was generated, only the above messages right before the server reboot.
>
> Does anyone have an idea what could cause the reboot? Is there a known
> problem in this regard?
>
> I noticed that EXIT_REASON_INIT_SIGNAL(3) was introduced recently; is it related?
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/arch/x86/kvm?id=4b9852f4f38909a9ca74e71afb35aafba0871aa1

As the author of this commit, I can say it shouldn’t be related, i.e. it won’t
help you to apply this commit to your kernel.
That commit changes the handling of *virtual* INIT signals inside a guest.
What you are seeing here are exits that result from a *physical* INIT signal
arriving while the CPU was in a guest.

-Liran

>
> Regards,
> Jinpu
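
[Editorial sketch, not part of the original message] To make the "3 == EXIT_REASON_INIT_SIGNAL" reading above concrete, here is a small user-space C sketch that maps the number printed in the log to a name. The helper names basic_exit_reason() and basic_exit_reason_name() are hypothetical, the table is deliberately partial (only basic exit reasons 0 to 3, using the names of the kernel's EXIT_REASON_* constants), and it assumes the basic reason occupies the low 16 bits of the VMCS exit-reason field as described in the Intel SDM. It is not KVM code.

```c
#include <stdint.h>

/*
 * Hypothetical helper: extract the basic exit reason, assumed to be the
 * low 16 bits of the 32-bit VMCS exit-reason field (upper bits are flags).
 */
static uint16_t basic_exit_reason(uint32_t exit_reason)
{
	return (uint16_t)(exit_reason & 0xffffu);
}

/*
 * Hypothetical helper: name a few basic exit reasons. Only values 0..3
 * are listed; everything else falls through to a placeholder string.
 */
static const char *basic_exit_reason_name(uint16_t basic)
{
	switch (basic) {
	case 0:
		return "EXCEPTION_NMI";
	case 1:
		return "EXTERNAL_INTERRUPT";
	case 2:
		return "TRIPLE_FAULT";
	case 3:
		return "INIT_SIGNAL";
	default:
		return "(not in this partial table)";
	}
}
```

Calling basic_exit_reason_name(basic_exit_reason(0x3)) with the value from the log lines yields "INIT_SIGNAL", which matches the interpretation given in the reply above.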

* Re: Broadwell server reboot with vmx: unexpected exit reason 0x3
From: Sean Christopherson @ 2019-10-02 17:29 UTC
To: Liran Alon; +Cc: Jinpu Wang, kvm

On Mon, Sep 30, 2019 at 01:48:15PM +0300, Liran Alon wrote:
>
> > On 30 Sep 2019, at 11:43, Jinpu Wang <jinpu.wang@cloud.ionos.com> wrote:
> >
> > Dear KVM experts,
> >
> > We had a Broadwell server reboot itself recently. Before the reboot,
> > there were error messages from KVM in netconsole:
> > [5599380.317055] kvm [9046]: vcpu1, guest rIP: 0xffffffff816ad716 vmx:
> > unexpected exit reason 0x3
> > [5599380.317060] kvm [49626]: vcpu0, guest rIP: 0xffffffff81060fe6
> > vmx: unexpected exit reason 0x3
> > [5599380.317062] kvm [36632]: vcpu0, guest rIP: 0xffffffff8103970d
> > vmx: unexpected exit reason 0x3
> > [5599380.317064] kvm [9620]: vcpu1, guest rIP: 0xffffffffb6c1b08e vmx:
> > unexpected exit reason 0x3
> > [5599380.317067] kvm [49925]: vcpu5, guest rIP: 0xffffffff9b406ea2
> > vmx: unexpected exit reason 0x3
> > [5599380.317068] kvm [49925]: vcpu3, guest rIP: 0xffffffff9b406ea2
> > vmx: unexpected exit reason 0x3
> > [5599380.317070] kvm [33871]: vcpu2, guest rIP: 0xffffffff81060fe6
> > vmx: unexpected exit reason 0x3
> > [5599380.317072] kvm [49925]: vcpu4, guest rIP: 0xffffffff9b406ea2
> > vmx: unexpected exit reason 0x3
> > [5599380.317074] kvm [48505]: vcpu1, guest rIP: 0xffffffffaf36bf9b
> > vmx: unexpected exit reason 0x3
> > [5599380.317076] kvm [21880]: vcpu1, guest rIP: 0xffffffff8103970d
> > vmx: unexpected exit reason 0x3
>
> The only way a CPU will raise this exit reason (3 == EXIT_REASON_INIT_SIGNAL)
> is if the CPU is in VMX non-root mode while it has a pending INIT signal in its LAPIC.
>
> In simple terms, it means that one CPU was running inside a guest while
> another CPU sent it a signal to reset itself.
>
> I see in the code that kvm_init() does register_reboot_notifier(&kvm_reboot_notifier).
> kvm_reboot() runs hardware_disable_nolock() on each CPU before reboot,
> which should result in every CPU running VMX’s hardware_disable(), which should
> exit VMX operation (VMXOFF) and disable VMX (clear CR4.VMXE).
>
> Therefore, I’m quite puzzled as to how a server reboot triggers the scenario you
> present here. Can you send your full kernel log?

My guess is that the system triggered an emergency reboot and was either
unable to force CPUs out of VMX non-root with NMIs, hit a triple fault
shutdown and auto-generated INITs before it could shoot down the other
CPUs, or didn't even attempt the NMI because VMX wasn't enabled on the
CPU that triggered the reboot.

In arch/x86/kernel/reboot.c:

/* Use NMIs as IPIs to tell all CPUs to disable virtualization */
static void emergency_vmx_disable_all(void)
{
	/* Just make sure we won't change CPUs while doing this */
	local_irq_disable();

	/*
	 * We need to disable VMX on all CPUs before rebooting, otherwise
	 * we risk hanging up the machine, because the CPU ignore INIT
	 * signals when VMX is enabled.
	 *
	 * We can't take any locks and we may be on an inconsistent
	 * state, so we use NMIs as IPIs to tell the other CPUs to disable
	 * VMX and halt.
	 *
	 * For safety, we will avoid running the nmi_shootdown_cpus()
	 * stuff unnecessarily, but we don't have a way to check
	 * if other CPUs have VMX enabled. So we will call it only if the
	 * CPU we are running on has VMX enabled.
	 *
	 * We will miss cases where VMX is not enabled on all CPUs. This
	 * shouldn't do much harm because KVM always enable VMX on all
	 * CPUs anyway. But we can miss it on the small window where KVM
	 * is still enabling VMX.
	 */
	if (cpu_has_vmx() && cpu_vmx_enabled()) {
		/* Disable VMX on this CPU. */
		cpu_vmxoff();

		/* Halt and disable VMX on the other CPUs */
		nmi_shootdown_cpus(vmxoff_nmi);

	}
}

static void native_machine_emergency_restart(void)
{
	...

	if (reboot_emergency)
		emergency_vmx_disable_all();
}
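
[Editorial sketch, not part of the original message] The third guess above (the NMI shootdown is never attempted because VMX is not enabled on the CPU that triggers the reboot) can be illustrated with a toy user-space model of the check in emergency_vmx_disable_all(). Everything here is hypothetical: struct cpu_model and model_emergency_vmx_disable_all() are made-up names, and the model assumes, optimistically, that a shootdown NMI always reaches every other CPU. It is not kernel code and says nothing about what actually happened on the reported server.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU state for the toy model. */
struct cpu_model {
	bool has_vmx;      /* stands in for cpu_has_vmx() */
	bool vmx_enabled;  /* stands in for cpu_vmx_enabled() */
};

/*
 * Toy model of the decision in emergency_vmx_disable_all(): the shootdown
 * runs only if the CPU executing the reboot path ("self") has VMX enabled.
 * Returns the number of CPUs left with VMX still enabled afterwards.
 */
static size_t model_emergency_vmx_disable_all(struct cpu_model *cpus,
					      size_t ncpus, size_t self)
{
	size_t i, still_enabled = 0;

	if (cpus[self].has_vmx && cpus[self].vmx_enabled) {
		/* Models cpu_vmxoff() on the rebooting CPU. */
		cpus[self].vmx_enabled = false;

		/*
		 * Models nmi_shootdown_cpus(vmxoff_nmi); assumed here to
		 * succeed on every other CPU, which real hardware does not
		 * guarantee.
		 */
		for (i = 0; i < ncpus; i++)
			if (i != self)
				cpus[i].vmx_enabled = false;
	}

	for (i = 0; i < ncpus; i++)
		if (cpus[i].vmx_enabled)
			still_enabled++;

	return still_enabled;
}
```

In this model, if the rebooting CPU has vmx_enabled set, no CPU is left with VMX enabled. If it does not, every other CPU keeps VMX enabled, which is the window the kernel comment describes as "we can miss it".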

* Re: Broadwell server reboot with vmx: unexpected exit reason 0x3
From: Jinpu Wang @ 2019-10-04  8:53 UTC
To: Sean Christopherson; +Cc: Liran Alon, kvm

On Wed, Oct 2, 2019 at 7:29 PM Sean Christopherson
<sean.j.christopherson@intel.com> wrote:
>
> On Mon, Sep 30, 2019 at 01:48:15PM +0300, Liran Alon wrote:
> >
> > > On 30 Sep 2019, at 11:43, Jinpu Wang <jinpu.wang@cloud.ionos.com> wrote:
> > >
> > > Dear KVM experts,
> > >
> > > We had a Broadwell server reboot itself recently. Before the reboot,
> > > there were error messages from KVM in netconsole:
> > > [5599380.317055] kvm [9046]: vcpu1, guest rIP: 0xffffffff816ad716 vmx:
> > > unexpected exit reason 0x3
> > > [5599380.317060] kvm [49626]: vcpu0, guest rIP: 0xffffffff81060fe6
> > > vmx: unexpected exit reason 0x3
> > > [5599380.317062] kvm [36632]: vcpu0, guest rIP: 0xffffffff8103970d
> > > vmx: unexpected exit reason 0x3
> > > [5599380.317064] kvm [9620]: vcpu1, guest rIP: 0xffffffffb6c1b08e vmx:
> > > unexpected exit reason 0x3
> > > [5599380.317067] kvm [49925]: vcpu5, guest rIP: 0xffffffff9b406ea2
> > > vmx: unexpected exit reason 0x3
> > > [5599380.317068] kvm [49925]: vcpu3, guest rIP: 0xffffffff9b406ea2
> > > vmx: unexpected exit reason 0x3
> > > [5599380.317070] kvm [33871]: vcpu2, guest rIP: 0xffffffff81060fe6
> > > vmx: unexpected exit reason 0x3
> > > [5599380.317072] kvm [49925]: vcpu4, guest rIP: 0xffffffff9b406ea2
> > > vmx: unexpected exit reason 0x3
> > > [5599380.317074] kvm [48505]: vcpu1, guest rIP: 0xffffffffaf36bf9b
> > > vmx: unexpected exit reason 0x3
> > > [5599380.317076] kvm [21880]: vcpu1, guest rIP: 0xffffffff8103970d
> > > vmx: unexpected exit reason 0x3
> >
> > The only way a CPU will raise this exit reason (3 == EXIT_REASON_INIT_SIGNAL)
> > is if the CPU is in VMX non-root mode while it has a pending INIT signal in its LAPIC.
> >
> > In simple terms, it means that one CPU was running inside a guest while
> > another CPU sent it a signal to reset itself.
> >
> > I see in the code that kvm_init() does register_reboot_notifier(&kvm_reboot_notifier).
> > kvm_reboot() runs hardware_disable_nolock() on each CPU before reboot,
> > which should result in every CPU running VMX’s hardware_disable(), which should
> > exit VMX operation (VMXOFF) and disable VMX (clear CR4.VMXE).
> >
> > Therefore, I’m quite puzzled as to how a server reboot triggers the scenario you
> > present here. Can you send your full kernel log?
>
> My guess is that the system triggered an emergency reboot and was either
> unable to force CPUs out of VMX non-root with NMIs, hit a triple fault
> shutdown and auto-generated INITs before it could shoot down the other
> CPUs, or didn't even attempt the NMI because VMX wasn't enabled on the
> CPU that triggered the reboot.
>
> In arch/x86/kernel/reboot.c:
>
> /* Use NMIs as IPIs to tell all CPUs to disable virtualization */
> static void emergency_vmx_disable_all(void)
> {
>	/* Just make sure we won't change CPUs while doing this */
>	local_irq_disable();
>
>	/*
>	 * We need to disable VMX on all CPUs before rebooting, otherwise
>	 * we risk hanging up the machine, because the CPU ignore INIT
>	 * signals when VMX is enabled.
>	 *
>	 * We can't take any locks and we may be on an inconsistent
>	 * state, so we use NMIs as IPIs to tell the other CPUs to disable
>	 * VMX and halt.
>	 *
>	 * For safety, we will avoid running the nmi_shootdown_cpus()
>	 * stuff unnecessarily, but we don't have a way to check
>	 * if other CPUs have VMX enabled. So we will call it only if the
>	 * CPU we are running on has VMX enabled.
>	 *
>	 * We will miss cases where VMX is not enabled on all CPUs. This
>	 * shouldn't do much harm because KVM always enable VMX on all
>	 * CPUs anyway. But we can miss it on the small window where KVM
>	 * is still enabling VMX.
>	 */
>	if (cpu_has_vmx() && cpu_vmx_enabled()) {
>		/* Disable VMX on this CPU. */
>		cpu_vmxoff();
>
>		/* Halt and disable VMX on the other CPUs */
>		nmi_shootdown_cpus(vmxoff_nmi);
>
>	}
> }
>
> static void native_machine_emergency_restart(void)
> {
>	...
>
>	if (reboot_emergency)
>		emergency_vmx_disable_all();
> }
>
Thanks for the information, Sean. I checked the call path for
emergency_restart. I would expect a kernel message indicating
why it had to do the emergency_restart, but there is nothing
logged in netconsole or the kernel log. I don't understand.

Do you have a guess what could cause the system to trigger an emergency reboot?

Regards,
Jinpu

* Re: Broadwell server reboot with vmx: unexpected exit reason 0x3
From: Sean Christopherson @ 2019-10-17 18:52 UTC
To: Jinpu Wang; +Cc: Liran Alon, kvm

On Fri, Oct 04, 2019 at 10:53:40AM +0200, Jinpu Wang wrote:
> On Wed, Oct 2, 2019 at 7:29 PM Sean Christopherson
> <sean.j.christopherson@intel.com> wrote:
> >
> > On Mon, Sep 30, 2019 at 01:48:15PM +0300, Liran Alon wrote:
> > >
> > > > On 30 Sep 2019, at 11:43, Jinpu Wang <jinpu.wang@cloud.ionos.com> wrote:
> > > >
> > > > Dear KVM experts,
> > > >
> > > > We had a Broadwell server reboot itself recently. Before the reboot,
> > > > there were error messages from KVM in netconsole:
> > > > [5599380.317055] kvm [9046]: vcpu1, guest rIP: 0xffffffff816ad716 vmx:
> > > > unexpected exit reason 0x3
> > > > [5599380.317060] kvm [49626]: vcpu0, guest rIP: 0xffffffff81060fe6
> > > > vmx: unexpected exit reason 0x3
> > > > [5599380.317062] kvm [36632]: vcpu0, guest rIP: 0xffffffff8103970d
> > > > vmx: unexpected exit reason 0x3
> > > > [5599380.317064] kvm [9620]: vcpu1, guest rIP: 0xffffffffb6c1b08e vmx:
> > > > unexpected exit reason 0x3
> > > > [5599380.317067] kvm [49925]: vcpu5, guest rIP: 0xffffffff9b406ea2
> > > > vmx: unexpected exit reason 0x3
> > > > [5599380.317068] kvm [49925]: vcpu3, guest rIP: 0xffffffff9b406ea2
> > > > vmx: unexpected exit reason 0x3
> > > > [5599380.317070] kvm [33871]: vcpu2, guest rIP: 0xffffffff81060fe6
> > > > vmx: unexpected exit reason 0x3
> > > > [5599380.317072] kvm [49925]: vcpu4, guest rIP: 0xffffffff9b406ea2
> > > > vmx: unexpected exit reason 0x3
> > > > [5599380.317074] kvm [48505]: vcpu1, guest rIP: 0xffffffffaf36bf9b
> > > > vmx: unexpected exit reason 0x3
> > > > [5599380.317076] kvm [21880]: vcpu1, guest rIP: 0xffffffff8103970d
> > > > vmx: unexpected exit reason 0x3
> > >
> > > The only way a CPU will raise this exit reason (3 == EXIT_REASON_INIT_SIGNAL)
> > > is if the CPU is in VMX non-root mode while it has a pending INIT signal in its LAPIC.
> > >
> > > In simple terms, it means that one CPU was running inside a guest while
> > > another CPU sent it a signal to reset itself.
> > >
> > > I see in the code that kvm_init() does register_reboot_notifier(&kvm_reboot_notifier).
> > > kvm_reboot() runs hardware_disable_nolock() on each CPU before reboot,
> > > which should result in every CPU running VMX’s hardware_disable(), which should
> > > exit VMX operation (VMXOFF) and disable VMX (clear CR4.VMXE).
> > >
> > > Therefore, I’m quite puzzled as to how a server reboot triggers the scenario you
> > > present here. Can you send your full kernel log?
> >
> > My guess is that the system triggered an emergency reboot and was either
> > unable to force CPUs out of VMX non-root with NMIs, hit a triple fault
> > shutdown and auto-generated INITs before it could shoot down the other
> > CPUs, or didn't even attempt the NMI because VMX wasn't enabled on the
> > CPU that triggered the reboot.
> >
> > In arch/x86/kernel/reboot.c:
> >
> > /* Use NMIs as IPIs to tell all CPUs to disable virtualization */
> > static void emergency_vmx_disable_all(void)
> > {
> >	/* Just make sure we won't change CPUs while doing this */
> >	local_irq_disable();
> >
> >	/*
> >	 * We need to disable VMX on all CPUs before rebooting, otherwise
> >	 * we risk hanging up the machine, because the CPU ignore INIT
> >	 * signals when VMX is enabled.
> >	 *
> >	 * We can't take any locks and we may be on an inconsistent
> >	 * state, so we use NMIs as IPIs to tell the other CPUs to disable
> >	 * VMX and halt.
> >	 *
> >	 * For safety, we will avoid running the nmi_shootdown_cpus()
> >	 * stuff unnecessarily, but we don't have a way to check
> >	 * if other CPUs have VMX enabled. So we will call it only if the
> >	 * CPU we are running on has VMX enabled.
> >	 *
> >	 * We will miss cases where VMX is not enabled on all CPUs. This
> >	 * shouldn't do much harm because KVM always enable VMX on all
> >	 * CPUs anyway. But we can miss it on the small window where KVM
> >	 * is still enabling VMX.
> >	 */
> >	if (cpu_has_vmx() && cpu_vmx_enabled()) {
> >		/* Disable VMX on this CPU. */
> >		cpu_vmxoff();
> >
> >		/* Halt and disable VMX on the other CPUs */
> >		nmi_shootdown_cpus(vmxoff_nmi);
> >
> >	}
> > }
> >
> > static void native_machine_emergency_restart(void)
> > {
> >	...
> >
> >	if (reboot_emergency)
> >		emergency_vmx_disable_all();
> > }
> >
> Thanks for the information, Sean. I checked the call path for
> emergency_restart. I would expect a kernel message indicating
> why it had to do the emergency_restart, but there is nothing
> logged in netconsole or the kernel log. I don't understand.
>
> Do you have a guess what could cause the system to trigger an emergency reboot?

Not really. The emergency reboot thing itself is a guess. Sorry :-(