public inbox for kvm@vger.kernel.org
From: Sean Christopherson <sean.j.christopherson@intel.com>
To: Liran Alon <liran.alon@oracle.com>
Cc: Jinpu Wang <jinpu.wang@cloud.ionos.com>, kvm@vger.kernel.org
Subject: Re: Broadwell server reboot with vmx: unexpected exit reason 0x3
Date: Wed, 2 Oct 2019 10:29:44 -0700
Message-ID: <20191002172943.GG9615@linux.intel.com>
In-Reply-To: <DDC3DE27-46A3-4CB4-9AB8-C3C2F1D54777@oracle.com>

On Mon, Sep 30, 2019 at 01:48:15PM +0300, Liran Alon wrote:
> 
> > On 30 Sep 2019, at 11:43, Jinpu Wang <jinpu.wang@cloud.ionos.com> wrote:
> > 
> > Dear KVM experts,
> > 
> > We had a Broadwell server reboot itself recently. Before the reboot,
> > there were error messages from KVM in netconsole:
> > [5599380.317055] kvm [9046]: vcpu1, guest rIP: 0xffffffff816ad716 vmx:
> > unexpected exit reason 0x3
> > [5599380.317060] kvm [49626]: vcpu0, guest rIP: 0xffffffff81060fe6
> > vmx: unexpected exit reason 0x3
> > [5599380.317062] kvm [36632]: vcpu0, guest rIP: 0xffffffff8103970d
> > vmx: unexpected exit reason 0x3
> > [5599380.317064] kvm [9620]: vcpu1, guest rIP: 0xffffffffb6c1b08e vmx:
> > unexpected exit reason 0x3
> > [5599380.317067] kvm [49925]: vcpu5, guest rIP: 0xffffffff9b406ea2
> > vmx: unexpected exit reason 0x3
> > [5599380.317068] kvm [49925]: vcpu3, guest rIP: 0xffffffff9b406ea2
> > vmx: unexpected exit reason 0x3
> > [5599380.317070] kvm [33871]: vcpu2, guest rIP: 0xffffffff81060fe6
> > vmx: unexpected exit reason 0x3
> > [5599380.317072] kvm [49925]: vcpu4, guest rIP: 0xffffffff9b406ea2
> > vmx: unexpected exit reason 0x3
> > [5599380.317074] kvm [48505]: vcpu1, guest rIP: 0xffffffffaf36bf9b
> > vmx: unexpected exit reason 0x3
> > [5599380.317076] kvm [21880]: vcpu1, guest rIP: 0xffffffff8103970d
> > vmx: unexpected exit reason 0x3
> 
> The only way a CPU will raise this exit reason (3 == EXIT_REASON_INIT_SIGNAL)
> is if the CPU is in VMX non-root mode while it has a pending INIT signal in
> its LAPIC.
> 
> In simple terms, it means that one CPU was running inside a guest while
> another CPU sent it a signal to reset itself.
> 
> I see in the code that kvm_init() calls
> register_reboot_notifier(&kvm_reboot_notifier), and that kvm_reboot() runs
> hardware_disable_nolock() on each CPU before reboot. That should result in
> every CPU running VMX’s hardware_disable(), which should exit VMX operation
> (VMXOFF) and disable VMX (clear CR4.VMXE).
> 
> Therefore, I’m quite puzzled as to how a server reboot triggers the scenario
> you present here.  Can you send your full kernel log?
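
As a side note, the decode above can be sketched in a few lines of C. This
is only an illustration, assuming the SDM numbering of basic exit reasons
0-3 and that the basic exit reason occupies bits 15:0 of the exit reason
field. The enum names mirror the kernel's EXIT_REASON_* constants;
basic_exit_reason() and exit_reason_name() are hypothetical helpers, not
KVM functions.

```c
#include <stdint.h>

/* Basic VM-exit reasons 0-3, numbered as in the Intel SDM. */
enum {
	EXIT_REASON_EXCEPTION_NMI      = 0,
	EXIT_REASON_EXTERNAL_INTERRUPT = 1,
	EXIT_REASON_TRIPLE_FAULT       = 2,
	EXIT_REASON_INIT_SIGNAL        = 3,
};

/* Hypothetical helper: keep only bits 15:0, the basic exit reason. */
static inline uint16_t basic_exit_reason(uint32_t exit_reason)
{
	return (uint16_t)(exit_reason & 0xffff);
}

/* Hypothetical helper: map a raw exit reason to a short name. */
static inline const char *exit_reason_name(uint32_t exit_reason)
{
	switch (basic_exit_reason(exit_reason)) {
	case EXIT_REASON_EXCEPTION_NMI:      return "EXCEPTION_NMI";
	case EXIT_REASON_EXTERNAL_INTERRUPT: return "EXTERNAL_INTERRUPT";
	case EXIT_REASON_TRIPLE_FAULT:       return "TRIPLE_FAULT";
	case EXIT_REASON_INIT_SIGNAL:        return "INIT_SIGNAL";
	default:                             return "UNKNOWN";
	}
}
```

With these helpers, exit_reason_name(0x3) yields "INIT_SIGNAL", which
matches the 0x3 in the log lines quoted above.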

My guess is that the system triggered an emergency reboot and then one of
three things happened: it was unable to force the CPUs out of VMX non-root
mode with NMIs, it hit a triple fault shutdown and auto-generated INITs
before it could shoot down the other CPUs, or it didn't even attempt the
NMI because VMX wasn't enabled on the CPU that triggered the reboot.

In arch/x86/kernel/reboot.c:

/* Use NMIs as IPIs to tell all CPUs to disable virtualization */
static void emergency_vmx_disable_all(void)
{
	/* Just make sure we won't change CPUs while doing this */
	local_irq_disable();

	/*
	 * We need to disable VMX on all CPUs before rebooting, otherwise
	 * we risk hanging up the machine, because the CPU ignore INIT
	 * signals when VMX is enabled.
	 *
	 * We can't take any locks and we may be on an inconsistent
	 * state, so we use NMIs as IPIs to tell the other CPUs to disable
	 * VMX and halt.
	 *
	 * For safety, we will avoid running the nmi_shootdown_cpus()
	 * stuff unnecessarily, but we don't have a way to check
	 * if other CPUs have VMX enabled. So we will call it only if the
	 * CPU we are running on has VMX enabled.
	 *
	 * We will miss cases where VMX is not enabled on all CPUs. This
	 * shouldn't do much harm because KVM always enable VMX on all
	 * CPUs anyway. But we can miss it on the small window where KVM
	 * is still enabling VMX.
	 */
	if (cpu_has_vmx() && cpu_vmx_enabled()) {
		/* Disable VMX on this CPU. */
		cpu_vmxoff();

		/* Halt and disable VMX on the other CPUs */
		nmi_shootdown_cpus(vmxoff_nmi);

	}
}

static void native_machine_emergency_restart(void)
{
	...

	if (reboot_emergency)
		emergency_vmx_disable_all();
}
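
To make the last case concrete, here is a toy userspace model of the check
in emergency_vmx_disable_all(). It is a sketch under stated assumptions,
not kernel code: vmx_on[] stands in for each CPU's CR4.VMXE state, every
CPU is assumed to support VMX, and every NMI is assumed to be delivered and
handled. model_emergency_vmx_disable_all() is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model of the decision in emergency_vmx_disable_all().
 * vmx_on[i] is true if CPU i has VMX enabled; self is the CPU that
 * triggered the reboot. Returns how many CPUs still have VMX enabled
 * afterwards, i.e. CPUs that may still be running guests when the
 * reset arrives.
 */
static inline size_t model_emergency_vmx_disable_all(bool *vmx_on,
						     size_t ncpus,
						     size_t self)
{
	size_t i, still_on = 0;

	/* Mirrors the "cpu_has_vmx() && cpu_vmx_enabled()" local check. */
	if (vmx_on[self]) {
		/*
		 * Stands in for cpu_vmxoff() on this CPU followed by
		 * nmi_shootdown_cpus(vmxoff_nmi) on the others
		 * (assumption: every NMI arrives and is handled).
		 */
		for (i = 0; i < ncpus; i++)
			vmx_on[i] = false;
	}

	/* If the local check failed, nothing above ran. */
	for (i = 0; i < ncpus; i++)
		if (vmx_on[i])
			still_on++;

	return still_on;
}
```

In this model, a reboot initiated from a CPU with VMX enabled leaves zero
CPUs in VMX operation, while a reboot initiated from a CPU with VMX
disabled leaves every other VMX-enabled CPU untouched.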


Thread overview: 5+ messages
2019-09-30  8:43 Broadwell server reboot with vmx: unexpected exit reason 0x3 Jinpu Wang
2019-09-30 10:48 ` Liran Alon
2019-10-02 17:29   ` Sean Christopherson [this message]
2019-10-04  8:53     ` Jinpu Wang
2019-10-17 18:52       ` Sean Christopherson
