From mboxrd@z Thu Jan  1 00:00:00 1970
From: Avi Kivity <avi@redhat.com>
Subject: Re: [RFC] KVM: Fix simultaneous NMIs
Date: Thu, 15 Sep 2011 20:02:32 +0300
Message-ID: <4E722FA8.2030006@redhat.com>
References: <1316097911-16424-1-git-send-email-avi@redhat.com> <4E722140.4070702@siemens.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Marcelo Tosatti <mtosatti@redhat.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>
To: Jan Kiszka <jan.kiszka@siemens.com>
Return-path: <kvm-owner@vger.kernel.org>
Received: from mx1.redhat.com ([209.132.183.28]:31445 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751441Ab1IORCk (ORCPT <rfc822;kvm@vger.kernel.org>);
	Thu, 15 Sep 2011 13:02:40 -0400
In-Reply-To: <4E722140.4070702@siemens.com>
Sender: kvm-owner@vger.kernel.org
List-ID: <kvm.vger.kernel.org>

On 09/15/2011 07:01 PM, Jan Kiszka wrote:
> On 2011-09-15 16:45, Avi Kivity wrote:
> >  If simultaneous NMIs happen, we're supposed to queue the second
> >  and next (collapsing them), but currently we sometimes collapse
> >  the second into the first.
>
> Can you describe the race in a few more details here ("sometimes" sounds
> like "I don't know when" :) )?

In this case it was "I'm in a hurry".

> >
> >   void kvm_inject_nmi(struct kvm_vcpu *vcpu)
> >   {
> >  +	atomic_inc(&vcpu->arch.nmi_pending);
> >   	kvm_make_request(KVM_REQ_EVENT, vcpu);
> >  -	vcpu->arch.nmi_pending = 1;
>
> Does the reordering matter?

I think so.  Suppose the vcpu enters just after kvm_make_request(); it 
sees KVM_REQ_EVENT and clears it, but doesn't see nmi_pending because it 
wasn't set set.  Then comes a kick, the guest is reentered with 
nmi_pending set but KVM_REQ_EVENT clear and sails through the check and 
enters the guest.  The NMI is delayed until the next KVM_REQ_EVENT.

> Do we need barriers?

Yes.

>
> >  @@ -5570,9 +5570,9 @@ static void inject_pending_event(struct kvm_vcpu *vcpu)
> >   	}
> >
> >   	/* try to inject new event if pending */
> >  -	if (vcpu->arch.nmi_pending) {
> >  +	if (atomic_read(&vcpu->arch.nmi_pending)) {
> >   		if (kvm_x86_ops->nmi_allowed(vcpu)) {
> >  -			vcpu->arch.nmi_pending = false;
> >  +			atomic_dec(&vcpu->arch.nmi_pending);
>
> Here we lost NMIs in the past by overwriting nmi_pending while another
> one was already queued, right?

One place, yes.  The other is kvm_inject_nmi() - if the first nmi didn't 
get picked up by the vcpu by the time the second nmi arrives, we lose 
the second nmi.

> >   	if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
> >   		inject_pending_event(vcpu);
> >
> >   		/* enable NMI/IRQ window open exits if needed */
> >  -		if (nmi_pending)
> >  +		if (atomic_read(&vcpu->arch.nmi_pending)
> >  +		&&  nmi_in_progress(vcpu))
>
> Is nmi_pending&&  !nmi_in_progress possible at all?

Yes, due to NMI-blocked-by-STI.  A really touchy area.

> Is it rather a BUG
> condition?

No.

> If not, what will happen next?

The NMI window will open and we'll inject the NMI.  But I think we have 
a bug here - we should only kvm_collapse_nmis() if an NMI handler was 
indeed running, yet we do it unconditionally.

> >
> >  +static inline void kvm_collapse_pending_nmis(struct kvm_vcpu *vcpu)
> >  +{
> >  +	/* Collapse all NMIs queued while an NMI handler was running to one */
> >  +	if (atomic_read(&vcpu->arch.nmi_pending))
> >  +		atomic_set(&vcpu->arch.nmi_pending, 1);
>
> Is it OK that NMIs injected after the collapse will increment this to>
> 1 again? Or is that impossible?
>

It's possible and okay.  We're now completing execution of IRET.  Doing 
atomic_set() after atomic_inc() means the NMI happened before IRET 
completed, and vice versa.  Since these events are asynchronous, we're 
free to choose one or the other (a self-IPI-NMI just before the IRET 
must be swallowed, and a self-IPI-NMI just after the IRET would only be 
executed after the next time around the handler).


-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.