kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH RFC] KVM: Fix race in apic->pending_events processing
@ 2013-05-26 13:00 Gleb Natapov
  2013-05-28 10:56 ` Paolo Bonzini
  0 siblings, 1 reply; 25+ messages in thread
From: Gleb Natapov @ 2013-05-26 13:00 UTC (permalink / raw)
  To: kvm; +Cc: pbonzini, Jan Kiszka

apic->pending_events processing has a race that may cause INIT and SIPI
processing to be reordered:

vpu0:                            vcpu1:
set INIT
                               test_and_clear_bit(KVM_APIC_INIT)
                                  process INIT
set INIT
set SIPI
                               test_and_clear_bit(KVM_APIC_SIPI)
                                  process SIPI

At the and INIT is left pending in pending_events. The following patch
tries to fix this using the fact that if INIT comes after SIPI it drops
SIPI from the pending_events, so if pending_events is different after
SIPI is processed it means that INIT was issued after SIPI otherwise
all pending event are processed and pending_events can be reset to zero.
 
Signed-off-by: Gleb Natapov <gleb@redhat.com>
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 9d75193..67686b8 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1850,6 +1850,7 @@ void kvm_apic_accept_events(struct kvm_vcpu *vcpu)
 {
 	struct kvm_lapic *apic = vcpu->arch.apic;
 	unsigned int sipi_vector;
+	unsigned long pe;
 
 	if (!kvm_vcpu_has_lapic(vcpu))
 		return;
@@ -1862,7 +1863,8 @@ void kvm_apic_accept_events(struct kvm_vcpu *vcpu)
 		else
 			vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED;
 	}
-	if (test_and_clear_bit(KVM_APIC_SIPI, &apic->pending_events) &&
+	pe = apic->pending_events;
+	if (test_bit(KVM_APIC_SIPI, &pe) &&
 	    vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
 		/* evaluate pending_events before reading the vector */
 		smp_rmb();
@@ -1871,6 +1873,7 @@ void kvm_apic_accept_events(struct kvm_vcpu *vcpu)
 			 vcpu->vcpu_id, sipi_vector);
 		kvm_vcpu_deliver_sipi_vector(vcpu, sipi_vector);
 		vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
+		cmpxchg(&apic->pending_events, pe, 0);
 	}
 }
 
--
			Gleb.

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC] KVM: Fix race in apic->pending_events processing
  2013-05-26 13:00 [PATCH RFC] KVM: Fix race in apic->pending_events processing Gleb Natapov
@ 2013-05-28 10:56 ` Paolo Bonzini
  2013-05-28 12:56   ` Gleb Natapov
  0 siblings, 1 reply; 25+ messages in thread
From: Paolo Bonzini @ 2013-05-28 10:56 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: kvm, Jan Kiszka

Il 26/05/2013 15:00, Gleb Natapov ha scritto:
> apic->pending_events processing has a race that may cause INIT and SIPI
> processing to be reordered:
> 
> vpu0:                            vcpu1:
> set INIT
>                                test_and_clear_bit(KVM_APIC_INIT)
>                                   process INIT
> set INIT
> set SIPI
>                                test_and_clear_bit(KVM_APIC_SIPI)
>                                   process SIPI
> 
> At the and INIT is left pending in pending_events. The following patch
> tries to fix this using the fact that if INIT comes after SIPI it drops
> SIPI from the pending_events, so if pending_events is different after
> SIPI is processed it means that INIT was issued after SIPI otherwise
> all pending event are processed and pending_events can be reset to zero.

The patch looks correct, but is there any reason to do the cmpxchg at
the end?

That is, would this have any problem?  It seems a bit simpler:

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index e1adbb4..3fe00fc 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1873,7 +1873,11 @@ void kvm_apic_accept_events(struct kvm_vcpu *vcpu)
 		else
 			vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED;
 	}
-	if (test_and_clear_bit(KVM_APIC_SIPI, &apic->pending_events) &&
+	/*
+	 * Note that we may get another INIT+SIPI sequence right here; process
+	 * the INIT first.  Assumes that there are only KVM_APIC_INIT/SIPI.
+	 */
+	if (cmpxchg(&apic->pending_events, KVM_APIC_SIPI, 0) == KVM_APIC_SIPI &&
 	    vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
 		/* evaluate pending_events before reading the vector */
 		smp_rmb();

(Even if we go with your patch, it deserves a comment in the code).

Paolo

> Signed-off-by: Gleb Natapov <gleb@redhat.com>
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 9d75193..67686b8 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -1850,6 +1850,7 @@ void kvm_apic_accept_events(struct kvm_vcpu *vcpu)
>  {
>  	struct kvm_lapic *apic = vcpu->arch.apic;
>  	unsigned int sipi_vector;
> +	unsigned long pe;
>  
>  	if (!kvm_vcpu_has_lapic(vcpu))
>  		return;
> @@ -1862,7 +1863,8 @@ void kvm_apic_accept_events(struct kvm_vcpu *vcpu)
>  		else
>  			vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED;
>  	}
> -	if (test_and_clear_bit(KVM_APIC_SIPI, &apic->pending_events) &&
> +	pe = apic->pending_events;
> +	if (test_bit(KVM_APIC_SIPI, &pe) &&
>  	    vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
>  		/* evaluate pending_events before reading the vector */
>  		smp_rmb();
> @@ -1871,6 +1873,7 @@ void kvm_apic_accept_events(struct kvm_vcpu *vcpu)
>  			 vcpu->vcpu_id, sipi_vector);
>  		kvm_vcpu_deliver_sipi_vector(vcpu, sipi_vector);
>  		vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
> +		cmpxchg(&apic->pending_events, pe, 0);
>  	}
>  }
>  
> --
> 			Gleb.
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC] KVM: Fix race in apic->pending_events processing
  2013-05-28 10:56 ` Paolo Bonzini
@ 2013-05-28 12:56   ` Gleb Natapov
  2013-05-28 13:48     ` Paolo Bonzini
  0 siblings, 1 reply; 25+ messages in thread
From: Gleb Natapov @ 2013-05-28 12:56 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvm, Jan Kiszka

On Tue, May 28, 2013 at 12:56:19PM +0200, Paolo Bonzini wrote:
> Il 26/05/2013 15:00, Gleb Natapov ha scritto:
> > apic->pending_events processing has a race that may cause INIT and SIPI
> > processing to be reordered:
> > 
> > vpu0:                            vcpu1:
> > set INIT
> >                                test_and_clear_bit(KVM_APIC_INIT)
> >                                   process INIT
> > set INIT
> > set SIPI
> >                                test_and_clear_bit(KVM_APIC_SIPI)
> >                                   process SIPI
> > 
> > At the and INIT is left pending in pending_events. The following patch
> > tries to fix this using the fact that if INIT comes after SIPI it drops
> > SIPI from the pending_events, so if pending_events is different after
> > SIPI is processed it means that INIT was issued after SIPI otherwise
> > all pending event are processed and pending_events can be reset to zero.
> 
> The patch looks correct, but is there any reason to do the cmpxchg at
> the end?
> 
> That is, would this have any problem?  It seems a bit simpler:
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index e1adbb4..3fe00fc 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -1873,7 +1873,11 @@ void kvm_apic_accept_events(struct kvm_vcpu *vcpu)
>  		else
>  			vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED;
>  	}
> -	if (test_and_clear_bit(KVM_APIC_SIPI, &apic->pending_events) &&
> +	/*
> +	 * Note that we may get another INIT+SIPI sequence right here; process
> +	 * the INIT first.  Assumes that there are only KVM_APIC_INIT/SIPI.
> +	 */
> +	if (cmpxchg(&apic->pending_events, KVM_APIC_SIPI, 0) == KVM_APIC_SIPI &&
>  	    vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
Because pending_events can be INIT/SIPI at this point and it should be
interpreted as: do SIPI and ignore INIT (atomically).

>  		/* evaluate pending_events before reading the vector */
>  		smp_rmb();
> 
> (Even if we go with your patch, it deserves a comment in the code).
> 
We can move explanation from the commit message to a coment.

> Paolo
> 
> > Signed-off-by: Gleb Natapov <gleb@redhat.com>
> > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > index 9d75193..67686b8 100644
> > --- a/arch/x86/kvm/lapic.c
> > +++ b/arch/x86/kvm/lapic.c
> > @@ -1850,6 +1850,7 @@ void kvm_apic_accept_events(struct kvm_vcpu *vcpu)
> >  {
> >  	struct kvm_lapic *apic = vcpu->arch.apic;
> >  	unsigned int sipi_vector;
> > +	unsigned long pe;
> >  
> >  	if (!kvm_vcpu_has_lapic(vcpu))
> >  		return;
> > @@ -1862,7 +1863,8 @@ void kvm_apic_accept_events(struct kvm_vcpu *vcpu)
> >  		else
> >  			vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED;
> >  	}
> > -	if (test_and_clear_bit(KVM_APIC_SIPI, &apic->pending_events) &&
> > +	pe = apic->pending_events;
> > +	if (test_bit(KVM_APIC_SIPI, &pe) &&
> >  	    vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
> >  		/* evaluate pending_events before reading the vector */
> >  		smp_rmb();
> > @@ -1871,6 +1873,7 @@ void kvm_apic_accept_events(struct kvm_vcpu *vcpu)
> >  			 vcpu->vcpu_id, sipi_vector);
> >  		kvm_vcpu_deliver_sipi_vector(vcpu, sipi_vector);
> >  		vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
> > +		cmpxchg(&apic->pending_events, pe, 0);
> >  	}
> >  }
> >  
> > --
> > 			Gleb.
> > --
> > To unsubscribe from this list: send the line "unsubscribe kvm" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 

--
			Gleb.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC] KVM: Fix race in apic->pending_events processing
  2013-05-28 12:56   ` Gleb Natapov
@ 2013-05-28 13:48     ` Paolo Bonzini
  2013-05-28 15:00       ` Gleb Natapov
  0 siblings, 1 reply; 25+ messages in thread
From: Paolo Bonzini @ 2013-05-28 13:48 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: kvm, Jan Kiszka

Il 28/05/2013 14:56, Gleb Natapov ha scritto:
>> >  		else
>> >  			vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED;
>> >  	}
>> > -	if (test_and_clear_bit(KVM_APIC_SIPI, &apic->pending_events) &&
>> > +	/*
>> > +	 * Note that we may get another INIT+SIPI sequence right here; process
>> > +	 * the INIT first.  Assumes that there are only KVM_APIC_INIT/SIPI.
>> > +	 */
>> > +	if (cmpxchg(&apic->pending_events, KVM_APIC_SIPI, 0) == KVM_APIC_SIPI &&
>> >  	    vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
> Because pending_events can be INIT/SIPI at this point and it should be
> interpreted as: do SIPI and ignore INIT (atomically).

My patch does "do another INIT (which will have no effect) and do SIPI 
after that INIT", which is different but has almost the same effect.  
If pending_events is INIT/SIPI, it ignores the SIPI for now and lets 
the next iteration of kvm_apic_accept_events do both.  The difference 
would be that in a carefully-timed sequence of interrupts

    INIT-INIT-SIPI-INIT-SIPI

your version would do many SIPIs, while mine would do just one.

Hmm... there is a reference to this in 25.2 "Other causes of VM exits": 
"If a logical processor is in the wait-for-SIPI state, INIT signals are 
blocked.  They do not cause VM exits in this case."  It is not for the 
physical processor, but it makes sense to have the same thing.  Is this 
the reason why you did the cmpxchg at the end?

But then, there's another way to mask INITs in the wait-for-SIPI 
state.  Considering that KVM_MP_STATE_INIT_RECEIVED is really a 
wait-for-SIPI, you can do:

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index e1adbb4..36bc308 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -720,8 +720,12 @@ out:
 		break;
 
 	case APIC_DM_INIT:
-		if (!trig_mode || level) {
+		if ((!trig_mode || level) &&
+		    vcpu->arch.mp_state != KVM_MP_STATE_INIT_RECEIVED) {
 			result = 1;
+
+			/* check mp_state before writing apic->pending_events */
+			smp_mb();
 			/* assumes that there are only KVM_APIC_INIT/SIPI */
 			apic->pending_events = (1UL << KVM_APIC_INIT);
 			/* make sure pending_events is visible before sending
@@ -1865,13 +1869,17 @@ void kvm_apic_accept_events(struct kvm_vcpu *vcpu)
 	if (!kvm_vcpu_has_lapic(vcpu))
 		return;
 
-	if (test_and_clear_bit(KVM_APIC_INIT, &apic->pending_events)) {
+	if (test_bit(KVM_APIC_INIT, &apic->pending_events)) {
 		kvm_lapic_reset(vcpu);
 		kvm_vcpu_reset(vcpu);
 		if (kvm_vcpu_is_bsp(apic->vcpu))
 			vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
 		else
 			vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED;
+
+		/* write mp_state before toggling KVM_APIC_INIT */
+		smp_mb__before_clear_bit();
+		clear_bit(KVM_APIC_INIT, &apic->pending_events);
 	}
 	if (test_and_clear_bit(KVM_APIC_SIPI, &apic->pending_events) &&
 	    vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {

I don't have a particular preference.

Paolo

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC] KVM: Fix race in apic->pending_events processing
  2013-05-28 13:48     ` Paolo Bonzini
@ 2013-05-28 15:00       ` Gleb Natapov
  2013-05-28 16:33         ` Paolo Bonzini
  0 siblings, 1 reply; 25+ messages in thread
From: Gleb Natapov @ 2013-05-28 15:00 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvm, Jan Kiszka

On Tue, May 28, 2013 at 03:48:58PM +0200, Paolo Bonzini wrote:
> Il 28/05/2013 14:56, Gleb Natapov ha scritto:
> >> >  		else
> >> >  			vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED;
> >> >  	}
> >> > -	if (test_and_clear_bit(KVM_APIC_SIPI, &apic->pending_events) &&
> >> > +	/*
> >> > +	 * Note that we may get another INIT+SIPI sequence right here; process
> >> > +	 * the INIT first.  Assumes that there are only KVM_APIC_INIT/SIPI.
> >> > +	 */
> >> > +	if (cmpxchg(&apic->pending_events, KVM_APIC_SIPI, 0) == KVM_APIC_SIPI &&
> >> >  	    vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
> > Because pending_events can be INIT/SIPI at this point and it should be
> > interpreted as: do SIPI and ignore INIT (atomically).
> 
> My patch does "do another INIT (which will have no effect) and do SIPI 
> after that INIT", which is different but has almost the same effect.  
> If pending_events is INIT/SIPI, it ignores the SIPI for now and lets 
> the next iteration of kvm_apic_accept_events do both.  The difference 
> would be that in a carefully-timed sequence of interrupts
> 
You assume that the next processing will actually happen, but this is
not necessary the case.

>     INIT-INIT-SIPI-INIT-SIPI
> 
> your version would do many SIPIs, while mine would do just one.
> 
> Hmm... there is a reference to this in 25.2 "Other causes of VM exits": 
> "If a logical processor is in the wait-for-SIPI state, INIT signals are 
> blocked.  They do not cause VM exits in this case."  It is not for the 
> physical processor, but it makes sense to have the same thing.  Is this 
> the reason why you did the cmpxchg at the end?
> 
No, but if helps us model proper behaviour for nested guest +1 to it :)
But until we handle INIT in nested virt it is not something that
dictates the solution.

> But then, there's another way to mask INITs in the wait-for-SIPI 
> state.  Considering that KVM_MP_STATE_INIT_RECEIVED is really a 
> wait-for-SIPI, you can do:
> 
Haven't checked it for races (especially races between multiple CPUS
sending INIT), but looks more complicated to me.

> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index e1adbb4..36bc308 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -720,8 +720,12 @@ out:
>  		break;
>  
>  	case APIC_DM_INIT:
> -		if (!trig_mode || level) {
> +		if ((!trig_mode || level) &&
> +		    vcpu->arch.mp_state != KVM_MP_STATE_INIT_RECEIVED) {
>  			result = 1;
> +
> +			/* check mp_state before writing apic->pending_events */
> +			smp_mb();
>  			/* assumes that there are only KVM_APIC_INIT/SIPI */
>  			apic->pending_events = (1UL << KVM_APIC_INIT);
>  			/* make sure pending_events is visible before sending
> @@ -1865,13 +1869,17 @@ void kvm_apic_accept_events(struct kvm_vcpu *vcpu)
>  	if (!kvm_vcpu_has_lapic(vcpu))
>  		return;
>  
> -	if (test_and_clear_bit(KVM_APIC_INIT, &apic->pending_events)) {
> +	if (test_bit(KVM_APIC_INIT, &apic->pending_events)) {
>  		kvm_lapic_reset(vcpu);
>  		kvm_vcpu_reset(vcpu);
>  		if (kvm_vcpu_is_bsp(apic->vcpu))
>  			vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE;
>  		else
>  			vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED;
> +
> +		/* write mp_state before toggling KVM_APIC_INIT */
> +		smp_mb__before_clear_bit();
> +		clear_bit(KVM_APIC_INIT, &apic->pending_events);
>  	}
>  	if (test_and_clear_bit(KVM_APIC_SIPI, &apic->pending_events) &&
>  	    vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
> 
> I don't have a particular preference.
> 
> Paolo

--
			Gleb.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC] KVM: Fix race in apic->pending_events processing
  2013-05-28 15:00       ` Gleb Natapov
@ 2013-05-28 16:33         ` Paolo Bonzini
  2013-05-30  1:20           ` Gleb Natapov
  0 siblings, 1 reply; 25+ messages in thread
From: Paolo Bonzini @ 2013-05-28 16:33 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: kvm, Jan Kiszka

Il 28/05/2013 17:00, Gleb Natapov ha scritto:
> On Tue, May 28, 2013 at 03:48:58PM +0200, Paolo Bonzini wrote:
>> Il 28/05/2013 14:56, Gleb Natapov ha scritto:
>>>>>  		else
>>>>>  			vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED;
>>>>>  	}
>>>>> -	if (test_and_clear_bit(KVM_APIC_SIPI, &apic->pending_events) &&
>>>>> +	/*
>>>>> +	 * Note that we may get another INIT+SIPI sequence right here; process
>>>>> +	 * the INIT first.  Assumes that there are only KVM_APIC_INIT/SIPI.
>>>>> +	 */
>>>>> +	if (cmpxchg(&apic->pending_events, KVM_APIC_SIPI, 0) == KVM_APIC_SIPI &&
>>>>>  	    vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
>>> Because pending_events can be INIT/SIPI at this point and it should be
>>> interpreted as: do SIPI and ignore INIT (atomically).
>>
>> My patch does "do another INIT (which will have no effect) and do SIPI 
>> after that INIT", which is different but has almost the same effect.  
>> If pending_events is INIT/SIPI, it ignores the SIPI for now and lets 
>> the next iteration of kvm_apic_accept_events do both.  The difference 
>> would be that in a carefully-timed sequence of interrupts
>>
> You assume that the next processing will actually happen, but this is
> not necessary the case.

Why not?  The INIT and SIPI that have just been sent have kicked the
VCPU again.

>> But then, there's another way to mask INITs in the wait-for-SIPI 
>> state.  Considering that KVM_MP_STATE_INIT_RECEIVED is really a 
>> wait-for-SIPI, you can do:
>>
> Haven't checked it for races (especially races between multiple CPUS
> sending INIT), but looks more complicated to me.

Ok, let's go with yours.

Paolo

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC] KVM: Fix race in apic->pending_events processing
  2013-05-28 16:33         ` Paolo Bonzini
@ 2013-05-30  1:20           ` Gleb Natapov
  2013-05-30  5:41             ` Paolo Bonzini
  0 siblings, 1 reply; 25+ messages in thread
From: Gleb Natapov @ 2013-05-30  1:20 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvm, Jan Kiszka

On Tue, May 28, 2013 at 06:33:39PM +0200, Paolo Bonzini wrote:
> Il 28/05/2013 17:00, Gleb Natapov ha scritto:
> > On Tue, May 28, 2013 at 03:48:58PM +0200, Paolo Bonzini wrote:
> >> Il 28/05/2013 14:56, Gleb Natapov ha scritto:
> >>>>>  		else
> >>>>>  			vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED;
> >>>>>  	}
> >>>>> -	if (test_and_clear_bit(KVM_APIC_SIPI, &apic->pending_events) &&
> >>>>> +	/*
> >>>>> +	 * Note that we may get another INIT+SIPI sequence right here; process
> >>>>> +	 * the INIT first.  Assumes that there are only KVM_APIC_INIT/SIPI.
> >>>>> +	 */
> >>>>> +	if (cmpxchg(&apic->pending_events, KVM_APIC_SIPI, 0) == KVM_APIC_SIPI &&
> >>>>>  	    vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
> >>> Because pending_events can be INIT/SIPI at this point and it should be
> >>> interpreted as: do SIPI and ignore INIT (atomically).
> >>
> >> My patch does "do another INIT (which will have no effect) and do SIPI 
> >> after that INIT", which is different but has almost the same effect.  
> >> If pending_events is INIT/SIPI, it ignores the SIPI for now and lets 
> >> the next iteration of kvm_apic_accept_events do both.  The difference 
> >> would be that in a carefully-timed sequence of interrupts
> >>
> > You assume that the next processing will actually happen, but this is
> > not necessary the case.
> 
> Why not?  The INIT and SIPI that have just been sent have kicked the
> VCPU again.
> 
kick is a nop if vcpu thread is not in a halt or in a guest.

> >> But then, there's another way to mask INITs in the wait-for-SIPI 
> >> state.  Considering that KVM_MP_STATE_INIT_RECEIVED is really a 
> >> wait-for-SIPI, you can do:
> >>
> > Haven't checked it for races (especially races between multiple CPUS
> > sending INIT), but looks more complicated to me.
> 
> Ok, let's go with yours.
> 
> Paolo

--
			Gleb.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC] KVM: Fix race in apic->pending_events processing
  2013-05-30  1:20           ` Gleb Natapov
@ 2013-05-30  5:41             ` Paolo Bonzini
  2013-05-30  6:01               ` Gleb Natapov
  0 siblings, 1 reply; 25+ messages in thread
From: Paolo Bonzini @ 2013-05-30  5:41 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: kvm, Jan Kiszka

Il 30/05/2013 03:20, Gleb Natapov ha scritto:
> On Tue, May 28, 2013 at 06:33:39PM +0200, Paolo Bonzini wrote:
>> Il 28/05/2013 17:00, Gleb Natapov ha scritto:
>>> On Tue, May 28, 2013 at 03:48:58PM +0200, Paolo Bonzini wrote:
>>>> Il 28/05/2013 14:56, Gleb Natapov ha scritto:
>>>>>>>  		else
>>>>>>>  			vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED;
>>>>>>>  	}
>>>>>>> -	if (test_and_clear_bit(KVM_APIC_SIPI, &apic->pending_events) &&
>>>>>>> +	/*
>>>>>>> +	 * Note that we may get another INIT+SIPI sequence right here; process
>>>>>>> +	 * the INIT first.  Assumes that there are only KVM_APIC_INIT/SIPI.
>>>>>>> +	 */
>>>>>>> +	if (cmpxchg(&apic->pending_events, KVM_APIC_SIPI, 0) == KVM_APIC_SIPI &&
>>>>>>>  	    vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
>>>>> Because pending_events can be INIT/SIPI at this point and it should be
>>>>> interpreted as: do SIPI and ignore INIT (atomically).
>>>>
>>>> My patch does "do another INIT (which will have no effect) and do SIPI 
>>>> after that INIT", which is different but has almost the same effect.  
>>>> If pending_events is INIT/SIPI, it ignores the SIPI for now and lets 
>>>> the next iteration of kvm_apic_accept_events do both.  The difference 
>>>> would be that in a carefully-timed sequence of interrupts
>>>>
>>> You assume that the next processing will actually happen, but this is
>>> not necessary the case.
>>
>> Why not?  The INIT and SIPI that have just been sent have kicked the
>> VCPU again.
>
> kick is a nop if vcpu thread is not in a halt or in a guest.

But the KVM_REQ_EVENT request will be caught at:

        if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests
            || need_resched() || signal_pending(current)) {
                vcpu->mode = OUTSIDE_GUEST_MODE;
                smp_wmb();
                local_irq_enable();
                preempt_enable();
                r = 1;
                goto cancel_injection;
        }

and the entry will be canceled.

Paolo

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC] KVM: Fix race in apic->pending_events processing
  2013-05-30  5:41             ` Paolo Bonzini
@ 2013-05-30  6:01               ` Gleb Natapov
  2013-05-30  6:31                 ` Paolo Bonzini
  0 siblings, 1 reply; 25+ messages in thread
From: Gleb Natapov @ 2013-05-30  6:01 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvm, Jan Kiszka

On Thu, May 30, 2013 at 07:41:05AM +0200, Paolo Bonzini wrote:
> Il 30/05/2013 03:20, Gleb Natapov ha scritto:
> > On Tue, May 28, 2013 at 06:33:39PM +0200, Paolo Bonzini wrote:
> >> Il 28/05/2013 17:00, Gleb Natapov ha scritto:
> >>> On Tue, May 28, 2013 at 03:48:58PM +0200, Paolo Bonzini wrote:
> >>>> Il 28/05/2013 14:56, Gleb Natapov ha scritto:
> >>>>>>>  		else
> >>>>>>>  			vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED;
> >>>>>>>  	}
> >>>>>>> -	if (test_and_clear_bit(KVM_APIC_SIPI, &apic->pending_events) &&
> >>>>>>> +	/*
> >>>>>>> +	 * Note that we may get another INIT+SIPI sequence right here; process
> >>>>>>> +	 * the INIT first.  Assumes that there are only KVM_APIC_INIT/SIPI.
> >>>>>>> +	 */
> >>>>>>> +	if (cmpxchg(&apic->pending_events, KVM_APIC_SIPI, 0) == KVM_APIC_SIPI &&
> >>>>>>>  	    vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
> >>>>> Because pending_events can be INIT/SIPI at this point and it should be
> >>>>> interpreted as: do SIPI and ignore INIT (atomically).
> >>>>
> >>>> My patch does "do another INIT (which will have no effect) and do SIPI 
> >>>> after that INIT", which is different but has almost the same effect.  
> >>>> If pending_events is INIT/SIPI, it ignores the SIPI for now and lets 
> >>>> the next iteration of kvm_apic_accept_events do both.  The difference 
> >>>> would be that in a carefully-timed sequence of interrupts
> >>>>
> >>> You assume that the next processing will actually happen, but this is
> >>> not necessary the case.
> >>
> >> Why not?  The INIT and SIPI that have just been sent have kicked the
> >> VCPU again.
> >
> > kick is a nop if vcpu thread is not in a halt or in a guest.
> 
> But the KVM_REQ_EVENT request will be caught at:
> 
>         if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests
>             || need_resched() || signal_pending(current)) {
>                 vcpu->mode = OUTSIDE_GUEST_MODE;
>                 smp_wmb();
>                 local_irq_enable();
>                 preempt_enable();
>                 r = 1;
>                 goto cancel_injection;
>         }
> 
> and the entry will be canceled.
> 
But vcpu may be in non running state so we will not get here.

--
			Gleb.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC] KVM: Fix race in apic->pending_events processing
  2013-05-30  6:01               ` Gleb Natapov
@ 2013-05-30  6:31                 ` Paolo Bonzini
  2013-05-30  7:09                   ` Gleb Natapov
  0 siblings, 1 reply; 25+ messages in thread
From: Paolo Bonzini @ 2013-05-30  6:31 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: kvm, Jan Kiszka

Il 30/05/2013 08:01, Gleb Natapov ha scritto:
> On Thu, May 30, 2013 at 07:41:05AM +0200, Paolo Bonzini wrote:
>> Il 30/05/2013 03:20, Gleb Natapov ha scritto:
>>> On Tue, May 28, 2013 at 06:33:39PM +0200, Paolo Bonzini wrote:
>>>> Il 28/05/2013 17:00, Gleb Natapov ha scritto:
>>>>> On Tue, May 28, 2013 at 03:48:58PM +0200, Paolo Bonzini wrote:
>>>>>> Il 28/05/2013 14:56, Gleb Natapov ha scritto:
>>>>>>>>>  		else
>>>>>>>>>  			vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED;
>>>>>>>>>  	}
>>>>>>>>> -	if (test_and_clear_bit(KVM_APIC_SIPI, &apic->pending_events) &&
>>>>>>>>> +	/*
>>>>>>>>> +	 * Note that we may get another INIT+SIPI sequence right here; process
>>>>>>>>> +	 * the INIT first.  Assumes that there are only KVM_APIC_INIT/SIPI.
>>>>>>>>> +	 */
>>>>>>>>> +	if (cmpxchg(&apic->pending_events, KVM_APIC_SIPI, 0) == KVM_APIC_SIPI &&
>>>>>>>>>  	    vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
>>>>>>> Because pending_events can be INIT/SIPI at this point and it should be
>>>>>>> interpreted as: do SIPI and ignore INIT (atomically).
>>>>>>
>>>>>> My patch does "do another INIT (which will have no effect) and do SIPI 
>>>>>> after that INIT", which is different but has almost the same effect.  
>>>>>> If pending_events is INIT/SIPI, it ignores the SIPI for now and lets 
>>>>>> the next iteration of kvm_apic_accept_events do both.  The difference 
>>>>>> would be that in a carefully-timed sequence of interrupts
>>>>>>
>>>>> You assume that the next processing will actually happen, but this is
>>>>> not necessary the case.
>>>>
>>>> Why not?  The INIT and SIPI that have just been sent have kicked the
>>>> VCPU again.
>>>
>>> kick is a nop if vcpu thread is not in a halt or in a guest.
>>
>> But the KVM_REQ_EVENT request will be caught at:
>>
>>         if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests
>>             || need_resched() || signal_pending(current)) {
>>                 vcpu->mode = OUTSIDE_GUEST_MODE;
>>                 smp_wmb();
>>                 local_irq_enable();
>>                 preempt_enable();
>>                 r = 1;
>>                 goto cancel_injection;
>>         }
>>
>> and the entry will be canceled.

I was wrong: we exit immediately because state is
KVM_MP_STATE_INIT_RECEIVED.  But then...

> But vcpu may be in non running state so we will not get here.

... vcpu_enter_guest will return 1 and __vcpu_run goes around the while
loop once more (modulo pending signals of course).

On the next iteration __vcpu_run will call kvm_vcpu_block, which calls
kvm_arch_vcpu_runnable.  kvm_arch_vcpu_runnable returns true because
kvm_apic_has_events(vcpu) is also true.  This will set KVM_REQ_UNHALT,
call kvm_apic_accept_events again and do the INIT+SIPI.

Paolo

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC] KVM: Fix race in apic->pending_events processing
  2013-05-30  6:31                 ` Paolo Bonzini
@ 2013-05-30  7:09                   ` Gleb Natapov
  2013-05-30  7:30                     ` Paolo Bonzini
  0 siblings, 1 reply; 25+ messages in thread
From: Gleb Natapov @ 2013-05-30  7:09 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvm, Jan Kiszka

On Thu, May 30, 2013 at 08:31:11AM +0200, Paolo Bonzini wrote:
> Il 30/05/2013 08:01, Gleb Natapov ha scritto:
> > On Thu, May 30, 2013 at 07:41:05AM +0200, Paolo Bonzini wrote:
> >> Il 30/05/2013 03:20, Gleb Natapov ha scritto:
> >>> On Tue, May 28, 2013 at 06:33:39PM +0200, Paolo Bonzini wrote:
> >>>> Il 28/05/2013 17:00, Gleb Natapov ha scritto:
> >>>>> On Tue, May 28, 2013 at 03:48:58PM +0200, Paolo Bonzini wrote:
> >>>>>> Il 28/05/2013 14:56, Gleb Natapov ha scritto:
> >>>>>>>>>  		else
> >>>>>>>>>  			vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED;
> >>>>>>>>>  	}
> >>>>>>>>> -	if (test_and_clear_bit(KVM_APIC_SIPI, &apic->pending_events) &&
> >>>>>>>>> +	/*
> >>>>>>>>> +	 * Note that we may get another INIT+SIPI sequence right here; process
> >>>>>>>>> +	 * the INIT first.  Assumes that there are only KVM_APIC_INIT/SIPI.
> >>>>>>>>> +	 */
> >>>>>>>>> +	if (cmpxchg(&apic->pending_events, KVM_APIC_SIPI, 0) == KVM_APIC_SIPI &&
> >>>>>>>>>  	    vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
> >>>>>>> Because pending_events can be INIT/SIPI at this point and it should be
> >>>>>>> interpreted as: do SIPI and ignore INIT (atomically).
> >>>>>>
> >>>>>> My patch does "do another INIT (which will have no effect) and do SIPI 
> >>>>>> after that INIT", which is different but has almost the same effect.  
> >>>>>> If pending_events is INIT/SIPI, it ignores the SIPI for now and lets 
> >>>>>> the next iteration of kvm_apic_accept_events do both.  The difference 
> >>>>>> would be that in a carefully-timed sequence of interrupts
> >>>>>>
> >>>>> You assume that the next processing will actually happen, but this is
> >>>>> not necessary the case.
> >>>>
> >>>> Why not?  The INIT and SIPI that have just been sent have kicked the
> >>>> VCPU again.
> >>>
> >>> kick is a nop if vcpu thread is not in a halt or in a guest.
> >>
> >> But the KVM_REQ_EVENT request will be caught at:
> >>
> >>         if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests
> >>             || need_resched() || signal_pending(current)) {
> >>                 vcpu->mode = OUTSIDE_GUEST_MODE;
> >>                 smp_wmb();
> >>                 local_irq_enable();
> >>                 preempt_enable();
> >>                 r = 1;
> >>                 goto cancel_injection;
> >>         }
> >>
> >> and the entry will be canceled.
> 
> I was wrong: we exit immediately because state is
> KVM_MP_STATE_INIT_RECEIVED.  But then...
> 
> > But vcpu may be in non running state so we will not get here.
> 
> ... vcpu_enter_guest will return 1 and __vcpu_run goes around the while
> loop once more (modulo pending signals of course).
> 
> On the next iteration __vcpu_run will call kvm_vcpu_block, which calls
> kvm_arch_vcpu_runnable.  kvm_arch_vcpu_runnable returns true because
> kvm_apic_has_events(vcpu) is also true.  This will set KVM_REQ_UNHALT,
> call kvm_apic_accept_events again and do the INIT+SIPI.
> 
Ah, we check kvm_apic_has_events() in runnable. Then yes, we will not
lose the event.

--
			Gleb.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC] KVM: Fix race in apic->pending_events processing
  2013-05-30  7:09                   ` Gleb Natapov
@ 2013-05-30  7:30                     ` Paolo Bonzini
  2013-05-30 12:34                       ` Gleb Natapov
  0 siblings, 1 reply; 25+ messages in thread
From: Paolo Bonzini @ 2013-05-30  7:30 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: kvm, Jan Kiszka

Il 30/05/2013 09:09, Gleb Natapov ha scritto:
> On Thu, May 30, 2013 at 08:31:11AM +0200, Paolo Bonzini wrote:
>> Il 30/05/2013 08:01, Gleb Natapov ha scritto:
>>> On Thu, May 30, 2013 at 07:41:05AM +0200, Paolo Bonzini wrote:
>>>> Il 30/05/2013 03:20, Gleb Natapov ha scritto:
>>>>> On Tue, May 28, 2013 at 06:33:39PM +0200, Paolo Bonzini wrote:
>>>>>> Il 28/05/2013 17:00, Gleb Natapov ha scritto:
>>>>>>> On Tue, May 28, 2013 at 03:48:58PM +0200, Paolo Bonzini wrote:
>>>>>>>> Il 28/05/2013 14:56, Gleb Natapov ha scritto:
>>>>>>>>>>>  		else
>>>>>>>>>>>  			vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED;
>>>>>>>>>>>  	}
>>>>>>>>>>> -	if (test_and_clear_bit(KVM_APIC_SIPI, &apic->pending_events) &&
>>>>>>>>>>> +	/*
>>>>>>>>>>> +	 * Note that we may get another INIT+SIPI sequence right here; process
>>>>>>>>>>> +	 * the INIT first.  Assumes that there are only KVM_APIC_INIT/SIPI.
>>>>>>>>>>> +	 */
>>>>>>>>>>> +	if (cmpxchg(&apic->pending_events, KVM_APIC_SIPI, 0) == KVM_APIC_SIPI &&
>>>>>>>>>>>  	    vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
>>>>>>>>> Because pending_events can be INIT/SIPI at this point and it should be
>>>>>>>>> interpreted as: do SIPI and ignore INIT (atomically).
>>>>>>>>
>>>>>>>> My patch does "do another INIT (which will have no effect) and do SIPI 
>>>>>>>> after that INIT", which is different but has almost the same effect.  
>>>>>>>> If pending_events is INIT/SIPI, it ignores the SIPI for now and lets 
>>>>>>>> the next iteration of kvm_apic_accept_events do both.  The difference 
>>>>>>>> would be that in a carefully-timed sequence of interrupts
>>>>>>>>
>>>>>>> You assume that the next processing will actually happen, but this is
>>>>>>> not necessary the case.
>>>>>>
>>>>>> Why not?  The INIT and SIPI that have just been sent have kicked the
>>>>>> VCPU again.
>>>>>
>>>>> kick is a nop if vcpu thread is not in a halt or in a guest.
>>>>
>>>> But the KVM_REQ_EVENT request will be caught at:
>>>>
>>>>         if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests
>>>>             || need_resched() || signal_pending(current)) {
>>>>                 vcpu->mode = OUTSIDE_GUEST_MODE;
>>>>                 smp_wmb();
>>>>                 local_irq_enable();
>>>>                 preempt_enable();
>>>>                 r = 1;
>>>>                 goto cancel_injection;
>>>>         }
>>>>
>>>> and the entry will be canceled.
>>
>> I was wrong: we exit immediately because state is
>> KVM_MP_STATE_INIT_RECEIVED.  But then...
>>
>>> But vcpu may be in non running state so we will not get here.
>>
>> ... vcpu_enter_guest will return 1 and __vcpu_run goes around the while
>> loop once more (modulo pending signals of course).
>>
>> On the next iteration __vcpu_run will call kvm_vcpu_block, which calls
>> kvm_arch_vcpu_runnable.  kvm_arch_vcpu_runnable returns true because
>> kvm_apic_has_events(vcpu) is also true.  This will set KVM_REQ_UNHALT,
>> call kvm_apic_accept_events again and do the INIT+SIPI.
>
> Ah, we check kvm_apic_has_events() in runnable. Then yes, we will not
> lose the event.

Ok, then I'd prefer to have the cmpxchg directly in the if, as in
http://article.gmane.org/gmane.comp.emulators.kvm.devel/110505

Thanks for the discussion!

Paolo

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC] KVM: Fix race in apic->pending_events processing
  2013-05-30  7:30                     ` Paolo Bonzini
@ 2013-05-30 12:34                       ` Gleb Natapov
  2013-05-30 12:58                         ` Paolo Bonzini
  0 siblings, 1 reply; 25+ messages in thread
From: Gleb Natapov @ 2013-05-30 12:34 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvm, Jan Kiszka

On Thu, May 30, 2013 at 09:30:41AM +0200, Paolo Bonzini wrote:
> Il 30/05/2013 09:09, Gleb Natapov ha scritto:
> > On Thu, May 30, 2013 at 08:31:11AM +0200, Paolo Bonzini wrote:
> >> Il 30/05/2013 08:01, Gleb Natapov ha scritto:
> >>> On Thu, May 30, 2013 at 07:41:05AM +0200, Paolo Bonzini wrote:
> >>>> Il 30/05/2013 03:20, Gleb Natapov ha scritto:
> >>>>> On Tue, May 28, 2013 at 06:33:39PM +0200, Paolo Bonzini wrote:
> >>>>>> Il 28/05/2013 17:00, Gleb Natapov ha scritto:
> >>>>>>> On Tue, May 28, 2013 at 03:48:58PM +0200, Paolo Bonzini wrote:
> >>>>>>>> Il 28/05/2013 14:56, Gleb Natapov ha scritto:
> >>>>>>>>>>>  		else
> >>>>>>>>>>>  			vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED;
> >>>>>>>>>>>  	}
> >>>>>>>>>>> -	if (test_and_clear_bit(KVM_APIC_SIPI, &apic->pending_events) &&
> >>>>>>>>>>> +	/*
> >>>>>>>>>>> +	 * Note that we may get another INIT+SIPI sequence right here; process
> >>>>>>>>>>> +	 * the INIT first.  Assumes that there are only KVM_APIC_INIT/SIPI.
> >>>>>>>>>>> +	 */
> >>>>>>>>>>> +	if (cmpxchg(&apic->pending_events, KVM_APIC_SIPI, 0) == KVM_APIC_SIPI &&
> >>>>>>>>>>>  	    vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
> >>>>>>>>> Because pending_events can be INIT/SIPI at this point and it should be
> >>>>>>>>> interpreted as: do SIPI and ignore INIT (atomically).
> >>>>>>>>
> >>>>>>>> My patch does "do another INIT (which will have no effect) and do SIPI 
> >>>>>>>> after that INIT", which is different but has almost the same effect.  
> >>>>>>>> If pending_events is INIT/SIPI, it ignores the SIPI for now and lets 
> >>>>>>>> the next iteration of kvm_apic_accept_events do both.  The difference 
> >>>>>>>> would be that in a carefully-timed sequence of interrupts
> >>>>>>>>
> >>>>>>> You assume that the next processing will actually happen, but this is
> >>>>>>> not necessary the case.
> >>>>>>
> >>>>>> Why not?  The INIT and SIPI that have just been sent have kicked the
> >>>>>> VCPU again.
> >>>>>
> >>>>> kick is a nop if vcpu thread is not in a halt or in a guest.
> >>>>
> >>>> But the KVM_REQ_EVENT request will be caught at:
> >>>>
> >>>>         if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests
> >>>>             || need_resched() || signal_pending(current)) {
> >>>>                 vcpu->mode = OUTSIDE_GUEST_MODE;
> >>>>                 smp_wmb();
> >>>>                 local_irq_enable();
> >>>>                 preempt_enable();
> >>>>                 r = 1;
> >>>>                 goto cancel_injection;
> >>>>         }
> >>>>
> >>>> and the entry will be canceled.
> >>
> >> I was wrong: we exit immediately because state is
> >> KVM_MP_STATE_INIT_RECEIVED.  But then...
> >>
> >>> But vcpu may be in non running state so we will not get here.
> >>
> >> ... vcpu_enter_guest will return 1 and __vcpu_run goes around the while
> >> loop once more (modulo pending signals of course).
> >>
> >> On the next iteration __vcpu_run will call kvm_vcpu_block, which calls
> >> kvm_arch_vcpu_runnable.  kvm_arch_vcpu_runnable returns true because
> >> kvm_apic_has_events(vcpu) is also true.  This will set KVM_REQ_UNHALT,
> >> call kvm_apic_accept_events again and do the INIT+SIPI.
> >
> > Ah, we check kvm_apic_has_events() in runnable. Then yes, we will not
> > lose the event.
> 
> Ok, then I'd prefer to have the cmpxchg directly in the if, as in
> http://article.gmane.org/gmane.comp.emulators.kvm.devel/110505
> 
I still do not. Both of them are tricky, mine does not coalesce events
needlessly.

> Thanks for the discussion!
> 
> Paolo

--
			Gleb.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC] KVM: Fix race in apic->pending_events processing
  2013-05-30 12:34                       ` Gleb Natapov
@ 2013-05-30 12:58                         ` Paolo Bonzini
  2013-05-30 13:10                           ` Gleb Natapov
  0 siblings, 1 reply; 25+ messages in thread
From: Paolo Bonzini @ 2013-05-30 12:58 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: kvm, Jan Kiszka

Il 30/05/2013 14:34, Gleb Natapov ha scritto:
>>> > >
>>> > > Ah, we check kvm_apic_has_events() in runnable. Then yes, we will not
>>> > > lose the event.
>> > 
>> > Ok, then I'd prefer to have the cmpxchg directly in the if, as in
>> > http://article.gmane.org/gmane.comp.emulators.kvm.devel/110505
>> > 
> I still do not. Both of them are tricky, mine does not coalesce events
> needlessly.

Agreed that both are tricky, but I don't think my patch is coalescing
events.  If you have

    INIT    SIPI     INIT     SIPI
                  ^                           ^
                  INIT bit cleared here       SIPI bit checked here

my patch KVM sees apic_events = INIT | SIPI and deduces that the SIPI
bit was set by the second SIPI, not by the first.  In fact the first
SIPI was cancelled by the second INIT, and thus should not be processed
at all.

Instead, with your patch KVM will service all four events; strictly
speaking it is wrong to service the first SIPI, which is why I prefer
having the cmpxchg in the beginning.

Paolo

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC] KVM: Fix race in apic->pending_events processing
  2013-05-30 12:58                         ` Paolo Bonzini
@ 2013-05-30 13:10                           ` Gleb Natapov
  2013-05-30 13:23                             ` Paolo Bonzini
  0 siblings, 1 reply; 25+ messages in thread
From: Gleb Natapov @ 2013-05-30 13:10 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvm, Jan Kiszka

On Thu, May 30, 2013 at 02:58:09PM +0200, Paolo Bonzini wrote:
> Il 30/05/2013 14:34, Gleb Natapov ha scritto:
> >>> > >
> >>> > > Ah, we check kvm_apic_has_events() in runnable. Then yes, we will not
> >>> > > lose the event.
> >> > 
> >> > Ok, then I'd prefer to have the cmpxchg directly in the if, as in
> >> > http://article.gmane.org/gmane.comp.emulators.kvm.devel/110505
> >> > 
> > I still do not. Both of them are tricky, mine does not coalesce events
> > needlessly.
> 
> Agreed that both are tricky, but I don't think my patch is coalescing
> events.  If you have
> 
>     INIT    SIPI     INIT     SIPI
>                   ^                           ^
>                   INIT bit cleared here       SIPI bit checked here
> 
Not sure I understand what you are trying to say here.

> my patch KVM sees apic_events = INIT | SIPI and deduces that the SIPI
> bit was set by the second SIPI, not by the first.  In fact the first
> SIPI was cancelled by the second INIT, and thus should not be processed
> at all.
That is called coalesced.

> 
> Instead, with your patch KVM will service all four events; strictly
> speaking it is wrong to service the first SIPI, which is why I prefer
> having the cmpxchg in the beginning.
> y
Why is it wrong?

I do not see what are you arguing about.

--
			Gleb.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC] KVM: Fix race in apic->pending_events processing
  2013-05-30 13:10                           ` Gleb Natapov
@ 2013-05-30 13:23                             ` Paolo Bonzini
  2013-05-30 13:35                               ` Gleb Natapov
  0 siblings, 1 reply; 25+ messages in thread
From: Paolo Bonzini @ 2013-05-30 13:23 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: kvm, Jan Kiszka

Il 30/05/2013 15:10, Gleb Natapov ha scritto:
> On Thu, May 30, 2013 at 02:58:09PM +0200, Paolo Bonzini wrote:
>> Il 30/05/2013 14:34, Gleb Natapov ha scritto:
>>>>>>>
>>>>>>> Ah, we check kvm_apic_has_events() in runnable. Then yes, we will not
>>>>>>> lose the event.
>>>>>
>>>>> Ok, then I'd prefer to have the cmpxchg directly in the if, as in
>>>>> http://article.gmane.org/gmane.comp.emulators.kvm.devel/110505
>>>>>
>>> I still do not. Both of them are tricky, mine does not coalesce events
>>> needlessly.
>>
>> Agreed that both are tricky, but I don't think my patch is coalescing
>> events.  If you have
>>
>>     INIT    SIPI     INIT     SIPI
>>                   ^                           ^
>>                   INIT bit cleared here       SIPI bit checked here
>>
> Not sure I understand what you are trying to say here.

I'll redo the picture below.

>> my patch KVM sees apic_events = INIT | SIPI and deduces that the SIPI
>> bit was set by the second SIPI, not by the first.  In fact the first
>> SIPI was cancelled by the second INIT, and thus should not be processed
>> at all.
> That is called coalesced.

Coalescing would be something like INIT SIPI SIPI -> INIT SIPI.  This is
not coalescing, it is proper detection of a cancelled SIPI.  We have:

   event sent           event processed            pending_events
    INIT                                                 INIT
    SIPI                                               INIT|SIPI
                             INIT                        SIPI
XX  INIT                                                 INIT
    SIPI                                               INIT|SIPI
YY                           SIPI                      INIT|SIPI
                        failed cmpxchg                 INIT|SIPI
                             INIT                        SIPI
                             SIPI                          0

At the line I marked with YY, you're processing an interrupt that is not
anymore in the pending_events.  It was dropped at point XX.

With my patch it is:


   event sent           event processed            pending_events
    INIT                                                 INIT
    SIPI                                               INIT|SIPI
                             INIT                        SIPI
XX  INIT                                                 INIT
    SIPI                                               INIT|SIPI
                       failed cmpxchg                 INIT|SIPI
                             INIT                        SIPI
                             SIPI                         0

>> Instead, with your patch KVM will service all four events; strictly
>> speaking it is wrong to service the first SIPI, which is why I prefer
>> having the cmpxchg in the beginning.
>
> Why is it wrong?

Because the first SIPI was dropped atomically with the triggering of the
second INIT, it's as if you were handling it twice.

Paolo

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC] KVM: Fix race in apic->pending_events processing
  2013-05-30 13:23                             ` Paolo Bonzini
@ 2013-05-30 13:35                               ` Gleb Natapov
  2013-05-30 14:15                                 ` Paolo Bonzini
  0 siblings, 1 reply; 25+ messages in thread
From: Gleb Natapov @ 2013-05-30 13:35 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvm, Jan Kiszka

On Thu, May 30, 2013 at 03:23:35PM +0200, Paolo Bonzini wrote:
> Il 30/05/2013 15:10, Gleb Natapov ha scritto:
> > On Thu, May 30, 2013 at 02:58:09PM +0200, Paolo Bonzini wrote:
> >> Il 30/05/2013 14:34, Gleb Natapov ha scritto:
> >>>>>>>
> >>>>>>> Ah, we check kvm_apic_has_events() in runnable. Then yes, we will not
> >>>>>>> lose the event.
> >>>>>
> >>>>> Ok, then I'd prefer to have the cmpxchg directly in the if, as in
> >>>>> http://article.gmane.org/gmane.comp.emulators.kvm.devel/110505
> >>>>>
> >>> I still do not. Both of them are tricky, mine does not coalesce events
> >>> needlessly.
> >>
> >> Agreed that both are tricky, but I don't think my patch is coalescing
> >> events.  If you have
> >>
> >>     INIT    SIPI     INIT     SIPI
> >>                   ^                           ^
> >>                   INIT bit cleared here       SIPI bit checked here
> >>
> > Not sure I understand what you are trying to say here.
> 
> I'll redo the picture below.
> 
> >> my patch KVM sees apic_events = INIT | SIPI and deduces that the SIPI
> >> bit was set by the second SIPI, not by the first.  In fact the first
> >> SIPI was cancelled by the second INIT, and thus should not be processed
> >> at all.
> > That is called coalesced.
> 
> Coalescing would be something like INIT SIPI SIPI -> INIT SIPI.  This is
> not coalescing, it is proper detection of a cancelled SIPI.  We have:
> 
>    event sent           event processed            pending_events
>     INIT                                                 INIT
>     SIPI                                               INIT|SIPI
>                              INIT                        SIPI
> XX  INIT                                                 INIT
>     SIPI                                               INIT|SIPI
> YY                           SIPI                      INIT|SIPI
>                         failed cmpxchg                 INIT|SIPI
>                              INIT                        SIPI
>                              SIPI                          0
> 
At this point I am not even sure that you understand what problem the patch
is fixing, because the bug is not event triggered by above sequence.

> At the line I marked with YY, you're processing an interrupt that is not
> anymore in the pending_events.  It was dropped at point XX.
> 
> With my patch it is:
> 
> 
>    event sent           event processed            pending_events
>     INIT                                                 INIT
>     SIPI                                               INIT|SIPI
>                              INIT                        SIPI
> XX  INIT                                                 INIT
>     SIPI                                               INIT|SIPI
>                        failed cmpxchg                 INIT|SIPI
>                              INIT                        SIPI
>                              SIPI                         0
> 
> >> Instead, with your patch KVM will service all four events; strictly
> >> speaking it is wrong to service the first SIPI, which is why I prefer
> >> having the cmpxchg in the beginning.
> >
> > Why is it wrong?
> 
> Because the first SIPI was dropped atomically with the triggering of the
> second INIT, it's as if you were handling it twice.
> 
No, you were slow to process first SIPI, so second INIT was sent because
vcpu appears to be dead, so instead of processing both you process last.
This is exactly what coalescing means, you are just trying to present is
as something desirable.

--
			Gleb.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC] KVM: Fix race in apic->pending_events processing
  2013-05-30 13:35                               ` Gleb Natapov
@ 2013-05-30 14:15                                 ` Paolo Bonzini
  2013-05-31  4:36                                   ` Gleb Natapov
  0 siblings, 1 reply; 25+ messages in thread
From: Paolo Bonzini @ 2013-05-30 14:15 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: kvm, Jan Kiszka

Il 30/05/2013 15:35, Gleb Natapov ha scritto:
> On Thu, May 30, 2013 at 03:23:35PM +0200, Paolo Bonzini wrote:
>> Il 30/05/2013 15:10, Gleb Natapov ha scritto:
>>> On Thu, May 30, 2013 at 02:58:09PM +0200, Paolo Bonzini wrote:
>>>> Il 30/05/2013 14:34, Gleb Natapov ha scritto:
>>>>>>>>>
>>>>>>>>> Ah, we check kvm_apic_has_events() in runnable. Then yes, we will not
>>>>>>>>> lose the event.
>>>>>>>
>>>>>>> Ok, then I'd prefer to have the cmpxchg directly in the if, as in
>>>>>>> http://article.gmane.org/gmane.comp.emulators.kvm.devel/110505
>>>>>>>
>>>>> I still do not. Both of them are tricky, mine does not coalesce events
>>>>> needlessly.
>>>>
>>>> Agreed that both are tricky, but I don't think my patch is coalescing
>>>> events.  If you have
>>>>
>>>>     INIT    SIPI     INIT     SIPI
>>>>                   ^                           ^
>>>>                   INIT bit cleared here       SIPI bit checked here
>>>>
>>> Not sure I understand what you are trying to say here.
>>
>> I'll redo the picture below.
>>
>>>> my patch KVM sees apic_events = INIT | SIPI and deduces that the SIPI
>>>> bit was set by the second SIPI, not by the first.  In fact the first
>>>> SIPI was cancelled by the second INIT, and thus should not be processed
>>>> at all.
>>> That is called coalesced.
>>
>> Coalescing would be something like INIT SIPI SIPI -> INIT SIPI.  This is
>> not coalescing, it is proper detection of a cancelled SIPI.  We have:
>>
>>    event sent           event processed            pending_events
>>     INIT                                                 INIT
>>     SIPI                                               INIT|SIPI
>>                              INIT                        SIPI
>> XX  INIT                                                 INIT
>>     SIPI                                               INIT|SIPI
>> YY                           SIPI                      INIT|SIPI
>>                         failed cmpxchg                 INIT|SIPI
>>                              INIT                        SIPI
>>                              SIPI                          0
>>
> At this point I am not even sure that you understand what problem the patch
> is fixing, because the bug is not event triggered by above sequence.

Maybe.

>> Because the first SIPI was dropped atomically with the triggering of the
>> second INIT, it's as if you were handling it twice.
>>
> No, you were slow to process first SIPI, so second INIT was sent because
> vcpu appears to be dead, so instead of processing both you process last.

Can you draw the events that happen?

What I drew above is based on the commit message.  Instead what I
understand from this explanation is:

   event sent           event processed            pending_events
     INIT                                                 INIT
     SIPI                                               INIT|SIPI
                              INIT                        SIPI
                              SIPI                         0
     INIT                                                 INIT
     SIPI                                               INIT|SIPI
                              INIT                        SIPI
                              SIPI                         0

Then my patch has absolutely no effect, the cmpxchg succeeds.  With your
patch instead I get:


   event sent           event processed              pending_events
     INIT                                                 INIT
     SIPI                                               INIT|SIPI
                              INIT                        SIPI
                              SIPI                        ...
     INIT                                                 INIT
     SIPI                                               INIT|SIPI
                         failed cmpxchg                 INIT|SIPI
                              INIT                        SIPI
                              SIPI                        ...
                        successful cmpxchg                  0

But there is no difference in the actual set of events that was processed.

Paolo

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC] KVM: Fix race in apic->pending_events processing
  2013-05-30 14:15                                 ` Paolo Bonzini
@ 2013-05-31  4:36                                   ` Gleb Natapov
  2013-05-31  8:48                                     ` Paolo Bonzini
  0 siblings, 1 reply; 25+ messages in thread
From: Gleb Natapov @ 2013-05-31  4:36 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvm, Jan Kiszka

On Thu, May 30, 2013 at 04:15:35PM +0200, Paolo Bonzini wrote:
> Il 30/05/2013 15:35, Gleb Natapov ha scritto:
> > On Thu, May 30, 2013 at 03:23:35PM +0200, Paolo Bonzini wrote:
> >> Il 30/05/2013 15:10, Gleb Natapov ha scritto:
> >>> On Thu, May 30, 2013 at 02:58:09PM +0200, Paolo Bonzini wrote:
> >>>> Il 30/05/2013 14:34, Gleb Natapov ha scritto:
> >>>>>>>>>
> >>>>>>>>> Ah, we check kvm_apic_has_events() in runnable. Then yes, we will not
> >>>>>>>>> lose the event.
> >>>>>>>
> >>>>>>> Ok, then I'd prefer to have the cmpxchg directly in the if, as in
> >>>>>>> http://article.gmane.org/gmane.comp.emulators.kvm.devel/110505
> >>>>>>>
> >>>>> I still do not. Both of them are tricky, mine does not coalesce events
> >>>>> needlessly.
> >>>>
> >>>> Agreed that both are tricky, but I don't think my patch is coalescing
> >>>> events.  If you have
> >>>>
> >>>>     INIT    SIPI     INIT     SIPI
> >>>>                   ^                           ^
> >>>>                   INIT bit cleared here       SIPI bit checked here
> >>>>
> >>> Not sure I understand what you are trying to say here.
> >>
> >> I'll redo the picture below.
> >>
> >>>> my patch KVM sees apic_events = INIT | SIPI and deduces that the SIPI
> >>>> bit was set by the second SIPI, not by the first.  In fact the first
> >>>> SIPI was cancelled by the second INIT, and thus should not be processed
> >>>> at all.
> >>> That is called coalesced.
> >>
> >> Coalescing would be something like INIT SIPI SIPI -> INIT SIPI.  This is
> >> not coalescing, it is proper detection of a cancelled SIPI.  We have:
> >>
> >>    event sent           event processed            pending_events
> >>     INIT                                                 INIT
> >>     SIPI                                               INIT|SIPI
> >>                              INIT                        SIPI
> >> XX  INIT                                                 INIT
> >>     SIPI                                               INIT|SIPI
> >> YY                           SIPI                      INIT|SIPI
> >>                         failed cmpxchg                 INIT|SIPI
> >>                              INIT                        SIPI
> >>                              SIPI                          0
> >>
> > At this point I am not even sure that you understand what problem the patch
> > is fixing, because the bug is not event triggered by above sequence.
> 
> Maybe.
> 
> >> Because the first SIPI was dropped atomically with the triggering of the
> >> second INIT, it's as if you were handling it twice.
> >>
> > No, you were slow to process first SIPI, so second INIT was sent because
> > vcpu appears to be dead, so instead of processing both you process last.
> 
> Can you draw the events that happen?
> 
I did, in commit message.

> What I drew above is based on the commit message.  Instead what I
> understand from this explanation is:
> 
It is definitely not based on my commit message :)

In my commit message there is two INITs in a row:
 vpu0:                            vcpu1:
 set INIT
                                test_and_clear_bit(KVM_APIC_INIT)
                                   process INIT
 set INIT
 set SIPI
                                test_and_clear_bit(KVM_APIC_SIPI)
                                   process SIPI

Two INITs before SIPI are essential to trigger the bug and
coincidentally this is what spec advices to do.

>    event sent           event processed            pending_events
>      INIT                                                 INIT
>      SIPI                                               INIT|SIPI
>                               INIT                        SIPI
>                               SIPI                         0
>      INIT                                                 INIT
>      SIPI                                               INIT|SIPI
>                               INIT                        SIPI
>                               SIPI                         0
> 
> Then my patch has absolutely no effect, the cmpxchg succeeds.  With your
> patch instead I get:
> 
> 
>    event sent           event processed              pending_events
>      INIT                                                 INIT
>      SIPI                                               INIT|SIPI
>                               INIT                        SIPI
>                               SIPI                        ...
>      INIT                                                 INIT
>      SIPI                                               INIT|SIPI
>                          failed cmpxchg                 INIT|SIPI
>                               INIT                        SIPI
>                               SIPI                        ...
>                         successful cmpxchg                  0
> 
> But there is no difference in the actual set of events that was processed.
> 
I do not get what you are trying to tell with this. The scenario you are
repeatedly describing works with your path, with my patch and without any
patch at all.

--
			Gleb.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC] KVM: Fix race in apic->pending_events processing
  2013-05-31  4:36                                   ` Gleb Natapov
@ 2013-05-31  8:48                                     ` Paolo Bonzini
  2013-05-31  9:18                                       ` Gleb Natapov
  0 siblings, 1 reply; 25+ messages in thread
From: Paolo Bonzini @ 2013-05-31  8:48 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: kvm, Jan Kiszka

Il 31/05/2013 06:36, Gleb Natapov ha scritto:
> In my commit message there is two INITs in a row:
>  vpu0:                            vcpu1:
>  set INIT
>                                 test_and_clear_bit(KVM_APIC_INIT)
>                                    process INIT
>  set INIT
>  set SIPI
>                                 test_and_clear_bit(KVM_APIC_SIPI)
>                                    process SIPI
> 
> Two INITs before SIPI are essential to trigger the bug

I see now.  Let's draw pending_events as well:

    event sent           event processed            pending_events
      INIT                                                INIT
                               INIT                        0
      INIT                                                INIT
      SIPI                                              INIT|SIPI
                               SIPI                       INIT
                               INIT                         0

Events are reordered, there is indeed a bug if the second INIT comes at
just the right time.  With your patch:

    event sent           event processed            pending_events
      INIT                                                INIT
                               INIT                        0
      INIT                                                INIT
      SIPI                                              INIT|SIPI
                          SIPI, failed cmpxchg          INIT|SIPI
                               INIT                       SIPI
                               SIPI                       SIPI

The patch introduces a spurious SIPI, that's worse than coalescing.
With my patch:

    event sent           event processed            pending_events
      INIT                                                INIT
                               INIT                        0
      INIT                                                INIT
      SIPI                                              INIT|SIPI
                          (failed cmpxchg)              INIT|SIPI
                               INIT                       SIPI
                               SIPI                        0

My patch looks better to me for this scenario.

> and coincidentally this is what spec advices to do.

The spec advises INIT-SIPI-SIPI, not INIT-INIT-SIPI.  This is because
the first INIT may have been processed late, and SIPIs are masked if not
in wait-for-SIPI state.  You need to send the second just in case.  It
is not needed in KVM because INITs effectively unmask the SIPI
immediately, even though the INIT may take place a bit later.

The INIT-SIPI-SIPI sequence is handled correctly by KVM thanks to the
check on the mp-state.  But your patch breaks another corner case:

    event sent           event processed            pending_events
      INIT                                                INIT
                               INIT                        0
      SIPI                                                SIPI
                               SIPI                        0
      SIPI                                                SIPI
                           ignored SIPI                   SIPI

  set_mp_state(INIT_RECEIVED)                             SIPI
                               SIPI                        0

With my patch, or no patch at all:

    event sent           event processed            pending_events
      INIT                                                INIT
                               INIT                        0
      SIPI                                                SIPI
                               SIPI                        0
      SIPI                                                SIPI
                           ignored SIPI                    0
  set_mp_state(INIT_RECEIVED)                              0

Though perhaps the real bug here is in kvm_arch_vcpu_ioctl_setmp_state.
Setting the mp_state to anything bug SIPI_RECEIVED should clear the SIPI
event.

Paolo

>>    event sent           event processed            pending_events
>>      INIT                                                 INIT
>>      SIPI                                               INIT|SIPI
>>                               INIT                        SIPI
>>                               SIPI                         0
>>      INIT                                                 INIT
>>      SIPI                                               INIT|SIPI
>>                               INIT                        SIPI
>>                               SIPI                         0


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC] KVM: Fix race in apic->pending_events processing
  2013-05-31  8:48                                     ` Paolo Bonzini
@ 2013-05-31  9:18                                       ` Gleb Natapov
  2013-05-31  9:48                                         ` Paolo Bonzini
  0 siblings, 1 reply; 25+ messages in thread
From: Gleb Natapov @ 2013-05-31  9:18 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvm, Jan Kiszka

On Fri, May 31, 2013 at 10:48:32AM +0200, Paolo Bonzini wrote:
> Il 31/05/2013 06:36, Gleb Natapov ha scritto:
> > In my commit message there is two INITs in a row:
> >  vpu0:                            vcpu1:
> >  set INIT
> >                                 test_and_clear_bit(KVM_APIC_INIT)
> >                                    process INIT
> >  set INIT
> >  set SIPI
> >                                 test_and_clear_bit(KVM_APIC_SIPI)
> >                                    process SIPI
> > 
> > Two INITs before SIPI are essential to trigger the bug
> 
> I see now.  Let's draw pending_events as well:
> 
>     event sent           event processed            pending_events
>       INIT                                                INIT
>                                INIT                        0
>       INIT                                                INIT
>       SIPI                                              INIT|SIPI
>                                SIPI                       INIT
>                                INIT                         0
> 
> Events are reordered, there is indeed a bug if the second INIT comes at
> just the right time.  With your patch:
> 
>     event sent           event processed            pending_events
>       INIT                                                INIT
>                                INIT                        0
>       INIT                                                INIT
>       SIPI                                              INIT|SIPI
>                           SIPI, failed cmpxchg          INIT|SIPI
>                                INIT                       SIPI
>                                SIPI                       SIPI
> 
This is incorrect. cmpxchg will fail only if another INIT cames after SIPI.
Why  would it fail?

> The patch introduces a spurious SIPI, that's worse than coalescing.
> With my patch:
> 
>     event sent           event processed            pending_events
>       INIT                                                INIT
>                                INIT                        0
>       INIT                                                INIT
>       SIPI                                              INIT|SIPI
>                           (failed cmpxchg)              INIT|SIPI
>                                INIT                       SIPI
>                                SIPI                        0
> 
> My patch looks better to me for this scenario.
> 
> > and coincidentally this is what spec advices to do.
> 
> The spec advises INIT-SIPI-SIPI, not INIT-INIT-SIPI.  This is because
> the first INIT may have been processed late, and SIPIs are masked if not
> in wait-for-SIPI state.  You need to send the second just in case.  It
> is not needed in KVM because INITs effectively unmask the SIPI
> immediately, even though the INIT may take place a bit later.
> 
OK, I said this from memory since I cannot check the spec now. In this
case we need to fix unit test too.

> The INIT-SIPI-SIPI sequence is handled correctly by KVM thanks to the
> check on the mp-state.  But your patch breaks another corner case:
> 
>     event sent           event processed            pending_events
>       INIT                                                INIT
>                                INIT                        0
>       SIPI                                                SIPI
>                                SIPI                        0
>       SIPI                                                SIPI
>                            ignored SIPI                   SIPI
> 
>   set_mp_state(INIT_RECEIVED)                             SIPI
>                                SIPI                        0
> 
> With my patch, or no patch at all:
> 
>     event sent           event processed            pending_events
>       INIT                                                INIT
>                                INIT                        0
>       SIPI                                                SIPI
>                                SIPI                        0
>       SIPI                                                SIPI
>                            ignored SIPI                    0
>   set_mp_state(INIT_RECEIVED)                              0
> 
> Though perhaps the real bug here is in kvm_arch_vcpu_ioctl_setmp_state.

Looks like it, also in my patch we can always call cmpxchg to clear
SIPI.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC] KVM: Fix race in apic->pending_events processing
  2013-05-31  9:18                                       ` Gleb Natapov
@ 2013-05-31  9:48                                         ` Paolo Bonzini
  2013-06-02 13:14                                           ` Gleb Natapov
  0 siblings, 1 reply; 25+ messages in thread
From: Paolo Bonzini @ 2013-05-31  9:48 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: kvm, Jan Kiszka

Il 31/05/2013 11:18, Gleb Natapov ha scritto:
> On Fri, May 31, 2013 at 10:48:32AM +0200, Paolo Bonzini wrote:
>> Il 31/05/2013 06:36, Gleb Natapov ha scritto:
>>> In my commit message there is two INITs in a row:
>>>  vpu0:                            vcpu1:
>>>  set INIT
>>>                                 test_and_clear_bit(KVM_APIC_INIT)
>>>                                    process INIT
>>>  set INIT
>>>  set SIPI
>>>                                 test_and_clear_bit(KVM_APIC_SIPI)
>>>                                    process SIPI
>>>
>>> Two INITs before SIPI are essential to trigger the bug
>>
>> I see now.  Let's draw pending_events as well:
>>
>>     event sent           event processed            pending_events
>>       INIT                                                INIT
>>                                INIT                        0
>>       INIT                                                INIT
>>       SIPI                                              INIT|SIPI
>>                                SIPI                       INIT
>>                                INIT                         0
>>
>> Events are reordered, there is indeed a bug if the second INIT comes at
>> just the right time.  With your patch:
>>
>>     event sent           event processed            pending_events
>>       INIT                                                INIT
>>                                INIT                        0
>>       INIT                                                INIT
>>       SIPI                                              INIT|SIPI
>>                           SIPI, failed cmpxchg          INIT|SIPI
>>                                INIT                       SIPI
>>                                SIPI                       SIPI
>
> This is incorrect. cmpxchg will fail only if another INIT cames after SIPI.
> Why  would it fail?

You're right.

Can you show what is the case in my patch where you have coalescing?  I
still prefer it because it is a smaller change, it keeps the "clear a
bit before processing" idea that you find almost everywhere.  Changing
it to "clear a bit after processing" is a bigger and more surprising
change, though both are indeed tricky.

Paolo

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC] KVM: Fix race in apic->pending_events processing
  2013-05-31  9:48                                         ` Paolo Bonzini
@ 2013-06-02 13:14                                           ` Gleb Natapov
  2013-06-02 14:32                                             ` Paolo Bonzini
  0 siblings, 1 reply; 25+ messages in thread
From: Gleb Natapov @ 2013-06-02 13:14 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvm, Jan Kiszka

On Fri, May 31, 2013 at 11:48:10AM +0200, Paolo Bonzini wrote:
> Il 31/05/2013 11:18, Gleb Natapov ha scritto:
> > On Fri, May 31, 2013 at 10:48:32AM +0200, Paolo Bonzini wrote:
> >> Il 31/05/2013 06:36, Gleb Natapov ha scritto:
> >>> In my commit message there is two INITs in a row:
> >>>  vpu0:                            vcpu1:
> >>>  set INIT
> >>>                                 test_and_clear_bit(KVM_APIC_INIT)
> >>>                                    process INIT
> >>>  set INIT
> >>>  set SIPI
> >>>                                 test_and_clear_bit(KVM_APIC_SIPI)
> >>>                                    process SIPI
> >>>
> >>> Two INITs before SIPI are essential to trigger the bug
> >>
> >> I see now.  Let's draw pending_events as well:
> >>
> >>     event sent           event processed            pending_events
> >>       INIT                                                INIT
> >>                                INIT                        0
> >>       INIT                                                INIT
> >>       SIPI                                              INIT|SIPI
> >>                                SIPI                       INIT
> >>                                INIT                         0
> >>
> >> Events are reordered, there is indeed a bug if the second INIT comes at
> >> just the right time.  With your patch:
> >>
> >>     event sent           event processed            pending_events
> >>       INIT                                                INIT
> >>                                INIT                        0
> >>       INIT                                                INIT
> >>       SIPI                                              INIT|SIPI
> >>                           SIPI, failed cmpxchg          INIT|SIPI
> >>                                INIT                       SIPI
> >>                                SIPI                       SIPI
> >
> > This is incorrect. cmpxchg will fail only if another INIT cames after SIPI.
> > Why  would it fail?
> 
> You're right.
> 
> Can you show what is the case in my patch where you have coalescing?  I
You'ev said it in some of your emails. Quoting:
"      INIT-INIT-SIPI-INIT-SIPI

 your version would do many SIPIs, while mine would do just one."


> still prefer it because it is a smaller change, it keeps the "clear a
> bit before processing" idea that you find almost everywhere.  Changing
> it to "clear a bit after processing" is a bigger and more surprising
> change, though both are indeed tricky.
> 
There is nothing "surprising" in it for me. Really it is so subjection
that arguing about it is waste of everybody time and energy. So if we
want to continue have fun arguing about it lets move to some real patch
problems/benefits. So what I didn't like from the start about
pending_events is that it introduces two locked instruction on each
interrupt injection path, your patch makes it worse by change one of
those locked instruction to cmpxchg, while mine actually removes one.
But I think we can do even better and get rid of both of them for common
case and do only one locked inst while there are events pending, but
this is slow path so less important: 


diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 9d75193..3e0e85a 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1850,11 +1850,14 @@ void kvm_apic_accept_events(struct kvm_vcpu *vcpu)
 {
 	struct kvm_lapic *apic = vcpu->arch.apic;
 	unsigned int sipi_vector;
+	unsigned long pe;
 
-	if (!kvm_vcpu_has_lapic(vcpu))
+	if (!kvm_vcpu_has_lapic(vcpu) || !apic->pending_events)
 		return;
 
-	if (test_and_clear_bit(KVM_APIC_INIT, &apic->pending_events)) {
+	pe = xchg(&apic->pending_events, 0);
+
+	if (test_bit(KVM_APIC_INIT, &pe)) {
 		kvm_lapic_reset(vcpu);
 		kvm_vcpu_reset(vcpu);
 		if (kvm_vcpu_is_bsp(apic->vcpu))
@@ -1862,7 +1865,7 @@ void kvm_apic_accept_events(struct kvm_vcpu *vcpu)
 		else
 			vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED;
 	}
-	if (test_and_clear_bit(KVM_APIC_SIPI, &apic->pending_events) &&
+	if (test_bit(KVM_APIC_SIPI, &pe) &&
 	    vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
 		/* evaluate pending_events before reading the vector */
 		smp_rmb();
--
			Gleb.

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC] KVM: Fix race in apic->pending_events processing
  2013-06-02 13:14                                           ` Gleb Natapov
@ 2013-06-02 14:32                                             ` Paolo Bonzini
  2013-06-02 17:33                                               ` Gleb Natapov
  0 siblings, 1 reply; 25+ messages in thread
From: Paolo Bonzini @ 2013-06-02 14:32 UTC (permalink / raw)
  To: Gleb Natapov; +Cc: kvm, Jan Kiszka

Il 02/06/2013 15:14, Gleb Natapov ha scritto:
>> Can you show what is the case in my patch where you have coalescing?  I
> You'ev said it in some of your emails. Quoting:
> "      INIT-INIT-SIPI-INIT-SIPI
> 
>  your version would do many SIPIs, while mine would do just one."

Cancelling is very different from coalescing.  In the implementation we
chose, we have "two wires" to the processor; an INIT interrupt means
"bring SIPI wire to 0 and INIT to 1", while a SIPI means "leave INIT
wire aside and bring the SIPI wire to 1".  If the target processor does
not sense in time that the SIPI wire is high, the signal is lost.

Coalescing means that INIT-INIT or SIPI-SIPI becomes a single SIPI.
That is happening no matter what, because we have two bits not a queue.

So let's settle this as a simple disagreement on terms and move on.

> There is nothing "surprising" in it for me. Really it is so subjection
> that arguing about it is waste of everybody time and energy. So if we
> want to continue have fun arguing about it lets move to some real patch
> problems/benefits.

Good idea. :)

> So what I didn't like from the start about
> pending_events is that it introduces two locked instruction on each
> interrupt injection path, your patch makes it worse by change one of
> those locked instruction to cmpxchg, while mine actually removes one.

A cmpxchg is not more expensive than a test_and_clear_bit.  A cmpxchg
loop would be worse, of course.

> But I think we can do even better and get rid of both of them for common
> case and do only one locked inst while there are events pending, but
> this is slow path so less important: 

It looks indeed better than both alternatives.  It doesn't do the
coalescing that worries you, and I can understand it relatively easily
as "latching" the contents of pending_events at the beginning of
kvm_apic_accept_events.  Very good idea!

> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 9d75193..3e0e85a 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -1850,11 +1850,14 @@ void kvm_apic_accept_events(struct kvm_vcpu *vcpu)
>  {
>  	struct kvm_lapic *apic = vcpu->arch.apic;
>  	unsigned int sipi_vector;
> +	unsigned long pe;
>  
> -	if (!kvm_vcpu_has_lapic(vcpu))
> +	if (!kvm_vcpu_has_lapic(vcpu) || !apic->pending_events)
>  		return;

FWIW, this optimization is independent of the other change.  It would
work even on top of the current code.  But of course there is no need to
split it into a separate patch.

Paolo

> -	if (test_and_clear_bit(KVM_APIC_INIT, &apic->pending_events)) {
> +	pe = xchg(&apic->pending_events, 0);
> +
> +	if (test_bit(KVM_APIC_INIT, &pe)) {
>  		kvm_lapic_reset(vcpu);
>  		kvm_vcpu_reset(vcpu);
>  		if (kvm_vcpu_is_bsp(apic->vcpu))
> @@ -1862,7 +1865,7 @@ void kvm_apic_accept_events(struct kvm_vcpu *vcpu)
>  		else
>  			vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED;
>  	}
> -	if (test_and_clear_bit(KVM_APIC_SIPI, &apic->pending_events) &&
> +	if (test_bit(KVM_APIC_SIPI, &pe) &&
>  	    vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
>  		/* evaluate pending_events before reading the vector */
>  		smp_rmb();
> --
> 			Gleb.
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC] KVM: Fix race in apic->pending_events processing
  2013-06-02 14:32                                             ` Paolo Bonzini
@ 2013-06-02 17:33                                               ` Gleb Natapov
  0 siblings, 0 replies; 25+ messages in thread
From: Gleb Natapov @ 2013-06-02 17:33 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvm, Jan Kiszka

On Sun, Jun 02, 2013 at 04:32:25PM +0200, Paolo Bonzini wrote:
> > So what I didn't like from the start about
> > pending_events is that it introduces two locked instruction on each
> > interrupt injection path, your patch makes it worse by change one of
> > those locked instruction to cmpxchg, while mine actually removes one.
> 
> A cmpxchg is not more expensive than a test_and_clear_bit.  A cmpxchg
> loop would be worse, of course.
> 
Not sure about it. cmpxchg is regarded to be expensive AFAIK, but of
course how expensive depends on particular cpu.

> > But I think we can do even better and get rid of both of them for common
> > case and do only one locked inst while there are events pending, but
> > this is slow path so less important: 
> 
> It looks indeed better than both alternatives.  It doesn't do the
> coalescing that worries you, and I can understand it relatively easily
> as "latching" the contents of pending_events at the beginning of
> kvm_apic_accept_events.  Very good idea!
> 
We can do the same for event processing during vcpu entry and remove a
lot of locking instruction from there.

> > 
> > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > index 9d75193..3e0e85a 100644
> > --- a/arch/x86/kvm/lapic.c
> > +++ b/arch/x86/kvm/lapic.c
> > @@ -1850,11 +1850,14 @@ void kvm_apic_accept_events(struct kvm_vcpu *vcpu)
> >  {
> >  	struct kvm_lapic *apic = vcpu->arch.apic;
> >  	unsigned int sipi_vector;
> > +	unsigned long pe;
> >  
> > -	if (!kvm_vcpu_has_lapic(vcpu))
> > +	if (!kvm_vcpu_has_lapic(vcpu) || !apic->pending_events)
> >  		return;
> 
> FWIW, this optimization is independent of the other change.  It would
> work even on top of the current code.  But of course there is no need to
> split it into a separate patch.
> 
Agree.

> Paolo
> 
> > -	if (test_and_clear_bit(KVM_APIC_INIT, &apic->pending_events)) {
> > +	pe = xchg(&apic->pending_events, 0);
> > +
> > +	if (test_bit(KVM_APIC_INIT, &pe)) {
> >  		kvm_lapic_reset(vcpu);
> >  		kvm_vcpu_reset(vcpu);
> >  		if (kvm_vcpu_is_bsp(apic->vcpu))
> > @@ -1862,7 +1865,7 @@ void kvm_apic_accept_events(struct kvm_vcpu *vcpu)
> >  		else
> >  			vcpu->arch.mp_state = KVM_MP_STATE_INIT_RECEIVED;
> >  	}
> > -	if (test_and_clear_bit(KVM_APIC_SIPI, &apic->pending_events) &&
> > +	if (test_bit(KVM_APIC_SIPI, &pe) &&
> >  	    vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED) {
> >  		/* evaluate pending_events before reading the vector */
> >  		smp_rmb();
> > --
> > 			Gleb.
> > --
> > To unsubscribe from this list: send the line "unsubscribe kvm" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 

--
			Gleb.

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2013-06-02 17:33 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-05-26 13:00 [PATCH RFC] KVM: Fix race in apic->pending_events processing Gleb Natapov
2013-05-28 10:56 ` Paolo Bonzini
2013-05-28 12:56   ` Gleb Natapov
2013-05-28 13:48     ` Paolo Bonzini
2013-05-28 15:00       ` Gleb Natapov
2013-05-28 16:33         ` Paolo Bonzini
2013-05-30  1:20           ` Gleb Natapov
2013-05-30  5:41             ` Paolo Bonzini
2013-05-30  6:01               ` Gleb Natapov
2013-05-30  6:31                 ` Paolo Bonzini
2013-05-30  7:09                   ` Gleb Natapov
2013-05-30  7:30                     ` Paolo Bonzini
2013-05-30 12:34                       ` Gleb Natapov
2013-05-30 12:58                         ` Paolo Bonzini
2013-05-30 13:10                           ` Gleb Natapov
2013-05-30 13:23                             ` Paolo Bonzini
2013-05-30 13:35                               ` Gleb Natapov
2013-05-30 14:15                                 ` Paolo Bonzini
2013-05-31  4:36                                   ` Gleb Natapov
2013-05-31  8:48                                     ` Paolo Bonzini
2013-05-31  9:18                                       ` Gleb Natapov
2013-05-31  9:48                                         ` Paolo Bonzini
2013-06-02 13:14                                           ` Gleb Natapov
2013-06-02 14:32                                             ` Paolo Bonzini
2013-06-02 17:33                                               ` Gleb Natapov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).