Re: [PATCH 2/4] KVM: VMX: avoid double list add with VT-d posted interrupts

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Paolo Bonzini <pbonzini@redhat.com>
To: Peter Xu <peterx@redhat.com>
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	Longpeng <longpeng2@huawei.com>,
	Huangweidong <weidong.huang@huawei.com>,
	Gonglei <arei.gonglei@huawei.com>,
	wangxin <wangxinxin.wang@huawei.com>,
	"Radim Krčmář" <rkrcmar@redhat.com>
Subject: Re: [PATCH 2/4] KVM: VMX: avoid double list add with VT-d posted interrupts
Date: Thu, 8 Jun 2017 03:00:39 -0400 (EDT)	[thread overview]
Message-ID: <463359347.7021259.1496905239878.JavaMail.zimbra@redhat.com> (raw)
In-Reply-To: <20170608065057.GB3628@pxdev.xzpeter.org>



----- Original Message -----
> From: "Peter Xu" <peterx@redhat.com>
> To: "Paolo Bonzini" <pbonzini@redhat.com>
> Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org, "Longpeng" <longpeng2@huawei.com>, "Huangweidong"
> <weidong.huang@huawei.com>, "Gonglei" <arei.gonglei@huawei.com>, "wangxin" <wangxinxin.wang@huawei.com>, "Radim
> Krčmář" <rkrcmar@redhat.com>
> Sent: Thursday, June 8, 2017 8:50:57 AM
> Subject: Re: [PATCH 2/4] KVM: VMX: avoid double list add with VT-d posted interrupts
> 
> On Tue, Jun 06, 2017 at 12:57:05PM +0200, Paolo Bonzini wrote:
> > In some cases, for example involving hot-unplug of assigned
> > devices, pi_post_block can forget to remove the vCPU from the
> > blocked_vcpu_list.  When this happens, the next call to
> > pi_pre_block corrupts the list.
> > 
> > Fix this in two ways.  First, check vcpu->pre_pcpu in pi_pre_block
> > and WARN instead of adding the element twice in the list.  Second,
> > always do the list removal in pi_post_block if vcpu->pre_pcpu is
> > set (not -1).
> > 
> > The new code keeps interrupts disabled for the whole duration of
> > pi_pre_block/pi_post_block.  This is not strictly necessary, but
> > easier to follow.  For the same reason, PI.ON is checked only
> > after the cmpxchg, and to handle it we just call the post-block
> > code.  This removes duplication of the list removal code.
> > 
> > Cc: Longpeng (Mike) <longpeng2@huawei.com>
> > Cc: Huangweidong <weidong.huang@huawei.com>
> > Cc: Gonglei <arei.gonglei@huawei.com>
> > Cc: wangxin <wangxinxin.wang@huawei.com>
> > Cc: Radim Krčmář <rkrcmar@redhat.com>
> > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> > ---
> >  arch/x86/kvm/vmx.c | 62
> >  ++++++++++++++++++++++--------------------------------
> >  1 file changed, 25 insertions(+), 37 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > index 747d16525b45..0f4714fe4908 100644
> > --- a/arch/x86/kvm/vmx.c
> > +++ b/arch/x86/kvm/vmx.c
> > @@ -11236,10 +11236,11 @@ static void __pi_post_block(struct kvm_vcpu
> > *vcpu)
> >  	struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
> >  	struct pi_desc old, new;
> >  	unsigned int dest;
> > -	unsigned long flags;
> >  
> >  	do {
> >  		old.control = new.control = pi_desc->control;
> > +		WARN(old.nv != POSTED_INTR_WAKEUP_VECTOR,
> > +		     "Wakeup handler not enabled while the VCPU is blocked\n");
> >  
> >  		dest = cpu_physical_id(vcpu->cpu);
> >  
> > @@ -11256,14 +11257,10 @@ static void __pi_post_block(struct kvm_vcpu
> > *vcpu)
> >  	} while (cmpxchg(&pi_desc->control, old.control,
> >  			new.control) != old.control);
> >  
> > -	if(vcpu->pre_pcpu != -1) {
> > -		spin_lock_irqsave(
> > -			&per_cpu(blocked_vcpu_on_cpu_lock,
> > -			vcpu->pre_pcpu), flags);
> > +	if (!WARN_ON_ONCE(vcpu->pre_pcpu == -1)) {
> > +		spin_lock(&per_cpu(blocked_vcpu_on_cpu_lock, vcpu->pre_pcpu));
> >  		list_del(&vcpu->blocked_vcpu_list);
> > -		spin_unlock_irqrestore(
> > -			&per_cpu(blocked_vcpu_on_cpu_lock,
> > -			vcpu->pre_pcpu), flags);
> > +		spin_unlock(&per_cpu(blocked_vcpu_on_cpu_lock, vcpu->pre_pcpu));
> >  		vcpu->pre_pcpu = -1;
> >  	}
> >  }
> > @@ -11283,7 +11280,6 @@ static void __pi_post_block(struct kvm_vcpu *vcpu)
> >   */
> >  static int pi_pre_block(struct kvm_vcpu *vcpu)
> >  {
> > -	unsigned long flags;
> >  	unsigned int dest;
> >  	struct pi_desc old, new;
> >  	struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
> > @@ -11293,34 +11289,20 @@ static int pi_pre_block(struct kvm_vcpu *vcpu)
> >  		!kvm_vcpu_apicv_active(vcpu))
> >  		return 0;
> >  
> > -	vcpu->pre_pcpu = vcpu->cpu;
> > -	spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
> > -			  vcpu->pre_pcpu), flags);
> > -	list_add_tail(&vcpu->blocked_vcpu_list,
> > -		      &per_cpu(blocked_vcpu_on_cpu,
> > -		      vcpu->pre_pcpu));
> > -	spin_unlock_irqrestore(&per_cpu(blocked_vcpu_on_cpu_lock,
> > -			       vcpu->pre_pcpu), flags);
> > +	WARN_ON(irqs_disabled());
> > +	local_irq_disable();
> > +	if (!WARN_ON_ONCE(vcpu->pre_pcpu != -1)) {
> > +		vcpu->pre_pcpu = vcpu->cpu;
> > +		spin_lock(&per_cpu(blocked_vcpu_on_cpu_lock, vcpu->pre_pcpu));
> > +		list_add_tail(&vcpu->blocked_vcpu_list,
> > +			      &per_cpu(blocked_vcpu_on_cpu,
> > +				       vcpu->pre_pcpu));
> > +		spin_unlock(&per_cpu(blocked_vcpu_on_cpu_lock, vcpu->pre_pcpu));
> > +	}
> >  
> >  	do {
> >  		old.control = new.control = pi_desc->control;
> >  
> > -		/*
> > -		 * We should not block the vCPU if
> > -		 * an interrupt is posted for it.
> > -		 */
> > -		if (pi_test_on(pi_desc) == 1) {
> > -			spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
> > -					  vcpu->pre_pcpu), flags);
> > -			list_del(&vcpu->blocked_vcpu_list);
> > -			spin_unlock_irqrestore(
> > -					&per_cpu(blocked_vcpu_on_cpu_lock,
> > -					vcpu->pre_pcpu), flags);
> > -			vcpu->pre_pcpu = -1;
> > -
> > -			return 1;
> 
> [1]
> 
> > -		}
> > -
> >  		WARN((pi_desc->sn == 1),
> >  		     "Warning: SN field of posted-interrupts "
> >  		     "is set before blocking\n");
> > @@ -11345,7 +11327,12 @@ static int pi_pre_block(struct kvm_vcpu *vcpu)
> >  	} while (cmpxchg(&pi_desc->control, old.control,
> >  			new.control) != old.control);
> >  
> > -	return 0;
> > +	/* We should not block the vCPU if an interrupt is posted for it.  */
> > +	if (pi_test_on(pi_desc) == 1)
> > +		__pi_post_block(vcpu);
> 
> A question on when pi_test_on() is set:
> 
> The old code will return 1 if detected (ses [1]), while the new code
> does not. Would that matter? (IIUC that decides whether the vcpu will
> continue to run?)

The new code does, because __pi_post_block resets vcpu->pre_pcpu to -1.

> > +
> > +	local_irq_enable();
> > +	return (vcpu->pre_pcpu == -1);
> 
> Above we have:
> 
> 	if (!WARN_ON_ONCE(vcpu->pre_pcpu != -1)) {
> 		vcpu->pre_pcpu = vcpu->cpu;
>                 ...
> 	}
> 
> Then can here vcpu->pre_pcpu really be -1?

See above. :)

> >  }
> >  
> >  static int vmx_pre_block(struct kvm_vcpu *vcpu)
> > @@ -11361,12 +11348,13 @@ static int vmx_pre_block(struct kvm_vcpu *vcpu)
> >  
> >  static void pi_post_block(struct kvm_vcpu *vcpu)
> >  {
> > -	if (!kvm_arch_has_assigned_device(vcpu->kvm) ||
> > -		!irq_remapping_cap(IRQ_POSTING_CAP)  ||
> > -		!kvm_vcpu_apicv_active(vcpu))
> > +	if (vcpu->pre_pcpu == -1)
> >  		return;
> >  
> > +	WARN_ON(irqs_disabled());
> > +	local_irq_disable();
> >  	__pi_post_block(vcpu);
> > +	local_irq_enable();
> >  }
> >  
> >  static void vmx_post_block(struct kvm_vcpu *vcpu)
> > --
> > 2.13.0
> > 
> > 
> 
> A general question to pre_block/post_block handling for PI:
> 
> I see that we are handling PI logic mostly in four places:
> 
> vmx_vcpu_pi_{load|put}
> pi_{pre_post}_block
> 
> But do we really need the pre_block/post_block handling? Here's how I
> understand when vcpu blocked:
> 
> - vcpu_block
>   - ->pre_block
>   - kvm_vcpu_block [1]
>     - schedule()
>       - kvm_sched_out
>         - vmx_vcpu_pi_put [3]
>       - (another process working) ...
>       - kvm_sched_in
>         - vmx_vcpu_pi_load [4]
>   - ->post_block [2]
> 
> If so, [1] & [2] will definitely be paired with [3] & [4], then why we
> need [3] & [4] at all?
> 
> (Though [3] & [4] will also be used when preemption happens, so they
>  are required)
> 
> Please kindly figure out if I missed anything important...

Oh, I see what you mean: set up the wakeup handler in vmx_vcpu_pi_put
and rely on PI.ON to wake up the sleeping process immediately.  That
should be feasible, but overall I like the current pre_block/post_block
structure, and I think it's simpler.  The only thing to be careful about
is leaving the IRTE unmodified when scheduling out a blocked VCPU, which
is cleaned up and simplified in patch 3.

So I understand that the state may seem a bit too complicated as
of this patch, but hopefully the next two make it clearer.

Paolo

next prev parent reply	other threads:[~2017-06-08  7:00 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-06-06 10:57 [PATCH CFT 0/4] VT-d PI fixes Paolo Bonzini
2017-06-06 10:57 ` [PATCH 1/4] KVM: VMX: extract __pi_post_block Paolo Bonzini
2017-06-06 21:27   ` kbuild test robot
2017-06-06 10:57 ` [PATCH 2/4] KVM: VMX: avoid double list add with VT-d posted interrupts Paolo Bonzini
2017-06-06 12:30   ` Longpeng (Mike)
2017-06-06 12:35     ` Paolo Bonzini
2017-06-06 12:45       ` Longpeng (Mike)
2017-06-06 21:49   ` kbuild test robot
2017-06-08  6:50   ` Peter Xu
2017-06-08  6:53     ` Peter Xu
2017-06-08  7:00     ` Paolo Bonzini [this message]
2017-06-08  9:16       ` Peter Xu
2017-06-08 11:24         ` Paolo Bonzini
2017-06-09  2:50           ` Peter Xu
2017-06-09  7:29             ` Paolo Bonzini
2017-06-09  7:41               ` Peter Xu
2017-07-28  2:31   ` Longpeng (Mike)
2017-07-28  6:28     ` Paolo Bonzini
2017-06-06 10:57 ` [PATCH 3/4] KVM: VMX: simplify and fix vmx_vcpu_pi_load Paolo Bonzini
2017-07-28  4:22   ` Longpeng (Mike)
2017-07-28  5:14     ` Longpeng (Mike)
2017-06-06 10:57 ` [PATCH 4/4] KVM: VMX: simplify cmpxchg of PI descriptor control field Paolo Bonzini
2017-06-07  9:33 ` [PATCH CFT 0/4] VT-d PI fixes Gonglei (Arei)
2017-06-07 14:32   ` Paolo Bonzini
2017-07-11  8:55   ` Paolo Bonzini
2017-07-11  9:16     ` Gonglei (Arei)
2017-09-21  8:23       ` Longpeng (Mike)
2017-09-21  9:42         ` Paolo Bonzini

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=463359347.7021259.1496905239878.JavaMail.zimbra@redhat.com \
    --to=pbonzini@redhat.com \
    --cc=arei.gonglei@huawei.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=longpeng2@huawei.com \
    --cc=peterx@redhat.com \
    --cc=rkrcmar@redhat.com \
    --cc=wangxinxin.wang@huawei.com \
    --cc=weidong.huang@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.