All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Xu <peterx@redhat.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	Longpeng <longpeng2@huawei.com>,
	Huangweidong <weidong.huang@huawei.com>,
	Gonglei <arei.gonglei@huawei.com>,
	wangxin <wangxinxin.wang@huawei.com>,
	"Radim Krčmář" <rkrcmar@redhat.com>
Subject: Re: [PATCH 2/4] KVM: VMX: avoid double list add with VT-d posted interrupts
Date: Thu, 8 Jun 2017 17:16:44 +0800	[thread overview]
Message-ID: <20170608091644.GD3628@pxdev.xzpeter.org> (raw)
In-Reply-To: <463359347.7021259.1496905239878.JavaMail.zimbra@redhat.com>

On Thu, Jun 08, 2017 at 03:00:39AM -0400, Paolo Bonzini wrote:
> 
> 
> ----- Original Message -----
> > From: "Peter Xu" <peterx@redhat.com>
> > To: "Paolo Bonzini" <pbonzini@redhat.com>
> > Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org, "Longpeng" <longpeng2@huawei.com>, "Huangweidong"
> > <weidong.huang@huawei.com>, "Gonglei" <arei.gonglei@huawei.com>, "wangxin" <wangxinxin.wang@huawei.com>, "Radim
> > Krčmář" <rkrcmar@redhat.com>
> > Sent: Thursday, June 8, 2017 8:50:57 AM
> > Subject: Re: [PATCH 2/4] KVM: VMX: avoid double list add with VT-d posted interrupts
> > 
> > On Tue, Jun 06, 2017 at 12:57:05PM +0200, Paolo Bonzini wrote:
> > > In some cases, for example involving hot-unplug of assigned
> > > devices, pi_post_block can forget to remove the vCPU from the
> > > blocked_vcpu_list.  When this happens, the next call to
> > > pi_pre_block corrupts the list.
> > > 
> > > Fix this in two ways.  First, check vcpu->pre_pcpu in pi_pre_block
> > > and WARN instead of adding the element twice in the list.  Second,
> > > always do the list removal in pi_post_block if vcpu->pre_pcpu is
> > > set (not -1).
> > > 
> > > The new code keeps interrupts disabled for the whole duration of
> > > pi_pre_block/pi_post_block.  This is not strictly necessary, but
> > > easier to follow.  For the same reason, PI.ON is checked only
> > > after the cmpxchg, and to handle it we just call the post-block
> > > code.  This removes duplication of the list removal code.
> > > 
> > > Cc: Longpeng (Mike) <longpeng2@huawei.com>
> > > Cc: Huangweidong <weidong.huang@huawei.com>
> > > Cc: Gonglei <arei.gonglei@huawei.com>
> > > Cc: wangxin <wangxinxin.wang@huawei.com>
> > > Cc: Radim Krčmář <rkrcmar@redhat.com>
> > > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> > > ---
> > >  arch/x86/kvm/vmx.c | 62
> > >  ++++++++++++++++++++++--------------------------------
> > >  1 file changed, 25 insertions(+), 37 deletions(-)
> > > 
> > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > > index 747d16525b45..0f4714fe4908 100644
> > > --- a/arch/x86/kvm/vmx.c
> > > +++ b/arch/x86/kvm/vmx.c
> > > @@ -11236,10 +11236,11 @@ static void __pi_post_block(struct kvm_vcpu
> > > *vcpu)
> > >  	struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
> > >  	struct pi_desc old, new;
> > >  	unsigned int dest;
> > > -	unsigned long flags;
> > >  
> > >  	do {
> > >  		old.control = new.control = pi_desc->control;
> > > +		WARN(old.nv != POSTED_INTR_WAKEUP_VECTOR,
> > > +		     "Wakeup handler not enabled while the VCPU is blocked\n");
> > >  
> > >  		dest = cpu_physical_id(vcpu->cpu);
> > >  
> > > @@ -11256,14 +11257,10 @@ static void __pi_post_block(struct kvm_vcpu
> > > *vcpu)
> > >  	} while (cmpxchg(&pi_desc->control, old.control,
> > >  			new.control) != old.control);
> > >  
> > > -	if(vcpu->pre_pcpu != -1) {
> > > -		spin_lock_irqsave(
> > > -			&per_cpu(blocked_vcpu_on_cpu_lock,
> > > -			vcpu->pre_pcpu), flags);
> > > +	if (!WARN_ON_ONCE(vcpu->pre_pcpu == -1)) {
> > > +		spin_lock(&per_cpu(blocked_vcpu_on_cpu_lock, vcpu->pre_pcpu));
> > >  		list_del(&vcpu->blocked_vcpu_list);
> > > -		spin_unlock_irqrestore(
> > > -			&per_cpu(blocked_vcpu_on_cpu_lock,
> > > -			vcpu->pre_pcpu), flags);
> > > +		spin_unlock(&per_cpu(blocked_vcpu_on_cpu_lock, vcpu->pre_pcpu));
> > >  		vcpu->pre_pcpu = -1;
> > >  	}
> > >  }
> > > @@ -11283,7 +11280,6 @@ static void __pi_post_block(struct kvm_vcpu *vcpu)
> > >   */
> > >  static int pi_pre_block(struct kvm_vcpu *vcpu)
> > >  {
> > > -	unsigned long flags;
> > >  	unsigned int dest;
> > >  	struct pi_desc old, new;
> > >  	struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
> > > @@ -11293,34 +11289,20 @@ static int pi_pre_block(struct kvm_vcpu *vcpu)
> > >  		!kvm_vcpu_apicv_active(vcpu))
> > >  		return 0;
> > >  
> > > -	vcpu->pre_pcpu = vcpu->cpu;
> > > -	spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
> > > -			  vcpu->pre_pcpu), flags);
> > > -	list_add_tail(&vcpu->blocked_vcpu_list,
> > > -		      &per_cpu(blocked_vcpu_on_cpu,
> > > -		      vcpu->pre_pcpu));
> > > -	spin_unlock_irqrestore(&per_cpu(blocked_vcpu_on_cpu_lock,
> > > -			       vcpu->pre_pcpu), flags);
> > > +	WARN_ON(irqs_disabled());
> > > +	local_irq_disable();
> > > +	if (!WARN_ON_ONCE(vcpu->pre_pcpu != -1)) {
> > > +		vcpu->pre_pcpu = vcpu->cpu;
> > > +		spin_lock(&per_cpu(blocked_vcpu_on_cpu_lock, vcpu->pre_pcpu));
> > > +		list_add_tail(&vcpu->blocked_vcpu_list,
> > > +			      &per_cpu(blocked_vcpu_on_cpu,
> > > +				       vcpu->pre_pcpu));
> > > +		spin_unlock(&per_cpu(blocked_vcpu_on_cpu_lock, vcpu->pre_pcpu));
> > > +	}
> > >  
> > >  	do {
> > >  		old.control = new.control = pi_desc->control;
> > >  
> > > -		/*
> > > -		 * We should not block the vCPU if
> > > -		 * an interrupt is posted for it.
> > > -		 */
> > > -		if (pi_test_on(pi_desc) == 1) {
> > > -			spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
> > > -					  vcpu->pre_pcpu), flags);
> > > -			list_del(&vcpu->blocked_vcpu_list);
> > > -			spin_unlock_irqrestore(
> > > -					&per_cpu(blocked_vcpu_on_cpu_lock,
> > > -					vcpu->pre_pcpu), flags);
> > > -			vcpu->pre_pcpu = -1;
> > > -
> > > -			return 1;
> > 
> > [1]
> > 
> > > -		}
> > > -
> > >  		WARN((pi_desc->sn == 1),
> > >  		     "Warning: SN field of posted-interrupts "
> > >  		     "is set before blocking\n");
> > > @@ -11345,7 +11327,12 @@ static int pi_pre_block(struct kvm_vcpu *vcpu)
> > >  	} while (cmpxchg(&pi_desc->control, old.control,
> > >  			new.control) != old.control);
> > >  
> > > -	return 0;
> > > +	/* We should not block the vCPU if an interrupt is posted for it.  */
> > > +	if (pi_test_on(pi_desc) == 1)
> > > +		__pi_post_block(vcpu);
> > 
> > A question on when pi_test_on() is set:
> > 
> > The old code will return 1 if detected (ses [1]), while the new code
> > does not. Would that matter? (IIUC that decides whether the vcpu will
> > continue to run?)
> 
> The new code does, because __pi_post_block resets vcpu->pre_pcpu to -1.

Sorry I overlook that. :-)

> 
> > > +
> > > +	local_irq_enable();
> > > +	return (vcpu->pre_pcpu == -1);
> > 
> > Above we have:
> > 
> > 	if (!WARN_ON_ONCE(vcpu->pre_pcpu != -1)) {
> > 		vcpu->pre_pcpu = vcpu->cpu;
> >                 ...
> > 	}
> > 
> > Then can here vcpu->pre_pcpu really be -1?
> 
> See above. :)

Yes. Then there's no problem.

> 
> > >  }
> > >  
> > >  static int vmx_pre_block(struct kvm_vcpu *vcpu)
> > > @@ -11361,12 +11348,13 @@ static int vmx_pre_block(struct kvm_vcpu *vcpu)
> > >  
> > >  static void pi_post_block(struct kvm_vcpu *vcpu)
> > >  {
> > > -	if (!kvm_arch_has_assigned_device(vcpu->kvm) ||
> > > -		!irq_remapping_cap(IRQ_POSTING_CAP)  ||
> > > -		!kvm_vcpu_apicv_active(vcpu))
> > > +	if (vcpu->pre_pcpu == -1)
> > >  		return;
> > >  
> > > +	WARN_ON(irqs_disabled());
> > > +	local_irq_disable();
> > >  	__pi_post_block(vcpu);
> > > +	local_irq_enable();
> > >  }
> > >  
> > >  static void vmx_post_block(struct kvm_vcpu *vcpu)
> > > --
> > > 2.13.0
> > > 
> > > 
> > 
> > A general question to pre_block/post_block handling for PI:
> > 
> > I see that we are handling PI logic mostly in four places:
> > 
> > vmx_vcpu_pi_{load|put}
> > pi_{pre_post}_block
> > 
> > But do we really need the pre_block/post_block handling? Here's how I
> > understand when vcpu blocked:
> > 
> > - vcpu_block
> >   - ->pre_block
> >   - kvm_vcpu_block [1]
> >     - schedule()
> >       - kvm_sched_out
> >         - vmx_vcpu_pi_put [3]
> >       - (another process working) ...
> >       - kvm_sched_in
> >         - vmx_vcpu_pi_load [4]
> >   - ->post_block [2]
> > 
> > If so, [1] & [2] will definitely be paired with [3] & [4], then why we
> > need [3] & [4] at all?
> > 
> > (Though [3] & [4] will also be used when preemption happens, so they
> >  are required)
> > 
> > Please kindly figure out if I missed anything important...
> 
> Oh, I see what you mean: set up the wakeup handler in vmx_vcpu_pi_put
> and rely on PI.ON to wake up the sleeping process immediately.  That
> should be feasible, but overall I like the current pre_block/post_block
> structure, and I think it's simpler.  The only thing to be careful about
> is leaving the IRTE unmodified when scheduling out a blocked VCPU, which
> is cleaned up and simplified in patch 3.
> 
> So I understand that the state may seem a bit too complicated as
> of this patch, but hopefully the next two make it clearer.

After re-read the codes and patches I got the point. Indeed current
way should be clearer since pre/post_block are mostly handling NV/DST
while pi_load/put are for SN bit.  Thanks!

-- 
Peter Xu

  reply	other threads:[~2017-06-08  9:16 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-06-06 10:57 [PATCH CFT 0/4] VT-d PI fixes Paolo Bonzini
2017-06-06 10:57 ` [PATCH 1/4] KVM: VMX: extract __pi_post_block Paolo Bonzini
2017-06-06 21:27   ` kbuild test robot
2017-06-06 10:57 ` [PATCH 2/4] KVM: VMX: avoid double list add with VT-d posted interrupts Paolo Bonzini
2017-06-06 12:30   ` Longpeng (Mike)
2017-06-06 12:35     ` Paolo Bonzini
2017-06-06 12:45       ` Longpeng (Mike)
2017-06-06 21:49   ` kbuild test robot
2017-06-08  6:50   ` Peter Xu
2017-06-08  6:53     ` Peter Xu
2017-06-08  7:00     ` Paolo Bonzini
2017-06-08  9:16       ` Peter Xu [this message]
2017-06-08 11:24         ` Paolo Bonzini
2017-06-09  2:50           ` Peter Xu
2017-06-09  7:29             ` Paolo Bonzini
2017-06-09  7:41               ` Peter Xu
2017-07-28  2:31   ` Longpeng (Mike)
2017-07-28  6:28     ` Paolo Bonzini
2017-06-06 10:57 ` [PATCH 3/4] KVM: VMX: simplify and fix vmx_vcpu_pi_load Paolo Bonzini
2017-07-28  4:22   ` Longpeng (Mike)
2017-07-28  5:14     ` Longpeng (Mike)
2017-06-06 10:57 ` [PATCH 4/4] KVM: VMX: simplify cmpxchg of PI descriptor control field Paolo Bonzini
2017-06-07  9:33 ` [PATCH CFT 0/4] VT-d PI fixes Gonglei (Arei)
2017-06-07 14:32   ` Paolo Bonzini
2017-07-11  8:55   ` Paolo Bonzini
2017-07-11  9:16     ` Gonglei (Arei)
2017-09-21  8:23       ` Longpeng (Mike)
2017-09-21  9:42         ` Paolo Bonzini

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170608091644.GD3628@pxdev.xzpeter.org \
    --to=peterx@redhat.com \
    --cc=arei.gonglei@huawei.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=longpeng2@huawei.com \
    --cc=pbonzini@redhat.com \
    --cc=rkrcmar@redhat.com \
    --cc=wangxinxin.wang@huawei.com \
    --cc=weidong.huang@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.