Re: [RFC v2] KVM: x86/vmx: Suppress posted interrupt notification when CPU is in host

From: Chao Gao <chao.gao@intel.com>
To: Sean Christopherson <seanjc@google.com>
Cc: <kvm@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
	<jon@nutanix.com>, <kevin.tian@intel.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>, <x86@kernel.org>,
	"H. Peter Anvin" <hpa@zytor.com>
Subject: Re: [RFC v2] KVM: x86/vmx: Suppress posted interrupt notification when CPU is in host
Date: Tue, 27 Sep 2022 14:32:51 +0800
Message-ID: <YzKZExaU2k7qfcS9@gao-cwp>
In-Reply-To: <YzHRKO1v5N/BIQl6@google.com>

On Mon, Sep 26, 2022 at 04:19:52PM +0000, Sean Christopherson wrote:
>On Fri, Sep 23, 2022, Chao Gao wrote:
>> Set PID.SN right after VM exits and clear it before VM entry to minimize
>> the chance of hardware issuing PINs to a CPU when it's in host.
>
>Toggling PID.SN as close to the world switch as possible is undesirable.  If KVM
>re-enters the guest without enabling IRQs, i.e. handles the VM-Exit in the fastpath,
>then the notification IRQ will be delivered in the guest.
>
>The natural location to do the toggling is when KVM "toggles" software, i.e. when
>KVM sets IN_GUEST_MODE (clear SN) and OUTSIDE_GUEST_MODE (set SN).

This makes sense to me.

>
>I believe that would also obviate the need to manually send a PI Notification IRQ,
>as the existing ->sync_pir_to_irr() call that exists to handle exactly this case
>(notification not sent or handled in host) would ensure any outstanding posted IRQ
>gets moved to the IRR and processed accordingly.
>
>> Opportunistically clean up vmx_vcpu_pi_put(); when a vCPU is preempted,
>
>Uh uh, this patch is already way, way too subtle and complex to tack on clean up.
>"Opportunistic" clean up is for cases where the clean up is a pure refactoring
>and/or has zero impact on functionality.

Got it. Will move this cleanup to a separate patch if it is still needed.

>
>> it is pointless to update PID.NV to wakeup vector since notification is
>> anyway suppressed. And since PID.SN should be already set for running
>> vCPUs, so, don't set it again for preempted vCPUs.
>
>I'm pretty sure this is wrong.  If the vCPU is preempted between prepare_to_rcuwait()
>and schedule(), then skipping pi_enable_wakeup_handler() will hang the guest if
>the wakeup event is a posted IRQ and the event arrives while the vCPU is preempted.

Thanks for pointing out this subtle case.

My understanding is that there will eventually be a call to
vmx_vcpu_pi_put() with preempted=false (I assume that preempted vCPUs
will be scheduled again at some later point). In that case,
pi_enable_wakeup_handler() can wake up the vCPU by sending a self-IPI.
Also, since this patch checks PIR instead of the ON bit, I don't see
why the guest would hang.

>
>> When IPI virtualization is enabled, this patch increases "perf bench" [*]
>> by 6.56%, and PIN count in 1 second drops from tens of thousands to
>> hundreds. But cpuid loop test shows this patch causes 1.58% overhead in
>> VM-exit round-trip latency.
>
>The overhead is more than likely due to pi_is_pir_empty() in the VM-Entry path,
>i.e. should in theory go away if PID.SN is clear/set at IN_GUEST_MODE and
>OUTSIDE_GUEST_MODE

I will collect perf data after implementing what you suggested.

>
>> Also honour PID.SN bit in vmx_deliver_posted_interrupt().
>
>Why?

VT-d hardware doesn't set the ON bit if the SN bit is set.

Enforcing the same rule in KVM lets it skip unnecessary work, such as the
subsequent pi_test_and_set_on() and kvm_vcpu_trigger_posted_interrupt() calls.
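
To make the rule concrete, here is a minimal user-space sketch of a software
delivery path that honours SN. It is a simplified model, not the KVM code:
struct pi_desc_model, the PI_ON/PI_SN masks and model_deliver_posted_interrupt()
are hypothetical names, and the return value only stands in for "a notification
would be sent".

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define PI_ON (1u << 0)	/* outstanding notification */
#define PI_SN (1u << 1)	/* suppress notification */

/* Hypothetical, simplified model of a posted-interrupt descriptor. */
struct pi_desc_model {
	_Atomic uint64_t pir[4];	/* 256 posted-interrupt request bits */
	_Atomic uint32_t control;	/* ON and SN bits only */
};

/*
 * Post vector @vec. Returns true if a notification would have to be
 * sent, false if it can be skipped.
 */
bool model_deliver_posted_interrupt(struct pi_desc_model *pi, int vec)
{
	assert(vec >= 0 && vec < 256);

	/* The request is always recorded in PIR. */
	atomic_fetch_or(&pi->pir[vec / 64], 1ull << (vec % 64));

	/* SN set: leave ON alone and skip the notification entirely. */
	if (atomic_load(&pi->control) & PI_SN)
		return false;

	/* ON already set: a notification is already outstanding. */
	if (atomic_fetch_or(&pi->control, PI_ON) & PI_ON)
		return false;

	return true;
}
```

In this model, a vector posted while SN is set stays in PIR with ON clear, so
some later step has to notice it, which is the coordination problem discussed
further down.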

>
>> When IPI virtualization is enabled, this patch increases "perf bench" [*]
>> by 6.56%, and PIN count in 1 second drops from tens of thousands to
>> hundreds. But cpuid loop test shows this patch causes 1.58% overhead in
>> VM-exit round-trip latency.
>> 
>> [*] test cmd: perf bench sched pipe -T. Note that we change the source
>> code to pin two threads to two different vCPUs so that it can reproduce
>> stable results.
>> 
>> Signed-off-by: Chao Gao <chao.gao@intel.com>
>> ---
>> RFC: I am not sure whether the benefits outweighs the extra VM-exit cost.
>> 
>> Changes in v2 (addressed comments from Kevin):
>> - measure/estimate the impact to non-IPC-intensive cases
>> - don't tie PID.SN to vcpu->mode. Instead, clear PID.SN
>>   right before VM-entry and set it after VM-exit.
>
>Ah, sorry, missed v1.  Rather than key off of IN_GUEST_MODE in the sync path, add
>an explicit kvm_x86_ops hook to perform the transition.  I.e. make it explicit.

It is OK to add a separate hook. But the question is how to coordinate clearing
SN with ->sync_pir_to_irr(). Clearing the SN bit may leave the PID in a state
where ON and SN are both clear but some outstanding IRQs remain in PIR. The
current ->sync_pir_to_irr() doesn't sync those IRQs to the IRR in this case.
Here are two options to fix the problem:

1) Clear SN with the new hook, then set the ON bit if there is any outstanding
IRQ. No change to ->sync_pir_to_irr() is needed.

2) Clear SN with the new hook, add a force mode to ->sync_pir_to_irr() where
PIR is synced to the IRR regardless of the ON/SN bits, and invoke
->sync_pir_to_irr() on the VM-entry path with force_mode=true.

Both may lead to an extra check of PIR.
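
The two options can be sketched with a small user-space model. This is only an
illustration under stated assumptions: struct pi_desc_model,
model_clear_sn_option1() and model_sync_pir_to_irr() with its force parameter
are hypothetical names, and the non-forced sync is assumed to do nothing unless
ON is set, as described above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PI_ON (1u << 0)	/* outstanding notification */
#define PI_SN (1u << 1)	/* suppress notification */

/* Hypothetical, simplified model of a posted-interrupt descriptor. */
struct pi_desc_model {
	_Atomic uint64_t pir[4];
	_Atomic uint32_t control;
};

bool model_pir_empty(struct pi_desc_model *pi)
{
	for (int i = 0; i < 4; i++)
		if (atomic_load(&pi->pir[i]))
			return false;
	return true;
}

/* Option 1: clear SN, then set ON if anything was posted meanwhile. */
void model_clear_sn_option1(struct pi_desc_model *pi)
{
	atomic_fetch_and(&pi->control, ~PI_SN);
	if (!model_pir_empty(pi))	/* the extra PIR check */
		atomic_fetch_or(&pi->control, PI_ON);
}

/*
 * Move PIR into @irr. Without @force, nothing is synced unless ON is
 * set. Option 2 passes force=true to scan PIR regardless of ON/SN.
 * Returns true if any vector was moved.
 */
bool model_sync_pir_to_irr(struct pi_desc_model *pi, uint64_t irr[4],
			   bool force)
{
	bool moved = false;

	assert(irr != NULL);

	if (!force && !(atomic_load(&pi->control) & PI_ON))
		return false;

	atomic_fetch_and(&pi->control, ~PI_ON);
	for (int i = 0; i < 4; i++) {
		uint64_t bits = atomic_exchange(&pi->pir[i], 0);

		irr[i] |= bits;
		moved |= bits != 0;
	}
	return moved;
}
```

With option 1 the unmodified (non-forced) sync picks up the stranded vector
because ON has been set for it. With option 2 the caller has to remember to
pass force=true at the right place.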

>> @@ -101,11 +95,16 @@ void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, int cpu)
>>  		new.control = old.control;
>>  
>>  		/*
>> -		 * Clear SN (as above) and refresh the destination APIC ID to
>> -		 * handle task migration (@cpu != vcpu->cpu).
>> +		 * Set SN and refresh the destination APIC ID to handle
>> +		 * task migration (@cpu != vcpu->cpu).
>> +		 *
>> +		 * SN is cleared when a vCPU goes to blocked state so that
>> +		 * the blocked vCPU can be waken up on receiving a
>> +		 * notification. For a running/runnable vCPU, such
>> +		 * notifications are useless. Set SN bit to suppress them.
>>  		 */
>>  		new.ndst = dest;
>> -		new.sn = 0;
>> +		new.sn = 1;
>
>To handle the preempted case, I believe the correct behavior is to leave SN
>as-is, although that would require setting SN=1 during vCPU creation.  Arguably
>KVM should do that anyways when APICv is enabled.
>
>Hmm, or alternatively this should do the same?
>
>		new.sn = !kvm_vcpu_is_blocking();

I don't get this. Probably I am misunderstanding something about vCPU preemption.

>
>> @@ -172,8 +160,10 @@ static void pi_enable_wakeup_handler(struct kvm_vcpu *vcpu)
>>  	 * enabled until it is safe to call try_to_wake_up() on the task being
>>  	 * scheduled out).
>>  	 */
>> -	if (pi_test_on(&new))
>> +	if (!pi_is_pir_empty(pi_desc)) {
>> +		pi_set_on(pi_desc);
>
>As much as I wish we could get rid of kvm_arch_vcpu_blocking(), I actually think
>this would be a good application of that hook.  If PID.SN is cleared during
>kvm_arch_vcpu_blocking() and set during kvm_arch_vcpu_unblocking(), then I believe
>there's no need to manually check the PIR here, as any IRQ that isn't detected by
>kvm_vcpu_check_block() is guaranteed to set PID.ON=1.

Using kvm_arch_vcpu_blocking() has the same problem as using a new hook
for the VM-entry path: we need either a force mode for ->sync_pir_to_irr()
or to set the ON bit if there is any outstanding IRQ right after clearing SN.

The former may help performance a little, but since the call to
->sync_pir_to_irr() in kvm_vcpu_check_block() is so far away from the
place where SN is cleared, I think this would be a source of bugs.

The latter has no benefit compared to what this patch does here.
