Re: [PATCHv2 4/4] arm64: add host pv-vcpu-state support

From: Marc Zyngier <maz@kernel.org>
To: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Joel Fernandes <joelaf@google.com>,
	linux-kernel@vger.kernel.org,
	virtualization@lists.linux-foundation.org,
	Suleiman Souhlal <suleiman@google.com>,
	Will Deacon <will@kernel.org>,
	kvmarm@lists.cs.columbia.edu,
	linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCHv2 4/4] arm64: add host pv-vcpu-state support
Date: Wed, 21 Jul 2021 10:10:59 +0100
Message-ID: <87fsw82frw.wl-maz@kernel.org>
In-Reply-To: <YPd1Q1ppmKng67tk@google.com>

On Wed, 21 Jul 2021 02:15:47 +0100,
Sergey Senozhatsky <senozhatsky@chromium.org> wrote:
> 
> On (21/07/12 17:24), Marc Zyngier wrote:
> > >  
> > >  void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
> > >  {
> > > +	kvm_update_vcpu_preempted(vcpu, true);
> > 
> > This doesn't look right. With this, you are now telling the guest that
> > a vcpu that is blocked on WFI is preempted. This really isn't the
> > case, as it has voluntarily entered a low-power mode while waiting for
> > an interrupt. Indeed, the vcpu isn't running. A physical CPU wouldn't
> > be running either.
> 
> I suppose you are talking about kvm_vcpu_block().

kvm_vcpu_block() is how things are implemented. WFI is the instruction
I'm concerned about.

> Well, it checks kvm_vcpu_check_block() but then it simply schedule()
> out the vcpu process, which does look like "the vcpu is
> preempted". Once we sched_in() that vcpu process again we mark it as
> non-preempted, even though it remains in kvm wfx handler. Why isn't
> it right?

Because the vcpu hasn't been "preempted". It has *voluntarily* gone
into a low-power mode, and how KVM implements this "low-power mode" is
none of the guest's business. This is exactly the same behaviour that
you will have on bare metal. From a Linux guest perspective, the vcpu
is *idle*, not doing anything, and only waiting for an interrupt to
start executing again.

This is a fundamentally different concept from preempting a vcpu
because its time-slice is up. In this second case, you can indeed
mitigate things by exposing steal time and preemption status as you
break the illusion of a machine that is completely controlled by the
guest.

If the "resched on interrupt delivery while blocked on WFI" path is
too slow for you, then *that* is the thing that needs addressing.
Feeding extra state to the guest doesn't help.
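
To make the distinction concrete, here is a minimal user-space sketch of
the policy I'd expect. It is not kernel code and not a proposed patch:
struct model_vcpu, enum sched_out_reason, model_vcpu_put() and
model_vcpu_load() are all hypothetical names made up for illustration.
The only point is that the "preempted" flag the guest sees is set when
the host takes the CPU away, and stays clear when the vcpu went idle by
itself via WFI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reasons for a vcpu leaving the physical CPU. */
enum sched_out_reason {
	SCHED_OUT_TIMESLICE,	/* host scheduler took the CPU away */
	SCHED_OUT_WFI,		/* guest executed WFI, waiting for an interrupt */
};

/* Hypothetical, simplified stand-in for the per-vcpu state. */
struct model_vcpu {
	bool blocked_in_wfi;	/* vcpu is idle by its own choice */
	bool preempted;		/* what the guest would observe */
};

/* Models the "put" side: only involuntary descheduling is preemption. */
static void model_vcpu_put(struct model_vcpu *vcpu, enum sched_out_reason why)
{
	assert(vcpu != NULL);
	vcpu->blocked_in_wfi = (why == SCHED_OUT_WFI);
	vcpu->preempted = !vcpu->blocked_in_wfi;
}

/* Models the "load" side: a running vcpu is neither blocked nor preempted. */
static void model_vcpu_load(struct model_vcpu *vcpu)
{
	assert(vcpu != NULL);
	vcpu->blocked_in_wfi = false;
	vcpu->preempted = false;
}
```

With this model, a put for SCHED_OUT_WFI leaves preempted clear, a put
for SCHED_OUT_TIMESLICE sets it, and a load clears it again. How the
real code would tell the two cases apart in kvm_arch_vcpu_put() is left
open here.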

> Another call path is iret:
> 
> <iret>
> __schedule()
>  context_switch()
>   prepare_task_switch()
>    fire_sched_out_preempt_notifiers()
>     kvm_sched_out()
>      kvm_arch_vcpu_put()

I'm not sure how an x86 concept is relevant here.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.