Re: [PATCH 2/7] KVM: lapic: Delay 1ns at a time when waiting for timer to "expire"

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Sean Christopherson <sean.j.christopherson@intel.com>
To: Liran Alon <liran.alon@oracle.com>
Cc: "Paolo Bonzini" <pbonzini@redhat.com>,
	"Radim Krčmář" <rkrcmar@redhat.com>,
	kvm@vger.kernel.org, "Wanpeng Li" <wanpengli@tencent.com>
Subject: Re: [PATCH 2/7] KVM: lapic: Delay 1ns at a time when waiting for timer to "expire"
Date: Mon, 15 Apr 2019 09:11:01 -0700	[thread overview]
Message-ID: <20190415161101.GC24010@linux.intel.com> (raw)
In-Reply-To: <A28B326E-C648-45DE-9030-3B8119DDB71B@oracle.com>

On Sun, Apr 14, 2019 at 02:25:44PM +0300, Liran Alon wrote:
> 
> 
> > On 12 Apr 2019, at 23:18, Sean Christopherson <sean.j.christopherson@intel.com> wrote:
> > 
> > To minimize the latency of timer interrupts as observed by the guest,
> > KVM adjusts the values it programs into the host timers to account for
> > the host's overhead of programming and handling the timer event.  In
> > the event that the adjustments are too aggressive, i.e. the timer fires
> > earlier than the guest expects, KVM busy waits immediately prior to
> > entering the guest.
> > 
> > Currently, KVM manually converts the delay from nanoseconds to clock
> > cycles.  But, the conversion is done in the guest's time domain, while
> > the delay occurs in the host's time domain, i.e. the delay may not be
> > accurate and could wait too little or too long.  
> 
> Just to make sure I understand the issue you see here:
> The __delay() is called with a parameter which the number of guest cycles required
> to get from current guest cycle (guest_tsc) to deadline cycle (tsc_deadline).
> But in case guest vCPU frequency is different than physical CPU frequency,
> this parameter should further be converted from guest cycles to host cycles.
> Do I understand correctly? If yes, I think you should elaborate this part a bit more in commit message.
> As this is quite confusing. :)

Heh, I didn't elaborate precisely because it was so darn confusing.  I
didn't want to say something completely incorrect so I kept it high level.

> 
> > Sidestep the headache
> > of shifting time domains by delaying 1ns at a time and breaking the loop
> > when the guest's desired timer delay has been satisfied.  Because the
> > advancement, which caps the delay to avoid unbounded busy waiting, is
> > stored in nanoseconds, the current advancement time can simply be used
> > as a loop terminator since we're delaying 1ns at a time (plus the few
> > cycles of overhead for running the code).
> 
> Looking at start_sw_tscdeadline(), we have the same issue where we require to convert guest cycles to host ns time in order to set hrtimer expiration time.
> This is simply done by:
> host_ns = guest_cycles * 1000000ULL;
> do_div(host_ns, vcpu->arch.virtual_tsc_khz);
> 
> Why shouldn’t we use the same approach for constructing __delay() parameter from guest_cycles currently pass to it as parameter?
> i.e. something such as:
> delay_guest_cycles = min(tsc_deadline - guest_tsc, nsec_to_cycles(vcpu, lapic_timer_advance_ns));
> delay_host_ns = delay_guest_cycles * 1000000ULL;
> do_div(delay_host_ns, vcpu->arch.virtual_tsc_khz);
> __delay(delay_host_ns);

That's also an option.  I opted for the loop as it felt simpler and more
obvious.  And explicitly checking rechecking kvm_read_l1_tsc() guarantees
the guest won't see an "early" TSC unless the cap kicks in.  That being
said, I don't have a strong opinion either way.

As for the code, we'd probably want to do the conversion first and cap
second so as to avoid converting lapic_timer_advance_ns to the guest's
TSC and then back to ns.  It should use ndelay(), and maybe just call it
"delay_ns" to avoid (temporarily) assigning a guest TSC calculation to a
host variable, e.g.:

  delay_ns = (tsc_deadline - guest_tsc) * 1000000ULL;
  do_div(delay_ns, vcpu->arch.virtual_tsc_khz);
  ndelay(min(delay_ns, lapic_timer_advance_ns));

That's actually nowhere near as as bad as I thought it would be now that
it's typed out...


> 
> -Liran
> 
> > 
> > Cc: Liran Alon <liran.alon@oracle.com>
> > Cc: Wanpeng Li <wanpengli@tencent.com>
> > Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
> > ---
> > arch/x86/kvm/lapic.c | 13 +++++++------
> > 1 file changed, 7 insertions(+), 6 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > index 92446cba9b24..e797e3145a8b 100644
> > --- a/arch/x86/kvm/lapic.c
> > +++ b/arch/x86/kvm/lapic.c
> > @@ -1486,7 +1486,8 @@ static bool lapic_timer_int_injected(struct kvm_vcpu *vcpu)
> > void wait_lapic_expire(struct kvm_vcpu *vcpu)
> > {
> > 	struct kvm_lapic *apic = vcpu->arch.apic;
> > -	u64 guest_tsc, tsc_deadline, ns;
> > +	u32 timer_advance_ns = lapic_timer_advance_ns;
> > +	u64 guest_tsc, tmp_tsc, tsc_deadline, ns;
> 
> Nit: I would rename guest_tsc and tmp_tsc to guest_tsc_on_exit and current_guest_tsc respectively.

guest_tsc_on_exit isn't correct though, as it holds the guest's TSC at the
beginning of wait_lapic_expire(), e.g. guest_tsc_start would be more
appropriate.  My main motivation for not using verbose names is keep lines
short, i.e. avoid wrapping at 80 chars.  For me, wrapping a bunch of lines
makes the code more difficult to read than less verbose names.

> 
> > 
> > 	if (!lapic_in_kernel(vcpu))
> > 		return;
> > @@ -1499,13 +1500,13 @@ void wait_lapic_expire(struct kvm_vcpu *vcpu)
> > 
> > 	tsc_deadline = apic->lapic_timer.expired_tscdeadline;
> > 	apic->lapic_timer.expired_tscdeadline = 0;
> > -	guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc());
> > +	tmp_tsc = guest_tsc = kvm_read_l1_tsc(vcpu, rdtsc());
> > 	trace_kvm_wait_lapic_expire(vcpu->vcpu_id, guest_tsc - tsc_deadline);
> > 
> > -	/* __delay is delay_tsc whenever the hardware has TSC, thus always.  */
> > -	if (guest_tsc < tsc_deadline)
> > -		__delay(min(tsc_deadline - guest_tsc,
> > -			nsec_to_cycles(vcpu, lapic_timer_advance_ns)));
> > +	for (ns = 0; tmp_tsc < tsc_deadline && ns < timer_advance_ns; ns++) {
> > +		ndelay(1);
> > +		tmp_tsc = kvm_read_l1_tsc(vcpu, rdtsc());
> > +	}
> > 
> > 	if (!lapic_timer_advance_adjust_done) {
> > 		/* too early */
> > -- 
> > 2.21.0
> > 
>

next prev parent reply	other threads:[~2019-04-15 16:11 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-04-12 20:18 [PATCH 0/7] KVM: lapic: Fix a variety of timer adv issues Sean Christopherson
2019-04-12 20:18 ` [PATCH 1/7] KVM: lapic: Hard cap the auto-calculated timer advancement Sean Christopherson
2019-04-14 10:22   ` Liran Alon
2019-04-12 20:18 ` [PATCH 2/7] KVM: lapic: Delay 1ns at a time when waiting for timer to "expire" Sean Christopherson
2019-04-14 11:25   ` Liran Alon
2019-04-15 16:11     ` Sean Christopherson [this message]
2019-04-15 17:06       ` Liran Alon
2019-04-16 11:02   ` Paolo Bonzini
2019-04-16 11:04     ` Liran Alon
2019-04-16 11:09       ` Paolo Bonzini
2019-04-12 20:18 ` [PATCH 3/7] KVM: lapic: Track lapic timer advance per vCPU Sean Christopherson
2019-04-14 11:29   ` Liran Alon
2019-04-12 20:18 ` [PATCH 4/7] KVM: lapic: Allow user to override auto-tuning of timer advancement Sean Christopherson
2019-04-14 11:35   ` Liran Alon
2019-04-15 16:23     ` Sean Christopherson
2019-04-15 17:10       ` Liran Alon
2019-04-12 20:18 ` [PATCH 5/7] KVM: lapic: Busy wait for timer to expire when using hv_timer Sean Christopherson
2019-04-14 11:47   ` Liran Alon
2019-04-12 20:18 ` [PATCH 6/7] KVM: lapic: Clean up the code for handling of a pre-expired hv_timer Sean Christopherson
2019-04-14 12:15   ` Liran Alon
2019-04-15 16:32     ` Sean Christopherson
2019-04-15 17:25       ` Liran Alon
2019-04-16 16:39         ` Sean Christopherson
2019-04-16 16:48           ` Liran Alon
2019-04-16 17:27             ` Sean Christopherson
2019-04-16 17:27               ` Liran Alon
2019-04-16 11:14     ` Paolo Bonzini
2019-04-12 20:18 ` [PATCH 7/7] KVM: VMX: Skip delta_tsc shift-and-divide if the dividend is zero Sean Christopherson
2019-04-14 12:21   ` Liran Alon

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190415161101.GC24010@linux.intel.com \
    --to=sean.j.christopherson@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=liran.alon@oracle.com \
    --cc=pbonzini@redhat.com \
    --cc=rkrcmar@redhat.com \
    --cc=wanpengli@tencent.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.