public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: fuqiang wang <fuqiang.wng@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	 Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	 Dave Hansen <dave.hansen@linux.intel.com>,
	x86@kernel.org,  "H . Peter Anvin" <hpa@zytor.com>,
	Maxim Levitsky <mlevitsk@redhat.com>,
	kvm@vger.kernel.org,  linux-kernel@vger.kernel.org,
	yu chen <chen.yu@easystack.com>,
	 dongxu zhang <dongxu.zhang@easystack.com>
Subject: Re: [PATCH RESEND] avoid hv timer fallback to sw timer if delay exceeds period
Date: Mon, 13 Oct 2025 16:29:27 -0700	[thread overview]
Message-ID: <aO2LV-ipewL59LC6@google.com> (raw)
In-Reply-To: <20251013125117.87739-1-fuqiang.wng@gmail.com>

On Wed, Oct 01, 2025, fuqiang wang wrote:
> When the guest uses the APIC periodic timer, if the delay exceeds the
> period, the delta will be negative. 

IIUC, by "delay" you mean the time it takes for KVM to get (back) to
advance_periodic_target_expiration().  If that's correct, I think it would be
clearer to word this as:

  When the guest uses the APIC periodic timer, if the next period has already
  expired, e.g. due to the period being smaller than the delay in processing
  the timer, the delta will be negative.

> nsec_to_cycles() may then convert this
> delta into an absolute value larger than guest_l1_tsc, resulting in a
> negative tscdeadline. Since the hv timer supports a maximum bit width of
> cpu_preemption_timer_multi + 32, this causes the hv timer setup to fail and
> switch to the sw timer.
> 
> Moreover, due to the commit 98c25ead5eda ("KVM: VMX: Move preemption timer
> <=> hrtimer dance to common x86"), if the guest is using the sw timer
> before blocking, it will continue to use the sw timer after being woken up,
> and will not switch back to the hv timer until the relevant APIC timer
> register is reprogrammed.  Since the periodic timer does not require
> frequent APIC timer register programming, the guest may continue to use the
> software timer for an extended period.
> 
> The reproduction steps and patch verification results at link [1].
> 
> [1]: https://github.com/cai-fuqiang/kernel_test/tree/master/period_timer_test
> 
> Fixes: 98c25ead5eda ("KVM: VMX: Move preemption timer <=> hrtimer dance to common x86")

I'm pretty sure this should be:

  Fixes: d8f2f498d9ed ("x86/kvm: fix LAPIC timer drift when guest uses periodic mode")

because that's where the bug with tsdeadline (incorrectly) wrapping was introduced.
The aforementioned commit exacerbated (and likely exposed?) the bug, but AFAICT
that commit itself didn't introduce any bugs (related to this issue).

> Signed-off-by: fuqiang wang <fuqiang.wng@gmail.com>
> ---
>  arch/x86/kvm/lapic.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 5fc437341e03..afd349f4d933 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -2036,6 +2036,9 @@ static void advance_periodic_target_expiration(struct kvm_lapic *apic)
>  	u64 tscl = rdtsc();
>  	ktime_t delta;
>  
> +	u64 delta_cycles_u;
> +	u64 delta_cycles_s;
> +
>  	/*
>  	 * Synchronize both deadlines to the same time source or
>  	 * differences in the periods (caused by differences in the
> @@ -2047,8 +2050,11 @@ static void advance_periodic_target_expiration(struct kvm_lapic *apic)
>  		ktime_add_ns(apic->lapic_timer.target_expiration,
>  				apic->lapic_timer.period);
>  	delta = ktime_sub(apic->lapic_timer.target_expiration, now);
> +	delta_cycles_u = nsec_to_cycles(apic->vcpu, abs(delta));
> +	delta_cycles_s = delta > 0 ? delta_cycles_u : -delta_cycles_u;
> +
>  	apic->lapic_timer.tscdeadline = kvm_read_l1_tsc(apic->vcpu, tscl) +
> -		nsec_to_cycles(apic->vcpu, delta);
> +		delta_cycles_s;

This isn't quite correct either.  E.g. if delta is negative while L1 TSC is tiny,
then subtracting the delta will incorrectly result the deadline wrapping too.
Very, very, theoretically, L1 TSC could even be '0', e.g. due to a weird offset
for L1, so I don't think subtracting is ever safe.  Heh, of course we're hosed
in that case no matter what since KVM treats tscdeadline==0 as "not programmed".

Anyways, can't we just skip adding negative value?  Whether or not the TSC deadline
has expired is mostly a boolean value; for the vast majority of code it doesn't
matter exactly when the timer expired.

The only code that cares is __kvm_wait_lapic_expire(), and the only downside to
setting tscdeadline=L1.TSC is that adjust_lapic_timer_advance() won't adjust as
aggressively as it probably should.

Ha!  That's essentially what update_target_expiration() already does:

	now = ktime_get();
	remaining = ktime_sub(apic->lapic_timer.target_expiration, now);
	if (ktime_to_ns(remaining) < 0)
		remaining = 0;

E.g.

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 0ae7f913d782..2fb03a8a9ae9 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2131,18 +2131,26 @@ static void advance_periodic_target_expiration(struct kvm_lapic *apic)
        ktime_t delta;
 
        /*
-        * Synchronize both deadlines to the same time source or
-        * differences in the periods (caused by differences in the
-        * underlying clocks or numerical approximation errors) will
-        * cause the two to drift apart over time as the errors
-        * accumulate.
+        * Use kernel time as the time source for both deadlines so that they
+        * stay synchronized.  Computing each deadline independently will cause
+        * the two deadlines to drift apart over time as differences in the
+        * periods accumulate, e.g. due to differences in the underlying clocks
+        * or numerical approximation errors.
         */
        apic->lapic_timer.target_expiration =
                ktime_add_ns(apic->lapic_timer.target_expiration,
                                apic->lapic_timer.period);
        delta = ktime_sub(apic->lapic_timer.target_expiration, now);
-       apic->lapic_timer.tscdeadline = kvm_read_l1_tsc(apic->vcpu, tscl) +
-               nsec_to_cycles(apic->vcpu, delta);
+
+       /*
+        * Don't adjust the TSC deadline if the next period has already expired,
+        * e.g. due to software overhead resulting in delays larger than the
+        * period.  Blindly adding a negative delta could cause the deadline to
+        * become excessively large due to the deadline being an unsigned value.
+        */
+       apic->lapic_timer.tscdeadline = kvm_read_l1_tsc(apic->vcpu, tscl);
+       if (delta > 0)
+               apic->lapic_timer.tscdeadline += nsec_to_cycles(apic->vcpu, delta);
 }
 
 static void start_sw_period(struct kvm_lapic *apic)

  reply	other threads:[~2025-10-13 23:29 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-13 12:51 [PATCH RESEND] avoid hv timer fallback to sw timer if delay exceeds period fuqiang wang
2025-10-13 23:29 ` Sean Christopherson [this message]
2025-10-17 12:21   ` fuqiang wang
2025-10-17 15:59     ` Sean Christopherson
2025-10-21 15:37       ` fuqiang wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aO2LV-ipewL59LC6@google.com \
    --to=seanjc@google.com \
    --cc=bp@alien8.de \
    --cc=chen.yu@easystack.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=dongxu.zhang@easystack.com \
    --cc=fuqiang.wng@gmail.com \
    --cc=hpa@zytor.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=mlevitsk@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox