* [PATCH RESEND] avoid hv timer fallback to sw timer if delay exceeds period
@ 2025-10-13 12:51 fuqiang wang
2025-10-13 23:29 ` Sean Christopherson
0 siblings, 1 reply; 5+ messages in thread
From: fuqiang wang @ 2025-10-13 12:51 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
Borislav Petkov, Dave Hansen, x86, H . Peter Anvin,
Maxim Levitsky, kvm, linux-kernel
Cc: fuqiang wang, yu chen, dongxu zhang
When the guest uses the APIC periodic timer, if the delay exceeds the
period, the delta will be negative. nsec_to_cycles() may then convert this
delta into an absolute value larger than guest_l1_tsc, resulting in a
negative tscdeadline. Since the hv timer supports a maximum bit width of
cpu_preemption_timer_multi + 32, this causes the hv timer setup to fail and
switch to the sw timer.
Moreover, due to the commit 98c25ead5eda ("KVM: VMX: Move preemption timer
<=> hrtimer dance to common x86"), if the guest is using the sw timer
before blocking, it will continue to use the sw timer after being woken up,
and will not switch back to the hv timer until the relevant APIC timer
register is reprogrammed. Since the periodic timer does not require
frequent APIC timer register programming, the guest may continue to use the
software timer for an extended period.
The reproduction steps and patch verification results at link [1].
[1]: https://github.com/cai-fuqiang/kernel_test/tree/master/period_timer_test
Fixes: 98c25ead5eda ("KVM: VMX: Move preemption timer <=> hrtimer dance to common x86")
Signed-off-by: fuqiang wang <fuqiang.wng@gmail.com>
---
arch/x86/kvm/lapic.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 5fc437341e03..afd349f4d933 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2036,6 +2036,9 @@ static void advance_periodic_target_expiration(struct kvm_lapic *apic)
u64 tscl = rdtsc();
ktime_t delta;
+ u64 delta_cycles_u;
+ u64 delta_cycles_s;
+
/*
* Synchronize both deadlines to the same time source or
* differences in the periods (caused by differences in the
@@ -2047,8 +2050,11 @@ static void advance_periodic_target_expiration(struct kvm_lapic *apic)
ktime_add_ns(apic->lapic_timer.target_expiration,
apic->lapic_timer.period);
delta = ktime_sub(apic->lapic_timer.target_expiration, now);
+ delta_cycles_u = nsec_to_cycles(apic->vcpu, abs(delta));
+ delta_cycles_s = delta > 0 ? delta_cycles_u : -delta_cycles_u;
+
apic->lapic_timer.tscdeadline = kvm_read_l1_tsc(apic->vcpu, tscl) +
- nsec_to_cycles(apic->vcpu, delta);
+ delta_cycles_s;
}
static void start_sw_period(struct kvm_lapic *apic)
--
2.47.0
^ permalink raw reply related [flat|nested] 5+ messages in thread* Re: [PATCH RESEND] avoid hv timer fallback to sw timer if delay exceeds period 2025-10-13 12:51 [PATCH RESEND] avoid hv timer fallback to sw timer if delay exceeds period fuqiang wang @ 2025-10-13 23:29 ` Sean Christopherson 2025-10-17 12:21 ` fuqiang wang 0 siblings, 1 reply; 5+ messages in thread From: Sean Christopherson @ 2025-10-13 23:29 UTC (permalink / raw) To: fuqiang wang Cc: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H . Peter Anvin, Maxim Levitsky, kvm, linux-kernel, yu chen, dongxu zhang On Wed, Oct 01, 2025, fuqiang wang wrote: > When the guest uses the APIC periodic timer, if the delay exceeds the > period, the delta will be negative. IIUC, by "delay" you mean the time it takes for KVM to get (back) to advance_periodic_target_expiration(). If that's correct, I think it would be clearer to word this as: When the guest uses the APIC periodic timer, if the next period has already expired, e.g. due to the period being smaller than the delay in processing the timer, the delta will be negative. > nsec_to_cycles() may then convert this > delta into an absolute value larger than guest_l1_tsc, resulting in a > negative tscdeadline. Since the hv timer supports a maximum bit width of > cpu_preemption_timer_multi + 32, this causes the hv timer setup to fail and > switch to the sw timer. > > Moreover, due to the commit 98c25ead5eda ("KVM: VMX: Move preemption timer > <=> hrtimer dance to common x86"), if the guest is using the sw timer > before blocking, it will continue to use the sw timer after being woken up, > and will not switch back to the hv timer until the relevant APIC timer > register is reprogrammed. Since the periodic timer does not require > frequent APIC timer register programming, the guest may continue to use the > software timer for an extended period. > > The reproduction steps and patch verification results at link [1]. > > [1]: https://github.com/cai-fuqiang/kernel_test/tree/master/period_timer_test > > Fixes: 98c25ead5eda ("KVM: VMX: Move preemption timer <=> hrtimer dance to common x86") I'm pretty sure this should be: Fixes: d8f2f498d9ed ("x86/kvm: fix LAPIC timer drift when guest uses periodic mode") because that's where the bug with tsdeadline (incorrectly) wrapping was introduced. The aforementioned commit exacerbated (and likely exposed?) the bug, but AFAICT that commit itself didn't introduce any bugs (related to this issue). > Signed-off-by: fuqiang wang <fuqiang.wng@gmail.com> > --- > arch/x86/kvm/lapic.c | 8 +++++++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c > index 5fc437341e03..afd349f4d933 100644 > --- a/arch/x86/kvm/lapic.c > +++ b/arch/x86/kvm/lapic.c > @@ -2036,6 +2036,9 @@ static void advance_periodic_target_expiration(struct kvm_lapic *apic) > u64 tscl = rdtsc(); > ktime_t delta; > > + u64 delta_cycles_u; > + u64 delta_cycles_s; > + > /* > * Synchronize both deadlines to the same time source or > * differences in the periods (caused by differences in the > @@ -2047,8 +2050,11 @@ static void advance_periodic_target_expiration(struct kvm_lapic *apic) > ktime_add_ns(apic->lapic_timer.target_expiration, > apic->lapic_timer.period); > delta = ktime_sub(apic->lapic_timer.target_expiration, now); > + delta_cycles_u = nsec_to_cycles(apic->vcpu, abs(delta)); > + delta_cycles_s = delta > 0 ? delta_cycles_u : -delta_cycles_u; > + > apic->lapic_timer.tscdeadline = kvm_read_l1_tsc(apic->vcpu, tscl) + > - nsec_to_cycles(apic->vcpu, delta); > + delta_cycles_s; This isn't quite correct either. E.g. if delta is negative while L1 TSC is tiny, then subtracting the delta will incorrectly result the deadline wrapping too. Very, very, theoretically, L1 TSC could even be '0', e.g. due to a weird offset for L1, so I don't think subtracting is ever safe. Heh, of course we're hosed in that case no matter what since KVM treats tscdeadline==0 as "not programmed". Anyways, can't we just skip adding negative value? Whether or not the TSC deadline has expired is mostly a boolean value; for the vast majority of code it doesn't matter exactly when the timer expired. The only code that cares is __kvm_wait_lapic_expire(), and the only downside to setting tscdeadline=L1.TSC is that adjust_lapic_timer_advance() won't adjust as aggressively as it probably should. Ha! That's essentially what update_target_expiration() already does: now = ktime_get(); remaining = ktime_sub(apic->lapic_timer.target_expiration, now); if (ktime_to_ns(remaining) < 0) remaining = 0; E.g. diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 0ae7f913d782..2fb03a8a9ae9 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -2131,18 +2131,26 @@ static void advance_periodic_target_expiration(struct kvm_lapic *apic) ktime_t delta; /* - * Synchronize both deadlines to the same time source or - * differences in the periods (caused by differences in the - * underlying clocks or numerical approximation errors) will - * cause the two to drift apart over time as the errors - * accumulate. + * Use kernel time as the time source for both deadlines so that they + * stay synchronized. Computing each deadline independently will cause + * the two deadlines to drift apart over time as differences in the + * periods accumulate, e.g. due to differences in the underlying clocks + * or numerical approximation errors. */ apic->lapic_timer.target_expiration = ktime_add_ns(apic->lapic_timer.target_expiration, apic->lapic_timer.period); delta = ktime_sub(apic->lapic_timer.target_expiration, now); - apic->lapic_timer.tscdeadline = kvm_read_l1_tsc(apic->vcpu, tscl) + - nsec_to_cycles(apic->vcpu, delta); + + /* + * Don't adjust the TSC deadline if the next period has already expired, + * e.g. due to software overhead resulting in delays larger than the + * period. Blindly adding a negative delta could cause the deadline to + * become excessively large due to the deadline being an unsigned value. + */ + apic->lapic_timer.tscdeadline = kvm_read_l1_tsc(apic->vcpu, tscl); + if (delta > 0) + apic->lapic_timer.tscdeadline += nsec_to_cycles(apic->vcpu, delta); } static void start_sw_period(struct kvm_lapic *apic) ^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH RESEND] avoid hv timer fallback to sw timer if delay exceeds period 2025-10-13 23:29 ` Sean Christopherson @ 2025-10-17 12:21 ` fuqiang wang 2025-10-17 15:59 ` Sean Christopherson 0 siblings, 1 reply; 5+ messages in thread From: fuqiang wang @ 2025-10-17 12:21 UTC (permalink / raw) To: Sean Christopherson Cc: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H . Peter Anvin, Maxim Levitsky, kvm, linux-kernel, yu chen, dongxu zhang On 10/14/25 7:29 AM, Sean Christopherson wrote: > On Wed, Oct 01, 2025, fuqiang wang wrote: >> When the guest uses the APIC periodic timer, if the delay exceeds the >> period, the delta will be negative. > > IIUC, by "delay" you mean the time it takes for KVM to get (back) to > advance_periodic_target_expiration(). If that's correct, I think it would be > clearer to word this as: > > When the guest uses the APIC periodic timer, if the next period has already > expired, e.g. due to the period being smaller than the delay in processing > the timer, the delta will be negative. > Indeed, this explanation is much clearer. I will adopt it in the next version patch. >> nsec_to_cycles() may then convert this >> delta into an absolute value larger than guest_l1_tsc, resulting in a >> negative tscdeadline. Since the hv timer supports a maximum bit width of >> cpu_preemption_timer_multi + 32, this causes the hv timer setup to fail and >> switch to the sw timer. >> >> Moreover, due to the commit 98c25ead5eda ("KVM: VMX: Move preemption timer >> <=> hrtimer dance to common x86"), if the guest is using the sw timer >> before blocking, it will continue to use the sw timer after being woken up, >> and will not switch back to the hv timer until the relevant APIC timer >> register is reprogrammed. Since the periodic timer does not require >> frequent APIC timer register programming, the guest may continue to use the >> software timer for an extended period. >> >> The reproduction steps and patch verification results at link [1]. >> >> [1]: https://github.com/cai-fuqiang/kernel_test/tree/master/period_timer_test >> >> Fixes: 98c25ead5eda ("KVM: VMX: Move preemption timer <=> hrtimer dance to common x86") Yes, it's better, I will fix it. > > I'm pretty sure this should be: > > Fixes: d8f2f498d9ed ("x86/kvm: fix LAPIC timer drift when guest uses periodic mode") > > because that's where the bug with tsdeadline (incorrectly) wrapping was introduced. > The aforementioned commit exacerbated (and likely exposed?) the bug, but AFAICT > that commit itself didn't introduce any bugs (related to this issue). > >> Signed-off-by: fuqiang wang <fuqiang.wng@gmail.com> >> --- >> arch/x86/kvm/lapic.c | 8 +++++++- >> 1 file changed, 7 insertions(+), 1 deletion(-) >> >> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c >> index 5fc437341e03..afd349f4d933 100644 >> --- a/arch/x86/kvm/lapic.c >> +++ b/arch/x86/kvm/lapic.c >> @@ -2036,6 +2036,9 @@ static void advance_periodic_target_expiration(struct kvm_lapic *apic) >> u64 tscl = rdtsc(); >> ktime_t delta; >> >> + u64 delta_cycles_u; >> + u64 delta_cycles_s; >> + >> /* >> * Synchronize both deadlines to the same time source or >> * differences in the periods (caused by differences in the >> @@ -2047,8 +2050,11 @@ static void advance_periodic_target_expiration(struct kvm_lapic *apic) >> ktime_add_ns(apic->lapic_timer.target_expiration, >> apic->lapic_timer.period); >> delta = ktime_sub(apic->lapic_timer.target_expiration, now); >> + delta_cycles_u = nsec_to_cycles(apic->vcpu, abs(delta)); >> + delta_cycles_s = delta > 0 ? delta_cycles_u : -delta_cycles_u; >> + >> apic->lapic_timer.tscdeadline = kvm_read_l1_tsc(apic->vcpu, tscl) + >> - nsec_to_cycles(apic->vcpu, delta); >> + delta_cycles_s; > > This isn't quite correct either. E.g. if delta is negative while L1 TSC is tiny, > then subtracting the delta will incorrectly result the deadline wrapping too. > Very, very, theoretically, L1 TSC could even be '0', e.g. due to a weird offset > for L1, so I don't think subtracting is ever safe. Heh, of course we're hosed > in that case no matter what since KVM treats tscdeadline==0 as "not programmed". > > Anyways, can't we just skip adding negative value? Whether or not the TSC deadline > has expired is mostly a boolean value; for the vast majority of code it doesn't > matter exactly when the timer expired. I don’t think this patch will cause tscdeadline to wrap around to zero, because the system is unlikely to start a timer when the guest tsc is zero and have it expire immediately. However, keeping the logic to skip adding negative values is a good idea, seems like there’s no downside. > > The only code that cares is __kvm_wait_lapic_expire(), and the only downside to > setting tscdeadline=L1.TSC is that adjust_lapic_timer_advance() won't adjust as > aggressively as it probably should. I am not sure which type of timers should use the "advanced tscdeadline hrtimer expiration feature". I list the history of this feature. 1. Marcelo first introduce this feature, only support the tscdeadline sw timer. 2. Yunhong introduce vmx preemption timer(hv), only support for tscdeadline. 3. Liwanpeng extend the hv timer to oneshot and period timers. 4. Liwanpeng extend this feature to hv timer. 5. Sean and liwanpeng fix some BUG extend this feature to hv period/oneshot timer. [1] d0659d946be0("KVM: x86: add option to advance tscdeadline hrtimer expiration") Marcelo Tosatti Dec 16 2014 [2] ce7a058a2117("KVM: x86: support using the vmx preemption timer for tsc deadline timer") Yunhong Jiang Jun 13 2016 [3] 8003c9ae204e("KVM: LAPIC: add APIC Timer periodic/oneshot mode VMX preemption timer support") liwanpeng Oct 24 2016 [4] c5ce8235cffa("KVM: VMX: Optimize tscdeadline timer latency") liwanpeng May 29 2018 [5] ee66e453db13("KVM: lapic: Busy wait for timer to expire when using hv_timer") Sean Christopherson Apr 16 2019 d981dd15498b("KVM: LAPIC: Accurately guarantee busy wait for timer to expire when using hv_timer") liwanpeng Apr 28 2021 Now, timers supported for this feature includes: - sw: tscdeadline - hv: all (tscdeadline, oneshot, period) ==== IMHO ==== 1. for period timer =================== I think for periodic timers emulation, the expiration time is already adjusted to compensate for the delays introduced by timer emulation, so don't need this feature to adjust again. But after use the feature, the first timer expiration may be relatively accurate. E.g., At time 0, start a periodic task (period: 10,000 ns) with a simulated delay of 100 ns. With this feature enabled and reasonably accurate prediction, the expiration time set seen by the guest are: 10000, 20000, 30000... With this feature not enabled, the expiration times set: 10100, 20100, 30100... But IMHO, for periodic timers, accuracy of the period seems to be the main concern, because it does not frequently start and stop. The incorrect period caused by the first timer expiration can be ignored. 2. for oneshot timer ==================== In [1], Marcelo treated oneshot and tscdeadline differently. Shouldn’t the behavior of these two timers be similar? Unlike periodic timers, both oneshot and tscdeadline timers set a specific expiration time, and what the guest cares about is whether the expiration time is accurate. Moreover, this feature is mainly intended to mitigate the latency introduced by timer virtualization. Since software timers have higher latency compared to hardware virtual timers, the need for this feature is actually more urgent for software timers. However, in the current implementation, the feature is applied to hv oneshot/period timers, but not to sw oneshot/period timers. =============== Summary of IMHO =============== The feature should be applied to the following timer types: sw/hv tscdeadline and sw/hv oneshot should not be applied to: sw/hv period However, so far I haven’t found any discussion on the mailing lists regarding the commits summarized above. Please let me know if I’ve overlooked something. > > Ha! That's essentially what update_target_expiration() already does: > > now = ktime_get(); > remaining = ktime_sub(apic->lapic_timer.target_expiration, now); > if (ktime_to_ns(remaining) < 0) > remaining = 0; > > E.g. > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c > index 0ae7f913d782..2fb03a8a9ae9 100644 > --- a/arch/x86/kvm/lapic.c > +++ b/arch/x86/kvm/lapic.c > @@ -2131,18 +2131,26 @@ static void advance_periodic_target_expiration(struct kvm_lapic *apic) > ktime_t delta; > > /* > - * Synchronize both deadlines to the same time source or > - * differences in the periods (caused by differences in the > - * underlying clocks or numerical approximation errors) will > - * cause the two to drift apart over time as the errors > - * accumulate. > + * Use kernel time as the time source for both deadlines so that they > + * stay synchronized. Computing each deadline independently will cause > + * the two deadlines to drift apart over time as differences in the > + * periods accumulate, e.g. due to differences in the underlying clocks > + * or numerical approximation errors. > */ > apic->lapic_timer.target_expiration = > ktime_add_ns(apic->lapic_timer.target_expiration, > apic->lapic_timer.period); > delta = ktime_sub(apic->lapic_timer.target_expiration, now); > - apic->lapic_timer.tscdeadline = kvm_read_l1_tsc(apic->vcpu, tscl) + > - nsec_to_cycles(apic->vcpu, delta); > + > + /* > + * Don't adjust the TSC deadline if the next period has already expired, > + * e.g. due to software overhead resulting in delays larger than the > + * period. Blindly adding a negative delta could cause the deadline to > + * become excessively large due to the deadline being an unsigned value. > + */ > + apic->lapic_timer.tscdeadline = kvm_read_l1_tsc(apic->vcpu, tscl); > + if (delta > 0) > + apic->lapic_timer.tscdeadline += nsec_to_cycles(apic->vcpu, delta); > } > > static void start_sw_period(struct kvm_lapic *apic) Thank you for your patient explanations and for helping me make the revisions. I will update this in the next patch version. Regards, fuqiang ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH RESEND] avoid hv timer fallback to sw timer if delay exceeds period 2025-10-17 12:21 ` fuqiang wang @ 2025-10-17 15:59 ` Sean Christopherson 2025-10-21 15:37 ` fuqiang wang 0 siblings, 1 reply; 5+ messages in thread From: Sean Christopherson @ 2025-10-17 15:59 UTC (permalink / raw) To: fuqiang wang Cc: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H . Peter Anvin, Maxim Levitsky, kvm, linux-kernel, yu chen, dongxu zhang On Fri, Oct 17, 2025, fuqiang wang wrote: > On 10/14/25 7:29 AM, Sean Christopherson wrote: > > On Wed, Oct 01, 2025, fuqiang wang wrote: > > The only code that cares is __kvm_wait_lapic_expire(), and the only downside to > > setting tscdeadline=L1.TSC is that adjust_lapic_timer_advance() won't adjust as > > aggressively as it probably should. > > I am not sure which type of timers should use the "advanced tscdeadline hrtimer > expiration feature". > > I list the history of this feature. > > 1. Marcelo first introduce this feature, only support the tscdeadline sw timer. > 2. Yunhong introduce vmx preemption timer(hv), only support for tscdeadline. > 3. Liwanpeng extend the hv timer to oneshot and period timers. > 4. Liwanpeng extend this feature to hv timer. > 5. Sean and liwanpeng fix some BUG extend this feature to hv period/oneshot timer. > > [1] d0659d946be0("KVM: x86: add option to advance tscdeadline hrtimer expiration") > Marcelo Tosatti Dec 16 2014 > [2] ce7a058a2117("KVM: x86: support using the vmx preemption timer for tsc deadline timer") > Yunhong Jiang Jun 13 2016 > [3] 8003c9ae204e("KVM: LAPIC: add APIC Timer periodic/oneshot mode VMX preemption timer support") > liwanpeng Oct 24 2016 > [4] c5ce8235cffa("KVM: VMX: Optimize tscdeadline timer latency") > liwanpeng May 29 2018 > [5] ee66e453db13("KVM: lapic: Busy wait for timer to expire when using hv_timer") > Sean Christopherson Apr 16 2019 > > d981dd15498b("KVM: LAPIC: Accurately guarantee busy wait for timer to expire when using hv_timer") > liwanpeng Apr 28 2021 > > Now, timers supported for this feature includes: > - sw: tscdeadline > - hv: all (tscdeadline, oneshot, period) > > ==== > IMHO > ==== > > 1. for period timer > =================== > > I think for periodic timers emulation, the expiration time is already adjusted > to compensate for the delays introduced by timer emulation, so don't need this > feature to adjust again. But after use the feature, the first timer expiration > may be relatively accurate. > > E.g., At time 0, start a periodic task (period: 10,000 ns) with a simulated > delay of 100 ns. > > With this feature enabled and reasonably accurate prediction, the expiration > time set seen by the guest are: 10000, 20000, 30000... > > With this feature not enabled, the expiration times set: 10100, 20100, 30100... > > But IMHO, for periodic timers, accuracy of the period seems to be the main > concern, because it does not frequently start and stop. The incorrect period > caused by the first timer expiration can be ignored. I agree it's superfluous, but applying the advancement also does no harm, and avoiding it would be moreeffort than simply letting KVM predict the first expiration. > 2. for oneshot timer > ==================== > > In [1], Marcelo treated oneshot and tscdeadline differently. Shouldn’t the > behavior of these two timers be similar? Yes, but they aren't identical, and so supporting both would require additional code, complexity, and testing. > Unlike periodic timers, both oneshot and tscdeadline timers set a specific > expiration time, and what the guest cares about is whether the expiration > time is accurate. Moreover, this feature is mainly intended to mitigate the > latency introduced by timer virtualization. Since software timers have > higher latency compared to hardware virtual timers, the need for this feature > is actually more urgent for software timers. Yep. > However, in the current implementation, the feature is applied to hv > oneshot/period timers, but not to sw oneshot/period timers. > > =============== > Summary of IMHO > =============== > > The feature should be applied to the following timer types: > sw/hv tscdeadline and sw/hv oneshot In a perfect world, probably? But I don't know that it's worth changing at this time. Much of this is balancing complexity with benefit, though it's also most definitely a reflection of the initial implementation. KVM unconditionally emulates TSC-deadline mode, and AFAIK every real-world kernel prefers TSC-deadline over one-shot, and so in practice the benefits of applying the advancement to one-shot hrtimers. That was also the way the world was headed back when Marcelo first implemented the support. I don't know for sure why the initial implementation targeted only TSC-deadline mode, but I think it's safe to assume that the use case Marcelo was targeting exclusively used TSC-deadline. I'm not entirely opposed to playing the advancement games with one-shot hrtimers, but it's also not clear to me that it's worth doing. E.g. supporting one-shot hrtimers would likely require a bit of extra complexity to juggle the different time domains. And if the only use cases that are truly sensitive to timer programming latency exclusively use TSC-deadline mode (because one-shot mode is inherently "fuzzy"), then any amount of extra complexity is effectively dead weight. > should not be applied to: > sw/hv period I wouldn't say "should not be applied to", I think it's more "doesn't provide much benefit to". ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH RESEND] avoid hv timer fallback to sw timer if delay exceeds period 2025-10-17 15:59 ` Sean Christopherson @ 2025-10-21 15:37 ` fuqiang wang 0 siblings, 0 replies; 5+ messages in thread From: fuqiang wang @ 2025-10-21 15:37 UTC (permalink / raw) To: Sean Christopherson Cc: Paolo Bonzini, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86, H . Peter Anvin, Maxim Levitsky, kvm, linux-kernel, yu chen, dongxu zhang On 10/17/25 11:59 PM, Sean Christopherson wrote: >> ==== >> IMHO >> ==== >> >> 1. for period timer >> =================== >> >> I think for periodic timers emulation, the expiration time is already adjusted >> to compensate for the delays introduced by timer emulation, so don't need this >> feature to adjust again. But after use the feature, the first timer expiration >> may be relatively accurate. >> >> E.g., At time 0, start a periodic task (period: 10,000 ns) with a simulated >> delay of 100 ns. >> >> With this feature enabled and reasonably accurate prediction, the expiration >> time set seen by the guest are: 10000, 20000, 30000... >> >> With this feature not enabled, the expiration times set: 10100, 20100, 30100... >> >> But IMHO, for periodic timers, accuracy of the period seems to be the main >> concern, because it does not frequently start and stop. The incorrect period >> caused by the first timer expiration can be ignored. > > I agree it's superfluous, but applying the advancement also does no harm, and > avoiding it would be moreeffort than simply letting KVM predict the first expiration. > Yes, that’s indeed the case. > KVM unconditionally emulates TSC-deadline mode, and AFAIK every real-world kernel > prefers TSC-deadline over one-shot, and so in practice the benefits of applying > the advancement to one-shot hrtimers. That was also the way the world was headed > back when Marcelo first implemented the support. I don't know for sure why the > initial implementation targeted only TSC-deadline mode, but I think it's safe to > assume that the use case Marcelo was targeting exclusively used TSC-deadline. Yes, it appears that focusing on TSC-deadline emulation fits the current use cases. > > I'm not entirely opposed to playing the advancement games with one-shot hrtimers, > but it's also not clear to me that it's worth doing. E.g. supporting one-shot > hrtimers would likely require a bit of extra complexity to juggle the different > time domains. And if the only use cases that are truly sensitive to timer > programming latency exclusively use TSC-deadline mode (because one-shot mode is > inherently "fuzzy"), then any amount of extra complexity is effectively dead weight. > >> should not be applied to: >> sw/hv period > > I wouldn't say "should not be applied to", I think it's more "doesn't provide much > benefit to". Thanks again for your clear explanation and insights. This really helped me understand the design choices better. :) Regards, fuqiang ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2025-10-21 15:38 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-10-13 12:51 [PATCH RESEND] avoid hv timer fallback to sw timer if delay exceeds period fuqiang wang 2025-10-13 23:29 ` Sean Christopherson 2025-10-17 12:21 ` fuqiang wang 2025-10-17 15:59 ` Sean Christopherson 2025-10-21 15:37 ` fuqiang wang
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox