* Re: [PATCH v2] KVM: x86: Rate-limit global clock updates on vCPU load
  [not found] ` <adf8Q1VSeAMMyCa_@google.com>
@ 2026-05-06  9:48 ` Thorsten Leemhuis
  2026-05-06 12:55 ` Sean Christopherson
  0 siblings, 1 reply; 9+ messages in thread
From: Thorsten Leemhuis @ 2026-05-06  9:48 UTC
To: Sean Christopherson, Lei Chen
Cc: igor, jan.cipa, jaroslav.pulchart, kvm, linux-kernel, pbonzini, Linux kernel regressions list

On 4/9/26 21:21, Sean Christopherson wrote:
> On Thu, Apr 09, 2026, Lei Chen wrote:
>> commit 446fcce2a52b ("Revert "x86: kvm: rate-limit global clock updates"")
>> dropped the rate limiting for KVM_REQ_GLOBAL_CLOCK_UPDATE.
>>
>> As a result, kvm_arch_vcpu_load() can queue global clock update requests
>> every time a vCPU is scheduled when the master clock is disabled or when
>> the vCPU is loaded for the first time.
>>
>> Restore the throttling with a per-VM ratelimit state and gate
>> KVM_REQ_GLOBAL_CLOCK_UPDATE through __ratelimit(), so frequent vCPU
>> scheduling does not generate a steady stream of redundant clock update
>> requests.
>>
>> Fixes: 446fcce2a52b ("Revert "x86: kvm: rate-limit global clock updates"")
>> Signed-off-by: Lei Chen <lei.chen@smartx.com>
>> Reported-by: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com>
>> Closes: https://lore.kernel.org/all/CAK8fFZ5gY8_Mw2A=iZVFNVKQNrXQzVsn-HTd+Me9K6ZfmdgA+Q@mail.gmail.com/

Was this performance regression ever addressed? Looks like this fell
through the cracks, but it's easy to miss something.

Ciao, Thorsten

>> ---
>> CHANGELOG:
>> v2:
>> - remove comment of kvmclock_update_rs
>> - make sure kvm_arch_vcpu_load make KVM_REQ_CLOCK_UPDATE for this vcpu
>> - add RATELIMIT_MSG_ON_RELEASE to kvmclock_update_rs
>>
>> v1:
>> - initial version(https://lore.kernel.org/all/20260407070046.2336-1-lei.chen@smartx.com/)
>> ---
>>  arch/x86/include/asm/kvm_host.h |  1 +
>>  arch/x86/kvm/x86.c              | 11 +++++++++--
>>  2 files changed, 10 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index 5a3bfa293e8b..5e750c49d21e 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -1453,6 +1453,7 @@ struct kvm_arch {
>>  	bool use_master_clock;
>>  	u64 master_kernel_ns;
>>  	u64 master_cycle_now;
>> +	struct ratelimit_state kvmclock_update_rs;
>>
>>  #ifdef CONFIG_KVM_HYPERV
>>  	struct kvm_hv hyperv;
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 63afdb6bb078..a534e8391611 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -5210,8 +5210,13 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>>  		 * On a host with synchronized TSC, there is no need to update
>>  		 * kvmclock on vcpu->cpu migration
>>  		 */
>> -		if (!vcpu->kvm->arch.use_master_clock || vcpu->cpu == -1)
>> -			kvm_make_request(KVM_REQ_GLOBAL_CLOCK_UPDATE, vcpu);
>> +		if (!vcpu->kvm->arch.use_master_clock || vcpu->cpu == -1) {
>> +			if (__ratelimit(&vcpu->kvm->arch.kvmclock_update_rs))
>> +				kvm_make_request(KVM_REQ_GLOBAL_CLOCK_UPDATE, vcpu);
>> +			else
>> +				kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
>
> What I was trying to call out in my review of v1, is that prior to commit
> 446fcce2a52b, the effective ratelimiting applied to *all* instances of
> KVM_REQ_GLOBAL_CLOCK_UPDATE.  Which meant that KVM's existing behavior is that
> kvm_write_system_time() would be subject to the ratelimiting as well.
>
> That said, I don't see any obvious problems with immediately honoring writes to
> MSR_KVM_SYSTEM_TIME{,_NEW}, and it's probably a much better experience for the
> guest.  So I'm a-ok with this approach, but we should call out that skipping the
> synthetic MSR case is deliberate.  No need for a v3, I'll add a blurb when
> applying.
>
>> +		}
>> +
>>  		if (vcpu->cpu != cpu)
>>  			kvm_make_request(KVM_REQ_MIGRATE_TIMER, vcpu);
>>  		vcpu->cpu = cpu;
>> @@ -13189,6 +13194,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>>  	raw_spin_lock_init(&kvm->arch.tsc_write_lock);
>>  	mutex_init(&kvm->arch.apic_map_lock);
>>  	seqcount_raw_spinlock_init(&kvm->arch.pvclock_sc, &kvm->arch.tsc_write_lock);
>> +	ratelimit_state_init(&kvm->arch.kvmclock_update_rs, HZ, 10);
>> +	ratelimit_set_flags(&kvm->arch.kvmclock_update_rs, RATELIMIT_MSG_ON_RELEASE);
>
> IIUC, so long as KVM doesn't explicitly invoke ratelimit_state_exit(), setting
> RATELIMIT_MSG_ON_RELEASE means we won't get dmesg spam?  To be clear, I'm 100%
> in favor of suppressing dmesg output.
>
>>  	kvm->arch.kvmclock_offset = -get_kvmclock_base_ns();
>>
>>  	raw_spin_lock_irqsave(&kvm->arch.tsc_write_lock, flags);
>> --
>> 2.51.0
>>
>
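
For readers following the patch, the gating in kvm_arch_vcpu_load() can be
modelled in userspace roughly as below. This is only a sketch under stated
assumptions, not the kernel implementation: struct rl_model, rl_model_allow()
and pick_request() are hypothetical names, time is a plain tick counter, and
the window logic only approximates what __ratelimit() does with the (HZ, 10)
arguments from the patch, i.e. at most 10 global requests per one-second
window, with a per-vCPU clock update as the fallback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Userspace model only (hypothetical type). The kernel's
 * struct ratelimit_state also carries locking, flags and a missed counter.
 */
struct rl_model {
	uint64_t interval;	/* window length in ticks (the patch passes HZ) */
	unsigned int burst;	/* allowed calls per window (the patch passes 10) */
	uint64_t begin;		/* tick at which the current window started */
	unsigned int allowed;	/* calls already allowed in the current window */
};

/* Stand-ins for KVM_REQ_GLOBAL_CLOCK_UPDATE and KVM_REQ_CLOCK_UPDATE. */
enum clock_req { REQ_GLOBAL_CLOCK_UPDATE, REQ_CLOCK_UPDATE };

/* Return true while the current window still has budget left. */
static bool rl_model_allow(struct rl_model *rs, uint64_t now)
{
	assert(rs != NULL);

	/* Start a fresh window once the previous one has expired. */
	if (now - rs->begin >= rs->interval) {
		rs->begin = now;
		rs->allowed = 0;
	}

	if (rs->allowed < rs->burst) {
		rs->allowed++;
		return true;
	}
	return false;
}

/*
 * Mirror of the patched branch: a global update while under the limit,
 * otherwise only a clock update for the vCPU being loaded.
 */
static enum clock_req pick_request(struct rl_model *rs, uint64_t now)
{
	return rl_model_allow(rs, now) ? REQ_GLOBAL_CLOCK_UPDATE
				       : REQ_CLOCK_UPDATE;
}
```

In this model a burst of vCPU loads inside one window yields a bounded number
of global updates, while every load still refreshes the loading vCPU's own
clock, which is the v2 behavior described in the changelog.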
* Re: [PATCH v2] KVM: x86: Rate-limit global clock updates on vCPU load
  2026-05-06  9:48 ` [PATCH v2] KVM: x86: Rate-limit global clock updates on vCPU load Thorsten Leemhuis
@ 2026-05-06 12:55 ` Sean Christopherson
  2026-05-06 14:09 ` Thorsten Leemhuis
  0 siblings, 1 reply; 9+ messages in thread
From: Sean Christopherson @ 2026-05-06 12:55 UTC
To: Thorsten Leemhuis
Cc: Lei Chen, igor, jan.cipa, jaroslav.pulchart, kvm, linux-kernel, pbonzini, Linux kernel regressions list

On Wed, May 06, 2026, Thorsten Leemhuis wrote:
> On 4/9/26 21:21, Sean Christopherson wrote:
> > On Thu, Apr 09, 2026, Lei Chen wrote:
> >> commit 446fcce2a52b ("Revert "x86: kvm: rate-limit global clock updates"")
> >> dropped the rate limiting for KVM_REQ_GLOBAL_CLOCK_UPDATE.
> >>
> >> As a result, kvm_arch_vcpu_load() can queue global clock update requests
> >> every time a vCPU is scheduled when the master clock is disabled or when
> >> the vCPU is loaded for the first time.
> >>
> >> Restore the throttling with a per-VM ratelimit state and gate
> >> KVM_REQ_GLOBAL_CLOCK_UPDATE through __ratelimit(), so frequent vCPU
> >> scheduling does not generate a steady stream of redundant clock update
> >> requests.
> >>
> >> Fixes: 446fcce2a52b ("Revert "x86: kvm: rate-limit global clock updates"")
> >> Signed-off-by: Lei Chen <lei.chen@smartx.com>
> >> Reported-by: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com>
> >> Closes: https://lore.kernel.org/all/CAK8fFZ5gY8_Mw2A=iZVFNVKQNrXQzVsn-HTd+Me9K6ZfmdgA+Q@mail.gmail.com/
>
> Was this performance regression ever addressed?

Nope, not yet.

> Looks like this fell through the cracks, but it's easy to miss something.

It's in my list of patches to apply (probably for 7.2?).  I didn't want to squeeze
it into the initial 7.1 pull request for a variety of reasons.
* Re: [PATCH v2] KVM: x86: Rate-limit global clock updates on vCPU load
  2026-05-06 12:55 ` Sean Christopherson
@ 2026-05-06 14:09 ` Thorsten Leemhuis
  2026-05-06 15:22 ` Sean Christopherson
  0 siblings, 1 reply; 9+ messages in thread
From: Thorsten Leemhuis @ 2026-05-06 14:09 UTC
To: Sean Christopherson
Cc: Lei Chen, igor, jan.cipa, jaroslav.pulchart, kvm, linux-kernel, pbonzini, Linux kernel regressions list, Linus Torvalds

On 5/6/26 14:55, Sean Christopherson wrote:
> On Wed, May 06, 2026, Thorsten Leemhuis wrote:
>> On 4/9/26 21:21, Sean Christopherson wrote:
>>> On Thu, Apr 09, 2026, Lei Chen wrote:
>>>> commit 446fcce2a52b ("Revert "x86: kvm: rate-limit global clock updates"")
>>>> dropped the rate limiting for KVM_REQ_GLOBAL_CLOCK_UPDATE.
>>>>
>>>> As a result, kvm_arch_vcpu_load() can queue global clock update requests
>>>> every time a vCPU is scheduled when the master clock is disabled or when
>>>> the vCPU is loaded for the first time.
>>>>
>>>> Restore the throttling with a per-VM ratelimit state and gate
>>>> KVM_REQ_GLOBAL_CLOCK_UPDATE through __ratelimit(), so frequent vCPU
>>>> scheduling does not generate a steady stream of redundant clock update
>>>> requests.
>>>>
>>>> Fixes: 446fcce2a52b ("Revert "x86: kvm: rate-limit global clock updates"")
>>>> Signed-off-by: Lei Chen <lei.chen@smartx.com>
>>>> Reported-by: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com>
>>>> Closes: https://lore.kernel.org/all/CAK8fFZ5gY8_Mw2A=iZVFNVKQNrXQzVsn-HTd+Me9K6ZfmdgA+Q@mail.gmail.com/
>>
>> Was this performance regression ever addressed?
> Nope, not yet.
>
>> Looks like this fell through the cracks, but it's easy to miss something.
>
> It's in my list of patches to apply (probably for 7.2?).  I didn't want to squeeze
> it into the initial 7.1 pull request for a variety of reasons.

Hmmm. CCing Linus so he can speak up if he wants to about the following:

Given that this is a fix for a performance regression[1] I'd say it's
not as urgent as a "something stopped working" case -- so I guess it's
something where the "[fix] "within a week", preferably before the next
rc" approach Linus recently mentioned does not need to be applied strictly.

But Jaroslav OTOH reported it more than 7 weeks ago already and back
then called it something that "severely impacts KVM hosts running many
Firecracker microVMs."[1]; and a potential fix exists for 4 weeks
already. Due to that, 7.2 feels a bit too far away for me, as that is
still ~15 weeks away. But maybe that's just me.

Ciao, Thorsten

[1] https://lore.kernel.org/all/CAK8fFZ5gY8_Mw2A=iZVFNVKQNrXQzVsn-HTd+Me9K6ZfmdgA+Q@mail.gmail.com/
* Re: [PATCH v2] KVM: x86: Rate-limit global clock updates on vCPU load
  2026-05-06 14:09 ` Thorsten Leemhuis
@ 2026-05-06 15:22 ` Sean Christopherson
  2026-05-06 15:58 ` Jaroslav Pulchart
  0 siblings, 1 reply; 9+ messages in thread
From: Sean Christopherson @ 2026-05-06 15:22 UTC
To: Thorsten Leemhuis
Cc: Lei Chen, igor, jan.cipa, jaroslav.pulchart, kvm, linux-kernel, pbonzini, Linux kernel regressions list, Linus Torvalds

On Wed, May 06, 2026, Thorsten Leemhuis wrote:
> On 5/6/26 14:55, Sean Christopherson wrote:
> > On Wed, May 06, 2026, Thorsten Leemhuis wrote:
> >> On 4/9/26 21:21, Sean Christopherson wrote:
> >>> On Thu, Apr 09, 2026, Lei Chen wrote:
> >>>> commit 446fcce2a52b ("Revert "x86: kvm: rate-limit global clock updates"")
> >>>> dropped the rate limiting for KVM_REQ_GLOBAL_CLOCK_UPDATE.
> >>>>
> >>>> As a result, kvm_arch_vcpu_load() can queue global clock update requests
> >>>> every time a vCPU is scheduled when the master clock is disabled or when
> >>>> the vCPU is loaded for the first time.
> >>>>
> >>>> Restore the throttling with a per-VM ratelimit state and gate
> >>>> KVM_REQ_GLOBAL_CLOCK_UPDATE through __ratelimit(), so frequent vCPU
> >>>> scheduling does not generate a steady stream of redundant clock update
> >>>> requests.
> >>>>
> >>>> Fixes: 446fcce2a52b ("Revert "x86: kvm: rate-limit global clock updates"")
> >>>> Signed-off-by: Lei Chen <lei.chen@smartx.com>
> >>>> Reported-by: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com>
> >>>> Closes: https://lore.kernel.org/all/CAK8fFZ5gY8_Mw2A=iZVFNVKQNrXQzVsn-HTd+Me9K6ZfmdgA+Q@mail.gmail.com/
> >>
> >> Was this performance regression ever addressed?
> > Nope, not yet.
> >
> >> Looks like this fell through the cracks, but it's easy to miss something.
> >
> > It's in my list of patches to apply (probably for 7.2?).  I didn't want to squeeze
> > it into the initial 7.1 pull request for a variety of reasons.
>
> Hmmm. CCing Linus so he can speak up if he wants to about the following:
>
> Given that this is a fix for a performance regression[1] I'd say it's
> not as urgent as a "something stopped working" case -- so I guess it's
> something where the "[fix] "within a week", preferably before the next
> rc" approach Linus recently mentioned does not need to be applied strictly.
>
> But Jaroslav OTOH reported it more than 7 weeks ago already and back
> then called it something that "severely impacts KVM hosts running many
> Firecracker microVMs."[1];

For a setup that is likely broken.  On modern hardware, the path in question
should never actually be hit.  I do want to resolve the bug since older hardware
and funky setups do rely on the old behavior, but it's not pants-on-fire urgent.

More importantly, the original reporter(s) hasn't responded to any of our questions,
or to the proposed fix.  I'm not going to rush in a fix if I don't actually *know*
it's going to fix the original problem.

> and a potential fix exists for 4 weeks already. Due to that, 7.2 feels a bit
> too far away for me, as that is still ~15 weeks away. But maybe that's just
> me.

The "user" is also a fairly sizeable company, not some random person that's trying
to use KVM and is blocked.  I highly doubt they are still actually running a buggy
kernel.  E.g. based on a "same workload after rollback" comment in the bug report,
I assume they simply rolled back to the last good kernel (6.18).

Who knows, maybe they also took our hints/suggestions about their setup being
wonky and addressed whatever hiccup was sending them down the uncommon,
already-slow path.

All in all, AFAICT the only difference between sending this into 7.1 vs. 7.2 is
that the reporter won't be able to upgrade their kernel (without patching) for an
extra ~8 weeks.
* Re: [PATCH v2] KVM: x86: Rate-limit global clock updates on vCPU load
  2026-05-06 15:22 ` Sean Christopherson
@ 2026-05-06 15:58 ` Jaroslav Pulchart
  2026-05-06 20:31 ` Sean Christopherson
  0 siblings, 1 reply; 9+ messages in thread
From: Jaroslav Pulchart @ 2026-05-06 15:58 UTC
To: Sean Christopherson, Thorsten Leemhuis
Cc: Lei Chen, igor, jan.cipa, kvm, linux-kernel, pbonzini, Linux kernel regressions list, Linus Torvalds

>
> On Wed, May 06, 2026, Thorsten Leemhuis wrote:
> > On 5/6/26 14:55, Sean Christopherson wrote:
> > > On Wed, May 06, 2026, Thorsten Leemhuis wrote:
> > >> On 4/9/26 21:21, Sean Christopherson wrote:
> > >>> On Thu, Apr 09, 2026, Lei Chen wrote:
> > >>>> commit 446fcce2a52b ("Revert "x86: kvm: rate-limit global clock updates"")
> > >>>> dropped the rate limiting for KVM_REQ_GLOBAL_CLOCK_UPDATE.
> > >>>>
> > >>>> As a result, kvm_arch_vcpu_load() can queue global clock update requests
> > >>>> every time a vCPU is scheduled when the master clock is disabled or when
> > >>>> the vCPU is loaded for the first time.
> > >>>>
> > >>>> Restore the throttling with a per-VM ratelimit state and gate
> > >>>> KVM_REQ_GLOBAL_CLOCK_UPDATE through __ratelimit(), so frequent vCPU
> > >>>> scheduling does not generate a steady stream of redundant clock update
> > >>>> requests.
> > >>>>
> > >>>> Fixes: 446fcce2a52b ("Revert "x86: kvm: rate-limit global clock updates"")
> > >>>> Signed-off-by: Lei Chen <lei.chen@smartx.com>
> > >>>> Reported-by: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com>
> > >>>> Closes: https://lore.kernel.org/all/CAK8fFZ5gY8_Mw2A=iZVFNVKQNrXQzVsn-HTd+Me9K6ZfmdgA+Q@mail.gmail.com/
> > >>
> > >> Was this performance regression ever addressed?
> > > Nope, not yet.
> > >
> > >> Looks like this fell through the cracks, but it's easy to miss something.
> > >
> > > It's in my list of patches to apply (probably for 7.2?).  I didn't want to squeeze
> > > it into the initial 7.1 pull request for a variety of reasons.
> >
> > Hmmm. CCing Linus so he can speak up if he wants to about the following:
> >
> > Given that this is a fix for a performance regression[1] I'd say it's
> > not as urgent as a "something stopped working" case -- so I guess it's
> > something where the "[fix] "within a week", preferably before the next
> > rc" approach Linus recently mentioned does not need to be applied strictly.
> >
> > But Jaroslav OTOH reported it more than 7 weeks ago already and back
> > then called it something that "severely impacts KVM hosts running many
> > Firecracker microVMs."[1];
>
> For a setup that is likely broken.  On modern hardware, the path in question
> should never actually be hit.  I do want to resolve the bug since older hardware
> and funky setups do rely on the old behavior, but it's not pants-on-fire urgent.
>
> More importantly, the original reporter(s) hasn't responded to any of our questions,
> or to the proposed fix.  I'm not going to rush in a fix if I don't actually *know*
> it's going to fix the original problem.

Hi Sean, Thorsten,

sorry for the missing response from my side, this thread unfortunately
ended up in trash due to mail filters on my side and I completely
missed it.

I currently don't have the full context loaded back in yet, but I'll
re-read the thread and follow up properly once I do.

For additional context, we are currently running the latest 6.19/7.0.y
kernels with a revert of the commits causing the reported regression,
and the hardware is AMD EPYC 9454P 48-Core Processor.

Jaroslav

>
> > and a potential fix exists for 4 weeks already. Due to that, 7.2 feels a bit
> > too far away for me, as that is still ~15 weeks away. But maybe that's just
> > me.
>
> The "user" is also a fairly sizeable company, not some random person that's trying
> to use KVM and is blocked.  I highly doubt they are still actually running a buggy
> kernel.  E.g. based on a "same workload after rollback" comment in the bug report,
> I assume they simply rolled back to the last good kernel (6.18).
>
> Who knows, maybe they also took our hints/suggestions about their setup being
> wonky and addressed whatever hiccup was sending them down the uncommon,
> already-slow path.
>
> All in all, AFAICT the only difference between sending this into 7.1 vs. 7.2 is
> that the reporter won't be able to upgrade their kernel (without patching) for an
> extra ~8 weeks.
* Re: [PATCH v2] KVM: x86: Rate-limit global clock updates on vCPU load
  2026-05-06 15:58 ` Jaroslav Pulchart
@ 2026-05-06 20:31 ` Sean Christopherson
  2026-05-07  9:27 ` Jaroslav Pulchart
  0 siblings, 1 reply; 9+ messages in thread
From: Sean Christopherson @ 2026-05-06 20:31 UTC
To: Jaroslav Pulchart
Cc: Thorsten Leemhuis, Lei Chen, igor, jan.cipa, kvm, linux-kernel, pbonzini, Linux kernel regressions list, Linus Torvalds

On Wed, May 06, 2026, Jaroslav Pulchart wrote:
> > On Wed, May 06, 2026, Thorsten Leemhuis wrote:
> > > On 5/6/26 14:55, Sean Christopherson wrote:
> > > > On Wed, May 06, 2026, Thorsten Leemhuis wrote:
> > > >> On 4/9/26 21:21, Sean Christopherson wrote:
> > > >>> On Thu, Apr 09, 2026, Lei Chen wrote:
> > > >>>> commit 446fcce2a52b ("Revert "x86: kvm: rate-limit global clock updates"")
> > > >>>> dropped the rate limiting for KVM_REQ_GLOBAL_CLOCK_UPDATE.
> > > >>>>
> > > >>>> As a result, kvm_arch_vcpu_load() can queue global clock update requests
> > > >>>> every time a vCPU is scheduled when the master clock is disabled or when
> > > >>>> the vCPU is loaded for the first time.
> > > >>>>
> > > >>>> Restore the throttling with a per-VM ratelimit state and gate
> > > >>>> KVM_REQ_GLOBAL_CLOCK_UPDATE through __ratelimit(), so frequent vCPU
> > > >>>> scheduling does not generate a steady stream of redundant clock update
> > > >>>> requests.
> > > >>>>
> > > >>>> Fixes: 446fcce2a52b ("Revert "x86: kvm: rate-limit global clock updates"")
> > > >>>> Signed-off-by: Lei Chen <lei.chen@smartx.com>
> > > >>>> Reported-by: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com>
> > > >>>> Closes: https://lore.kernel.org/all/CAK8fFZ5gY8_Mw2A=iZVFNVKQNrXQzVsn-HTd+Me9K6ZfmdgA+Q@mail.gmail.com/
> > > >>
> > > >> Was this performance regression ever addressed?
> > > > Nope, not yet.
> > > >
> > > >> Looks like this fell through the cracks, but it's easy to miss something.
> > > >
> > > > It's in my list of patches to apply (probably for 7.2?).  I didn't want to squeeze
> > > > it into the initial 7.1 pull request for a variety of reasons.
> > >
> > > Hmmm. CCing Linus so he can speak up if he wants to about the following:
> > >
> > > Given that this is a fix for a performance regression[1] I'd say it's
> > > not as urgent as a "something stopped working" case -- so I guess it's
> > > something where the "[fix] "within a week", preferably before the next
> > > rc" approach Linus recently mentioned does not need to be applied strictly.
> > >
> > > But Jaroslav OTOH reported it more than 7 weeks ago already and back
> > > then called it something that "severely impacts KVM hosts running many
> > > Firecracker microVMs."[1];
> >
> > For a setup that is likely broken.  On modern hardware, the path in question
> > should never actually be hit.  I do want to resolve the bug since older hardware
> > and funky setups do rely on the old behavior, but it's not pants-on-fire urgent.
> >
> > More importantly, the original reporter(s) hasn't responded to any of our questions,
> > or to the proposed fix.  I'm not going to rush in a fix if I don't actually *know*
> > it's going to fix the original problem.
>
> Hi Sean, Thorsten,
>
> sorry for the missing response from my side, this thread unfortunately
> ended up in trash due to mail filters on my side and I completely
> missed it.

No worries, gmail's Spam filter is my nemesis :-)

> I currently don't have the full context loaded back in yet, but I'll re-read
> the thread and follow up properly once I do.

I think the only remaining question is why/how KVM's master clock is getting
disabled.  But that's more of a question for your deployment than it is a question
for upstream; it's possible there's a different KVM bug lurking, but it's more
likely that something in your setup is incompatible with using the master clock.

Note, it's certainly not "wrong" for the master clock to be disabled, but it's
quite surprising, especially for Firecracker VMs.  It's worth investigating as
there might be an underlying issue that's very easy to address, and "fixing" it
should provide (very) small performance benefits.
* Re: [PATCH v2] KVM: x86: Rate-limit global clock updates on vCPU load
  2026-05-06 20:31 ` Sean Christopherson
@ 2026-05-07  9:27 ` Jaroslav Pulchart
  2026-05-07 19:09 ` Sean Christopherson
  0 siblings, 1 reply; 9+ messages in thread
From: Jaroslav Pulchart @ 2026-05-07  9:27 UTC
To: Sean Christopherson
Cc: Thorsten Leemhuis, Lei Chen, igor, jan.cipa, kvm, linux-kernel, pbonzini, Linux kernel regressions list, Linus Torvalds

>
> On Wed, May 06, 2026, Jaroslav Pulchart wrote:
> > > On Wed, May 06, 2026, Thorsten Leemhuis wrote:
> > > > On 5/6/26 14:55, Sean Christopherson wrote:
> > > > > On Wed, May 06, 2026, Thorsten Leemhuis wrote:
> > > > >> On 4/9/26 21:21, Sean Christopherson wrote:
> > > > >>> On Thu, Apr 09, 2026, Lei Chen wrote:
> > > > >>>> commit 446fcce2a52b ("Revert "x86: kvm: rate-limit global clock updates"")
> > > > >>>> dropped the rate limiting for KVM_REQ_GLOBAL_CLOCK_UPDATE.
> > > > >>>>
> > > > >>>> As a result, kvm_arch_vcpu_load() can queue global clock update requests
> > > > >>>> every time a vCPU is scheduled when the master clock is disabled or when
> > > > >>>> the vCPU is loaded for the first time.
> > > > >>>>
> > > > >>>> Restore the throttling with a per-VM ratelimit state and gate
> > > > >>>> KVM_REQ_GLOBAL_CLOCK_UPDATE through __ratelimit(), so frequent vCPU
> > > > >>>> scheduling does not generate a steady stream of redundant clock update
> > > > >>>> requests.
> > > > >>>>
> > > > >>>> Fixes: 446fcce2a52b ("Revert "x86: kvm: rate-limit global clock updates"")
> > > > >>>> Signed-off-by: Lei Chen <lei.chen@smartx.com>
> > > > >>>> Reported-by: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com>
> > > > >>>> Closes: https://lore.kernel.org/all/CAK8fFZ5gY8_Mw2A=iZVFNVKQNrXQzVsn-HTd+Me9K6ZfmdgA+Q@mail.gmail.com/
> > > > >>
> > > > >> Was this performance regression ever addressed?
> > > > > Nope, not yet.
> > > > >
> > > > >> Looks like this fell through the cracks, but it's easy to miss something.
> > > > >
> > > > > It's in my list of patches to apply (probably for 7.2?).  I didn't want to squeeze
> > > > > it into the initial 7.1 pull request for a variety of reasons.
> > > >
> > > > Hmmm. CCing Linus so he can speak up if he wants to about the following:
> > > >
> > > > Given that this is a fix for a performance regression[1] I'd say it's
> > > > not as urgent as a "something stopped working" case -- so I guess it's
> > > > something where the "[fix] "within a week", preferably before the next
> > > > rc" approach Linus recently mentioned does not need to be applied strictly.
> > > >
> > > > But Jaroslav OTOH reported it more than 7 weeks ago already and back
> > > > then called it something that "severely impacts KVM hosts running many
> > > > Firecracker microVMs."[1];
> > >
> > > For a setup that is likely broken.  On modern hardware, the path in question
> > > should never actually be hit.  I do want to resolve the bug since older hardware
> > > and funky setups do rely on the old behavior, but it's not pants-on-fire urgent.
> > >
> > > More importantly, the original reporter(s) hasn't responded to any of our questions,
> > > or to the proposed fix.  I'm not going to rush in a fix if I don't actually *know*
> > > it's going to fix the original problem.
> >
> > Hi Sean, Thorsten,
> >
> > sorry for the missing response from my side, this thread unfortunately
> > ended up in trash due to mail filters on my side and I completely
> > missed it.
>
> No worries, gmail's Spam filter is my nemesis :-)
>
> > I currently don't have the full context loaded back in yet, but I'll re-read
> > the thread and follow up properly once I do.
>
> I think the only remaining question is why/how KVM's master clock is getting
> disabled.  But that's more of a question for your deployment than it is a question
> for upstream; it's possible there's a different KVM bug lurking, but it's more
> likely that something in your setup is incompatible with using the master clock.
>
> Note, it's certainly not "wrong" for the master clock to be disabled, but it's
> quite surprising, especially for Firecracker VMs.  It's worth investigating as
> there might be an underlying issue that's very easy to address, and "fixing" it
> should provide (very) small performance benefits.

I've dug into the "master clock question" and have an idea.

Our Firecracker hosts are themselves L1 KVM VMs (nested
virtualisation) running on AMD EPYC 9454P and EPYC 9455 hardware. Even
though the compute nodes use cpu_mode=host-passthrough in qemu kvm,
the invtsc CPUID bit is filtered out by QEMU, which I hadn't realized.
Without it the guest kernel marks the TSC unstable at boot:
  tsc: Marking TSC unstable due to TSCs unsynchronized
and falls back to kvm-clock as its clocksource.

I suppose that in turn prevents KVM from enabling the master clock for
any L2 guests (the Firecracker microVMs), am I right?

I have resolved the issue by explicitly adding +invtsc to
cpu_model_extra_flags in our OpenStack nova.conf. After this change
the L1 VMs now correctly show constant_tsc and nonstop_tsc in
/proc/cpuinfo and switch clocksource to tsc. I also confirmed the IPI
storm disappears without the v2 patch when +invtsc is present, and
returns when it is absent on a vanilla 7.0.3 kernel.

So could this be the answer to your question: "the master clock was
disabled because QEMU silently drops invtsc even in host-passthrough
mode"?
* Re: [PATCH v2] KVM: x86: Rate-limit global clock updates on vCPU load
  2026-05-07  9:27 ` Jaroslav Pulchart
@ 2026-05-07 19:09 ` Sean Christopherson
  0 siblings, 0 replies; 9+ messages in thread
From: Sean Christopherson @ 2026-05-07 19:09 UTC
To: Jaroslav Pulchart
Cc: Thorsten Leemhuis, Lei Chen, igor, jan.cipa, kvm, linux-kernel, pbonzini, Linux kernel regressions list, Linus Torvalds

On Thu, May 07, 2026, Jaroslav Pulchart wrote:
> > On Wed, May 06, 2026, Jaroslav Pulchart wrote:
> > > > On Wed, May 06, 2026, Thorsten Leemhuis wrote:
> > I think the only remaining question is why/how KVM's master clock is getting
> > disabled.  But that's more of a question for your deployment than it is a question
> > for upstream; it's possible there's a different KVM bug lurking, but it's more
> > likely that something in your setup is incompatible with using the master clock.
> >
> > Note, it's certainly not "wrong" for the master clock to be disabled, but it's
> > quite surprising, especially for Firecracker VMs.  It's worth investigating as
> > there might be an underlying issue that's very easy to address, and "fixing" it
> > should provide (very) small performance benefits.
>
> I've dug into the "master clock question" and have an idea.
>
> Our Firecracker hosts are themselves L1 KVM VMs (nested
> virtualisation) running on AMD EPYC 9454P and EPYC 9455 hardware. Even
> though the compute nodes use cpu_mode=host-passthrough in qemu kvm,
> the invtsc CPUID bit is filtered out by QEMU, which I hadn't realized.
> Without it the guest kernel marks the TSC unstable at boot:
>   tsc: Marking TSC unstable due to TSCs unsynchronized
> and falls back to kvm-clock as its clocksource.
>
> I suppose that in turn prevents KVM from enabling the master clock for
> any L2 guests (the Firecracker microVMs), am I right?
>
> I have resolved the issue by explicitly adding +invtsc to
> cpu_model_extra_flags in our OpenStack nova.conf. After this change
> the L1 VMs now correctly show constant_tsc and nonstop_tsc in
> /proc/cpuinfo and switch clocksource to tsc. I also confirmed the IPI
> storm disappears without the v2 patch when +invtsc is present, and
> returns when it is absent on a vanilla 7.0.3 kernel.
>
> So could this be the answer to your question: "the master clock was
> disabled because QEMU silently drops invtsc even in host-passthrough
> mode"?

Yep, that'd do it.  Linux-as-a-guest will prefer kvmclock over TSC if the TSC
isn't constant and non-stop.  That in turn will prevent KVM (as the L1 hypervisor)
from using the master clock, since it sees the kernel clocksource as not being
(directly) based on TSC.

Thanks for the follow-up!
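
The guest-visible symptoms discussed above (constant_tsc and nonstop_tsc in
/proc/cpuinfo, and tsc as the active clocksource) can be checked with a small
helper along the following lines. This is a hedged sketch, not code from the
thread: has_cpu_flag() and l1_looks_tsc_based() are hypothetical names, the
caller is assumed to have already read the "flags" line from /proc/cpuinfo and
the contents of /sys/devices/system/clocksource/clocksource0/current_clocksource,
and the result only mirrors what Jaroslav observed in the L1 VM. It does not
reproduce KVM's own master clock decision.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if `flag` appears as a whole whitespace-separated token. */
static bool has_cpu_flag(const char *flags, const char *flag)
{
	assert(flags != NULL && flag != NULL);

	size_t len = strlen(flag);
	if (len == 0)
		return false;

	const char *p = flags;
	while ((p = strstr(p, flag)) != NULL) {
		/* Reject substring hits such as "tsc" inside "constant_tsc". */
		bool starts = (p == flags) || p[-1] == ' ' ||
			      p[-1] == '\t' || p[-1] == ':';
		bool ends = p[len] == '\0' || p[len] == ' ' ||
			    p[len] == '\t' || p[len] == '\n';
		if (starts && ends)
			return true;
		p += len;
	}
	return false;
}

/*
 * Heuristic (assumption): the L1 guest looks TSC-based when both TSC
 * flags are present and the active clocksource is exactly "tsc".
 */
static bool l1_looks_tsc_based(const char *flags, const char *clocksource)
{
	assert(clocksource != NULL);

	bool cs_is_tsc = strncmp(clocksource, "tsc", 3) == 0 &&
			 (clocksource[3] == '\0' || clocksource[3] == '\n');

	return has_cpu_flag(flags, "constant_tsc") &&
	       has_cpu_flag(flags, "nonstop_tsc") && cs_is_tsc;
}
```

A false result with "kvm-clock" as the clocksource would match the state
Jaroslav describes before adding +invtsc.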
* Re: [PATCH v2] KVM: x86: Rate-limit global clock updates on vCPU load
  [not found] ` <20260409142226.2581-1-lei.chen@smartx.com>
  [not found] ` <adf8Q1VSeAMMyCa_@google.com>
@ 2026-05-06 20:10 ` Jaroslav Pulchart
  1 sibling, 0 replies; 9+ messages in thread
From: Jaroslav Pulchart @ 2026-05-06 20:10 UTC
To: Lei Chen; +Cc: seanjc, igor, jan.cipa, kvm, linux-kernel, pbonzini

>
> commit 446fcce2a52b ("Revert "x86: kvm: rate-limit global clock updates"")
> dropped the rate limiting for KVM_REQ_GLOBAL_CLOCK_UPDATE.
>
> As a result, kvm_arch_vcpu_load() can queue global clock update requests
> every time a vCPU is scheduled when the master clock is disabled or when
> the vCPU is loaded for the first time.
>
> Restore the throttling with a per-VM ratelimit state and gate
> KVM_REQ_GLOBAL_CLOCK_UPDATE through __ratelimit(), so frequent vCPU
> scheduling does not generate a steady stream of redundant clock update
> requests.
>
> Fixes: 446fcce2a52b ("Revert "x86: kvm: rate-limit global clock updates"")
> Signed-off-by: Lei Chen <lei.chen@smartx.com>
> Reported-by: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com>
> Closes: https://lore.kernel.org/all/CAK8fFZ5gY8_Mw2A=iZVFNVKQNrXQzVsn-HTd+Me9K6ZfmdgA+Q@mail.gmail.com/
> ---
> CHANGELOG:
> v2:
> - remove comment of kvmclock_update_rs
> - make sure kvm_arch_vcpu_load make KVM_REQ_CLOCK_UPDATE for this vcpu
> - add RATELIMIT_MSG_ON_RELEASE to kvmclock_update_rs
>
> v1:
> - initial version(https://lore.kernel.org/all/20260407070046.2336-1-lei.chen@smartx.com/)
> ---
>  arch/x86/include/asm/kvm_host.h |  1 +
>  arch/x86/kvm/x86.c              | 11 +++++++++--
>  2 files changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 5a3bfa293e8b..5e750c49d21e 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1453,6 +1453,7 @@ struct kvm_arch {
>  	bool use_master_clock;
>  	u64 master_kernel_ns;
>  	u64 master_cycle_now;
> +	struct ratelimit_state kvmclock_update_rs;
>
>  #ifdef CONFIG_KVM_HYPERV
>  	struct kvm_hv hyperv;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 63afdb6bb078..a534e8391611 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5210,8 +5210,13 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  		 * On a host with synchronized TSC, there is no need to update
>  		 * kvmclock on vcpu->cpu migration
>  		 */
> -		if (!vcpu->kvm->arch.use_master_clock || vcpu->cpu == -1)
> -			kvm_make_request(KVM_REQ_GLOBAL_CLOCK_UPDATE, vcpu);
> +		if (!vcpu->kvm->arch.use_master_clock || vcpu->cpu == -1) {
> +			if (__ratelimit(&vcpu->kvm->arch.kvmclock_update_rs))
> +				kvm_make_request(KVM_REQ_GLOBAL_CLOCK_UPDATE, vcpu);
> +			else
> +				kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
> +		}
> +
>  		if (vcpu->cpu != cpu)
>  			kvm_make_request(KVM_REQ_MIGRATE_TIMER, vcpu);
>  		vcpu->cpu = cpu;
> @@ -13189,6 +13194,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>  	raw_spin_lock_init(&kvm->arch.tsc_write_lock);
>  	mutex_init(&kvm->arch.apic_map_lock);
>  	seqcount_raw_spinlock_init(&kvm->arch.pvclock_sc, &kvm->arch.tsc_write_lock);
> +	ratelimit_state_init(&kvm->arch.kvmclock_update_rs, HZ, 10);
> +	ratelimit_set_flags(&kvm->arch.kvmclock_update_rs, RATELIMIT_MSG_ON_RELEASE);
>  	kvm->arch.kvmclock_offset = -get_kvmclock_base_ns();
>
>  	raw_spin_lock_irqsave(&kvm->arch.tsc_write_lock, flags);
> --
> 2.51.0
>

Hi Lei,

I tested the v2 patch on a Firecracker host running vanilla kernel 7.0.3
with this patch applied, the IPI storm on vCPU load is gone!
Thread overview: 9+ messages
     [not found] <adVGrJaRlRooO4su@google.com>
     [not found] ` <20260409142226.2581-1-lei.chen@smartx.com>
     [not found]   ` <adf8Q1VSeAMMyCa_@google.com>
2026-05-06  9:48     ` [PATCH v2] KVM: x86: Rate-limit global clock updates on vCPU load Thorsten Leemhuis
2026-05-06 12:55       ` Sean Christopherson
2026-05-06 14:09         ` Thorsten Leemhuis
2026-05-06 15:22           ` Sean Christopherson
2026-05-06 15:58             ` Jaroslav Pulchart
2026-05-06 20:31               ` Sean Christopherson
2026-05-07  9:27                 ` Jaroslav Pulchart
2026-05-07 19:09                   ` Sean Christopherson
2026-05-06 20:10   ` Jaroslav Pulchart