[REGRESSION 6.19, BISECTED] KVM: x86: kvmclock rate-limit removal causes IPI storm and high guest steal time

public inbox for linux-kernel@vger.kernel.org

* [REGRESSION 6.19, BISECTED] KVM: x86: kvmclock rate-limit removal causes IPI storm and high guest steal time
@ 2026-03-21 14:32 Jaroslav Pulchart
  2026-03-23  2:27 ` Lei Chen
  0 siblings, 1 reply; 17+ messages in thread
From: Jaroslav Pulchart @ 2026-03-21 14:32 UTC (permalink / raw)
  To: kvm; +Cc: LKML, seanjc, pbonzini, lei.chen, Igor Raits, Jan Cipa

Hi,

I am reporting a performance regression in Linux 6.19 that severely
impacts KVM hosts running many Firecracker microVMs.

== Bisect result ==

446fcce2a52b533c543dabba26777813c347577c is the first bad commit
commit 446fcce2a52b533c543dabba26777813c347577c
Author: Lei Chen <lei.chen@smartx.com>
Date:   Tue Aug 19 23:20:26 2025 +0800

    Revert "x86: kvm: rate-limit global clock updates"

    This reverts commit 7e44e4495a398eb553ce561f29f9148f40a3448f.

    Commit 7e44e4495a39 ("x86: kvm: rate-limit global clock updates")
    intends to use a kvmclock_update_work to sync ntp corretion
    across all vcpus kvmclock, which is based on commit 0061d53daf26f
    ("KVM: x86: limit difference between kvmclock updates")

    Since kvmclock has been switched to mono raw, this commit can be
    reverted.

    Signed-off-by: Lei Chen <lei.chen@smartx.com>
    Link: https://patch.msgid.link/20250819152027.1687487-3-lei.chen@smartx.com
    Signed-off-by: Sean Christopherson <seanjc@google.com>

 arch/x86/include/asm/kvm_host.h |  1 -
 arch/x86/kvm/x86.c              | 29 ++++-------------------------
 2 files changed, 4 insertions(+), 26 deletions(-)

==== Symptoms ====

Measured on a KVM host running many Firecracker microVMs
(node_exporter metrics, 2026-03-20):

  kernel 6.19:
    steal time inside guest VMs:    3–24% per vCPU  (sustained)
    host system CPU (kernel mode):  3–12 CPUs saturated
    host steal:                     3–8%

  kernel 6.18 (same host, same workload after rollback):
    steal time inside guest VMs:    < 0.02% per vCPU  (~200x lower)
    host system CPU (kernel mode):  2–3 CPUs
    host steal:                     0.3–0.5%

==== Root cause (per AI analysis) ====

The regressing commit removes the rate-limiting from
kvm_gen_kvmclock_update(). Previously this function deferred the
all-vCPU kick via a 100ms delayed_work:

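  /* 6.18 */
  static void kvm_gen_kvmclock_update(struct kvm_vcpu *v)
  {
      struct kvm *kvm = v->kvm;

      /* Update the calling vCPU's clock right away... */
      kvm_make_request(KVM_REQ_CLOCK_UPDATE, v);
      /* ...and defer the all-vCPU kick; a no-op if already queued. */
      schedule_delayed_work(&kvm->arch.kvmclock_update_work,
                            KVMCLOCK_UPDATE_DELAY);  /* 100ms */
  }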

After the revert it kicks every vCPU of the VM synchronously on
every call:

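  /* 6.19 */
  static void kvm_gen_kvmclock_update(struct kvm_vcpu *v)
  {
      unsigned long i;
      struct kvm_vcpu *vcpu;
      struct kvm *kvm = v->kvm;

      /* Request a clock update on, and kick, every vCPU in the VM. */
      kvm_for_each_vcpu(i, vcpu, kvm) {
          kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
          kvm_vcpu_kick(vcpu);
      }
  }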

KVM_REQ_GLOBAL_CLOCK_UPDATE, whose handler calls kvm_gen_kvmclock_update(),
is issued on every vCPU load when use_master_clock is false
(arch/x86/kvm/x86.c, kvm_arch_vcpu_load()):

    if (!vcpu->kvm->arch.use_master_clock || vcpu->cpu == -1)
        kvm_make_request(KVM_REQ_GLOBAL_CLOCK_UPDATE, vcpu);

With many Firecracker microVMs, the vCPU scheduling rate is high.
Each scheduling event now IPIs every sibling vCPU of the VM, instead
of coalescing all-vCPU kicks into at most one per 100ms. This creates
a continuous IPI storm on the host, visible as high kernel (system)
CPU time and high steal time inside guest VMs.

The commit justifies the removal with "Since kvmclock has been switched
to mono raw, this commit can be reverted." That reasoning is correct
for the NTP-correction use case, but the 100ms rate-limit also
protected against IPI storms when use_master_clock is false — a
concern independent of clock source.
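
To make the scaling concrete, here is a rough userspace model in C (not
kernel code). It counts all-vCPU kicks for a single VM under the two
behaviours described above. The function names, the load timestamps and
the vCPU count are hypothetical; only the 100ms delay comes from the code
quoted above. The model counts kicks only and ignores the
KVM_REQ_CLOCK_UPDATE that the loading vCPU makes on itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 6.19 behaviour: every vCPU load kicks all vCPUs of the VM at once. */
static uint64_t kicks_immediate(uint64_t nr_loads, uint64_t nr_vcpus)
{
	return nr_loads * nr_vcpus;
}

/*
 * 6.18 behaviour: a load queues one deferred all-vCPU kick, and further
 * loads are absorbed while that work is still pending.
 * load_ms[] holds the load timestamps in milliseconds, sorted ascending.
 */
static uint64_t kicks_coalesced(const uint64_t *load_ms, size_t n,
				uint64_t nr_vcpus, uint64_t delay_ms)
{
	uint64_t broadcasts = 0, fires_at = 0;
	int pending = 0;

	for (size_t i = 0; i < n; i++) {
		assert(i == 0 || load_ms[i] >= load_ms[i - 1]);

		/* Deferred work that has reached its deadline has run. */
		if (pending && load_ms[i] >= fires_at)
			pending = 0;

		/* Queue new work only if none is pending. */
		if (!pending) {
			pending = 1;
			fires_at = load_ms[i] + delay_ms;
			broadcasts++;
		}
	}
	return broadcasts * nr_vcpus;
}
```

With one load per millisecond for one second on a 4-vCPU VM (made-up
numbers), kicks_immediate(1000, 4) gives 4000 kicks, while
kicks_coalesced() with delay_ms = 100 gives 10 broadcasts, i.e. 40 kicks.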

==== Full bisect log ====

git bisect start
# status: waiting for both good and bad commits
# good: [7d0a66e4bb9081d75c82ec4957c50034cb0ea449] Linux 6.18
git bisect good 7d0a66e4bb9081d75c82ec4957c50034cb0ea449
# status: waiting for bad commit, 1 good commit known
# bad: [05f7e89ab9731565d8a62e3b5d1ec206485eeb0b] Linux 6.19
git bisect bad 05f7e89ab9731565d8a62e3b5d1ec206485eeb0b
# good: [02892f90a9851f508e557b3c75e93fc178310d5f] Merge tag
'hwmon-for-v6.19' of
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
git bisect good 02892f90a9851f508e557b3c75e93fc178310d5f
# bad: [edf602a17b03e6bca31c48f34ac8fc3341503ac1] Merge tag
'tty-6.19-rc1' of
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
git bisect bad edf602a17b03e6bca31c48f34ac8fc3341503ac1
# bad: [09cab48db950b6fb8c114314a20c0fd5a80cf990] Merge tag
'soc-arm-6.19' of
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
git bisect bad 09cab48db950b6fb8c114314a20c0fd5a80cf990
# good: [36492b7141b9abc967e92c991af32c670351dc16] Merge tag
'tracepoints-v6.19' of
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
git bisect good 36492b7141b9abc967e92c991af32c670351dc16
# good: [7cd122b55283d3ceef71a5b723ccaa03a72284b4] Merge tag
'pull-persistency' of
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
git bisect good 7cd122b55283d3ceef71a5b723ccaa03a72284b4
# bad: [63a9b0bc65d5d3ea96a57e7985ea22a8582fbbe5] Merge tag
'kvm-riscv-6.19-1' of https://github.com/kvm-riscv/linux into HEAD
git bisect bad 63a9b0bc65d5d3ea96a57e7985ea22a8582fbbe5
# bad: [adc99a6cfcf76d670272dea64bbc2d43ecd12a2f] Merge tag
'kvm-x86-mmu-6.19' of https://github.com/kvm-x86/linux into HEAD
git bisect bad adc99a6cfcf76d670272dea64bbc2d43ecd12a2f
# bad: [c09816f2afce0f89f176c4bc58dc57ec9f204998] KVM: x86: Remove
unused declaration kvm_mmu_may_ignore_guest_pat()
git bisect bad c09816f2afce0f89f176c4bc58dc57ec9f204998
# bad: [f6106d41ec84e552a5e8adda1f8741cab96a5425] x86/bugs: Use an x86
feature to track the MMIO Stale Data mitigation
git bisect bad f6106d41ec84e552a5e8adda1f8741cab96a5425
# good: [9633f180ce994ab293ce4924a9b7aaf4673aa114] KVM: x86:
Explicitly set new periodic hrtimer expiration in apic_timer_fn()
git bisect good 9633f180ce994ab293ce4924a9b7aaf4673aa114
# bad: [e78fb96b41c6ac85c1a02c7e9610d1ebaa9b5d98] KVM: x86: remove
comment about ntp correction sync for
git bisect bad e78fb96b41c6ac85c1a02c7e9610d1ebaa9b5d98
# good: [a091fe60c2d3943b058132a64682a509d55bd325] KVM: x86: Grab
lapic_timer in a local variable to cleanup periodic code
git bisect good a091fe60c2d3943b058132a64682a509d55bd325
# bad: [446fcce2a52b533c543dabba26777813c347577c] Revert "x86: kvm:
rate-limit global clock updates"
git bisect bad 446fcce2a52b533c543dabba26777813c347577c
# good: [43ddbf16edf5c1790684b32d5eb920a1b0eea285] Revert "x86: kvm:
introduce periodic global clock updates"
git bisect good 43ddbf16edf5c1790684b32d5eb920a1b0eea285
# first bad commit: [446fcce2a52b533c543dabba26777813c347577c] Revert
"x86: kvm: rate-limit global clock updates"

Best regards,
Jaroslav Pulchart

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [REGRESSION 6.19, BISECTED] KVM: x86: kvmclock rate-limit removal causes IPI storm and high guest steal time
  2026-03-21 14:32 [REGRESSION 6.19, BISECTED] KVM: x86: kvmclock rate-limit removal causes IPI storm and high guest steal time Jaroslav Pulchart
@ 2026-03-23  2:27 ` Lei Chen
  2026-04-01  6:43   ` Lei Chen
  0 siblings, 1 reply; 17+ messages in thread
From: Lei Chen @ 2026-03-23  2:27 UTC (permalink / raw)
  To: Jaroslav Pulchart; +Cc: kvm, LKML, seanjc, pbonzini, Igor Raits, Jan Cipa

Hi Jaroslav,

Thanks for testing and reporting this. I'm looking into the problem.

Best regards
Lei Chen

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [REGRESSION 6.19, BISECTED] KVM: x86: kvmclock rate-limit removal causes IPI storm and high guest steal time
  2026-03-23  2:27 ` Lei Chen
@ 2026-04-01  6:43   ` Lei Chen
  2026-04-01 21:16     ` Sean Christopherson
  0 siblings, 1 reply; 17+ messages in thread
From: Lei Chen @ 2026-04-01  6:43 UTC (permalink / raw)
  To: Jaroslav Pulchart; +Cc: kvm, LKML, seanjc, pbonzini, Igor Raits, Jan Cipa

Hi Jaroslav,

I apologize for the late reply.

I have reviewed the code and identified two scenarios that currently
trigger the KVM_REQ_GLOBAL_CLOCK_UPDATE request:

Scenario 1: kvm_write_system_time
This code path occurs when the hypervisor (such as QEMU) adjusts the
time, or when the guest writes to the TSC.

Scenario 2: vcpu schedule in kvm_arch_vcpu_load
If this function triggers KVM_REQ_GLOBAL_CLOCK_UPDATE, it indicates
that the virtual machine is not using the master_clock.

Those two cases are uncommon. Could you please provide your dmesg and
help check which code path triggers KVM_REQ_GLOBAL_CLOCK_UPDATE?


Best regards,
Lei Chen

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [REGRESSION 6.19, BISECTED] KVM: x86: kvmclock rate-limit removal causes IPI storm and high guest steal time
  2026-04-01  6:43   ` Lei Chen
@ 2026-04-01 21:16     ` Sean Christopherson
  2026-04-07  7:00       ` [PATCH v1] KVM: x86: Rate-limit global clock updates on vCPU load Lei Chen
  0 siblings, 1 reply; 17+ messages in thread
From: Sean Christopherson @ 2026-04-01 21:16 UTC (permalink / raw)
  To: Lei Chen; +Cc: Jaroslav Pulchart, kvm, LKML, pbonzini, Igor Raits, Jan Cipa

On Wed, Apr 01, 2026, Lei Chen wrote:
> Hi Jaroslav,
> 
> I apologize for the late reply.
> 
> I have reviewed the code and identified two scenarios that currently
> trigger the KVM_REQ_GLOBAL_CLOCK_UPDATE request:
> 
> Scenario 1: kvm_write_system_time
> This code path occurs when the hypervisor (such as QEMU) adjusts the
> time, or when the guest writes to the TSC.
> 
> Scenario 2: vcpu schedule in kvm_arch_vcpu_load
> If this function triggers KVM_REQ_GLOBAL_CLOCK_UPDATE, it indicates
> that the virtual machine is not using the master_clock.
> 
> Those two cases are uncommon. Could you please provide your dmesg and
> help check which code path triggers KVM_REQ_GLOBAL_CLOCK_UPDATE?

I'm also mildly curious as to why KVM_REQ_GLOBAL_CLOCK_UPDATE is being
triggered, but I don't know that it matters.  E.g. fixing the underlying flaw
(if one even exists) could fix Jaroslav's setup, but it won't fix setups where
the "uncommon" cases are unavoidable, e.g. on setups that _can't_ use a master
clock for whatever reason.

At a glance, explicitly ratelimiting kvm_gen_kvmclock_update() seems like the
simplest option to address the regression.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v1] KVM: x86: Rate-limit global clock updates on vCPU load
  2026-04-01 21:16     ` Sean Christopherson
@ 2026-04-07  7:00       ` Lei Chen
  2026-04-07 18:02         ` Sean Christopherson
  0 siblings, 1 reply; 17+ messages in thread
From: Lei Chen @ 2026-04-07  7:00 UTC (permalink / raw)
  To: seanjc
  Cc: igor, jan.cipa, jaroslav.pulchart, kvm, lei.chen, linux-kernel,
	pbonzini

commit 446fcce2a52b ("Revert "x86: kvm: rate-limit global clock updates"")
dropped the rate limiting for KVM_REQ_GLOBAL_CLOCK_UPDATE.

As a result, kvm_arch_vcpu_load() can queue global clock update requests
every time a vCPU is scheduled when the master clock is disabled or when
the vCPU is loaded for the first time.

Restore the throttling with a per-VM ratelimit state and gate
KVM_REQ_GLOBAL_CLOCK_UPDATE through __ratelimit(), so frequent vCPU
scheduling does not generate a steady stream of redundant clock update
requests.

Fixes: 446fcce2a52b ("Revert "x86: kvm: rate-limit global clock updates"")
Signed-off-by: Lei Chen <lei.chen@smartx.com>
Reported-by: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com>
Closes: https://lore.kernel.org/all/CAK8fFZ5gY8_Mw2A=iZVFNVKQNrXQzVsn-HTd+Me9K6ZfmdgA+Q@mail.gmail.com/
---
 arch/x86/include/asm/kvm_host.h | 2 ++
 arch/x86/kvm/x86.c              | 5 ++++-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 5a3bfa293e8b..6d3d3f19af01 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1453,6 +1453,8 @@ struct kvm_arch {
 	bool use_master_clock;
 	u64 master_kernel_ns;
 	u64 master_cycle_now;
+	/* how often to make KVM_REQ_GLOBAL_CLOCK_UPDATE on vcpu sched*/
+	struct ratelimit_state kvmclock_update_rs;
 
 #ifdef CONFIG_KVM_HYPERV
 	struct kvm_hv hyperv;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 63afdb6bb078..4a37027cc0b8 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5211,7 +5211,9 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 		 * kvmclock on vcpu->cpu migration
 		 */
 		if (!vcpu->kvm->arch.use_master_clock || vcpu->cpu == -1)
-			kvm_make_request(KVM_REQ_GLOBAL_CLOCK_UPDATE, vcpu);
+			if (__ratelimit(&vcpu->kvm->arch.kvmclock_update_rs))
+				kvm_make_request(KVM_REQ_GLOBAL_CLOCK_UPDATE, vcpu);
+
 		if (vcpu->cpu != cpu)
 			kvm_make_request(KVM_REQ_MIGRATE_TIMER, vcpu);
 		vcpu->cpu = cpu;
@@ -13189,6 +13191,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	raw_spin_lock_init(&kvm->arch.tsc_write_lock);
 	mutex_init(&kvm->arch.apic_map_lock);
 	seqcount_raw_spinlock_init(&kvm->arch.pvclock_sc, &kvm->arch.tsc_write_lock);
+	ratelimit_state_init(&kvm->arch.kvmclock_update_rs, HZ, 10);
 	kvm->arch.kvmclock_offset = -get_kvmclock_base_ns();
 
 	raw_spin_lock_irqsave(&kvm->arch.tsc_write_lock, flags);
-- 
2.51.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v1] KVM: x86: Rate-limit global clock updates on vCPU load
  2026-04-07  7:00       ` [PATCH v1] KVM: x86: Rate-limit global clock updates on vCPU load Lei Chen
@ 2026-04-07 18:02         ` Sean Christopherson
  2026-04-09 13:03           ` Lei Chen
                             ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Sean Christopherson @ 2026-04-07 18:02 UTC (permalink / raw)
  To: Lei Chen; +Cc: igor, jan.cipa, jaroslav.pulchart, kvm, linux-kernel, pbonzini

On Tue, Apr 07, 2026, Lei Chen wrote:
> commit 446fcce2a52b ("Revert "x86: kvm: rate-limit global clock updates"")
> dropped the rate limiting for KVM_REQ_GLOBAL_CLOCK_UPDATE.
> 
> As a result, kvm_arch_vcpu_load() can queue global clock update requests
> every time a vCPU is scheduled when the master clock is disabled or when
> the vCPU is loaded for the first time.
> 
> Restore the throttling with a per-VM ratelimit state and gate
> KVM_REQ_GLOBAL_CLOCK_UPDATE through __ratelimit(), so frequent vCPU
> scheduling does not generate a steady stream of redundant clock update
> requests.
> 
> Fixes: 446fcce2a52b ("Revert "x86: kvm: rate-limit global clock updates"")
> Signed-off-by: Lei Chen <lei.chen@smartx.com>
> Reported-by: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com>
> Closes: https://lore.kernel.org/all/CAK8fFZ5gY8_Mw2A=iZVFNVKQNrXQzVsn-HTd+Me9K6ZfmdgA+Q@mail.gmail.com/
> ---
>  arch/x86/include/asm/kvm_host.h | 2 ++
>  arch/x86/kvm/x86.c              | 5 ++++-
>  2 files changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 5a3bfa293e8b..6d3d3f19af01 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1453,6 +1453,8 @@ struct kvm_arch {
>  	bool use_master_clock;
>  	u64 master_kernel_ns;
>  	u64 master_cycle_now;
> +	/* how often to make KVM_REQ_GLOBAL_CLOCK_UPDATE on vcpu sched*/

Eh, I would just omit this comment.  If we want to document the ratelimit,
the function comment above kvm_gen_kvmclock_update() is the best place for it.

> +	struct ratelimit_state kvmclock_update_rs;
>  
>  #ifdef CONFIG_KVM_HYPERV
>  	struct kvm_hv hyperv;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 63afdb6bb078..4a37027cc0b8 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5211,7 +5211,9 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  		 * kvmclock on vcpu->cpu migration
>  		 */
>  		if (!vcpu->kvm->arch.use_master_clock || vcpu->cpu == -1)
> -			kvm_make_request(KVM_REQ_GLOBAL_CLOCK_UPDATE, vcpu);
> +			if (__ratelimit(&vcpu->kvm->arch.kvmclock_update_rs))
> +				kvm_make_request(KVM_REQ_GLOBAL_CLOCK_UPDATE, vcpu);

To maintain pre-revert compatibility, where KVM did this:

	kvm_make_request(KVM_REQ_CLOCK_UPDATE, v);
	schedule_delayed_work(&kvm->arch.kvmclock_update_work,
			      KVMCLOCK_UPDATE_DELAY);


the ratelimit should be on blasting KVM_REQ_CLOCK_UPDATE to *all* vCPUs, but KVM
should still trigger KVM_REQ_CLOCK_UPDATE on the initiating vCPU so that the
immediate update goes through.

That will also apply the ratelimiting to kvm_write_system_time(), though if a
guest is changing system time that fast, it probably has other issues :-)

Completely untested, but this?

---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/x86.c              | 13 +++++--------
 2 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c470e40a00aa..f14009f25a3b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1504,6 +1504,7 @@ struct kvm_arch {
 	bool use_master_clock;
 	u64 master_kernel_ns;
 	u64 master_cycle_now;
+	struct ratelimit_state kvmclock_update_rs;
 
 #ifdef CONFIG_KVM_HYPERV
 	struct kvm_hv hyperv;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0a1b63c63d1a..5dc33f207a83 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3522,16 +3522,12 @@ uint64_t kvm_get_wall_clock_epoch(struct kvm *kvm)
  * The worst case for a remote vcpu to update its kvmclock
  * is then bounded by maximum nohz sleep latency.
  */
-static void kvm_gen_kvmclock_update(struct kvm_vcpu *v)
+static void kvm_gen_kvmclock_update(struct kvm_vcpu *vcpu)
 {
-	unsigned long i;
-	struct kvm_vcpu *vcpu;
-	struct kvm *kvm = v->kvm;
-
-	kvm_for_each_vcpu(i, vcpu, kvm) {
+	if (__ratelimit(&vcpu->kvm->arch.kvmclock_update_rs))
 		kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
-		kvm_vcpu_kick(vcpu);
-	}
+	else
+		kvm_make_all_cpus_request(vcpu->kvm, KVM_REQ_CLOCK_UPDATE);
 }
 
 /* These helpers are safe iff @msr is known to be an MCx bank MSR. */
@@ -13366,6 +13362,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	raw_spin_lock_init(&kvm->arch.tsc_write_lock);
 	mutex_init(&kvm->arch.apic_map_lock);
 	seqcount_raw_spinlock_init(&kvm->arch.pvclock_sc, &kvm->arch.tsc_write_lock);
+	ratelimit_state_init(&kvm->arch.kvmclock_update_rs, HZ, 10);
 	kvm->arch.kvmclock_offset = -get_kvmclock_base_ns();
 
 	raw_spin_lock_irqsave(&kvm->arch.tsc_write_lock, flags);

base-commit: b89df297a47e641581ee67793592e5c6ae0428f4
-- 

* Re: [PATCH v1] KVM: x86: Rate-limit global clock updates on vCPU load
  2026-04-07 18:02         ` Sean Christopherson
@ 2026-04-09 13:03           ` Lei Chen
  2026-04-09 13:36           ` Lei Chen
  2026-04-09 14:22           ` [PATCH v2] " Lei Chen
  2 siblings, 0 replies; 17+ messages in thread
From: Lei Chen @ 2026-04-09 13:03 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: igor, jan.cipa, jaroslav.pulchart, kvm, linux-kernel, pbonzini

On Wed, Apr 8, 2026 at 2:02 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Tue, Apr 07, 2026, Lei Chen wrote:
> > commit 446fcce2a52b ("Revert "x86: kvm: rate-limit global clock updates"")
> > dropped the rate limiting for KVM_REQ_GLOBAL_CLOCK_UPDATE.
> >
> > As a result, kvm_arch_vcpu_load() can queue global clock update requests
> > every time a vCPU is scheduled when the master clock is disabled or when
> > the vCPU is loaded for the first time.
> >
> > Restore the throttling with a per-VM ratelimit state and gate
> > KVM_REQ_GLOBAL_CLOCK_UPDATE through __ratelimit(), so frequent vCPU
> > scheduling does not generate a steady stream of redundant clock update
> > requests.
> >
> > Fixes: 446fcce2a52b ("Revert "x86: kvm: rate-limit global clock updates"")
> > Signed-off-by: Lei Chen <lei.chen@smartx.com>
> > Reported-by: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com>
> > Closes: https://lore.kernel.org/all/CAK8fFZ5gY8_Mw2A=iZVFNVKQNrXQzVsn-HTd+Me9K6ZfmdgA+Q@mail.gmail.com/
> > ---
> >  arch/x86/include/asm/kvm_host.h | 2 ++
> >  arch/x86/kvm/x86.c              | 5 ++++-
> >  2 files changed, 6 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 5a3bfa293e8b..6d3d3f19af01 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -1453,6 +1453,8 @@ struct kvm_arch {
> >       bool use_master_clock;
> >       u64 master_kernel_ns;
> >       u64 master_cycle_now;
> > +     /* how often to make KVM_REQ_GLOBAL_CLOCK_UPDATE on vcpu sched*/
>
> Eh, I would just omit this comment.  If we want to document the ratelimit,
> the function comment above kvm_gen_kvmclock_update() is the best place for it.
>
> > +     struct ratelimit_state kvmclock_update_rs;
> >
> >  #ifdef CONFIG_KVM_HYPERV
> >       struct kvm_hv hyperv;
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 63afdb6bb078..4a37027cc0b8 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -5211,7 +5211,9 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> >                * kvmclock on vcpu->cpu migration
> >                */
> >               if (!vcpu->kvm->arch.use_master_clock || vcpu->cpu == -1)
> > -                     kvm_make_request(KVM_REQ_GLOBAL_CLOCK_UPDATE, vcpu);
> > +                     if (__ratelimit(&vcpu->kvm->arch.kvmclock_update_rs))
> > +                             kvm_make_request(KVM_REQ_GLOBAL_CLOCK_UPDATE, vcpu);
>
> To maintain pre-revert compatibility, where KVM did this:
>
>         kvm_make_request(KVM_REQ_CLOCK_UPDATE, v);
>         schedule_delayed_work(&kvm->arch.kvmclock_update_work,
>                               KVMCLOCK_UPDATE_DELAY);
>
>
> the ratelimit should be on blasting KVM_REQ_CLOCK_UPDATE to *all* vCPUs, but KVM
> should still trigger KVM_REQ_CLOCK_UPDATE on the initiating vCPU so that the
> immediate update goes through.
>
> That will also apply the ratelimiting to kvm_write_system_time(), though if a
> guest is changing system time that fast, it probably has other issues :-)
>
This patch does not impact kvm_write_system_time, which works as follows:

kvm_write_system_time
    kvm_make_request(KVM_REQ_GLOBAL_CLOCK_UPDATE, vcpu);


vcpu_enter_guest
    if (kvm_check_request(KVM_REQ_GLOBAL_CLOCK_UPDATE, vcpu))
         kvm_gen_kvmclock_update(vcpu);
               kvm_for_each_vcpu
                   kvm_vcpu_kick(vcpu);

This patch limits the rate of GLOBAL_CLOCK_UPDATE only in kvm_arch_vcpu_load.

Maybe I missed something?
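
The call flow above can be sketched in userspace as a per-vCPU request
bitmask: making a request sets a bit, and the guest-entry path tests and
clears it before fanning out. This is a simplified model under stated
assumptions, not KVM's implementation; all toy_* names are hypothetical and
the kick is reduced to setting a local request on every vCPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical request numbers, used as bit positions. */
enum toy_vcpu_req {
	TOY_REQ_CLOCK_UPDATE = 0,
	TOY_REQ_GLOBAL_CLOCK_UPDATE = 1,
};

struct toy_vcpu {
	uint64_t requests;	/* one bit per pending request */
};

/* Setting an already-set bit is a no-op, so repeated requests coalesce. */
static void toy_make_request(enum toy_vcpu_req req, struct toy_vcpu *vcpu)
{
	assert(req < 64);
	vcpu->requests |= UINT64_C(1) << req;
}

/* Test-and-clear: reports whether the request was pending, then consumes it. */
static bool toy_check_request(enum toy_vcpu_req req, struct toy_vcpu *vcpu)
{
	uint64_t mask = UINT64_C(1) << req;
	bool pending = (vcpu->requests & mask) != 0;

	vcpu->requests &= ~mask;
	return pending;
}

/*
 * Model of the guest-entry path: a pending global request on @self is turned
 * into a local clock-update request on every vCPU in @all.  Returns how many
 * vCPUs were targeted (0 if no global request was pending).
 */
static unsigned int toy_enter_guest(struct toy_vcpu *self,
				    struct toy_vcpu *all, unsigned int nr)
{
	unsigned int targeted = 0;

	if (!toy_check_request(TOY_REQ_GLOBAL_CLOCK_UPDATE, self))
		return 0;

	for (unsigned int i = 0; i < nr; i++) {
		toy_make_request(TOY_REQ_CLOCK_UPDATE, &all[i]);
		targeted++;
	}
	return targeted;
}
```

In this model the fan-out cost is paid once per serviced global request, so
the number of global requests that reach guest entry is what drives the
amount of cross-vCPU work.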

> Completely untested, but this?
>
> ---
>  arch/x86/include/asm/kvm_host.h |  1 +
>  arch/x86/kvm/x86.c              | 13 +++++--------
>  2 files changed, 6 insertions(+), 8 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index c470e40a00aa..f14009f25a3b 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1504,6 +1504,7 @@ struct kvm_arch {
>         bool use_master_clock;
>         u64 master_kernel_ns;
>         u64 master_cycle_now;
> +       struct ratelimit_state kvmclock_update_rs;
>
>  #ifdef CONFIG_KVM_HYPERV
>         struct kvm_hv hyperv;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 0a1b63c63d1a..5dc33f207a83 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3522,16 +3522,12 @@ uint64_t kvm_get_wall_clock_epoch(struct kvm *kvm)
>   * The worst case for a remote vcpu to update its kvmclock
>   * is then bounded by maximum nohz sleep latency.
>   */
> -static void kvm_gen_kvmclock_update(struct kvm_vcpu *v)
> +static void kvm_gen_kvmclock_update(struct kvm_vcpu *vcpu)
>  {
> -       unsigned long i;
> -       struct kvm_vcpu *vcpu;
> -       struct kvm *kvm = v->kvm;
> -
> -       kvm_for_each_vcpu(i, vcpu, kvm) {
> +       if (!__ratelimit(&vcpu->kvm->arch.kvmclock_update_rs))
>                 kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
> -               kvm_vcpu_kick(vcpu);
> -       }
> +       else
> +               kvm_make_all_cpus_request(vcpu->kvm, KVM_REQ_CLOCK_UPDATE);
>  }
>
>  /* These helpers are safe iff @msr is known to be an MCx bank MSR. */
> @@ -13366,6 +13362,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>         raw_spin_lock_init(&kvm->arch.tsc_write_lock);
>         mutex_init(&kvm->arch.apic_map_lock);
>         seqcount_raw_spinlock_init(&kvm->arch.pvclock_sc, &kvm->arch.tsc_write_lock);
> +       ratelimit_state_init(&kvm->arch.kvmclock_update_rs, HZ, 10);
>         kvm->arch.kvmclock_offset = -get_kvmclock_base_ns();
>
>         raw_spin_lock_irqsave(&kvm->arch.tsc_write_lock, flags);
>
> base-commit: b89df297a47e641581ee67793592e5c6ae0428f4
> --

* Re: [PATCH v1] KVM: x86: Rate-limit global clock updates on vCPU load
  2026-04-07 18:02         ` Sean Christopherson
  2026-04-09 13:03           ` Lei Chen
@ 2026-04-09 13:36           ` Lei Chen
  2026-04-09 14:22           ` [PATCH v2] " Lei Chen
  2 siblings, 0 replies; 17+ messages in thread
From: Lei Chen @ 2026-04-09 13:36 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: igor, jan.cipa, jaroslav.pulchart, kvm, linux-kernel, pbonzini

Hi Sean,

Thanks for your review.

On Wed, Apr 8, 2026 at 2:02 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Tue, Apr 07, 2026, Lei Chen wrote:
> > commit 446fcce2a52b ("Revert "x86: kvm: rate-limit global clock updates"")
> > dropped the rate limiting for KVM_REQ_GLOBAL_CLOCK_UPDATE.
> >
> > As a result, kvm_arch_vcpu_load() can queue global clock update requests
> > every time a vCPU is scheduled when the master clock is disabled or when
> > the vCPU is loaded for the first time.
> >
> > Restore the throttling with a per-VM ratelimit state and gate
> > KVM_REQ_GLOBAL_CLOCK_UPDATE through __ratelimit(), so frequent vCPU
> > scheduling does not generate a steady stream of redundant clock update
> > requests.
> >
> > Fixes: 446fcce2a52b ("Revert "x86: kvm: rate-limit global clock updates"")
> > Signed-off-by: Lei Chen <lei.chen@smartx.com>
> > Reported-by: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com>
> > Closes: https://lore.kernel.org/all/CAK8fFZ5gY8_Mw2A=iZVFNVKQNrXQzVsn-HTd+Me9K6ZfmdgA+Q@mail.gmail.com/
> > ---
> >  arch/x86/include/asm/kvm_host.h | 2 ++
> >  arch/x86/kvm/x86.c              | 5 ++++-
> >  2 files changed, 6 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 5a3bfa293e8b..6d3d3f19af01 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -1453,6 +1453,8 @@ struct kvm_arch {
> >       bool use_master_clock;
> >       u64 master_kernel_ns;
> >       u64 master_cycle_now;
> > +     /* how often to make KVM_REQ_GLOBAL_CLOCK_UPDATE on vcpu sched*/
>
> Eh, I would just omit this comment.  If we want to document the ratelimit,
> the function comment above kvm_gen_kvmclock_update() is the best place for it.
>
OK, I'll remove this comment in the next patch.

> > +     struct ratelimit_state kvmclock_update_rs;
> >
> >  #ifdef CONFIG_KVM_HYPERV
> >       struct kvm_hv hyperv;
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 63afdb6bb078..4a37027cc0b8 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -5211,7 +5211,9 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> >                * kvmclock on vcpu->cpu migration
> >                */
> >               if (!vcpu->kvm->arch.use_master_clock || vcpu->cpu == -1)
> > -                     kvm_make_request(KVM_REQ_GLOBAL_CLOCK_UPDATE, vcpu);
> > +                     if (__ratelimit(&vcpu->kvm->arch.kvmclock_update_rs))
> > +                             kvm_make_request(KVM_REQ_GLOBAL_CLOCK_UPDATE, vcpu);
>
> To maintain pre-revert compatibility, where KVM did this:
>
>         kvm_make_request(KVM_REQ_CLOCK_UPDATE, v);
>         schedule_delayed_work(&kvm->arch.kvmclock_update_work,
>                               KVMCLOCK_UPDATE_DELAY);
>
>
> the ratelimit should be on blasting KVM_REQ_CLOCK_UPDATE to *all* vCPUs, but KVM
> should still trigger KVM_REQ_CLOCK_UPDATE on the initiating vCPU so that the
> immediate update goes through.
>
Yes, this is a mistake, I'll fix it in the next patch.

> That will also apply the ratelimiting to kvm_write_system_time(), though if a
> guest is changing system time that fast, it probably has other issues :-)
>

This patch does not impact kvm_write_system_time, which works as follows:

kvm_write_system_time
    kvm_make_request(KVM_REQ_GLOBAL_CLOCK_UPDATE, vcpu);


vcpu_enter_guest
    if (kvm_check_request(KVM_REQ_GLOBAL_CLOCK_UPDATE, vcpu))
         kvm_gen_kvmclock_update(vcpu);
               kvm_for_each_vcpu
                   kvm_vcpu_kick(vcpu);

This patch limits the rate of GLOBAL_CLOCK_UPDATE only in kvm_arch_vcpu_load.

Maybe I missed something?

Best Regards,
Lei Chen

* [PATCH v2] KVM: x86: Rate-limit global clock updates on vCPU load
  2026-04-07 18:02         ` Sean Christopherson
  2026-04-09 13:03           ` Lei Chen
  2026-04-09 13:36           ` Lei Chen
@ 2026-04-09 14:22           ` Lei Chen
  2026-04-09 19:21             ` Sean Christopherson
  2026-05-06 20:10             ` Jaroslav Pulchart
  2 siblings, 2 replies; 17+ messages in thread
From: Lei Chen @ 2026-04-09 14:22 UTC (permalink / raw)
  To: seanjc
  Cc: igor, jan.cipa, jaroslav.pulchart, kvm, lei.chen, linux-kernel,
	pbonzini

commit 446fcce2a52b ("Revert "x86: kvm: rate-limit global clock updates"")
dropped the rate limiting for KVM_REQ_GLOBAL_CLOCK_UPDATE.

As a result, kvm_arch_vcpu_load() can queue global clock update requests
every time a vCPU is scheduled when the master clock is disabled or when
the vCPU is loaded for the first time.

Restore the throttling with a per-VM ratelimit state and gate
KVM_REQ_GLOBAL_CLOCK_UPDATE through __ratelimit(), so frequent vCPU
scheduling does not generate a steady stream of redundant clock update
requests.

Fixes: 446fcce2a52b ("Revert "x86: kvm: rate-limit global clock updates"")
Signed-off-by: Lei Chen <lei.chen@smartx.com>
Reported-by: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com>
Closes: https://lore.kernel.org/all/CAK8fFZ5gY8_Mw2A=iZVFNVKQNrXQzVsn-HTd+Me9K6ZfmdgA+Q@mail.gmail.com/
---
CHANGELOG:
v2:
- remove the comment on kvmclock_update_rs
- make sure kvm_arch_vcpu_load() still makes KVM_REQ_CLOCK_UPDATE for this vCPU
- add RATELIMIT_MSG_ON_RELEASE to kvmclock_update_rs

v1:
- initial version (https://lore.kernel.org/all/20260407070046.2336-1-lei.chen@smartx.com/)
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/kvm/x86.c              | 11 +++++++++--
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 5a3bfa293e8b..5e750c49d21e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1453,6 +1453,7 @@ struct kvm_arch {
 	bool use_master_clock;
 	u64 master_kernel_ns;
 	u64 master_cycle_now;
+	struct ratelimit_state kvmclock_update_rs;
 
 #ifdef CONFIG_KVM_HYPERV
 	struct kvm_hv hyperv;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 63afdb6bb078..a534e8391611 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5210,8 +5210,13 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 		 * On a host with synchronized TSC, there is no need to update
 		 * kvmclock on vcpu->cpu migration
 		 */
-		if (!vcpu->kvm->arch.use_master_clock || vcpu->cpu == -1)
-			kvm_make_request(KVM_REQ_GLOBAL_CLOCK_UPDATE, vcpu);
+		if (!vcpu->kvm->arch.use_master_clock || vcpu->cpu == -1) {
+			if (__ratelimit(&vcpu->kvm->arch.kvmclock_update_rs))
+				kvm_make_request(KVM_REQ_GLOBAL_CLOCK_UPDATE, vcpu);
+			else
+				kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
+		}
+
 		if (vcpu->cpu != cpu)
 			kvm_make_request(KVM_REQ_MIGRATE_TIMER, vcpu);
 		vcpu->cpu = cpu;
@@ -13189,6 +13194,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	raw_spin_lock_init(&kvm->arch.tsc_write_lock);
 	mutex_init(&kvm->arch.apic_map_lock);
 	seqcount_raw_spinlock_init(&kvm->arch.pvclock_sc, &kvm->arch.tsc_write_lock);
+	ratelimit_state_init(&kvm->arch.kvmclock_update_rs, HZ, 10);
+	ratelimit_set_flags(&kvm->arch.kvmclock_update_rs, RATELIMIT_MSG_ON_RELEASE);
 	kvm->arch.kvmclock_offset = -get_kvmclock_base_ns();
 
 	raw_spin_lock_irqsave(&kvm->arch.tsc_write_lock, flags);
-- 
2.51.0
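
As a rough illustration of what the patch above does, the following userspace
sketch models a windowed rate limiter with an interval and a burst, and uses
its verdict to choose between a global and a local update on vCPU load. It
assumes the usual reading of ratelimit_state_init(rs, HZ, 10) as "at most 10
events per HZ jiffies (one second)". The toy_* names and the explicit `now`
tick argument are hypothetical; the kernel's __ratelimit() reads jiffies
itself and is implemented differently.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a (interval, burst) rate limiter. */
struct toy_ratelimit {
	unsigned long interval;	/* window length in ticks */
	unsigned int burst;	/* events allowed per window */
	unsigned long begin;	/* tick at which the current window started */
	unsigned int used;	/* events already allowed in this window */
};

static void toy_ratelimit_init(struct toy_ratelimit *rs,
			       unsigned long interval, unsigned int burst)
{
	assert(interval > 0);
	rs->interval = interval;
	rs->burst = burst;
	rs->begin = 0;
	rs->used = 0;
}

/* Returns true when the event may proceed, false when it is suppressed. */
static bool toy_ratelimit(struct toy_ratelimit *rs, unsigned long now)
{
	/* Start a fresh window once the current one has fully elapsed. */
	if (now - rs->begin >= rs->interval) {
		rs->begin = now;
		rs->used = 0;
	}
	if (rs->used < rs->burst) {
		rs->used++;
		return true;
	}
	return false;
}

enum toy_load_req { TOY_LOCAL_UPDATE, TOY_GLOBAL_UPDATE };

/* Mirrors the if/else in the patch: global when allowed, local otherwise. */
static enum toy_load_req toy_vcpu_load(struct toy_ratelimit *rs,
				       unsigned long now)
{
	return toy_ratelimit(rs, now) ? TOY_GLOBAL_UPDATE : TOY_LOCAL_UPDATE;
}
```

With an interval of 1000 ticks and a burst of 10, fifty back-to-back loads
yield ten global updates and forty local ones, and the next global update
becomes possible only once tick 1000 is reached.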


* Re: [PATCH v2] KVM: x86: Rate-limit global clock updates on vCPU load
  2026-04-09 14:22           ` [PATCH v2] " Lei Chen
@ 2026-04-09 19:21             ` Sean Christopherson
  2026-05-06  9:48               ` Thorsten Leemhuis
  2026-05-06 20:10             ` Jaroslav Pulchart
  1 sibling, 1 reply; 17+ messages in thread
From: Sean Christopherson @ 2026-04-09 19:21 UTC (permalink / raw)
  To: Lei Chen; +Cc: igor, jan.cipa, jaroslav.pulchart, kvm, linux-kernel, pbonzini

On Thu, Apr 09, 2026, Lei Chen wrote:
> commit 446fcce2a52b ("Revert "x86: kvm: rate-limit global clock updates"")
> dropped the rate limiting for KVM_REQ_GLOBAL_CLOCK_UPDATE.
> 
> As a result, kvm_arch_vcpu_load() can queue global clock update requests
> every time a vCPU is scheduled when the master clock is disabled or when
> the vCPU is loaded for the first time.
> 
> Restore the throttling with a per-VM ratelimit state and gate
> KVM_REQ_GLOBAL_CLOCK_UPDATE through __ratelimit(), so frequent vCPU
> scheduling does not generate a steady stream of redundant clock update
> requests.
> 
> Fixes: 446fcce2a52b ("Revert "x86: kvm: rate-limit global clock updates"")
> Signed-off-by: Lei Chen <lei.chen@smartx.com>
> Reported-by: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com>
> Closes: https://lore.kernel.org/all/CAK8fFZ5gY8_Mw2A=iZVFNVKQNrXQzVsn-HTd+Me9K6ZfmdgA+Q@mail.gmail.com/
> ---
> CHANGELOG:
> v2:
> - remove the comment on kvmclock_update_rs
> - make sure kvm_arch_vcpu_load() still makes KVM_REQ_CLOCK_UPDATE for this vCPU
> - add RATELIMIT_MSG_ON_RELEASE to kvmclock_update_rs
> 
> v1:
> - initial version (https://lore.kernel.org/all/20260407070046.2336-1-lei.chen@smartx.com/)
> ---
>  arch/x86/include/asm/kvm_host.h |  1 +
>  arch/x86/kvm/x86.c              | 11 +++++++++--
>  2 files changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 5a3bfa293e8b..5e750c49d21e 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1453,6 +1453,7 @@ struct kvm_arch {
>  	bool use_master_clock;
>  	u64 master_kernel_ns;
>  	u64 master_cycle_now;
> +	struct ratelimit_state kvmclock_update_rs;
>  
>  #ifdef CONFIG_KVM_HYPERV
>  	struct kvm_hv hyperv;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 63afdb6bb078..a534e8391611 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5210,8 +5210,13 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  		 * On a host with synchronized TSC, there is no need to update
>  		 * kvmclock on vcpu->cpu migration
>  		 */
> -		if (!vcpu->kvm->arch.use_master_clock || vcpu->cpu == -1)
> -			kvm_make_request(KVM_REQ_GLOBAL_CLOCK_UPDATE, vcpu);
> +		if (!vcpu->kvm->arch.use_master_clock || vcpu->cpu == -1) {
> +			if (__ratelimit(&vcpu->kvm->arch.kvmclock_update_rs))
> +				kvm_make_request(KVM_REQ_GLOBAL_CLOCK_UPDATE, vcpu);
> +			else
> +				kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);

What I was trying to call out in my review of v1 is that, prior to commit
446fcce2a52b, the ratelimiting effectively applied to *all* instances of
KVM_REQ_GLOBAL_CLOCK_UPDATE, which meant that, pre-revert, writes via
kvm_write_system_time() were subject to the ratelimiting as well.

That said, I don't see any obvious problems with immediately honoring writes to
MSR_KVM_SYSTEM_TIME{,_NEW}, and it's probably a much better experience for the
guest.  So I'm a-ok with this approach, but we should call out that skipping the
synthetic MSR case is deliberate.  No need for a v3, I'll add a blurb when
applying.

> +		}
> +
>  		if (vcpu->cpu != cpu)
>  			kvm_make_request(KVM_REQ_MIGRATE_TIMER, vcpu);
>  		vcpu->cpu = cpu;
> @@ -13189,6 +13194,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>  	raw_spin_lock_init(&kvm->arch.tsc_write_lock);
>  	mutex_init(&kvm->arch.apic_map_lock);
>  	seqcount_raw_spinlock_init(&kvm->arch.pvclock_sc, &kvm->arch.tsc_write_lock);
> +	ratelimit_state_init(&kvm->arch.kvmclock_update_rs, HZ, 10);
> +	ratelimit_set_flags(&kvm->arch.kvmclock_update_rs, RATELIMIT_MSG_ON_RELEASE);

IIUC, so long as KVM doesn't explicitly invoke ratelimit_state_exit(), setting
RATELIMIT_MSG_ON_RELEASE means we won't get dmesg spam?  To be clear, I'm 100%
in favor of suppressing dmesg output.
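
The question above can be made concrete with a small userspace model. It
encodes one reading of the flag, which is an assumption here and is not
confirmed anywhere in this thread: without the flag, suppressed events are
reported when a new window starts; with it, the report is deferred until the
state is explicitly torn down. All toy_* names, including TOY_MSG_ON_RELEASE,
are hypothetical stand-ins and not the kernel's definitions.

```c
#include <assert.h>
#include <limits.h>

#define TOY_MSG_ON_RELEASE 0x1u	/* hypothetical mirror of the real flag */

struct toy_msg_rs {
	unsigned int flags;
	unsigned int missed;	/* suppressed events not yet reported */
	unsigned int reports;	/* "callbacks suppressed" lines emitted */
};

/* Record one suppressed event. */
static void toy_suppress(struct toy_msg_rs *rs)
{
	assert(rs->missed < UINT_MAX);
	rs->missed++;
}

/* Called when a new window starts: reports only if the flag is clear. */
static void toy_window_rollover(struct toy_msg_rs *rs)
{
	if (rs->missed && !(rs->flags & TOY_MSG_ON_RELEASE)) {
		rs->reports++;
		rs->missed = 0;
	}
}

/* Called on explicit teardown: reports only if the flag is set. */
static void toy_state_exit(struct toy_msg_rs *rs)
{
	if ((rs->flags & TOY_MSG_ON_RELEASE) && rs->missed) {
		rs->reports++;
		rs->missed = 0;
	}
}
```

Under this model, a state with the flag set that never reaches
toy_state_exit() emits no report at all, which matches the "no dmesg spam"
expectation stated above.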

>  	kvm->arch.kvmclock_offset = -get_kvmclock_base_ns();
>  
>  	raw_spin_lock_irqsave(&kvm->arch.tsc_write_lock, flags);
> -- 
> 2.51.0
> 

* Re: [PATCH v2] KVM: x86: Rate-limit global clock updates on vCPU load
  2026-04-09 19:21             ` Sean Christopherson
@ 2026-05-06  9:48               ` Thorsten Leemhuis
  2026-05-06 12:55                 ` Sean Christopherson
  0 siblings, 1 reply; 17+ messages in thread
From: Thorsten Leemhuis @ 2026-05-06  9:48 UTC (permalink / raw)
  To: Sean Christopherson, Lei Chen
  Cc: igor, jan.cipa, jaroslav.pulchart, kvm, linux-kernel, pbonzini,
	Linux kernel regressions list

On 4/9/26 21:21, Sean Christopherson wrote:
> On Thu, Apr 09, 2026, Lei Chen wrote:
>> commit 446fcce2a52b ("Revert "x86: kvm: rate-limit global clock updates"")
>> dropped the rate limiting for KVM_REQ_GLOBAL_CLOCK_UPDATE.
>>
>> As a result, kvm_arch_vcpu_load() can queue global clock update requests
>> every time a vCPU is scheduled when the master clock is disabled or when
>> the vCPU is loaded for the first time.
>>
>> Restore the throttling with a per-VM ratelimit state and gate
>> KVM_REQ_GLOBAL_CLOCK_UPDATE through __ratelimit(), so frequent vCPU
>> scheduling does not generate a steady stream of redundant clock update
>> requests.
>>
>> Fixes: 446fcce2a52b ("Revert "x86: kvm: rate-limit global clock updates"")
>> Signed-off-by: Lei Chen <lei.chen@smartx.com>
>> Reported-by: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com>
>> Closes: https://lore.kernel.org/all/CAK8fFZ5gY8_Mw2A=iZVFNVKQNrXQzVsn-HTd+Me9K6ZfmdgA+Q@mail.gmail.com/

Was this performance regression ever addressed? Looks like this fell
through the cracks, but it's easy to miss something.

Ciao, Thorsten

>> ---
>> CHANGELOG:
>> v2:
>> - remove the comment on kvmclock_update_rs
>> - make sure kvm_arch_vcpu_load() still makes KVM_REQ_CLOCK_UPDATE for this vCPU
>> - add RATELIMIT_MSG_ON_RELEASE to kvmclock_update_rs
>>
>> v1:
>> - initial version (https://lore.kernel.org/all/20260407070046.2336-1-lei.chen@smartx.com/)
>> ---
>>  arch/x86/include/asm/kvm_host.h |  1 +
>>  arch/x86/kvm/x86.c              | 11 +++++++++--
>>  2 files changed, 10 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index 5a3bfa293e8b..5e750c49d21e 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -1453,6 +1453,7 @@ struct kvm_arch {
>>  	bool use_master_clock;
>>  	u64 master_kernel_ns;
>>  	u64 master_cycle_now;
>> +	struct ratelimit_state kvmclock_update_rs;
>>  
>>  #ifdef CONFIG_KVM_HYPERV
>>  	struct kvm_hv hyperv;
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 63afdb6bb078..a534e8391611 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -5210,8 +5210,13 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>>  		 * On a host with synchronized TSC, there is no need to update
>>  		 * kvmclock on vcpu->cpu migration
>>  		 */
>> -		if (!vcpu->kvm->arch.use_master_clock || vcpu->cpu == -1)
>> -			kvm_make_request(KVM_REQ_GLOBAL_CLOCK_UPDATE, vcpu);
>> +		if (!vcpu->kvm->arch.use_master_clock || vcpu->cpu == -1) {
>> +			if (__ratelimit(&vcpu->kvm->arch.kvmclock_update_rs))
>> +				kvm_make_request(KVM_REQ_GLOBAL_CLOCK_UPDATE, vcpu);
>> +			else
>> +				kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
> 
> What I was trying to call out in my review of v1 is that, prior to commit
> 446fcce2a52b, the ratelimiting effectively applied to *all* instances of
> KVM_REQ_GLOBAL_CLOCK_UPDATE, which meant that, pre-revert, writes via
> kvm_write_system_time() were subject to the ratelimiting as well.
> 
> That said, I don't see any obvious problems with immediately honoring writes to
> MSR_KVM_SYSTEM_TIME{,_NEW}, and it's probably a much better experience for the
> guest.  So I'm a-ok with this approach, but we should call out that skipping the
> synthetic MSR case is deliberate.  No need for a v3, I'll add a blurb when
> applying.
> 
>> +		}
>> +
>>  		if (vcpu->cpu != cpu)
>>  			kvm_make_request(KVM_REQ_MIGRATE_TIMER, vcpu);
>>  		vcpu->cpu = cpu;
>> @@ -13189,6 +13194,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>>  	raw_spin_lock_init(&kvm->arch.tsc_write_lock);
>>  	mutex_init(&kvm->arch.apic_map_lock);
>>  	seqcount_raw_spinlock_init(&kvm->arch.pvclock_sc, &kvm->arch.tsc_write_lock);
>> +	ratelimit_state_init(&kvm->arch.kvmclock_update_rs, HZ, 10);
>> +	ratelimit_set_flags(&kvm->arch.kvmclock_update_rs, RATELIMIT_MSG_ON_RELEASE);
> 
> IIUC, so long as KVM doesn't explicitly invoke ratelimit_state_exit(), setting
> RATELIMIT_MSG_ON_RELEASE means we won't get dmesg spam?  To be clear, I'm 100%
> in favor of suppressing dmesg output.
> 
>>  	kvm->arch.kvmclock_offset = -get_kvmclock_base_ns();
>>  
>>  	raw_spin_lock_irqsave(&kvm->arch.tsc_write_lock, flags);
>> -- 
>> 2.51.0
>>
> 


* Re: [PATCH v2] KVM: x86: Rate-limit global clock updates on vCPU load
  2026-05-06  9:48               ` Thorsten Leemhuis
@ 2026-05-06 12:55                 ` Sean Christopherson
  2026-05-06 14:09                   ` Thorsten Leemhuis
  0 siblings, 1 reply; 17+ messages in thread
From: Sean Christopherson @ 2026-05-06 12:55 UTC (permalink / raw)
  To: Thorsten Leemhuis
  Cc: Lei Chen, igor, jan.cipa, jaroslav.pulchart, kvm, linux-kernel,
	pbonzini, Linux kernel regressions list

On Wed, May 06, 2026, Thorsten Leemhuis wrote:
> On 4/9/26 21:21, Sean Christopherson wrote:
> > On Thu, Apr 09, 2026, Lei Chen wrote:
> >> commit 446fcce2a52b ("Revert "x86: kvm: rate-limit global clock updates"")
> >> dropped the rate limiting for KVM_REQ_GLOBAL_CLOCK_UPDATE.
> >>
> >> As a result, kvm_arch_vcpu_load() can queue global clock update requests
> >> every time a vCPU is scheduled when the master clock is disabled or when
> >> the vCPU is loaded for the first time.
> >>
> >> Restore the throttling with a per-VM ratelimit state and gate
> >> KVM_REQ_GLOBAL_CLOCK_UPDATE through __ratelimit(), so frequent vCPU
> >> scheduling does not generate a steady stream of redundant clock update
> >> requests.
> >>
> >> Fixes: 446fcce2a52b ("Revert "x86: kvm: rate-limit global clock updates"")
> >> Signed-off-by: Lei Chen <lei.chen@smartx.com>
> >> Reported-by: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com>
> >> Closes: https://lore.kernel.org/all/CAK8fFZ5gY8_Mw2A=iZVFNVKQNrXQzVsn-HTd+Me9K6ZfmdgA+Q@mail.gmail.com/
> 
> Was this performance regression ever addressed?

Nope, not yet.

> Looks like this fell through the cracks, but it's easy to miss something.

It's in my list of patches to apply (probably for 7.2?).  I didn't want to squeeze
it into the initial 7.1 pull request for a variety of reasons.

* Re: [PATCH v2] KVM: x86: Rate-limit global clock updates on vCPU load
  2026-05-06 12:55                 ` Sean Christopherson
@ 2026-05-06 14:09                   ` Thorsten Leemhuis
  2026-05-06 15:22                     ` Sean Christopherson
  0 siblings, 1 reply; 17+ messages in thread
From: Thorsten Leemhuis @ 2026-05-06 14:09 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Lei Chen, igor, jan.cipa, jaroslav.pulchart, kvm, linux-kernel,
	pbonzini, Linux kernel regressions list, Linus Torvalds

On 5/6/26 14:55, Sean Christopherson wrote:
> On Wed, May 06, 2026, Thorsten Leemhuis wrote:
>> On 4/9/26 21:21, Sean Christopherson wrote:
>>> On Thu, Apr 09, 2026, Lei Chen wrote:
>>>> commit 446fcce2a52b ("Revert "x86: kvm: rate-limit global clock updates"")
>>>> dropped the rate limiting for KVM_REQ_GLOBAL_CLOCK_UPDATE.
>>>>
>>>> As a result, kvm_arch_vcpu_load() can queue global clock update requests
>>>> every time a vCPU is scheduled when the master clock is disabled or when
>>>> the vCPU is loaded for the first time.
>>>>
>>>> Restore the throttling with a per-VM ratelimit state and gate
>>>> KVM_REQ_GLOBAL_CLOCK_UPDATE through __ratelimit(), so frequent vCPU
>>>> scheduling does not generate a steady stream of redundant clock update
>>>> requests.
>>>>
>>>> Fixes: 446fcce2a52b ("Revert "x86: kvm: rate-limit global clock updates"")
>>>> Signed-off-by: Lei Chen <lei.chen@smartx.com>
>>>> Reported-by: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com>
>>>> Closes: https://lore.kernel.org/all/CAK8fFZ5gY8_Mw2A=iZVFNVKQNrXQzVsn-HTd+Me9K6ZfmdgA+Q@mail.gmail.com/
>>
>> Was this performance regression ever addressed?
> Nope, not yet.
> 
>> Looks like this fell through the cracks, but it's easy to miss something.
> 
> It's in my list of patches to apply (probably for 7.2?).  I didn't want to squeeze
> it into the initial 7.1 pull request for a variety of reasons.

Hmmm. CCing Linus so he can speak up if he wants to about the following:

Given that this is a fix for a performance regression[1] I'd say it's
not as urgent as a "something stopped working" case -- so I guess it's
something where the "[fix] "within a week", preferably before the next
rc" approach Linus recently mentioned does not need to be applied strictly.

But Jaroslav OTOH reported it more than 7 weeks ago already and back
then called it something that "severely impacts KVM hosts running many
Firecracker microVMs."[1]; and a potential fix exists for 4 weeks
already. Due to that, 7.2 feels a bit too far away for me, as that is
still ~15 weeks away. But maybe that's just me.

Ciao, Thorsten

[1]
https://lore.kernel.org/all/CAK8fFZ5gY8_Mw2A=iZVFNVKQNrXQzVsn-HTd+Me9K6ZfmdgA+Q@mail.gmail.com/

* Re: [PATCH v2] KVM: x86: Rate-limit global clock updates on vCPU load
  2026-05-06 14:09                   ` Thorsten Leemhuis
@ 2026-05-06 15:22                     ` Sean Christopherson
  2026-05-06 15:58                       ` Jaroslav Pulchart
  0 siblings, 1 reply; 17+ messages in thread
From: Sean Christopherson @ 2026-05-06 15:22 UTC (permalink / raw)
  To: Thorsten Leemhuis
  Cc: Lei Chen, igor, jan.cipa, jaroslav.pulchart, kvm, linux-kernel,
	pbonzini, Linux kernel regressions list, Linus Torvalds

On Wed, May 06, 2026, Thorsten Leemhuis wrote:
> On 5/6/26 14:55, Sean Christopherson wrote:
> > On Wed, May 06, 2026, Thorsten Leemhuis wrote:
> >> On 4/9/26 21:21, Sean Christopherson wrote:
> >>> On Thu, Apr 09, 2026, Lei Chen wrote:
> >>>> commit 446fcce2a52b ("Revert "x86: kvm: rate-limit global clock updates"")
> >>>> dropped the rate limiting for KVM_REQ_GLOBAL_CLOCK_UPDATE.
> >>>>
> >>>> As a result, kvm_arch_vcpu_load() can queue global clock update requests
> >>>> every time a vCPU is scheduled when the master clock is disabled or when
> >>>> the vCPU is loaded for the first time.
> >>>>
> >>>> Restore the throttling with a per-VM ratelimit state and gate
> >>>> KVM_REQ_GLOBAL_CLOCK_UPDATE through __ratelimit(), so frequent vCPU
> >>>> scheduling does not generate a steady stream of redundant clock update
> >>>> requests.
> >>>>
> >>>> Fixes: 446fcce2a52b ("Revert "x86: kvm: rate-limit global clock updates"")
> >>>> Signed-off-by: Lei Chen <lei.chen@smartx.com>
> >>>> Reported-by: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com>
> >>>> Closes: https://lore.kernel.org/all/CAK8fFZ5gY8_Mw2A=iZVFNVKQNrXQzVsn-HTd+Me9K6ZfmdgA+Q@mail.gmail.com/
> >>
> >> Was this performance regression ever addressed?
> > Nope, not yet.
> > 
> >> Looks like this fell through the cracks, but it's easy to miss something.
> > 
> > It's in my list of patches to apply (probably for 7.2?).  I didn't want to squeeze
> > it into the initial 7.1 pull request for a variety of reasons.
> 
> Hmmm. CCing Linus so he can speak up if he wants to about the following:
> 
> Given that this is a fix for a performance regression[1] I'd say it's
> not as urgent as a "something stopped working" case -- so I guess it's
> something where the "[fix] "within a week", preferably before the next
> rc" approach Linus recently mentioned does not need to be applied strictly.
> 
> But Jaroslav OTOH reported it more than 7 weeks ago already and back
> then called it something that "severely impacts KVM hosts running many
> Firecracker microVMs."[1]; 

For a setup that is likely broken.  On modern hardware, the path in question
should never actually be hit.  I do want to resolve the bug since older hardware
and funky setups do rely on the old behavior, but it's not pants-on-fire urgent.

More importantly, the original reporter(s) hasn't responded to any of our questions,
or to the proposed fix.  I'm not going to rush in a fix if I don't actually *know*
it's going to fix the original problem.

> and a potential fix exists for 4 weeks already. Due to that, 7.2 feels a bit
> too far away for me, as that is still ~15 weeks away. But maybe that's just
> me.

The "user" is also a fairly sizeable company, not some random person that's trying
to use KVM and is blocked.  I highly doubt they are still actually running a buggy
kernel.  E.g. based on a "same workload after rollback" comment in the bug report,
I assume they simply rolled back to the last good kernel (6.18).

Who knows, maybe they also took our hints/suggestions about their setup being
wonky and addressed whatever hiccup was sending them down the uncommon, already-
slow path.

All in all, AFAICT the only difference between sending this into 7.1 vs. 7.2 is
that the reporter won't be able to upgrade their kernel (without patching) for an
extra ~8 weeks.


* Re: [PATCH v2] KVM: x86: Rate-limit global clock updates on vCPU load
  2026-05-06 15:22                     ` Sean Christopherson
@ 2026-05-06 15:58                       ` Jaroslav Pulchart
  2026-05-06 20:31                         ` Sean Christopherson
  0 siblings, 1 reply; 17+ messages in thread
From: Jaroslav Pulchart @ 2026-05-06 15:58 UTC (permalink / raw)
  To: Sean Christopherson, Thorsten Leemhuis
  Cc: Lei Chen, igor, jan.cipa, kvm, linux-kernel, pbonzini,
	Linux kernel regressions list, Linus Torvalds

>
> On Wed, May 06, 2026, Thorsten Leemhuis wrote:
> > On 5/6/26 14:55, Sean Christopherson wrote:
> > > On Wed, May 06, 2026, Thorsten Leemhuis wrote:
> > >> On 4/9/26 21:21, Sean Christopherson wrote:
> > >>> On Thu, Apr 09, 2026, Lei Chen wrote:
> > >>>> commit 446fcce2a52b ("Revert "x86: kvm: rate-limit global clock updates"")
> > >>>> dropped the rate limiting for KVM_REQ_GLOBAL_CLOCK_UPDATE.
> > >>>>
> > >>>> As a result, kvm_arch_vcpu_load() can queue global clock update requests
> > >>>> every time a vCPU is scheduled when the master clock is disabled or when
> > >>>> the vCPU is loaded for the first time.
> > >>>>
> > >>>> Restore the throttling with a per-VM ratelimit state and gate
> > >>>> KVM_REQ_GLOBAL_CLOCK_UPDATE through __ratelimit(), so frequent vCPU
> > >>>> scheduling does not generate a steady stream of redundant clock update
> > >>>> requests.
> > >>>>
> > >>>> Fixes: 446fcce2a52b ("Revert "x86: kvm: rate-limit global clock updates"")
> > >>>> Signed-off-by: Lei Chen <lei.chen@smartx.com>
> > >>>> Reported-by: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com>
> > >>>> Closes: https://lore.kernel.org/all/CAK8fFZ5gY8_Mw2A=iZVFNVKQNrXQzVsn-HTd+Me9K6ZfmdgA+Q@mail.gmail.com/
> > >>
> > >> Was this performance regression ever addressed?
> > > Nope, not yet.
> > >
> > >> Looks like this fell through the cracks, but it's easy to miss something.
> > >
> > > It's in my list of patches to apply (probably for 7.2?).  I didn't want to squeeze
> > > it into the initial 7.1 pull request for a variety of reasons.
> >
> > Hmmm. CCing Linus so he can speak up if he wants to about the following:
> >
> > Given that this is a fix for a performance regression[1] I'd say it's
> > not as urgent as a "something stopped working" case -- so I guess it's
> > something where the "[fix] "within a week", preferably before the next
> > rc" approach Linus recently mentioned does not need to be applied strictly.
> >
> > But Jaroslav OTOH reported it more than 7 weeks ago already and back
> > then called it something that "severely impacts KVM hosts running many
> > Firecracker microVMs."[1];
>
> For a setup that is likely broken.  On modern hardware, the path in question
> should never actually be hit.  I do want to resolve the bug since older hardware
> and funky setups do rely on the old behavior, but it's not pants-on-fire urgent.
>
> More importantly, the original reporter(s) hasn't responded to any of our questions,
> or to the proposed fix.  I'm not going to rush in a fix if I don't actually *know*
> it's going to fix the original problem.

Hi Sean, Thorsten,

Sorry for the lack of response from my side. This thread unfortunately
ended up in the trash due to my mail filters and I completely missed
it. I don't have the full context loaded back in yet, but I'll re-read
the thread and follow up properly once I do.

For additional context, we are currently running the latest 6.19/7.0.y
kernels with a revert of the commits causing the reported regression,
and the hardware is an AMD EPYC 9454P 48-Core Processor.

Jaroslav

>
> > and a potential fix exists for 4 weeks already. Due to that, 7.2 feels a bit
> > too far away for me, as that is still ~15 weeks away. But maybe that's just
> > me.
>
> The "user" is also a fairly sizeable company, not some random person that's trying
> to use KVM and is blocked.  I highly doubt they are still actually running a buggy
> kernel.  E.g. based on a "same workload after rollback" comment in the bug report,
> I assume they simply rolled back to the last good kernel (6.18).
>
> Who knows, maybe they also took our hints/suggestions about their setup being
> wonky and addressed whatever hiccup was sending them down the uncommon, already-
> slow path.
>
> All in all, AFAICT the only difference between sending this into 7.1 vs. 7.2 is
> that the reporter won't be able to upgrade their kernel (without patching) for an
> extra ~8 weeks.


* Re: [PATCH v2] KVM: x86: Rate-limit global clock updates on vCPU load
  2026-05-06 15:58                       ` Jaroslav Pulchart
@ 2026-05-06 20:31                         ` Sean Christopherson
  0 siblings, 0 replies; 17+ messages in thread
From: Sean Christopherson @ 2026-05-06 20:31 UTC (permalink / raw)
  To: Jaroslav Pulchart
  Cc: Thorsten Leemhuis, Lei Chen, igor, jan.cipa, kvm, linux-kernel,
	pbonzini, Linux kernel regressions list, Linus Torvalds

On Wed, May 06, 2026, Jaroslav Pulchart wrote:
> > On Wed, May 06, 2026, Thorsten Leemhuis wrote:
> > > On 5/6/26 14:55, Sean Christopherson wrote:
> > > > On Wed, May 06, 2026, Thorsten Leemhuis wrote:
> > > >> On 4/9/26 21:21, Sean Christopherson wrote:
> > > >>> On Thu, Apr 09, 2026, Lei Chen wrote:
> > > >>>> commit 446fcce2a52b ("Revert "x86: kvm: rate-limit global clock updates"")
> > > >>>> dropped the rate limiting for KVM_REQ_GLOBAL_CLOCK_UPDATE.
> > > >>>>
> > > >>>> As a result, kvm_arch_vcpu_load() can queue global clock update requests
> > > >>>> every time a vCPU is scheduled when the master clock is disabled or when
> > > >>>> the vCPU is loaded for the first time.
> > > >>>>
> > > >>>> Restore the throttling with a per-VM ratelimit state and gate
> > > >>>> KVM_REQ_GLOBAL_CLOCK_UPDATE through __ratelimit(), so frequent vCPU
> > > >>>> scheduling does not generate a steady stream of redundant clock update
> > > >>>> requests.
> > > >>>>
> > > >>>> Fixes: 446fcce2a52b ("Revert "x86: kvm: rate-limit global clock updates"")
> > > >>>> Signed-off-by: Lei Chen <lei.chen@smartx.com>
> > > >>>> Reported-by: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com>
> > > >>>> Closes: https://lore.kernel.org/all/CAK8fFZ5gY8_Mw2A=iZVFNVKQNrXQzVsn-HTd+Me9K6ZfmdgA+Q@mail.gmail.com/
> > > >>
> > > >> Was this performance regression ever addressed?
> > > > Nope, not yet.
> > > >
> > > >> Looks like this fell through the cracks, but it's easy to miss something.
> > > >
> > > > It's in my list of patches to apply (probably for 7.2?).  I didn't want to squeeze
> > > > it into the initial 7.1 pull request for a variety of reasons.
> > >
> > > Hmmm. CCing Linus so he can speak up if he wants to about the following:
> > >
> > > Given that this is a fix for a performance regression[1] I'd say it's
> > > not as urgent as a "something stopped working" case -- so I guess it's
> > > something where the "[fix] "within a week", preferably before the next
> > > rc" approach Linus recently mentioned does not need to be applied strictly.
> > >
> > > But Jaroslav OTOH reported it more than 7 weeks ago already and back
> > > then called it something that "severely impacts KVM hosts running many
> > > Firecracker microVMs."[1];
> >
> > For a setup that is likely broken.  On modern hardware, the path in question
> > should never actually be hit.  I do want to resolve the bug since older hardware
> > and funky setups do rely on the old behavior, but it's not pants-on-fire urgent.
> >
> > More importantly, the original reporter(s) hasn't responded to any of our questions,
> > or to the proposed fix.  I'm not going to rush in a fix if I don't actually *know*
> > it's going to fix the original problem.
> 
> Hi Sean, Thorsten,
> 
> sorry for the missing response from my side, this thread unfortunately
> ended up in trash due to mail filters on my side and I completely
> missed it.

No worries, gmail's Spam filter is my nemesis :-)

> I currently don't have the full context loaded back in yet, but I'll re-read
> the thread and follow up properly once I do.

I think the only remaining question is why/how KVM's master clock is getting
disabled.  But that's more of a question for your deployment than it is a question
for upstream; it's possible there's a different KVM bug lurking, but it's more
likely that something in your setup is incompatible with using the master clock.

Note, it's certainly not "wrong" for the master clock to be disabled, but it's
quite surprising, especially for Firecracker VMs.  It's worth investigating as
there might be an underlying issue that's very easy to address, and "fixing" it
should provide (very) small performance benefits.
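
As a starting point for that investigation, here is a minimal user-space
sketch that reports the host's current clocksource.  It assumes (this is not
stated in the thread) that a non-TSC host clocksource is one of the reasons
the master clock can end up disabled; it checks only that single
precondition, so a result of "tsc" does not prove the master clock is in use.
The sysfs path is the standard clocksource attribute; the helper names are
made up for illustration.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Standard sysfs attribute exposing the clocksource the host is using. */
#define CLOCKSOURCE_PATH \
	"/sys/devices/system/clocksource/clocksource0/current_clocksource"

/* Read the first line of @path into @buf and strip the trailing newline. */
static bool read_first_line(const char *path, char *buf, size_t len)
{
	FILE *f = fopen(path, "r");
	bool ok;

	if (!f)
		return false;

	ok = fgets(buf, (int)len, f) != NULL;
	fclose(f);

	if (ok)
		buf[strcspn(buf, "\n")] = '\0';
	return ok;
}

/* Hypothetical helper: true if the named clocksource is the plain TSC. */
static bool clocksource_is_tsc(const char *name)
{
	return strcmp(name, "tsc") == 0;
}

/* Returns 1 if the host clocksource is "tsc", 0 if not, -1 if unreadable. */
static int check_host_clocksource(const char *path)
{
	char name[64];

	if (!read_first_line(path, name, sizeof(name))) {
		fprintf(stderr, "cannot read %s\n", path);
		return -1;
	}

	printf("host clocksource: %s\n", name);
	return clocksource_is_tsc(name) ? 1 : 0;
}
```

Calling check_host_clocksource(CLOCKSOURCE_PATH) on the host and getting 0
would be worth following up on; other causes would need to be checked
separately.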


* Re: [PATCH v2] KVM: x86: Rate-limit global clock updates on vCPU load
  2026-04-09 14:22           ` [PATCH v2] " Lei Chen
  2026-04-09 19:21             ` Sean Christopherson
@ 2026-05-06 20:10             ` Jaroslav Pulchart
  1 sibling, 0 replies; 17+ messages in thread
From: Jaroslav Pulchart @ 2026-05-06 20:10 UTC (permalink / raw)
  To: Lei Chen; +Cc: seanjc, igor, jan.cipa, kvm, linux-kernel, pbonzini

>
> commit 446fcce2a52b ("Revert "x86: kvm: rate-limit global clock updates"")
> dropped the rate limiting for KVM_REQ_GLOBAL_CLOCK_UPDATE.
>
> As a result, kvm_arch_vcpu_load() can queue global clock update requests
> every time a vCPU is scheduled when the master clock is disabled or when
> the vCPU is loaded for the first time.
>
> Restore the throttling with a per-VM ratelimit state and gate
> KVM_REQ_GLOBAL_CLOCK_UPDATE through __ratelimit(), so frequent vCPU
> scheduling does not generate a steady stream of redundant clock update
> requests.
>
> Fixes: 446fcce2a52b ("Revert "x86: kvm: rate-limit global clock updates"")
> Signed-off-by: Lei Chen <lei.chen@smartx.com>
> Reported-by: Jaroslav Pulchart <jaroslav.pulchart@gooddata.com>
> Closes: https://lore.kernel.org/all/CAK8fFZ5gY8_Mw2A=iZVFNVKQNrXQzVsn-HTd+Me9K6ZfmdgA+Q@mail.gmail.com/
> ---
> CHANGELOG:
> v2:
> - remove comment of kvmclock_update_rs
> - make sure kvm_arch_vcpu_load make KVM_REQ_CLOCK_UPDATE for this vcpu
> - add RATELIMIT_MSG_ON_RELEASE to kvmclock_update_rs
>
> v1:
> - initial version(https://lore.kernel.org/all/20260407070046.2336-1-lei.chen@smartx.com/)
> ---
>  arch/x86/include/asm/kvm_host.h |  1 +
>  arch/x86/kvm/x86.c              | 11 +++++++++--
>  2 files changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 5a3bfa293e8b..5e750c49d21e 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1453,6 +1453,7 @@ struct kvm_arch {
>         bool use_master_clock;
>         u64 master_kernel_ns;
>         u64 master_cycle_now;
> +       struct ratelimit_state kvmclock_update_rs;
>
>  #ifdef CONFIG_KVM_HYPERV
>         struct kvm_hv hyperv;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 63afdb6bb078..a534e8391611 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5210,8 +5210,13 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>                  * On a host with synchronized TSC, there is no need to update
>                  * kvmclock on vcpu->cpu migration
>                  */
> -               if (!vcpu->kvm->arch.use_master_clock || vcpu->cpu == -1)
> -                       kvm_make_request(KVM_REQ_GLOBAL_CLOCK_UPDATE, vcpu);
> +               if (!vcpu->kvm->arch.use_master_clock || vcpu->cpu == -1) {
> +                       if (__ratelimit(&vcpu->kvm->arch.kvmclock_update_rs))
> +                               kvm_make_request(KVM_REQ_GLOBAL_CLOCK_UPDATE, vcpu);
> +                       else
> +                               kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
> +               }
> +
>                 if (vcpu->cpu != cpu)
>                         kvm_make_request(KVM_REQ_MIGRATE_TIMER, vcpu);
>                 vcpu->cpu = cpu;
> @@ -13189,6 +13194,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>         raw_spin_lock_init(&kvm->arch.tsc_write_lock);
>         mutex_init(&kvm->arch.apic_map_lock);
>         seqcount_raw_spinlock_init(&kvm->arch.pvclock_sc, &kvm->arch.tsc_write_lock);
> +       ratelimit_state_init(&kvm->arch.kvmclock_update_rs, HZ, 10);
> +       ratelimit_set_flags(&kvm->arch.kvmclock_update_rs, RATELIMIT_MSG_ON_RELEASE);
>         kvm->arch.kvmclock_offset = -get_kvmclock_base_ns();
>
>         raw_spin_lock_irqsave(&kvm->arch.tsc_write_lock, flags);
> --
> 2.51.0
>

Hi Lei,

I tested the v2 patch on a Firecracker host running a vanilla 7.0.3
kernel with this patch applied, and the IPI storm on vCPU load is gone!
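
For readers following the patch, the gating it adds can be modelled in user
space.  The sketch below is a simplified stand-in for the window-plus-burst
behaviour of __ratelimit(), not the kernel implementation: the struct, the
function names and the explicit "now" tick argument are all hypothetical, and
window-boundary details may differ from the kernel's.  An allowed call
corresponds to KVM_REQ_GLOBAL_CLOCK_UPDATE, a denied call to the per-vCPU
KVM_REQ_CLOCK_UPDATE fallback in the patch.

```c
#include <stdbool.h>

/* Hypothetical user-space model of a ratelimit state. */
struct rl_model {
	unsigned long interval;	/* window length in ticks (patch: HZ) */
	unsigned int burst;	/* calls allowed per window (patch: 10) */
	unsigned long begin;	/* tick at which the current window began */
	unsigned int used;	/* calls already allowed in this window */
	bool started;		/* false until the first call opens a window */
};

/* Return true if a call at tick @now is within the current window's burst. */
static bool rl_model_allow(struct rl_model *rl, unsigned long now)
{
	/* Open a new window on first use or once the old one has expired. */
	if (!rl->started || now - rl->begin >= rl->interval) {
		rl->begin = now;
		rl->used = 0;
		rl->started = true;
	}

	if (rl->used < rl->burst) {
		rl->used++;
		return true;	/* would request a global clock update */
	}
	return false;		/* would fall back to a per-vCPU update */
}

/*
 * Simulate @loads vCPU loads spaced @step ticks apart and count how many
 * of them would still request a global clock update.
 */
static unsigned int count_global_updates(struct rl_model *rl,
					 unsigned int loads,
					 unsigned long step)
{
	unsigned int i, global = 0;

	for (i = 0; i < loads; i++) {
		if (rl_model_allow(rl, (unsigned long)i * step))
			global++;
	}
	return global;
}
```

With interval = 1000 ticks and burst = 10, simulating 5000 loads one tick
apart lets 50 of them through as global updates; the remaining 4950 take the
per-vCPU path.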


end of thread, other threads:[~2026-05-06 20:31 UTC | newest]

Thread overview: 17+ messages
2026-03-21 14:32 [REGRESSION 6.19, BISECTED] KVM: x86: kvmclock rate-limit removal causes IPI storm and high guest steal time Jaroslav Pulchart
2026-03-23  2:27 ` Lei Chen
2026-04-01  6:43   ` Lei Chen
2026-04-01 21:16     ` Sean Christopherson
2026-04-07  7:00       ` [PATCH v1] KVM: x86: Rate-limit global clock updates on vCPU load Lei Chen
2026-04-07 18:02         ` Sean Christopherson
2026-04-09 13:03           ` Lei Chen
2026-04-09 13:36           ` Lei Chen
2026-04-09 14:22           ` [PATCH v2] " Lei Chen
2026-04-09 19:21             ` Sean Christopherson
2026-05-06  9:48               ` Thorsten Leemhuis
2026-05-06 12:55                 ` Sean Christopherson
2026-05-06 14:09                   ` Thorsten Leemhuis
2026-05-06 15:22                     ` Sean Christopherson
2026-05-06 15:58                       ` Jaroslav Pulchart
2026-05-06 20:31                         ` Sean Christopherson
2026-05-06 20:10             ` Jaroslav Pulchart
