* [REGRESSION 6.19, BISECTED] KVM: x86: kvmclock rate-limit removal causes IPI storm and high guest steal time
From: Jaroslav Pulchart @ 2026-03-21 14:32 UTC
To: kvm; +Cc: LKML, seanjc, pbonzini, lei.chen, Igor Raits, Jan Cipa
Hi,
I am reporting a performance regression in Linux 6.19 that severely
impacts KVM hosts running many Firecracker microVMs.
==== Bisect result ====
446fcce2a52b533c543dabba26777813c347577c is the first bad commit
commit 446fcce2a52b533c543dabba26777813c347577c
Author: Lei Chen <lei.chen@smartx.com>
Date: Tue Aug 19 23:20:26 2025 +0800
Revert "x86: kvm: rate-limit global clock updates"
This reverts commit 7e44e4495a398eb553ce561f29f9148f40a3448f.
Commit 7e44e4495a39 ("x86: kvm: rate-limit global clock updates")
intends to use a kvmclock_update_work to sync ntp corretion
across all vcpus kvmclock, which is based on commit 0061d53daf26f
("KVM: x86: limit difference between kvmclock updates")
Since kvmclock has been switched to mono raw, this commit can be
reverted.
Signed-off-by: Lei Chen <lei.chen@smartx.com>
Link: https://patch.msgid.link/20250819152027.1687487-3-lei.chen@smartx.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
arch/x86/include/asm/kvm_host.h | 1 -
arch/x86/kvm/x86.c | 29 ++++-------------------------
2 files changed, 4 insertions(+), 26 deletions(-)
==== Symptoms ====
Measured on a KVM host running many Firecracker microVMs
(node_exporter metrics, 2026-03-20):
kernel 6.19:
steal time inside guest VMs: 3–24% per vCPU (sustained)
host system CPU (kernel mode): 3–12 CPUs saturated
host steal: 3–8%
kernel 6.18 (same host, same workload after rollback):
steal time inside guest VMs: < 0.02% per vCPU (~200x lower)
host system CPU (kernel mode): 2–3 CPUs
host steal: 0.3–0.5%
==== Root cause (AI-assisted analysis) ====
The regressing commit removes the rate-limiting from
kvm_gen_kvmclock_update(). Previously this function deferred the
all-vCPU kick via a 100ms delayed_work:
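/* 6.18 */
static void kvm_gen_kvmclock_update(struct kvm_vcpu *v)
{
	struct kvm *kvm = v->kvm;

	/* Update the requesting vCPU now... */
	kvm_make_request(KVM_REQ_CLOCK_UPDATE, v);
	/* ...and defer the all-vCPU kick; a no-op if the work is already pending. */
	schedule_delayed_work(&kvm->arch.kvmclock_update_work,
			      KVMCLOCK_UPDATE_DELAY);	/* 100ms */
}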
After the revert it kicks every vCPU of the VM synchronously on
every call:
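/* 6.19 */
static void kvm_gen_kvmclock_update(struct kvm_vcpu *v)
{
	struct kvm *kvm = v->kvm;
	struct kvm_vcpu *vcpu;
	unsigned long i;

	/* Every call requests a clock update on, and kicks, each vCPU of the VM. */
	kvm_for_each_vcpu(i, vcpu, kvm) {
		kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
		kvm_vcpu_kick(vcpu);
	}
}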
KVM_REQ_GLOBAL_CLOCK_UPDATE, whose handler calls kvm_gen_kvmclock_update(),
is issued on every vCPU load when use_master_clock is false
(arch/x86/kvm/x86.c, kvm_arch_vcpu_load()):
	if (!vcpu->kvm->arch.use_master_clock || vcpu->cpu == -1)
		kvm_make_request(KVM_REQ_GLOBAL_CLOCK_UPDATE, vcpu);
With many Firecracker microVMs, the vCPU scheduling rate is high.
Each scheduling event now IPIs every sibling vCPU of the VM, instead
of coalescing all-vCPU kicks into at most one per 100ms. This creates
a continuous IPI storm on the host, visible as high kernel (system)
CPU time and high steal time inside guest VMs.
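The difference between the two policies can be modelled in a few lines of
user-space C. This is only a sketch under stated assumptions: struct
kick_model, kicks_unthrottled() and kicks_coalesced() are hypothetical
names, not kernel code, and the model assumes that requests arrive at a
fixed interval and that every broadcast kicks each vCPU of the VM once.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model parameters; none of these names exist in the kernel. */
struct kick_model {
	uint64_t duration_ms;      /* length of the observation window */
	uint64_t load_interval_ms; /* one global clock update request every N ms */
	uint32_t nr_vcpus;         /* number of vCPUs in the VM */
};

/* 6.19-style model: every request immediately kicks all vCPUs. */
static uint64_t kicks_unthrottled(const struct kick_model *m)
{
	assert(m->load_interval_ms > 0);

	uint64_t requests = m->duration_ms / m->load_interval_ms;

	return requests * m->nr_vcpus;
}

/*
 * 6.18-style model: the first request arms a delayed work item, requests
 * that arrive while it is armed are absorbed, and all vCPUs are kicked
 * once when the work fires after delay_ms.
 */
static uint64_t kicks_coalesced(const struct kick_model *m, uint64_t delay_ms)
{
	assert(m->load_interval_ms > 0);

	uint64_t kicks = 0;
	uint64_t fire_at = 0;
	int armed = 0;

	for (uint64_t t = 0; t < m->duration_ms; t += m->load_interval_ms) {
		/* The pending work fires before this request is handled. */
		if (armed && t >= fire_at) {
			kicks += m->nr_vcpus;
			armed = 0;
		}
		/* Arm the work only if it is not already pending. */
		if (!armed) {
			armed = 1;
			fire_at = t + delay_ms;
		}
	}
	return kicks;
}
```

With one request per millisecond on a 4-vCPU VM over one second, the
unthrottled model yields 4000 kicks, while the coalesced model with a
100 ms delay yields 36. Real hosts will differ, since request arrival is
neither periodic nor uniform across VMs.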
The commit justifies the removal with "Since kvmclock has been switched
to mono raw, this commit can be reverted." That reasoning is correct
for the NTP-correction use case, but the 100ms rate-limit also
protected against IPI storms when use_master_clock is false — a
concern independent of clock source.
==== Full bisect log ====
git bisect start
# status: waiting for both good and bad commits
# good: [7d0a66e4bb9081d75c82ec4957c50034cb0ea449] Linux 6.18
git bisect good 7d0a66e4bb9081d75c82ec4957c50034cb0ea449
# status: waiting for bad commit, 1 good commit known
# bad: [05f7e89ab9731565d8a62e3b5d1ec206485eeb0b] Linux 6.19
git bisect bad 05f7e89ab9731565d8a62e3b5d1ec206485eeb0b
# good: [02892f90a9851f508e557b3c75e93fc178310d5f] Merge tag 'hwmon-for-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
git bisect good 02892f90a9851f508e557b3c75e93fc178310d5f
# bad: [edf602a17b03e6bca31c48f34ac8fc3341503ac1] Merge tag 'tty-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
git bisect bad edf602a17b03e6bca31c48f34ac8fc3341503ac1
# bad: [09cab48db950b6fb8c114314a20c0fd5a80cf990] Merge tag 'soc-arm-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
git bisect bad 09cab48db950b6fb8c114314a20c0fd5a80cf990
# good: [36492b7141b9abc967e92c991af32c670351dc16] Merge tag 'tracepoints-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
git bisect good 36492b7141b9abc967e92c991af32c670351dc16
# good: [7cd122b55283d3ceef71a5b723ccaa03a72284b4] Merge tag 'pull-persistency' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
git bisect good 7cd122b55283d3ceef71a5b723ccaa03a72284b4
# bad: [63a9b0bc65d5d3ea96a57e7985ea22a8582fbbe5] Merge tag 'kvm-riscv-6.19-1' of https://github.com/kvm-riscv/linux into HEAD
git bisect bad 63a9b0bc65d5d3ea96a57e7985ea22a8582fbbe5
# bad: [adc99a6cfcf76d670272dea64bbc2d43ecd12a2f] Merge tag 'kvm-x86-mmu-6.19' of https://github.com/kvm-x86/linux into HEAD
git bisect bad adc99a6cfcf76d670272dea64bbc2d43ecd12a2f
# bad: [c09816f2afce0f89f176c4bc58dc57ec9f204998] KVM: x86: Remove unused declaration kvm_mmu_may_ignore_guest_pat()
git bisect bad c09816f2afce0f89f176c4bc58dc57ec9f204998
# bad: [f6106d41ec84e552a5e8adda1f8741cab96a5425] x86/bugs: Use an x86 feature to track the MMIO Stale Data mitigation
git bisect bad f6106d41ec84e552a5e8adda1f8741cab96a5425
# good: [9633f180ce994ab293ce4924a9b7aaf4673aa114] KVM: x86: Explicitly set new periodic hrtimer expiration in apic_timer_fn()
git bisect good 9633f180ce994ab293ce4924a9b7aaf4673aa114
# bad: [e78fb96b41c6ac85c1a02c7e9610d1ebaa9b5d98] KVM: x86: remove comment about ntp correction sync for
git bisect bad e78fb96b41c6ac85c1a02c7e9610d1ebaa9b5d98
# good: [a091fe60c2d3943b058132a64682a509d55bd325] KVM: x86: Grab lapic_timer in a local variable to cleanup periodic code
git bisect good a091fe60c2d3943b058132a64682a509d55bd325
# bad: [446fcce2a52b533c543dabba26777813c347577c] Revert "x86: kvm: rate-limit global clock updates"
git bisect bad 446fcce2a52b533c543dabba26777813c347577c
# good: [43ddbf16edf5c1790684b32d5eb920a1b0eea285] Revert "x86: kvm: introduce periodic global clock updates"
git bisect good 43ddbf16edf5c1790684b32d5eb920a1b0eea285
# first bad commit: [446fcce2a52b533c543dabba26777813c347577c] Revert "x86: kvm: rate-limit global clock updates"
Best regards,
Jaroslav Pulchart
* Re: [REGRESSION 6.19, BISECTED] KVM: x86: kvmclock rate-limit removal causes IPI storm and high guest steal time
From: Lei Chen @ 2026-03-23 2:27 UTC
To: Jaroslav Pulchart; +Cc: kvm, LKML, seanjc, pbonzini, Igor Raits, Jan Cipa
Hi Jaroslav,
Thanks for testing and reporting this. I'm looking into the problem.
Best regards
Lei Chen
* Re: [REGRESSION 6.19, BISECTED] KVM: x86: kvmclock rate-limit removal causes IPI storm and high guest steal time
From: Lei Chen @ 2026-04-01 6:43 UTC
To: Jaroslav Pulchart; +Cc: kvm, LKML, seanjc, pbonzini, Igor Raits, Jan Cipa
Hi Jaroslav,
I apologize for the late reply.
I have reviewed the code and identified two scenarios that currently
trigger the KVM_REQ_GLOBAL_CLOCK_UPDATE request:
Scenario 1: kvm_write_system_time
This code path occurs when the hypervisor (such as QEMU) adjusts the
time, or when the guest writes to the TSC.
Scenario 2: vcpu schedule in kvm_arch_vcpu_load
If this function triggers KVM_REQ_GLOBAL_CLOCK_UPDATE, it indicates
that the virtual machine is not using the master_clock.
Those two cases are uncommon. Could you please provide your dmesg and
help check which code path triggers KVM_REQ_GLOBAL_CLOCK_UPDATE?
Best regards,
Lei Chen
* Re: [REGRESSION 6.19, BISECTED] KVM: x86: kvmclock rate-limit removal causes IPI storm and high guest steal time
From: Sean Christopherson @ 2026-04-01 21:16 UTC
To: Lei Chen; +Cc: Jaroslav Pulchart, kvm, LKML, pbonzini, Igor Raits, Jan Cipa
On Wed, Apr 01, 2026, Lei Chen wrote:
> Hi Jaroslav,
>
> I apologize for the late reply.
>
> I have reviewed the code and identified two scenarios that currently
> trigger the KVM_REQ_GLOBAL_CLOCK_UPDATE request:
>
> Scenario 1: kvm_write_system_time
> This code path occurs when the hypervisor (such as QEMU) adjusts the
> time, or when the guest writes to the TSC.
>
> Scenario 2: vcpu schedule in kvm_arch_vcpu_load
> If this function triggers KVM_REQ_GLOBAL_CLOCK_UPDATE, it indicates
> that the virtual machine is not using the master_clock.
>
> Those two cases are uncommon. Could you please provide your dmesg and
> help check which code path triggers KVM_REQ_GLOBAL_CLOCK_UPDATE?
I'm also mildly curious as to why KVM_REQ_GLOBAL_CLOCK_UPDATE is being
triggered, but I don't know that it matters. E.g. fixing the underlying flaw
(if one even exists) could fix Jaroslav's setup, but it won't fix setups where
the "uncommon" cases are unavoidable, e.g. on setups that _can't_ use a master
clock for whatever reason.
At a glance, explicitly ratelimiting kvm_gen_kvmclock_update() seems like the
simplest option to address the regression.
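To make the idea of explicit rate limiting concrete, here is a minimal
user-space sketch of an interval-based limiter. It is not the proposed
fix and not kernel API: struct clock_update_limiter and limiter_allow()
are hypothetical names, the caller is assumed to pass a monotonic
timestamp in nanoseconds, and locking is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a per-VM limiter; not an existing kernel type. */
struct clock_update_limiter {
	uint64_t interval_ns; /* minimum spacing between all-vCPU broadcasts */
	uint64_t last_ns;     /* timestamp of the last allowed broadcast */
	bool primed;          /* false until the first broadcast is allowed */
};

/*
 * Return true if a broadcast is allowed at now_ns, false if it falls
 * inside the interval opened by the previous allowed broadcast.
 * now_ns must never go backwards between calls.
 */
static bool limiter_allow(struct clock_update_limiter *rl, uint64_t now_ns)
{
	assert(rl->interval_ns > 0);

	if (rl->primed && now_ns - rl->last_ns < rl->interval_ns)
		return false;

	rl->primed = true;
	rl->last_ns = now_ns;
	return true;
}
```

A caller would broadcast to all vCPUs only when limiter_allow() returns
true. Unlike the delayed work removed by the revert, this sketch drops
suppressed requests instead of deferring them, so a real fix would
presumably still need to update at least the requesting vCPU.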