Kernel KVM virtualization development
* [PATCH v2 0/1] KVM: powerpc: Use generic xfer to guest work function
@ 2026-07-01 18:30 Vishal Chourasia
  2026-07-01 18:30 ` [PATCH v2 1/1] " Vishal Chourasia
From: Vishal Chourasia @ 2026-07-01 18:30 UTC
  To: maddy
  Cc: npiggin, mpe, chleroy, sshegde, amachhiw, vaibhav, harshpb,
	gautam, linuxppc-dev, kvm, linux-kernel, Vishal Chourasia

This series fixes a KVM scheduling bug on Book3S HV where a guest VM
under a cpu.max bandwidth limit can run arbitrarily past its quota and
then appear frozen for minutes afterwards. 

== Problem ==

Since commit 2cd571245b43 ("sched/fair: Add related data structure for
task based throttle"), merged in v6.18, CFS bandwidth throttling no
longer dequeues a task directly. Instead it queues a task_work item via
task_work_add(..., TWA_RESUME), sets TIF_NOTIFY_RESUME, and relies on
that work running on the return path to actually dequeue the task.

The powerpc KVM run loops only test TIF_SIGPENDING and TIF_NEED_RESCHED
before re-entering the guest; TIF_NOTIFY_RESUME is never checked. For a
CPU-bound guest that generates few KVM exits back to userspace, the vCPU
thread never returns to user mode, so the deferred throttle task_work
never runs. The guest keeps running unchecked while its
runtime_remaining goes increasingly negative, and once it finally does
exit to userspace it is legitimately throttled for minutes while the
accrued debt is repaid at the bandwidth-timer replenishment rate.
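
To put rough numbers on the debt described above, here is a back-of-envelope
model in user-space C. It is only a sketch, not the scheduler's actual
accounting: struct bw_limit, accrued_debt_us() and repay_time_us() are
hypothetical names. It assumes the guest runs flat out on one CPU with no
throttle applied, and that debt is later repaid at exactly one quota per
period.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper types; not kernel code. All times in microseconds. */
struct bw_limit {
	uint64_t quota_us;	/* runtime allowed per period (first cpu.max field) */
	uint64_t period_us;	/* period length (second cpu.max field) */
};

/* Runtime owed after running flat out for ran_us with no throttle applied. */
static uint64_t accrued_debt_us(struct bw_limit bw, uint64_t ran_us)
{
	assert(bw.period_us > 0 && bw.quota_us <= bw.period_us);

	/* Assumption: entitlement grows by quota_us per full elapsed period. */
	uint64_t entitled_us = ran_us / bw.period_us * bw.quota_us;

	return ran_us > entitled_us ? ran_us - entitled_us : 0;
}

/* Wall-clock time needed to repay debt_us at one quota per period. */
static uint64_t repay_time_us(struct bw_limit bw, uint64_t debt_us)
{
	assert(bw.quota_us > 0);

	/* Round up: a partial quota still costs a whole period. */
	uint64_t periods = (debt_us + bw.quota_us - 1) / bw.quota_us;

	return periods * bw.period_us;
}
```

Under these assumptions, with cpu.max set to "50000 100000", a guest that
spins for 60 s without ever returning to user mode owes 30 s of runtime,
and repaying that at 50 ms per 100 ms period takes about 60 s of wall-clock
time.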

The generic xfer-to-guest-mode infrastructure (commit 935ace2fb5cc,
"entry: Provide infrastructure for work before transitioning to guest
mode") exists precisely to handle this kind of work before each guest
entry. A full trace-backed root-cause analysis was posted with v1 [2].

== Fix ==

Opt powerpc KVM into VIRT_XFER_TO_GUEST_WORK and use the generic
xfer_to_guest_mode helpers to check for and handle pending guest-mode
work (reschedule, signals, and TIF_NOTIFY_RESUME task_work such as the
deferred CFS throttle) on every guest re-entry:

- Book3S HV: both run loops, kvmhv_run_single_vcpu() for POWER9+ and
  kvmppc_run_vcpu() for pre-POWER9.
- Book3S PR and BookE: the common kvmppc_prepare_to_enter(), which
  likewise only checked need_resched()/signal_pending().
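
The effect of widening the pre-entry check can be illustrated with a small
user-space model. This is a sketch, not the patch itself: the WORK_* flags,
the two masks and entries_with_work_outstanding() are hypothetical stand-ins
for the kernel's TIF_* bits and run loops, and "handling" work is reduced to
clearing the flag.

```c
#include <assert.h>

/* Hypothetical stand-ins for thread flags; not the kernel's TIF_* values. */
enum {
	WORK_SIGPENDING    = 1u << 0,
	WORK_NEED_RESCHED  = 1u << 1,
	WORK_NOTIFY_RESUME = 1u << 2,
};

/* What the old loops tested, and what the generic check adds in this model. */
#define OLD_MASK (WORK_SIGPENDING | WORK_NEED_RESCHED)
#define NEW_MASK (OLD_MASK | WORK_NOTIFY_RESUME)

/*
 * Model a run loop that re-enters the guest max_entries times.
 *   pending: work flags raised before the first entry
 *   mask:    flags the loop tests before each entry
 * Returns how many guest entries happened with work still outstanding.
 */
static unsigned int entries_with_work_outstanding(unsigned int pending,
						  unsigned int mask,
						  unsigned int max_entries)
{
	unsigned int stale = 0;

	assert(max_entries > 0);

	for (unsigned int i = 0; i < max_entries; i++) {
		if (pending & mask)
			pending &= ~mask;	/* handle only the work the loop can see */
		if (pending)
			stale++;		/* entered the guest with work still queued */
	}
	return stale;
}
```

In this model, a loop using OLD_MASK with only WORK_NOTIFY_RESUME pending
enters the guest on every iteration with the work still queued, while a loop
using NEW_MASK handles it before the first entry.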

== Changes from v1 ==

- Extend the fix beyond Book3S HV to the shared powerpc KVM entry path:
  also convert the common kvmppc_prepare_to_enter() used by Book3S PR
  and BookE. (Shrikanth Hegde)
- Move "select VIRT_XFER_TO_GUEST_WORK" from KVM_BOOK3S_64_HV up to the
  common "config KVM" so every powerpc KVM variant gets the
  infrastructure.
- Drop the redundant signal_pending() recheck and its sigpend label in
  kvmhv_run_single_vcpu(); xfer_to_guest_mode_work_pending() is a
  superset of it.
- Preserve the E500 CONFIG_KVM_EXIT_TIMING histogram on the signal path
  via an explicit kvmppc_set_exit_type(SIGNAL_EXITS).

[1] https://lore.kernel.org/all/20250421102837.78515-2-sshegde@linux.ibm.com/
[2] https://lore.kernel.org/all/20260626105449.2897924-2-vishalc@linux.ibm.com/

Vishal Chourasia (1):
  KVM: powerpc: Use generic xfer to guest work function

 arch/powerpc/kvm/Kconfig     |  1 +
 arch/powerpc/kvm/book3s_hv.c | 64 ++++++++++++++++++++++++++++--------
 arch/powerpc/kvm/powerpc.c   | 34 ++++++++++++++-----
 3 files changed, 77 insertions(+), 22 deletions(-)

-- 
2.54.0


Thread overview: 4+ messages
2026-07-01 18:30 [PATCH v2 0/1] KVM: powerpc: Use generic xfer to guest work function Vishal Chourasia
2026-07-01 18:30 ` [PATCH v2 1/1] " Vishal Chourasia
2026-07-01 18:42   ` sashiko-bot
2026-07-02  7:47   ` Shrikanth Hegde
