* [PATCH v2 0/1] KVM: powerpc: Use generic xfer to guest work function
@ 2026-07-01 18:30 Vishal Chourasia
2026-07-01 18:30 ` [PATCH v2 1/1] " Vishal Chourasia
0 siblings, 1 reply; 4+ messages in thread
From: Vishal Chourasia @ 2026-07-01 18:30 UTC (permalink / raw)
To: maddy
Cc: npiggin, mpe, chleroy, sshegde, amachhiw, vaibhav, harshpb,
gautam, linuxppc-dev, kvm, linux-kernel, Vishal Chourasia
This series fixes a KVM scheduling bug on Book3S HV where a guest VM
under a cpu.max bandwidth limit can run arbitrarily past its quota and
then appear frozen for minutes afterwards.
== Problem ==
Since commit 2cd571245b43 ("sched/fair: Add related data structure for
task based throttle"), merged in v6.18, CFS bandwidth throttling no
longer dequeues a task directly. Instead it queues a task_work item via
task_work_add(..., TWA_RESUME), sets TIF_NOTIFY_RESUME, and relies on
that work running on the return path to actually dequeue the task.
The powerpc KVM run loops only test TIF_SIGPENDING and TIF_NEED_RESCHED
before re-entering the guest; TIF_NOTIFY_RESUME is never checked. For a
CPU-bound guest that generates few KVM exits back to userspace, the vCPU
thread never returns to user mode, so the deferred throttle task_work
never runs. The guest keeps running unchecked while its
runtime_remaining goes increasingly negative, and once it finally does
exit to userspace it is legitimately throttled for minutes while the
accrued debt is repaid at the bandwidth-timer replenishment rate.
The generic xfer-to-guest-mode infrastructure (commit 935ace2fb5cc,
"entry: Provide infrastructure for work before transitioning to guest
mode") exists precisely to handle this kind of work before each guest
entry. A full trace-backed root-cause analysis was posted with v1 [2].
== Fix ==
Opt powerpc KVM into VIRT_XFER_TO_GUEST_WORK and use the generic
xfer_to_guest_mode helpers to check for and handle pending guest-mode
work (reschedule, signals, and TIF_NOTIFY_RESUME task_work such as the
deferred CFS throttle) on every guest re-entry:
- Book3S HV: both run loops — kvmhv_run_single_vcpu() for POWER9+ and
kvmppc_run_vcpu() for pre-POWER9.
- Book3S PR and BookE: the common kvmppc_prepare_to_enter(), which
likewise only checked need_resched()/signal_pending().
== Changes from v1 ==
- Extend the fix beyond Book3S HV to the shared powerpc KVM entry path:
also convert the common kvmppc_prepare_to_enter() used by Book3S PR
and BookE. (Shrikanth Hegde)
- Move "select VIRT_XFER_TO_GUEST_WORK" from KVM_BOOK3S_64_HV up to the
common "config KVM" so every powerpc KVM variant gets the
infrastructure.
- Drop the redundant signal_pending() recheck and its sigpend label in
kvmhv_run_single_vcpu(); xfer_to_guest_mode_work_pending() is a
superset of it.
- Preserve the E500 CONFIG_KVM_EXIT_TIMING histogram on the signal path
via an explicit kvmppc_set_exit_type(SIGNAL_EXITS).
[1] https://lore.kernel.org/all/20250421102837.78515-2-sshegde@linux.ibm.com/
[2] https://lore.kernel.org/all/20260626105449.2897924-2-vishalc@linux.ibm.com/
Vishal Chourasia (1):
KVM: powerpc: Use generic xfer to guest work function
arch/powerpc/kvm/Kconfig | 1 +
arch/powerpc/kvm/book3s_hv.c | 64 ++++++++++++++++++++++++++++--------
arch/powerpc/kvm/powerpc.c | 34 ++++++++++++++-----
3 files changed, 77 insertions(+), 22 deletions(-)
--
2.54.0
^ permalink raw reply [flat|nested] 4+ messages in thread
* [PATCH v2 1/1] KVM: powerpc: Use generic xfer to guest work function
2026-07-01 18:30 [PATCH v2 0/1] KVM: powerpc: Use generic xfer to guest work function Vishal Chourasia
@ 2026-07-01 18:30 ` Vishal Chourasia
2026-07-01 18:42 ` sashiko-bot
2026-07-02 7:47 ` Shrikanth Hegde
0 siblings, 2 replies; 4+ messages in thread
From: Vishal Chourasia @ 2026-07-01 18:30 UTC (permalink / raw)
To: maddy
Cc: npiggin, mpe, chleroy, sshegde, amachhiw, vaibhav, harshpb,
gautam, linuxppc-dev, kvm, linux-kernel, Vishal Chourasia

Since commit 2cd571245b43 ("sched/fair: Add related data structure for
task based throttle") in v6.18, CFS bandwidth throttling no longer
dequeues a task directly; it queues task_work via TWA_RESUME and sets
TIF_NOTIFY_RESUME, relying on that work running before the task returns
to guest/user mode. The powerpc KVM run loops only checked for reschedule
and signals, never TIF_NOTIFY_RESUME, so the deferred throttle never ran
while a vCPU stayed in the run loop: a CPU-bound guest that rarely exits
to userspace ran far past its cpu.max quota and then appeared frozen for
minutes while the accrued throttle debt was repaid.

Use the generic infrastructure to check for and handle pending work
before transitioning into guest mode, replacing the open-coded
need_resched() and cond_resched() checks in the Book3S HV run loops and
in the common kvmppc_prepare_to_enter() used by the Book3S PR and BookE
run loops. The redundant signal_pending() recheck (and its sigpend label)
in kvmhv_run_single_vcpu() is also dropped, as
xfer_to_guest_mode_work_pending() is a superset of it.

This picks up handling for TIF_NOTIFY_RESUME, which was previously
ignored, meaning task work will now be correctly handled on every
guest re-entry.

In kvmppc_prepare_to_enter() the generic helper accounts the signal exit
(vcpu->stat.signal_exits and KVM_EXIT_INTR) but does not set the exit
type, so kvmppc_set_exit_type(SIGNAL_EXITS) is retained on the signal
path to preserve the E500 CONFIG_KVM_EXIT_TIMING histogram; it is a no-op
otherwise.

Signed-off-by: Vishal Chourasia <vishalc@linux.ibm.com>
---
 arch/powerpc/kvm/Kconfig     |  1 +
 arch/powerpc/kvm/book3s_hv.c | 64 ++++++++++++++++++++++++++++--------
 arch/powerpc/kvm/powerpc.c   | 34 ++++++++++++++-----
 3 files changed, 77 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index 9a0d1c1aca6c..b6bc2fc86dca 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -22,6 +22,7 @@ config KVM
 	select KVM_COMMON
 	select KVM_VFIO
 	select HAVE_KVM_IRQ_BYPASS
+	select VIRT_XFER_TO_GUEST_WORK

 config KVM_BOOK3S_HANDLER
 	bool
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 61dbeea317f3..a1b2077561bb 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3850,10 +3850,20 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
 	 * and return without going into the guest(s).
 	 * If the mmu_ready flag has been cleared, don't go into the
 	 * guest because that means a HPT resize operation is in progress.
+	 *
+	 * xfer_to_guest_mode_work_pending() is the IRQs-disabled recheck for
+	 * pending guest-mode work (reschedule, signals, and TIF_NOTIFY_RESUME
+	 * task_work such as the deferred CFS throttle). It is the pre-POWER9
+	 * analog of the final gate in kvmhv_run_single_vcpu(), and a superset
+	 * of the old need_resched() check: it catches work that raced in after
+	 * the drain in kvmppc_run_vcpu(), so a CPU-bound vCPU is throttled here
+	 * instead of running one more guest dispatch past its quota. IRQs are
+	 * hard-disabled just above, so the non-__ variant (which asserts that)
+	 * is the correct one.
 	 */
 	local_irq_disable();
 	hard_irq_disable();
-	if (lazy_irq_pending() || need_resched() ||
+	if (lazy_irq_pending() || xfer_to_guest_mode_work_pending() ||
 	    recheck_signals_and_mmu(&core_info)) {
 		local_irq_enable();
 		vc->vcore_state = VCORE_INACTIVE;
@@ -4824,10 +4834,24 @@ static int kvmppc_run_vcpu(struct kvm_vcpu *vcpu)
 		vc->runner = vcpu;
 		if (n_ceded == vc->n_runnable) {
 			kvmppc_vcore_blocked(vc);
-		} else if (need_resched()) {
+		} else if (__xfer_to_guest_mode_work_pending()) {
 			kvmppc_vcore_preempt(vc);
-			/* Let something else run */
-			cond_resched_lock(&vc->lock);
+			/*
+			 * Let something else run, and run pending guest-mode
+			 * work (reschedule, and TIF_NOTIFY_RESUME task_work such
+			 * as the deferred CFS throttle) before we would re-enter
+			 * the guest, so a CPU-bound vCPU is actually throttled
+			 * here instead of running past its quota. This is a
+			 * superset of the old need_resched() check. Use the raw
+			 * helper, not the kvm_ wrapper: signals (KVM_EXIT_INTR
+			 * and the signal_exits stat) are accounted by this path's
+			 * existing handling below, so going through the wrapper
+			 * here would double-count them. The helper may schedule(),
+			 * so the vcore lock is dropped around it.
+			 */
+			spin_unlock(&vc->lock);
+			xfer_to_guest_mode_handle_work();
+			spin_lock(&vc->lock);
 			if (vc->vcore_state == VCORE_PREEMPT)
 				kvmppc_vcore_end_preempt(vc);
 		} else {
@@ -4899,8 +4923,21 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 time_limit,
 		}
 	}

-	if (need_resched())
-		cond_resched();
+	/*
+	 * Run pending work before (re-)entering the guest, most importantly
+	 * task_work queued via TWA_RESUME (e.g. the deferred CFS bandwidth
+	 * throttle, which only sets TIF_NOTIFY_RESUME). Without this a CPU-bound
+	 * vCPU that keeps returning RESUME_GUEST never reaches an exit-to-user
+	 * point, so the throttle is never enforced and the task runs far beyond
+	 * its quota. The helper also handles reschedule and signals, replacing
+	 * the cond_resched() that was here. It may schedule(), so it runs before
+	 * preemption and IRQs are disabled, with no vcore/KVM locks held. This
+	 * is the per-reentry site shared by the bare-metal and pseries (nested)
+	 * paths, so both are covered.
+	 */
+	r = kvm_xfer_to_guest_mode_handle_work(vcpu);
+	if (r) /* -EINTR: signal pending, exit to userspace (KVM_EXIT_INTR) */
+		return r;

 	kvmppc_update_vpas(vcpu);

@@ -4914,9 +4951,14 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 time_limit,

 	vcpu->arch.state = KVMPPC_VCPU_RUNNABLE;

-	if (signal_pending(current))
-		goto sigpend;
-	if (need_resched() || !kvm->arch.mmu_ready)
+	/*
+	 * Final IRQs-disabled check for pending guest-mode work or an MMU that
+	 * is not ready. IRQs are disabled here, so bail to the outer loop,
+	 * which re-enters and handles the pending work via
+	 * kvm_xfer_to_guest_mode_handle_work() above (exiting with -EINTR on a
+	 * signal).
+	 */
+	if (xfer_to_guest_mode_work_pending() || !kvm->arch.mmu_ready)
 		goto out;

 	vcpu->cpu = pcpu;
@@ -5068,10 +5110,6 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 time_limit,

 	return vcpu->arch.ret;

- sigpend:
-	vcpu->stat.signal_exits++;
-	run->exit_reason = KVM_EXIT_INTR;
-	vcpu->arch.ret = -EINTR;
  out:
 	vcpu->cpu = -1;
 	vcpu->arch.thread_cpu = -1;
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 00302399fc37..ff1a9a8de5e0 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -84,20 +84,36 @@ int kvmppc_prepare_to_enter(struct kvm_vcpu *vcpu)
 	hard_irq_disable();

 	while (true) {
-		if (need_resched()) {
+		if (__xfer_to_guest_mode_work_pending()) {
+			/*
+			 * Handle pending guest-mode work before entering the
+			 * guest: reschedule, signals, and TIF_NOTIFY_RESUME
+			 * task_work such as the deferred CFS bandwidth throttle.
+			 * The helper must run with interrupts enabled and may
+			 * schedule(). This is a superset of the open-coded
+			 * need_resched()/signal_pending() checks it replaces. On
+			 * a pending signal it returns -EINTR after
+			 * kvm_xfer_to_guest_mode_handle_work() has set
+			 * run->exit_reason (KVM_EXIT_INTR) and bumped
+			 * vcpu->stat.signal_exits, so just return to userspace.
+			 */
 			local_irq_enable();
-			cond_resched();
+			r = kvm_xfer_to_guest_mode_handle_work(vcpu);
 			hard_irq_disable();
+			if (r) {
+				/*
+				 * -EINTR: the generic helper does not set the
+				 * exit type, so record it here for the E500
+				 * CONFIG_KVM_EXIT_TIMING histogram (a no-op
+				 * otherwise). Only the exit type is set;
+				 * signal_exits was already accounted above.
+				 */
+				kvmppc_set_exit_type(vcpu, SIGNAL_EXITS);
+				break;
+			}
 			continue;
 		}

-		if (signal_pending(current)) {
-			kvmppc_account_exit(vcpu, SIGNAL_EXITS);
-			vcpu->run->exit_reason = KVM_EXIT_INTR;
-			r = -EINTR;
-			break;
-		}
-
 		vcpu->mode = IN_GUEST_MODE;

 		/*
--
2.54.0
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH v2 1/1] KVM: powerpc: Use generic xfer to guest work function
2026-07-01 18:30 ` [PATCH v2 1/1] " Vishal Chourasia
@ 2026-07-01 18:42 ` sashiko-bot
2026-07-02 7:47 ` Shrikanth Hegde
1 sibling, 0 replies; 4+ messages in thread
From: sashiko-bot @ 2026-07-01 18:42 UTC (permalink / raw)
To: Vishal Chourasia; +Cc: kvm

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [Medium] Early return on pending signals bypasses the vCPU exit tracepoint
--

commit 902f7514807a6e7be76bed1f71c7055f873978b8
Author: Vishal Chourasia <vishalc@linux.ibm.com>

KVM: powerpc: Use generic xfer to guest work function

This commit replaces open-coded need_resched() and cond_resched() checks
with the generic xfer_to_guest_mode_handle_work() infrastructure in
powerpc KVM run loops to properly handle deferred task work like CFS
bandwidth throttling.

> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 61dbeea317f3b..a1b2077561bb5 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c

[ ... ]

> @@ -4899,8 +4923,21 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 time_limit,
>  		}
>  	}
>
> -	if (need_resched())
> -		cond_resched();
> +	/*
> +	 * Run pending work before (re-)entering the guest, most importantly
> +	 * task_work queued via TWA_RESUME (e.g. the deferred CFS bandwidth
> +	 * throttle, which only sets TIF_NOTIFY_RESUME). Without this a CPU-bound
> +	 * vCPU that keeps returning RESUME_GUEST never reaches an exit-to-user
> +	 * point, so the throttle is never enforced and the task runs far beyond
> +	 * its quota. The helper also handles reschedule and signals, replacing
> +	 * the cond_resched() that was here. It may schedule(), so it runs before
> +	 * preemption and IRQs are disabled, with no vcore/KVM locks held. This
> +	 * is the per-reentry site shared by the bare-metal and pseries (nested)
> +	 * paths, so both are covered.
> +	 */
> +	r = kvm_xfer_to_guest_mode_handle_work(vcpu);
> +	if (r) /* -EINTR: signal pending, exit to userspace (KVM_EXIT_INTR) */
> +		return r;

[Severity: Medium]
When kvm_xfer_to_guest_mode_handle_work() returns a pending signal, this
direct return bypasses the done: label at the end of the function.

Since trace_kvmppc_run_vcpu_enter() is executed unconditionally at the
start of kvmhv_run_single_vcpu(), does this early return cause unbalanced
tracepoints on signal-driven userspace exits?

This might break performance monitoring tools that rely on symmetric
enter/exit events.

--
Sashiko AI review · https://sashiko.dev/#/patchset/20260701183030.3610451-2-vishalc@linux.ibm.com?part=1
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH v2 1/1] KVM: powerpc: Use generic xfer to guest work function
2026-07-01 18:30 ` [PATCH v2 1/1] " Vishal Chourasia
2026-07-01 18:42 ` sashiko-bot
@ 2026-07-02 7:47 ` Shrikanth Hegde
1 sibling, 0 replies; 4+ messages in thread
From: Shrikanth Hegde @ 2026-07-02 7:47 UTC (permalink / raw)
To: Vishal Chourasia, maddy, Sebastian Andrzej Siewior
Cc: npiggin, mpe, chleroy, amachhiw, vaibhav, harshpb, gautam,
linuxppc-dev, kvm, linux-kernel

On 7/2/26 12:00 AM, Vishal Chourasia wrote:
> Since commit 2cd571245b43 ("sched/fair: Add related data structure for
> task based throttle") in v6.18, CFS bandwidth throttling no longer
> dequeues a task directly; it queues task_work via TWA_RESUME and sets
> TIF_NOTIFY_RESUME, relying on that work running before the task returns
> to guest/user mode. The powerpc KVM run loops only checked for reschedule
> and signals, never TIF_NOTIFY_RESUME, so the deferred throttle never ran
> while a vCPU stayed in the run loop: a CPU-bound guest that rarely exits
> to userspace ran far past its cpu.max quota and then appeared frozen for
> minutes while the accrued throttle debt was repaid.
>
> Use the generic infrastructure to check for and handle pending work
> before transitioning into guest mode, replacing the open-coded
> need_resched() and cond_resched() checks in the Book3S HV run loops and
> in the common kvmppc_prepare_to_enter() used by the Book3S PR and BookE
> run loops. The redundant signal_pending() recheck (and its sigpend label)
> in kvmhv_run_single_vcpu() is also dropped, as
> xfer_to_guest_mode_work_pending() is a superset of it.
>
> This picks up handling for TIF_NOTIFY_RESUME, which was previously
> ignored, meaning task work will now be correctly handled on every
> guest re-entry.
>
> In kvmppc_prepare_to_enter() the generic helper accounts the signal exit
> (vcpu->stat.signal_exits and KVM_EXIT_INTR) but does not set the exit
> type, so kvmppc_set_exit_type(SIGNAL_EXITS) is retained on the signal
> path to preserve the E500 CONFIG_KVM_EXIT_TIMING histogram; it is a no-op
> otherwise.
>

With VIRT_XFER_TO_GUEST_WORK support, can you take the
HAVE_POSIX_CPU_TIMERS_TASK_WORK patch too?

https://lore.kernel.org/all/20250421102837.78515-3-sshegde@linux.ibm.com/

That would help in adding some of the remaining patches for PREEMPT_RT
on powernv systems. The remaining ones are straightforward or already
merged in some form.

+Sebastian.

> Signed-off-by: Vishal Chourasia <vishalc@linux.ibm.com>
> ---
>  arch/powerpc/kvm/Kconfig     |  1 +
>  arch/powerpc/kvm/book3s_hv.c | 64 ++++++++++++++++++++++++++++--------
>  arch/powerpc/kvm/powerpc.c   | 34 ++++++++++++++-----
>  3 files changed, 77 insertions(+), 22 deletions(-)
>
> diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
> index 9a0d1c1aca6c..b6bc2fc86dca 100644
> --- a/arch/powerpc/kvm/Kconfig
> +++ b/arch/powerpc/kvm/Kconfig
> @@ -22,6 +22,7 @@ config KVM
>  	select KVM_COMMON
>  	select KVM_VFIO
>  	select HAVE_KVM_IRQ_BYPASS
> +	select VIRT_XFER_TO_GUEST_WORK
>
>  config KVM_BOOK3S_HANDLER
>  	bool
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 61dbeea317f3..a1b2077561bb 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -3850,10 +3850,20 @@ static noinline void kvmppc_run_core(struct kvmppc_vcore *vc)
>  	 * and return without going into the guest(s).
>  	 * If the mmu_ready flag has been cleared, don't go into the
>  	 * guest because that means a HPT resize operation is in progress.
> +	 *
> +	 * xfer_to_guest_mode_work_pending() is the IRQs-disabled recheck for
> +	 * pending guest-mode work (reschedule, signals, and TIF_NOTIFY_RESUME
> +	 * task_work such as the deferred CFS throttle). It is the pre-POWER9
> +	 * analog of the final gate in kvmhv_run_single_vcpu(), and a superset
> +	 * of the old need_resched() check: it catches work that raced in after
> +	 * the drain in kvmppc_run_vcpu(), so a CPU-bound vCPU is throttled here
> +	 * instead of running one more guest dispatch past its quota. IRQs are
> +	 * hard-disabled just above, so the non-__ variant (which asserts that)
> +	 * is the correct one.

IMO, these are good to go in a changelog, but not in comments.

Could you trim the comments so they do not describe what
xfer_to_guest_mode_work_pending() does?

The same applies to the other big fat comments.

>  	 */
>  	local_irq_disable();
>  	hard_irq_disable();
> -	if (lazy_irq_pending() || need_resched() ||
> +	if (lazy_irq_pending() || xfer_to_guest_mode_work_pending() ||
>  	    recheck_signals_and_mmu(&core_info)) {
>  		local_irq_enable();
>  		vc->vcore_state = VCORE_INACTIVE;
> @@ -4824,10 +4834,24 @@ static int kvmppc_run_vcpu(struct kvm_vcpu *vcpu)
>  		vc->runner = vcpu;
>  		if (n_ceded == vc->n_runnable) {
>  			kvmppc_vcore_blocked(vc);
> -		} else if (need_resched()) {
> +		} else if (__xfer_to_guest_mode_work_pending()) {
>  			kvmppc_vcore_preempt(vc);
> -			/* Let something else run */
> -			cond_resched_lock(&vc->lock);
> +			/*
> +			 * Let something else run, and run pending guest-mode
> +			 * work (reschedule, and TIF_NOTIFY_RESUME task_work such
> +			 * as the deferred CFS throttle) before we would re-enter
> +			 * the guest, so a CPU-bound vCPU is actually throttled
> +			 * here instead of running past its quota. This is a
> +			 * superset of the old need_resched() check. Use the raw
> +			 * helper, not the kvm_ wrapper: signals (KVM_EXIT_INTR
> +			 * and the signal_exits stat) are accounted by this path's
> +			 * existing handling below, so going through the wrapper
> +			 * here would double-count them. The helper may schedule(),
> +			 * so the vcore lock is dropped around it.
> +			 */

Reduce this blob.

> +			spin_unlock(&vc->lock);
> +			xfer_to_guest_mode_handle_work();
> +			spin_lock(&vc->lock);
>  			if (vc->vcore_state == VCORE_PREEMPT)
>  				kvmppc_vcore_end_preempt(vc);
>  		} else {
> @@ -4899,8 +4923,21 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 time_limit,
>  		}
>  	}
>
> -	if (need_resched())
> -		cond_resched();
> +	/*
> +	 * Run pending work before (re-)entering the guest, most importantly
> +	 * task_work queued via TWA_RESUME (e.g. the deferred CFS bandwidth
> +	 * throttle, which only sets TIF_NOTIFY_RESUME). Without this a CPU-bound
> +	 * vCPU that keeps returning RESUME_GUEST never reaches an exit-to-user
> +	 * point, so the throttle is never enforced and the task runs far beyond
> +	 * its quota. The helper also handles reschedule and signals, replacing
> +	 * the cond_resched() that was here. It may schedule(), so it runs before
> +	 * preemption and IRQs are disabled, with no vcore/KVM locks held. This
> +	 * is the per-reentry site shared by the bare-metal and pseries (nested)
> +	 * paths, so both are covered.
> +	 */

Reduce this blob.

> +	r = kvm_xfer_to_guest_mode_handle_work(vcpu);
> +	if (r) /* -EINTR: signal pending, exit to userspace (KVM_EXIT_INTR) */
> +		return r;
>
>  	kvmppc_update_vpas(vcpu);
>
> @@ -4914,9 +4951,14 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 time_limit,
>
>  	vcpu->arch.state = KVMPPC_VCPU_RUNNABLE;
>
> -	if (signal_pending(current))
> -		goto sigpend;
> -	if (need_resched() || !kvm->arch.mmu_ready)
> +	/*
> +	 * Final IRQs-disabled check for pending guest-mode work or an MMU that
> +	 * is not ready. IRQs are disabled here, so bail to the outer loop,
> +	 * which re-enters and handles the pending work via
> +	 * kvm_xfer_to_guest_mode_handle_work() above (exiting with -EINTR on a
> +	 * signal).
> +	 */
> +	if (xfer_to_guest_mode_work_pending() || !kvm->arch.mmu_ready)
>  		goto out;
>
>  	vcpu->cpu = pcpu;
> @@ -5068,10 +5110,6 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 time_limit,
>
>  	return vcpu->arch.ret;
>
> - sigpend:
> -	vcpu->stat.signal_exits++;
> -	run->exit_reason = KVM_EXIT_INTR;
> -	vcpu->arch.ret = -EINTR;
>   out:
>  	vcpu->cpu = -1;
>  	vcpu->arch.thread_cpu = -1;
> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
> index 00302399fc37..ff1a9a8de5e0 100644
> --- a/arch/powerpc/kvm/powerpc.c
> +++ b/arch/powerpc/kvm/powerpc.c
> @@ -84,20 +84,36 @@ int kvmppc_prepare_to_enter(struct kvm_vcpu *vcpu)
>  	hard_irq_disable();
>
>  	while (true) {
> -		if (need_resched()) {
> +		if (__xfer_to_guest_mode_work_pending()) {
> +			/*
> +			 * Handle pending guest-mode work before entering the
> +			 * guest: reschedule, signals, and TIF_NOTIFY_RESUME
> +			 * task_work such as the deferred CFS bandwidth throttle.
> +			 * The helper must run with interrupts enabled and may
> +			 * schedule(). This is a superset of the open-coded
> +			 * need_resched()/signal_pending() checks it replaces. On
> +			 * a pending signal it returns -EINTR after
> +			 * kvm_xfer_to_guest_mode_handle_work() has set
> +			 * run->exit_reason (KVM_EXIT_INTR) and bumped
> +			 * vcpu->stat.signal_exits, so just return to userspace.
> +			 */

Make this comment precise instead of describing
__xfer_to_guest_mode_work_pending().

>  			local_irq_enable();
> -			cond_resched();
> +			r = kvm_xfer_to_guest_mode_handle_work(vcpu);
>  			hard_irq_disable();
> +			if (r) {
> +				/*
> +				 * -EINTR: the generic helper does not set the
> +				 * exit type, so record it here for the E500
> +				 * CONFIG_KVM_EXIT_TIMING histogram (a no-op
> +				 * otherwise). Only the exit type is set;
> +				 * signal_exits was already accounted above.
> +				 */
> +				kvmppc_set_exit_type(vcpu, SIGNAL_EXITS);
> +				break;
> +			}
>  			continue;
>  		}
>
> -		if (signal_pending(current)) {
> -			kvmppc_account_exit(vcpu, SIGNAL_EXITS);
> -			vcpu->run->exit_reason = KVM_EXIT_INTR;
> -			r = -EINTR;
> -			break;
> -		}
> -
>  		vcpu->mode = IN_GUEST_MODE;
>
>  		/*
^ permalink raw reply [flat|nested] 4+ messages in thread