[PATCH 0/1] KVM: arm64: Make kvm_s2_fault_pin

Linux-ARM-Kernel Archive on lore.kernel.org

* [PATCH 0/1] KVM: arm64: Make kvm_s2_fault_pin_pfn() fault-in interruptible
@ 2026-06-08 10:43 Jia He
  2026-06-08 10:43 ` [PATCH 1/1] " Jia He
  0 siblings, 1 reply; 3+ messages in thread
From: Jia He @ 2026-06-08 10:43 UTC
  To: Marc Zyngier, Oliver Upton
  Cc: Joey Gouly, Steffen Eiden, Suzuki K Poulose, Zenghui Yu,
	Catalin Marinas, Will Deacon, linux-arm-kernel, kvmarm,
	linux-kernel, Jia He

Hi,

I hit this while looking at long vCPU hung-task stalls with nested
virtualization enabled.

The vCPU was blocked in the fault-in path on a userspace folio which was
under migration:

  __kvm_faultin_pfn()
    -> get_user_pages_unlocked()
      -> do_swap_page()              /* PTE is a migration entry */
        -> migration_entry_wait()
           set_current_state(TASK_UNINTERRUPTIBLE);
           io_schedule();            /* no timeout */

At this point the vCPU cannot observe a pending signal. If userspace is
trying to tear down the VM, the vCPU thread can still remain stuck until
migration finishes and the folio lock is released.

Nested virtualization makes this much easier to hit. The migration path
can hold the folio lock while the MMU notifier runs kvm_nested_s2_unmap(),
and that can be expensive today since it may walk all active shadow
stage-2s. So the long delay is really caused by the unmap/migration side;
the vCPU is just waiting for it to finish.

This patch does not try to make that unmap path cheaper. That is a
separate issue, and range-aware shadow stage-2 unmap work, such as
Wei-Lin Chang's "KVM: arm64: nv: Avoid full shadow s2 unmap" series,
is aimed at that problem.

What this patch does is narrower: use FOLL_INTERRUPTIBLE when faulting in
the userspace page for the arm64 stage-2 fault path. If the vCPU thread
has a pending signal, GUP can return and KVM can leave KVM_RUN through the
normal KVM_EXIT_INTR path. If there is no pending signal, the behaviour is
unchanged.
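
A rough userspace model of the control flow described above may help. This
is only a sketch, not kernel code: every model_* / MODEL_* name and value
below is a hypothetical stand-in, and the real sleep in GUP is replaced by
the wait simply completing.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
#define MODEL_FOLL_WRITE         0x1u
#define MODEL_FOLL_INTERRUPTIBLE 0x2u
#define MODEL_PFN_ERR_SIGPENDING UINT64_MAX
#define MODEL_EXIT_INTR          1

struct model_vcpu {
	bool signal_pending;  /* a signal is queued for the vCPU thread */
	bool page_busy;       /* backing page is e.g. under migration */
	int exit_reason;      /* 0 until an exit to userspace is requested */
};

/* Models the fault-in helper: bail out only if interruptible AND signalled. */
uint64_t model_faultin_pfn(struct model_vcpu *vcpu, unsigned int flags)
{
	if (vcpu->page_busy) {
		if ((flags & MODEL_FOLL_INTERRUPTIBLE) && vcpu->signal_pending)
			return MODEL_PFN_ERR_SIGPENDING;
		/* Otherwise the real code would sleep; here the wait just completes. */
		vcpu->page_busy = false;
	}
	return 0x1234; /* arbitrary "valid" pfn for the model */
}

/* Models the stage-2 fault path: translate the sentinel into -EINTR. */
int model_pin_pfn(struct model_vcpu *vcpu, bool write_fault)
{
	unsigned int flags = MODEL_FOLL_INTERRUPTIBLE;
	uint64_t pfn;

	assert(vcpu != NULL);
	if (write_fault)
		flags |= MODEL_FOLL_WRITE;

	pfn = model_faultin_pfn(vcpu, flags);
	if (pfn == MODEL_PFN_ERR_SIGPENDING) {
		/* Record the exit reason, then unwind without mapping anything. */
		vcpu->exit_reason = MODEL_EXIT_INTR;
		return -EINTR;
	}
	return 0;
}
```

In this model, a vCPU with a pending signal and a busy page gets -EINTR with
the exit reason set, while a vCPU without a pending signal still waits for
the page and returns 0, which is the "behaviour is unchanged" case.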

Jia He (1):
  KVM: arm64: Make kvm_s2_fault_pin_pfn() fault-in interruptible

 arch/arm64/kvm/mmu.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

-- 
2.34.1


* [PATCH 1/1] KVM: arm64: Make kvm_s2_fault_pin_pfn() fault-in interruptible
  2026-06-08 10:43 [PATCH 0/1] KVM: arm64: Make kvm_s2_fault_pin_pfn() fault-in interruptible Jia He
@ 2026-06-08 10:43 ` Jia He
  2026-06-08 15:28   ` Marc Zyngier
  0 siblings, 1 reply; 3+ messages in thread
From: Jia He @ 2026-06-08 10:43 UTC
  To: Marc Zyngier, Oliver Upton
  Cc: Joey Gouly, Steffen Eiden, Suzuki K Poulose, Zenghui Yu,
	Catalin Marinas, Will Deacon, linux-arm-kernel, kvmarm,
	linux-kernel, Jia He

arm64 KVM faults guest memory into the host in kvm_s2_fault_pin_pfn() via
__kvm_faultin_pfn(). Today this request is made non-interruptible, so if
the host fault-in path blocks for a long time, a vCPU thread that already
has a pending signal cannot leave the fault-in path until GUP eventually
completes.

This is particularly painful during VM teardown, where userspace may
signal vCPU threads while they are blocked faulting in guest memory. In
that case there is no benefit in continuing to wait for the fault to
complete; the vCPU should return to userspace and let the pending signal
be handled.

Ask the generic KVM fault-in helper to use FOLL_INTERRUPTIBLE. When GUP
reports a pending signal it returns KVM_PFN_ERR_SIGPENDING; handle it by
calling kvm_handle_signal_exit() and returning -EINTR. This matches the
behaviour expected by the generic KVM fault-in path and mirrors the
signal-exit handling already done by the arm64 run loop, which sets
run->exit_reason = KVM_EXIT_INTR before returning to userspace. It is
also consistent with architectures such as x86 that already allow the
fault-in to be interrupted by pending signals.

The interrupted fault does not install a partial stage-2 mapping: the
-EINTR is returned before any mapping is created, so the fault is simply
retried on a subsequent vCPU entry once userspace re-enters KVM_RUN. The
only observable effect in the absence of a pending signal is none; this
does not make ordinary stage-2 faults abortable.

Signed-off-by: Jia He <justin.he@arm.com>
---
 arch/arm64/kvm/mmu.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 4da9281312eb..dfb779e6d792 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1872,19 +1872,27 @@ static int kvm_s2_fault_pin_pfn(const struct kvm_s2_fault_desc *s2fd,
 				struct kvm_s2_fault_vma_info *s2vi)
 {
 	int ret;
+	unsigned int flags = FOLL_INTERRUPTIBLE;

 	ret = kvm_s2_fault_get_vma_info(s2fd, s2vi);
 	if (ret)
 		return ret;

+	if (kvm_is_write_fault(s2fd->vcpu))
+		flags |= FOLL_WRITE;
+
 	s2vi->pfn = __kvm_faultin_pfn(s2fd->memslot, get_canonical_gfn(s2fd, s2vi),
-				      kvm_is_write_fault(s2fd->vcpu) ? FOLL_WRITE : 0,
+				      flags,
 				      &s2vi->map_writable, &s2vi->page);
 	if (unlikely(is_error_noslot_pfn(s2vi->pfn))) {
 		if (s2vi->pfn == KVM_PFN_ERR_HWPOISON) {
 			kvm_send_hwpoison_signal(s2fd->hva, __ffs(s2vi->vma_pagesize));
 			return 0;
 		}
+		if (is_sigpending_pfn(s2vi->pfn)) {
+			kvm_handle_signal_exit(s2fd->vcpu);
+			return -EINTR;
+		}
 		return -EFAULT;
 	}

-- 
2.34.1


* Re: [PATCH 1/1] KVM: arm64: Make kvm_s2_fault_pin_pfn() fault-in interruptible
  2026-06-08 10:43 ` [PATCH 1/1] " Jia He
@ 2026-06-08 15:28   ` Marc Zyngier
  0 siblings, 0 replies; 3+ messages in thread
From: Marc Zyngier @ 2026-06-08 15:28 UTC
  To: Jia He
  Cc: Oliver Upton, Joey Gouly, Steffen Eiden, Suzuki K Poulose,
	Zenghui Yu, Catalin Marinas, Will Deacon, linux-arm-kernel,
	kvmarm, linux-kernel

On Mon, 08 Jun 2026 11:43:36 +0100,
Jia He <justin.he@arm.com> wrote:
> 
> arm64 KVM faults guest memory into the host in kvm_s2_fault_pin_pfn() via
> __kvm_faultin_pfn(). Today this request is made non-interruptible, so if
> the host fault-in path blocks for a long time, a vCPU thread that already
> has a pending signal cannot leave the fault-in path until GUP eventually
> completes.
> 
> This is particularly painful during VM teardown, where userspace may
> signal vCPU threads while they are blocked faulting in guest memory. In
> that case there is no benefit in continuing to wait for the fault to
> complete; the vCPU should return to userspace and let the pending signal
> be handled.
> 
> Ask the generic KVM fault-in helper to use FOLL_INTERRUPTIBLE. When GUP
> reports a pending signal it returns KVM_PFN_ERR_SIGPENDING; handle it by
> calling kvm_handle_signal_exit() and returning -EINTR. This matches the
> behaviour expected by the generic KVM fault-in path and mirrors the
> signal-exit handling already done by the arm64 run loop, which sets
> run->exit_reason = KVM_EXIT_INTR before returning to userspace. It is
> also consistent with architectures such as x86 that already allow the
> fault-in to be interrupted by pending signals.

Only x86, AFAICT. s390 handles signals, but doesn't set GUP as
interruptible.

> 
> The interrupted fault does not install a partial stage-2 mapping: the
> -EINTR is returned before any mapping is created, so the fault is simply
> retried on a subsequent vCPU entry once userspace re-enters KVM_RUN. The
> only observable effect in the absence of a pending signal is none; this

This sentence reads bizarrely. Do you mean to say "there is no
observable effect in the absence of a pending signal"?

> does not make ordinary stage-2 faults abortable.
> 
> Signed-off-by: Jia He <justin.he@arm.com>
> ---
>  arch/arm64/kvm/mmu.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 4da9281312eb..dfb779e6d792 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1872,19 +1872,27 @@ static int kvm_s2_fault_pin_pfn(const struct kvm_s2_fault_desc *s2fd,
>  				struct kvm_s2_fault_vma_info *s2vi)
>  {
>  	int ret;
> +	unsigned int flags = FOLL_INTERRUPTIBLE;
>  
>  	ret = kvm_s2_fault_get_vma_info(s2fd, s2vi);
>  	if (ret)
>  		return ret;
>  
> +	if (kvm_is_write_fault(s2fd->vcpu))
> +		flags |= FOLL_WRITE;
> +

nit: you might as well keep the assignment and the update together.
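
One possible reading of this nit, sketched as a standalone helper rather
than as kernel code: set the initial value of flags and apply the
write-fault update in adjacent statements, instead of splitting them
between the declaration and a later hunk. The MODEL_* values and
model_gup_flags() are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder bit values; the kernel's FOLL_* constants differ. */
#define MODEL_FOLL_WRITE         0x1u
#define MODEL_FOLL_INTERRUPTIBLE 0x2u

unsigned int model_gup_flags(bool write_fault)
{
	unsigned int flags;

	/* Assignment and conditional update kept next to each other. */
	flags = MODEL_FOLL_INTERRUPTIBLE;
	if (write_fault)
		flags |= MODEL_FOLL_WRITE;

	/* The interruptible bit is set on every path. */
	assert(flags & MODEL_FOLL_INTERRUPTIBLE);
	return flags;
}
```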

>  	s2vi->pfn = __kvm_faultin_pfn(s2fd->memslot, get_canonical_gfn(s2fd, s2vi),
> -				      kvm_is_write_fault(s2fd->vcpu) ? FOLL_WRITE : 0,
> +				      flags,
>  				      &s2vi->map_writable, &s2vi->page);
>  	if (unlikely(is_error_noslot_pfn(s2vi->pfn))) {
>  		if (s2vi->pfn == KVM_PFN_ERR_HWPOISON) {
>  			kvm_send_hwpoison_signal(s2fd->hva, __ffs(s2vi->vma_pagesize));
>  			return 0;
>  		}
> +		if (is_sigpending_pfn(s2vi->pfn)) {
> +			kvm_handle_signal_exit(s2fd->vcpu);
> +			return -EINTR;
> +		}
>  		return -EFAULT;
>  	}
>  

The VNCR handling code also uses __kvm_faultin_pfn(). Why isn't it
similarly updated, given that you are specifically singling out NV as
an area of concern?

Similarly, pkvm_mem_abort() is using FOLL* flags and could benefit
from the same optimisation.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.


