From: Raghavendra Rao Ananta <rananta@google.com>
To: Oliver Upton <oupton@kernel.org>, Marc Zyngier <maz@kernel.org>
Cc: Raghavendra Rao Anata <rananta@google.com>,
Mingwei Zhang <mizhang@google.com>,
linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Subject: [PATCH 0/3] KVM: arm64: Reschedule as needed when destroying the stage-2 page-tables
Date: Thu, 13 Nov 2025 05:24:49 +0000 [thread overview]
Message-ID: <20251113052452.975081-1-rananta@google.com> (raw)
Hello,
When destroying a fully-mapped 128G VM abruptly, the following scheduler
warning is observed:
sched: CPU 0 need_resched set for > 100018840 ns (100 ticks) without schedule
CPU: 0 UID: 0 PID: 9617 Comm: kvm_page_table_ Tainted: G O 6.16.0-smp-DEV #3 NONE
Tainted: [O]=OOT_MODULE
Call trace:
show_stack+0x20/0x38 (C)
dump_stack_lvl+0x3c/0xb8
dump_stack+0x18/0x30
resched_latency_warn+0x7c/0x88
sched_tick+0x1c4/0x268
update_process_times+0xa8/0xd8
tick_nohz_handler+0xc8/0x168
__hrtimer_run_queues+0x11c/0x338
hrtimer_interrupt+0x104/0x308
arch_timer_handler_phys+0x40/0x58
handle_percpu_devid_irq+0x8c/0x1b0
generic_handle_domain_irq+0x48/0x78
gic_handle_irq+0x1b8/0x408
call_on_irq_stack+0x24/0x30
do_interrupt_handler+0x54/0x78
el1_interrupt+0x44/0x88
el1h_64_irq_handler+0x18/0x28
el1h_64_irq+0x84/0x88
stage2_free_walker+0x30/0xa0 (P)
__kvm_pgtable_walk+0x11c/0x258
__kvm_pgtable_walk+0x180/0x258
__kvm_pgtable_walk+0x180/0x258
__kvm_pgtable_walk+0x180/0x258
kvm_pgtable_walk+0xc4/0x140
kvm_pgtable_stage2_destroy+0x5c/0xf0
kvm_free_stage2_pgd+0x6c/0xe8
kvm_uninit_stage2_mmu+0x24/0x48
kvm_arch_flush_shadow_all+0x80/0xa0
kvm_mmu_notifier_release+0x38/0x78
__mmu_notifier_release+0x15c/0x250
exit_mmap+0x68/0x400
__mmput+0x38/0x1c8
mmput+0x30/0x68
exit_mm+0xd4/0x198
do_exit+0x1a4/0xb00
do_group_exit+0x8c/0x120
get_signal+0x6d4/0x778
do_signal+0x90/0x718
do_notify_resume+0x70/0x170
el0_svc+0x74/0xd8
el0t_64_sync_handler+0x60/0xc8
el0t_64_sync+0x1b0/0x1b8
The host kernel was running with CONFIG_PREEMPT_NONE=y, and since the
page-table walk operation takes considerable amount of time for a VM
with such a large number of PTEs mapped, the warning is seen.
To mitigate this, split the walk into smaller ranges, by checking for
cond_resched() between each range. Since the path is executed during
VM destruction, after the page-table structure is unlinked from the
KVM MMU, relying on cond_resched_rwlock_write() isn't necessary.
Patch-1 kills the assumption that the page-table hierarchy under the
table is free (in stage2_free_walker()). Instead, drop and clear the
references only on empty tables.
Patch-2 splits the kvm_pgtable_stage2_destroy() function into separate
'walk' and 'free PGD' parts.
Patch-3 leverages the split and performs the walk periodically over
smaller ranges and calls cond_resched() between them.
The series was originally posted and merged [1], but was later reverted
due to syzkaller catching a UAF bug [2]. This series fixes the issue, and
the original need_resched warning is addressed.
[1]: https://lore.kernel.org/all/175582091313.1266576.4329884314263043118.b4-ty@linux.dev/
[2]: https://lore.kernel.org/all/20250910180930.3679473-1-oliver.upton@linux.dev/
Oliver Upton (1):
KVM: arm64: Only drop references on empty tables in stage2_free_walker
Raghavendra Rao Ananta (2):
KVM: arm64: Split kvm_pgtable_stage2_destroy()
KVM: arm64: Reschedule as needed when destroying the stage-2
page-tables
arch/arm64/include/asm/kvm_pgtable.h | 30 +++++++++++++
arch/arm64/include/asm/kvm_pkvm.h | 4 +-
arch/arm64/kvm/hyp/pgtable.c | 63 +++++++++++++++++++++++-----
arch/arm64/kvm/mmu.c | 36 +++++++++++++++-
arch/arm64/kvm/pkvm.c | 11 ++++-
5 files changed, 129 insertions(+), 15 deletions(-)
base-commit: dcb6fa37fd7bc9c3d2b066329b0d27dedf8becaa
--
2.51.2.1041.gc1ab5b90ca-goog
next reply other threads:[~2025-11-13 5:25 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-11-13 5:24 Raghavendra Rao Ananta [this message]
2025-11-13 5:24 ` [PATCH 1/3] KVM: arm64: Only drop references on empty tables in stage2_free_walker Raghavendra Rao Ananta
2025-11-13 5:24 ` [PATCH 2/3] KVM: arm64: Split kvm_pgtable_stage2_destroy() Raghavendra Rao Ananta
2025-11-13 5:24 ` [PATCH 3/3] KVM: arm64: Reschedule as needed when destroying the stage-2 page-tables Raghavendra Rao Ananta
2025-11-19 22:35 ` [PATCH 0/3] " Oliver Upton
2026-01-28 16:47 ` Marc Zyngier
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20251113052452.975081-1-rananta@google.com \
--to=rananta@google.com \
--cc=kvm@vger.kernel.org \
--cc=kvmarm@lists.linux.dev \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=maz@kernel.org \
--cc=mizhang@google.com \
--cc=oupton@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.