From: Raghavendra Rao Ananta <rananta@google.com>
To: Oliver Upton <oupton@kernel.org>, Marc Zyngier <maz@kernel.org>
Cc: Raghavendra Rao Anata <rananta@google.com>,
Mingwei Zhang <mizhang@google.com>,
linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Subject: [PATCH 0/3] KVM: arm64: Reschedule as needed when destroying the stage-2 page-tables
Date: Thu, 13 Nov 2025 05:24:49 +0000 [thread overview]
Message-ID: <20251113052452.975081-1-rananta@google.com> (raw)
Hello,
When destroying a fully-mapped 128G VM abruptly, the following scheduler
warning is observed:
sched: CPU 0 need_resched set for > 100018840 ns (100 ticks) without schedule
CPU: 0 UID: 0 PID: 9617 Comm: kvm_page_table_ Tainted: G O 6.16.0-smp-DEV #3 NONE
Tainted: [O]=OOT_MODULE
Call trace:
show_stack+0x20/0x38 (C)
dump_stack_lvl+0x3c/0xb8
dump_stack+0x18/0x30
resched_latency_warn+0x7c/0x88
sched_tick+0x1c4/0x268
update_process_times+0xa8/0xd8
tick_nohz_handler+0xc8/0x168
__hrtimer_run_queues+0x11c/0x338
hrtimer_interrupt+0x104/0x308
arch_timer_handler_phys+0x40/0x58
handle_percpu_devid_irq+0x8c/0x1b0
generic_handle_domain_irq+0x48/0x78
gic_handle_irq+0x1b8/0x408
call_on_irq_stack+0x24/0x30
do_interrupt_handler+0x54/0x78
el1_interrupt+0x44/0x88
el1h_64_irq_handler+0x18/0x28
el1h_64_irq+0x84/0x88
stage2_free_walker+0x30/0xa0 (P)
__kvm_pgtable_walk+0x11c/0x258
__kvm_pgtable_walk+0x180/0x258
__kvm_pgtable_walk+0x180/0x258
__kvm_pgtable_walk+0x180/0x258
kvm_pgtable_walk+0xc4/0x140
kvm_pgtable_stage2_destroy+0x5c/0xf0
kvm_free_stage2_pgd+0x6c/0xe8
kvm_uninit_stage2_mmu+0x24/0x48
kvm_arch_flush_shadow_all+0x80/0xa0
kvm_mmu_notifier_release+0x38/0x78
__mmu_notifier_release+0x15c/0x250
exit_mmap+0x68/0x400
__mmput+0x38/0x1c8
mmput+0x30/0x68
exit_mm+0xd4/0x198
do_exit+0x1a4/0xb00
do_group_exit+0x8c/0x120
get_signal+0x6d4/0x778
do_signal+0x90/0x718
do_notify_resume+0x70/0x170
el0_svc+0x74/0xd8
el0t_64_sync_handler+0x60/0xc8
el0t_64_sync+0x1b0/0x1b8
The host kernel was running with CONFIG_PREEMPT_NONE=y, and since the
page-table walk operation takes considerable amount of time for a VM
with such a large number of PTEs mapped, the warning is seen.
To mitigate this, split the walk into smaller ranges, by checking for
cond_resched() between each range. Since the path is executed during
VM destruction, after the page-table structure is unlinked from the
KVM MMU, relying on cond_resched_rwlock_write() isn't necessary.
Patch-1 kills the assumption that the page-table hierarchy under the
table is free (in stage2_free_walker()). Instead, drop and clear the
references only on empty tables.
Patch-2 splits the kvm_pgtable_stage2_destroy() function into separate
'walk' and 'free PGD' parts.
Patch-3 leverages the split and performs the walk periodically over
smaller ranges and calls cond_resched() between them.
The series was originally posted and merged [1], but was later reverted
due to syzkaller catching a UAF bug [2]. This series fixes the issue, and
the original need_resched warning is addressed.
[1]: https://lore.kernel.org/all/175582091313.1266576.4329884314263043118.b4-ty@linux.dev/
[2]: https://lore.kernel.org/all/20250910180930.3679473-1-oliver.upton@linux.dev/
Oliver Upton (1):
KVM: arm64: Only drop references on empty tables in stage2_free_walker
Raghavendra Rao Ananta (2):
KVM: arm64: Split kvm_pgtable_stage2_destroy()
KVM: arm64: Reschedule as needed when destroying the stage-2
page-tables
arch/arm64/include/asm/kvm_pgtable.h | 30 +++++++++++++
arch/arm64/include/asm/kvm_pkvm.h | 4 +-
arch/arm64/kvm/hyp/pgtable.c | 63 +++++++++++++++++++++++-----
arch/arm64/kvm/mmu.c | 36 +++++++++++++++-
arch/arm64/kvm/pkvm.c | 11 ++++-
5 files changed, 129 insertions(+), 15 deletions(-)
base-commit: dcb6fa37fd7bc9c3d2b066329b0d27dedf8becaa
--
2.51.2.1041.gc1ab5b90ca-goog
next reply other threads:[~2025-11-13 5:25 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-11-13 5:24 Raghavendra Rao Ananta [this message]
2025-11-13 5:24 ` [PATCH 1/3] KVM: arm64: Only drop references on empty tables in stage2_free_walker Raghavendra Rao Ananta
2025-11-13 5:24 ` [PATCH 2/3] KVM: arm64: Split kvm_pgtable_stage2_destroy() Raghavendra Rao Ananta
2025-11-13 5:24 ` [PATCH 3/3] KVM: arm64: Reschedule as needed when destroying the stage-2 page-tables Raghavendra Rao Ananta
2025-11-19 22:35 ` [PATCH 0/3] " Oliver Upton
2026-01-28 16:47 ` Marc Zyngier
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20251113052452.975081-1-rananta@google.com \
--to=rananta@google.com \
--cc=kvm@vger.kernel.org \
--cc=kvmarm@lists.linux.dev \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=maz@kernel.org \
--cc=mizhang@google.com \
--cc=oupton@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox