From: Will Deacon <will@kernel.org>
To: Quentin Perret <qperret@google.com>
Cc: kvmarm@lists.linux.dev, linux-arm-kernel@lists.infradead.org,
Marc Zyngier <maz@kernel.org>, Oliver Upton <oupton@kernel.org>,
Joey Gouly <joey.gouly@arm.com>,
Suzuki K Poulose <suzuki.poulose@arm.com>,
Zenghui Yu <yuzenghui@huawei.com>,
Catalin Marinas <catalin.marinas@arm.com>,
Fuad Tabba <tabba@google.com>,
Vincent Donnefort <vdonnefort@google.com>,
Mostafa Saleh <smostafa@google.com>
Subject: Re: [PATCH 02/30] KVM: arm64: Remove redundant 'pgt' pointer checks from MMU notifiers
Date: Fri, 9 Jan 2026 17:31:20 +0000 [thread overview]
Message-ID: <aWE7aMOqQUUAyi3X@willie-the-truck> (raw)
In-Reply-To: <aWERVxw-57pzi6ko@willie-the-truck>
WARNING: long and boring reply, but I might find this helpful in future
so I'm sending it anyway. The short answer is, you're right and this
patch is busted :)
On Fri, Jan 09, 2026 at 02:31:51PM +0000, Will Deacon wrote:
> On Tue, Jan 06, 2026 at 02:32:12PM +0000, Quentin Perret wrote:
> > On Monday 05 Jan 2026 at 15:49:10 (+0000), Will Deacon wrote:
> > > The MMU notifiers are registered by kvm_init_mmu_notifier() only after
> > > kvm_arch_init_vm() has returned successfully. Since the latter function
> > > initialises the 'kvm->arch.mmu.pgt' pointer (and allocates the VM handle
> > > when pKVM is enabled), the initialisation checks in the MMU notifiers
> > > are not required.
> >
> > It took me a while to remember, but I think these checks are needed for
> > the free path rather than init. In particular, the doc for
> > mmu_notifier_ops::release() (from which we free the pgt) says that it
> > "can run concurrently with other mmu notifier" (see mmu_notifier.h), which
> > is fun.
> >
> > Had you considered that path? If so, probably worth expanding in the
> > commit description why this is safe?
>
>
> Urgh....
>
> Let me get back to you on that :)
Ok, so the 'pgt' is set to NULL by kvm_free_stage2_pgd() with
'kvm->mmu_lock' held for write. There are three notifier callbacks that
I'm changing not to check for a NULL pgt:
1. kvm_unmap_gfn_range()
=> Runs with the MMU lock held for write. See how
kvm_mmu_notifier_invalidate_range_start() calls
kvm_handle_hva_range() with '.lockless' unset (i.e. false) in
'hva_range'
2. kvm_age_gfn()
=> Takes the lock unless KVM_MMU_LOCKLESS_AGING, which isn't
selected for arm64
3. kvm_test_age_gfn()
=> Same as above.
so I think that means we're serialised with teardown.
Looking at it a different way, suppose that the notifiers *could* run
concurrently with kvm_free_stage2_pgd(). In that case, the check of the
pgt wouldn't buy us anything because the pgt could be cleared (and the
page-table freed) right after we checked that it wasn't NULL but before
we tried touching the page-table.
The next question is whether we can simply run _after_ the page-table
teardown?
For the usual teardown case via exit_mmap(), it looks pretty hard to
trigger since the release notifier is only called once 'mm_users' hits
zero and the 'mm' has been NULLified.
The more obvious case is when the release notifier is run from
mmu_notifier_unregister() by kvm_destroy_vm(). In this case, there's a
synchronize_srcu() but only _after_ calling ->release() and so there is
a small window where another notifier could run into the NULL pgt.
I hacked in an mdelay(5000) between calling ->release and removing
the MMU notifier subscription and then bodged kvmtool to use
MAP_POPULATE for guest memory and to close the VM fd after creating the
memslots. Sure enough, that was enough to trigger the crash (see below).
So I'll rework this patch to preserve the NULL checks on the pgt but
I'll keep the part moving the handle check out of
__pkvm_pgtable_stage2_unmap(), as the handle is cleared by
kvm_arch_destroy_vm() after the notifiers really have gone away.
Cheers,
Will
--->8
[ 1607.241011] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[ 1607.241145] Mem abort info:
[ 1607.241395] ESR = 0x0000000096000004
[ 1607.241441] EC = 0x25: DABT (current EL), IL = 32 bits
[ 1607.241597] SET = 0, FnV = 0
[ 1607.241618] EA = 0, S1PTW = 0
[ 1607.241643] FSC = 0x04: level 0 translation fault
[ 1607.241686] Data abort info:
[ 1607.241711] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
[ 1607.241733] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 1607.241759] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 1607.241883] user pgtable: 4k pages, 52-bit VAs, pgdp=00000000400ad400
[ 1607.241915] [0000000000000000] pgd=080000004550e403, p4d=0000000000000000
[ 1607.242828] Internal error: Oops: 0000000096000004 [#1] SMP
[ 1607.246029] Modules linked in:
[ 1607.246394] CPU: 0 UID: 0 PID: 51 Comm: kswapd0 Not tainted 6.19.0-rc4-00030-g93ba3d7c3693-dirty #3 PREEMPT
[ 1607.246823] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
[ 1607.247213] pstate: 81402009 (Nzcv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[ 1607.247584] pc : kvm_pgtable_walk+0x54/0x150
[ 1607.247988] lr : kvm_pgtable_stage2_test_clear_young+0x48/0x80
[ 1607.248434] sp : ffff800080463510
[ 1607.248717] x29: ffff800080463540 x28: fff0000004512488 x27: 0000000000000001
[ 1607.249129] x26: fff0000001241d50 x25: 0000ffff869fffff x24: 0000ffff86800000
[ 1607.249455] x23: 0000000082400000 x22: 0000000082200000 x21: 0000000000000001
[ 1607.249796] x20: 0000000000000000 x19: ffff800080463588 x18: ffff9bea62668cc0
[ 1607.250154] x17: 001ada185af13066 x16: 0000000000000028 x15: ffffc1ffc0000000
[ 1607.250500] x14: 0010000000000000 x13: 0000ffff86800000 x12: 0000000000080000
[ 1607.250827] x11: 0000000000080000 x10: 0000ffff84600000 x9 : 0000000000000001
[ 1607.251186] x8 : 0000000082400fff x7 : 0c8a13c0ffff8660 x6 : 0c8a43e0ffff867f
[ 1607.251731] x5 : fff00000070fa1c0 x4 : 0000000000000000 x3 : ffff800080463588
[ 1607.252312] x2 : 0000000000200000 x1 : 0000000082200000 x0 : 0000000000000000
[ 1607.252719] Call trace:
[ 1607.252909] kvm_pgtable_walk+0x54/0x150 (P)
[ 1607.253133] kvm_pgtable_stage2_test_clear_young+0x48/0x80
[ 1607.253391] kvm_age_gfn+0x44/0x88
[ 1607.253580] kvm_mmu_notifier_clear_flush_young+0xc4/0x240
[ 1607.253842] __mmu_notifier_clear_flush_young+0xa0/0xb0
[ 1607.254096] folio_referenced_one+0x260/0x3b8
[ 1607.254312] rmap_walk_anon+0x148/0x1d4
[ 1607.254501] folio_referenced+0x118/0x200
[ 1607.254691] shrink_folio_list+0x3c8/0x1100
[ 1607.254896] shrink_lruvec+0x720/0xbec
[ 1607.255122] shrink_node+0x4cc/0x8e4
[ 1607.255311] balance_pgdat+0x488/0x77c
[ 1607.255521] kswapd+0x260/0x2e0
[ 1607.255691] kthread+0x1bc/0x200
[ 1607.255898] ret_from_fork+0x10/0x20
[ 1607.256326] Code: 9274cd17 a901dff6 36180049 9402bc60 (b9400288)
[ 1607.257126] ---[ end trace 0000000000000000 ]---
next prev parent reply other threads:[~2026-01-09 17:31 UTC|newest]
Thread overview: 68+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-01-05 15:49 [PATCH 00/30] KVM: arm64: Add support for protected guest memory with pKVM Will Deacon
2026-01-05 15:49 ` [PATCH 01/30] KVM: arm64: Invert KVM_PGTABLE_WALK_HANDLE_FAULT to fix pKVM walkers Will Deacon
2026-01-06 14:33 ` Quentin Perret
2026-01-10 10:22 ` (subset) " Oliver Upton
2026-01-05 15:49 ` [PATCH 02/30] KVM: arm64: Remove redundant 'pgt' pointer checks from MMU notifiers Will Deacon
2026-01-06 14:32 ` Quentin Perret
2026-01-09 14:31 ` Will Deacon
2026-01-09 17:31 ` Will Deacon [this message]
2026-01-05 15:49 ` [PATCH 03/30] KVM: arm64: Rename __pkvm_pgtable_stage2_unmap() Will Deacon
2026-01-05 15:49 ` [PATCH 04/30] KVM: arm64: Don't advertise unsupported features for protected guests Will Deacon
2026-01-05 15:49 ` [PATCH 05/30] KVM: arm64: Expose self-hosted debug regs as RAZ/WI " Will Deacon
2026-01-05 15:49 ` [PATCH 06/30] KVM: arm64: Remove pointless is_protected_kvm_enabled() checks from hyp Will Deacon
2026-01-06 14:40 ` Quentin Perret
2026-01-09 14:23 ` Will Deacon
2026-01-05 15:49 ` [PATCH 07/30] KVM: arm64: Ignore MMU notifier callbacks for protected VMs Will Deacon
2026-01-05 15:49 ` [PATCH 08/30] KVM: arm64: Prevent unsupported memslot operations on " Will Deacon
2026-01-05 15:49 ` [PATCH 09/30] KVM: arm64: Split teardown hypercall into two phases Will Deacon
2026-01-05 15:49 ` [PATCH 10/30] KVM: arm64: Introduce __pkvm_host_donate_guest() Will Deacon
2026-01-06 14:48 ` Quentin Perret
2026-01-09 14:30 ` Will Deacon
2026-01-09 15:10 ` Quentin Perret
2026-01-05 15:49 ` [PATCH 11/30] KVM: arm64: Hook up donation hypercall to pkvm_pgtable_stage2_map() Will Deacon
2026-01-05 15:49 ` [PATCH 12/30] KVM: arm64: Handle aborts from protected VMs Will Deacon
2026-01-05 15:49 ` [PATCH 13/30] KVM: arm64: Introduce __pkvm_reclaim_dying_guest_page() Will Deacon
2026-01-06 16:26 ` Vincent Donnefort
2026-01-05 15:49 ` [PATCH 14/30] KVM: arm64: Hook up reclaim hypercall to pkvm_pgtable_stage2_destroy() Will Deacon
2026-01-06 14:59 ` Quentin Perret
2026-01-09 14:35 ` Will Deacon
2026-01-09 14:57 ` Quentin Perret
2026-01-05 15:49 ` [PATCH 15/30] KVM: arm64: Refactor enter_exception64() Will Deacon
2026-01-05 15:49 ` [PATCH 16/30] KVM: arm64: Inject SIGSEGV on illegal accesses Will Deacon
2026-01-05 15:49 ` [PATCH 17/30] KVM: arm64: Generalise kvm_pgtable_stage2_set_owner() Will Deacon
2026-01-06 15:20 ` Quentin Perret
2026-01-09 18:46 ` Will Deacon
2026-01-17 0:03 ` Will Deacon
2026-01-05 15:49 ` [PATCH 18/30] KVM: arm64: Introduce host_stage2_set_owner_metadata_locked() Will Deacon
2026-01-05 15:49 ` [PATCH 19/30] KVM: arm64: Annotate guest donations with handle and gfn in host stage-2 Will Deacon
2026-01-06 16:01 ` Fuad Tabba
2026-01-09 14:42 ` Will Deacon
2026-01-12 9:25 ` Fuad Tabba
2026-01-05 15:49 ` [PATCH 20/30] KVM: arm64: Introduce hypercall to force reclaim of a protected page Will Deacon
2026-01-06 15:44 ` Quentin Perret
2026-01-09 17:47 ` Will Deacon
2026-01-05 15:49 ` [PATCH 21/30] KVM: arm64: Reclaim faulting page from pKVM in spurious fault handler Will Deacon
2026-01-05 15:49 ` [PATCH 22/30] KVM: arm64: Return -EFAULT from VCPU_RUN on access to a poisoned pte Will Deacon
2026-01-06 15:54 ` Quentin Perret
2026-01-09 14:57 ` Will Deacon
2026-01-09 15:29 ` Quentin Perret
2026-01-09 17:35 ` Will Deacon
2026-01-05 15:49 ` [PATCH 23/30] KVM: arm64: Add hvc handler at EL2 for hypercalls from protected VMs Will Deacon
2026-01-06 15:52 ` Vincent Donnefort
2026-01-05 15:49 ` [PATCH 24/30] KVM: arm64: Implement the MEM_SHARE hypercall for " Will Deacon
2026-01-06 15:45 ` Vincent Donnefort
2026-01-09 15:01 ` Will Deacon
2026-01-05 15:49 ` [PATCH 25/30] KVM: arm64: Implement the MEM_UNSHARE " Will Deacon
2026-01-06 15:50 ` Vincent Donnefort
2026-01-05 15:49 ` [PATCH 26/30] KVM: arm64: Allow userspace to create protected VMs when pKVM is enabled Will Deacon
2026-01-05 15:49 ` [PATCH 27/30] KVM: arm64: Add some initial documentation for pKVM Will Deacon
2026-01-06 15:59 ` Vincent Donnefort
2026-01-09 15:04 ` Will Deacon
2026-01-05 15:49 ` [PATCH 28/30] KVM: arm64: Extend pKVM page ownership selftests to cover guest donation Will Deacon
2026-01-05 15:49 ` [PATCH 29/30] KVM: arm64: Register 'selftest_vm' in the VM table Will Deacon
2026-01-05 15:49 ` [PATCH 30/30] KVM: arm64: Extend pKVM page ownership selftests to cover forced reclaim Will Deacon
2026-03-13 15:31 ` [PATCH 00/30] KVM: arm64: Add support for protected guest memory with pKVM Mostafa Saleh
2026-04-20 8:02 ` Pavan Kondeti
2026-04-20 10:00 ` Will Deacon
2026-04-20 11:26 ` Pavan Kondeti
2026-04-21 4:15 ` Pavan Kondeti
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aWE7aMOqQUUAyi3X@willie-the-truck \
--to=will@kernel.org \
--cc=catalin.marinas@arm.com \
--cc=joey.gouly@arm.com \
--cc=kvmarm@lists.linux.dev \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=maz@kernel.org \
--cc=oupton@kernel.org \
--cc=qperret@google.com \
--cc=smostafa@google.com \
--cc=suzuki.poulose@arm.com \
--cc=tabba@google.com \
--cc=vdonnefort@google.com \
--cc=yuzenghui@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox