From: Alexandru Elisei <alexandru.elisei@arm.com>
To: Will Deacon <will@kernel.org>
Cc: kvmarm@lists.linux.dev, linux-arm-kernel@lists.infradead.org,
    Marc Zyngier <maz@kernel.org>, Oliver Upton <oupton@kernel.org>,
    Joey Gouly <joey.gouly@arm.com>,
    Suzuki K Poulose <suzuki.poulose@arm.com>,
    Zenghui Yu <yuzenghui@huawei.com>,
    Catalin Marinas <catalin.marinas@arm.com>,
    Quentin Perret <qperret@google.com>,
    Fuad Tabba <tabba@google.com>,
    Vincent Donnefort <vdonnefort@google.com>,
    Mostafa Saleh <smostafa@google.com>
Subject: Re: [PATCH v2 14/35] KVM: arm64: Handle aborts from protected VMs
Date: Fri, 6 Mar 2026 11:34:37 +0000
Message-ID: <aaq7zdxyyphiXhRG@raptor>
In-Reply-To: <aag8edDkKgfTr_hD@willie-the-truck>

Hi Will,

On Wed, Mar 04, 2026 at 02:06:49PM +0000, Will Deacon wrote:
> On Thu, Feb 12, 2026 at 10:37:19AM +0000, Alexandru Elisei wrote:
> > On Mon, Jan 19, 2026 at 12:46:07PM +0000, Will Deacon wrote:
> > > Introduce a new abort handler for resolving stage-2 page faults from
> > > protected VMs by pinning and donating anonymous memory. This is
> > > considerably simpler than the infamous user_mem_abort() as we only have
> > > to deal with translation faults at the pte level.
> > >
> > > Signed-off-by: Will Deacon <will@kernel.org>
> > > ---
> > > arch/arm64/kvm/mmu.c | 89 ++++++++++++++++++++++++++++++++++++++++----
> > > 1 file changed, 81 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > > index a23a4b7f108c..b21a5bf3d104 100644
> > > --- a/arch/arm64/kvm/mmu.c
> > > +++ b/arch/arm64/kvm/mmu.c
> > > @@ -1641,6 +1641,74 @@ static int gmem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> > > return ret != -EAGAIN ? ret : 0;
> > > }
> > >
> > > +static int pkvm_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> > > + struct kvm_memory_slot *memslot, unsigned long hva)
> > > +{
> > > + unsigned int flags = FOLL_HWPOISON | FOLL_LONGTERM | FOLL_WRITE;
> > > + struct kvm_pgtable *pgt = vcpu->arch.hw_mmu->pgt;
> > > + struct mm_struct *mm = current->mm;
> > > + struct kvm *kvm = vcpu->kvm;
> > > + void *hyp_memcache;
> > > + struct page *page;
> > > + int ret;
> > > +
> > > + ret = prepare_mmu_memcache(vcpu, true, &hyp_memcache);
> > > + if (ret)
> > > + return -ENOMEM;
> > > +
> > > + ret = account_locked_vm(mm, 1, true);
> > > + if (ret)
> > > + return ret;
> > > +
> > > + mmap_read_lock(mm);
> > > + ret = pin_user_pages(hva, 1, flags, &page);
> > > + mmap_read_unlock(mm);
> >
> > If the page is part of a large folio, the entire folio gets pinned here, not
> > just the page returned by pin_user_pages(). Do you reckon that should be
> > considered when calling account_locked_vm()?
>
> I don't _think_ so.
>
> Since we only ask for a single page when we call pin_user_pages(), the
> folio refcount will be adjusted by 1, even for large folios. Trying to

For large folios, _pincount is adjusted by 1 with FOLL_LONGTERM. For a
non-large folio, the refcount is increased by GUP_PIN_COUNTING_BIAS == 1024
(try_grab_folio() is where the magic happens).

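To make the difference concrete, here is a rough userspace model of that
accounting as I understand it. It is only a sketch, not the kernel code:
struct model_folio and the model_*() helpers are made up for illustration,
and only GUP_PIN_COUNTING_BIAS and its value are taken from the kernel.

```c
/* Userspace model only; none of this is kernel code. */
#include <assert.h>
#include <stdbool.h>

#define GUP_PIN_COUNTING_BIAS 1024	/* value quoted above */

/* Hypothetical stand-in for struct folio, reduced to the fields discussed. */
struct model_folio {
	bool large;	/* true if the folio spans more than one page */
	int refcount;	/* models the folio reference count */
	int pincount;	/* models _pincount, only used for large folios */
};

/* Models the pin accounting for 'refs' pinned pages of one folio. */
static void model_pin_folio(struct model_folio *folio, int refs)
{
	assert(refs > 0);

	if (folio->large) {
		/* Large folio: one reference and one pin per pinned page. */
		folio->refcount += refs;
		folio->pincount += refs;
	} else {
		/* Small folio: the pin is encoded as a biased refcount. */
		folio->refcount += refs * GUP_PIN_COUNTING_BIAS;
	}
}

/* Models the "maybe DMA pinned" check for both kinds of folio. */
static bool model_maybe_pinned(const struct model_folio *folio)
{
	if (folio->large)
		return folio->pincount > 0;

	return folio->refcount >= GUP_PIN_COUNTING_BIAS;
}
```

In this model, pinning a single page of a large folio moves both counters by
1 regardless of how many pages the folio spans, which matches what you
describe for the single-page pin_user_pages() call.
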
> adjust the accounting based on whether the pinned page forms part of a
> large folio feels error-prone, not least because the migration triggered
> by the longterm pin could actually end up splitting the folio but also

Hmm... as far as I can tell, pin_user_pages() uses MIGRATE_SYNC to migrate
folios that are not suitable for longterm pinning, and after the migration has
completed it attempts to pin the userspace address again.

Also, split_folio() and friends cannot split a folio for which
folio_maybe_dma_pinned() is true, according to the comments for the various
functions.

> because we'd have to avoid double accounting on subsequent faults to the
> same folio. It also feels fragile if the mm code is able to split
> partially pinned folios in future (like it appears to be able to for
> partially mapped folios).

I'm not sure why mm would want to split a folio for which
folio_maybe_dma_pinned() is true. But I'm far from being an mm expert, so I do
understand why relying on this might feel fragile.

>
> > > + if (ret == -EHWPOISON) {
> > > + kvm_send_hwpoison_signal(hva, PAGE_SHIFT);
> > > + ret = 0;
> > > + goto dec_account;
> > > + } else if (ret != 1) {
> > > + ret = -EFAULT;
> > > + goto dec_account;
> > > + } else if (!folio_test_swapbacked(page_folio(page))) {
> > > + /*
> > > + * We really can't deal with page-cache pages returned by GUP
> > > + * because (a) we may trigger writeback of a page for which we
> > > + * no longer have access and (b) page_mkclean() won't find the
> > > + * stage-2 mapping in the rmap so we can get out-of-whack with
> > > + * the filesystem when marking the page dirty during unpinning
> > > + * (see cc5095747edf ("ext4: don't BUG if someone dirty pages
> > > + * without asking ext4 first")).
> >
> > I've been trying to wrap my head around this. Would you mind providing a few
> > more hints about what the issue is? I'm sure the approach is correct, it's
> > likely just me not being familiar with the code.
>
> The fundamental problem is that unmapping page-cache pages from the host
> stage-2 can confuse filesystems who don't know that either the page is
> now inaccessible (and so may attempt to access it) or that the page can
> be accessed concurrently by the guest without updating the page state.
>
> To fix those issues, we would need to support MMU notifiers for protected
> memory but that would allow the host to mess with the guest stage-2
> page-table, which breaks the security model that we're trying to uphold.

Aha, got it, thanks for the explanation!

Alex

>
> > > @@ -2190,15 +2258,20 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
> > > goto out_unlock;
> > > }
> > >
> > > - VM_WARN_ON_ONCE(kvm_vcpu_trap_is_permission_fault(vcpu) &&
> > > - !write_fault && !kvm_vcpu_trap_is_exec_fault(vcpu));
> > > + if (kvm_vm_is_protected(vcpu->kvm)) {
> > > + ret = pkvm_mem_abort(vcpu, fault_ipa, memslot, hva);
> >
> > I guess the reason this comes after handling an access fault is because you want
> > the WARN_ON() to trigger in pkvm_pgtable_stage2_mkyoung().
>
> Right, we should only ever see translation faults for protected guests
> and that's all that pkvm_mem_abort() is prepared to handle, so we call
> it last.
>
> Will