Re: [PATCH v2 14/35] KVM: arm64: Handle aborts from protected VMs

From: Will Deacon <will@kernel.org>
To: Alexandru Elisei <alexandru.elisei@arm.com>
Cc: kvmarm@lists.linux.dev, linux-arm-kernel@lists.infradead.org,
	Marc Zyngier <maz@kernel.org>, Oliver Upton <oupton@kernel.org>,
	Joey Gouly <joey.gouly@arm.com>,
	Suzuki K Poulose <suzuki.poulose@arm.com>,
	Zenghui Yu <yuzenghui@huawei.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Quentin Perret <qperret@google.com>,
	Fuad Tabba <tabba@google.com>,
	Vincent Donnefort <vdonnefort@google.com>,
	Mostafa Saleh <smostafa@google.com>
Subject: Re: [PATCH v2 14/35] KVM: arm64: Handle aborts from protected VMs
Date: Wed, 4 Mar 2026 14:06:49 +0000
Message-ID: <aag8edDkKgfTr_hD@willie-the-truck>
In-Reply-To: <aY2tX6V0pCqwGth5@raptor>

On Thu, Feb 12, 2026 at 10:37:19AM +0000, Alexandru Elisei wrote:
> On Mon, Jan 19, 2026 at 12:46:07PM +0000, Will Deacon wrote:
> > Introduce a new abort handler for resolving stage-2 page faults from
> > protected VMs by pinning and donating anonymous memory. This is
> > considerably simpler than the infamous user_mem_abort() as we only have
> > to deal with translation faults at the pte level.
> > 
> > Signed-off-by: Will Deacon <will@kernel.org>
> > ---
> >  arch/arm64/kvm/mmu.c | 89 ++++++++++++++++++++++++++++++++++++++++----
> >  1 file changed, 81 insertions(+), 8 deletions(-)
> > 
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index a23a4b7f108c..b21a5bf3d104 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -1641,6 +1641,74 @@ static int gmem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >  	return ret != -EAGAIN ? ret : 0;
> >  }
> >  
> > +static int pkvm_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> > +		struct kvm_memory_slot *memslot, unsigned long hva)
> > +{
> > +	unsigned int flags = FOLL_HWPOISON | FOLL_LONGTERM | FOLL_WRITE;
> > +	struct kvm_pgtable *pgt = vcpu->arch.hw_mmu->pgt;
> > +	struct mm_struct *mm = current->mm;
> > +	struct kvm *kvm = vcpu->kvm;
> > +	void *hyp_memcache;
> > +	struct page *page;
> > +	int ret;
> > +
> > +	ret = prepare_mmu_memcache(vcpu, true, &hyp_memcache);
> > +	if (ret)
> > +		return -ENOMEM;
> > +
> > +	ret = account_locked_vm(mm, 1, true);
> > +	if (ret)
> > +		return ret;
> > +
> > +	mmap_read_lock(mm);
> > +	ret = pin_user_pages(hva, 1, flags, &page);
> > +	mmap_read_unlock(mm);
> 
> If the page is part of a large folio, the entire folio gets pinned here, not
> just the page returned by pin_user_pages(). Do you reckon that should be
> considered when calling account_locked_vm()?

I don't _think_ so.

Since we only ask for a single page when we call pin_user_pages(), the
folio refcount will be adjusted by 1, even for large folios. Trying to
adjust the accounting based on whether the pinned page forms part of a
large folio feels error-prone for two reasons: the migration triggered
by the longterm pin could actually end up splitting the folio, and we'd
have to avoid double accounting on subsequent faults to the same folio.
It would also be fragile if the mm code becomes able to split partially
pinned folios in future (as it appears to be able to do for partially
mapped folios).
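
As a rough illustration of the per-page accounting described above, here
is a toy user-space model. It is not kernel code: struct toy_mm,
toy_account_locked() and toy_fault_one_page() are hypothetical names, and
the limit check is only a stand-in for whatever account_locked_vm()
really enforces. The point it tries to show is that charging exactly one
page per fault needs no knowledge of folio size and no de-duplication
across faults.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of an mm's locked-page accounting (not a kernel type). */
struct toy_mm {
	unsigned long locked_pages;	/* pages currently charged */
	unsigned long limit_pages;	/* assumed limit, expressed in pages */
};

/* Charge (inc == true) or uncharge nr pages; -1 if a charge would exceed the limit. */
static int toy_account_locked(struct toy_mm *mm, unsigned long nr, bool inc)
{
	if (inc) {
		if (mm->locked_pages + nr > mm->limit_pages)
			return -1;
		mm->locked_pages += nr;
	} else {
		/* Uncharging more than was charged would be an accounting bug. */
		assert(mm->locked_pages >= nr);
		mm->locked_pages -= nr;
	}
	return 0;
}

/*
 * One fault: charge a single page, then "pin" it. If the pin fails, the
 * charge is rolled back so the count only reflects pages actually pinned.
 * Whether the page belongs to a large folio never enters the calculation.
 */
static int toy_fault_one_page(struct toy_mm *mm, bool pin_succeeds)
{
	if (toy_account_locked(mm, 1, true))
		return -1;

	if (!pin_succeeds) {
		toy_account_locked(mm, 1, false);
		return -1;
	}

	return 0;
}
```

In this sketch, two faults on different pages of the same large folio
simply charge two pages. A folio-sized charge would instead have to
remember which folios were already charged and cope with the folio being
split later.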

> > +	if (ret == -EHWPOISON) {
> > +		kvm_send_hwpoison_signal(hva, PAGE_SHIFT);
> > +		ret = 0;
> > +		goto dec_account;
> > +	} else if (ret != 1) {
> > +		ret = -EFAULT;
> > +		goto dec_account;
> > +	} else if (!folio_test_swapbacked(page_folio(page))) {
> > +		/*
> > +		 * We really can't deal with page-cache pages returned by GUP
> > +		 * because (a) we may trigger writeback of a page for which we
> > +		 * no longer have access and (b) page_mkclean() won't find the
> > +		 * stage-2 mapping in the rmap so we can get out-of-whack with
> > +		 * the filesystem when marking the page dirty during unpinning
> > +		 * (see cc5095747edf ("ext4: don't BUG if someone dirty pages
> > +		 * without asking ext4 first")).
> 
> I've been trying to wrap my head around this. Would you mind providing a few
> more hints about what the issue is? I'm sure the approach is correct, it's
> likely just me not being familiar with the code.

The fundamental problem is that unmapping page-cache pages from the host
stage-2 can confuse filesystems, which don't know either that the page
is now inaccessible (and so may attempt to access it) or that the guest
can access the page concurrently without the page state being updated.

To fix those issues, we would need to support MMU notifiers for protected
memory but that would allow the host to mess with the guest stage-2
page-table, which breaks the security model that we're trying to uphold.

> > @@ -2190,15 +2258,20 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
> >  		goto out_unlock;
> >  	}
> >  
> > -	VM_WARN_ON_ONCE(kvm_vcpu_trap_is_permission_fault(vcpu) &&
> > -			!write_fault && !kvm_vcpu_trap_is_exec_fault(vcpu));
> > +	if (kvm_vm_is_protected(vcpu->kvm)) {
> > +		ret = pkvm_mem_abort(vcpu, fault_ipa, memslot, hva);
> 
> I guess the reason this comes after handling an access fault is because you want
> the WARN_ON() to trigger in pkvm_pgtable_stage2_mkyoung().

Right, we should only ever see translation faults for protected guests
and that's all that pkvm_mem_abort() is prepared to handle, so we call
it last.
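
The ordering can be pictured with a small stand-alone model. This is only
a sketch under assumptions: the enums and toy_dispatch() are hypothetical
and just mirror the order discussed here (access-flag faults are dealt
with first, then protected VMs are routed to the new handler). It is not
the code in kvm_handle_guest_abort().

```c
#include <stdbool.h>

/* Hypothetical fault classes; only the two discussed in this thread. */
enum toy_fault {
	TOY_FAULT_TRANSLATION,
	TOY_FAULT_ACCESS_FLAG,
};

/* Hypothetical labels for which path ends up handling the fault. */
enum toy_handler {
	TOY_HANDLE_MKYOUNG,	/* access-flag path, where the WARN_ON() lives */
	TOY_HANDLE_PKVM_ABORT,	/* stands in for pkvm_mem_abort() */
	TOY_HANDLE_OTHER,	/* stands in for the non-protected handlers */
};

/*
 * Access-flag faults are checked before anything else, so an unexpected
 * one from a protected guest reaches the mkyoung path rather than the
 * protected abort handler, which only expects translation faults.
 */
static enum toy_handler toy_dispatch(bool vm_is_protected, enum toy_fault fault)
{
	if (fault == TOY_FAULT_ACCESS_FLAG)
		return TOY_HANDLE_MKYOUNG;

	if (vm_is_protected)
		return TOY_HANDLE_PKVM_ABORT;

	return TOY_HANDLE_OTHER;
}
```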

Will
