Date: Wed, 4 Mar 2026 14:06:49 +0000
From: Will Deacon
To: Alexandru Elisei
Cc: kvmarm@lists.linux.dev, linux-arm-kernel@lists.infradead.org,
	Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, Catalin Marinas, Quentin Perret, Fuad Tabba,
	Vincent Donnefort, Mostafa Saleh
Subject: Re: [PATCH v2 14/35] KVM: arm64: Handle aborts from protected VMs
References: <20260119124629.2563-1-will@kernel.org>
	<20260119124629.2563-15-will@kernel.org>

On Thu, Feb 12, 2026 at 10:37:19AM +0000, Alexandru Elisei wrote:
> On Mon, Jan 19, 2026 at 12:46:07PM +0000, Will Deacon wrote:
> > Introduce a new abort handler for resolving stage-2 page faults from
> > protected VMs by pinning and donating anonymous memory. This is
> > considerably simpler than the infamous user_mem_abort() as we only have
> > to deal with translation faults at the pte level.
> >
> > Signed-off-by: Will Deacon
> > ---
> >  arch/arm64/kvm/mmu.c | 89 ++++++++++++++++++++++++++++++++++++++++----
> >  1 file changed, 81 insertions(+), 8 deletions(-)
> >
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index a23a4b7f108c..b21a5bf3d104 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -1641,6 +1641,74 @@ static int gmem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >  	return ret != -EAGAIN ? ret : 0;
> >  }
> >
> > +static int pkvm_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> > +			  struct kvm_memory_slot *memslot, unsigned long hva)
> > +{
> > +	unsigned int flags = FOLL_HWPOISON | FOLL_LONGTERM | FOLL_WRITE;
> > +	struct kvm_pgtable *pgt = vcpu->arch.hw_mmu->pgt;
> > +	struct mm_struct *mm = current->mm;
> > +	struct kvm *kvm = vcpu->kvm;
> > +	void *hyp_memcache;
> > +	struct page *page;
> > +	int ret;
> > +
> > +	ret = prepare_mmu_memcache(vcpu, true, &hyp_memcache);
> > +	if (ret)
> > +		return -ENOMEM;
> > +
> > +	ret = account_locked_vm(mm, 1, true);
> > +	if (ret)
> > +		return ret;
> > +
> > +	mmap_read_lock(mm);
> > +	ret = pin_user_pages(hva, 1, flags, &page);
> > +	mmap_read_unlock(mm);
>
> If the page is part of a large folio, the entire folio gets pinned here, not
> just the page returned by pin_user_pages(). Do you reckon that should be
> considered when calling account_locked_vm()?

I don't _think_ so. Since we only ask for a single page when we call
pin_user_pages(), the folio refcount will be adjusted by 1, even for
large folios. Trying to adjust the accounting based on whether the
pinned page forms part of a large folio feels error-prone, not least
because the migration triggered by the longterm pin could actually end
up splitting the folio, but also because we'd have to avoid double
accounting on subsequent faults to the same folio.
It also feels fragile if the mm code becomes able to split partially
pinned folios in future (like it appears to be able to do for partially
mapped folios).

> > +	if (ret == -EHWPOISON) {
> > +		kvm_send_hwpoison_signal(hva, PAGE_SHIFT);
> > +		ret = 0;
> > +		goto dec_account;
> > +	} else if (ret != 1) {
> > +		ret = -EFAULT;
> > +		goto dec_account;
> > +	} else if (!folio_test_swapbacked(page_folio(page))) {
> > +		/*
> > +		 * We really can't deal with page-cache pages returned by GUP
> > +		 * because (a) we may trigger writeback of a page for which we
> > +		 * no longer have access and (b) page_mkclean() won't find the
> > +		 * stage-2 mapping in the rmap so we can get out-of-whack with
> > +		 * the filesystem when marking the page dirty during unpinning
> > +		 * (see cc5095747edf ("ext4: don't BUG if someone dirty pages
> > +		 * without asking ext4 first")).
>
> I've been trying to wrap my head around this. Would you mind providing a few
> more hints about what the issue is? I'm sure the approach is correct, it's
> likely just me not being familiar with the code.

The fundamental problem is that unmapping page-cache pages from the host
stage-2 can confuse filesystems, which don't know either that the page
is now inaccessible (and so may attempt to access it) or that the page
can be accessed concurrently by the guest without updating the page
state.

To fix those issues, we would need to support MMU notifiers for
protected memory, but that would allow the host to mess with the guest
stage-2 page-table, which breaks the security model that we're trying
to uphold.
> > @@ -2190,15 +2258,20 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
> >  		goto out_unlock;
> >  	}
> >
> > -	VM_WARN_ON_ONCE(kvm_vcpu_trap_is_permission_fault(vcpu) &&
> > -			!write_fault && !kvm_vcpu_trap_is_exec_fault(vcpu));
> > +	if (kvm_vm_is_protected(vcpu->kvm)) {
> > +		ret = pkvm_mem_abort(vcpu, fault_ipa, memslot, hva);
>
> I guess the reason this comes after handling an access fault is because you want
> the WARN_ON() to trigger in pkvm_pgtable_stage2_mkyoung().

Right, we should only ever see translation faults for protected guests
and that's all that pkvm_mem_abort() is prepared to handle, so we call
it last.

Will