From: Marc Zyngier <maz@kernel.org>
To: Fuad Tabba <tabba@google.com>
Cc: kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org,
linux-mm@kvack.org, kvmarm@lists.linux.dev, pbonzini@redhat.com,
chenhuacai@kernel.org, mpe@ellerman.id.au, anup@brainfault.org,
paul.walmsley@sifive.com, palmer@dabbelt.com,
aou@eecs.berkeley.edu, seanjc@google.com,
viro@zeniv.linux.org.uk, brauner@kernel.org, willy@infradead.org,
akpm@linux-foundation.org, xiaoyao.li@intel.com,
yilun.xu@intel.com, chao.p.peng@linux.intel.com,
jarkko@kernel.org, amoorthy@google.com, dmatlack@google.com,
isaku.yamahata@intel.com, mic@digikod.net, vbabka@suse.cz,
vannapurve@google.com, ackerleytng@google.com,
mail@maciej.szmigiero.name, david@redhat.com,
michael.roth@amd.com, wei.w.wang@intel.com,
liam.merwick@oracle.com, isaku.yamahata@gmail.com,
kirill.shutemov@linux.intel.com, suzuki.poulose@arm.com,
steven.price@arm.com, quic_eberman@quicinc.com,
quic_mnalajal@quicinc.com, quic_tsoni@quicinc.com,
quic_svaddagi@quicinc.com, quic_cvanscha@quicinc.com,
quic_pderrin@quicinc.com, quic_pheragu@quicinc.com,
catalin.marinas@arm.com, james.morse@arm.com,
yuzenghui@huawei.com, oliver.upton@linux.dev, will@kernel.org,
qperret@google.com, keirf@google.com, roypat@amazon.co.uk,
shuah@kernel.org, hch@infradead.org, jgg@nvidia.com,
rientjes@google.com, jhubbard@nvidia.com, fvdl@google.com,
hughd@google.com, jthoughton@google.com, peterx@redhat.com,
pankaj.gupta@amd.com, ira.weiny@intel.com
Subject: Re: [PATCH v13 16/20] KVM: arm64: Handle guest_memfd-backed guest page faults
Date: Fri, 11 Jul 2025 17:37:56 +0100
Message-ID: <865xfyadjv.wl-maz@kernel.org>
In-Reply-To: <20250709105946.4009897-17-tabba@google.com>
On Wed, 09 Jul 2025 11:59:42 +0100,
Fuad Tabba <tabba@google.com> wrote:
>
> Add arm64 architecture support for handling guest page faults on memory
> slots backed by guest_memfd.
>
> This change introduces a new function, gmem_abort(), which encapsulates
> the fault handling logic specific to guest_memfd-backed memory. The
> kvm_handle_guest_abort() entry point is updated to dispatch to
> gmem_abort() when a fault occurs on a guest_memfd-backed memory slot (as
> determined by kvm_slot_has_gmem()).
>
> Until guest_memfd gains support for huge pages, the fault granule for
> these memory regions is restricted to PAGE_SIZE.
>
> Reviewed-by: Gavin Shan <gshan@redhat.com>
> Reviewed-by: James Houghton <jthoughton@google.com>
> Signed-off-by: Fuad Tabba <tabba@google.com>
> ---
> arch/arm64/kvm/mmu.c | 82 ++++++++++++++++++++++++++++++++++++++++++--
> 1 file changed, 79 insertions(+), 3 deletions(-)
>
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 58662e0ef13e..71f8b53683e7 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1512,6 +1512,78 @@ static void adjust_nested_fault_perms(struct kvm_s2_trans *nested,
> *prot |= kvm_encode_nested_level(nested);
> }
>
> +#define KVM_PGTABLE_WALK_MEMABORT_FLAGS (KVM_PGTABLE_WALK_HANDLE_FAULT | KVM_PGTABLE_WALK_SHARED)
> +
> +static int gmem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> + struct kvm_s2_trans *nested,
> + struct kvm_memory_slot *memslot, bool is_perm)
> +{
> + bool write_fault, exec_fault, writable;
> + enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_MEMABORT_FLAGS;
> + enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
> + struct kvm_pgtable *pgt = vcpu->arch.hw_mmu->pgt;
> + struct page *page;
> + struct kvm *kvm = vcpu->kvm;
> + void *memcache;
> + kvm_pfn_t pfn;
> + gfn_t gfn;
> + int ret;
> +
> + ret = prepare_mmu_memcache(vcpu, true, &memcache);
> + if (ret)
> + return ret;
> +
> + if (nested)
> + gfn = kvm_s2_trans_output(nested) >> PAGE_SHIFT;
> + else
> + gfn = fault_ipa >> PAGE_SHIFT;
> +
> + write_fault = kvm_is_write_fault(vcpu);
> + exec_fault = kvm_vcpu_trap_is_exec_fault(vcpu);
> +
> + if (write_fault && exec_fault) {
> + kvm_err("Simultaneous write and execution fault\n");
> + return -EFAULT;
> + }
I don't think we need to cargo-cult this stuff. This cannot happen
architecturally (data and instruction aborts are two different
exceptions, so you can't have both at the same time), and is only
there because we were young and foolish when we wrote this crap.
Now that we (the royal We) are only foolish, we can save a few bits by
dropping it. Or turn it into a VM_BUG_ON() if you really want to keep
it.
> +
> + if (is_perm && !write_fault && !exec_fault) {
> + kvm_err("Unexpected L2 read permission error\n");
> + return -EFAULT;
> + }
Again, this is copying something that was always a bit crap:
- it's not an "error", it's a permission fault
- it's not "L2", it's "stage-2"
But this should equally be turned into an assertion, ideally in a
single spot. See below for the usual untested hack.
Thanks,
M.
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index b92ce4d9b4e01..c79dc8fd45d5a 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1540,16 +1540,7 @@ static int gmem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
write_fault = kvm_is_write_fault(vcpu);
exec_fault = kvm_vcpu_trap_is_exec_fault(vcpu);
-
- if (write_fault && exec_fault) {
- kvm_err("Simultaneous write and execution fault\n");
- return -EFAULT;
- }
-
- if (is_perm && !write_fault && !exec_fault) {
- kvm_err("Unexpected L2 read permission error\n");
- return -EFAULT;
- }
+ VM_BUG_ON(write_fault && exec_fault);
ret = kvm_gmem_get_pfn(kvm, memslot, gfn, &pfn, &page, NULL);
if (ret) {
@@ -1616,11 +1607,6 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
exec_fault = kvm_vcpu_trap_is_exec_fault(vcpu);
VM_BUG_ON(write_fault && exec_fault);
- if (fault_is_perm && !write_fault && !exec_fault) {
- kvm_err("Unexpected L2 read permission error\n");
- return -EFAULT;
- }
-
/*
* Permission faults just need to update the existing leaf entry,
* and so normally don't require allocations from the memcache. The
@@ -2035,6 +2021,9 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
goto out_unlock;
}
+ VM_BUG_ON(kvm_vcpu_trap_is_permission_fault(vcpu) &&
+ !write_fault && !kvm_vcpu_trap_is_exec_fault(vcpu));
+
if (kvm_slot_has_gmem(memslot))
ret = gmem_abort(vcpu, fault_ipa, nested, memslot,
esr_fsc_is_permission_fault(esr));
--
Without deviation from the norm, progress is not possible.