From: Fuad Tabba
Date: Fri, 11 Jul 2025 15:17:46 +0100
Subject: Re: [PATCH v13 16/20] KVM: arm64: Handle guest_memfd-backed guest page faults
To: Marc Zyngier
Cc: "Roy, Patrick", ackerleytng@google.com, akpm@linux-foundation.org,
    amoorthy@google.com, anup@brainfault.org, aou@eecs.berkeley.edu,
    brauner@kernel.org, catalin.marinas@arm.com,
    chao.p.peng@linux.intel.com, chenhuacai@kernel.org, david@redhat.com,
    dmatlack@google.com, fvdl@google.com, hch@infradead.org,
    hughd@google.com, ira.weiny@intel.com, isaku.yamahata@gmail.com,
    isaku.yamahata@intel.com, james.morse@arm.com, jarkko@kernel.org,
    jgg@nvidia.com, jhubbard@nvidia.com, jthoughton@google.com,
    keirf@google.com, kirill.shutemov@linux.intel.com, kvm@vger.kernel.org,
    kvmarm@lists.linux.dev, liam.merwick@oracle.com,
    linux-arm-msm@vger.kernel.org, linux-mm@kvack.org,
    mail@maciej.szmigiero.name, mic@digikod.net, michael.roth@amd.com,
    mpe@ellerman.id.au, oliver.upton@linux.dev, palmer@dabbelt.com,
    pankaj.gupta@amd.com, paul.walmsley@sifive.com, pbonzini@redhat.com,
    peterx@redhat.com, qperret@google.com, quic_cvanscha@quicinc.com,
    quic_eberman@quicinc.com, quic_mnalajal@quicinc.com,
    quic_pderrin@quicinc.com, quic_pheragu@quicinc.com,
    quic_svaddagi@quicinc.com, quic_tsoni@quicinc.com, rientjes@google.com,
    seanjc@google.com, shuah@kernel.org, steven.price@arm.com,
    suzuki.poulose@arm.com, vannapurve@google.com, vbabka@suse.cz,
    viro@zeniv.linux.org.uk, wei.w.wang@intel.com, will@kernel.org,
    willy@infradead.org, xiaoyao.li@intel.com, yilun.xu@intel.com,
    yuzenghui@huawei.com
References: <20250709105946.4009897-17-tabba@google.com>
    <20250711095937.22365-1-roypat@amazon.co.uk>
    <86a55aalbv.wl-maz@kernel.org>
In-Reply-To: <86a55aalbv.wl-maz@kernel.org>

Hi Marc,

On Fri, 11 Jul 2025 at 14:50, Marc Zyngier wrote:
>
> On Fri, 11 Jul 2025 10:59:39 +0100,
> "Roy, Patrick" wrote:
> >
> >
> > Hi Fuad,
> >
> > On Wed, 2025-07-09 at 11:59 +0100, Fuad Tabba wrote:
> > > -snip-
> > > +#define KVM_PGTABLE_WALK_MEMABORT_FLAGS (KVM_PGTABLE_WALK_HANDLE_FAULT | KVM_PGTABLE_WALK_SHARED)
> > > +
> > > +static int gmem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> > > +		      struct kvm_s2_trans *nested,
> > > +		      struct kvm_memory_slot *memslot, bool is_perm)
> > > +{
> > > +	bool write_fault, exec_fault, writable;
> > > +	enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_MEMABORT_FLAGS;
> > > +	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
> > > +	struct kvm_pgtable *pgt = vcpu->arch.hw_mmu->pgt;
> > > +	struct page *page;
> > > +	struct kvm *kvm = vcpu->kvm;
> > > +	void *memcache;
> > > +	kvm_pfn_t pfn;
> > > +	gfn_t gfn;
> > > +	int ret;
> > > +
> > > +	ret = prepare_mmu_memcache(vcpu, true, &memcache);
> > > +	if (ret)
> > > +		return ret;
> > > +
> > > +	if (nested)
> > > +		gfn = kvm_s2_trans_output(nested) >> PAGE_SHIFT;
> > > +	else
> > > +		gfn = fault_ipa >> PAGE_SHIFT;
> > > +
> > > +	write_fault = kvm_is_write_fault(vcpu);
> > > +	exec_fault = kvm_vcpu_trap_is_exec_fault(vcpu);
> > > +
> > > +	if (write_fault && exec_fault) {
> > > +		kvm_err("Simultaneous write and execution fault\n");
> > > +		return -EFAULT;
> > > +	}
> > > +
> > > +	if (is_perm && !write_fault && !exec_fault) {
> > > +		kvm_err("Unexpected L2 read permission error\n");
> > > +		return -EFAULT;
> > > +	}
> > > +
> > > +	ret = kvm_gmem_get_pfn(kvm, memslot, gfn, &pfn, &page, NULL);
> > > +	if (ret) {
> > > +		kvm_prepare_memory_fault_exit(vcpu, fault_ipa, PAGE_SIZE,
> > > +					      write_fault, exec_fault, false);
> > > +		return ret;
> > > +	}
> > > +
> > > +	writable = !(memslot->flags & KVM_MEM_READONLY);
> > > +
> > > +	if (nested)
> > > +		adjust_nested_fault_perms(nested, &prot, &writable);
> > > +
> > > +	if (writable)
> > > +		prot |= KVM_PGTABLE_PROT_W;
> > > +
> > > +	if (exec_fault ||
> > > +	    (cpus_have_final_cap(ARM64_HAS_CACHE_DIC) &&
> > > +	     (!nested || kvm_s2_trans_executable(nested))))
> > > +		prot |= KVM_PGTABLE_PROT_X;
> > > +
> > > +	kvm_fault_lock(kvm);
> >
> > Doesn't this race with gmem invalidations (e.g. fallocate(PUNCH_HOLE))?
> > E.g.
> > if between kvm_gmem_get_pfn() above and this kvm_fault_lock() a
> > gmem invalidation occurs, don't we end up with stage-2 page tables
> > referring to a stale host page? In user_mem_abort() there's the "grab
> > mmu_invalidate_seq before dropping mmap_lock and check it hasn't changed
> > after grabbing mmu_lock" which prevents this, but I don't really see an
> > equivalent here.
>
> Indeed. We have a similar construct in kvm_translate_vncr() as well,
> and I'd definitely expect something of the sort 'round here. If for
> some reason this is not needed, then a comment explaining why would be
> welcome.
>
> But this brings me to another interesting bit: kvm_translate_vncr() is
> another path that deals with a guest translation fault (despite being
> caught as an EL2 S1 fault), and calls kvm_faultin_pfn(). What happens
> when the backing store is gmem? Probably nothin

I'll add guest_memfd handling logic to kvm_translate_vncr().

> I don't immediately see why NV and gmem should be incompatible, so
> something must be done on that front too (including the return to
> userspace if the page is gone).

Should it return to userspace or go back to the guest?
user_mem_abort() returns to the guest if the page disappears (I don't
quite understand the rationale behind that, but it was a deliberate
change [1]): on mmu_invalidate_retry() it sets ret to -EAGAIN [2],
which gets flipped to 0 on returning from user_mem_abort() [3].

[1] https://lore.kernel.org/all/20210114121350.123684-4-wangyanan55@huawei.com/
[2] https://elixir.bootlin.com/linux/v6.16-rc5/source/arch/arm64/kvm/mmu.c#L1690
[3] https://elixir.bootlin.com/linux/v6.16-rc5/source/arch/arm64/kvm/mmu.c#L1764

Cheers,
/fuad

> Thanks,
>
>         M.
>
> --
> Without deviation from the norm, progress is not possible.
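
For readers following the thread, here is a minimal userspace model of
the pattern discussed above: snapshot the invalidation sequence before
the pfn lookup, recheck it at the point where the lock would be held,
and bail out with -EAGAIN if an invalidation slipped in between. The
caller flips -EAGAIN to 0 so the guest simply re-faults. Every name
below (struct vm_model, invalidate_model(), invalidate_retry_model(),
map_fault_model(), handle_fault_model()) is hypothetical and chosen for
illustration only. This is not the KVM code, and it ignores real
locking and memory ordering entirely.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for per-VM state; not struct kvm. */
struct vm_model {
	unsigned long invalidate_seq;	/* bumped by every invalidation */
	unsigned long backing_pfn;	/* page currently in the backing store */
	unsigned long mapped_pfn;	/* page the mapping points at, 0 = none */
};

/* Models an invalidation (e.g. a hole punch followed by a refill). */
static void invalidate_model(struct vm_model *vm)
{
	vm->invalidate_seq++;
	vm->backing_pfn++;	/* the old page is gone, a new one backs the gfn */
	vm->mapped_pfn = 0;	/* any existing mapping is torn down */
}

/* True if an invalidation happened since 'seq' was sampled. */
static bool invalidate_retry_model(const struct vm_model *vm, unsigned long seq)
{
	return vm->invalidate_seq != seq;
}

/*
 * 'racer' (may be NULL) runs in the window between the pfn lookup and
 * the point where the fault lock would be taken.
 */
static int map_fault_model(struct vm_model *vm, void (*racer)(struct vm_model *))
{
	unsigned long seq, pfn;

	assert(vm != NULL);

	seq = vm->invalidate_seq;	/* 1. snapshot before the lookup */
	pfn = vm->backing_pfn;		/* 2. look up the pfn, no lock held */

	if (racer)
		racer(vm);		/* window for a concurrent invalidation */

	/* 3. the fault lock would be held from here on */
	if (invalidate_retry_model(vm, seq))
		return -EAGAIN;		/* pfn may be stale, do not map it */

	vm->mapped_pfn = pfn;
	return 0;
}

/* Mirrors the "-EAGAIN gets flipped to 0" behaviour described above. */
static int handle_fault_model(struct vm_model *vm, void (*racer)(struct vm_model *))
{
	int ret = map_fault_model(vm, racer);

	return ret == -EAGAIN ? 0 : ret;
}
```

Calling handle_fault_model(&vm, invalidate_model) returns 0 with nothing
mapped; a later call with no racing invalidation installs whatever pfn
the backing store holds at that point.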