Date: Fri, 11 Jul 2025 14:49:56 +0100
Message-ID: <86a55aalbv.wl-maz@kernel.org>
From: Marc Zyngier <maz@kernel.org>
To: "Roy, Patrick" <roypat@amazon.co.uk>, "Fuad Tabba" <tabba@google.com>
Cc: ackerleytng@google.com, akpm@linux-foundation.org, amoorthy@google.com,
	anup@brainfault.org, aou@eecs.berkeley.edu, brauner@kernel.org,
	catalin.marinas@arm.com, chao.p.peng@linux.intel.com,
	chenhuacai@kernel.org, david@redhat.com, dmatlack@google.com,
	fvdl@google.com, hch@infradead.org, hughd@google.com,
	ira.weiny@intel.com, isaku.yamahata@gmail.com,
	isaku.yamahata@intel.com, james.morse@arm.com, jarkko@kernel.org,
	jgg@nvidia.com, jhubbard@nvidia.com, jthoughton@google.com,
	keirf@google.com, kirill.shutemov@linux.intel.com,
	kvm@vger.kernel.org, kvmarm@lists.linux.dev,
	liam.merwick@oracle.com, linux-arm-msm@vger.kernel.org,
	linux-mm@kvack.org, mail@maciej.szmigiero.name, mic@digikod.net,
	michael.roth@amd.com, mpe@ellerman.id.au, oliver.upton@linux.dev,
	palmer@dabbelt.com, pankaj.gupta@amd.com, paul.walmsley@sifive.com,
	pbonzini@redhat.com, peterx@redhat.com, qperret@google.com,
	quic_cvanscha@quicinc.com, quic_eberman@quicinc.com,
	quic_mnalajal@quicinc.com, quic_pderrin@quicinc.com,
	quic_pheragu@quicinc.com, quic_svaddagi@quicinc.com,
	quic_tsoni@quicinc.com, rientjes@google.com, seanjc@google.com,
	shuah@kernel.org, steven.price@arm.com, suzuki.poulose@arm.com,
	vannapurve@google.com, vbabka@suse.cz, viro@zeniv.linux.org.uk,
	wei.w.wang@intel.com, will@kernel.org, willy@infradead.org,
	xiaoyao.li@intel.com, yilun.xu@intel.com, yuzenghui@huawei.com
Subject: Re: [PATCH v13 16/20] KVM: arm64: Handle guest_memfd-backed guest page faults
In-Reply-To: <20250711095937.22365-1-roypat@amazon.co.uk>
References: <20250709105946.4009897-17-tabba@google.com>
	<20250711095937.22365-1-roypat@amazon.co.uk>

On Fri, 11 Jul 2025 10:59:39 +0100,
"Roy, Patrick" <roypat@amazon.co.uk> wrote:
>
>
> Hi Fuad,
>
> On Wed, 2025-07-09 at 11:59 +0100, Fuad Tabba wrote:
>
> -snip-
>
> > +#define KVM_PGTABLE_WALK_MEMABORT_FLAGS (KVM_PGTABLE_WALK_HANDLE_FAULT | KVM_PGTABLE_WALK_SHARED)
> > +
> > +static int gmem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> > +		      struct kvm_s2_trans *nested,
> > +		      struct kvm_memory_slot *memslot, bool is_perm)
> > +{
> > +	bool write_fault, exec_fault, writable;
> > +	enum kvm_pgtable_walk_flags flags = KVM_PGTABLE_WALK_MEMABORT_FLAGS;
> > +	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R;
> > +	struct kvm_pgtable *pgt = vcpu->arch.hw_mmu->pgt;
> > +	struct page *page;
> > +	struct kvm *kvm = vcpu->kvm;
> > +	void *memcache;
> > +	kvm_pfn_t pfn;
> > +	gfn_t gfn;
> > +	int ret;
> > +
> > +	ret = prepare_mmu_memcache(vcpu, true, &memcache);
> > +	if (ret)
> > +		return ret;
> > +
> > +	if (nested)
> > +		gfn = kvm_s2_trans_output(nested) >> PAGE_SHIFT;
> > +	else
> > +		gfn = fault_ipa >> PAGE_SHIFT;
> > +
> > +	write_fault = kvm_is_write_fault(vcpu);
> > +	exec_fault = kvm_vcpu_trap_is_exec_fault(vcpu);
> > +
> > +	if (write_fault && exec_fault) {
> > +		kvm_err("Simultaneous write and execution fault\n");
> > +		return -EFAULT;
> > +	}
> > +
> > +	if (is_perm && !write_fault && !exec_fault) {
> > +		kvm_err("Unexpected L2 read permission error\n");
> > +		return -EFAULT;
> > +	}
> > +
> > +	ret = kvm_gmem_get_pfn(kvm, memslot, gfn, &pfn, &page, NULL);
> > +	if (ret) {
> > +		kvm_prepare_memory_fault_exit(vcpu, fault_ipa, PAGE_SIZE,
> > +					      write_fault, exec_fault, false);
> > +		return ret;
> > +	}
> > +
> > +	writable = !(memslot->flags & KVM_MEM_READONLY);
> > +
> > +	if (nested)
> > +		adjust_nested_fault_perms(nested, &prot, &writable);
> > +
> > +	if (writable)
> > +		prot |= KVM_PGTABLE_PROT_W;
> > +
> > +	if (exec_fault ||
> > +	    (cpus_have_final_cap(ARM64_HAS_CACHE_DIC) &&
> > +	     (!nested || kvm_s2_trans_executable(nested))))
> > +		prot |= KVM_PGTABLE_PROT_X;
> > +
> > +	kvm_fault_lock(kvm);
>
> Doesn't this race with gmem invalidations (e.g. fallocate(PUNCH_HOLE))?
> E.g. if between kvm_gmem_get_pfn() above and this kvm_fault_lock() a
> gmem invalidation occurs, don't we end up with stage-2 page tables
> referring to a stale host page? In user_mem_abort() there's the "grab
> mmu_invalidate_seq before dropping mmap_lock and check it hasn't changed
> after grabbing mmu_lock" which prevents this, but I don't really see an
> equivalent here.

Indeed. We have a similar construct in kvm_translate_vncr() as well,
and I'd definitely expect something of the sort 'round here. If for
some reason this is not needed, then a comment explaining why would be
welcome.

But this brings me to another interesting bit: kvm_translate_vncr() is
another path that deals with a guest translation fault (despite being
caught as an EL2 S1 fault), and calls kvm_faultin_pfn(). What happens
when the backing store is gmem? Probably nothing good, but I don't
immediately see why NV and gmem should be incompatible, so something
must be done on that front too (including the return to userspace if
the page is gone).

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.
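
For readers following the thread, the "snapshot the sequence count, look up
the page, recheck under the lock" construct discussed above can be modelled
in plain userspace C. This is only a minimal single-threaded sketch under
stated assumptions, not the kernel code: struct model_vm, model_invalidate()
and model_fault() are hypothetical names invented for illustration, and the
memory barriers, real locking and in-progress invalidation tracking that the
kernel relies on are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical single-threaded model; none of these names are kernel APIs. */
struct model_vm {
	unsigned long invalidate_seq;	/* bumped by every invalidation */
	unsigned long backing_pfn;	/* page currently backing the gfn */
	unsigned long mapped_pfn;	/* what the modelled stage-2 entry points at */
	bool mapped;
};

/* Models a hole punch: the old page goes away and the mapping is torn down. */
void model_invalidate(struct model_vm *vm)
{
	vm->invalidate_seq++;
	vm->backing_pfn++;
	vm->mapped = false;
}

/* Optional hook that runs in the window between the lookup and the recheck. */
typedef void (*model_race_hook)(struct model_vm *vm);

/* Returns how many times the fault had to be retried before mapping. */
int model_fault(struct model_vm *vm, model_race_hook hook)
{
	int retries = 0;

	assert(vm != NULL);
	for (;;) {
		/* 1. Snapshot the sequence count before looking up the page. */
		unsigned long seq = vm->invalidate_seq;
		/* 2. Look up the backing page (stands in for the pfn lookup). */
		unsigned long pfn = vm->backing_pfn;

		/* Race window: an invalidation may land here, at most once. */
		if (hook) {
			hook(vm);
			hook = NULL;
		}

		/* 3. Where the lock would be held: map only if nothing changed. */
		if (seq != vm->invalidate_seq) {
			retries++;
			continue;
		}
		vm->mapped_pfn = pfn;
		vm->mapped = true;
		return retries;
	}
}
```

Calling model_fault(&vm, model_invalidate) injects one invalidation into the
race window, so the first pass is discarded and the mapping ends up pointing
at the new backing page. Without the recheck in step 3, the stale pfn from
step 2 would be installed, which is the concern raised above. The model loops
for simplicity; a real fault handler may instead bail out and let the guest
retake the fault.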