From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 0C5C4FCB619 for ; Fri, 6 Mar 2026 15:34:26 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender:List-Subscribe:List-Help :List-Post:List-Archive:List-Unsubscribe:List-Id:Content-Type:MIME-Version: References:In-Reply-To:Subject:Cc:To:From:Message-ID:Date:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description:Resent-Date: Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=XBIKfWL0VOzyYVKgeV3FXhMfWq9VeLVVvB7ea5H4Iro=; b=NCUju5hw49oV1cDF8amL26vUIo R/wMTwC/fPq+BsqgJEVEuGjsv6nUNxa0RXBftww2Qc+77f1GwbZJ4FYrbDNeZoXWcFQrQ8oLVQeQz ky4I4obF57s84pLYlQxvZn8fHo/ZCZ4/CFKPQBjqRbLjlNEZvxi/hs1tnrsZFyX3Ehm+5tdHWLrzZ 7cxjjZQ6Vn4CEXh0TgnfIhCMZ20JpuXqndaKAQbRzP1yOB5nA0Xmdj3HVh9hTUZIpqJpKIMpyySiz abBi0OjhyTeJ4E9+qBruCrAAZUWJGQUuNOa/HMCXsGW4tIxc9WW/PU9AV9t9TM7/08gMx+jDtxwoO vD0osQNg==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.98.2 #2 (Red Hat Linux)) id 1vyXC0-000000042s9-2MCH; Fri, 06 Mar 2026 15:34:20 +0000 Received: from sea.source.kernel.org ([172.234.252.31]) by bombadil.infradead.org with esmtps (Exim 4.98.2 #2 (Red Hat Linux)) id 1vyXBy-000000042rQ-3wSH for linux-arm-kernel@lists.infradead.org; Fri, 06 Mar 2026 15:34:20 +0000 Received: from smtp.kernel.org (transwarp.subspace.kernel.org [100.75.92.58]) by sea.source.kernel.org (Postfix) with ESMTP id CB96744051; Fri, 6 Mar 2026 15:34:17 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id A33AAC4CEF7; Fri, 6 Mar 2026 15:34:17 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1772811257; bh=8uvPh2HpjMCl9fCTBfKM7y2fpeBlB+HBuTT9XIIMrOc=; h=Date:From:To:Cc:Subject:In-Reply-To:References:From; b=AouAY/n6tG1Ud6aCgEJZbMhjAIEiHJvq35rA9cVJQdp6nlHD0C6VYHlzdIJIiMlHN 6REVfZRIP7au3oXKOKlwsP71Tmpz/WVK+f5OSRXKpG1nbYaS6qJfKZ5O9c8zJoLJpl CWujP3D/+JoiRtVDT87ywoEvatSNdgPcBQc/Kt2/JAsQ+l7tTl5AaSIIQUgGPgl2kk qgx87kmMc3ai65mn0juYCYyAMByo/wLfnrFM5H+iPTBpEIsXdvK/B0BGYSiE6tJMTy mEW/zkieqflspSe14t6uUCyvwHDsdQt/4YTkBeZ/f9gor41MVqoBLyHvUn6XkXUvjF kgYppsJFIN9Pw== Received: from sofa.misterjones.org ([185.219.108.64] helo=goblin-girl.misterjones.org) by disco-boy.misterjones.org with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.98.2) (envelope-from ) id 1vyXBv-0000000Guzf-215X; Fri, 06 Mar 2026 15:34:15 +0000 Date: Fri, 06 Mar 2026 15:34:15 +0000 Message-ID: <86ikb96kg8.wl-maz@kernel.org> From: Marc Zyngier To: Fuad Tabba Cc: kvm@vger.kernel.org, kvmarm@lists.linux.dev, linux-arm-kernel@lists.infradead.org, oliver.upton@linux.dev, joey.gouly@arm.com, suzuki.poulose@arm.com, yuzenghui@huawei.com, catalin.marinas@arm.com, will@kernel.org, qperret@google.com, vdonnefort@google.com Subject: Re: [PATCH v1 00/13] KVM: arm64: Refactor user_mem_abort() into a state-object model In-Reply-To: <20260306140232.2193802-1-tabba@google.com> References: <20260306140232.2193802-1-tabba@google.com> User-Agent: Wanderlust/2.15.9 (Almost Unreal) SEMI-EPG/1.14.7 (Harue) FLIM-LB/1.14.9 (=?UTF-8?B?R29qxY0=?=) APEL-LB/10.8 EasyPG/1.0.0 Emacs/30.1 (aarch64-unknown-linux-gnu) MULE/6.0 (HANACHIRUSATO) MIME-Version: 1.0 (generated by SEMI-EPG 1.14.7 - "Harue") Content-Type: text/plain; charset=US-ASCII X-SA-Exim-Connect-IP: 185.219.108.64 X-SA-Exim-Rcpt-To: tabba@google.com, kvm@vger.kernel.org, kvmarm@lists.linux.dev, linux-arm-kernel@lists.infradead.org, oliver.upton@linux.dev, joey.gouly@arm.com, suzuki.poulose@arm.com, yuzenghui@huawei.com, catalin.marinas@arm.com, will@kernel.org, qperret@google.com, vdonnefort@google.com X-SA-Exim-Mail-From: maz@kernel.org X-SA-Exim-Scanned: No (on disco-boy.misterjones.org); SAEximRunCond expanded to false X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20260306_073419_040579_8A003033 X-CRM114-Status: GOOD ( 28.22 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org Hi Fuad, On Fri, 06 Mar 2026 14:02:19 +0000, Fuad Tabba wrote: > > As promised in my recent patch series fixing a couple of urgent bugs in > user_mem_abort() [1], here is the actual refactoring to finally clean up this > monolith. > > If you look through the Fixes: history of user_mem_abort(), you will start to > see a very clear pattern of whack-a-mole caused by the sheer size and > complexity of the function. For example: > - We keep leaking struct page references on early error returns because the > cleanup logic is hard to track (e.g., 5f9466b50c1b and the atomic fault leak > I just fixed in the previous series). > - We have had uninitialized memcache pointers (157dbc4a321f) because the > initialization flow jumps around unpredictably. > - We have had subtle TOCTOU and locking boundary bugs (like 13ec9308a857 and > f587661f21eb) because we drop the mmap_read_lock midway through the function > but leave the vma pointer and mmu_seq floating around in the same lexical > scope, tempting people to use them. > > The bulk of the work is in the first 6 patches, which perform a strict, > no-logic-change structural refactoring of user_mem_abort() into a clean, > sequential dispatcher. > > We introduce a state object, struct kvm_s2_fault, which encapsulates > both the input parameters and the intermediate state. Then, > user_mem_abort() is broken down into focused, standalone helpers: > - kvm_s2_resolve_vma_size(): Determines the VMA shift and page size. > - kvm_s2_fault_pin_pfn(): Handles faulting in the physical page. > - kvm_s2_fault_get_vma_info(): A tightly-scoped sub-helper that isolates the > mmap_read_lock, VMA lookup, and metadata snapshotting. > - kvm_s2_fault_compute_prot(): Computes stage-2 protections and evaluates > permission/execution constraints. > - kvm_s2_fault_map(): Manages the KVM MMU lock, mmu_seq retry loops, MTE, and > the final stage-2 mapping. > > This structural change makes the "danger zone" foolproof. By isolating > the mmap_read_lock region inside a tightly-scoped sub-helper > (kvm_s2_fault_get_vma_info), the vma pointer is confined. It snapshots > the required metadata into the kvm_s2_fault structure before dropping > the lock. Because the pointers scope ends when the sub-helper returns, > accessing a stale VMA in the mapping phase is not possible by design. > > The remaining patches in are localized cleanup patches. With the logic > finally extracted into digestible helpers, these patches take the > opportunity to streamline struct initialization, drop redundant struct > variables, simplify nested math, and hoist validation checks (like MTE) > out of the lock-heavy mapping phase. > > I think that there are still more opportunities to tidy things up some > more, but I'll stop here to see what you think. Thanks a lot for going through this, this looks like a very valuable starting point. From a high-level perspective, the end result is even more shocking: static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa, struct kvm_s2_trans *nested, struct kvm_memory_slot *memslot, unsigned long hva, bool fault_is_perm) { bool write_fault = kvm_is_write_fault(vcpu); bool logging_active = memslot_is_logging(memslot); struct kvm_s2_fault fault = { .vcpu = vcpu, .fault_ipa = fault_ipa, .nested = nested, .memslot = memslot, .hva = hva, .fault_is_perm = fault_is_perm, .ipa = fault_ipa, .logging_active = logging_active, .force_pte = logging_active, .prot = KVM_PGTABLE_PROT_R, .fault_granule = fault_is_perm ? kvm_vcpu_trap_get_perm_fault_granule(vcpu) : 0, .write_fault = write_fault, .exec_fault = kvm_vcpu_trap_is_exec_fault(vcpu), .topup_memcache = !fault_is_perm || (logging_active && write_fault), }; This kvm_s2_fault structure is a mess of architectural state (the fault itself), hypervisor context (logging_active, force_pte...), and implementation details (topup_memcache...). Effectively, that's the container for *everything* that was a variable in u_m_a() is now part of the structure that is passed around. The task at hand is now to completely nuke this monster. The architectural fault information should be further split, and directly passed as a parameter to user_mem_abort(). We have a lot of redundant state that is computed, recomputed, derived, and that should probably moved into the exact place where it matters. I'll have a play with it, Thanks, M. -- Without deviation from the norm, progress is not possible.