Date: Tue, 05 Mar 2024 15:03:20 +0000
Message-ID: <86r0go201z.wl-maz@kernel.org>
From: Marc Zyngier
To: Ganapatrao Kulkarni
Cc: kvmarm@lists.linux.dev, kvm@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
	oliver.upton@linux.dev, darren@os.amperecomputing.com,
	d.scott.phillips@amperecomputing.com
Subject: Re: [RFC PATCH] kvm: nv: Optimize the unmapping of shadow S2-MMU tables.
In-Reply-To: <6685c3a6-2017-4bc2-ad26-d11949097050@os.amperecomputing.com>
References: <20240305054606.13261-1-gankulkarni@os.amperecomputing.com>
	<86sf150w4t.wl-maz@kernel.org>
	<6685c3a6-2017-4bc2-ad26-d11949097050@os.amperecomputing.com>

On Tue, 05 Mar 2024 13:29:08 +0000,
Ganapatrao Kulkarni wrote:
>
>
>
> On 05-03-2024 04:43 pm, Marc Zyngier wrote:
> > [re-sending with kvmarm@ fixed]
> >
> > On Tue, 05 Mar 2024 05:46:06 +0000,
> > Ganapatrao Kulkarni wrote:
> >>
> >> As per 'commit 178a6915434c ("KVM: arm64: nv: Unmap/flush shadow stage 2
> >
> > $ git describe --contains 178a6915434c --match=v\*
> > fatal: cannot describe '178a6915434c141edefd116b8da3d55555ea3e63'
> >
>
> My bad(I would have been more verbose), I missed to mention that this
> patch is on top of NV-V11 patch series.
>
> > This commit simply doesn't exist upstream. It only lives in a
> > now deprecated branch that will never be merged.
> >
> >> page tables")', when ever there is unmap of pages that
> >> are mapped to L1, they are invalidated from both L1 S2-MMU and from
> >> all the active shadow/L2 S2-MMU tables. Since there is no mapping
> >> to invalidate the IPAs of Shadow S2 to a page, there is a complete
> >> S2-MMU page table walk and invalidation is done covering complete
> >> address space allocated to a L2. This has performance impacts and
> >> even soft lockup for NV(L1 and L2) boots with higher number of
> >> CPUs and large Memory.
> >>
> >> Adding a lookup table of mapping of Shadow IPA to Canonical IPA
> >> whenever a page is mapped to any of the L2. While any page is
> >> unmaped, this lookup is helpful to unmap only if it is mapped in
> >> any of the shadow S2-MMU tables. Hence avoids unnecessary long
> >> iterations of S2-MMU table walk-through and invalidation for the
> >> complete address space.
> >
> > All of this falls in the "premature optimisation" bucket. Why should
> > we bother with any of this when not even 'AT S1' works correctly,
>
> Hmm, I am not aware of this, is this something new issue of V11?

It's been there since v0. All we have is a trivial implementation that
doesn't survive the S1 page-tables being swapped out. It requires a
full S1 PTW to be written.

>
> > making it trivial to prevent a guest from making forward progress? You
> > also show no numbers that would hint at a measurable improvement under
> > any particular workload.
>
> This patch is avoiding long iterations of unmap which was resulting in
> soft-lockup, when tried L1 and L2 with 192 cores.
> Fixing soft lockup isn't a required fix for feature enablement?

No. All we care about is correctness, not performance. Addressing
soft-lockups is *definitely* a performance issue, which I'm 100% happy
to ignore.

[...]

> >> +static inline bool kvm_is_l1_using_shadow_s2(struct kvm_vcpu
> >> *vcpu)
> >> +{
> >> +	return (vcpu->arch.hw_mmu != &vcpu->kvm->arch.mmu);
> >> +}
> >
> > Isn't that the very definition of "!in_hyp_ctxt()"? You are abusing
>
> "!in_hyp_ctxt()" isn't true for non-NV case also?

Surely you don't try to use this in non-NV contexts, right? Why would
you try to populate a shadow reverse-map outside of a NV context?

> This function added to know that L1 is NV enabled and using shadow S2.
>
> > the hw_mmu pointer to derive something, but the source of truth is the
> > translation regime, as defined by HCR_EL2.{E2H,TGE} and PSTATE.M.
> >
>
> OK, I can try HCR_EL2.{E2H,TGE} and PSTATE.M instead of hw_mmu in next
> version.

No. Use is_hyp_ctxt().

[...]

> >> index 61bdd8798f83..3948681426a0 100644
> >> --- a/arch/arm64/kvm/mmu.c
> >> +++ b/arch/arm64/kvm/mmu.c
> >> @@ -1695,6 +1695,13 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >>  					     memcache,
> >>  					     KVM_PGTABLE_WALK_HANDLE_FAULT |
> >>  					     KVM_PGTABLE_WALK_SHARED);
> >> +	if ((nested || kvm_is_l1_using_shadow_s2(vcpu)) && !ret) {
> >
> > I don't understand this condition. If nested is non-NULL, it's because
> > we're using a shadow S2. So why the additional condition?
>
> No, nested is set only for L2, for L1 it is not.
> To handle L1 shadow S2 case, I have added this condition.

But there is *no shadow* for L1 at all. The only way to get a shadow
is to be outside of the EL2(&0) translation regime. EL2(&0) itself
is always backed by the canonical S2. By definition, L1 does not run
with a S2 it is in control of. No S2, no shadow.

[...]

> > What guarantees that the mapping you have for L1 has the same starting
> > address as the one you have for L2? L1 could have a 2MB mapping and L2
> > only 4kB *in the middle*.
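
To illustrate the question quoted above, here is a minimal sketch of
an overlap test between two mappings of different sizes. It is not
taken from the patch: `struct ipa_range`, `ranges_intersect()` and
`range_intersection()` are hypothetical names, ranges are assumed to
be half-open, and `start + size` is assumed not to wrap. The point is
only that a reverse lookup has to compare ranges, not just starting
addresses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical half-open IPA range: [start, start + size). */
struct ipa_range {
	uint64_t start;
	uint64_t size;
};

/* True if the two ranges share at least one byte. */
static bool ranges_intersect(struct ipa_range a, struct ipa_range b)
{
	return a.start < b.start + b.size && b.start < a.start + a.size;
}

/*
 * Return the overlapping part of two ranges.
 * The caller must have checked that they intersect.
 */
static struct ipa_range range_intersection(struct ipa_range a,
					    struct ipa_range b)
{
	uint64_t a_end = a.start + a.size;
	uint64_t b_end = b.start + b.size;
	uint64_t lo = a.start > b.start ? a.start : b.start;
	uint64_t hi = a_end < b_end ? a_end : b_end;

	assert(ranges_intersect(a, b));
	return (struct ipa_range){ .start = lo, .size = hi - lo };
}
```

With a 2MB range starting at 0x40000000 and a 4kB range starting at
0x40100000, the two start addresses never match, yet the ranges
intersect and the overlap is the 4kB range itself. The same test
works unchanged when the sizes are the other way around.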
>
> IIUC, when a page is mapped to 2MB in L1, it won't be
> mapped to L2 and we iterate with the step of PAGE_SIZE and we should
> be hitting the L2's IPA in lookup table, provided the L2 page falls in
> unmap range.

But then how do you handle the reverse (4kB at L1, 2MB at L2)? Without
tracking of the *intersection*, this fails to be correctly detected.
This is TLB matching 101.

[...]

> >> +			while (start < end) {
> >> +				size = PAGE_SIZE;
> >> +				/*
> >> +				 * get the Shadow IPA if the page is mapped
> >> +				 * to L1 and also mapped to any of active L2.
> >> +				 */
> >
> > Why is L1 relevant here?
>
> We do map while L1 boots(early stage) in shadow S2, at that moment
> if the L1 mapped page is unmapped/migrated we do need to unmap from
> L1's S2 table also.

Sure. But you can also get a page that is mapped in L2 and not mapped
in the canonical S2, which is L1's. I more and more feel that you have
a certain misconception of how L1 gets its pages mapped.

>
> >
> >> +				ret = get_shadow_ipa(mmu, start, &shadow_ipa, &size);
> >> +				if (ret)
> >> +					kvm_unmap_stage2_range(mmu, shadow_ipa, size);
> >> +				start += size;
> >> +			}
> >> +		}
> >> +	}
> >> +}
> >> +
> >>  /* expects kvm->mmu_lock to be held */
> >>  void kvm_nested_s2_flush(struct kvm *kvm)
> >>  {
> >
> > There are a bunch of worrying issues with this patch. But more
> > importantly, this looks like a waste of effort until the core issues
> > that NV still has are solved, and I will not consider anything of the
> > sort until then.
>
> OK thanks for letting us know, I will pause the work on V2 of this
> patch until then.
>
> >
> > I get the ugly feeling that you are trying to make it look as if it
> > was "production ready", which it won't be for another few years,
> > specially if the few interested people (such as you) are ignoring the
> > core issues in favour of marketing driven features ("make it fast").
> >
>
> What are the core issues (please forgive me if you mentioned already)?
> certainly we will prioritise them than this.

AT is a big one. Maintenance interrupts are more or less broken. I'm
slowly plugging PAuth, but there's no testing whatsoever (running
Linux doesn't count). Lack of SVE support is also definitely a
blocker.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.