Date: Wed, 31 Jan 2024 13:50:25 +0000
Message-ID: <86o7d17gta.wl-maz@kernel.org>
From: Marc Zyngier <maz@kernel.org>
To: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
Cc: kvmarm@lists.linux.dev, kvm@vger.kernel.org,
    linux-arm-kernel@lists.infradead.org, Alexandru Elisei,
    Andre Przywara, Chase Conklin, Christoffer Dall, Darren Hart,
    Jintack Lim, Russell King, Miguel Luis, James Morse,
    Suzuki K Poulose, Oliver Upton, Zenghui Yu, D Scott Phillips
Subject: Re: [PATCH v11 17/43] KVM: arm64: nv: Support multiple nested Stage-2 mmu structures
In-Reply-To: <3f30ac3a-9226-45fe-9e72-49c26a9f4c97@os.amperecomputing.com>
References: <20231120131027.854038-1-maz@kernel.org>
    <20231120131027.854038-18-maz@kernel.org>
    <86le8g86t6.wl-maz@kernel.org>
    <3b51d760-fd32-41b7-b142-5974fdf3e90e@os.amperecomputing.com>
    <868r4d94c9.wl-maz@kernel.org>
    <3f30ac3a-9226-45fe-9e72-49c26a9f4c97@os.amperecomputing.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
On Wed, 31 Jan 2024 09:39:34 +0000,
Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
>
> Hi Marc,
>
> On 25-01-2024 02:28 pm, Marc Zyngier wrote:
> > On Thu, 25 Jan 2024 08:14:32 +0000,
> > Ganapatrao Kulkarni wrote:
> >>
> >> Hi Marc,
> >>
> >> On 23-01-2024 07:56 pm, Marc Zyngier wrote:
> >>> Hi Ganapatrao,
> >>>
> >>> On Tue, 23 Jan 2024 09:55:32 +0000,
> >>> Ganapatrao Kulkarni wrote:
> >>>>
> >>>> Hi Marc,
> >>>>
> >>>>> +void kvm_vcpu_load_hw_mmu(struct kvm_vcpu *vcpu)
> >>>>> +{
> >>>>> +	if (is_hyp_ctxt(vcpu)) {
> >>>>> +		vcpu->arch.hw_mmu = &vcpu->kvm->arch.mmu;
> >>>>> +	} else {
> >>>>> +		write_lock(&vcpu->kvm->mmu_lock);
> >>>>> +		vcpu->arch.hw_mmu = get_s2_mmu_nested(vcpu);
> >>>>> +		write_unlock(&vcpu->kvm->mmu_lock);
> >>>>> +	}
> >>>>
> >>>> Due to a race, a non-existing L2's mmu table is getting loaded for
> >>>> some vCPUs while booting L1 (noticed with an L1 boot using a large
> >>>> number of vCPUs). This is happening because, at the early stage,
> >>>> e2h (hyp-context) is not set, and the trap on the ERET of the L1
> >>>> boot-strap code results in a context switch as if it were
> >>>> returning to L2 (guest enter), loading a not-yet-initialized mmu
> >>>> table on those vCPUs and causing unrecoverable traps and aborts.
> >>>
> >>> I'm not sure I understand the problem you're describing here.
> >>>
> >>
> >> IIUC, when the S2 fault happens, the faulting vCPU gets the pages
> >> from the qemu process, maps them in S2, and copies the code to the
> >> allocated memory. Meanwhile, the other vCPUs racing to come online
> >> find the mapping when they switch over to the dummy S2 and return
> >> to L1; subsequent execution does not fault, but instead fetches
> >> from memory where no code exists yet (for some of them), generating
> >> a stage 1 instruction abort and jumping to the abort handler, where
> >> no code exists either, so they keep aborting. This happens on
> >> random vCPUs (no pattern).
> >
> > Why is that any different from the way we handle faults in the
> > non-nested case? If there is a case where we can map the PTE at S2
> > before the data is available, this is a generic bug that can trigger
> > irrespective of NV.
> >
> >>
> >>> What is the race exactly? Why isn't the shadow S2 good enough? Not
> >>> having HCR_EL2.VM set doesn't mean we can use the same S2, as the
> >>> TLBs are tagged by a different VMID, so staying on the canonical
> >>> S2 seems wrong.
> >>
> >> IMO, it is unnecessary to switch over on the first ERET while L1 is
> >> booting and to repeat the faults and page allocation, which is
> >> dummy anyway once L1 switches to E2H.
> >
> > It is mandated by the architecture. EL1 is, by definition, a different
> > translation regime from EL2. So we *must* have a different S2, because
> > that defines the boundaries of TLB creation and invalidation. The
> > fact that these are the same pages is totally irrelevant.
> >
> >> Let L1 always use its S2, which is created by L0. We should even
> >> consider avoiding the entry created for L1 (the first entry in the
> >> array of S2-MMUs), and so avoid unnecessary iteration/lookup when
> >> unmapping NestedVMs.
> >
> > I'm sorry, but this is just wrong. You are merging the EL1 and EL2
> > translation regimes, which is not acceptable.
> >
> >> I am anticipating this unwanted switch-over won't happen when we
> >> have NV2-only support in V12?
> >
> > V11 is already NV2 only, so I really don't get what you mean here.
> > Everything stays the same, and there is nothing to change here.
>
> I am still using V10, since V11 (and also V12/nv-6.9-sr-enforcement)
> has issues booting with QEMU.

Let's be clear: I have no interest in reports against a version that
is older than the current one. If you still use V10, then
congratulations, you are the maintainer of that version.

> Tried V11 with my local branch of QEMU, which is 7.2 based, and also
> with Eric's QEMU[1], which is rebased on 8.2. The issue is that QEMU
> crashes at the very beginning. Not sure about the issue; yet to
> debug.
>
> [1] https://github.com/eauger/qemu/tree/v8.2-nv

I have already reported that QEMU was doing some horrible things
behind the kernel's back, and I don't think it is working correctly.

> > What you describe looks like a terrible bug somewhere on the
> > page-fault path that has the potential to impact non-NV, and I'd
> > like to focus on that.
>
> I found the bug/issue and fixed it. The problem was quite random and
> was happening when booting L1 with a large number of cores (200 to
> 300+).
>
> I have implemented (yet to send to the ML for review) a fix for the
> performance issue[2] caused by the unmapping of shadow tables, by
> using a lookup table to unmap only the mapped shadow IPAs instead of
> unmapping the complete shadow S2 of all active NestedVMs.
Again, this is irrelevant:

- you develop against an unmaintained version

- you waste time prematurely optimising code that is clearly
  advertised as throw-away

> This lookup table was not adding the mappings created for L1 while
> it is using the shadow S2-MMU (my bad, I missed noticing that L1
> hops between vEL2 and EL1 at the booting stage). Hence, when there
> is a page migration, the unmap was not getting done for those pages,
> resulting in access of stale pages/memory by some of the vCPUs of
> L1.
>
> I have modified the check done while adding a Shadow-IPA-to-PA
> mapping to the lookup table, so that it checks whether the page is
> being mapped for a NestedVM or for L1 while it is using the shadow
> S2.
>
> [2] https://www.spinics.net/lists/kvm/msg326638.html

Do I read it correctly that I wasted hours trying to reproduce
something that only exists on an obsolete series together with
private patches?

	M.

-- 
Without deviation from the norm, progress is not possible.