From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 68F095F86F; Tue, 23 Jan 2024 14:26:16 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706019976; cv=none; b=BPeFnPD3puy94e91FEWSYymB50pWXoI25zsLpcHkCQFkNiQEdn6l6TO7ENdN3v9xpK1R+Js8KqhWBD8DJq3Pxam1y2G/8oQYFhshg3NaoKPjA3fc2j32a6XfHBCzBwIMGyc7YVnT5DQhrv9axkHi9t2eRfiGouiBJVfG4kA1TXU= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706019976; c=relaxed/simple; bh=XzMv801rPyyxdys6wgkOUwUyiZT+naegA+B9M/CmMKg=; h=Date:Message-ID:From:To:Cc:Subject:In-Reply-To:References: MIME-Version:Content-Type; b=kbZqgPHgmDl+dQMgs5JbE3QS5o8+2dtCppfKmYJBUZetq/oSMnNDF/ClsKPnAUEQKjhHhQGlwGtviSrav9Ohd3FWwlgpK1FyKgayAegOO645ilmN8aZoZnW79sKM0CJRJP3bFsJEmceiFOFvLltkqDXpE9NEOC5GLrNmKslXYKs= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=DGy1LPY9; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="DGy1LPY9" Received: by smtp.kernel.org (Postfix) with ESMTPSA id D75EAC433F1; Tue, 23 Jan 2024 14:26:15 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1706019975; bh=XzMv801rPyyxdys6wgkOUwUyiZT+naegA+B9M/CmMKg=; h=Date:From:To:Cc:Subject:In-Reply-To:References:From; b=DGy1LPY9k13mP1CBLKtTrMt/cfZNH4AXqIXvvEu5epQU3KRr7r45Dy5tonG1MD7LE KmsuF9i8vQjheeQ0YsAJACawbJf61BExY0KBLgFQr0DRtjiAU4m61MSfzeajKDFaLG KJGxQa3POvZYqAnhwgjplaoXF56Cq+pCYkEFeexSJTLUuMd3nRinK9lFcmVExxZyfp K1sA/lq0RZicdXIyXdTBaGvs9NM0jsssopIErV7IsjCj8lvthPXQALNGNpj+nouaT6 nfdIIKSfJ6pqnbgVpsHeyfrInV1P+xvi6wbUe6oKI7jjLfbf17la3H2w+mtsCJ/720 mEYg58vROEMWw== Received: from sofa.misterjones.org ([185.219.108.64] helo=goblin-girl.misterjones.org) by disco-boy.misterjones.org with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.95) (envelope-from ) id 1rSHjB-00DzgN-Bw; Tue, 23 Jan 2024 14:26:13 +0000 Date: Tue, 23 Jan 2024 14:26:13 +0000 Message-ID: <86le8g86t6.wl-maz@kernel.org> From: Marc Zyngier To: Ganapatrao Kulkarni Cc: kvmarm@lists.linux.dev, kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org, Alexandru Elisei , Andre Przywara , Chase Conklin , Christoffer Dall , Darren Hart , Jintack Lim , Russell King , Miguel Luis , James Morse , Suzuki K Poulose , Oliver Upton , Zenghui Yu , D Scott Phillips Subject: Re: [PATCH v11 17/43] KVM: arm64: nv: Support multiple nested Stage-2 mmu structures In-Reply-To: References: <20231120131027.854038-1-maz@kernel.org> <20231120131027.854038-18-maz@kernel.org> User-Agent: Wanderlust/2.15.9 (Almost Unreal) SEMI-EPG/1.14.7 (Harue) FLIM-LB/1.14.9 (=?UTF-8?B?R29qxY0=?=) APEL-LB/10.8 EasyPG/1.0.0 Emacs/29.1 (aarch64-unknown-linux-gnu) MULE/6.0 (HANACHIRUSATO) Precedence: bulk X-Mailing-List: kvm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 (generated by SEMI-EPG 1.14.7 - "Harue") Content-Type: text/plain; charset=US-ASCII X-SA-Exim-Connect-IP: 185.219.108.64 X-SA-Exim-Rcpt-To: gankulkarni@os.amperecomputing.com, kvmarm@lists.linux.dev, kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org, alexandru.elisei@arm.com, andre.przywara@arm.com, chase.conklin@arm.com, christoffer.dall@arm.com, darren@os.amperecomputing.com, jintack@cs.columbia.edu, rmk+kernel@armlinux.org.uk, miguel.luis@oracle.com, james.morse@arm.com, suzuki.poulose@arm.com, oliver.upton@linux.dev, yuzenghui@huawei.com, scott@os.amperecomputing.com X-SA-Exim-Mail-From: maz@kernel.org X-SA-Exim-Scanned: No (on disco-boy.misterjones.org); SAEximRunCond expanded to false Hi Ganapatrao, On Tue, 23 Jan 2024 09:55:32 +0000, Ganapatrao Kulkarni wrote: > > Hi Marc, > > > +void kvm_vcpu_load_hw_mmu(struct kvm_vcpu *vcpu) > > +{ > > + if (is_hyp_ctxt(vcpu)) { > > + vcpu->arch.hw_mmu = &vcpu->kvm->arch.mmu; > > + } else { > > + write_lock(&vcpu->kvm->mmu_lock); > > + vcpu->arch.hw_mmu = get_s2_mmu_nested(vcpu); > > + write_unlock(&vcpu->kvm->mmu_lock); > > + } > > Due to race, there is a non-existing L2's mmu table is getting loaded > for some of vCPU while booting L1(noticed with L1 boot using large > number of vCPUs). This is happening since at the early stage the > e2h(hyp-context) is not set and trap to eret of L1 boot-strap code > resulting in context switch as if it is returning to L2(guest enter) > and loading not initialized mmu table on those vCPUs resulting in > unrecoverable traps and aborts. I'm not sure I understand the problem you're describing here. What is the race exactly? Why isn't the shadow S2 good enough? Not having HCR_EL2.VM set doesn't mean we can use the same S2, as the TLBs are tagged by a different VMID, so staying on the canonical S2 seems wrong. My expectations are that the L1 ERET from EL2 to EL1 is trapped, and that we pick an empty S2 and start populating it. What fails in this process? > Adding code to check (below diff fixes the issue) stage 2 is enabled > and avoid the false ERET and continue with L1's mmu context. > > diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c > index 340e2710cdda..1901dd19d770 100644 > --- a/arch/arm64/kvm/nested.c > +++ b/arch/arm64/kvm/nested.c > @@ -759,7 +759,12 @@ void kvm_init_nested_s2_mmu(struct kvm_s2_mmu *mmu) > > void kvm_vcpu_load_hw_mmu(struct kvm_vcpu *vcpu) > { > - if (is_hyp_ctxt(vcpu)) { > + bool nested_stage2_enabled = vcpu_read_sys_reg(vcpu, HCR_EL2) > & HCR_VM; > + > + /* Load L2 mmu only if nested_stage2_enabled, avoid mmu > + * load due to false ERET trap. > + */ > + if (is_hyp_ctxt(vcpu) || !nested_stage2_enabled) { > vcpu->arch.hw_mmu = &vcpu->kvm->arch.mmu; > } else { > write_lock(&vcpu->kvm->mmu_lock); As I said above, this doesn't look right. > Hoping we dont hit this when we move completely NV2 based > implementation and e2h is always set? No, the same constraints apply. I don't see why this would change. Thanks, M. -- Without deviation from the norm, progress is not possible.