From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 641A0C3DA41 for ; Thu, 11 Jul 2024 08:05:53 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender:List-Subscribe:List-Help :List-Post:List-Archive:List-Unsubscribe:List-Id:Content-Type:MIME-Version: References:In-Reply-To:Subject:Cc:To:From:Message-ID:Date:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description:Resent-Date: Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=vP+cLt+tWdj2dRxQm5IhBkeJH362Gjp6y22ud7n1Nus=; b=XQDw/OsyygEC3sBmnJk8R0iCWM 4jflOzNYUTbAPYD8/4L98nkVahqr4QyJC8NSWvh33wXdT9/DIwnc2crJakvgsBdV8aQzjeDv4eHGV MOq3VVnZPAxKklTbxhy7MhBqjeyGKCDqf+QOkuIpSpiInzvMspj+p7iVBjm380AvXoQbJ9WyhqAz7 IRkYaTNVB9WF1S3QLW6rxiLl84ass11UXTwYc2DrIyL25yNFpcln1XsDIlW7OZc0Sd6pu0cNzAQWN jsMNidaUVM8qHmJjCBqGnWfDKUwv4ORtsUqeqxfFoYM7I45YlY4RniA2UZH+fSAWuuK+tWIoEcUOh Mxb4/ndA==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.97.1 #2 (Red Hat Linux)) id 1sRoo6-0000000D6jK-43BP; Thu, 11 Jul 2024 08:05:38 +0000 Received: from dfw.source.kernel.org ([139.178.84.217]) by bombadil.infradead.org with esmtps (Exim 4.97.1 #2 (Red Hat Linux)) id 1sRoni-0000000D6dF-3aFD for linux-arm-kernel@lists.infradead.org; Thu, 11 Jul 2024 08:05:16 +0000 Received: from smtp.kernel.org (transwarp.subspace.kernel.org [100.75.92.58]) by dfw.source.kernel.org (Postfix) with ESMTP id 3E85460A72; Thu, 11 Jul 2024 08:05:13 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id E0A8DC116B1; Thu, 11 Jul 2024 08:05:12 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1720685112; bh=sbqi0Hg1n6WstEJ+OFGkFj8Pi72kzoWmZTENgIKOA5g=; h=Date:From:To:Cc:Subject:In-Reply-To:References:From; b=ldUams3TQMAlTgIHz9toaJcGVejfS3b4qtDHZMCrKqcFiZMhI0lltswmtauLH+bcs hGNRZWHrmtoUI7rK4s5UdSkK2onkKiUQ13Z1UhvopI+lZhVbGRxZ/BNvmJTh0Fc1JL yYqNHvscssQkge6EJzMGdLDADsauou9BYfxZ8DNoCBIf6F2YfZZ+XudT5uh8ecWu+d YkJtR72akE9KVS7E/JOmE+t89FAFqKRKRNK/nG/cojaBbvbW8b83sgoEr9IZwzQfAD TE/M2g+K6QSlz8l/1LtUR4eO/fnUo4+WxR6SANZkgbc9Ff+VuSvFz+6GsMZLGHS26U 5G+T6wCEO4ttw== Received: from ip-185-104-136-29.ptr.icomera.net ([185.104.136.29] helo=wait-a-minute.misterjones.org) by disco-boy.misterjones.org with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.95) (envelope-from ) id 1sRone-00BSER-8W; Thu, 11 Jul 2024 09:05:10 +0100 Date: Thu, 11 Jul 2024 09:05:02 +0100 Message-ID: <875xtcpd5d.wl-maz@kernel.org> From: Marc Zyngier To: Alexandru Elisei Cc: kvmarm@lists.linux.dev, linux-arm-kernel@lists.infradead.org, kvm@vger.kernel.org, James Morse , Suzuki K Poulose , Oliver Upton , Zenghui Yu , Joey Gouly Subject: Re: [PATCH 10/12] KVM: arm64: nv: Add SW walker for AT S1 emulation In-Reply-To: References: <20240625133508.259829-1-maz@kernel.org> <20240708165800.1220065-1-maz@kernel.org> User-Agent: Wanderlust/2.15.9 (Almost Unreal) SEMI-EPG/1.14.7 (Harue) FLIM-LB/1.14.9 (=?UTF-8?B?R29qxY0=?=) APEL-LB/10.8 EasyPG/1.0.0 Emacs/28.2 (x86_64-pc-linux-gnu) MULE/6.0 (HANACHIRUSATO) MIME-Version: 1.0 (generated by SEMI-EPG 1.14.7 - "Harue") Content-Type: text/plain; charset=US-ASCII X-SA-Exim-Connect-IP: 185.104.136.29 X-SA-Exim-Rcpt-To: alexandru.elisei@arm.com, kvmarm@lists.linux.dev, linux-arm-kernel@lists.infradead.org, kvm@vger.kernel.org, james.morse@arm.com, suzuki.poulose@arm.com, oliver.upton@linux.dev, yuzenghui@huawei.com, joey.gouly@arm.com X-SA-Exim-Mail-From: maz@kernel.org X-SA-Exim-Scanned: No (on disco-boy.misterjones.org); SAEximRunCond expanded to false X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20240711_010515_129049_5E2FD10B X-CRM114-Status: GOOD ( 39.56 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org On Wed, 10 Jul 2024 16:12:53 +0100, Alexandru Elisei wrote: > > Hi Marc, > > On Mon, Jul 08, 2024 at 05:57:58PM +0100, Marc Zyngier wrote: > > In order to plug the brokenness of our current AT implementation, > > we need a SW walker that is going to... err.. walk the S1 tables > > and tell us what it finds. > > > > Of course, it builds on top of our S2 walker, and share similar > > concepts. The beauty of it is that since it uses kvm_read_guest(), > > it is able to bring back pages that have been otherwise evicted. > > > > This is then plugged in the two AT S1 emulation functions as > > a "slow path" fallback. I'm not sure it is that slow, but hey. > > > > Signed-off-by: Marc Zyngier > > [..] > > @@ -331,18 +801,17 @@ void __kvm_at_s1e01(struct kvm_vcpu *vcpu, u32 op, u64 vaddr) > > } > > > > if (!fail) > > - par = read_sysreg(par_el1); > > + par = read_sysreg_par(); > > else > > par = SYS_PAR_EL1_F; > > > > + retry_slow = !fail; > > + > > vcpu_write_sys_reg(vcpu, par, PAR_EL1); > > > > /* > > - * Failed? let's leave the building now. > > - * > > - * FIXME: how about a failed translation because the shadow S2 > > - * wasn't populated? We may need to perform a SW PTW, > > - * populating our shadow S2 and retry the instruction. > > + * Failed? let's leave the building now, unless we retry on > > + * the slow path. > > */ > > if (par & SYS_PAR_EL1_F) > > goto nopan; > > This is what follows after the 'if' statement above, and before the 'switch' > below: > > /* No PAN? No problem. */ > if (!(*vcpu_cpsr(vcpu) & PSR_PAN_BIT)) > goto nopan; > > When KVM is executing this statement, the following is true: > > 1. SYS_PAR_EL1_F is clear => the hardware translation table walk was successful. > 2. retry_slow = true; > > Then if the PAN bit is not set, the function jumps to the nopan label, and > performs a software translation table walk, even though the hardware walk > performed by AT was successful. Hmmm. Are you being polite and trying to avoid saying that this code is broken and that I should look for a retirement home instead? There, I've said it for you! ;-) The more I stare at this code, the more I hate it. Trying to interleave the replay condition with the many potential failure modes of the HW walker feels completely wrong, and I feel that I'd better split the whole thing in two: void __kvm_at_s1e01(struct kvm_vcpu *vcpu, u32 op, u64 vaddr) { __kvm_at_s1e01_hw(vcpu, vaddr); if (vcpu_read_sys_reg(vcpu, PAR_EL1) & SYS_PAR_F) __kvm_at_s1e01_sw(vcpu, vaddr); } and completely stop messing with things. This is AT S1 we're talking about, not something that has any sort of high-frequency. Apart for Xen. But as Butch said: "Xen's dead, baby. Xen's dead.". > > > @@ -354,29 +823,58 @@ void __kvm_at_s1e01(struct kvm_vcpu *vcpu, u32 op, u64 vaddr) > > switch (op) { > > case OP_AT_S1E1RP: > > case OP_AT_S1E1WP: > > + retry_slow = false; > > fail = check_at_pan(vcpu, vaddr, &par); > > break; > > default: > > goto nopan; > > } > > > > + if (fail) { > > + vcpu_write_sys_reg(vcpu, SYS_PAR_EL1_F, PAR_EL1); > > + goto nopan; > > + } > > + > > /* > > * If the EL0 translation has succeeded, we need to pretend > > * the AT operation has failed, as the PAN setting forbids > > * such a translation. > > - * > > - * FIXME: we hardcode a Level-3 permission fault. We really > > - * should return the real fault level. > > */ > > - if (fail || !(par & SYS_PAR_EL1_F)) > > - vcpu_write_sys_reg(vcpu, (0xf << 1) | SYS_PAR_EL1_F, PAR_EL1); > > - > > + if (par & SYS_PAR_EL1_F) { > > + u8 fst = FIELD_GET(SYS_PAR_EL1_FST, par); > > + > > + /* > > + * If we get something other than a permission fault, we > > + * need to retry, as we're likely to have missed in the PTs. > > + */ > > + if ((fst & ESR_ELx_FSC_TYPE) != ESR_ELx_FSC_PERM) > > + retry_slow = true; > > Shouldn't VCPU's PAR_EL1 register be updated here? As far as I can tell, at this > point the VCPU PAR_EL1 register has the result from the successful walk > performed by AT S1E1R or AT S1E1W in the first 'switch' statement. Yup, yet another sign that this flow is broken. I'll apply my last few grey cells to it, and hopefully the next iteration will be a bit better. Thanks a lot for having a look! M. -- Without deviation from the norm, progress is not possible.