From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 9B8DDC3DA41 for ; Thu, 11 Jul 2024 10:56:48 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender:List-Subscribe:List-Help :List-Post:List-Archive:List-Unsubscribe:List-Id:In-Reply-To:Content-Type: MIME-Version:References:Message-ID:Subject:Cc:To:From:Date:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description:Resent-Date: Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=6FdXjw4m+dAqGj6VGZ3rHFtfzhk1aq68omspPvujJ7U=; b=yJk8rf0+4FMEP3tdIf/DHQoHk4 4zg+WWMUiJ8Q3WOPEeV/LZd/4nHvGTCIEO+XUJzYHmbTaYjwrxh0JIERZuKEPJEugl1Bk64avP5/6 KUN7N+o+CbjlSl1HIF7WOrVyEmOFwQqVZcBOmLY0ei//nmnsOPz59SJ5lIuHeevL9nbORTlKMIUYN v47PsGbGMaKhfABaQ3Y2GlYoZSpgnfB6uzwkrR49OxGyg1GM9IMMUcZe7IpmV0HeV4Pu5yr+OnsTe Rvz7yF9DnFuu+5YLESJkd9csq7tdrFb8qJiY9CEwafzOMWHfQ4kitycfZdtmFBF2luQUhusi3ZA8T XjB7cDHw==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.97.1 #2 (Red Hat Linux)) id 1sRrTZ-0000000Ddcr-1AWc; Thu, 11 Jul 2024 10:56:37 +0000 Received: from foss.arm.com ([217.140.110.172]) by bombadil.infradead.org with esmtp (Exim 4.97.1 #2 (Red Hat Linux)) id 1sRrTG-0000000DdW4-3cgq for linux-arm-kernel@lists.infradead.org; Thu, 11 Jul 2024 10:56:21 +0000 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id D2A7C1007; Thu, 11 Jul 2024 03:56:42 -0700 (PDT) Received: from arm.com (e121798.manchester.arm.com [10.32.101.22]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 350073F762; Thu, 11 Jul 2024 03:56:16 -0700 (PDT) Date: Thu, 11 Jul 2024 11:56:13 +0100 From: Alexandru Elisei To: Marc Zyngier Cc: kvmarm@lists.linux.dev, linux-arm-kernel@lists.infradead.org, kvm@vger.kernel.org, James Morse , Suzuki K Poulose , Oliver Upton , Zenghui Yu , Joey Gouly Subject: Re: [PATCH 10/12] KVM: arm64: nv: Add SW walker for AT S1 emulation Message-ID: References: <20240625133508.259829-1-maz@kernel.org> <20240708165800.1220065-1-maz@kernel.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20240708165800.1220065-1-maz@kernel.org> X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20240711_035619_027798_AD98FEFE X-CRM114-Status: GOOD ( 27.83 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org Hi, On Mon, Jul 08, 2024 at 05:57:58PM +0100, Marc Zyngier wrote: > In order to plug the brokenness of our current AT implementation, > we need a SW walker that is going to... err.. walk the S1 tables > and tell us what it finds. > > Of course, it builds on top of our S2 walker, and share similar > concepts. The beauty of it is that since it uses kvm_read_guest(), > it is able to bring back pages that have been otherwise evicted. > > This is then plugged in the two AT S1 emulation functions as > a "slow path" fallback. I'm not sure it is that slow, but hey. > [..] > switch (op) { > case OP_AT_S1E1RP: > case OP_AT_S1E1WP: > + retry_slow = false; > fail = check_at_pan(vcpu, vaddr, &par); > break; > default: > goto nopan; > } For context, this is what check_at_pan() does: static int check_at_pan(struct kvm_vcpu *vcpu, u64 vaddr, u64 *res) { u64 par_e0; int error; /* * For PAN-involved AT operations, perform the same translation, * using EL0 this time. Twice. Much fun. */ error = __kvm_at(OP_AT_S1E0R, vaddr); if (error) return error; par_e0 = read_sysreg_par(); if (!(par_e0 & SYS_PAR_EL1_F)) goto out; error = __kvm_at(OP_AT_S1E0W, vaddr); if (error) return error; par_e0 = read_sysreg_par(); out: *res = par_e0; return 0; } I'm having a hard time understanding why KVM is doing both AT S1E0R and AT S1E0W regardless of the type of the access (read/write) in the PAN-aware AT instruction. Would you mind elaborating on that? > + if (fail) { > + vcpu_write_sys_reg(vcpu, SYS_PAR_EL1_F, PAR_EL1); > + goto nopan; > + } > [..] > + if (par & SYS_PAR_EL1_F) { > + u8 fst = FIELD_GET(SYS_PAR_EL1_FST, par); > + > + /* > + * If we get something other than a permission fault, we > + * need to retry, as we're likely to have missed in the PTs. > + */ > + if ((fst & ESR_ELx_FSC_TYPE) != ESR_ELx_FSC_PERM) > + retry_slow = true; > + } else { > + /* > + * The EL0 access succeded, but we don't have the full > + * syndrom information to synthetize the failure. Go slow. > + */ > + retry_slow = true; > + } This is what PSTATE.PAN controls: If the Effective value of PSTATE.PAN is 1, then a privileged data access from any of the following Exception levels to a virtual memory address that is accessible to data accesses at EL0 generates a stage 1 Permission fault: - A privileged data access from EL1. - If HCR_EL2.E2H is 1, then a privileged data access from EL2. With that in mind, I am really struggling to understand the logic. If AT S1E0{R,W} (from check_at_pan()) failed, doesn't that mean that the virtual memory address is not accessible to EL0? Add that to the fact that the AT S1E1{R,W} (from the beginning of __kvm_at_s1e01()) succeeded, doesn't that mean that AT S1E1{R,W}P should succeed, and furthermore the PAR_EL1 value should be the one KVM got from AT S1E1{R,W}? Thanks, Alex > nopan: > __mmu_config_restore(&config); > out: > local_irq_restore(flags); > > write_unlock(&vcpu->kvm->mmu_lock); > + > + /* > + * If retry_slow is true, then we either are missing shadow S2 > + * entries, have paged out guest S1, or something is inconsistent. > + * > + * Either way, we need to walk the PTs by hand so that we can either > + * fault things back, in or record accurate fault information along > + * the way. > + */ > + if (retry_slow) { > + par = handle_at_slow(vcpu, op, vaddr); > + vcpu_write_sys_reg(vcpu, par, PAR_EL1); > + } > } > > void __kvm_at_s1e2(struct kvm_vcpu *vcpu, u32 op, u64 vaddr) > @@ -433,6 +931,10 @@ void __kvm_at_s1e2(struct kvm_vcpu *vcpu, u32 op, u64 vaddr) > > write_unlock(&vcpu->kvm->mmu_lock); > > + /* We failed the translation, let's replay it in slow motion */ > + if (!fail && (par & SYS_PAR_EL1_F)) > + par = handle_at_slow(vcpu, op, vaddr); > + > vcpu_write_sys_reg(vcpu, par, PAR_EL1); > } > > -- > 2.39.2 > >