From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 5785AC4332F for ; Sat, 24 Dec 2022 15:27:24 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender: Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:MIME-Version:References:In-Reply-To: Subject:Cc:To:From:Message-ID:Date:Reply-To:Content-ID:Content-Description: Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: List-Owner; bh=U52MypIJOsczlBJW3ciySxQ1KYdEuS7U6ZGwtD0dt8M=; b=OKvoVVqfw83hr5 MNpGeM2hLBd4KQZd7+3beYIs5UX1Jiil2q0qvYPchRP3GaQLArdDkWN9qs2+MHN39pxQds6LYlD4m JsNpPUvYJyw+2tLbBXeobUlGK5aroy4MboW427Cf7xtsy94S+hFtOcLclZyXRNKusifWd2Rj5F7N7 0xhZg9gQOv2HCTwNhqaMsBl50UTdxEopwpMzFEtVsy1u3vISE4G1VQ2BQ9IEARH2K0avNHsCC93t5 4J5dGa/MxFsqVowx2BJAE9dW8tPl+ndmWgnWYfP/gVGnJOv2DHpWOvlxye7OZNNyQNtMnceyWmrBN J/3VK3gpiHe/tkbRPvXg==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.94.2 #2 (Red Hat Linux)) id 1p96Os-0014fB-QU; Sat, 24 Dec 2022 15:25:29 +0000 Received: from ams.source.kernel.org ([145.40.68.75]) by bombadil.infradead.org with esmtps (Exim 4.94.2 #2 (Red Hat Linux)) id 1p93Ct-00H9Wh-Tj for linux-arm-kernel@lists.infradead.org; Sat, 24 Dec 2022 12:00:53 +0000 Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 67B17B82173; Sat, 24 Dec 2022 12:00:46 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 093F5C433D2; Sat, 24 Dec 2022 12:00:45 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1671883245; bh=m2EiFEyisQ9bmq/xgqzbRGVb+gogXJXymw1uS+HVbo4=; h=Date:From:To:Cc:Subject:In-Reply-To:References:From; b=JbF9kr9jYH1y/IzjuATj80ZAE6QwqcRuvDkqjjwbrEHnHKZyLzaT5RsRODWlThPIF a3hsfQK7uMzvFrpIcapk330TcJ1ElBElbnTE3N9IdaQ/0S5rrfaC9hGhpo8EWItBot RkLcFd9fZ4WHeJS9XfVZRgfJtK97E3K98j/+oK0PiA3ufVsshd1CczVjwWNorkBkse /yJsyA+BRFvaxsFfgyrwJGqqFAde1ZfS9Od944deiTwLwJ+HpcxVxfQjjQ3So+77Yt R1/0q6IE0Xmzhj5mX7OCfG5frOdG+ro/AVffjLixpUzO3yn71I0sufQ6rAfI5lTubL z2nVa8jP2FzPQ== Received: from host81-132-227-111.range81-132.btcentralplus.com ([81.132.227.111] helo=wait-a-minute.misterjones.org) by disco-boy.misterjones.org with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.95) (envelope-from ) id 1p93Ck-00EuJg-L8; Sat, 24 Dec 2022 12:00:42 +0000 Date: Sat, 24 Dec 2022 11:59:19 +0000 Message-ID: <878rixf1wo.wl-maz@kernel.org> From: Marc Zyngier To: Oliver Upton Cc: Ricardo Koller , kvmarm@lists.cs.columbia.edu, kvmarm@lists.linux.dev, kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org, Will Deacon Subject: Re: [PATCH 2/3] KVM: arm64: Handle S1PTW translation with TCR_HA set as a write In-Reply-To: References: <20221220200923.1532710-1-maz@kernel.org> <20221220200923.1532710-3-maz@kernel.org> <86ili3byn8.wl-maz@kernel.org> User-Agent: Wanderlust/2.15.9 (Almost Unreal) SEMI-EPG/1.14.7 (Harue) FLIM-LB/1.14.9 (=?UTF-8?B?R29qxY0=?=) APEL-LB/10.8 EasyPG/1.0.0 Emacs/27.1 (x86_64-pc-linux-gnu) MULE/6.0 (HANACHIRUSATO) MIME-Version: 1.0 (generated by SEMI-EPG 1.14.7 - "Harue") X-SA-Exim-Connect-IP: 81.132.227.111 X-SA-Exim-Rcpt-To: oliver.upton@linux.dev, ricarkol@google.com, kvmarm@lists.cs.columbia.edu, kvmarm@lists.linux.dev, kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org, will@kernel.org X-SA-Exim-Mail-From: maz@kernel.org X-SA-Exim-Scanned: No (on disco-boy.misterjones.org); SAEximRunCond expanded to false X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20221224_040052_291479_49B86EDE X-CRM114-Status: GOOD ( 46.18 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org On Thu, 22 Dec 2022 20:58:40 +0000, Oliver Upton wrote: > > On Thu, Dec 22, 2022 at 09:01:15AM +0000, Marc Zyngier wrote: > > On Wed, 21 Dec 2022 17:46:24 +0000, Oliver Upton wrote: > > > - When UFFD is in use, translation faults are reported to userspace as > > > writes when from a RW memslot and reads when from an RO memslot. > > > > Not quite: translation faults are reported as reads if TCR_EL1.HA > > isn't set, and as writes if it is. Ignoring TCR_EL1.HD for a moment, > > this matches exactly the behaviour of the page-table walker, which > > will update the S1 PTs only if this bit is set. > > My bad, yes you're right. I conflated the use case here with the > architectural state. > > I'm probably being way too pedantic, but I just wanted to make sure we > agree about the ensuing subtlety. More below: > > > Or is it what userfaultfd does on its own? That'd be confusing... > > > > > > > > - S1 page table memory is spuriously marked as dirty, as we presume a > > > write immediately follows the translation fault. That isn't entirely > > > senseless, as it would mean both the target page and the S1 PT that > > > maps it are both old. This is nothing new I suppose, just weird. > > > > s/old/young/ ? > > > > I think you're confusing the PT access with the access that caused the > > PT access (I'll have that printed on a t-shirt, thank you very much). > > I'd buy it! > > > Here, we're not considering the cause of the PT access anymore. If > > TCR_EL1.HA is set, the S1 PT page will be marked as accessed even on a > > read, and only that page. > > I think this is where the disconnect might be. TCR_EL1.HA == 1 suggests > a write could possibly follow, but I don't think it requires it. The > page table walker must first load the S1 PTE before writing to it. Ah, you're talking of the write to the PTE. Too many writes! My reasoning is based on Rule LFTXR in DDI0487I.a, which says: "When the PE performs a hardware update of the AF, it sets the AF to 1 in the corresponding descriptor in memory, in a coherent manner, using an atomic read-modify-write of that descriptor." An atomic-or operation fits this description, and I cannot see anything in the architecture that would prevent the write of a PTE even if AF is already set, such as mandating something like a test-and-set or compare-and-swap. I'm not saying this is the only possible implementation, or even a good one. But I don't think this is incompatible with what the architecture mandates. > > From AArch64.S1Translate() (DDI0487H.a): > > (fault, descaddress, walkstate, descriptor) = AArch64.S1Walk(fault, walkparams, va, regime, > ss, acctype, iswrite, ispriv); > > [...] > > new_desc = descriptor; > if walkparams.ha == '1' && AArch64.FaultAllowsSetAccessFlag(fault) then > // Set descriptor AF bit > new_desc<10> = '1'; > > [...] > > // Either the access flag was clear or AP<2> is set > if new_desc != descriptor then > if regime == Regime_EL10 && EL2Enabled() then > s1aarch64 = TRUE; > s2fs1walk = TRUE; > aligned = TRUE; > iswrite = TRUE; > (s2fault, descupdateaddress) = AArch64.S2Translate(fault, descaddress, s1aarch64, > ss, s2fs1walk, AccType_ATOMICRW, > aligned, iswrite, ispriv); > > if s2fault.statuscode != Fault_None then > return (s2fault, AddressDescriptor UNKNOWN); > else > descupdateaddress = descaddress; > > (fault, mem_desc) = AArch64.MemSwapTableDesc(fault, descriptor, new_desc, > walkparams.ee, descupdateaddress) > > Buried in AArch64.S1Walk() is a stage-2 walk for a read to fetch the > descriptor. The second stage-2 walk for write is conditioned on having > already fetched the stage-1 descriptor and determining the AF needs > to be set. The question is whether this is one possible implementation, or the only possible implementation. My bet is on the former. > Relating back to UFFD: if we expect KVM to do exactly what hardware > does, UFFD should see an attempted read when the first walk fails > because of an S2 translation fault. Based on this patch, though, we'd > promote it to a write if TCR_EL1.HA == 1. > > This has the additional nuance of marking the S1 PT's IPA as dirty, even > though it might not actually have been written to. Having said that, > the false positive rate should be negligible given that S1 PTs ought to > account for a small amount of guest memory. > > Like I said before, I'm probably being unnecessarily pedantic :) It just > seems to me that the view we're giving userspace of S1PTW aborts isn't > exactly architectural and I want to make sure that is explicitly > intentional. I think it is perfectly fine to be pedantic about these things, because they really matter. I still think that this change doesn't violate the architecture. But at the same time, I see some value in strictly following what the HW does, specially given that this only optimises the case where: - S1 PTs do not have a pre-existing S2 mapping (either placed there by the VMM, or swapped out) - TCR_EL1.HA==1 which is such a corner case that nobody will loose any sleep over it (and I'll buy beer to anyone who can come up with a real workload where this optimisation actually matters). So my suggestion is to drop the change altogether, and stick with the original fix. Thanks, M. -- Without deviation from the norm, progress is not possible. _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel