From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 98E48C25B7E for ; Tue, 4 Jun 2024 08:00:15 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender: Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:MIME-Version:References:In-Reply-To: Subject:Cc:To:From:Message-ID:Date:Reply-To:Content-ID:Content-Description: Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: List-Owner; bh=p7uucj1DKqPvzo3Zw4JvUYKljH8wHvJyUBpxaY5FsfY=; b=AxGTi6lzeSdvzT lOzoCDY5NqDC0jj21dNtb8Ed85CXmVepuE7+YMTYcepTcEFmXvZb4vHWXxGwrcJziaNCwTVluj0jW NOH8uI+R4RY2B1u2sd6AHqIPXA1vxo4JZ+u9zY9oFHl8jXs6La280SOLsYuCdnSV1mHl4F7Jro1ps +VGroxkhH6U3Bfr4brjupqbZpJN1ZGBsahxySLjLoPHEgaq2PY1p8obw7ZZHShpWRJYIvMqNrZ5h+ gpnWejerd+yRKGxmL2na8SRr0bAvxuDKpcMB+VlImANJ3lVJxxQObzc9Ekp4BlpYpGjjQz86BPuo8 Jc9eTJNsSv7dVJgPVUFQ==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.97.1 #2 (Red Hat Linux)) id 1sEP5O-00000001bfl-00CC; Tue, 04 Jun 2024 08:00:02 +0000 Received: from dfw.source.kernel.org ([139.178.84.217]) by bombadil.infradead.org with esmtps (Exim 4.97.1 #2 (Red Hat Linux)) id 1sEP5K-00000001bdV-2iWC for linux-arm-kernel@lists.infradead.org; Tue, 04 Jun 2024 08:00:00 +0000 Received: from smtp.kernel.org (transwarp.subspace.kernel.org [100.75.92.58]) by dfw.source.kernel.org (Postfix) with ESMTP id E6A866121A; Tue, 4 Jun 2024 07:59:55 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 9999FC2BBFC; Tue, 4 Jun 2024 07:59:55 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1717487995; bh=vdxauOmzSboHFnDF9buP0En7vyb3RHtdFdbq9LJy4mc=; h=Date:From:To:Cc:Subject:In-Reply-To:References:From; b=LtodnuPO2CS14uvj48kFDapRADhNtETdT+ND/y+TtmLXzQn4lPjTT3tm/Q/hKcoKA CvJ+G7kzzidAN4yghffCb6DJHqrR+YG7a5EHxqkq1DzBg2dljr94xAwmLwWQcHoqx5 hSqo95an3+3BSbOBxJxfCqLGtAE/SKmKXcTVO5nCuQgsqRIO0bwMPB6VtQa0WQDN7n +pjXqUASWMxrGcjy7DqPVnmz+PmonLjr3src34RmbtqgO66qbMpjvp/1fWWepraxIZ 3kD3m2FfJUr9+f06knLUYKEVlhiKLb9ZNj8znUMXERgeC6HCdtlihLFPEYid8WfdvM 96S/e775jkdTw== Received: from 82-132-216-203.dab.02.net ([82.132.216.203] helo=wait-a-minute.misterjones.org) by disco-boy.misterjones.org with esmtpsa (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.95) (envelope-from ) id 1sEP5E-000Tse-LN; Tue, 04 Jun 2024 08:59:53 +0100 Date: Tue, 04 Jun 2024 08:59:49 +0100 Message-ID: <87frtt2l56.wl-maz@kernel.org> From: Marc Zyngier To: Oliver Upton Cc: kvmarm@lists.linux.dev, kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org, James Morse , Suzuki K Poulose , Zenghui Yu , Joey Gouly , Alexandru Elisei , Christoffer Dall , Ganapatrao Kulkarni Subject: Re: [PATCH v2 13/16] KVM: arm64: nv: Invalidate TLBs based on shadow S2 TTL-like information In-Reply-To: References: <20240529145628.3272630-1-maz@kernel.org> <20240529145628.3272630-14-maz@kernel.org> User-Agent: Wanderlust/2.15.9 (Almost Unreal) SEMI-EPG/1.14.7 (Harue) FLIM-LB/1.14.9 (=?UTF-8?B?R29qxY0=?=) APEL-LB/10.8 EasyPG/1.0.0 Emacs/28.2 (x86_64-pc-linux-gnu) MULE/6.0 (HANACHIRUSATO) MIME-Version: 1.0 (generated by SEMI-EPG 1.14.7 - "Harue") X-SA-Exim-Connect-IP: 82.132.216.203 X-SA-Exim-Rcpt-To: oliver.upton@linux.dev, kvmarm@lists.linux.dev, kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org, james.morse@arm.com, suzuki.poulose@arm.com, yuzenghui@huawei.com, joey.gouly@arm.com, alexandru.elisei@arm.com, christoffer.dall@arm.com, gankulkarni@os.amperecomputing.com X-SA-Exim-Mail-From: maz@kernel.org X-SA-Exim-Scanned: No (on disco-boy.misterjones.org); SAEximRunCond expanded to false X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20240604_005958_844348_806E5839 X-CRM114-Status: GOOD ( 36.30 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org On Mon, 03 Jun 2024 19:36:57 +0100, Oliver Upton wrote: > > On Wed, May 29, 2024 at 03:56:25PM +0100, Marc Zyngier wrote: [...] > > +/* > > + * Compute the equivalent of the TTL field by parsing the shadow PT. The > > + * granule size is extracted from the cached VTCR_EL2.TG0 while the level is > > + * retrieved from first entry carrying the level as a tag. > > + */ > > +static u8 get_guest_mapping_ttl(struct kvm_s2_mmu *mmu, u64 addr) > > +{ > > Can you add a lockdep assertion that the MMU lock is held for write > here? At least for me this is far enough away from the 'real' page table > walk that it wasn't clear what locks were held at this point. Sure thing. > > > + u64 tmp, sz = 0, vtcr = mmu->tlb_vtcr; > > + kvm_pte_t pte; > > + u8 ttl, level; > > + > > + switch (vtcr & VTCR_EL2_TG0_MASK) { > > + case VTCR_EL2_TG0_4K: > > + ttl = (TLBI_TTL_TG_4K << 2); > > + break; > > + case VTCR_EL2_TG0_16K: > > + ttl = (TLBI_TTL_TG_16K << 2); > > + break; > > + case VTCR_EL2_TG0_64K: > > + default: /* IMPDEF: treat any other value as 64k */ > > + ttl = (TLBI_TTL_TG_64K << 2); > > + break; > > + } > > + > > + tmp = addr; > > + > > +again: > > + /* Iteratively compute the block sizes for a particular granule size */ > > + switch (vtcr & VTCR_EL2_TG0_MASK) { > > + case VTCR_EL2_TG0_4K: > > + if (sz < SZ_4K) sz = SZ_4K; > > + else if (sz < SZ_2M) sz = SZ_2M; > > + else if (sz < SZ_1G) sz = SZ_1G; > > + else sz = 0; > > + break; > > + case VTCR_EL2_TG0_16K: > > + if (sz < SZ_16K) sz = SZ_16K; > > + else if (sz < SZ_32M) sz = SZ_32M; > > + else sz = 0; > > + break; > > + case VTCR_EL2_TG0_64K: > > + default: /* IMPDEF: treat any other value as 64k */ > > + if (sz < SZ_64K) sz = SZ_64K; > > + else if (sz < SZ_512M) sz = SZ_512M; > > + else sz = 0; > > + break; > > + } > > + > > + if (sz == 0) > > + return 0; > > + > > + tmp &= ~(sz - 1); > > + if (kvm_pgtable_get_leaf(mmu->pgt, tmp, &pte, NULL)) > > + goto again; > > Assuming we're virtualizing a larger TG than what's in use at L0 this > may not actually find a valid leaf that exists within the span of a > single virtual TLB entry. > > For example, if we're using 4K at L0 and 16K at L1, we could have: > > [ ----- valid 16K entry ------- ] > > mapped as: > > [ ----- | ----- | valid | ----- ] > > in the shadow S2. kvm_pgtable_get_leaf() will always return the first > splintered page, which could be invalid. > > What I'm getting at is: should this use a bespoke table walker that > scans for a valid TTL in the range of [addr, addr + sz)? It may make > sense to back off a bit more aggressively and switch to a conservative, > unscoped TLBI to avoid visiting too many PTEs. I had something along those lines at some point (circa 2019), and quickly dropped it as it had a horrible "look-around" behaviour, specially if the L1 S2 granule size is much larger than L0's (64k vs 4k). As you pointed out, it needs heuristics to limit the look-around, which I don't find very satisfying. Which is why the current code limits the search to be in depth only, hoping for the head descriptor to be valid, and quickly backs off to do a level-0 invalidation. My preferred option would be to allow the use of non-valid entries to cache the level (always using the first L0 entry that would map the L1 descriptor), but this opens another can of worms: you could end-up with page table pages containing only invalid descriptors, except for the presence of a level annotation, which screws the refcounting. I'd very much like to see this rather than the look-around option. Now, it is important to consider how useful this is. I expect modern hypervisors to use either TTL-hinted (which we emulate even if the HW doesn't support it) or Range-based invalidation in the vast majority of the cases, so this would only help SW that hasn't got on with the program. Thoughts? M. -- Without deviation from the norm, progress is not possible. _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel