From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.lore.kernel.org (Postfix) with ESMTPS id 98E48C25B7E
	for <linux-arm-kernel@archiver.kernel.org>; Tue,  4 Jun 2024 08:00:15 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed;
	d=lists.infradead.org; s=bombadil.20210309; h=Sender:
	Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post:
	List-Archive:List-Unsubscribe:List-Id:MIME-Version:References:In-Reply-To:
	Subject:Cc:To:From:Message-ID:Date:Reply-To:Content-ID:Content-Description:
	Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:
	List-Owner; bh=p7uucj1DKqPvzo3Zw4JvUYKljH8wHvJyUBpxaY5FsfY=; b=AxGTi6lzeSdvzT
	lOzoCDY5NqDC0jj21dNtb8Ed85CXmVepuE7+YMTYcepTcEFmXvZb4vHWXxGwrcJziaNCwTVluj0jW
	NOH8uI+R4RY2B1u2sd6AHqIPXA1vxo4JZ+u9zY9oFHl8jXs6La280SOLsYuCdnSV1mHl4F7Jro1ps
	+VGroxkhH6U3Bfr4brjupqbZpJN1ZGBsahxySLjLoPHEgaq2PY1p8obw7ZZHShpWRJYIvMqNrZ5h+
	gpnWejerd+yRKGxmL2na8SRr0bAvxuDKpcMB+VlImANJ3lVJxxQObzc9Ekp4BlpYpGjjQz86BPuo8
	Jc9eTJNsSv7dVJgPVUFQ==;
Received: from localhost ([::1] helo=bombadil.infradead.org)
	by bombadil.infradead.org with esmtp (Exim 4.97.1 #2 (Red Hat Linux))
	id 1sEP5O-00000001bfl-00CC;
	Tue, 04 Jun 2024 08:00:02 +0000
Received: from dfw.source.kernel.org ([139.178.84.217])
	by bombadil.infradead.org with esmtps (Exim 4.97.1 #2 (Red Hat Linux))
	id 1sEP5K-00000001bdV-2iWC
	for linux-arm-kernel@lists.infradead.org;
	Tue, 04 Jun 2024 08:00:00 +0000
Received: from smtp.kernel.org (transwarp.subspace.kernel.org [100.75.92.58])
	by dfw.source.kernel.org (Postfix) with ESMTP id E6A866121A;
	Tue,  4 Jun 2024 07:59:55 +0000 (UTC)
Received: by smtp.kernel.org (Postfix) with ESMTPSA id 9999FC2BBFC;
	Tue,  4 Jun 2024 07:59:55 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
	s=k20201202; t=1717487995;
	bh=vdxauOmzSboHFnDF9buP0En7vyb3RHtdFdbq9LJy4mc=;
	h=Date:From:To:Cc:Subject:In-Reply-To:References:From;
	b=LtodnuPO2CS14uvj48kFDapRADhNtETdT+ND/y+TtmLXzQn4lPjTT3tm/Q/hKcoKA
	 CvJ+G7kzzidAN4yghffCb6DJHqrR+YG7a5EHxqkq1DzBg2dljr94xAwmLwWQcHoqx5
	 hSqo95an3+3BSbOBxJxfCqLGtAE/SKmKXcTVO5nCuQgsqRIO0bwMPB6VtQa0WQDN7n
	 +pjXqUASWMxrGcjy7DqPVnmz+PmonLjr3src34RmbtqgO66qbMpjvp/1fWWepraxIZ
	 3kD3m2FfJUr9+f06knLUYKEVlhiKLb9ZNj8znUMXERgeC6HCdtlihLFPEYid8WfdvM
	 96S/e775jkdTw==
Received: from 82-132-216-203.dab.02.net ([82.132.216.203] helo=wait-a-minute.misterjones.org)
	by disco-boy.misterjones.org with esmtpsa  (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
	(Exim 4.95)
	(envelope-from <maz@kernel.org>)
	id 1sEP5E-000Tse-LN;
	Tue, 04 Jun 2024 08:59:53 +0100
Date: Tue, 04 Jun 2024 08:59:49 +0100
Message-ID: <87frtt2l56.wl-maz@kernel.org>
From: Marc Zyngier <maz@kernel.org>
To: Oliver Upton <oliver.upton@linux.dev>
Cc: kvmarm@lists.linux.dev,
	kvm@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	James Morse <james.morse@arm.com>,
	Suzuki K Poulose <suzuki.poulose@arm.com>,
	Zenghui Yu <yuzenghui@huawei.com>,
	Joey Gouly <joey.gouly@arm.com>,
	Alexandru Elisei <alexandru.elisei@arm.com>,
	Christoffer Dall <christoffer.dall@arm.com>,
	Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
Subject: Re: [PATCH v2 13/16] KVM: arm64: nv: Invalidate TLBs based on shadow S2 TTL-like information
In-Reply-To: <Zl4NScV0E_YV7GR2@linux.dev>
References: <20240529145628.3272630-1-maz@kernel.org>
	<20240529145628.3272630-14-maz@kernel.org>
	<Zl4NScV0E_YV7GR2@linux.dev>
User-Agent: Wanderlust/2.15.9 (Almost Unreal) SEMI-EPG/1.14.7 (Harue)
 FLIM-LB/1.14.9 (=?UTF-8?B?R29qxY0=?=) APEL-LB/10.8 EasyPG/1.0.0 Emacs/28.2
 (x86_64-pc-linux-gnu) MULE/6.0 (HANACHIRUSATO)
MIME-Version: 1.0 (generated by SEMI-EPG 1.14.7 - "Harue")
X-SA-Exim-Connect-IP: 82.132.216.203
X-SA-Exim-Rcpt-To: oliver.upton@linux.dev, kvmarm@lists.linux.dev, kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org, james.morse@arm.com, suzuki.poulose@arm.com, yuzenghui@huawei.com, joey.gouly@arm.com, alexandru.elisei@arm.com, christoffer.dall@arm.com, gankulkarni@os.amperecomputing.com
X-SA-Exim-Mail-From: maz@kernel.org
X-SA-Exim-Scanned: No (on disco-boy.misterjones.org); SAEximRunCond expanded to false
X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 
X-CRM114-CacheID: sfid-20240604_005958_844348_806E5839 
X-CRM114-Status: GOOD (  36.30  )
X-BeenThere: linux-arm-kernel@lists.infradead.org
X-Mailman-Version: 2.1.34
Precedence: list
List-Id: <linux-arm-kernel.lists.infradead.org>
List-Unsubscribe: <http://lists.infradead.org/mailman/options/linux-arm-kernel>,
 <mailto:linux-arm-kernel-request@lists.infradead.org?subject=unsubscribe>
List-Archive: <http://lists.infradead.org/pipermail/linux-arm-kernel/>
List-Post: <mailto:linux-arm-kernel@lists.infradead.org>
List-Help: <mailto:linux-arm-kernel-request@lists.infradead.org?subject=help>
List-Subscribe: <http://lists.infradead.org/mailman/listinfo/linux-arm-kernel>,
 <mailto:linux-arm-kernel-request@lists.infradead.org?subject=subscribe>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Sender: "linux-arm-kernel" <linux-arm-kernel-bounces@lists.infradead.org>
Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org

On Mon, 03 Jun 2024 19:36:57 +0100,
Oliver Upton <oliver.upton@linux.dev> wrote:
> 
> On Wed, May 29, 2024 at 03:56:25PM +0100, Marc Zyngier wrote:

[...]

> > +/*
> > + * Compute the equivalent of the TTL field by parsing the shadow PT.  The
> > + * granule size is extracted from the cached VTCR_EL2.TG0 while the level is
> > + * retrieved from first entry carrying the level as a tag.
> > + */
> > +static u8 get_guest_mapping_ttl(struct kvm_s2_mmu *mmu, u64 addr)
> > +{
> 
> Can you add a lockdep assertion that the MMU lock is held for write
> here? At least for me this is far enough away from the 'real' page table
> walk that it wasn't clear what locks were held at this point.

Sure thing.

> 
> > +	u64 tmp, sz = 0, vtcr = mmu->tlb_vtcr;
> > +	kvm_pte_t pte;
> > +	u8 ttl, level;
> > +
> > +	switch (vtcr & VTCR_EL2_TG0_MASK) {
> > +	case VTCR_EL2_TG0_4K:
> > +		ttl = (TLBI_TTL_TG_4K << 2);
> > +		break;
> > +	case VTCR_EL2_TG0_16K:
> > +		ttl = (TLBI_TTL_TG_16K << 2);
> > +		break;
> > +	case VTCR_EL2_TG0_64K:
> > +	default:	    /* IMPDEF: treat any other value as 64k */
> > +		ttl = (TLBI_TTL_TG_64K << 2);
> > +		break;
> > +	}
> > +
> > +	tmp = addr;
> > +
> > +again:
> > +	/* Iteratively compute the block sizes for a particular granule size */
> > +	switch (vtcr & VTCR_EL2_TG0_MASK) {
> > +	case VTCR_EL2_TG0_4K:
> > +		if	(sz < SZ_4K)	sz = SZ_4K;
> > +		else if (sz < SZ_2M)	sz = SZ_2M;
> > +		else if (sz < SZ_1G)	sz = SZ_1G;
> > +		else			sz = 0;
> > +		break;
> > +	case VTCR_EL2_TG0_16K:
> > +		if	(sz < SZ_16K)	sz = SZ_16K;
> > +		else if (sz < SZ_32M)	sz = SZ_32M;
> > +		else			sz = 0;
> > +		break;
> > +	case VTCR_EL2_TG0_64K:
> > +	default:	    /* IMPDEF: treat any other value as 64k */
> > +		if	(sz < SZ_64K)	sz = SZ_64K;
> > +		else if (sz < SZ_512M)	sz = SZ_512M;
> > +		else			sz = 0;
> > +		break;
> > +	}
> > +
> > +	if (sz == 0)
> > +		return 0;
> > +
> > +	tmp &= ~(sz - 1);
> > +	if (kvm_pgtable_get_leaf(mmu->pgt, tmp, &pte, NULL))
> > +		goto again;
> 
> Assuming we're virtualizing a larger TG than what's in use at L0 this
> may not actually find a valid leaf that exists within the span of a
> single virtual TLB entry.
> 
> For example, if we're using 4K at L0 and 16K at L1, we could have:
> 
> 	[ ----- valid 16K entry ------- ]
> 
> mapped as:
> 
> 	[ ----- | ----- | valid | ----- ]
> 
> in the shadow S2. kvm_pgtable_get_leaf() will always return the first
> splintered page, which could be invalid.
> 
> What I'm getting at is: should this use a bespoke table walker that
> scans for a valid TTL in the range of [addr, addr + sz)? It may make
> sense to back off a bit more aggressively and switch to a conservative,
> unscoped TLBI to avoid visiting too many PTEs.

I had something along those lines at some point (circa 2019), and
quickly dropped it as it had a horrible "look-around" behaviour,
specially if the L1 S2 granule size is much larger than L0's (64k vs
4k). As you pointed out, it needs heuristics to limit the look-around,
which I don't find very satisfying.

Which is why the current code limits the search to be in depth only,
hoping for the head descriptor to be valid, and quickly backs off to
do a level-0 invalidation.

My preferred option would be to allow the use of non-valid entries to
cache the level (always using the first L0 entry that would map the L1
descriptor), but this opens another can of worms: you could end-up
with page table pages containing only invalid descriptors, except for
the presence of a level annotation, which screws the refcounting. I'd
very much like to see this rather than the look-around option.

Now, it is important to consider how useful this is. I expect modern
hypervisors to use either TTL-hinted (which we emulate even if the HW
doesn't support it) or Range-based invalidation in the vast majority
of the cases, so this would only help SW that hasn't got on with the
program.

Thoughts?

	M.

-- 
Without deviation from the norm, progress is not possible.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel