From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.lore.kernel.org (Postfix) with ESMTPS id 8273DC54798
	for <linux-arm-kernel@archiver.kernel.org>; Tue,  5 Mar 2024 15:03:42 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed;
	d=lists.infradead.org; s=bombadil.20210309; h=Sender:
	Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post:
	List-Archive:List-Unsubscribe:List-Id:MIME-Version:References:In-Reply-To:
	Subject:Cc:To:From:Message-ID:Date:Reply-To:Content-ID:Content-Description:
	Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:
	List-Owner; bh=2cXCjemsNxkuhQllUOYoQaidNX2UYbVkbHguUyMsW5M=; b=x5LAt/XnNm9o92
	7txThoMyAtVkIqk6fj4j+xyWz93HZoyl7PvbFYtXT65K/MSWedPhlEJie8Obwl+42+G1+sEZg+biV
	uuQ58ZhBCRFVn5OkF7Gy7IbCinOg+pLhR1fOfuPkIU56Bk+WaatLiOfeeJ5gZ+T/m4QRKm/Y2t41z
	FjbIh1HYl8FYSbpJg5vJVp+/7pz68M4wWuRuXXl1S88qqwXF51yfINKJfDA6lXXmWoOEiEl2jG2sz
	mp98wd62QJWSreurVpad/ZZSLrASR03Q94SyQ9CDKMfkg/IDwe5DIjuoCPfdimUPqgfSaKIMP//ex
	hznqg18yLWn15JPFpMEw==;
Received: from localhost ([::1] helo=bombadil.infradead.org)
	by bombadil.infradead.org with esmtp (Exim 4.97.1 #2 (Red Hat Linux))
	id 1rhWKH-0000000EBji-2Kt6;
	Tue, 05 Mar 2024 15:03:29 +0000
Received: from dfw.source.kernel.org ([139.178.84.217])
	by bombadil.infradead.org with esmtps (Exim 4.97.1 #2 (Red Hat Linux))
	id 1rhWKD-0000000EBii-0pvx
	for linux-arm-kernel@lists.infradead.org;
	Tue, 05 Mar 2024 15:03:27 +0000
Received: from smtp.kernel.org (transwarp.subspace.kernel.org [100.75.92.58])
	by dfw.source.kernel.org (Postfix) with ESMTP id B022E615DE;
	Tue,  5 Mar 2024 15:03:23 +0000 (UTC)
Received: by smtp.kernel.org (Postfix) with ESMTPSA id 3D76CC433C7;
	Tue,  5 Mar 2024 15:03:23 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
	s=k20201202; t=1709651003;
	bh=2mdrOFobw+z2H9C4Cm3HCZJiFsJR2HEgKEYWYzKiOyM=;
	h=Date:From:To:Cc:Subject:In-Reply-To:References:From;
	b=EOdiagnMhEDSmA3O9ywwmSdokjgZN8utfeB3NA2rvO4XM2fZytggJWGVbeVdWfNbM
	 KYnFa1+ZhMX+lyVz1LDw0RZiwYrtGz/DZN+UFYgTntbU7nE+HyKnevnVTcvb+vBFc9
	 sf9VPCdNF69Yx15CHzNkiVh902i2pTJXQcbvYDekM5EQl6PdsiRIB0uyvVkW+FhJPB
	 FhFOXhmbYi+DBLo/buQxDbSpk2sKOInL3Qhxn4NCyaUHM0eS/MoO4cj15glXgZ6Ywy
	 oYQpWgskIHLmNczBFFzU0+HSMVMX+cLxYzrC+Ah/msBaHXLCS4t9VWziUXcjCpvesN
	 rcKA5TZ3h6JcA==
Received: from sofa.misterjones.org ([185.219.108.64] helo=goblin-girl.misterjones.org)
	by disco-boy.misterjones.org with esmtpsa  (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
	(Exim 4.95)
	(envelope-from <maz@kernel.org>)
	id 1rhWK8-009cCM-QI;
	Tue, 05 Mar 2024 15:03:20 +0000
Date: Tue, 05 Mar 2024 15:03:20 +0000
Message-ID: <86r0go201z.wl-maz@kernel.org>
From: Marc Zyngier <maz@kernel.org>
To: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
Cc: kvmarm@lists.linux.dev,
	kvm@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org,
	oliver.upton@linux.dev,
	darren@os.amperecomputing.com,
	d.scott.phillips@amperecomputing.com
Subject: Re: [RFC PATCH] kvm: nv: Optimize the unmapping of shadow S2-MMU tables.
In-Reply-To: <6685c3a6-2017-4bc2-ad26-d11949097050@os.amperecomputing.com>
References: <20240305054606.13261-1-gankulkarni@os.amperecomputing.com>
	<86sf150w4t.wl-maz@kernel.org>
	<6685c3a6-2017-4bc2-ad26-d11949097050@os.amperecomputing.com>
User-Agent: Wanderlust/2.15.9 (Almost Unreal) SEMI-EPG/1.14.7 (Harue)
 FLIM-LB/1.14.9 (=?UTF-8?B?R29qxY0=?=) APEL-LB/10.8 EasyPG/1.0.0 Emacs/29.1
 (aarch64-unknown-linux-gnu) MULE/6.0 (HANACHIRUSATO)
MIME-Version: 1.0 (generated by SEMI-EPG 1.14.7 - "Harue")
X-SA-Exim-Connect-IP: 185.219.108.64
X-SA-Exim-Rcpt-To: gankulkarni@os.amperecomputing.com, kvmarm@lists.linux.dev, kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, oliver.upton@linux.dev, darren@os.amperecomputing.com, d.scott.phillips@amperecomputing.com
X-SA-Exim-Mail-From: maz@kernel.org
X-SA-Exim-Scanned: No (on disco-boy.misterjones.org); SAEximRunCond expanded to false
X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 
X-CRM114-CacheID: sfid-20240305_070325_596774_A5B43CAA 
X-CRM114-Status: GOOD (  58.55  )
X-BeenThere: linux-arm-kernel@lists.infradead.org
X-Mailman-Version: 2.1.34
Precedence: list
List-Id: <linux-arm-kernel.lists.infradead.org>
List-Unsubscribe: <http://lists.infradead.org/mailman/options/linux-arm-kernel>,
 <mailto:linux-arm-kernel-request@lists.infradead.org?subject=unsubscribe>
List-Archive: <http://lists.infradead.org/pipermail/linux-arm-kernel/>
List-Post: <mailto:linux-arm-kernel@lists.infradead.org>
List-Help: <mailto:linux-arm-kernel-request@lists.infradead.org?subject=help>
List-Subscribe: <http://lists.infradead.org/mailman/listinfo/linux-arm-kernel>,
 <mailto:linux-arm-kernel-request@lists.infradead.org?subject=subscribe>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Sender: "linux-arm-kernel" <linux-arm-kernel-bounces@lists.infradead.org>
Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org

On Tue, 05 Mar 2024 13:29:08 +0000,
Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
> 
> 
> 
> On 05-03-2024 04:43 pm, Marc Zyngier wrote:
> > [re-sending with kvmarm@ fixed]
> > 
> > On Tue, 05 Mar 2024 05:46:06 +0000,
> > Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com> wrote:
> >> 
> >> As per 'commit 178a6915434c ("KVM: arm64: nv: Unmap/flush shadow stage 2
> > 
> > $ git describe --contains 178a6915434c --match=v\*
> > fatal: cannot describe '178a6915434c141edefd116b8da3d55555ea3e63'
> > 
> 
> My bad(I would have been more verbose), I missed to mention that this
> patch is on top of NV-V11 patch series.
> 
> > This commit simply doesn't exist upstream. It only lives in a
> > now deprecated branch that will never be merged.
> > 
> >> page tables")', when ever there is unmap of pages that
> >> are mapped to L1, they are invalidated from both L1 S2-MMU and from
> >> all the active shadow/L2 S2-MMU tables. Since there is no mapping
> >> to invalidate the IPAs of Shadow S2 to a page, there is a complete
> >> S2-MMU page table walk and invalidation is done covering complete
> >> address space allocated to a L2. This has performance impacts and
> >> even soft lockup for NV(L1 and L2) boots with higher number of
> >> CPUs and large Memory.
> >> 
> >> Adding a lookup table of mapping of Shadow IPA to Canonical IPA
> >> whenever a page is mapped to any of the L2. While any page is
> >> unmaped, this lookup is helpful to unmap only if it is mapped in
> >> any of the shadow S2-MMU tables. Hence avoids unnecessary long
> >> iterations of S2-MMU table walk-through and invalidation for the
> >> complete address space.
> > 
> > All of this falls in the "premature optimisation" bucket. Why should
> > we bother with any of this when not even 'AT S1' works correctly,
> 
> Hmm, I am not aware of this, is this something new issue of V11?

it's been there since v0. All we have is a trivial implementation that
doesn't survive the S1 page-tables being swapped out. It requires a
full S1 PTW to be written.

> 
> > making it trivial to prevent a guest from making forward progress? You
> > also show no numbers that would hint at a measurable improvement under
> > any particular workload.
> 
> This patch is avoiding long iterations of unmap which was resulting in
> soft-lockup, when tried L1 and L2 with 192 cores.
> Fixing soft lockup isn't a required fix for feature enablement?

No. All we care is correctness, not performance. Addressing
soft-lockups is *definitely* a performance issue, which I'm 100% happy
to ignore.

[...]

> >>   +static inline bool kvm_is_l1_using_shadow_s2(struct kvm_vcpu
> >> *vcpu)
> >> +{
> >> +	return (vcpu->arch.hw_mmu != &vcpu->kvm->arch.mmu);
> >> +}
> > 
> > Isn't that the very definition of "!in_hyp_ctxt()"? You are abusing
> 
> "!in_hyp_ctxt()" isn't true for non-NV case also?

Surely you don't try to use this in non-NV contexts, right? Why would
you try to populate a shadow reverse-map outside of a NV context?

> This function added to know that L1 is NV enabled and using shadow S2.
> 
> > the hw_mmu pointer to derive something, but the source of truth is the
> > translation regime, as defined by HCR_EL2.{E2H,TGE} and PSTATE.M.
> > 
> 
> OK, I can try HCR_EL2.{E2H,TGE} and PSTATE.M instead of hw_mmu in next
> version.

No. Use is_hyp_ctxt().

[...]

> >> index 61bdd8798f83..3948681426a0 100644
> >> --- a/arch/arm64/kvm/mmu.c
> >> +++ b/arch/arm64/kvm/mmu.c
> >> @@ -1695,6 +1695,13 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >>   					     memcache,
> >>   					     KVM_PGTABLE_WALK_HANDLE_FAULT |
> >>   					     KVM_PGTABLE_WALK_SHARED);
> >> +		if ((nested || kvm_is_l1_using_shadow_s2(vcpu)) && !ret) {
> > 
> > I don't understand this condition. If nested is non-NULL, it's because
> > we're using a shadow S2. So why the additional condition?
> 
> No, nested is set only for L2, for L1 it is not.
> To handle L1 shadow S2 case, I have added this condition.

But there is *no shadow* for L1 at all. The only way to get a shadow
is to be outside of the EL2(&0) translation regime. El2(&0) itself is
always backed by the canonical S2. By definition, L1 does not run with
a S2 it is in control of. No S2, no shadow.

[...]

> > What guarantees that the mapping you have for L1 has the same starting
> > address as the one you have for L2? L1 could have a 2MB mapping and L2
> > only 4kB *in the middle*.
> 
> IIUC, when a page is mapped to 2MB in L1, it won't be
> mapped to L2 and we iterate with the step of PAGE_SIZE and we should
> be hitting the L2's IPA in lookup table, provided the L2 page falls in
> unmap range.

But then how do you handle the reverse (4kB at L1, 2MB at L2)? Without
tracking of the *intersection*, this fails to be correctly detected.
This is TLB matching 101.

[...]

> >> +			while (start < end) {
> >> +				size = PAGE_SIZE;
> >> +				/*
> >> +				 * get the Shadow IPA if the page is mapped
> >> +				 * to L1 and also mapped to any of active L2.
> >> +				 */
> > 
> > Why is L1 relevant here?
> 
> We do map while L1 boots(early stage) in shadow S2, at that moment
> if the L1 mapped page is unmapped/migrated we do need to unmap from
> L1's S2 table also.

Sure. But you can also get a page that is mapped in L2 and not mapped
in the canonical S2, which is L1's. I more and more feel that you have
a certain misconception of how L1 gets its pages mapped.

> 
> > 
> >> +				ret = get_shadow_ipa(mmu, start, &shadow_ipa, &size);
> >> +				if (ret)
> >> +					kvm_unmap_stage2_range(mmu, shadow_ipa, size);
> >> +				start += size;
> >> +			}
> >> +		}
> >> +	}
> >> +}
> >> +
> >>   /* expects kvm->mmu_lock to be held */
> >>   void kvm_nested_s2_flush(struct kvm *kvm)
> >>   {
> > 
> > There are a bunch of worrying issues with this patch. But more
> > importantly, this looks like a waste of effort until the core issues
> > that NV still has are solved, and I will not consider anything of the
> > sort until then.
> 
> OK thanks for letting us know, I will pause the work on V2 of this
> patch until then.
> 
> > 
> > I get the ugly feeling that you are trying to make it look as if it
> > was "production ready", which it won't be for another few years,
> > specially if the few interested people (such as you) are ignoring the
> > core issues in favour of marketing driven features ("make it fast").
> > 
> 
> What are the core issues (please forgive me if you mentioned already)?
> certainly we will prioritise them than this.

AT is a big one. Maintenance interrupts are more or less broken. I'm
slowly plugging PAuth, but there's no testing whatsoever (running
Linux doesn't count). Lack of SVE support is also definitely a
blocker.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel