From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from out-185.mta1.migadu.com (out-185.mta1.migadu.com [95.215.58.185])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id AAD69629
	for <kvmarm@lists.linux.dev>; Thu,  3 Oct 2024 00:04:08 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=95.215.58.185
ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1727913851; cv=none; b=fQYfFP1YX/XdEBK2MaT2ILxwX1vsmf7olmtNGYyzAxtFzcnWT2O/HcaXXx6V/x6eVyJyB+rvLJ6Ttwdq9VN0Zmbf767DIzpmwO4V8iOOC+bxD7hvYrO+hG3JAXB95BdFKYoRuh17Gss851b6OfzYg/PURCcoB3jt/I5vuvpWoxY=
ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1727913851; c=relaxed/simple;
	bh=a8pwle6XnsTcGHIMXQsw2it8rNJEtWYEn1RYQrp7bcc=;
	h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version:
	 Content-Type:Content-Disposition:In-Reply-To; b=tyXlo1Ir8CGFMtDqWy8Q7g96pqS0RLHKbn+HtQzSyAh+5CMMoBCb8z0irp+c6cMo+Dt5VDX0eYbxlUIK393xLTcKMYqCHlAGh5kRK9aps+AEcUOyTG8CB9NgN5QYuuZeWfpD6BqkSN90AblhYBQL7bDGypIxl4zrlIEXFGu6fUQ=
ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=i8AbKSth; arc=none smtp.client-ip=95.215.58.185
Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev
Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="i8AbKSth"
Date: Thu, 3 Oct 2024 00:04:01 +0000
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1;
	t=1727913846;
	h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
	 to:to:cc:cc:mime-version:mime-version:content-type:content-type:
	 in-reply-to:in-reply-to:references:references;
	bh=MpIbh67MyOVtU9n+6hRxLGYlIbnrq/BceOHLHDPDuH8=;
	b=i8AbKSth5/xWIqI81pqQtO5wnTRNuB2lzXVESt9c8XwA6+NbTJs00zGB1jfElQeFezImik
	z7KoGOFsWZDneDOiB3kq15+cIzwRaLWzqT7koaXiEWBAAsSwKjUt0gloWNv6p8tpUKAr8V
	/GXA50JFT0ztrA2i8hgcX71LmeXoPU0=
X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers.
From: Oliver Upton <oliver.upton@linux.dev>
To: Marc Zyngier <maz@kernel.org>
Cc: Sean Christopherson <seanjc@google.com>, kvmarm@lists.linux.dev,
	Joey Gouly <joey.gouly@arm.com>,
	Suzuki K Poulose <suzuki.poulose@arm.com>,
	Zenghui Yu <yuzenghui@huawei.com>
Subject: Re: [PATCH 3/3] KVM: arm64: nv: Punt stage-2 recycling to a vCPU
 request
Message-ID: <Zv3fcT9lCSujib7J@linux.dev>
References: <20241001001709.1303668-1-oliver.upton@linux.dev>
 <20241001001709.1303668-4-oliver.upton@linux.dev>
 <ZvxH3el9SNuNWwi8@google.com>
 <ZvxeeVn8LphHxWeS@linux.dev>
 <ZvyFkqsRFBAYwqP7@google.com>
 <86cykj75a0.wl-maz@kernel.org>
 <ZvyOcnZqNzfD7MZx@linux.dev>
 <ZvySjfDWOhl2O1IA@google.com>
 <865xqa6q0a.wl-maz@kernel.org>
Precedence: bulk
X-Mailing-List: kvmarm@lists.linux.dev
List-Id: <kvmarm.lists.linux.dev>
List-Subscribe: <mailto:kvmarm+subscribe@lists.linux.dev>
List-Unsubscribe: <mailto:kvmarm+unsubscribe@lists.linux.dev>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <865xqa6q0a.wl-maz@kernel.org>
X-Migadu-Flow: FLOW_OUT

Hey,

On Thu, Oct 03, 2024 at 12:31:33AM +0100, Marc Zyngier wrote:
> On Wed, 02 Oct 2024 01:23:41 +0100, Sean Christopherson <seanjc@google.com> wrote:
> > IIUC, KVM round-robins across 2*nr_vcpus MMUs, and when L1 switches to a different
> > VTTBR, it will first drop its reference to the previous MMU.  So at any given time,
> > there are nr_vcpus worth of unused MMUs, i.e. a vCPU is guaranteed to be able to
> > find an unused slot, even if vCPUs that are scheduled out hold onto their S2 MMU
> > reference.
> 
> It's not about not finding a slot, but about making sure that vcpus
> that context switch rapidly between VTTBRs for their own guests can do
> so freely without sacrificing the TLBs they have just produced. Not
> reusing the TLBs hogged by a vcpu that cannot run is a waste of
> resource.

OTOH, our global TLBs don't model hardware exactly since a vCPU doing
rapid context switches trash the TLBs of *all* vCPUs in the system.
The cost of reusing an MMU is quite noticeable, since our unmap
implementation is slightly crap at the moment, the cost of which shows
up both on sides of the reclaim (victim and user).

> >
> > At that point, choosing an MMU that no vCPU is using seems more likely to recycle
> > a cold/dead MMU than a soon-to-be-reused MMU.
> > 
> > And the round-robin approach makes it all heavily luck-based anyways.  E.g. if
> > a vCPU puts VTTBR A and then loads VTTBR B, B could recycle A's S2 MMU if that
> > MMU slot is next up for recycling.
> 
> Well, we'll have to agree to disagree. It's a terrible hack to add
> artificial ties between a vcpu and TLBs. Because that's what the
> shadow MMU is, nothing else.
> 
> So if you don't like the TLB eviction policy, please come up with a
> better one, making sure that a recently preempted vcpu gets its S2 MMU
> recycled last. But please don't add the notion of "locked TLBs" to the
> mix, because that's a pretty dodgy architectural concept.

Well, I've effectively implemented "locked" TLBs by way of allowing
vCPUs to pin an MMU when doing some MMU operation. It's just that the
detail gets encoded in the callsite instead of being some general
property of MMU assignment.

After fiddling with this a bit more (diff below), I'm actually inclined
to go for holding a reference on scheduled out vCPUs *if* we have reason
to believe it is gonna do something useful. You get a stable hw_mmu
pointer in the places where it matters and can avoid taking the MMU lock
for write during vcpu_load() in the 'fast path'.

Still should drop the reference in most other cases, as I do *not* want
to entertain vCPUs holding a reference when they've gone out to
userspace.

diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c
index f9e30dd34c7a..df670c14e1c6 100644
--- a/arch/arm64/kvm/nested.c
+++ b/arch/arm64/kvm/nested.c
@@ -663,6 +663,13 @@ void kvm_init_nested_s2_mmu(struct kvm_s2_mmu *mmu)
 
 void kvm_vcpu_load_hw_mmu(struct kvm_vcpu *vcpu)
 {
+	/*
+	 * The vCPU kept its reference on the MMU after the last put, keep
+	 * rolling with it.
+	 */
+	if (vcpu->arch.hw_mmu)
+		return;
+
 	if (is_hyp_ctxt(vcpu)) {
 		vcpu->arch.hw_mmu = &vcpu->kvm->arch.mmu;
 	} else {
@@ -674,10 +681,18 @@ void kvm_vcpu_load_hw_mmu(struct kvm_vcpu *vcpu)
 
 void kvm_vcpu_put_hw_mmu(struct kvm_vcpu *vcpu)
 {
-	if (kvm_is_nested_s2_mmu(vcpu->kvm, vcpu->arch.hw_mmu)) {
+	/*
+	 * Keep a reference on the associated stage-2 MMU if the vCPU is
+	 * scheduling out and not in WFI emulation, suggesting it is likely to
+	 * reuse the MMU sometime soon.
+	 */
+	if (vcpu->scheduled_out && !vcpu_get_flag(vcpu, IN_WFI))
+		return;
+
+	if (kvm_is_nested_s2_mmu(vcpu->kvm, vcpu->arch.hw_mmu))
 		atomic_dec(&vcpu->arch.hw_mmu->refcnt);
-		vcpu->arch.hw_mmu = NULL;
-	}
+
+	vcpu->arch.hw_mmu = NULL;
 }
 
 /*
-- 
2.47.0.rc0.187.ge670bccf7e-goog


-- 
Thanks,
Oliver