From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from out-189.mta1.migadu.com (out-189.mta1.migadu.com [95.215.58.189]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B48374C92 for ; Thu, 3 Oct 2024 00:12:56 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=95.215.58.189 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727914379; cv=none; b=DE+YKkTZfcapiOfP3mg5CDjnNSlf34i19APC6sZ22LhNAQ8ytgTTw7RxWTNsijX8YsppLuL3mJzqpYV8XTQV88xlazKm314ilps1+gEdS+IoP7DV+/3YnWN5HbqKdmd9MJYyxo8D/U4xfm+1kVfMJEZ8nAwTfhOzbRjDm99NSWk= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727914379; c=relaxed/simple; bh=Y62vgAdPMqyAk56mFW+wIy0i9uAnDSmro1exnTXwaLY=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=R54oegU87J2lKby1oRXHSvQn1W0AGaArr7pHLH+uFO6EteI5K3Dqu8I78CxlT+CWEtvpSh2t1N7YhjyuXkJbJIJighnnyHR5OpFw5jsmvXwaQqUW3jsTU04JaBgAesNRCWBNKjba+A1s/YZNlY38ceWSLCB3HPpwHQKdzNE2SX4= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=FGqUcIK2; arc=none smtp.client-ip=95.215.58.189 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="FGqUcIK2" Date: Thu, 3 Oct 2024 00:12:48 +0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1727914374; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=TjhSod9+vVM0x9AQtvn4kjgpPDKTDcl2kmuRYLvi3i0=; b=FGqUcIK26lS6bFt/naWa6Zl1FNPKoHW0mmIn4HQJrDiCf1kbRO9MgCKN52x8mLq/dwH1a/ X3oV5bfqAHycczYQMDg6ho6x3ncJdY0bxHuheaMQB2nRAANV4H6n4LBo+tt+sx2/3zvw8k fD2gvXJ8iaMA/GDk1gf0ZuP+RgGG2gY= X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. From: Oliver Upton To: Marc Zyngier Cc: Sean Christopherson , kvmarm@lists.linux.dev, Joey Gouly , Suzuki K Poulose , Zenghui Yu Subject: Re: [PATCH 3/3] KVM: arm64: nv: Punt stage-2 recycling to a vCPU request Message-ID: References: <20241001001709.1303668-1-oliver.upton@linux.dev> <20241001001709.1303668-4-oliver.upton@linux.dev> <86cykj75a0.wl-maz@kernel.org> <865xqa6q0a.wl-maz@kernel.org> Precedence: bulk X-Mailing-List: kvmarm@lists.linux.dev List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Migadu-Flow: FLOW_OUT On Thu, Oct 03, 2024 at 12:04:06AM +0000, Oliver Upton wrote: > Hey, > > On Thu, Oct 03, 2024 at 12:31:33AM +0100, Marc Zyngier wrote: > > On Wed, 02 Oct 2024 01:23:41 +0100, Sean Christopherson wrote: > > > IIUC, KVM round-robins across 2*nr_vcpus MMUs, and when L1 switches to a different > > > VTTBR, it will first drop its reference to the previous MMU. So at any given time, > > > there are nr_vcpus worth of unused MMUs, i.e. a vCPU is guaranteed to be able to > > > find an unused slot, even if vCPUs that are scheduled out hold onto their S2 MMU > > > reference. > > > > It's not about not finding a slot, but about making sure that vcpus > > that context switch rapidly between VTTBRs for their own guests can do > > so freely without sacrificing the TLBs they have just produced. Not > > reusing the TLBs hogged by a vcpu that cannot run is a waste of > > resource. > > OTOH, our global TLBs don't model hardware exactly since a vCPU doing > rapid context switches trash the TLBs of *all* vCPUs in the system. > The cost of reusing an MMU is quite noticeable, since our unmap > implementation is slightly crap at the moment, the cost of which shows > up both on sides of the reclaim (victim and user). Oh, and why unmap is crap: The walker has no way of informing the iterator how far into the range the walk actually got. Instead, the iterator forcibly walks every PUD/PMD sized chunk. So if you have massive holes in your IPA space (which we generally do) then you wind up revisiting the same invalid PGD entry a bunch of times. And TLBI for it too when you have TLBIRANGE. I'm working on a fix for this too, but what I have right now needs to be aggressively de-crapified first. > > > > > > At that point, choosing an MMU that no vCPU is using seems more likely to recycle > > > a cold/dead MMU than a soon-to-be-reused MMU. > > > > > > And the round-robin approach makes it all heavily luck-based anyways. E.g. if > > > a vCPU puts VTTBR A and then loads VTTBR B, B could recycle A's S2 MMU if that > > > MMU slot is next up for recycling. > > > > Well, we'll have to agree to disagree. It's a terrible hack to add > > artificial ties between a vcpu and TLBs. Because that's what the > > shadow MMU is, nothing else. > > > > So if you don't like the TLB eviction policy, please come up with a > > better one, making sure that a recently preempted vcpu gets its S2 MMU > > recycled last. But please don't add the notion of "locked TLBs" to the > > mix, because that's a pretty dodgy architectural concept. > > Well, I've effectively implemented "locked" TLBs by way of allowing > vCPUs to pin an MMU when doing some MMU operation. It's just that the > detail gets encoded in the callsite instead of being some general > property of MMU assignment. > > After fiddling with this a bit more (diff below), I'm actually inclined > to go for holding a reference on scheduled out vCPUs *if* we have reason > to believe it is gonna do something useful. You get a stable hw_mmu > pointer in the places where it matters and can avoid taking the MMU lock > for write during vcpu_load() in the 'fast path'. > > Still should drop the reference in most other cases, as I do *not* want > to entertain vCPUs holding a reference when they've gone out to > userspace. > > diff --git a/arch/arm64/kvm/nested.c b/arch/arm64/kvm/nested.c > index f9e30dd34c7a..df670c14e1c6 100644 > --- a/arch/arm64/kvm/nested.c > +++ b/arch/arm64/kvm/nested.c > @@ -663,6 +663,13 @@ void kvm_init_nested_s2_mmu(struct kvm_s2_mmu *mmu) > > void kvm_vcpu_load_hw_mmu(struct kvm_vcpu *vcpu) > { > + /* > + * The vCPU kept its reference on the MMU after the last put, keep > + * rolling with it. > + */ > + if (vcpu->arch.hw_mmu) > + return; > + > if (is_hyp_ctxt(vcpu)) { > vcpu->arch.hw_mmu = &vcpu->kvm->arch.mmu; > } else { > @@ -674,10 +681,18 @@ void kvm_vcpu_load_hw_mmu(struct kvm_vcpu *vcpu) > > void kvm_vcpu_put_hw_mmu(struct kvm_vcpu *vcpu) > { > - if (kvm_is_nested_s2_mmu(vcpu->kvm, vcpu->arch.hw_mmu)) { > + /* > + * Keep a reference on the associated stage-2 MMU if the vCPU is > + * scheduling out and not in WFI emulation, suggesting it is likely to > + * reuse the MMU sometime soon. > + */ > + if (vcpu->scheduled_out && !vcpu_get_flag(vcpu, IN_WFI)) > + return; > + > + if (kvm_is_nested_s2_mmu(vcpu->kvm, vcpu->arch.hw_mmu)) > atomic_dec(&vcpu->arch.hw_mmu->refcnt); > - vcpu->arch.hw_mmu = NULL; > - } > + > + vcpu->arch.hw_mmu = NULL; > } > > /* > -- > 2.47.0.rc0.187.ge670bccf7e-goog > > > -- > Thanks, > Oliver -- Thanks, Oliver