From: Oliver Upton <oliver.upton@linux.dev>
To: Sean Christopherson <seanjc@google.com>
Cc: Marc Zyngier <maz@kernel.org>,
	kvmarm@lists.linux.dev, Joey Gouly <joey.gouly@arm.com>,
	Suzuki K Poulose <suzuki.poulose@arm.com>,
	Zenghui Yu <yuzenghui@huawei.com>
Subject: Re: [PATCH 3/3] KVM: arm64: nv: Punt stage-2 recycling to a vCPU request
Date: Thu, 3 Oct 2024 15:03:04 -0700
Message-ID: <Zv8UmMfXOqT2A9_A@linux.dev>
In-Reply-To: <Zv7hKD_6Pvhg4ULY@google.com>

On Thu, Oct 03, 2024 at 11:23:36AM -0700, Sean Christopherson wrote:
> > > Why not?  The vCPU is still running, keeping its S2 MMU resident is desirable, no?
> > 
> > How could we possibly know what the intent of userspace is? The VMM
> > could just as well throw that vCPU fd on ice for an eternity.
> > 
> > For example, you could have a PSCI implementation that lives in
> > userspace. Guest does CPU_OFF and the VMM decides to terminate the
> > backing thread and keep the FD around for the next CPU_ON.
> 
> Yes, but we need to play the odds.

I agree that we can make an educated guess about the state of a vCPU
when it remains in kernel, but anything outside of that is guesswork.

> I.e. make the common case fast/efficient.
> KVM obviously needs to not fall over or crater performance in the presence of edge
> cases, but IMO, disallowing a vCPU from pinning a vCPU because it _might_ go
> offline is the wrong tradeoff.

But in the event of a 'rare' offline event, the vCPU takes an MMU slot
out of circulation forever, or at least until the VM decides to online
it again. That feels off to me.

So like I mentioned earlier, the common case is that the L1 is running a
VM where all of the vCPUs are sharing the same stage-2 MMU context.

In this case, it is highly likely that the L2 VM's nested MMU keeps an
elevated refcount, as at least one of the vCPUs remains in the KVM_RUN
loop.

In addition to that, we likely have quite a few free slots as we
overprovision the nested MMUs to make sure the worst case remains
functional. The only practical situation in which we would see thrashing
of the nested stage-2 MMUs is if the L1 were running more than 2*NR_VCPUS
VMs, which is already a 2x overcommit of the L1.
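
To make the slot arithmetic above concrete, here is a minimal sketch of a
refcounted pool with 2*NR_VCPUS slots. It is an illustration only, not the
KVM code: struct s2_slot, slot_get(), slot_put() and the NR_VCPUS value are
all hypothetical names, and the context tag must be nonzero because 0 marks
a never-used slot.

```c
#include <stddef.h>

#define NR_VCPUS 4                 /* hypothetical vCPU count of the L1 */
#define NR_SLOTS (2 * NR_VCPUS)    /* pool is overprovisioned 2x */

/* Hypothetical stand-in for a nested stage-2 MMU; not the kernel's struct. */
struct s2_slot {
	unsigned long ctx;   /* guest stage-2 context tag, 0 means never used */
	int refcnt;          /* number of vCPUs currently using the slot */
};

static struct s2_slot pool[NR_SLOTS];

/*
 * Take a reference on the slot for @ctx (must be nonzero). On a miss,
 * claim a slot nobody holds, or return NULL if every slot is pinned.
 */
static struct s2_slot *slot_get(unsigned long ctx)
{
	struct s2_slot *victim = NULL;
	size_t i;

	for (i = 0; i < NR_SLOTS; i++) {
		if (pool[i].ctx == ctx) {
			pool[i].refcnt++;       /* hit: context still resident */
			return &pool[i];
		}
		if (pool[i].refcnt != 0)
			continue;               /* pinned, cannot be recycled */
		/* Prefer a never-used slot over evicting an idle context. */
		if (!victim || (victim->ctx != 0 && pool[i].ctx == 0))
			victim = &pool[i];
	}
	if (!victim)
		return NULL;
	victim->ctx = ctx;                      /* recycle the slot */
	victim->refcnt = 1;
	return victim;
}

/* Drop a reference; the context stays resident until it is recycled. */
static void slot_put(struct s2_slot *s)
{
	s->refcnt--;
}
```

With NR_VCPUS vCPUs each holding at most one reference, at least NR_VCPUS
slots in this sketch are always unpinned, so slot_get() cannot fail and an
idle context is only evicted once the never-used slots run out.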

> > Since KVM still views that fd as 'runnable', it'd sit on the reference
> > that vCPU holds indefinitely. On top of that, it adds complexity to the
> > implementation since we would need more refcount cleanup flows to handle
> > these straggler references.
> 
> But only one flow, vCPU destruction, is mandatory.  Anything beyond that is pure
> optimization.

vcpu_load() / vcpu_put() is the mandatory flow in this design. We reload
the vCPU to handle nested transitions (i.e. L1<->L2), and we need to attach
an MMU that matches the new context.
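
As a rough sketch of that flow, assuming a reference is taken at load and
dropped at put: the types and the sketch_*() helpers below are hypothetical
and only model the reference lifetime, not the actual KVM implementation.

```c
#include <stddef.h>

/* Hypothetical stand-ins; none of these are the kernel's types. */
struct s2_mmu {
	int refcnt;
};

struct vcpu {
	struct s2_mmu *canonical;   /* stage-2 used while running L1 */
	struct s2_mmu *nested;      /* stage-2 used while running L2 */
	int in_l2;                  /* which context the vCPU is in */
	struct s2_mmu *hw_mmu;      /* MMU attached while loaded, else NULL */
};

/* Attach the MMU that matches the current context and pin it. */
static void sketch_vcpu_load(struct vcpu *v)
{
	v->hw_mmu = v->in_l2 ? v->nested : v->canonical;
	v->hw_mmu->refcnt++;
}

/* Detach the MMU and drop the reference taken at load. */
static void sketch_vcpu_put(struct vcpu *v)
{
	v->hw_mmu->refcnt--;
	v->hw_mmu = NULL;
}

/* Nested transition (L1<->L2): put, flip the context, then load. */
static void sketch_nested_transition(struct vcpu *v, int to_l2)
{
	sketch_vcpu_put(v);
	v->in_l2 = to_l2;
	sketch_vcpu_load(v);
}
```

In this model the only time a vCPU holds a reference is between load and
put, which is why no additional cleanup flow is needed for stragglers.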

> > > Essentially all I'm suggesting is that instead of having a common pool of 2*vCPUs
> > > TLBs per L1 VMM, have 2 (or however many) TLBs per L1 vCPU, plus maybe N extra
> > > TLBs per L1 VMM.  I.e. mimic the hierarchical design of hardware caches and TLBs
> > > to some extent.
> > 
> > Making TLBs private to the L1 vCPU is almost guaranteed to be a net loss
> > in performance.
> 
> I'm not saying make TLBs private, I'm saying allow each vCPU to "pin" (i.e. hold
> a reference) up to N TLBs/MMUs, regardless of "where" that vCPU is in the flow
> of things.  Versus the proposed behavior of pinning TLBs only when it's absolutely
> mandatory to do so for functional correctness.

Ah, got it. It is an interesting idea. If we want to explore any
meaningful value of N then we're gonna need to fix the allocation scheme.
We'd probably also need that mechanism to be more tightly integrated into
TLBIs to potentially drop references when a scope has been invalidated.

> Holding a reference across preemption would be the first step towards that model.

I'm OK with doing this for non-WFI preemption, which has the favorable
property of avoiding lock serialization at vcpu_load() in most cases
too.
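
A minimal sketch of what that could look like, purely as an illustration:
the put_reason enum, the struct layouts and the sketch_*() helpers are
hypothetical, and the "slow path" comment stands in for wherever the real
code would need to take a lock.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel's types. */
struct s2_mmu {
	int refcnt;
};

struct vcpu {
	struct s2_mmu *mmu;     /* reference currently held, or NULL */
};

enum put_reason {
	PUT_PREEMPTED,          /* scheduled out while still runnable */
	PUT_WFI,                /* blocking in WFI */
	PUT_USERSPACE,          /* returning to the VMM */
};

/* Keep the reference only when the vCPU was merely preempted. */
static void sketch_put(struct vcpu *v, enum put_reason why)
{
	if (why == PUT_PREEMPTED)
		return;
	v->mmu->refcnt--;
	v->mmu = NULL;
}

/* Returns true when the held reference was reused without the slow path. */
static bool sketch_load(struct vcpu *v, struct s2_mmu *wanted)
{
	if (v->mmu == wanted)
		return true;            /* fast path: reference still held */
	if (v->mmu)
		v->mmu->refcnt--;       /* context changed while scheduled out */
	/* Slow path: the real code would serialize on a lock here. */
	wanted->refcnt++;
	v->mmu = wanted;
	return false;
}
```

A preempted vCPU that comes back to the same context takes the fast path,
while a vCPU that blocked in WFI or exited to userspace gives its slot back.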

-- 
Thanks,
Oliver
