Date: Thu, 3 Oct 2024 11:23:36 -0700
From: Sean Christopherson
To: Oliver Upton
Cc: Marc Zyngier, kvmarm@lists.linux.dev, Joey Gouly, Suzuki K Poulose, Zenghui Yu
Subject: Re: [PATCH 3/3] KVM: arm64: nv: Punt stage-2 recycling to a vCPU request
References: <86cykj75a0.wl-maz@kernel.org> <865xqa6q0a.wl-maz@kernel.org>
X-Mailing-List: kvmarm@lists.linux.dev

On Thu, Oct 03, 2024, Oliver Upton wrote:
> On Thu, Oct 03, 2024 at 09:45:40AM -0700, Sean Christopherson wrote:
>
> [...[
>
> > > > OTOH, our global TLBs don't model hardware exactly since a vCPU doing
> > > > rapid context switches trash the TLBs of *all* vCPUs in the system.
> > > > The cost of reusing an MMU is quite noticeable, since our unmap
> > > > implementation is slightly crap at the moment, the cost of which shows
> > > > up both on sides of the reclaim (victim and user).
> > >
> > > Oh, and why unmap is crap:
> >
> > Heh, isn't unmap by definition crap? If KVM needs to unmap and rebuild an S2 MMU,
> > then KVM is already in a slow, sub-optimal situation.
>
> Not really, the unmap plumbing is used for applying the intent of a
> guest TLBI too. Sub-optimal or not, it is exactly what the VM asked for,
> and it'd be in our interest to handle the unmap as expeditiously as
> possible.

Sorry, I meant "unnecessary unmap".

> > > > Still should drop the reference in most other cases, as I do *not* want
> > > > to entertain vCPUs holding a reference when they've gone out to
> > > > userspace.
> >
> > Why not? The vCPU is still running, keeping its S2 MMU resident is desirable, no?
>
> How could we possibly know what the intent of userspace is? The VMM
> could just as well throw that vCPU fd on ice for an eternity.
>
> For example, you could have a PSCI implementation that lives in
> userspace. Guest does CPU_OFF and the VMM decides to terminate the
> backing thread and keep the FD around for the next CPU_ON.

Yes, but we need to play the odds. I.e. make the common case fast/efficient.
KVM obviously needs to not fall over or crater performance in the presence of edge
cases, but IMO, disallowing a vCPU from pinning an MMU because it _might_ go offline
is the wrong tradeoff.

> Since KVM still views that fd as 'runnable', it'd sit on the reference
> that vCPU holds indefinitely. On top of that, it adds complexity to the
> implementation since we would need more refcount cleanup flows to handle
> these straggler references.

But only one flow, vCPU destruction, is mandatory. Anything beyond that is pure
optimization.

> > Essentially all I'm suggesting is that instead of having a common pool of 2*vCPUs
> > TLBs per L1 VMM, have 2 (or however many) TLBs per L1 vCPU, plus maybe N extra
> > TLBs per L1 VMM. I.e. mimic the hierarchical design of hardware caches and TLBs
> > to some extent.
>
> Making TLBs private to the L1 vCPU is almost guaranteed to be a net loss
> in performance.

I'm not saying make TLBs private, I'm saying allow each vCPU to "pin" (i.e. hold
a reference) up to N TLBs/MMUs, regardless of "where" that vCPU is in the flow of
things, versus the proposed behavior of pinning TLBs only when it's absolutely
mandatory to do so for functional correctness.

Holding a reference across preemption would be the first step towards that model.
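
As a rough illustration of the pinning model described above, here is a
standalone userspace sketch. It is not KVM code and does not reflect the actual
patch: struct s2_mmu, struct vcpu, vcpu_pin_mmu(), vcpu_destroy(), pick_victim()
and MAX_PINS_PER_VCPU are all hypothetical names, and locking is ignored
entirely. It only shows the shape of the idea: a vCPU may hold up to N
references, recycling skips anything that is pinned, and vCPU destruction is
the one mandatory place where pins get dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PINS_PER_VCPU 2	/* "N" in the text above; the value is arbitrary */

struct s2_mmu {			/* hypothetical stand-in for a shadow stage-2 MMU */
	int refcount;		/* number of pins currently held on this MMU */
	bool valid;		/* true if it holds live translations */
};

struct vcpu {			/* hypothetical stand-in, not struct kvm_vcpu */
	struct s2_mmu *pins[MAX_PINS_PER_VCPU];
};

/* Pin @mmu on behalf of @vcpu. Fails once the vCPU already holds N pins. */
static bool vcpu_pin_mmu(struct vcpu *vcpu, struct s2_mmu *mmu)
{
	size_t i;

	for (i = 0; i < MAX_PINS_PER_VCPU; i++) {
		if (vcpu->pins[i] == mmu)
			return true;	/* already pinned, no extra reference */
	}

	for (i = 0; i < MAX_PINS_PER_VCPU; i++) {
		if (!vcpu->pins[i]) {
			vcpu->pins[i] = mmu;
			mmu->refcount++;
			return true;
		}
	}

	return false;			/* at the per-vCPU limit */
}

/* The one mandatory cleanup flow: drop every pin when the vCPU goes away. */
static void vcpu_destroy(struct vcpu *vcpu)
{
	size_t i;

	for (i = 0; i < MAX_PINS_PER_VCPU; i++) {
		if (!vcpu->pins[i])
			continue;

		assert(vcpu->pins[i]->refcount > 0);
		vcpu->pins[i]->refcount--;
		vcpu->pins[i] = NULL;
	}
}

/*
 * Pick an MMU to recycle. Pinned MMUs are never candidates; an unused
 * (invalid) MMU is preferred because recycling it requires no unmap.
 * Returns NULL if every MMU in the pool is pinned.
 */
static struct s2_mmu *pick_victim(struct s2_mmu *pool, size_t nr)
{
	struct s2_mmu *victim = NULL;
	size_t i;

	for (i = 0; i < nr; i++) {
		if (pool[i].refcount)
			continue;
		if (!pool[i].valid)
			return &pool[i];
		if (!victim)
			victim = &pool[i];
	}

	return victim;
}
```

Any extra flows that drop pins earlier (e.g. on exit to userspace) would be
layered on top of this as optimizations, not as correctness requirements.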