From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 11 May 2026 17:13:15 -0700
In-Reply-To: <20260102142429.896101-9-griffoul@gmail.com>
X-Mailing-List: kvm@vger.kernel.org
Mime-Version: 1.0
References: <20260102142429.896101-1-griffoul@gmail.com> <20260102142429.896101-9-griffoul@gmail.com>
Subject: Re: [PATCH v4 08/10] KVM:
 x86: Add nested context management
From: Sean Christopherson
To: Fred Griffoul
Cc: kvm@vger.kernel.org, pbonzini@redhat.com, vkuznets@redhat.com,
	shuah@kernel.org, dwmw@amazon.co.uk, linux-kselftest@vger.kernel.org,
	linux-kernel@vger.kernel.org, Fred Griffoul
Content-Type: text/plain; charset="us-ascii"

On Fri, Jan 02, 2026, Fred Griffoul wrote:
> From: Fred Griffoul
>
> Add infrastructure to persist nested virtualization state when L2 vCPUs

Please be more transparent with what exactly is being persisted.

> are switched on an L1 vCPU or migrated between L1 vCPUs.
>
> The nested context table uses a hash table for fast lookup by nested
> control block GPA (VMPTR for VMX, VMCB for SVM) and maintains a free
> list for context management.
>
> The kvm_nested_context_load() function searches for a context indexed by
> the target GPA; if not found, it allocates a new context up to the
> configured maximum. If at capacity, it recycles the oldest context from
> the free list.
>
> The oversubscription is hardcoded to support up to 8 L2 vCPUs per L1
> vCPU.
>
> The kvm_nested_context_clear() function moves the context to the free
> list while keeping it in the hash table for potential reuse.
>
> This allows nested hypervisors to multiplex multiple L2 vCPUs on L1
> vCPUs without losing cached nested state, significantly improving
> performance for workloads with frequent L2 context switches.
>
> This patch adds the basic infrastructure. Subsequent patches will add
> the nested VMX and SVM specific support to populate and utilize the
> cached nested state.
>
> Signed-off-by: Fred Griffoul
> ---
>  arch/x86/include/asm/kvm_host.h |  31 +++++
>  arch/x86/include/uapi/asm/kvm.h |   2 +
>  arch/x86/kvm/Makefile           |   2 +-
>  arch/x86/kvm/nested.c           | 199 ++++++++++++++++++++++++++++++++
>  arch/x86/kvm/x86.c              |   5 +-
>  5 files changed, 237 insertions(+), 2 deletions(-)

Please provide concrete performance numbers.
They need to be isolated from the switch to gpcs, and need to show how much
benefit is provided by a per-VM hash table vs. (much) simpler approaches,
e.g. versus a stupid simple per-vCPU LRU cache, a la KVM's pgd caching.

There also needs to be an analysis of the downsides of the performance
gains. If I'm putting the pieces together correctly, quoting a snippet from
the cover letter, the performance benefits come from:

  The pfncache infrastructure maintains persistent mappings as long as the
  page GPA does not change, eliminating the memremap/memunmap overhead on
  every VM entry/exit cycle.

Which means that this caching effectively eliminates the security value
added by removing memory from the kernel's direct map. If, in the long
term, we're collectively moving towards guest_memfd (for setups that don't
want all of the overcommit goodness provided by mm/), then the performance
provided by this approach is directly at odds with the efforts to remove
guest_memfd memory from the direct map for added security. E.g. if the
ratio of L2:L1 contexts is pushed high enough, it would be possible to have
the majority of guest memory mapped into the host kernel.

That then raises the question of whether or not we are optimizing the right
thing. E.g. if we can somehow make map+unmap blazing fast for "all" real
world usage that matters, then maybe we don't need this type of caching.

In general, this needs a _lot_ more justification on the design decisions.
A lot, a lot, a _lot_ more. This is too much code and complexity for me to
even start reviewing without hard data.
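[Editorial note: to make the lifecycle in the quoted changelog concrete, here is a rough userspace sketch of the lookup-by-GPA hash table with a free list and recycle-oldest-at-capacity behavior. Every identifier here (`nctx_load()`, `nctx_clear()`, `struct nctx_table`, the hash function, the list layout) is a hypothetical stand-in for the patch's `kvm_nested_context_load()`/`kvm_nested_context_clear()`, not the actual KVM code, which is not shown in this message.]

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NCTX_HASH_SIZE 16
#define NCTX_MAX        8	/* hardcoded limit: 8 L2 vCPUs per L1 vCPU */

typedef uint64_t gpa_t;

struct nctx {
	gpa_t gpa;		/* VMPTR (VMX) or VMCB GPA (SVM) */
	struct nctx *hash_next;	/* hash chain; stays linked after clear() */
	struct nctx *free_next;	/* free list; linked only after clear() */
	int on_free_list;
};

struct nctx_table {
	struct nctx *hash[NCTX_HASH_SIZE];
	struct nctx *free_head;	/* newest cleared at head, oldest at tail */
	int nr;
};

static unsigned int nctx_hash(gpa_t gpa)
{
	return (gpa >> 12) % NCTX_HASH_SIZE;
}

static void nctx_hash_del(struct nctx_table *t, struct nctx *c)
{
	struct nctx **p = &t->hash[nctx_hash(c->gpa)];

	while (*p != c)
		p = &(*p)->hash_next;
	*p = c->hash_next;
}

static void nctx_free_del(struct nctx_table *t, struct nctx *c)
{
	struct nctx **p = &t->free_head;

	while (*p != c)
		p = &(*p)->free_next;
	*p = c->free_next;
	c->on_free_list = 0;
}

static struct nctx *nctx_pop_oldest(struct nctx_table *t)
{
	struct nctx **p = &t->free_head;
	struct nctx *c;

	if (!*p)
		return NULL;
	while ((*p)->free_next)		/* oldest cleared entry is the tail */
		p = &(*p)->free_next;
	c = *p;
	*p = NULL;
	c->on_free_list = 0;
	return c;
}

/* Find the context for @gpa; allocate on miss, recycle at capacity. */
struct nctx *nctx_load(struct nctx_table *t, gpa_t gpa)
{
	unsigned int h = nctx_hash(gpa);
	struct nctx *c;

	for (c = t->hash[h]; c; c = c->hash_next) {
		if (c->gpa != gpa)
			continue;
		if (c->on_free_list)	/* cleared, but state still cached */
			nctx_free_del(t, c);
		return c;
	}

	if (t->nr < NCTX_MAX) {
		c = calloc(1, sizeof(*c));
		t->nr++;
	} else {
		c = nctx_pop_oldest(t);	/* recycled context loses its state */
		if (!c)
			return NULL;	/* every context is live */
		nctx_hash_del(t, c);
	}
	c->gpa = gpa;
	c->hash_next = t->hash[h];
	t->hash[h] = c;
	return c;
}

/* Retire a context: recyclable, but kept hashed so its state can be reused. */
void nctx_clear(struct nctx_table *t, struct nctx *c)
{
	c->free_next = t->free_head;
	t->free_head = c;
	c->on_free_list = 1;
}
```

The notable property (and the source of the complexity Sean is pushing back on) is that clear() does not unhash the context, so a later load() of the same GPA gets the cached state back for free; eviction only happens when a ninth distinct GPA arrives while a cleared context exists to recycle.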
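[Editorial note: for comparison, the "stupid simple per-vCPU LRU cache, a la KVM's pgd caching" baseline Sean asks to be measured against could be as small as the following sketch. Again, all names (`l2_cache_get()`, `NCACHE`, etc.) are hypothetical and the entry size is arbitrary; this is not KVM's pgd-cache code.]

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NCACHE 4	/* arbitrary; a pgd-style cache keeps a handful of slots */

typedef uint64_t gpa_t;

struct l2_cache_entry {
	gpa_t gpa;
	uint64_t last_used;	/* bumped on every hit */
	int valid;
};

struct l2_cache {
	struct l2_cache_entry e[NCACHE];	/* one such cache per vCPU */
	uint64_t clock;
};

/* Return the slot for @gpa, evicting the least recently used slot on miss. */
static struct l2_cache_entry *l2_cache_get(struct l2_cache *c, gpa_t gpa,
					   int *hit)
{
	struct l2_cache_entry *victim = &c->e[0];
	int i;

	for (i = 0; i < NCACHE; i++) {
		if (c->e[i].valid && c->e[i].gpa == gpa) {
			c->e[i].last_used = ++c->clock;
			*hit = 1;
			return &c->e[i];
		}
		/* prefer an empty slot, else the stalest valid one */
		if (!c->e[i].valid)
			victim = &c->e[i];
		else if (victim->valid &&
			 c->e[i].last_used < victim->last_used)
			victim = &c->e[i];
	}
	victim->gpa = gpa;
	victim->valid = 1;
	victim->last_used = ++c->clock;
	*hit = 0;
	return victim;
}
```

No hash table, no free list, no per-VM locking concerns: a fixed array scanned linearly, which for single-digit L2:L1 ratios is likely competitive with a hash lookup. That is presumably the bar the per-VM table needs to clear with hard numbers.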