From: Fred Griffoul
To: kvm@vger.kernel.org
Cc: seanjc@google.com, pbonzini@redhat.com, vkuznets@redhat.com, shuah@kernel.org, dwmw@amazon.co.uk, linux-kselftest@vger.kernel.org, linux-kernel@vger.kernel.org, Fred Griffoul
Subject: [PATCH v3 00/10] KVM: nVMX: Improve performance for unmanaged guest memory
Date: Fri, 21 Nov 2025 11:11:03 +0000
Message-ID: <20251121111113.456628-1-griffoul@gmail.com>

This patch series addresses both performance and correctness issues in
nested VMX when handling guest memory.

During nested VMX operations, L0 (KVM) accesses specific L1 guest pages
to manage L2 execution. These pages fall into two categories: pages
accessed only by L0 (such as the L1 MSR bitmap page or the eVMCS page),
and pages passed to the L2 guest via vmcs02 (such as the APIC access,
virtual APIC, and posted interrupt descriptor pages).

The current implementation uses kvm_vcpu_map/unmap, which causes two
issues.

First, the current approach lacks proper invalidation handling in
critical scenarios. Enlightened VMCS (eVMCS) pages can become stale when
memslots are modified, as there is no mechanism to invalidate the cached
mappings.
Similarly, APIC access and virtual APIC pages can be migrated by the
host, but since these mappings are not tracked through mmu_notifier
callbacks, they become stale and can lead to incorrect behavior.

Second, for unmanaged guest memory (memory not directly mapped by the
kernel, such as memory passed with the mem= parameter or guest_memfd for
non-CoCo VMs), this workflow invokes expensive memremap/memunmap
operations on every L2 VM entry/exit cycle. This creates significant
overhead that impacts nested virtualization performance.

This series replaces kvm_host_map with gfn_to_pfn_cache in nested VMX.
The pfncache infrastructure maintains persistent mappings as long as the
page GPA does not change, eliminating the memremap/memunmap overhead on
every VM entry/exit cycle. Additionally, pfncache provides proper
invalidation handling via mmu_notifier callbacks and memslot generation
checks, ensuring that mappings are correctly updated during both memslot
updates and page migration events.

As an example, a microbenchmark using memslot_perf_test with 8192
memslots shows large improvements in nested VMX operations with
unmanaged guest memory. This is a synthetic benchmark run on AWS EC2
Nitro instances, and the results are not representative of typical
nested virtualization workloads:

                    Before     After      Improvement
  map:              26.12s     1.54s      ~17x faster
  unmap:            40.00s     0.017s     ~2353x faster
  unmap chunked:    10.07s     0.005s     ~2014x faster

The series is organized as follows:

Patches 1-5 handle the L1 MSR bitmap page and system pages (APIC access,
virtual APIC, and posted interrupt descriptor).
  - Patch 1 converts the MSR bitmap to use gfn_to_pfn_cache.
  - Patches 2-3 restore and complete "guest-uses-pfn" support in
    pfncache.
  - Patch 4 converts the system pages to use gfn_to_pfn_cache.
  - Patch 5 adds a selftest for cache invalidation and memslot updates.

Patches 6-7 extend the conversion to the enlightened VMCS (eVMCS).
  - Patch 6 avoids accessing eVMCS fields after they are copied into the
    cached vmcs12 structure.
  - Patch 7 converts eVMCS page mapping to use gfn_to_pfn_cache.

Patches 8-10 implement a persistent nested context to handle L2 vCPU
multiplexing and migration between L1 vCPUs.
  - Patch 8 introduces the nested context management infrastructure.
  - Patch 9 integrates pfncache with the persistent nested context.
  - Patch 10 adds a selftest for L2 vCPU context switching.

v3:
  - Fixed warnings reported by the kernel test robot in patches 7 and 8.

v2:
  - Extended the series to support enlightened VMCS (eVMCS).
  - Added a persistent nested context for improved L2 vCPU handling.
  - Added additional selftests.

Suggested-by: dwmw@amazon.co.uk

Fred Griffoul (10):
  KVM: nVMX: Implement cache for L1 MSR bitmap
  KVM: pfncache: Restore guest-uses-pfn support
  KVM: x86: Add nested state validation for pfncache support
  KVM: nVMX: Implement cache for L1 APIC pages
  KVM: selftests: Add nested VMX APIC cache invalidation test
  KVM: nVMX: Cache evmcs fields to ensure consistency during VM-entry
  KVM: nVMX: Replace evmcs kvm_host_map with pfncache
  KVM: x86: Add nested context management
  KVM: nVMX: Use nested context for pfncache persistence
  KVM: selftests: Add L2 vcpu context switch test

 arch/x86/include/asm/kvm_host.h              |  32 ++
 arch/x86/include/uapi/asm/kvm.h              |   2 +
 arch/x86/kvm/Makefile                        |   2 +-
 arch/x86/kvm/nested.c                        | 199 ++++++++
 arch/x86/kvm/vmx/hyperv.c                    |   5 +-
 arch/x86/kvm/vmx/hyperv.h                    |  33 +-
 arch/x86/kvm/vmx/nested.c                    | 469 ++++++++++++++----
 arch/x86/kvm/vmx/vmx.c                       |   8 +
 arch/x86/kvm/vmx/vmx.h                       |  16 +-
 arch/x86/kvm/x86.c                           |  19 +-
 include/linux/kvm_host.h                     |  34 +-
 include/linux/kvm_types.h                    |   1 +
 tools/testing/selftests/kvm/Makefile.kvm     |   2 +
 .../selftests/kvm/x86/vmx_apic_update_test.c | 302 +++++++++++
 .../selftests/kvm/x86/vmx_l2_switch_test.c   | 416 ++++++++++++++++
 virt/kvm/kvm_main.c                          |   3 +-
 virt/kvm/kvm_mm.h                            |   6 +-
 virt/kvm/pfncache.c                          |  43 +-
 18 files changed, 1469 insertions(+), 123 deletions(-)
 create mode 100644 arch/x86/kvm/nested.c
 create mode 100644 tools/testing/selftests/kvm/x86/vmx_apic_update_test.c
 create mode 100644 tools/testing/selftests/kvm/x86/vmx_l2_switch_test.c

base-commit: 6b36119b94d0b2bb8cea9d512017efafd461d6ac
prerequisite-patch-id: afd3db49735b65c8a642de8dab7d0160d5da4b67
--
2.43.0