From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Wed, 9 Jul 2025 11:59:38 +0100
In-Reply-To: <20250709105946.4009897-1-tabba@google.com>
Mime-Version: 1.0
References: <20250709105946.4009897-1-tabba@google.com>
X-Mailer: git-send-email 2.50.0.727.gbf7dc18ff4-goog
Message-ID: <20250709105946.4009897-13-tabba@google.com>
Subject: [PATCH v13 12/20] KVM: x86/mmu: Consult guest_memfd when computing max_mapping_level
From: Fuad Tabba <tabba@google.com>
To: kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org, linux-mm@kvack.org,
 kvmarm@lists.linux.dev
Cc: pbonzini@redhat.com, chenhuacai@kernel.org, mpe@ellerman.id.au,
 anup@brainfault.org, paul.walmsley@sifive.com, palmer@dabbelt.com,
 aou@eecs.berkeley.edu, seanjc@google.com, viro@zeniv.linux.org.uk,
 brauner@kernel.org, willy@infradead.org, akpm@linux-foundation.org,
 xiaoyao.li@intel.com, yilun.xu@intel.com, chao.p.peng@linux.intel.com,
 jarkko@kernel.org, amoorthy@google.com, dmatlack@google.com,
 isaku.yamahata@intel.com, mic@digikod.net, vbabka@suse.cz,
 vannapurve@google.com, ackerleytng@google.com, mail@maciej.szmigiero.name,
 david@redhat.com, michael.roth@amd.com, wei.w.wang@intel.com,
 liam.merwick@oracle.com, isaku.yamahata@gmail.com,
 kirill.shutemov@linux.intel.com, suzuki.poulose@arm.com,
 steven.price@arm.com, quic_eberman@quicinc.com, quic_mnalajal@quicinc.com,
 quic_tsoni@quicinc.com, quic_svaddagi@quicinc.com, quic_cvanscha@quicinc.com,
 quic_pderrin@quicinc.com, quic_pheragu@quicinc.com, catalin.marinas@arm.com,
 james.morse@arm.com, yuzenghui@huawei.com, oliver.upton@linux.dev,
 maz@kernel.org, will@kernel.org, qperret@google.com, keirf@google.com,
 roypat@amazon.co.uk, shuah@kernel.org, hch@infradead.org, jgg@nvidia.com,
 rientjes@google.com, jhubbard@nvidia.com, fvdl@google.com, hughd@google.com,
 jthoughton@google.com, peterx@redhat.com, pankaj.gupta@amd.com,
 ira.weiny@intel.com, tabba@google.com
Content-Type: text/plain; charset="UTF-8"

From: Ackerley Tng <ackerleytng@google.com>

Modify kvm_mmu_max_mapping_level() to consult guest_memfd when computing
the maximum mapping level for memory regions backed by it, in particular
during huge page recovery.

Previously, kvm_mmu_max_mapping_level() was designed primarily for
host-backed memory and private pages. Now that guest_memfd can also back
non-private memory, guest_memfd's influence on the mapping level must be
factored in for such memory as well.

To that end, make kvm_mmu_max_mapping_level() take input from guest_memfd
when recovering huge pages. Input is taken from guest_memfd whenever a
fault to that slot and gfn would have been served from guest_memfd. For
now, take a shortcut and return early if the slot and gfn point to memory
that is private, since huge page recovery isn't supported for private
memory yet.

Since guest_memfd memory can also be faulted into host page tables,
__kvm_mmu_max_mapping_level() still applies: lpage_info and the host page
tables still need to be consulted.

Move kvm_max_level_for_order() and kvm_gmem_max_mapping_level() earlier
in the file so that kvm_mmu_max_mapping_level() can use them.
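Condensed into one place for readability, the resulting flow of
kvm_mmu_max_mapping_level() looks roughly like this (a sketch assembled
from the hunks below; the comments are paraphrased summaries, not the
literal in-tree comments):

    int kvm_mmu_max_mapping_level(struct kvm *kvm,
                                  const struct kvm_memory_slot *slot, gfn_t gfn)
    {
            bool is_private = kvm_slot_has_gmem(slot) &&
                              kvm_mem_is_private(kvm, gfn);
            int max_level = PG_LEVEL_NUM;

            /* Huge page recovery isn't supported for private memory yet. */
            if (is_private)
                    return PG_LEVEL_4K;

            /* Let guest_memfd cap the level for memory it would have served. */
            if (kvm_memslot_is_gmem_only(slot)) {
                    int order = kvm_gmem_mapping_order(slot, gfn);

                    max_level = min(max_level,
                                    kvm_gmem_max_mapping_level(kvm, order, NULL));
            }

            /* lpage_info and host page tables still constrain the level. */
            return __kvm_mmu_max_mapping_level(kvm, slot, gfn, max_level,
                                               is_private);
    }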
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
Co-developed-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Fuad Tabba <tabba@google.com>
---
 arch/x86/kvm/mmu/mmu.c   | 90 ++++++++++++++++++++++++----------------
 include/linux/kvm_host.h |  7 ++++
 virt/kvm/guest_memfd.c   | 17 ++++++++
 3 files changed, 79 insertions(+), 35 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 495dcedaeafa..6d997063f76f 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3282,13 +3282,67 @@ static int __kvm_mmu_max_mapping_level(struct kvm *kvm,
 	return min(host_level, max_level);
 }
 
+static u8 kvm_max_level_for_order(int order)
+{
+	BUILD_BUG_ON(KVM_MAX_HUGEPAGE_LEVEL > PG_LEVEL_1G);
+
+	KVM_MMU_WARN_ON(order != KVM_HPAGE_GFN_SHIFT(PG_LEVEL_1G) &&
+			order != KVM_HPAGE_GFN_SHIFT(PG_LEVEL_2M) &&
+			order != KVM_HPAGE_GFN_SHIFT(PG_LEVEL_4K));
+
+	if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_1G))
+		return PG_LEVEL_1G;
+
+	if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_2M))
+		return PG_LEVEL_2M;
+
+	return PG_LEVEL_4K;
+}
+
+static u8 kvm_gmem_max_mapping_level(struct kvm *kvm, int order,
+				     struct kvm_page_fault *fault)
+{
+	u8 req_max_level;
+	u8 max_level;
+
+	max_level = kvm_max_level_for_order(order);
+	if (max_level == PG_LEVEL_4K)
+		return PG_LEVEL_4K;
+
+	req_max_level = kvm_x86_call(max_mapping_level)(kvm, fault);
+	if (req_max_level)
+		max_level = min(max_level, req_max_level);
+
+	return max_level;
+}
+
 int kvm_mmu_max_mapping_level(struct kvm *kvm,
 			      const struct kvm_memory_slot *slot, gfn_t gfn)
 {
 	bool is_private = kvm_slot_has_gmem(slot) &&
 			  kvm_mem_is_private(kvm, gfn);
+	int max_level = PG_LEVEL_NUM;
+
+	/*
+	 * For now, kvm_mmu_max_mapping_level() is only called from
+	 * kvm_mmu_recover_huge_pages(), and that's not yet supported for
+	 * private memory, hence we can take a shortcut and return early.
+	 */
+	if (is_private)
+		return PG_LEVEL_4K;
 
-	return __kvm_mmu_max_mapping_level(kvm, slot, gfn, PG_LEVEL_NUM, is_private);
+	/*
+	 * For non-private pages that would have been faulted from guest_memfd,
+	 * let guest_memfd influence max_mapping_level.
+	 */
+	if (kvm_memslot_is_gmem_only(slot)) {
+		int order = kvm_gmem_mapping_order(slot, gfn);
+
+		max_level = min(max_level,
+				kvm_gmem_max_mapping_level(kvm, order, NULL));
+	}
+
+	return __kvm_mmu_max_mapping_level(kvm, slot, gfn, max_level, is_private);
 }
 
 void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
@@ -4450,40 +4504,6 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work)
 	vcpu->stat.pf_fixed++;
 }
 
-static inline u8 kvm_max_level_for_order(int order)
-{
-	BUILD_BUG_ON(KVM_MAX_HUGEPAGE_LEVEL > PG_LEVEL_1G);
-
-	KVM_MMU_WARN_ON(order != KVM_HPAGE_GFN_SHIFT(PG_LEVEL_1G) &&
-			order != KVM_HPAGE_GFN_SHIFT(PG_LEVEL_2M) &&
-			order != KVM_HPAGE_GFN_SHIFT(PG_LEVEL_4K));
-
-	if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_1G))
-		return PG_LEVEL_1G;
-
-	if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_2M))
-		return PG_LEVEL_2M;
-
-	return PG_LEVEL_4K;
-}
-
-static u8 kvm_gmem_max_mapping_level(struct kvm *kvm, int order,
-				     struct kvm_page_fault *fault)
-{
-	u8 req_max_level;
-	u8 max_level;
-
-	max_level = kvm_max_level_for_order(order);
-	if (max_level == PG_LEVEL_4K)
-		return PG_LEVEL_4K;
-
-	req_max_level = kvm_x86_call(max_mapping_level)(kvm, fault);
-	if (req_max_level)
-		max_level = min(max_level, req_max_level);
-
-	return max_level;
-}
-
 static void kvm_mmu_finish_page_fault(struct kvm_vcpu *vcpu,
 				      struct kvm_page_fault *fault, int r)
 {
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index d2218ec57ceb..662271314778 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2574,6 +2574,7 @@ static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
 int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
 		     gfn_t gfn, kvm_pfn_t *pfn, struct page **page,
 		     int *max_order);
+int kvm_gmem_mapping_order(const struct kvm_memory_slot *slot, gfn_t gfn);
 #else
 static inline int kvm_gmem_get_pfn(struct kvm *kvm,
 				   struct kvm_memory_slot *slot, gfn_t gfn,
@@ -2583,6 +2584,12 @@ static inline int kvm_gmem_get_pfn(struct kvm *kvm,
 	KVM_BUG_ON(1, kvm);
 	return -EIO;
 }
+static inline int kvm_gmem_mapping_order(const struct kvm_memory_slot *slot,
+					 gfn_t gfn)
+{
+	WARN_ONCE(1, "Unexpected call since gmem is disabled.");
+	return 0;
+}
 #endif /* CONFIG_KVM_GMEM */
 
 #ifdef CONFIG_HAVE_KVM_ARCH_GMEM_PREPARE
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 2b00f8796a15..d01bd7a2c2bd 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -713,6 +713,23 @@ int kvm_gmem_get_pfn(struct kvm *kvm, struct kvm_memory_slot *slot,
 }
 EXPORT_SYMBOL_GPL(kvm_gmem_get_pfn);
 
+/**
+ * kvm_gmem_mapping_order() - Get the mapping order for this @gfn in @slot.
+ *
+ * @slot: the memslot that gfn belongs to.
+ * @gfn: the gfn to look up mapping order for.
+ *
+ * This is equal to max_order that would be returned if kvm_gmem_get_pfn() were
+ * called now.
+ *
+ * Return: the mapping order for this @gfn in @slot.
+ */
+int kvm_gmem_mapping_order(const struct kvm_memory_slot *slot, gfn_t gfn)
+{
+	return 0;
+}
+EXPORT_SYMBOL_GPL(kvm_gmem_mapping_order);
+
 #ifdef CONFIG_KVM_GENERIC_GMEM_POPULATE
 long kvm_gmem_populate(struct kvm *kvm, gfn_t start_gfn, void __user *src, long npages,
 		       kvm_gmem_populate_cb post_populate, void *opaque)
-- 
2.50.0.727.gbf7dc18ff4-goog