Date: Sun, 15 May 2022 12:42:52 +0100
Message-ID: <87r14v58eb.wl-maz@kernel.org>
From: Marc Zyngier <maz@kernel.org>
To: David Matlack <dmatlack@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>, Huacai Chen <chenhuacai@kernel.org>,
	Aleksandar Markovic <aleksandar.qemu.devel@gmail.com>,
	Anup Patel <anup@brainfault.org>,
	Paul Walmsley <paul.walmsley@sifive.com>,
	Palmer Dabbelt <palmer@dabbelt.com>, Albert Ou <aou@eecs.berkeley.edu>,
	Sean Christopherson <seanjc@google.com>, Andrew Jones <drjones@redhat.com>,
	Ben Gardon <bgardon@google.com>, Peter Xu <peterx@redhat.com>,
	maciej.szmigiero@oracle.com,
	"moderated list:KERNEL VIRTUAL MACHINE FOR ARM64 (KVM/arm64)" <kvmarm@lists.cs.columbia.edu>,
	"open list:KERNEL VIRTUAL MACHINE FOR MIPS (KVM/mips)" <linux-mips@vger.kernel.org>,
	kvm@vger.kernel.org,
	"open list:KERNEL VIRTUAL MACHINE FOR RISC-V (KVM/riscv)" <kvm-riscv@lists.infradead.org>,
	Peter Feiner <pfeiner@google.com>, Lai Jiangshan <jiangshanlai@gmail.com>
Subject: Re: [PATCH v5 20/21] KVM: Allow for different capacities in kvm_mmu_memory_cache structs
In-Reply-To: <20220513202819.829591-21-dmatlack@google.com>
References: <20220513202819.829591-1-dmatlack@google.com>
	<20220513202819.829591-21-dmatlack@google.com>
On Fri, 13 May 2022 21:28:18 +0100,
David Matlack <dmatlack@google.com> wrote:
> 
> Allow the capacity of the kvm_mmu_memory_cache struct to be chosen at
> declaration time rather than being fixed for all declarations. This will
> be used in a follow-up commit to declare a cache in x86 with a capacity
> of 512+ objects without having to increase the capacity of all caches in
> KVM.
> 
> This change requires that each cache now specify its capacity at runtime,
> since the cache struct itself no longer has a fixed capacity known at
> compile time. To protect against someone accidentally defining a
> kvm_mmu_memory_cache struct directly (without setting its capacity), this
> commit includes a WARN_ON() in kvm_mmu_topup_memory_cache().
> 
> In order to support different capacities, this commit changes the
> objects pointer array to be dynamically allocated the first time the
> cache is topped-up.
> 
> An alternative would be to lay out the objects array after the
> kvm_mmu_memory_cache struct, which can be done at compile time. But that
> change, unfortunately, adds some grottiness to arm64 and riscv, which
> use a function-local (i.e. stack-allocated) kvm_mmu_memory_cache
> struct. Since C does not allow anonymous structs in functions, the new
> wrapper struct that contains kvm_mmu_memory_cache and the objects
> pointer array must be named, which means dealing with an outer and
> inner struct. The outer struct can't be dropped since then there would
> be no guarantee the kvm_mmu_memory_cache struct and objects array would
> be laid out consecutively on the stack.

You may want to drop this paragraph. Someone interested in the history
can find it on the list.

> 
> No functional change intended.
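
For the record, the rejected alternative described above would have
looked something like the sketch below (hypothetical names, not from
the patch; it assumes kvm_mmu_memory_cache would end in a flexible
"void *objects[]" array and leans on the GNU C extension that lets
such a struct be embedded in another):

	/* Hypothetical wrapper providing compile-time storage for the cache. */
	struct kvm_mmu_memory_cache_with_storage {
		struct kvm_mmu_memory_cache cache;
		void *storage[KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE];
	};

	static int stage2_example(struct kvm *kvm)
	{
		/*
		 * The named outer struct guarantees that "storage" directly
		 * follows "cache" on the stack, so cache.objects[] resolves
		 * into it...
		 */
		struct kvm_mmu_memory_cache_with_storage wrapper = {
			.cache.gfp_zero = __GFP_ZERO,
		};

		/* ...but every helper now has to be handed &wrapper.cache. */
		return kvm_mmu_topup_memory_cache(&wrapper.cache, 1);
	}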
> 
> Signed-off-by: David Matlack <dmatlack@google.com>
> ---
>  arch/arm64/kvm/arm.c      |  1 +
>  arch/arm64/kvm/mmu.c      |  5 ++++-
>  arch/mips/kvm/mips.c      |  2 ++
>  arch/riscv/kvm/mmu.c      |  8 ++++----
>  arch/riscv/kvm/vcpu.c     |  1 +
>  arch/x86/kvm/mmu/mmu.c    |  9 +++++++++
>  include/linux/kvm_types.h |  9 +++++++--
>  virt/kvm/kvm_main.c       | 20 ++++++++++++++++++--
>  8 files changed, 46 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 7fceb855fa71..aa1e0c1659d4 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -320,6 +320,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
>  	vcpu->arch.target = -1;
>  	bitmap_zero(vcpu->arch.features, KVM_VCPU_MAX_FEATURES);
>  
> +	vcpu->arch.mmu_page_cache.capacity = KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE;
>  	vcpu->arch.mmu_page_cache.gfp_zero = __GFP_ZERO;
>  
>  	/* Set up the timer */
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 53ae2c0640bc..2f2ef6b60ff4 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -764,7 +764,10 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
>  {
>  	phys_addr_t addr;
>  	int ret = 0;
> -	struct kvm_mmu_memory_cache cache = { 0, __GFP_ZERO, NULL, };
> +	struct kvm_mmu_memory_cache cache = {
> +		.capacity = KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE,
> +		.gfp_zero = __GFP_ZERO,
> +	};
>  	struct kvm_pgtable *pgt = kvm->arch.mmu.pgt;
>  	enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_DEVICE |
>  				     KVM_PGTABLE_PROT_R |
> diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
> index a25e0b73ee70..45c7179144dc 100644
> --- a/arch/mips/kvm/mips.c
> +++ b/arch/mips/kvm/mips.c
> @@ -387,6 +387,8 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
>  	if (err)
>  		goto out_free_gebase;
>  
> +	vcpu->arch.mmu_page_cache.capacity = KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE;
> +
>  	return 0;
>  
>  out_free_gebase:
> diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
> index f80a34fbf102..8c2338ecc246 100644
> --- a/arch/riscv/kvm/mmu.c
> +++ b/arch/riscv/kvm/mmu.c
> @@ -347,10 +347,10 @@ static int stage2_ioremap(struct kvm *kvm, gpa_t gpa, phys_addr_t hpa,
>  	int ret = 0;
>  	unsigned long pfn;
>  	phys_addr_t addr, end;
> -	struct kvm_mmu_memory_cache pcache;
> -
> -	memset(&pcache, 0, sizeof(pcache));
> -	pcache.gfp_zero = __GFP_ZERO;
> +	struct kvm_mmu_memory_cache pcache = {
> +		.capacity = KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE,
> +		.gfp_zero = __GFP_ZERO,
> +	};
>  
>  	end = (gpa + size + PAGE_SIZE - 1) & PAGE_MASK;
>  	pfn = __phys_to_pfn(hpa);
> diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c
> index 6785aef4cbd4..bbcb9d4a04fb 100644
> --- a/arch/riscv/kvm/vcpu.c
> +++ b/arch/riscv/kvm/vcpu.c
> @@ -94,6 +94,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
>  
>  	/* Mark this VCPU never ran */
>  	vcpu->arch.ran_atleast_once = false;
> +	vcpu->arch.mmu_page_cache.capacity = KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE;
>  	vcpu->arch.mmu_page_cache.gfp_zero = __GFP_ZERO;
>  
>  	/* Setup ISA features available to VCPU */
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 4b40fa2e27eb..dad7e19ef8ed 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -5803,12 +5803,21 @@ int kvm_mmu_create(struct kvm_vcpu *vcpu)
>  {
>  	int ret;
>  
> +	vcpu->arch.mmu_pte_list_desc_cache.capacity =
> +		KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE;
>  	vcpu->arch.mmu_pte_list_desc_cache.kmem_cache = pte_list_desc_cache;
>  	vcpu->arch.mmu_pte_list_desc_cache.gfp_zero = __GFP_ZERO;
>  
> +	vcpu->arch.mmu_page_header_cache.capacity =
> +		KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE;
>  	vcpu->arch.mmu_page_header_cache.kmem_cache = mmu_page_header_cache;
>  	vcpu->arch.mmu_page_header_cache.gfp_zero = __GFP_ZERO;
>  
> +	vcpu->arch.mmu_shadowed_info_cache.capacity =
> +		KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE;
> +
> +	vcpu->arch.mmu_shadow_page_cache.capacity =
> +		KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE;
>  	vcpu->arch.mmu_shadow_page_cache.gfp_zero = __GFP_ZERO;
>  
>  	vcpu->arch.mmu = &vcpu->arch.root_mmu;
> diff --git a/include/linux/kvm_types.h b/include/linux/kvm_types.h
> index ac1ebb37a0ff..549103a4f7bc 100644
> --- a/include/linux/kvm_types.h
> +++ b/include/linux/kvm_types.h
> @@ -83,14 +83,19 @@ struct gfn_to_pfn_cache {
>   * MMU flows is problematic, as is triggering reclaim, I/O, etc... while
>   * holding MMU locks. Note, these caches act more like prefetch buffers than
>   * classical caches, i.e. objects are not returned to the cache on being freed.
> + *
> + * The storage for the cache object pointers is allocated dynamically when the
> + * cache is topped-up. The capacity field defines the number of object pointers
> + * the cache can hold.
> + */
>  struct kvm_mmu_memory_cache {
>  	int nobjs;
> +	int capacity;
>  	gfp_t gfp_zero;
>  	struct kmem_cache *kmem_cache;
> -	void *objects[KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE];
> +	void **objects;
>  };
> -#endif
> +#endif /* KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE */

One thing that is missing here (and was already missing) is to make it
plain that kvm_mmu_memory_cache can only be used in contexts where
there are no concurrent accesses to the cache.

> 
>  #define HALT_POLL_HIST_COUNT 32
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index e089db822c12..264e4107e06f 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -371,12 +371,23 @@ static inline void *mmu_memory_cache_alloc_obj(struct kvm_mmu_memory_cache *mc,
>  
>  int kvm_mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int min)
>  {
> +	gfp_t gfp = GFP_KERNEL_ACCOUNT;
>  	void *obj;
>  
>  	if (mc->nobjs >= min)
>  		return 0;
> -	while (mc->nobjs < ARRAY_SIZE(mc->objects)) {
> -		obj = mmu_memory_cache_alloc_obj(mc, GFP_KERNEL_ACCOUNT);
> +
> +	if (WARN_ON(mc->capacity == 0))
> +		return -EINVAL;
> +
> +	if (!mc->objects) {
> +		mc->objects = kvmalloc_array(sizeof(void *), mc->capacity, gfp);
> +		if (!mc->objects)
> +			return -ENOMEM;
> +	}
> +
> +	while (mc->nobjs < mc->capacity) {
> +		obj = mmu_memory_cache_alloc_obj(mc, gfp);
>  		if (!obj)
>  			return mc->nobjs >= min ? 0 : -ENOMEM;
>  		mc->objects[mc->nobjs++] = obj;
> @@ -397,6 +408,11 @@ void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc)
>  		else
>  			free_page((unsigned long)mc->objects[--mc->nobjs]);
>  	}
> +
> +	kvfree(mc->objects);
> +
> +	/* Note, must set to NULL to avoid use-after-free in the next top-up. */
> +	mc->objects = NULL;
>  }
>  
>  void *kvm_mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc)

Otherwise:

Reviewed-by: Marc Zyngier <maz@kernel.org>

	M.

-- 
Without deviation from the norm, progress is not possible.