Date: Tue, 3 Feb 2026 18:16:13 -0800 (PST)
Subject: Re: [RFC PATCH v5 19/45] KVM: Allow owner of kvm_mmu_memory_cache to provide a custom page allocator
From: Sean Christopherson
To: Kai Huang
Cc: "kvm@vger.kernel.org", "linux-coco@lists.linux.dev", Xiaoyao Li, Yan Y Zhao,
	"dave.hansen@linux.intel.com", "kas@kernel.org", "mingo@redhat.com",
	"binbin.wu@linux.intel.com", "pbonzini@redhat.com", "ackerleytng@google.com",
	"linux-kernel@vger.kernel.org", Isaku Yamahata, "sagis@google.com",
	"tglx@kernel.org", Rick P Edgecombe, "bp@alien8.de", Vishal Annapurve,
	"x86@kernel.org"
References: <20260129011517.3545883-1-seanjc@google.com> <20260129011517.3545883-20-seanjc@google.com>

On Tue, Feb 03, 2026, Kai Huang wrote:
> On Tue, 2026-02-03 at 12:12 -0800, Sean Christopherson wrote:
> > On Tue, Feb 03, 2026, Kai Huang wrote:
> > > On Wed, 2026-01-28 at 17:14 -0800, Sean Christopherson wrote:
> > > > Extend "struct kvm_mmu_memory_cache" to support a custom page allocator
> > > > so that x86's TDX can update per-page metadata on allocation and free().
> > > >
> > > > Name the allocator page_get() to align with __get_free_page(), e.g. to
> > > > communicate that it returns an "unsigned long", not a "struct page", and
> > > > to avoid collisions with macros, e.g. with alloc_page.
> > > >
> > > > Suggested-by: Kai Huang
> > > > Signed-off-by: Sean Christopherson
> > >
> > > I thought it could be more generic for allocating an object, but not just a
> > > page.
> > >
> > > E.g., I thought we might be able to use it to allocate a structure which has
> > > "pair of DPAMT pages" so it could be assigned to 'struct kvm_mmu_page'.  But
> > > it seems you abandoned this idea.  May I ask why?  Just want to understand
> > > the reasoning here.
> >
> > Because that requires more complexity and there's no known use case, and I don't
> > see an obvious way for a use case to come along.  All of the motivations for a
> > custom allocation scheme that I can think of apply only to full pages, or fit
> > nicely in a kmem_cache.
> >
> > Specifically, the "cache" logic is already bifurcated between "kmem_cache" and
> > "page" usage.  Further splitting the "page" case doesn't require modifications to
> > the "kmem_cache" case, whereas providing a fully generic solution would require
> > additional changes, e.g. to handle this code:
> >
> > 	page = (void *)__get_free_page(gfp_flags);
> > 	if (page && mc->init_value)
> > 		memset64(page, mc->init_value, PAGE_SIZE / sizeof(u64));
> >
> > It certainly wouldn't be much complexity, but this code is already a bit awkward,
> > so I don't think it makes sense to add support for something that will probably
> > never be used.
>
> For this particular piece of code, we can add a helper for allocating normal
> page table pages, get rid of mc->init_value completely and hook mc->page_get()
> to that helper.

Hmm, I like the idea, but I don't think it would be a net positive.  In practice,
x86's "normal" page tables stop being normal, because KVM now initializes all
SPTEs with BIT(63)=1 on x86-64.  And that would also incur an extra RETPOLINE on
all those allocations.
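For readers following along, here is a rough sketch, not the actual patch, of how
a page_get() hook could slot into the bifurcated helper quoted above: only the
"page" path consults the new callback, the kmem_cache path is untouched.  The
callback name comes from the patch subject; its exact signature is a guess, and
the surrounding code is paraphrased from the quoted snippet.

	static void *mmu_memory_cache_alloc_obj(struct kvm_mmu_memory_cache *mc,
						gfp_t gfp_flags)
	{
		void *page;

		/* The kmem_cache path is unaffected by the new hook. */
		if (mc->kmem_cache)
			return kmem_cache_alloc(mc->kmem_cache, gfp_flags);

		/*
		 * Hypothetical hook: let the cache owner (e.g. TDX) allocate
		 * the page so it can update per-page metadata.  Returns an
		 * "unsigned long", like __get_free_page().
		 */
		if (mc->page_get)
			return (void *)mc->page_get(gfp_flags);

		page = (void *)__get_free_page(gfp_flags);
		if (page && mc->init_value)
			memset64(page, mc->init_value, PAGE_SIZE / sizeof(u64));
		return page;
	}
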
> A bonus is we can then call that helper in all places when KVM needs to
> allocate a page for normal page table instead of just calling
> get_zeroed_page() directly, e.g., like the one in
> tdp_mmu_alloc_sp_for_split(),

Huh.  Actually, that's a bug, but not the one you probably expect.  At a glance,
it looks like KVM is incorrectly zeroing the page instead of initializing it with
SHADOW_NONPRESENT_VALUE.  But it's actually a "performance" bug, because KVM
doesn't actually need to pre-initialize the page: either the page will never be
used, or every SPTE will be initialized as a child SPTE.

So that one _should_ be different, e.g. should be:

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index a32192c35099..36afd67601fc 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1456,7 +1456,7 @@ static struct kvm_mmu_page *tdp_mmu_alloc_sp_for_split(struct kvm *kvm,
 	if (!sp)
 		return NULL;
 
-	sp->spt = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
+	sp->spt = (void *)__get_free_page(GFP_KERNEL_ACCOUNT);
 	if (!sp->spt)
 		goto err_spt;

> so that we can have a consistent way for allocating normal page table pages.
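
To make the "every SPTE will be initialized" point above concrete: the huge page
split path writes all 512 entries of the new table before linking it into the
paging structure, roughly along these lines (simplified sketch with a hypothetical
helper name, not the actual kernel code):

	int i;

	/*
	 * Every child SPTE is written before the new page table becomes
	 * visible, so pre-filling the page, whether with zeros or with
	 * SHADOW_NONPRESENT_VALUE, is wasted work.
	 */
	for (i = 0; i < SPTE_ENT_PER_PAGE; i++)
		sp->spt[i] = make_child_spte(huge_spte, i);	/* hypothetical helper */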