Date: Tue, 3 Feb 2026 18:16:13 -0800 (PST)
Subject: Re: [RFC PATCH v5 19/45] KVM: Allow owner of kvm_mmu_memory_cache to provide a custom page allocator
From: Sean Christopherson
To: Kai Huang
Cc: "kvm@vger.kernel.org", "linux-coco@lists.linux.dev", Xiaoyao Li, Yan Y Zhao,
	"dave.hansen@linux.intel.com", "kas@kernel.org", "mingo@redhat.com",
	"binbin.wu@linux.intel.com", "pbonzini@redhat.com", "ackerleytng@google.com",
	"linux-kernel@vger.kernel.org", Isaku Yamahata, "sagis@google.com",
	"tglx@kernel.org", Rick P Edgecombe, "bp@alien8.de", Vishal Annapurve,
	"x86@kernel.org"
References: <20260129011517.3545883-1-seanjc@google.com> <20260129011517.3545883-20-seanjc@google.com>

On Tue, Feb 03, 2026, Kai Huang wrote:
> On Tue, 2026-02-03 at 12:12 -0800, Sean Christopherson wrote:
> > On Tue, Feb 03, 2026, Kai Huang wrote:
> > > On Wed, 2026-01-28 at 17:14 -0800, Sean Christopherson wrote:
> > > > Extend "struct kvm_mmu_memory_cache" to support a custom page allocator
> > > > so that x86's TDX can update per-page metadata on allocation and free().
> > > >
> > > > Name the allocator page_get() to align with __get_free_page(), e.g. to
> > > > communicate that it returns an "unsigned long", not a "struct page", and
> > > > to avoid collisions with macros, e.g. with alloc_page.
> > > >
> > > > Suggested-by: Kai Huang
> > > > Signed-off-by: Sean Christopherson
> > >
> > > I thought it could be more generic for allocating an object, but not just a
> > > page.
> > >
> > > E.g., I thought we might be able to use it to allocate a structure which has
> > > "pair of DPAMT pages" so it could be assigned to 'struct kvm_mmu_page'.  But
> > > it seems you abandoned this idea.  May I ask why?  Just want to understand
> > > the reasoning here.
> >
> > Because that requires more complexity and there's no known use case, and I don't
> > see an obvious way for a use case to come along.  All of the motivations for a
> > custom allocation scheme that I can think of apply only to full pages, or fit
> > nicely in a kmem_cache.
> >
> > Specifically, the "cache" logic is already bifurcated between "kmem_cache" and
> > "page" usage.  Further splitting the "page" case doesn't require modifications to
> > the "kmem_cache" case, whereas providing a fully generic solution would require
> > additional changes, e.g. to handle this code:
> >
> > 	page = (void *)__get_free_page(gfp_flags);
> > 	if (page && mc->init_value)
> > 		memset64(page, mc->init_value, PAGE_SIZE / sizeof(u64));
> >
> > It certainly wouldn't be much complexity, but this code is already a bit awkward,
> > so I don't think it makes sense to add support for something that will probably
> > never be used.
>
> For this particular piece of code, we can add a helper for allocating normal
> page table pages, get rid of mc->init_value completely and hook mc->page_get()
> to that helper.

Hmm, I like the idea, but I don't think it would be a net positive.  In practice,
x86's "normal" page tables stop being normal, because KVM now initializes all
SPTEs with BIT(63)=1 on x86-64.  And that would also incur an extra RETPOLINE on
all those allocations.
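For readers following along, here is a rough sketch, not the actual patch, of how
a page_get() hook could slot into the bifurcated helper quoted above: only the
"page" path consults the new callback, the kmem_cache path is untouched.  The
callback name comes from the patch subject; its exact signature is a guess, and
the surrounding code is paraphrased from the quoted snippet.

	static void *mmu_memory_cache_alloc_obj(struct kvm_mmu_memory_cache *mc,
						gfp_t gfp_flags)
	{
		void *page;

		/* The kmem_cache path is unaffected by the new hook. */
		if (mc->kmem_cache)
			return kmem_cache_alloc(mc->kmem_cache, gfp_flags);

		/*
		 * Hypothetical hook: let the cache owner (e.g. TDX) allocate
		 * the page so it can update per-page metadata.  Returns an
		 * "unsigned long", like __get_free_page().
		 */
		if (mc->page_get)
			return (void *)mc->page_get(gfp_flags);

		page = (void *)__get_free_page(gfp_flags);
		if (page && mc->init_value)
			memset64(page, mc->init_value, PAGE_SIZE / sizeof(u64));
		return page;
	}
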
> A bonus is we can then call that helper in all places when KVM needs to
> allocate a page for normal page table instead of just calling
> get_zeroed_page() directly, e.g., like the one in
> tdp_mmu_alloc_sp_for_split(),

Huh.  Actually, that's a bug, but not the one you probably expect.  At a glance,
it looks like KVM is incorrectly zeroing the page instead of initializing it with
SHADOW_NONPRESENT_VALUE.  But it's actually a "performance" bug, because KVM
doesn't actually need to pre-initialize the page: either the page will never be
used, or every SPTE will be initialized as a child SPTE.

So that one _should_ be different, e.g. should be:

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index a32192c35099..36afd67601fc 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1456,7 +1456,7 @@ static struct kvm_mmu_page *tdp_mmu_alloc_sp_for_split(struct kvm *kvm,
 	if (!sp)
 		return NULL;
 
-	sp->spt = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
+	sp->spt = (void *)__get_free_page(GFP_KERNEL_ACCOUNT);
 	if (!sp->spt)
 		goto err_spt;

> so that we can have a consistent way for allocating normal page table pages.
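
To make the "every SPTE will be initialized" point above concrete: the huge page
split path writes all 512 entries of the new table before linking it into the
paging structure, roughly along these lines (simplified sketch with a hypothetical
helper name, not the actual kernel code):

	int i;

	/*
	 * Every child SPTE is written before the new page table becomes
	 * visible, so pre-filling the page, whether with zeros or with
	 * SHADOW_NONPRESENT_VALUE, is wasted work.
	 */
	for (i = 0; i < SPTE_ENT_PER_PAGE; i++)
		sp->spt[i] = make_child_spte(huge_spte, i);	/* hypothetical helper */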