From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 3 Feb 2026 13:17:25 -0800
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
Mime-Version: 1.0
References: <20260129011517.3545883-1-seanjc@google.com> <20260129011517.3545883-20-seanjc@google.com>
Subject: Re: [RFC PATCH v5 19/45] KVM: Allow owner of kvm_mmu_memory_cache to provide a custom page allocator
From: Sean Christopherson
To: Rick P Edgecombe
Cc: Kai Huang, kvm@vger.kernel.org, linux-coco@lists.linux.dev, Xiaoyao Li, Yan Y Zhao, dave.hansen@linux.intel.com, kas@kernel.org, mingo@redhat.com, binbin.wu@linux.intel.com, pbonzini@redhat.com, Isaku Yamahata, ackerleytng@google.com, linux-kernel@vger.kernel.org, sagis@google.com, tglx@kernel.org, bp@alien8.de, Vishal Annapurve, x86@kernel.org
Content-Type: text/plain; charset="utf-8"

On Tue, Feb 03, 2026, Rick P Edgecombe wrote:
> On Tue, 2026-02-03 at 12:12 -0800, Sean Christopherson wrote:
> > > E.g., I thought we might be able to use it to allocate a structure which has
> > > "pair of DPAMT pages" so it could be assigned to 'struct kvm_mmu_page'.  But
> > > it seems you abandoned this idea.  May I ask why?  Just want to understand
> > > the reasoning here.
> > 
> > Because that requires more complexity and there's no known use case, and I
> > don't see an obvious way for a use case to come along.  All of the
> > motivations for a custom allocation scheme that I can think of apply only to
> > full pages, or fit nicely in a kmem_cache.
> > 
> > Specifically, the "cache" logic is already bifurcated between "kmem_cache" and
> > "page" usage.  Further splitting the "page" case doesn't require modifications
> > to the "kmem_cache" case, whereas providing a fully generic solution would
> > require additional changes, e.g. 
> > to handle this code:
> > 
> > 	page = (void *)__get_free_page(gfp_flags);
> > 	if (page && mc->init_value)
> > 		memset64(page, mc->init_value, PAGE_SIZE / sizeof(u64));
> > 
> > It certainly wouldn't be much complexity, but this code is already a bit
> > awkward, so I don't think it makes sense to add support for something that
> > will probably never be used.
> 
> The thing that the design needlessly works around is that we can rely on
> there being only two DPAMT pages per 2MB range. We don't need the dynamic
> page count allocations.
> 
> This means we don't need to pass around the list of pages that lets arch/x86
> take as many pages as it needs. We can maybe just pass in a struct like Kai
> was suggesting to the get/put helpers. So I was in the process of trying to
> morph this series in that direction to get rid of the complexity resulting
> from the dynamic assumption.
> 
> This was what I had done in response to v4 discussions, so now retrofitting
> it into this new ops scheme. Care to warn me off of this before I have
> something to show?

That's largely orthogonal to this change.  This change is about preparing the
DPAMT when an S-EPT page is allocated versus when it is installed.  The fact
that DPAMT requires at most two pages versus a more dynamic maximum is
irrelevant.

The caches aren't about dynamic sizes (though they play nicely with them),
they're about:

  (a) not having to deal with allocating under a spinlock
  (b) not having to free memory that goes unused (for a single page fault)
  (c) batching allocations for performance reasons (with the caveat that I
      doubt anyone has measured the performance impact in many, many years)

None of those talking points change at all if KVM needs to provide 2 pages
versus N pages.  The max number of pages needed for page tables is pretty much
the same thing as DPAMT, just with a higher max (4/5 vs. 2).  In both cases,
the allocated pages may or may not be consumed for any given fault.
For the leaf pages (including the hugepage splitting cases), which don't
utilize KVM's kvm_mmu_memory_cache, I wouldn't expect the KVM details to change
all that much.  In fact, they shouldn't change at all, because tracking 2 pages
versus N pages in "struct tdx_pamt_cache" is a detail that is 100% buried in
the TDX subsystem (which was pretty much the entire goal of my design).

Though maybe I'm misunderstanding what you have in mind?