From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from mail-pg1-f202.google.com (mail-pg1-f202.google.com [209.85.215.202])
	(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6250C34D4E9
	for <linux-kernel@vger.kernel.org>; Fri, 16 Jan 2026 19:59:20 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.215.202
ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1768593561; cv=none; b=gNU+QC2RvskMcS91G/TfuLjsNYJrE6XFVkZu162Q5jr0DaNe0337pyi7TO1LqAzIpuD5K7iULGWLEkb57lg7YLD3yxdc/tUSlqldxdUtZ8AFf16f41RuGFlbaV9IklW7jeQGdBmRZA5tAkqFqxFzdx+PI2jQhm2dZsfKyRmls+A=
ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1768593561; c=relaxed/simple;
	bh=sm9bqjRyyBx1tfK/mmWy/rukH0wRuwoGD1nMV2WjmCE=;
	h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From:
	 To:Cc:Content-Type; b=q/dlbm/U5IA6JVGDN7akr6W8sbUWiIzGyvOka3F2/pVyKaaAkk4S1DhSiWhyAYXNKs0BD/jsO8Ak1TfB/MLSHGnc71K+0zFO3NFEG19N29Uo0WKkxBh4i1YFDFu4oJsxpNLakSpjbzHQ8cx/3/rmgPt7IHWK3b82VQme5X9WcDw=
ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=wMnVInOM; arc=none smtp.client-ip=209.85.215.202
Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com
Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="wMnVInOM"
Received: by mail-pg1-f202.google.com with SMTP id 41be03b00d2f7-c54e81eeab9so1567004a12.3
        for <linux-kernel@vger.kernel.org>; Fri, 16 Jan 2026 11:59:20 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=google.com; s=20230601; t=1768593560; x=1769198360; darn=vger.kernel.org;
        h=cc:to:from:subject:message-id:references:mime-version:in-reply-to
         :date:from:to:cc:subject:date:message-id:reply-to;
        bh=CmVsGRS74zmdjEMM07LPVm6eWi6uP63Jcf4CCC+8B3s=;
        b=wMnVInOMzuKu66V/7Gl6NWjMzI7mV9TFmmb/BVtCDr2GDa2InJTlGxGlVPbhIj96SZ
         Tl5/7Z0ksW7h7gjGyXDH0umToCi4DYy1wl5kztv1fZCzhGO35Z67gZiCGbuh+Jd3q2qO
         l7RvFy448uW7qkVqjkz0aBV9DnJ84Zt+W2z8y16JnEBQlkZlTnHK7rlZ/HlHO8MsJsos
         AawRFpy3WephlnIND/kO6aD4ue0aLp5/a419YZosCV24cQMPts9f3r6DHpp2KQ7L06HN
         vLbSwya+BEY84RVwxslx4zehAejxR3PXOLrUHKIaW5odX40sFU+KqnzAog6GT4JUw85v
         Ienw==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20230601; t=1768593560; x=1769198360;
        h=cc:to:from:subject:message-id:references:mime-version:in-reply-to
         :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to;
        bh=CmVsGRS74zmdjEMM07LPVm6eWi6uP63Jcf4CCC+8B3s=;
        b=omdYwz2TzkW/ngccfB2HkzTMl/XO7Hsk/DbESrqO0dpPDKVNRjaRSBe1bIfVWRBGio
         jdV+QunnSUyr2jzCXghHj3Kh+NhH0DKfnBZCvK+YsV4uQDO3fZ0nSMHZ3/7jwkHIeYxH
         8xYko0B2RgJeZms2BZM9ci4I2QBgz/0pS8wSwaHGpHl1krfx5TsVGwK98BeXNqAEgtkk
         hm0EVXmbXGKhFy2ANXF4XEgBTTffDqlt11LZ1JV0TdfFqBhLD0t1ZNd2VAVdTp/4FBlI
         BxhntIYTcVcbYKQptOco3wX86Hbo6p86tB5ywgNvU4PPde+bdQ1hb4v11PNpTjzKAF48
         ZOLw==
X-Forwarded-Encrypted: i=1; AJvYcCUt+0ZXrQwIAFkMIKBfAkDOITwwR+c+p98iQZto5mFKlmRDyvjLGmwwfexPqPCsKaGu4Mrsexqyqc50E2E=@vger.kernel.org
X-Gm-Message-State: AOJu0Yy7+oRIqeZKMVGctNQTw1uCnc+YVUMv7ZrJIFBjGNCtC3aJjQub
	5shKvzK3P+k4Q/yE9LbwEZjI8of06dptECxOxdlUWl3LVvfi5H9WaxYEV2DLLi903UIWjWTiQOY
	1SIZWzA==
X-Received: from plko12.prod.google.com ([2002:a17:902:6b0c:b0:29e:fb92:99f6])
 (user=seanjc job=prod-delivery.src-stubby-dispatcher) by 2002:a17:903:2c0c:b0:2a0:9970:13fd
 with SMTP id d9443c01a7336-2a7176cda67mr43766535ad.43.1768593559703; Fri, 16
 Jan 2026 11:59:19 -0800 (PST)
Date: Fri, 16 Jan 2026 11:59:17 -0800
In-Reply-To: <1b236a64-d511-49a2-9962-55f4b1eb08e3@intel.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
Mime-Version: 1.0
References: <CAEvNRgHSm0k2hthxLPg8oXO_Y9juA9cxOBp2YdFFYOnDkxpv5g@mail.gmail.com>
 <aWbkcRshLiL4NWZg@yzhao56-desk.sh.intel.com> <aWbwVG8aZupbHBh4@google.com>
 <aWdgfXNdBuzpVE2Z@yzhao56-desk.sh.intel.com> <aWe1tKpFw-As6VKg@google.com>
 <f4240495-120b-4124-b91a-b365e45bf50a@intel.com> <aWgyhmTJphGQqO0Y@google.com>
 <435b8d81-b4de-4933-b0ae-357dea311488@intel.com> <aWpyA0_r_yVewnfx@google.com>
 <1b236a64-d511-49a2-9962-55f4b1eb08e3@intel.com>
Message-ID: <aWqYlXg4CS6vxS-o@google.com>
Subject: Re: [PATCH v3 00/24] KVM: TDX huge page support for private memory
From: Sean Christopherson <seanjc@google.com>
To: Dave Hansen <dave.hansen@intel.com>
Cc: Yan Zhao <yan.y.zhao@intel.com>, Ackerley Tng <ackerleytng@google.com>, 
	Vishal Annapurve <vannapurve@google.com>, pbonzini@redhat.com, linux-kernel@vger.kernel.org, 
	kvm@vger.kernel.org, x86@kernel.org, rick.p.edgecombe@intel.com, 
	kas@kernel.org, tabba@google.com, michael.roth@amd.com, david@kernel.org, 
	sagis@google.com, vbabka@suse.cz, thomas.lendacky@amd.com, 
	nik.borisov@suse.com, pgonda@google.com, fan.du@intel.com, jun.miao@intel.com, 
	francescolavra.fl@gmail.com, jgross@suse.com, ira.weiny@intel.com, 
	isaku.yamahata@intel.com, xiaoyao.li@intel.com, kai.huang@intel.com, 
	binbin.wu@linux.intel.com, chao.p.peng@intel.com, chao.gao@intel.com
Content-Type: text/plain; charset="us-ascii"

On Fri, Jan 16, 2026, Dave Hansen wrote:
> On 1/16/26 09:14, Sean Christopherson wrote:
> > If you want to assert that the pfn is compatible with TDX, then by
> > all means.  But I am NOT accepting any more KVM code that assumes
> > TDX memory is backed by refcounted struct page.  If I had been
> > paying more attention when the initial TDX series landed, I would
> > have NAK'd that too.
> I'm kinda surprised by that. The only memory we support handing into TDs
> for private memory is refcounted struct page. I can imagine us being
> able to do this with DAX pages in the near future, but those have
> 'struct page' too, and I think they're refcounted pretty normally now as
> well.
> 
> The TDX module initialization is pretty tied to NUMA nodes, too. If it's
> in a NUMA node, the TDX module is told about it and it also universally
> gets a 'struct page'.
> 
> Is there some kind of memory that I'm missing? What else *is* there? :)

I don't want to special case TDX on the backend of KVM's MMU.  There's already
waaaay too much code and complexity in KVM that exists purely for S-EPT.  Baking
in assumptions on how _exactly_ KVM is managing guest memory goes too far.

The reason I'm so hostile towards struct page is that, as evidenced by this series
and a ton of historical KVM code, assuming that memory is backed by struct page is
a _very_ slippery slope towards code that is extremely nasty to unwind later on.

E.g. see all of the effort that ended up going into commit ce7b5695397b ("KVM: TDX:
Drop superfluous page pinning in S-EPT management").  And in this series, the
constraints that will be placed on guest_memfd if TDX assumes hugepages will always
be covered in a single folio.  Untangling KVM's historical (non-TDX) messes around
struct page took us something like two years.

And so to avoid introducing similar messes in the future, I don't want KVM's MMU
to make _any_ references to struct page when it comes to mapping memory into the
guest unless it's absolutely necessary, e.g. to put a reference when KVM _knows_
it acquired a refcounted page via gup() (and ideally we'd kill even that, e.g.
by telling gup() not to bump the refcount in the first place).

> > tdh_mem_page_aug() is just an absurdly slow way of writing a PTE.  It doesn't
> > _need_ the pfn to be backed a struct page, at all.  IMO, what you're asking for
> > is akin to adding a pile of unnecessary assumptions to e.g. __set_spte() and
> > __kvm_tdp_mmu_write_spte().  No thanks.
> 
> Which part is absurdly slow?

The SEAMCALL itself.  I'm saying that TDH_MEM_PAGE_AUG is really just the S-EPT
version of "make this PTE PRESENT", and that piling on sanity checks that aren't
fundamental to TDX shouldn't be done when KVM is writing PTEs.

In other words, something like this is totally fine:

	KVM_MMU_WARN_ON(!tdx_is_convertible_pfn(pfn));

but this is not:

	WARN_ON_ONCE(!page_mapping(pfn_to_page(pfn)));